API
Reading data in FASTGenomics
fgread.ds_info(ds: Optional[str] = None, pretty: bool = None, output: bool = None, data_dir: pathlib.Path = PosixPath('/fastgenomics/data')) → pandas.core.frame.DataFrame
Get information on all available datasets in this analysis.
Parameters:
- ds (Optional[str], optional) – A single dataset ID or dataset title. If set, only this dataset will be displayed. Recommended to use with pretty, by default None
- pretty (bool, optional) – Whether to display nicely formatted output, by default True
- output (bool, optional) – Whether to return a DataFrame or not, by default True
- data_dir (Path, optional) – Directory containing the datasets, e.g. fastgenomics/data, by default DATA_DIR
Returns: A pandas DataFrame containing all datasets, or a single dataset (depending on ds)
Return type: pd.DataFrame
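The ds argument accepts either a dataset ID or a dataset title. A minimal sketch of such a lookup is shown below. It uses plain dictionaries rather than fgread's actual data structures; the field names id and title, the sample records, and the helper select_dataset are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical dataset records; the real metadata fields in fgread may differ.
DATASETS = [
    {"id": "dataset_0001", "title": "First example"},
    {"id": "dataset_0002", "title": "Second example"},
]


def select_dataset(datasets: list, ds: Optional[str] = None) -> list:
    """Return all records if ds is None, else those whose id or title equals ds."""
    if ds is None:
        return list(datasets)
    return [d for d in datasets if ds in (d["id"], d["title"])]


by_id = select_dataset(DATASETS, "dataset_0002")
by_title = select_dataset(DATASETS, "First example")
everything = select_dataset(DATASETS)
```

Matching on both fields lets a caller use whichever identifier is at hand, at the cost of ambiguity if a title ever equals another dataset's ID.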
fgread.load_data(ds: Optional[str] = None, data_dir: pathlib.Path = PosixPath('/fastgenomics/data'), additional_readers: dict = {}, expression_file: Optional[str] = None, as_format: Optional[str] = None)
Loads a single dataset into an AnnData object. If multiple datasets are available, you need to specify one by setting ds to a dataset ID or dataset title. To get an overview of available datasets, use ds_info().
Parameters:
- ds (str, optional) – A single dataset ID or dataset title to select a dataset to be loaded. If only one dataset is available you do not need to set this parameter, by default None
- data_dir (Path, optional) – Directory containing the datasets, e.g. fastgenomics/data, by default DATA_DIR
- additional_readers (dict, optional) – Used to specify your own readers for a specific dataset format. The dict key must be the file extension (e.g., h5ad) and the dict value a function. Still experimental, by default {}
- expression_file (str, optional) – The name of the expression file to load. Only needed when there are multiple expression files in a dataset.
- as_format (str, optional) – Specifies which reader should be used for this dataset. Overrides the auto-detection of the format. Possible values are the file extensions of the supported data formats: h5ad, h5, hdf5, loom, rds, csv, tsv.
Returns: A single AnnData object with dataset id in obs and all dataset metadata in uns
Return type: AnnData Object
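The as_format parameter overrides the format auto-detection. One way to picture that rule is the sketch below, which assumes that auto-detection is based on the file extension. The names resolve_format and SUPPORTED_FORMATS are hypothetical and not part of fgread.

```python
from pathlib import Path
from typing import Optional

# File extensions listed above as supported formats.
SUPPORTED_FORMATS = {"h5ad", "h5", "hdf5", "loom", "rds", "csv", "tsv"}


def resolve_format(ds_file: Path, as_format: Optional[str] = None) -> str:
    """Pick a format: an explicit as_format wins over the file extension."""
    fmt = as_format if as_format is not None else ds_file.suffix.lstrip(".")
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    return fmt


# Extension-based detection (assumed behaviour).
auto = resolve_format(Path("expression.h5ad"))
# Explicit override for a file whose extension is not informative.
forced = resolve_format(Path("expression.txt"), as_format="tsv")
```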
Examples
To use a custom reader for files with the extension “.fg”, you have to define a function first:
>>> def my_loader(file):
...     anndata = magic_file_loading(file)
...     return anndata
You can then use this reader like this:
>>> fgread.load_data("my_dataset", additional_readers={"fg": my_loader})
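For illustration, the dispatch behind additional_readers could look roughly like the following. This is a sketch under the assumption that readers are looked up by file extension and that user-supplied readers are merged with the built-in ones. DEFAULT_READERS, pick_reader, and the stand-in reader functions are hypothetical and do not reflect fgread's internals.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def read_csv_stub(ds_file: Path) -> str:
    # Stand-in for a built-in reader; a real reader would return an AnnData object.
    return f"csv:{ds_file.name}"


def my_loader(ds_file: Path) -> str:
    # Stand-in for a user-defined reader for ".fg" files.
    return f"fg:{ds_file.name}"


# Hypothetical registry of built-in readers, keyed by file extension.
DEFAULT_READERS: Dict[str, Callable[[Path], str]] = {"csv": read_csv_stub}


def pick_reader(
    ds_file: Path, additional_readers: Optional[dict] = None
) -> Callable[[Path], str]:
    """Look up a reader by file extension; user readers are merged on top of defaults."""
    readers = {**DEFAULT_READERS, **(additional_readers or {})}
    ext = ds_file.suffix.lstrip(".")
    if ext not in readers:
        raise KeyError(f"No reader registered for extension {ext!r}")
    return readers[ext]


reader = pick_reader(Path("matrix.fg"), additional_readers={"fg": my_loader})
result = reader(Path("matrix.fg"))
```

Because the user dictionary is merged last in this sketch, a custom reader registered under an existing extension would replace the built-in one; whether fgread behaves the same way is not stated in this reference.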
Readers for supported formats
fgread.readers.read_10xhdf5_to_anndata(ds_file: pathlib.Path)
Reads a dataset in the 10x hdf5 format into the AnnData format.
fgread.readers.read_10xmtx_to_anndata(ds_file: pathlib.Path)
Reads a dataset in the 10x mtx format into the AnnData format.
fgread.readers.read_anndata_to_anndata(ds_file: pathlib.Path)
Reads a dataset in the AnnData format into the AnnData format.
fgread.readers.read_densecsv_to_anndata(ds_file: pathlib.Path)
Reads a dense text file in csv format into the AnnData format.
fgread.readers.read_densemat_to_anndata(ds_file: pathlib.Path, sep=None)
Helper function to read dense text files in tsv and csv format. The separator (tab or comma) is passed by the corresponding function.
fgread.readers.read_densetsv_to_anndata(ds_file: pathlib.Path)
Reads a dense text file in tsv format into the AnnData format.
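The dense-matrix readers share one helper and differ only in the separator they pass. The pattern can be sketched with the standard library as follows. This sketch returns plain Python lists instead of an AnnData object, assumes a header row and a first column of row labels, and uses hypothetical function names as stand-ins for the fgread readers.

```python
import csv
import tempfile
from pathlib import Path


def read_dense(ds_file: Path, sep: str):
    """Parse a dense text matrix: header row of column names, first column of row names."""
    with open(ds_file, newline="") as handle:
        rows = list(csv.reader(handle, delimiter=sep))
    col_names = rows[0][1:]
    row_names = [row[0] for row in rows[1:]]
    values = [[float(x) for x in row[1:]] for row in rows[1:]]
    return row_names, col_names, values


def read_dense_csv(ds_file: Path):
    # Thin wrapper: comma-separated input.
    return read_dense(ds_file, sep=",")


def read_dense_tsv(ds_file: Path):
    # Thin wrapper: tab-separated input.
    return read_dense(ds_file, sep="\t")


# Write a small tab-separated matrix to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "matrix.tsv"
    path.write_text("\tcol1\tcol2\nrow1\t1\t0\nrow2\t2\t5\n")
    row_names, col_names, values = read_dense_tsv(path)
```

Keeping the parsing in one helper means the csv and tsv entry points cannot drift apart; only the separator differs.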