API

Reading data in FASTGenomics

fgread.ds_info(ds: Optional[str] = None, pretty: bool = None, output: bool = None, data_dir: pathlib.Path = PosixPath('/fastgenomics/data')) → pandas.core.frame.DataFrame

Get information on all available datasets in this analysis.

Parameters:
  • ds (Optional[str], optional) – A single dataset ID or dataset title. If set, only this dataset is displayed (best combined with pretty), by default None
  • pretty (bool, optional) – Whether to display some nicely formatted output, by default True
  • output (bool, optional) – Whether to return a DataFrame or not, by default True
  • data_dir (Path, optional) – Directory containing the datasets, e.g. fastgenomics/data, by default DATA_DIR
Returns:

A pandas DataFrame containing all datasets, or only the selected dataset if ds is set

Return type:

pd.DataFrame
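The ds parameter accepts either a dataset ID or a dataset title. Below is a minimal, stdlib-only sketch of that matching rule, not fgread's implementation: the record fields `id` and `title`, the example IDs and titles, and the helper `select_datasets` are all hypothetical, and the real function returns a pandas DataFrame rather than a list of dicts.

```python
from typing import Dict, List, Optional

# Hypothetical records; the real ds_info() reads dataset information from data_dir.
DATASETS: List[Dict[str, str]] = [
    {"id": "dataset_0001", "title": "PBMC 3k"},
    {"id": "dataset_0002", "title": "Mouse brain"},
]


def select_datasets(
    ds: Optional[str], datasets: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Return every dataset when ds is None, else those whose ID or title equals ds."""
    if ds is None:
        return list(datasets)
    # A single string may match either the ID or the title of a record.
    matches = [d for d in datasets if ds in (d["id"], d["title"])]
    if not matches:
        raise KeyError(f"no dataset with ID or title {ds!r}")
    return matches


print([d["id"] for d in select_datasets("Mouse brain", DATASETS)])  # ['dataset_0002']
```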

fgread.load_data(ds: Optional[str] = None, data_dir: pathlib.Path = PosixPath('/fastgenomics/data'), additional_readers: dict = {}, expression_file: Optional[str] = None, as_format: Optional[str] = None)

This function loads a single dataset into an AnnData object. If multiple datasets are available, you need to select one by setting ds to a dataset ID or dataset title. To get an overview of the available datasets, use ds_info().

Parameters:
  • ds (str, optional) – A single dataset ID or dataset title to select a dataset to be loaded. If only one dataset is available you do not need to set this parameter, by default None
  • data_dir (Path, optional) – Directory containing the datasets, e.g. fastgenomics/data, by default DATA_DIR
  • additional_readers (dict, optional) – Your own readers for specific dataset formats. Each key is a file extension (e.g., h5ad) and each value is the reader function for that extension. Still experimental, by default {}
  • expression_file (str, optional) – The name of the expression file to load. Only needed when a dataset contains multiple expression files, by default None
  • as_format (str, optional) – Specifies which reader should be used for this dataset, overriding the auto-detection of the format. Possible values are the file extensions of our supported data formats: h5ad, h5, hdf5, loom, rds, csv, tsv.
Returns:

A single AnnData object with the dataset ID in obs and all dataset metadata in uns

Return type:

AnnData Object
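load_data() only requires ds when more than one dataset exists, and expression_file only when a dataset contains more than one expression file. Below is a minimal sketch of that "use the only candidate, otherwise require an explicit choice" rule. The helper `choose_one` and the file names are hypothetical, and fgread's own error types and messages may differ.

```python
from typing import Optional, Sequence


def choose_one(candidates: Sequence[str], requested: Optional[str], what: str) -> str:
    """Return `requested` if given, else the only candidate; otherwise raise."""
    if requested is not None:
        if requested not in candidates:
            raise KeyError(f"{what} {requested!r} not found; available: {list(candidates)}")
        return requested
    # No explicit choice: only unambiguous when exactly one candidate exists.
    if len(candidates) == 1:
        return candidates[0]
    raise ValueError(f"{len(candidates)} {what}s available; please specify one")


print(choose_one(["dataset_0001"], None, "dataset"))  # dataset_0001
print(choose_one(["counts.h5ad", "raw.loom"], "raw.loom", "expression file"))  # raw.loom
```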

Examples

To use a custom reader for files with the extension “.fg”, you have to define a function first:

>>> def my_loader(file):
...     # magic_file_loading is a placeholder for your own code that parses the file
...     adata = magic_file_loading(file)
...     return adata

You can then use this reader like this:

>>> fgread.load_data("my_dataset", additional_readers={"fg": my_loader})
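Internally, a reader has to be chosen for each expression file. The sketch below shows one plausible way to combine extension-based auto-detection, additional_readers and the as_format override. It is an assumption made for illustration, not fgread's code: the helper `pick_reader`, the `Reader` type, and the choice that custom readers take precedence over built-in ones for the same extension are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

# Hypothetical reader type: takes the path of an expression file, returns the loaded data.
Reader = Callable[[Path], object]


def pick_reader(
    expression_file: Path,
    builtin_readers: Dict[str, Reader],
    additional_readers: Optional[Dict[str, Reader]] = None,
    as_format: Optional[str] = None,
) -> Reader:
    """Pick a reader by format key; as_format overrides the file extension."""
    # Assumption: custom readers shadow built-in ones registered for the same key.
    readers = {**builtin_readers, **(additional_readers or {})}
    if as_format is not None:
        fmt = as_format
    else:
        # Auto-detect from the extension, e.g. "cells.h5ad" -> "h5ad".
        fmt = expression_file.suffix.lstrip(".").lower()
    if fmt not in readers:
        raise ValueError(f"no reader for format {fmt!r}; known: {sorted(readers)}")
    return readers[fmt]


builtin = {"h5ad": lambda p: f"anndata:{p.name}"}
custom = {"fg": lambda p: f"custom:{p.name}"}
reader = pick_reader(Path("cells.fg"), builtin, additional_readers=custom)
print(reader(Path("cells.fg")))  # custom:cells.fg
```

Keying the dictionary by extension keeps the override simple: as_format just replaces the detected key before the lookup.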

Readers for supported formats

fgread.readers.read_10xhdf5_to_anndata(ds_file: pathlib.Path)

Reads a dataset in the 10x hdf5 format into the AnnData format.
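The 10x (Cell Ranger) HDF5 file stores counts as a compressed sparse column matrix under the `matrix` group (`data`, `indices`, `indptr`, `shape`), with features as rows and barcodes as columns, whereas AnnData puts cells in rows. The stdlib sketch below densifies such arrays into a cells x genes list of lists to show the reorientation. It assumes the arrays were already read (e.g. with h5py), and the helper name is hypothetical. scanpy's `read_10x_h5` performs this conversion in practice; whether fgread relies on it is not stated here.

```python
from typing import List, Sequence


def csc_to_cells_by_genes(
    data: Sequence[int],
    indices: Sequence[int],
    indptr: Sequence[int],
    shape: Sequence[int],
) -> List[List[int]]:
    """Densify 10x-style CSC arrays (features x barcodes) into cells x genes rows."""
    n_features, n_barcodes = shape
    dense = [[0] * n_features for _ in range(n_barcodes)]
    for cell in range(n_barcodes):
        # Non-zeros of column `cell` sit in data[indptr[cell]:indptr[cell + 1]];
        # indices[k] gives the feature (gene) row of each stored value.
        for k in range(indptr[cell], indptr[cell + 1]):
            dense[cell][indices[k]] = data[k]
    return dense


# 3 genes x 2 cells: cell 0 has gene 0 = 5 and gene 2 = 1, cell 1 has gene 1 = 3.
print(csc_to_cells_by_genes([5, 1, 3], [0, 2, 1], [0, 2, 3], [3, 2]))
# [[5, 0, 1], [0, 3, 0]]
```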

fgread.readers.read_10xmtx_to_anndata(ds_file: pathlib.Path)

Reads a dataset in the 10x mtx format into the AnnData format.
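In the 10x mtx format, `matrix.mtx` is a Matrix Market coordinate file with features as rows and barcodes as columns, accompanied by barcode and feature (or gene) tables that provide the obs and var names. The sketch below parses the coordinate entries with the standard library and transposes them to AnnData's cells x genes orientation. `parse_10x_mtx` is a hypothetical helper that only handles the matrix file, and it returns a dense list for readability where a real reader would keep the data sparse.

```python
from typing import Iterable, List


def parse_10x_mtx(lines: Iterable[str]) -> List[List[int]]:
    """Parse Matrix Market coordinate text (features x barcodes) into cells x genes."""
    it = iter(lines)
    header = next(it)
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate header")
    # Skip comment lines until the size line "n_rows n_cols n_entries".
    size_line = next(line for line in it if not line.startswith("%"))
    n_features, n_barcodes, _n_entries = (int(x) for x in size_line.split())
    dense = [[0] * n_features for _ in range(n_barcodes)]
    for line in it:
        if not line.strip():
            continue
        row, col, value = line.split()
        # Matrix Market indices are 1-based; rows are features, columns are barcodes.
        dense[int(col) - 1][int(row) - 1] = int(value)
    return dense


text = """%%MatrixMarket matrix coordinate integer general
%metadata_json: {}
3 2 3
1 1 5
3 1 1
2 2 3
"""
print(parse_10x_mtx(text.splitlines()))  # [[5, 0, 1], [0, 3, 0]]
```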

fgread.readers.read_anndata_to_anndata(ds_file: pathlib.Path)

Reads a dataset stored in the AnnData file format (h5ad) into an AnnData object.

fgread.readers.read_densecsv_to_anndata(ds_file: pathlib.Path)

Reads a dense text file in csv format into the AnnData format.

fgread.readers.read_densemat_to_anndata(ds_file: pathlib.Path, sep=None)

Helper function that reads dense text files in tsv or csv format. The separator (tab or comma) is passed in by the corresponding wrapper, read_densetsv_to_anndata or read_densecsv_to_anndata.

fgread.readers.read_densetsv_to_anndata(ds_file: pathlib.Path)

Reads a dense text file in tsv format into the AnnData format.
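The three dense-text functions above follow a wrapper pattern: one helper does the parsing, and the csv and tsv variants only fix the separator. Below is a minimal stdlib sketch of that pattern. It assumes a header row of column names and a first column of row names, and it parses a string rather than a file; these assumptions and the helper names are hypothetical, and the sketch does not decide which axis holds cells.

```python
import csv
import io
from typing import List, Tuple

Parsed = Tuple[List[str], List[str], List[List[float]]]


def read_densemat(text: str, sep: str) -> Parsed:
    """Parse a dense matrix with a header row and a leading row-name column."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=sep) if r]
    col_names = rows[0][1:]  # rows[0][0] only labels the row-name column
    row_names = [r[0] for r in rows[1:]]
    values = [[float(x) for x in r[1:]] for r in rows[1:]]
    return row_names, col_names, values


def read_densecsv(text: str) -> Parsed:
    # Thin wrapper: only fixes the separator.
    return read_densemat(text, sep=",")


def read_densetsv(text: str) -> Parsed:
    # Thin wrapper: only fixes the separator.
    return read_densemat(text, sep="\t")


print(read_densecsv("id,g1,g2\nc1,1,0\nc2,0,2\n"))
# (['c1', 'c2'], ['g1', 'g2'], [[1.0, 0.0], [0.0, 2.0]])
```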

fgread.readers.read_loom_to_anndata(ds_file: pathlib.Path)

Reads a dataset in the loom format into the AnnData format.
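A loom file stores its main matrix as genes x cells, with per-gene attributes under `row_attrs` and per-cell attributes under `col_attrs`. AnnData uses the opposite orientation, so row attributes map to var and column attributes to obs. The sketch below shows that mapping on plain Python lists and dicts. `loom_to_anndata_layout` is hypothetical, the attribute names `Gene` and `CellID` are only common loom conventions, and a real reader would use h5py or `anndata.read_loom` instead.

```python
from typing import Dict, Sequence


def loom_to_anndata_layout(
    matrix: Sequence[Sequence[float]],
    row_attrs: Dict[str, Sequence],
    col_attrs: Dict[str, Sequence],
) -> Dict[str, object]:
    """Re-orient loom-style data (genes x cells) into AnnData-style parts (cells x genes)."""
    n_genes = len(matrix)
    n_cells = len(matrix[0]) if n_genes else 0
    for name, values in row_attrs.items():
        if len(values) != n_genes:
            raise ValueError(f"row attribute {name!r} needs one value per gene")
    for name, values in col_attrs.items():
        if len(values) != n_cells:
            raise ValueError(f"column attribute {name!r} needs one value per cell")
    # Transpose: loom rows are genes, AnnData rows (obs) are cells.
    X = [list(cell) for cell in zip(*matrix)]
    return {"X": X, "obs": dict(col_attrs), "var": dict(row_attrs)}


parts = loom_to_anndata_layout(
    [[1, 2, 3], [4, 5, 6]], {"Gene": ["A", "B"]}, {"CellID": ["c1", "c2", "c3"]}
)
print(parts["X"])  # [[1, 4], [2, 5], [3, 6]]
```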

fgread.readers.read_seurat_to_anndata(ds_file: pathlib.Path)

Reads a dataset in the Seurat format into the AnnData format (not implemented).