danrerlib.KEGG

This module provides functions for retrieving gene information associated with KEGG pathways and diseases for various organisms, including human, zebrafish, and mapped zebrafish.

Functions:
  • get_genes_in_pathway: Retrieve genes associated with a specific KEGG pathway.

  • get_genes_in_disease: Retrieve genes associated with a disease for a specified organism.

Constants:
  • NCBI_ID: Identifier for NCBI Gene ID.

  • ZFIN_ID: Identifier for ZFIN ID.

  • ENS_ID: Identifier for Ensembl ID.

  • SYMBOL: Identifier for gene Symbol.

  • HUMAN_ID: Identifier for Human NCBI Gene ID.

Database Rebuild Functions:
  • build_kegg_database: Build or update the KEGG pathway and disease database.

Notes:
  • This module is designed for accessing gene information from KEGG pathways and diseases.

  • It provides functions to retrieve gene data associated with specific pathways and diseases.

  • Data is obtained either through direct API calls or by reading pre-processed files.

  • Organism options include ‘dre’ (Danio rerio), ‘hsa’ (Human), or ‘dreM’ (Mapped Danio rerio from Human).

Example:

To retrieve genes associated with a specific KEGG pathway: ` pathway_genes = get_genes_in_pathway('hsa04010', 'hsa') `

For detailed information on each function and their usage, please refer to the documentation. For more examples of full functionality, please refer to tutorials.

Module Contents

Functions

get_genes_in_pathway(pathway_id[, org, do_check])

Retrieve genes associated with a specific pathway from KEGG.

get_genes_in_disease(→ pandas.DataFrame)

Retrieve genes associated with a disease for a specified organism.

build_kegg_database()

Build or update the KEGG pathway and disease database.

Attributes

kegg_background_gene_set

danrerlib.KEGG.kegg_background_gene_set
danrerlib.KEGG.get_genes_in_pathway(pathway_id: str, org=None, do_check=True)

Retrieve genes associated with a specific pathway from KEGG.

Parameters:
  • pathway_id (str): The KEGG pathway ID for the pathway of interest.

  • org (str, optional): The organism for which to retrieve pathway information. Options include

    • ‘dre’: Danio rerio (Zebrafish) pathways from KEGG.

    • ‘hsa’: Human pathways from KEGG.

    • ‘dreM’: Mapped Danio rerio pathways from human.

    • (Default is None, which will use the provided ‘org’ or ‘hsa’ if ‘org’ is None.)

Returns:
  • df (pd.DataFrame): A pandas DataFrame containing genes associated with the specified pathway.

Notes:
  • This function retrieves gene information associated with a specific KEGG pathway.

  • The pathway_id parameter should be a valid KEGG pathway identifier.

  • The org parameter specifies the organism for which to retrieve pathway information.

  • Organism options include ‘dre’ (Danio rerio), ‘hsa’ (Human), or ‘dreM’ (Mapped Danio rerio from Human).

danrerlib.KEGG.get_genes_in_disease(disease_id: str, org: str, do_check=True, API: bool = False) pandas.DataFrame

Retrieve genes associated with a disease for a specified organism.

Parameters:
  • disease_id (str): The disease ID for which genes are to be retrieved.

  • org (str): The organism for which genes should be retrieved. Options include

    • ‘hsa’: Returns human genes associated with the disease.

    • ‘dre’ or ‘dreM’: Returns human genes mapped to Zebrafish genes (same result, as Zebrafish disease not characterized on KEGG).

  • API (bool, optional): Whether to use the KEGG REST API for retrieval. Default is False.

Returns:
  • genes (pd.DataFrame): A pandas DataFrame containing genes associated with the disease for the specified organism.

Notes:
  • If ‘API’ is set to True, the function fetches gene information using the KEGG REST API.

  • If ‘API’ is False, the function reads gene information from pre-processed files.

danrerlib.KEGG.build_kegg_database()

Build or update the KEGG pathway and disease database.

Notes:
  • This function is intended for database creation or update and should be run only during version updates.

  • The process may take some time, so exercise caution when running it.

  • The function performs the following steps:

    1. Downloads zebrafish pathway IDs and stores them in a specified file.

    2. Downloads human pathway IDs and stores them in a specified file.

    3. Creates mapped zebrafish pathway IDs from human pathways and stores them in a specified file.

    4. Downloads genes associated with true KEGG pathways for zebrafish.

    5. Downloads genes associated with true KEGG pathways for humans.

    6. Builds mapped zebrafish KEGG pathways from human pathways and stores them in a specified directory.

    7. Downloads KEGG disease IDs and stores them in a specified file.

  • Running this function should be done carefully, as it involves downloading and processing data from KEGG.