GFF entity hierarchy

The schema below was derived from a complete genomic.gff file (GCF_000001215.4) and shows how the different feature types are linked through the Parent= attribute:

Hierarchical Structure

1. Protein-Coding Genes

gene (root)
├── mRNA (Parent=gene)
    ├── exon (Parent=mRNA)
    └── CDS (Parent=mRNA)

Example:

  • gene-Dmel_CG17636 (gene)
  • rna-NM_001103384.3 (mRNA, Parent=gene-Dmel_CG17636)
    • exon-NM_001103384.3-1 (exon, Parent=rna-NM_001103384.3)
    • exon-NM_001103384.3-2 (exon, Parent=rna-NM_001103384.3)
    • cds-NP_001096854.1 (CDS, Parent=rna-NM_001103384.3)
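The Parent= links in the example above can be recovered mechanically from column 9 of each GFF row. The following is a minimal Python sketch, not the procedure used for the original analysis. The function names, the seqid `seq1`, the source column, and all coordinates are invented for illustration; only the IDs and Parent values are taken from the example.

```python
from collections import defaultdict

# Hypothetical excerpt: seqid, source, and coordinates are made up.
# Only the ID= and Parent= values come from the example above.
SAMPLE_GFF = (
    "seq1\tRefSeq\tgene\t100\t900\t.\t+\t.\tID=gene-Dmel_CG17636\n"
    "seq1\tRefSeq\tmRNA\t100\t900\t.\t+\t.\tID=rna-NM_001103384.3;Parent=gene-Dmel_CG17636\n"
    "seq1\tRefSeq\texon\t100\t400\t.\t+\t.\tID=exon-NM_001103384.3-1;Parent=rna-NM_001103384.3\n"
    "seq1\tRefSeq\tCDS\t150\t400\t.\t+\t0\tID=cds-NP_001096854.1;Parent=rna-NM_001103384.3\n"
)


def parse_attributes(column9):
    """Split a 'key=value;key=value' attribute string into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def build_children(gff_text):
    """Return (id -> feature type, parent id -> ordered list of child ids)."""
    types = {}
    children = defaultdict(list)
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature row
        attrs = parse_attributes(cols[8])
        feature_id = attrs.get("ID")
        if feature_id is None:
            continue
        types[feature_id] = cols[2]
        # GFF3 allows several comma-separated parents in one Parent= value.
        for parent in attrs.get("Parent", "").split(","):
            if parent and feature_id not in children[parent]:
                children[parent].append(feature_id)
    return types, children


types, children = build_children(SAMPLE_GFF)
for parent_id, child_ids in children.items():
    print(parent_id, "->", child_ids)
```

The membership check before appending matters because a feature that spans several rows (for example a multi-segment CDS) repeats the same ID and Parent on every row.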

2. Long Non-Coding RNA (lncRNA) Genes

gene (root)
├── lnc_RNA (Parent=gene)
    └── exon (Parent=lnc_RNA)

Example:

  • gene-Dmel_CR40469 (gene)
  • rna-NR_003723.2 (lnc_RNA, Parent=gene-Dmel_CR40469)
    • exon-NR_003723.2-1 (exon, Parent=rna-NR_003723.2)

3. MicroRNA (miRNA) Genes

gene (root)
├── primary_transcript (Parent=gene)
    ├── exon (Parent=primary_transcript)
    └── miRNA (Parent=primary_transcript)
        └── exon (Parent=miRNA)

Example:

  • gene-Dmel_CR43552 (gene, gene_biotype=miRNA)
  • rna-NR_047712.1 (primary_transcript, Parent=gene-Dmel_CR43552)
    • exon-NR_047712.1-1 (exon, Parent=rna-NR_047712.1)
    • rna-gnl|FlyBase|CR43552-RA (miRNA, Parent=rna-NR_047712.1)
      • exon-gnl|FlyBase|CR43552-RA-1 (exon, Parent=rna-gnl|FlyBase|CR43552-RA)
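Because the miRNA hierarchy is one level deeper than the others, it can help to walk from a leaf back to its root and look at the resulting chain of feature types. The sketch below does this for the example above; the `FEATURES` dictionary and the `lineage` function are illustrative names, and the data are transcribed by hand from the example rather than read from the file.

```python
# id -> (feature type, parent id or None), transcribed from the example above
FEATURES = {
    "gene-Dmel_CR43552": ("gene", None),
    "rna-NR_047712.1": ("primary_transcript", "gene-Dmel_CR43552"),
    "exon-NR_047712.1-1": ("exon", "rna-NR_047712.1"),
    "rna-gnl|FlyBase|CR43552-RA": ("miRNA", "rna-NR_047712.1"),
    "exon-gnl|FlyBase|CR43552-RA-1": ("exon", "rna-gnl|FlyBase|CR43552-RA"),
}


def lineage(feature_id, features):
    """Return the feature types on the path from the root down to feature_id."""
    chain = []
    seen = set()
    current = feature_id
    while current is not None:
        if current in seen:
            raise ValueError(f"Parent cycle detected at {current}")
        seen.add(current)
        feature_type, parent = features[current]
        chain.append(feature_type)
        current = parent
    return list(reversed(chain))


mirna_exon_path = lineage("exon-gnl|FlyBase|CR43552-RA-1", FEATURES)
precursor_exon_path = lineage("exon-NR_047712.1-1", FEATURES)
print(" -> ".join(mirna_exon_path))
print(" -> ".join(precursor_exon_path))
```

The two exons end up at different depths: the precursor's exon sits three levels down, while the mature miRNA's exon sits four levels down.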

4. Antisense RNA Genes

gene (root)
├── antisense_RNA (Parent=gene)
    └── exon (Parent=antisense_RNA)

5. Small Nucleolar RNA (snoRNA) Genes

gene (root)
├── snoRNA (Parent=gene)
    └── exon (Parent=snoRNA)

Example:

  • gene-Dmel_CR34590 (gene, gene_biotype=snoRNA)
  • rna-NR_003724.1 (snoRNA, Parent=gene-Dmel_CR34590)
    • exon-NR_003724.1-1 (exon, Parent=rna-NR_003724.1)

6. Transfer RNA (tRNA) Genes

gene (root)
├── tRNA (Parent=gene)
    └── exon (Parent=tRNA)

Example:

  • gene-Dmel_CR32493 (gene, gene_biotype=tRNA)
  • rna-Dmel_CR32493 (tRNA, Parent=gene-Dmel_CR32493)
    • exon-Dmel_CR32493-1 (exon, Parent=rna-Dmel_CR32493)

7. Ribosomal RNA (rRNA) Genes

gene (root)
├── rRNA (Parent=gene)
    └── exon (Parent=rRNA)

Example:

  • gene-Dmel_CR45853 (gene, gene_biotype=rRNA_pseudogene)
  • rna-gnl|FlyBase|CR45853-RA (rRNA, Parent=gene-Dmel_CR45853)
    • exon-gnl|FlyBase|CR45853-RA-1 (exon, Parent=rna-gnl|FlyBase|CR45853-RA)

8. Small Nuclear RNA (snRNA) Genes

gene (root)
├── snRNA (Parent=gene)
    └── exon (Parent=snRNA)

Example:

  • gene-Dmel_CR32914 (gene, gene_biotype=snRNA)
  • rna-NR_002129.1 (snRNA, Parent=gene-Dmel_CR32914)
    • exon-NR_002129.1-1 (exon, Parent=rna-NR_002129.1)

9. Signal Recognition Particle RNA (SRP_RNA) Genes

gene (root)
├── SRP_RNA (Parent=gene)
    └── exon (Parent=SRP_RNA)

Example:

  • gene-Dmel_CR42652 (gene, gene_biotype=SRP_RNA)
  • rna-NR_037753.2 (SRP_RNA, Parent=gene-Dmel_CR42652)
    • exon-NR_037753.2-1 (exon, Parent=rna-NR_037753.2)

10. RNase P RNA Genes

gene (root)
├── RNase_P_RNA (Parent=gene)
    └── exon (Parent=RNase_P_RNA)

Example:

  • gene-Dmel_CR32868 (gene, gene_biotype=RNase_P_RNA)
  • rna-NR_002092.1 (RNase_P_RNA, Parent=gene-Dmel_CR32868)
    • exon-NR_002092.1-1 (exon, Parent=rna-NR_002092.1)

11. RNase MRP RNA Genes

gene (root)
├── RNase_MRP_RNA (Parent=gene)
    └── exon (Parent=RNase_MRP_RNA)

Example:

  • gene-Dmel_CR33682 (gene, gene_biotype=RNase_MRP_RNA)
  • rna-NR_002501.2 (RNase_MRP_RNA, Parent=gene-Dmel_CR33682)
    • exon-NR_002501.2-1 (exon, Parent=rna-NR_002501.2)

12. Pseudogenes

gene (root; feature type pseudogene, pseudo=true)
└── exon (Parent=gene)

Example:

  • gene-Dmel_CR18275 (pseudogene, pseudo=true)
  • id-Dmel_CR18275-1 (exon, Parent=gene-Dmel_CR18275)
  • id-Dmel_CR18275-2 (exon, Parent=gene-Dmel_CR18275)

Entity Types and Their Relationships

| Entity Type | Can be Parent of | Can be Child of | Notes |
| --- | --- | --- | --- |
| gene | mRNA, lnc_RNA, primary_transcript, antisense_RNA, snoRNA, tRNA, rRNA, snRNA, SRP_RNA, RNase_P_RNA, RNase_MRP_RNA, exon* | None (root level) | Top-level genomic feature |
| mRNA | exon, CDS | gene | Protein-coding transcript |
| lnc_RNA | exon | gene | Long non-coding RNA transcript |
| primary_transcript | exon, miRNA | gene | Precursor RNA (e.g., pre-miRNA) |
| miRNA | exon | primary_transcript | Mature microRNA |
| antisense_RNA | exon | gene | Antisense RNA transcript |
| snoRNA | exon | gene | Small nucleolar RNA transcript |
| tRNA | exon | gene | Transfer RNA transcript |
| rRNA | exon | gene | Ribosomal RNA transcript |
| snRNA | exon | gene | Small nuclear RNA transcript |
| SRP_RNA | exon | gene | Signal recognition particle RNA |
| RNase_P_RNA | exon | gene | RNase P RNA transcript |
| RNase_MRP_RNA | exon | gene | RNase MRP RNA transcript |
| pseudogene | exon | None (root level) | Non-functional gene copy |
| exon | None (leaf level) | mRNA, lnc_RNA, primary_transcript, miRNA, antisense_RNA, snoRNA, tRNA, rRNA, snRNA, SRP_RNA, RNase_P_RNA, RNase_MRP_RNA, gene* | Exonic regions |
| CDS | None (leaf level) | mRNA | Protein-coding sequences |

*Note: Pseudogenes have exons as direct children without an intermediate transcript level.
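The table can also be encoded as a lookup and used to flag Parent= links that fall outside the schema. The sketch below is one possible encoding, under the assumption that the table is complete for this file. `ALLOWED_CHILDREN`, `find_violations`, and the input layout (a list of `(id, type, parent_id)` tuples) are hypothetical names and conventions chosen for the example.

```python
# Transcript types that, per the table, only ever have exon children.
EXON_ONLY_TRANSCRIPTS = (
    "lnc_RNA", "antisense_RNA", "snoRNA", "tRNA", "rRNA",
    "snRNA", "SRP_RNA", "RNase_P_RNA", "RNase_MRP_RNA",
)

# parent feature type -> set of child feature types allowed by the table
ALLOWED_CHILDREN = {
    "gene": {"mRNA", "primary_transcript", "exon", *EXON_ONLY_TRANSCRIPTS},
    "pseudogene": {"exon"},
    "mRNA": {"exon", "CDS"},
    "primary_transcript": {"exon", "miRNA"},
    "miRNA": {"exon"},
    **{t: {"exon"} for t in EXON_ONLY_TRANSCRIPTS},
}


def find_violations(features):
    """Check a list of (id, type, parent_id or None) tuples against the table.

    Returns a list of (feature id, description) pairs, empty if all links fit.
    """
    type_of = {fid: ftype for fid, ftype, _ in features}
    problems = []
    for fid, ftype, parent in features:
        if parent is None:
            continue  # root-level feature, nothing to check
        parent_type = type_of.get(parent)
        if parent_type is None:
            problems.append((fid, f"unknown parent {parent}"))
        elif ftype not in ALLOWED_CHILDREN.get(parent_type, set()):
            problems.append((fid, f"{ftype} under {parent_type}"))
    return problems


example = [
    ("gene-Dmel_CR40469", "gene", None),
    ("rna-NR_003723.2", "lnc_RNA", "gene-Dmel_CR40469"),
    ("exon-NR_003723.2-1", "exon", "rna-NR_003723.2"),
]
print(find_violations(example))
```

A check like this is mainly useful when the same schema is applied to a different annotation release, where new feature types or new parent-child combinations may appear.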

Key Observations

  1. Gene as Root: Every hierarchy starts with a gene (or pseudogene) feature that has no Parent attribute
  2. Alternative Splicing: Multiple mRNA isoforms can share the same parent gene (e.g., transcript variants A, B, C)
  3. Shared CDS IDs: Multiple CDS features can share the same ID but have different Parent mRNA transcripts
  4. miRNA Processing: miRNAs follow a gene → primary_transcript → miRNA → exon hierarchy
  5. Pseudogene Structure: Pseudogenes have a simplified structure with exons directly attached to the gene (no intermediate transcript)
  6. snoRNA Structure: Small nucleolar RNAs follow the same pattern as lncRNAs: gene → snoRNA → exon
  7. tRNA Structure: Transfer RNAs follow the same pattern: gene → tRNA → exon
  8. rRNA Structure: Ribosomal RNAs follow the same pattern: gene → rRNA → exon (this also holds for rRNA pseudogenes such as gene-Dmel_CR45853)
  9. snRNA Structure: Small nuclear RNAs follow the same pattern: gene → snRNA → exon
  10. Specialized RNA Types: SRP_RNA, RNase_P_RNA, and RNase_MRP_RNA follow the same pattern: gene → RNA_type → exon
  11. RNA Pseudogenes: Some RNA genes (tRNA, rRNA) can be pseudogenes with gene_biotype=tRNA_pseudogene or rRNA_pseudogene
  12. Exon Numbering: Exons are numbered sequentially within each transcript (e.g., exon-NM_001103384.3-1, exon-NM_001103384.3-2)
  13. Pseudogene Markers: Pseudogenes are marked with the pseudo=true attribute and an appropriate gene_biotype
  14. Sequence Features: Non-coding regions like A+T control regions are marked as sequence_feature (no parent relationships)

ID Naming Conventions

  • Genes: gene-{locus_tag} (e.g., gene-Dmel_CG17636)
  • mRNAs: rna-{accession} (e.g., rna-NM_001103384.3)
  • Exons: exon-{transcript_id}-{number} (e.g., exon-NM_001103384.3-1)
  • CDS: cds-{protein_id} (e.g., cds-NP_001096854.1)
  • miRNAs: rna-gnl|FlyBase|{transcript_id} (e.g., rna-gnl|FlyBase|CR43552-RA)
  • snoRNAs: rna-{accession} (e.g., rna-NR_003724.1)
  • tRNAs: rna-{locus_tag} (e.g., rna-Dmel_CR32493)
  • rRNAs: rna-gnl|FlyBase|{transcript_id} (e.g., rna-gnl|FlyBase|CR45853-RA)
  • snRNAs: rna-{accession} (e.g., rna-NR_002129.1)
  • SRP_RNAs: rna-{accession} (e.g., rna-NR_037753.2)
  • RNase_P_RNAs: rna-{accession} (e.g., rna-NR_002092.1)
  • RNase_MRP_RNAs: rna-{accession} (e.g., rna-NR_002501.2)
  • Pseudogene exons: id-{locus_tag}-{number} (e.g., id-Dmel_CR18275-1)
  • tRNA exons: exon-{locus_tag}-{number} (e.g., exon-Dmel_CR32493-1)
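These conventions are regular enough to be matched with a few patterns. The sketch below classifies an ID by its prefix and pulls out the embedded parts. The pattern list and the `classify_id` function are illustrative and cover only the prefixes listed above; in particular, treating every `id-` prefix as a pseudogene exon is an assumption based on the single convention shown here.

```python
import re

# (label, pattern) pairs, tried in order; group names are illustrative.
ID_PATTERNS = [
    ("gene", re.compile(r"^gene-(?P<locus_tag>.+)$")),
    ("exon", re.compile(r"^exon-(?P<transcript>.+)-(?P<number>\d+)$")),
    ("cds", re.compile(r"^cds-(?P<protein_id>.+)$")),
    ("rna", re.compile(r"^rna-(?P<name>.+)$")),
    # Assumption: in this file the id- prefix is used for pseudogene exons.
    ("pseudogene_exon", re.compile(r"^id-(?P<locus_tag>.+)-(?P<number>\d+)$")),
]


def classify_id(feature_id):
    """Return (label, captured parts) for a feature ID, or ('unknown', {})."""
    for label, pattern in ID_PATTERNS:
        match = pattern.match(feature_id)
        if match:
            return label, match.groupdict()
    return "unknown", {}


for example_id in (
    "gene-Dmel_CG17636",
    "rna-gnl|FlyBase|CR43552-RA",
    "exon-gnl|FlyBase|CR43552-RA-1",
    "id-Dmel_CR18275-1",
):
    print(example_id, "->", classify_id(example_id))
```

The exon pattern relies on greedy matching so that the trailing `-{number}` is split off at the last hyphen, which keeps hyphenated transcript names such as CR43552-RA intact.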

This schema represents the hierarchical organization of genomic features in the Drosophila melanogaster genome annotation.