# GFF entity hierarchy
Based on an analysis of a complete `genomic.gff` file from the RefSeq assembly GCF_000001215.4, the schema below describes the `Parent=` relationships between genomic feature types:
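One way to derive a schema like this from a `genomic.gff` file is to resolve every `Parent=` value to the type of the feature it names, then tally the resulting (parent type, child type) pairs. The sketch below does this with the Python standard library only. The function name `parent_child_type_pairs` is hypothetical. The sketch assumes well-formed GFF3: tab-separated columns, `;`-separated attributes, `,`-separated multi-valued `Parent`, and percent-encoding as described in the GFF3 specification.

```python
from collections import Counter
from urllib.parse import unquote


def parent_child_type_pairs(lines):
    """Count (parent type, child type) pairs in GFF3 text.

    Only column 3 (type) and column 9 (attributes) are used. Comment and
    directive lines, and lines without nine tab-separated columns (such as
    FASTA sequence after a ##FASTA directive), are skipped.
    """
    rows = []        # (type, attributes) for every feature line
    id_to_type = {}  # ID -> feature type
    # Pass 1: parse attributes and record the type of every ID.
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = {}
        for pair in cols[8].split(";"):
            key, sep, value = pair.partition("=")
            if sep:  # ignore empty or malformed fragments
                attrs[key] = [unquote(v) for v in value.split(",")]
        rows.append((cols[2], attrs))
        for feature_id in attrs.get("ID", []):
            id_to_type[feature_id] = cols[2]
    # Pass 2: resolve each Parent to its type and tally the pair.
    pairs = Counter()
    for feature_type, attrs in rows:
        for parent in attrs.get("Parent", []):
            pairs[(id_to_type.get(parent, "<missing>"), feature_type)] += 1
    return pairs


# Placeholder rows: '.' stands in for the columns this sketch ignores.
sample = [
    "##gff-version 3",
    ".\t.\tgene\t.\t.\t.\t.\t.\tID=gene-Dmel_CR34590",
    ".\t.\tsnoRNA\t.\t.\t.\t.\t.\tID=rna-NR_003724.1;Parent=gene-Dmel_CR34590",
    ".\t.\texon\t.\t.\t.\t.\t.\tID=exon-NR_003724.1-1;Parent=rna-NR_003724.1",
]
for (parent_type, child_type), n in sorted(parent_child_type_pairs(sample).items()):
    print(f"{parent_type} -> {child_type}: {n}")
# gene -> snoRNA: 1
# snoRNA -> exon: 1
```

On a real annotation, the same function can be applied to an open file handle, for example `with open("genomic.gff") as fh: pairs = parent_child_type_pairs(fh)`. The sketch collects every ID before resolving parents, so it does not depend on parents appearing before their children in the file.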
## Hierarchical Structure
### 1. Protein-Coding Genes
gene (root)
└── mRNA (Parent=gene)
    ├── exon (Parent=mRNA)
    └── CDS (Parent=mRNA)
Example:
- `gene-Dmel_CG17636` (gene)
    - `rna-NM_001103384.3` (mRNA, Parent=gene-Dmel_CG17636)
        - `exon-NM_001103384.3-1` (exon, Parent=rna-NM_001103384.3)
        - `exon-NM_001103384.3-2` (exon, Parent=rna-NM_001103384.3)
        - `cds-NP_001096854.1` (CDS, Parent=rna-NM_001103384.3)
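Given `(ID, type, Parent)` triples, the nesting in this example can be rebuilt by indexing children under their parent IDs and walking down from each root. This sketch uses the IDs from the protein-coding example. The helper names `build_children` and `render` are hypothetical, and each feature is assumed to have at most one Parent (GFF3 allows several, comma-separated).

```python
from collections import defaultdict

# (ID, type, Parent) triples from the protein-coding example above;
# Parent is None for the root feature.
records = [
    ("gene-Dmel_CG17636", "gene", None),
    ("rna-NM_001103384.3", "mRNA", "gene-Dmel_CG17636"),
    ("exon-NM_001103384.3-1", "exon", "rna-NM_001103384.3"),
    ("exon-NM_001103384.3-2", "exon", "rna-NM_001103384.3"),
    ("cds-NP_001096854.1", "CDS", "rna-NM_001103384.3"),
]


def build_children(records):
    """Map each parent ID to its (child ID, child type) pairs, in input order."""
    children = defaultdict(list)
    for feature_id, feature_type, parent in records:
        if parent is not None:
            children[parent].append((feature_id, feature_type))
    return children


def render(feature_id, feature_type, children, depth=0):
    """Return indented lines for a feature and all of its descendants."""
    lines = ["    " * depth + f"{feature_type}: {feature_id}"]
    for child_id, child_type in children.get(feature_id, []):
        lines.extend(render(child_id, child_type, children, depth + 1))
    return lines


children = build_children(records)
for feature_id, feature_type, parent in records:
    if parent is None:  # start from each root feature
        print("\n".join(render(feature_id, feature_type, children)))
# gene: gene-Dmel_CG17636
#     mRNA: rna-NM_001103384.3
#         exon: exon-NM_001103384.3-1
#         exon: exon-NM_001103384.3-2
#         CDS: cds-NP_001096854.1
```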
### 2. Long Non-Coding RNA (lncRNA) Genes
gene (root)
└── lnc_RNA (Parent=gene)
    └── exon (Parent=lnc_RNA)
Example:
- `gene-Dmel_CR40469` (gene)
    - `rna-NR_003723.2` (lnc_RNA, Parent=gene-Dmel_CR40469)
        - `exon-NR_003723.2-1` (exon, Parent=rna-NR_003723.2)
### 3. MicroRNA (miRNA) Genes
gene (root)
└── primary_transcript (Parent=gene)
    ├── exon (Parent=primary_transcript)
    └── miRNA (Parent=primary_transcript)
        └── exon (Parent=miRNA)
Example:
- `gene-Dmel_CR43552` (gene, gene_biotype=miRNA)
    - `rna-NR_047712.1` (primary_transcript, Parent=gene-Dmel_CR43552)
        - `exon-NR_047712.1-1` (exon, Parent=rna-NR_047712.1)
        - `rna-gnl|FlyBase|CR43552-RA` (miRNA, Parent=rna-NR_047712.1)
            - `exon-gnl|FlyBase|CR43552-RA-1` (exon, Parent=rna-gnl|FlyBase|CR43552-RA)
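Because the miRNA exon sits one level deeper than the other exons, one way to confirm the structure is to follow Parent links from a feature up to its root. A minimal sketch, assuming an `ID -> (type, Parent)` mapping built from the miRNA example and a single Parent per feature. `lineage` is a hypothetical helper.

```python
# ID -> (type, Parent) for the miRNA example above; None marks the root.
features = {
    "gene-Dmel_CR43552": ("gene", None),
    "rna-NR_047712.1": ("primary_transcript", "gene-Dmel_CR43552"),
    "exon-NR_047712.1-1": ("exon", "rna-NR_047712.1"),
    "rna-gnl|FlyBase|CR43552-RA": ("miRNA", "rna-NR_047712.1"),
    "exon-gnl|FlyBase|CR43552-RA-1": ("exon", "rna-gnl|FlyBase|CR43552-RA"),
}


def lineage(feature_id, features):
    """Return feature types from the root down to feature_id via Parent links."""
    path = []
    seen = set()
    current = feature_id
    while current is not None:
        if current in seen:  # guard against malformed, cyclic Parent links
            raise ValueError(f"Parent cycle at {current}")
        seen.add(current)
        feature_type, parent = features[current]
        path.append(feature_type)
        current = parent
    return path[::-1]


print(" -> ".join(lineage("exon-gnl|FlyBase|CR43552-RA-1", features)))
# gene -> primary_transcript -> miRNA -> exon
print(" -> ".join(lineage("exon-NR_047712.1-1", features)))
# gene -> primary_transcript -> exon
```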
### 4. Antisense RNA Genes
gene (root)
└── antisense_RNA (Parent=gene)
    └── exon (Parent=antisense_RNA)
### 5. Small Nucleolar RNA (snoRNA) Genes
gene (root)
└── snoRNA (Parent=gene)
    └── exon (Parent=snoRNA)
Example:
- `gene-Dmel_CR34590` (gene, gene_biotype=snoRNA)
    - `rna-NR_003724.1` (snoRNA, Parent=gene-Dmel_CR34590)
        - `exon-NR_003724.1-1` (exon, Parent=rna-NR_003724.1)
### 6. Transfer RNA (tRNA) Genes
gene (root)
└── tRNA (Parent=gene)
    └── exon (Parent=tRNA)
Example:
- `gene-Dmel_CR32493` (gene, gene_biotype=tRNA)
    - `rna-Dmel_CR32493` (tRNA, Parent=gene-Dmel_CR32493)
        - `exon-Dmel_CR32493-1` (exon, Parent=rna-Dmel_CR32493)
### 7. Ribosomal RNA (rRNA) Genes
gene (root)
└── rRNA (Parent=gene)
    └── exon (Parent=rRNA)
Example:
- `gene-Dmel_CR45853` (gene, gene_biotype=rRNA_pseudogene)
    - `rna-gnl|FlyBase|CR45853-RA` (rRNA, Parent=gene-Dmel_CR45853)
        - `exon-gnl|FlyBase|CR45853-RA-1` (exon, Parent=rna-gnl|FlyBase|CR45853-RA)
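This example shows that column 3 alone does not identify every pseudogene: the feature type is `gene`, while `gene_biotype=rRNA_pseudogene` carries the pseudogene status. Below is a heuristic check, assuming single-valued attributes already parsed into a dict. `is_pseudogene` is a hypothetical helper that combines the three signals this document mentions: the `pseudogene` type, `pseudo=true`, and a biotype ending in `_pseudogene`.

```python
def is_pseudogene(feature_type: str, attrs: dict[str, str]) -> bool:
    """Heuristic pseudogene test combining column 3 and column 9.

    True if the feature type is 'pseudogene', if pseudo=true is set, or
    if gene_biotype ends in '_pseudogene' (e.g. rRNA_pseudogene).
    """
    if feature_type == "pseudogene":
        return True
    if attrs.get("pseudo") == "true":
        return True
    return attrs.get("gene_biotype", "").endswith("_pseudogene")


# The rRNA example: column 3 is 'gene', but the biotype marks a pseudogene.
print(is_pseudogene("gene", {"ID": "gene-Dmel_CR45853", "gene_biotype": "rRNA_pseudogene"}))  # True
print(is_pseudogene("gene", {"ID": "gene-Dmel_CR32493", "gene_biotype": "tRNA"}))  # False
```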
### 8. Small Nuclear RNA (snRNA) Genes
gene (root)
└── snRNA (Parent=gene)
    └── exon (Parent=snRNA)
Example:
- `gene-Dmel_CR32914` (gene, gene_biotype=snRNA)
    - `rna-NR_002129.1` (snRNA, Parent=gene-Dmel_CR32914)
        - `exon-NR_002129.1-1` (exon, Parent=rna-NR_002129.1)
### 9. Signal Recognition Particle RNA (SRP_RNA) Genes
gene (root)
└── SRP_RNA (Parent=gene)
    └── exon (Parent=SRP_RNA)
Example:
- `gene-Dmel_CR42652` (gene, gene_biotype=SRP_RNA)
    - `rna-NR_037753.2` (SRP_RNA, Parent=gene-Dmel_CR42652)
        - `exon-NR_037753.2-1` (exon, Parent=rna-NR_037753.2)
### 10. RNase P RNA Genes
gene (root)
└── RNase_P_RNA (Parent=gene)
    └── exon (Parent=RNase_P_RNA)
Example:
- `gene-Dmel_CR32868` (gene, gene_biotype=RNase_P_RNA)
    - `rna-NR_002092.1` (RNase_P_RNA, Parent=gene-Dmel_CR32868)
        - `exon-NR_002092.1-1` (exon, Parent=rna-NR_002092.1)
### 11. RNase MRP RNA Genes
gene (root)
└── RNase_MRP_RNA (Parent=gene)
    └── exon (Parent=RNase_MRP_RNA)
Example:
- `gene-Dmel_CR33682` (gene, gene_biotype=RNase_MRP_RNA)
    - `rna-NR_002501.2` (RNase_MRP_RNA, Parent=gene-Dmel_CR33682)
        - `exon-NR_002501.2-1` (exon, Parent=rna-NR_002501.2)
### 12. Pseudogenes
pseudogene (root, pseudo=true)
└── exon (Parent=pseudogene)
Example:
- `gene-Dmel_CR18275` (pseudogene, pseudo=true)
    - `id-Dmel_CR18275-1` (exon, Parent=gene-Dmel_CR18275)
    - `id-Dmel_CR18275-2` (exon, Parent=gene-Dmel_CR18275)
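The distinguishing property of this structure is that an exon's parent is itself a root feature. A sketch that flags such exons, using the pseudogene example plus the tRNA example for contrast. It assumes an `ID -> (type, Parent)` mapping with single parents, and `exons_without_transcript` is a hypothetical name.

```python
# ID -> (type, Parent): the pseudogene example plus the tRNA example.
features = {
    "gene-Dmel_CR18275": ("pseudogene", None),
    "id-Dmel_CR18275-1": ("exon", "gene-Dmel_CR18275"),
    "id-Dmel_CR18275-2": ("exon", "gene-Dmel_CR18275"),
    "gene-Dmel_CR32493": ("gene", None),
    "rna-Dmel_CR32493": ("tRNA", "gene-Dmel_CR32493"),
    "exon-Dmel_CR32493-1": ("exon", "rna-Dmel_CR32493"),
}


def exons_without_transcript(features):
    """Return sorted exon IDs whose parent is a root feature (no transcript level)."""
    result = []
    for feature_id, (feature_type, parent) in features.items():
        if feature_type != "exon" or parent is None:
            continue
        _, grandparent = features[parent]
        if grandparent is None:  # parent has no Parent, so it is a root
            result.append(feature_id)
    return sorted(result)


print(exons_without_transcript(features))
# ['id-Dmel_CR18275-1', 'id-Dmel_CR18275-2']
```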
## Entity Types and Their Relationships
| Entity Type | Can be Parent of | Can be Child of | Notes |
|---|---|---|---|
| gene | mRNA, lnc_RNA, primary_transcript, antisense_RNA, snoRNA, tRNA, rRNA, snRNA, SRP_RNA, RNase_P_RNA, RNase_MRP_RNA, exon* | None (root level) | Top-level genomic feature |
| mRNA | exon, CDS | gene | Protein-coding transcript |
| lnc_RNA | exon | gene | Long non-coding RNA transcript |
| primary_transcript | exon, miRNA | gene | Precursor RNA (e.g., pre-miRNA) |
| miRNA | exon | primary_transcript | Mature microRNA |
| antisense_RNA | exon | gene | Antisense RNA transcript |
| snoRNA | exon | gene | Small nucleolar RNA transcript |
| tRNA | exon | gene | Transfer RNA transcript |
| rRNA | exon | gene | Ribosomal RNA transcript |
| snRNA | exon | gene | Small nuclear RNA transcript |
| SRP_RNA | exon | gene | Signal recognition particle RNA |
| RNase_P_RNA | exon | gene | RNase P RNA transcript |
| RNase_MRP_RNA | exon | gene | RNase MRP RNA transcript |
| pseudogene | exon | None (root level) | Non-functional gene copy |
| exon | None (leaf level) | mRNA, lnc_RNA, primary_transcript, miRNA, antisense_RNA, snoRNA, tRNA, rRNA, snRNA, SRP_RNA, RNase_P_RNA, RNase_MRP_RNA, gene* | Exonic regions |
| CDS | None (leaf level) | mRNA | Protein-coding sequences |
*Note: Pseudogenes have exons as direct children without an intermediate transcript level.
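The table can be encoded as an allowed-parent lookup and used to check a parsed annotation. In this sketch `ALLOWED_PARENTS` and `schema_violations` are hypothetical names. `pseudogene` is included as an allowed exon parent to match the pseudogene row, and types absent from the table (such as `sequence_feature`) are skipped rather than flagged. The `CDS` record in the usage example is invented to show a violation.

```python
# Transcript types whose only allowed parent is gene (from the table above).
NCRNA_TRANSCRIPTS = {
    "lnc_RNA", "antisense_RNA", "snoRNA", "tRNA", "rRNA", "snRNA",
    "SRP_RNA", "RNase_P_RNA", "RNase_MRP_RNA",
}

# Child type -> set of allowed parent types; an empty set means root only.
ALLOWED_PARENTS = {
    "gene": set(),
    "pseudogene": set(),
    "mRNA": {"gene"},
    "primary_transcript": {"gene"},
    "miRNA": {"primary_transcript"},
    "CDS": {"mRNA"},
    "exon": {"mRNA", "primary_transcript", "miRNA", "gene", "pseudogene"}
    | NCRNA_TRANSCRIPTS,
}
ALLOWED_PARENTS.update({t: {"gene"} for t in NCRNA_TRANSCRIPTS})


def schema_violations(features):
    """Yield (ID, type, parent type) for records the table does not allow.

    features maps ID -> (type, Parent ID or None). Types not in
    ALLOWED_PARENTS (e.g. sequence_feature) are skipped, not flagged.
    """
    for feature_id, (feature_type, parent) in features.items():
        allowed = ALLOWED_PARENTS.get(feature_type)
        if allowed is None:
            continue
        if parent is None:
            if allowed:  # a non-root type is missing its Parent
                yield (feature_id, feature_type, None)
            continue
        parent_type = features[parent][0] if parent in features else "<missing>"
        if parent_type not in allowed:
            yield (feature_id, feature_type, parent_type)


features = {
    "gene-Dmel_CR34590": ("gene", None),
    "rna-NR_003724.1": ("snoRNA", "gene-Dmel_CR34590"),
    "exon-NR_003724.1-1": ("exon", "rna-NR_003724.1"),
    "cds-hypothetical": ("CDS", "rna-NR_003724.1"),  # invented bad record
}
print(list(schema_violations(features)))
# [('cds-hypothetical', 'CDS', 'snoRNA')]
```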
## Key Observations
- Gene as Root: All hierarchies start with a `gene` feature (or, for pseudogenes, a `pseudogene` feature) that has no Parent attribute
- Alternative Splicing: Multiple mRNA isoforms can share the same parent gene (e.g., transcript variants A, B, C)
- Shared CDS IDs: A CDS is usually split across several lines, one per coding segment, that all carry the same ID; CDS features with the same ID can also appear under different parent mRNA transcripts
- miRNA Processing: miRNAs follow a gene → primary_transcript → miRNA → exon hierarchy
- Pseudogene Structure: Pseudogenes have a simplified structure with exons directly attached to the gene (no intermediate transcript)
- snoRNA Structure: Small nucleolar RNAs follow the same pattern as lncRNAs: gene → snoRNA → exon
- tRNA Structure: Transfer RNAs follow the same pattern: gene → tRNA → exon
- rRNA Structure: Ribosomal RNAs follow the same pattern: gene → rRNA → exon (includes pseudogenes)
- snRNA Structure: Small nuclear RNAs follow the same pattern: gene → snRNA → exon
- Specialized RNA Types: SRP_RNA, RNase_P_RNA, and RNase_MRP_RNA follow the same pattern: gene → RNA_type → exon
- RNA Pseudogenes: Some RNA genes (tRNA, rRNA) can be pseudogenes with `gene_biotype=tRNA_pseudogene` or `gene_biotype=rRNA_pseudogene`
- Exon Numbering: Exons are numbered sequentially within each transcript (e.g., `exon-NM_001103384.3-1`, `exon-NM_001103384.3-2`)
- Pseudogene Markers: Pseudogenes are marked with the `pseudo=true` attribute and an appropriate `gene_biotype`
- Sequence Features: Non-coding regions such as A+T control regions are typed `sequence_feature` and have no parent relationships
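Two of these observations affect how a file is loaded: a gene can have several mRNA children, and a CDS ID can occur on several lines, so a GFF3 ID cannot serve as a unique row key. The records below are hypothetical (invented IDs, not taken from GCF_000001215.4) and only illustrate the counting.

```python
from collections import Counter, defaultdict

# Hypothetical (ID, type, Parent) records: two mRNA isoforms of one gene,
# and one CDS ID repeated on two lines (two coding segments).
records = [
    ("gene-example", "gene", None),
    ("rna-example-RA", "mRNA", "gene-example"),
    ("rna-example-RB", "mRNA", "gene-example"),
    ("cds-example-PA", "CDS", "rna-example-RA"),
    ("cds-example-PA", "CDS", "rna-example-RA"),
    ("cds-example-PB", "CDS", "rna-example-RB"),
]

# Alternative splicing: count mRNA isoforms per gene.
isoforms = Counter(parent for _, ftype, parent in records if ftype == "mRNA")

# Shared CDS IDs: how many lines each CDS ID occupies ...
cds_line_counts = Counter(fid for fid, ftype, _ in records if ftype == "CDS")

# ... and which parent transcripts each CDS ID appears under.
cds_parents = defaultdict(set)
for feature_id, ftype, parent in records:
    if ftype == "CDS":
        cds_parents[feature_id].add(parent)

print(isoforms)         # Counter({'gene-example': 2})
print(cds_line_counts)  # Counter({'cds-example-PA': 2, 'cds-example-PB': 1})
```

Keying on `(ID, Parent)` or collecting all rows per ID, as `cds_parents` does, avoids silently overwriting earlier lines when loading into a dict.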
## ID Naming Conventions
- Genes: `gene-{locus_tag}` (e.g., `gene-Dmel_CG17636`)
- mRNAs: `rna-{accession}` (e.g., `rna-NM_001103384.3`)
- Exons: `exon-{transcript_id}-{number}` (e.g., `exon-NM_001103384.3-1`)
- CDS: `cds-{protein_id}` (e.g., `cds-NP_001096854.1`)
- miRNAs: `rna-gnl|FlyBase|{transcript_id}` (e.g., `rna-gnl|FlyBase|CR43552-RA`)
- snoRNAs: `rna-{accession}` (e.g., `rna-NR_003724.1`)
- tRNAs: `rna-{locus_tag}` (e.g., `rna-Dmel_CR32493`)
- rRNAs: `rna-gnl|FlyBase|{transcript_id}` (e.g., `rna-gnl|FlyBase|CR45853-RA`)
- snRNAs: `rna-{accession}` (e.g., `rna-NR_002129.1`)
- SRP_RNAs: `rna-{accession}` (e.g., `rna-NR_037753.2`)
- RNase_P_RNAs: `rna-{accession}` (e.g., `rna-NR_002092.1`)
- RNase_MRP_RNAs: `rna-{accession}` (e.g., `rna-NR_002501.2`)
- Pseudogene exons: `id-{locus_tag}-{number}` (e.g., `id-Dmel_CR18275-1`)
- tRNA exons: `exon-{locus_tag}-{number}` (e.g., `exon-Dmel_CR32493-1`)
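Since these conventions were read off the examples, an ID parser built on them is a heuristic. The sketch below assumes only the prefixes seen in this document (`gene-`, `rna-`, `exon-`, `cds-`, `id-`) and predicts an exon ID from its parent's ID using the pattern visible in the examples: parents whose ID starts with `rna-` give `exon-{rest}-{n}`, and pseudogene parents starting with `gene-` give `id-{rest}-{n}`. The names `split_id` and `expected_exon_id` are hypothetical.

```python
import re

# Prefixes inferred from the examples in this document; other NCBI
# annotations may use prefixes not listed here.
ID_PATTERN = re.compile(r"^(gene|rna|exon|cds|id)-(.+)$")


def split_id(feature_id: str) -> tuple[str, str]:
    """Split an ID such as 'rna-NM_001103384.3' into ('rna', 'NM_001103384.3')."""
    match = ID_PATTERN.match(feature_id)
    if match is None:
        raise ValueError(f"unrecognized ID format: {feature_id}")
    return match.group(1), match.group(2)


def expected_exon_id(parent_id: str, number: int) -> str:
    """Predict an exon ID from its parent's ID, following the observed patterns."""
    prefix, rest = split_id(parent_id)
    if prefix == "rna":   # transcript parent
        return f"exon-{rest}-{number}"
    if prefix == "gene":  # pseudogene parent, exons attached directly
        return f"id-{rest}-{number}"
    raise ValueError(f"no exon pattern for parent {parent_id}")


print(expected_exon_id("rna-NM_001103384.3", 2))          # exon-NM_001103384.3-2
print(expected_exon_id("rna-gnl|FlyBase|CR43552-RA", 1))  # exon-gnl|FlyBase|CR43552-RA-1
print(expected_exon_id("gene-Dmel_CR18275", 1))           # id-Dmel_CR18275-1
```

A mismatch between a predicted and an actual exon ID is better treated as a prompt to inspect the record than as an error, since the rule is inferred rather than documented.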
This schema summarizes the hierarchical organization of genomic features observed in the *Drosophila melanogaster* genome annotation.