User talk:Darek Kedra/sandbox 26
GFF Comparison
GFF, in particular GFF3, is a widely used tab-delimited text format for storing genomic feature annotations. For a description, see: http://www.sequenceontology.org/gff3.shtml
When a genome is annotated with multiple tools, one often needs to compare their outputs, e.g. from gene prediction programs or EST/protein mapping. Given two GFF files (A and B) with gene models, one can compare them at various levels, such as:
- nucleotide level:
how many nucleotides annotated as features (e.g. nucleotides in exons) are in both sets
- splice junction level
- exon level
how many exons on the same strand match exactly in both sets
- gene level
how many genes are identical
For more information, read the master's thesis of Evan Keibler (author of eval): http://mblab.wustl.edu/software/download/eval-documentation.pdf
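The nucleotide-level and exon-level comparisons can be illustrated with a minimal Python sketch. It is not taken from any of the tools listed on this page; the function names (`read_exons`, `covered_positions`, `compare`) and the two small in-memory GFF fragments are made up for illustration, and it assumes plain tab-delimited GFF lines with the feature type in column 3. Expanding every exon into single positions is only practical for small inputs.

```python
def read_exons(lines, feature_type="exon"):
    """Return a set of (seqid, strand, start, end) tuples for one feature type."""
    exons = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or cols[2] != feature_type:
            continue  # only compare like with like (column 3 is the type)
        exons.add((cols[0], cols[6], int(cols[3]), int(cols[4])))
    return exons


def covered_positions(exons):
    """Expand exons into (seqid, strand, position) tuples; small inputs only."""
    positions = set()
    for seqid, strand, start, end in exons:
        for pos in range(start, end + 1):  # GFF coordinates are 1-based, inclusive
            positions.add((seqid, strand, pos))
    return positions


def compare(exons_a, exons_b):
    """Count exactly matching exons and nucleotides annotated in both sets."""
    shared_exons = exons_a & exons_b
    shared_nt = covered_positions(exons_a) & covered_positions(exons_b)
    return {"exact_exons": len(shared_exons), "shared_nucleotides": len(shared_nt)}


# Hypothetical example data standing in for files A and B.
gff_a = [
    "chr1\ttoolA\texon\t100\t200\t.\t+\t.\tID=a1",
    "chr1\ttoolA\texon\t300\t400\t.\t+\t.\tID=a2",
]
gff_b = [
    "chr1\ttoolB\texon\t100\t200\t.\t+\t.\tID=b1",
    "chr1\ttoolB\texon\t350\t450\t.\t+\t.\tID=b2",
]

result = compare(read_exons(gff_a), read_exons(gff_b))
print(result)  # {'exact_exons': 1, 'shared_nucleotides': 152}
```

Here one exon (100-200) is identical in both sets, while the second pair only overlaps over positions 350-400, so it contributes to the nucleotide count but not to the exon count.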
CAVEAT: the tools listed below are often fairly simple. Some do not take the "type" field (column 3) into account, so one can end up comparing exons from one file with a combined set of genes, exons and introns from another. Some programs smuggle extra information about primary/last exons into the "type" field, so all exons from one file will be compared with only a subset of the exons from the other. Always check that the GFF data are compatible.
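One simple way to check compatibility before running a comparison is to tabulate the values found in the "type" column of each file. The sketch below is only an illustration of that idea: `count_feature_types` is a made-up helper, and the type names `first_exon` and `last_exon` are hypothetical examples of extra information placed in column 3.

```python
from collections import Counter


def count_feature_types(lines):
    """Count how often each value occurs in the type column (column 3)."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


# Hypothetical example data standing in for two GFF files.
file_a = [
    "chr1\ttoolA\tgene\t100\t400\t.\t+\t.\tID=g1",
    "chr1\ttoolA\texon\t100\t200\t.\t+\t.\tParent=g1",
    "chr1\ttoolA\texon\t300\t400\t.\t+\t.\tParent=g1",
]
file_b = [
    "chr1\ttoolB\tfirst_exon\t100\t200\t.\t+\t.\tParent=t1",
    "chr1\ttoolB\texon\t250\t280\t.\t+\t.\tParent=t1",
    "chr1\ttoolB\tlast_exon\t300\t400\t.\t+\t.\tParent=t1",
]

types_a = count_feature_types(file_a)
types_b = count_feature_types(file_b)

# Types present in only one file are a warning sign for a naive comparison.
only_in_a = sorted(set(types_a) - set(types_b))
only_in_b = sorted(set(types_b) - set(types_a))
print("only in A:", only_in_a)
print("only in B:", only_in_b)
```

If either list is non-empty, the files probably need filtering or renaming of types before their features are compared.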
Perl scripts collection
link: http://biowiki.org/GffTools/
Tested: gffsort.pl (sorts GFF streams by sequence name and startpoint)
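For readers who prefer Python, the same ordering (by sequence name, then by start position) can be sketched as follows. This is not a port of gffsort.pl and makes no claim about how that script is implemented; `sort_gff_lines` is a made-up name, and keeping comment lines at the top is an assumption of this sketch.

```python
def sort_gff_lines(lines):
    """Sort GFF records by sequence name, then numerically by start position."""
    comments = [l for l in lines if l.startswith("#")]
    records = [l for l in lines if l.strip() and not l.startswith("#")]

    def sort_key(line):
        cols = line.split("\t")
        # Column 1 is the sequence name, column 4 the start coordinate.
        return (cols[0], int(cols[3]))

    return comments + sorted(records, key=sort_key)


# Hypothetical unsorted input.
unsorted_lines = [
    "##gff-version 3",
    "chr2\ttool\texon\t50\t90\t.\t+\t.\tID=e3",
    "chr1\ttool\texon\t1000\t1100\t.\t-\t.\tID=e2",
    "chr1\ttool\texon\t300\t400\t.\t+\t.\tID=e1",
]

sorted_lines = sort_gff_lines(unsorted_lines)
for line in sorted_lines:
    print(line)
```

Converting the start coordinate to an integer matters: a plain string sort would place 1000 before 300.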
Python efforts
- Brad Chapman's GFF parser:
https://github.com/chapmanb/bcbb/tree/master/gff
- GFFutils by Ryan Dale:
https://github.com/daler/GFFutils
- Pygr
main link: http://code.google.com/p/pygr/
discussion about gff/annotation parsing: http://www.mail-archive.com/pygr-dev@googlegroups.com/msg01551.html
- bpbio
http://code.google.com/p/bpbio/
- bx-python
http://bitbucket.org/james_taylor/bx-python/overview
Ruby
- BioRuby library:
http://www.bioruby.org/rdoc/classes/Bio/GFF/GFF3.html
Java
- BioJava module:
http://www.biojava.org/docs/api/org/biojava/bio/program/gff/GFFTools.html
Stand alone programs
- Eval
link: http://mblab.wustl.edu/software/eval/ version: 2.2.8
Perl program with GUI.
- GFPE
GFPE: gene-finding program evaluation. Bioinformatics (2003) 19 (13): 1712-1713. doi: 10.1093/bioinformatics/btg216
link: ftp://anonymous@iubio.bio.indiana.edu/molbio/genefind/
Java program.
- overlap
link: http://big.crg.cat/services/overlap author: Sarah Djebali