Wikiomics:Repeat finding
From OpenWetWare
For simplicity, this page covers repeat finding in eukaryotic genomic DNA only.
Repeat finding can be divided into two tasks, depending on whether a repeat library is available:
A) a library already exists for the given species (or possibly for a closely related one)
or
B) you construct such a library de novo.
Task A is usually a prerequisite step for genome annotation and even for BLAST searches. For newly sequenced genomes one should start with B (constructing a species-specific repeat library).
Detecting known repeats
De novo repeat library construction
For a review, see Saha et al., Empirical comparison of ab initio repeat finding programs (2008)
RepeatScout
command line only, requires compilation
Site: http://bix.ucsd.edu/repeatscout/
current version (as of 2010-03): 1.0.5
Documentation:
- http://bix.ucsd.edu/repeatscout/readme.1.0.5.txt
- PPT presentation of the algorithm: http://bix.ucsd.edu/repeatscout/repeatscout-ismb.ppt
- publication (PDF): De novo identification of repeat families in large genomes (2005)
Simplest run:
build_lmer_table -sequence input_sequence.fas -freq output_lmer.frequency
RepeatScout -sequence input_sequence.fas -output output_repeats -freq output_lmer.frequency
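The two steps above can be wrapped in a small helper so that the frequency table written by build_lmer_table is always the one passed to RepeatScout. This is only a sketch: the function name repeatscout_cmds and the output file naming are assumptions made for illustration, and only the -sequence, -freq and -output options shown above are used. The function prints the commands rather than running them, so it can be checked before the tools are installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of RepeatScout): print the two commands
# needed to build a de novo repeat library from one FASTA file.
repeatscout_cmds() {
    input="$1"                        # genomic sequence in FASTA format
    prefix="${input%.*}"              # drop the extension; output naming is an assumption
    freq="${prefix}.lmer.frequency"   # l-mer frequency table shared by both steps
    out="${prefix}.repeats.fas"       # file that will receive the repeat library

    # Step 1: tabulate l-mer frequencies for the input sequence.
    echo "build_lmer_table -sequence $input -freq $freq"
    # Step 2: run RepeatScout on the same sequence using that table.
    echo "RepeatScout -sequence $input -output $out -freq $freq"
}

# Dry run: show the commands for the example input file.
repeatscout_cmds input_sequence.fas
```

Once both programs are compiled and on the PATH, the printed commands can be executed by piping the output to sh (repeatscout_cmds input_sequence.fas | sh). File names containing spaces are not handled by this sketch.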