Wikiomics:Repeat finding: Difference between revisions
Darek Kedra (talk | contribs) |
Revision as of 07:48, 22 March 2010
To simplify, this page assumes repeat finding in eukaryotic genomic DNA.
Repeat finding can be divided into two tasks, depending on whether a repeat library is available:
A) A library exists for the given species (or possibly for a closely related one)
or
B) you construct such a library de novo.
Task A is usually a prerequisite step for genome annotation and even for BLAST searches. For newly sequenced genomes one should start with B (constructing a species-specific repeat library).
Detecting known repeats
Most commonly used: RepeatMasker
RepeatMasker
- web site: http://www.repeatmasker.org/
- current version (checked on 2010-03-22): 3.2.8
- documentation: http://www.repeatmasker.org/webrepeatmaskerhelp.html
- Online web server: http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker
- command line
You need a FASTA file (it may contain multiple sequences). Type:
repmask your_sequence_in_fasta_format
The masked sequence is written to a file named your_sequence_in_fasta_format.masked.
Species options (choose only one):
-m(us) masks rodent specific and mammalian wide repeats
-rod(ent) same as -mus
-mam(mal) masks repeats found in non-primate, non-rodent mammals
-ar(abidopsis) masks repeats found in Arabidopsis
-dr(osophila) masks repeats found in Drosophilas
-el(egans) masks repeats found in C. elegans
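A quick sanity check after masking is to see how much of the sequence was actually masked. The sketch below is not part of RepeatMasker; `masked_fraction` is a hypothetical helper name. It assumes that masked positions appear in the .masked file as N, X, or lowercase letters (which one depends on the masking options used), so any N already present in the input, such as assembly gaps, is counted as well.

```shell
# Hypothetical helper: report masked bases in a (multi-)FASTA file.
# Assumption: masked positions are written as N, X, or lowercase letters.
# Prints: <masked bases> <total bases> <masked fraction>
masked_fraction() {
    # $1: path to a FASTA file, e.g. your_sequence_in_fasta_format.masked
    awk '
        /^>/ { next }                            # skip FASTA header lines
        {
            total += length($0)                  # every sequence character
            masked += gsub(/[NXacgtnx]/, "")     # gsub returns the number of matches
        }
        END {
            if (total == 0) { print "0 0 0.0000"; exit }
            printf "%d %d %.4f\n", masked, total, masked / total
        }
    ' "$1"
}
```

Usage: `masked_fraction your_sequence_in_fasta_format.masked`. A fraction close to zero on a genome known to be repeat-rich may indicate that the wrong species option was chosen.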
De novo repeat library construction
For a review, see: Saha et al., Empirical comparison of ab initio repeat finding programs (2008)
RepeatScout
command line only, requires compilation
Site: http://bix.ucsd.edu/repeatscout/
current version (2010-03): 1.0.5
Documentation:
- http://bix.ucsd.edu/repeatscout/readme.1.0.5.txt
- PPT presentation presenting algorithm: http://bix.ucsd.edu/repeatscout/repeatscout-ismb.ppt
- publication (PDF): De novo identification of repeat families in large genomes (2005)
Simplest run:
build_lmer_table -sequence input_sequence.fas -freq output_lmer.frequency
RepeatScout -sequence input_sequence.fas -output output_repeats -freq output_lmer.frequency
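The simplest run consists of two steps: building the l-mer frequency table, then running RepeatScout on the same input. These can be wrapped in a small shell function so that the second step only starts if the first one succeeded. This is a minimal sketch: the function name `run_repeatscout`, the `DRY_RUN` variable, and the output prefix argument are assumptions made for illustration. Only the two commands and their `-sequence`, `-freq` and `-output` options come from the text above.

```shell
# Hypothetical wrapper around the two-step RepeatScout run shown above.
# $1: input FASTA file
# $2: prefix for the output files (optional, defaults to "output")
# Set DRY_RUN=1 to print the commands instead of executing them.
run_repeatscout() {
    input=$1
    prefix=${2:-output}
    run=${DRY_RUN:+echo}    # expands to "echo" only when DRY_RUN is set and non-empty

    # Step 1: build the l-mer frequency table.
    # Step 2: find repeats; runs only if step 1 exited successfully.
    $run build_lmer_table -sequence "$input" -freq "${prefix}_lmer.frequency" &&
    $run RepeatScout -sequence "$input" -output "${prefix}_repeats" \
        -freq "${prefix}_lmer.frequency"
}
```

Usage: `DRY_RUN=1 run_repeatscout input_sequence.fas` prints the two commands for inspection. Without `DRY_RUN`, both programs must be compiled and available on the PATH.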