Wikiomics:Repeat finding

From OpenWetWare

Revision as of 09:48, 22 March 2010

To simplify, this page assumes repeat finding in eukaryotic genomic DNA.

Repeat finding can be divided into two tasks, depending on whether a repeat library is available:

A) a library already exists for the given species (or a closely related one)

or

B) you construct such a library de novo.


Task A is usually a prerequisite for genome annotation and even for BLAST searches. For newly sequenced genomes, start with B (constructing a species-specific repeat library).


Detecting known repeats

Most commonly used: RepeatMasker

RepeatMasker

  • web site: http://www.repeatmasker.org/
  • current version (checked on 2010-03-22): 3.2.8
  • documentation: http://www.repeatmasker.org/webrepeatmaskerhelp.html
  • Online web server: http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker
  • command line

You need a FASTA file (it may contain multiple sequences). Type:

repmask your_sequence_in_fasta_format

You will get a file named your_sequence_in_fasta_format.masked, which contains the masked sequence.
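The run above can be wrapped in a small shell sketch that checks the input and confirms that the .masked file was written. This assumes that repmask is the command name on your installation, as used on this page; the function names masked_name and run_mask are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes `repmask` is on the PATH under that name.
# masked_name and run_mask are hypothetical helper names.

masked_name() {
    # The masked output is the input file name with ".masked" appended.
    printf '%s.masked\n' "$1"
}

run_mask() {
    input=$1
    # Refuse to run on a missing or empty FASTA file.
    if [ ! -s "$input" ]; then
        echo "error: $input is missing or empty" >&2
        return 1
    fi
    repmask "$input" || return 1
    out=$(masked_name "$input")
    if [ -f "$out" ]; then
        echo "masked sequence written to $out"
    else
        echo "warning: $out was not created" >&2
        return 1
    fi
}

# Usage: run_mask your_sequence_in_fasta_format
```

The file check after the run is deliberate: it catches the case where the program exits without producing a masked file.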

Species options (choose only one):

-m(us) masks rodent-specific and mammal-wide repeats
-rod(ent) same as -mus
-mam(mal) masks repeats found in non-primate, non-rodent mammals
-ar(abidopsis) masks repeats found in Arabidopsis
-dr(osophila) masks repeats found in Drosophila
-el(egans) masks repeats found in C. elegans
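Since only one species option may be given, a script can pick it from a single keyword. A minimal sketch follows; the flags are the short forms listed above, while the function name species_flag and the keywords it accepts are invented for this example.

```shell
#!/bin/sh
# Sketch only: maps one keyword to one species option from the list above.
# species_flag and its keywords are hypothetical, not part of the program.

species_flag() {
    case "$1" in
        rodent)      echo "-rod" ;;
        mammal)      echo "-mam" ;;
        arabidopsis) echo "-ar" ;;
        drosophila)  echo "-dr" ;;
        elegans)     echo "-el" ;;
        *)
            echo "unknown species group: $1" >&2
            return 1
            ;;
    esac
}

# Usage (assuming the command is called repmask, as on this page):
#   flag=$(species_flag drosophila) && repmask "$flag" your_sequence_in_fasta_format
```

Taking exactly one keyword as input makes it impossible to pass two species options by accident.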

De novo repeat library construction

For a review, see: Saha et al., Empirical comparison of ab initio repeat finding programs (2008)

RepeatScout

command line only, requires compilation

Site: http://bix.ucsd.edu/repeatscout/

current version (2010-03): 1.05

Documentation:

Simplest run:

build_lmer_table -sequence input_sequence.fas -freq output_lmer.frequency
RepeatScout -sequence input_sequence.fas -output output_repeats -freq output_lmer.frequency
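The two steps can be chained so that RepeatScout runs only if the l-mer table was built successfully. This is a sketch under assumptions: the helper names run and repeatscout_simple, the DRY_RUN switch, and the output file naming are all invented here; only the two commands and their options come from the run shown above.

```shell
#!/bin/sh
# Sketch only: chains the two commands from the simplest run.
# run, repeatscout_simple, DRY_RUN and the output names are hypothetical.

run() {
    # Print the command; execute it unless DRY_RUN is set to 1.
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

repeatscout_simple() {
    seq_file=$1
    base=${seq_file%.*}                 # strip the last file extension
    freq_file="$base.lmer.frequency"    # l-mer frequency table
    out_file="$base.repeats.fas"        # repeat library written by RepeatScout
    # The second step runs only if the first one succeeded.
    run build_lmer_table -sequence "$seq_file" -freq "$freq_file" &&
    run RepeatScout -sequence "$seq_file" -output "$out_file" -freq "$freq_file"
}

# Usage: repeatscout_simple input_sequence.fas
# Preview the commands without running them: DRY_RUN=1 repeatscout_simple input_sequence.fas
```

Deriving both output names from the input name keeps the frequency table and the repeat library of one genome together.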