RRedon:Protocols/PINDEL

http://www.ebi.ac.uk/~kye/pindel/

http://bioinformatics.oxfordjournals.org/cgi/content/full/25/21/2865

Pindel detects
 * large deletions (1 bp - 10 kb)
 * medium-sized insertions (1-20 bp)

=Method=

(citing the paper) Suppose we define a database S of genome sequence as

S=ATCAAGTATGCTTAGC

and the pattern

P=ATGCA

The output of the algorithm consists of all substrings (and their locations) starting from the leftmost base of P that appear exactly once in S.

In the first step, we scan the whole database for ‘A’, the first base of pattern P.

The locations of ‘A’

ATCAAGTATGCTTAGC (positions 1, 4, 5, 8 and 14, counting from 1)

are stored in a projected database of ‘A’. In the second step, we look for ‘T’, the second base in pattern P, immediately to the right of the ‘A’s identified previously. The projected database for ‘AT’ then contains only two locations (positions 1 and 8 of S, counting from 1). When we search for the third base ‘G’ of pattern P to the right of ‘AT’, we find that ‘ATG’ appears exactly once in the database S (at position 8). Thus we know that ‘ATG’ is the minimum unique substring of pattern P in the database S. After we examine the fourth and fifth bases of pattern P, we notice that ‘ATGC’ is also unique in the database S but ‘ATGCA’ is not, since it does not occur in S at all. In this case we know that ‘ATGC’ is the maximum unique substring of pattern P in this particular database S.
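
The walk-through above can be sketched in a few lines of Python. This is only an illustration of the projected-database idea on the toy example, not Pindel's actual implementation. The function name unique_prefixes and its return format are assumptions made for this sketch, and positions are 0-based, so start 7 corresponds to position 8 when counting from 1.

```python
def unique_prefixes(s, p):
    """Return (prefix, start) for every prefix of p that occurs exactly once in s.

    Hypothetical helper for illustration; positions are 0-based.
    """
    # Projected database: start positions in s where the current prefix matches.
    # Before any base is examined, every position is a candidate.
    starts = list(range(len(s)))
    results = []
    for length, base in enumerate(p, start=1):
        # Keep only the candidates whose next base equals the next base of p.
        starts = [
            i for i in starts
            if i + length <= len(s) and s[i + length - 1] == base
        ]
        if not starts:
            # The prefix no longer occurs in s, so longer prefixes cannot either.
            break
        if len(starts) == 1:
            results.append((p[:length], starts[0]))
    return results


if __name__ == "__main__":
    hits = unique_prefixes("ATCAAGTATGCTTAGC", "ATGCA")
    print(hits)  # [('ATG', 7), ('ATGC', 7)]
```

The first and last entries of the result are the minimum and maximum unique substrings of P in S. Each step only filters the previous list of candidate positions, so the whole database is scanned once, for the first base.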