RRedon:Protocols/PINDEL

http://www.ebi.ac.uk/~kye/pindel/
http://bioinformatics.oxfordjournals.org/cgi/content/full/25/21/2865
detects
- large deletions 1bp - 10kb
- medium size insertions 1-20pb
Method
(citing the paper) Suppose we define a database S of genome sequence as
S=ATCAAGTATGCTTAGC
and the pattern
P=ATGCA
The output of the algorithm consists of all substrings (and their locations) starting from the leftmost base of P that appear exactly once in S.
In the first step, we scan the whole database for ‘A’, the first base of pattern P.
The locations of ‘A’
ATCAAGTATGCTTAGC
are stored in a projected database of ‘A’. In the second step, we look for ‘T’ as it is the second base in pattern P at the right side of ‘A's identified previously. The projected database for ‘AT’ then only contains two locations (‘ATCAAGTATGCTTAGC’). When we search for the third base ‘G’ of pattern P at the right sides of ‘AT’, we found that ‘ATG’ appears exactly once in the database S (‘ATCAAGTATGCTTAGC’). Thus we know that ‘ATG’ is the minimum unique substring of pattern P in the database S. After we examine the fourth and fifth base of pattern P, we notice that ‘ATGC’ is also unique in the database S but ‘ATGCA’ isn't. In this case we know that ‘ATGC’ is the maximum unique substring of pattern P in this particular database S.