User:Lindenb/Notebook/UMR915/20100802

=Belgium=
Info about 454 on http://sequence.otago.ac.nz/download/ManualPartC.pdf

AllDiffs
This file contains the list of variations where at least '''2 reads differ either from the reference sequence or from other reads aligned at a specific location'''. SNPs, insertion-deletion pairs, multi-homopolymer insertion or deletion regions, and single-base overcalls and undercalls are reported.

The full summary lines contain up to 16 columns of information, as listed below. The first 7 columns are always output, columns 8 through 12 are output if gene annotations or known SNP information is given to the GS Reference Mapper, and columns 13 through 16 are output only with the -fd option.


 * 1. Reference Accno - The accession number of the reference sequence in which the difference was detected
 * 2. Start Pos - The start position within the reference sequence, where the difference occurs
 * 3. End Pos - The end position within the reference sequence, where the difference occurs
 * 4. Ref Nuc - The reference nucleotide sequence at the difference location
 * 5. Var Nuc - The differing nucleotide sequence at the difference location
 * 6. Total Depth - The total number of reads that fully span the difference location
 * 7. Var Freq - The percentage of different reads versus total reads that fully span the difference location
 * 8. Ref AA - The reference amino acid sequence at the difference location, if it occurs within the coding region of an annotated gene
 * 9. Var AA - The differing amino acid sequence at the difference location, if it occurs within the coding region of an annotated gene
 * 10. Coding Frame - The coding frame at the difference location, if it occurs within the coding region of an annotated gene
 * 11. Region Name - The gene name at the difference location, if it occurs within the region of an annotated gene
 * 12. Known SNP’s - The list of known SNP IDs that occur at the difference location
 * 13. # Fwd w/ Var - The number of forward reads that include the difference (with -fd only)
 * 14. # Rev w/ Var - The number of reverse reads that include the difference (with -fd only)
 * 15. # Fwd Total - The total number of forward reads that fully span the difference location (with -fd only)
 * 16. # Rev Total - The total number of reverse reads that fully span the difference location (with -fd only)
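
To make the column list above concrete, here is a minimal Python sketch that reads the 7 always-present columns of one AllDiffs summary line. It is only an illustration under assumptions that the notes do not state: that the summary line is tab-delimited, that it may start with ">", and that Var Freq may carry a trailing "%". The function name and the sample values are hypothetical.

```python
from typing import Dict, List

# Names for the 7 columns that the notes say are always output.
ALLDIFFS_COLUMNS: List[str] = [
    "ref_accno", "start_pos", "end_pos", "ref_nuc", "var_nuc",
    "total_depth", "var_freq",
]


def parse_alldiffs_summary(line: str) -> Dict[str, object]:
    """Parse the always-present columns of one AllDiffs summary line.

    Assumptions (not confirmed by the notes): tab-delimited fields,
    an optional leading '>', and an optional '%' after Var Freq.
    """
    fields = line.rstrip("\n").lstrip(">").split("\t")
    if len(fields) < len(ALLDIFFS_COLUMNS):
        raise ValueError(f"expected at least 7 columns, got {len(fields)}")
    record: Dict[str, object] = dict(zip(ALLDIFFS_COLUMNS, fields))
    record["start_pos"] = int(fields[1])
    record["end_pos"] = int(fields[2])
    record["total_depth"] = int(fields[5])
    record["var_freq"] = float(fields[6].rstrip("%"))
    # Annotation and -fd columns are optional, so they are kept unparsed.
    record["extra"] = fields[7:]
    return record


if __name__ == "__main__":
    # Hypothetical line, made up for illustration only.
    print(parse_alldiffs_summary(">chrXX\t100\t100\tA\tG\t12\t75.00%"))
```

Keeping the optional columns in a separate list avoids guessing which of them are present when annotations or -fd were not used.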

The multiple alignment for a difference shows the difference location plus approximately 30 bases on either side. The alignment is divided into two sections: the first section displays the reads found to contain the difference and the second, the reads not found to have the difference. Each line of the alignment displays the following information for a read:
 * 1. The identifier for the read
 * 2. The number of duplicate reads whose alignment matches this read (when there are duplicates of the read)
 * 3. The position of the read’s first base displayed in the alignment region
 * 4. The orientation of the read in the displayed alignment (“+” is forward orientation, “-” is reverse-complement orientation, relative to the reference)
 * 5. The aligned bases of the read
 * 6. The position of the read’s last base displayed in the alignment region
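
The six fields above can be pulled out of an alignment line with a regular expression. This is a rough sketch only: the layout (identifier, optional "(n)" duplicate count, start position fused with the orientation sign, bases, end position, all separated by whitespace) is an assumption based on the example alignment shown later on this page, and the names AlignedRead and parse_read_line are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class AlignedRead(NamedTuple):
    read_id: str
    duplicates: Optional[int]  # None when no "(n)" count is shown
    first_pos: int
    orientation: str           # "+" or "-"
    bases: str                 # aligned bases, "-" marks a gap
    last_pos: int


_READ_LINE = re.compile(
    r"^\s*(?P<id>\S+)"                    # 1. read identifier
    r"(?:\s+\((?P<dup>\d+)\))?"           # 2. optional duplicate count
    r"\s+(?P<first>\d+)(?P<orient>[+-])"  # 3. first position, 4. orientation
    r"\s+(?P<bases>[A-Za-z-]+)"           # 5. aligned bases
    r"\s+(?P<last>\d+)\s*$"               # 6. last position
)


def parse_read_line(line: str) -> AlignedRead:
    """Split one alignment line into its six fields (assumed layout)."""
    m = _READ_LINE.match(line)
    if m is None:
        raise ValueError(f"not an alignment read line: {line!r}")
    dup = m.group("dup")
    return AlignedRead(
        read_id=m.group("id"),
        duplicates=int(dup) if dup is not None else None,
        first_pos=int(m.group("first")),
        orientation=m.group("orient"),
        bases=m.group("bases"),
        last_pos=int(m.group("last")),
    )


if __name__ == "__main__":
    # Shortened, hypothetical read line for illustration.
    print(parse_read_line("GKF3EFN01DHY8H  (2)   357+ GGGAGG-C-TCCA 433"))
```

For a read in reverse-complement orientation the first position is larger than the last one, so the two positions should not be assumed to be in ascending order.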

AllStructVar
This file shows the rearrangement points and rearrangement regions observed in the reads of the analyzed data set, relative to the reference.

The columns found on the summary line for each variation are described below. Some columns (the Region Name columns) require that annotations be supplied to the project. Some other columns are output only if the “-fd” option is specified.
 * 1. Ref Accno1 – the accession number of the reference sequence on one side of the variation
 * 2. Ref Pos1 – the reference position on one side of the variation
 * 3. Var Side1 – a direction arrow “->” or “<-” describing the direction of the variation on the reference (i.e., the 3’ end of the reads or paired-end clones that diverge from the reference occur in this direction)
 * 4. Region Name1 – the gene name, or annotated region name, covering the location in the reference denoted by Ref Accno1 and Ref Pos1 (Gene or Region annotation must be included in the project for this field to contain a value)
 * 5. Ref Accno2 – the accession number of the reference sequence on the other side of the variation, if known. (If only one side of the variation is known, a question mark is given here)
 * 6. Ref Pos2 – the reference position on the other side of the variation, or a question mark if only one side is known
 * 7. Var Side2 – a direction arrow “<-” or “->” for the direction on the other side of the variation, or a question mark if only one side is known
 * 8. Region Name2 – the gene name, or annotated region name, covering the location in the reference denoted by Ref Accno2 and Ref Pos2 (Gene or Region annotation must be included in the project for this field to contain a value)
 * 9. Total Depth – the number of reads (for rearrangement points) or pairs (for rearrangement regions) covering the variation location(s)
 * 10. Var Freq – the percentage of the reads/pairs that support the variation
 * 11. Deviation Length – if both sides of the variation occur on the same reference, this is the distance between the two variation locations
 * 12. Type – the string “Point” or “Region” to denote whether the rearrangement is a rearrangement point identified by split-read alignments or a rearrangement region identified by paired-end reads
 * 13. # Fwd w/ var – number of reads on the forward orientation that contain the variation (requires -fd option)
 * 14. # Rev w/ var – number of reads on the reverse orientation that contain the variation (requires -fd option).
 * 15. # Fwd Total – total number of reads in the forward orientation that map to this area of the reference (requires -fd option).
 * 16. # Rev Total – total number of reads in the reverse orientation that map to this area of the reference (requires -fd option).
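
As an illustration of the column list above, the following Python sketch turns the first 12 columns of an AllStructVar summary line into a typed record. It rests on assumptions that are not in the notes: tab-delimited fields, an optional leading ">", Region Name columns that are present (possibly empty), and an empty Deviation Length when the two sides are on different references. The names StructVar and parse_structvar_summary are hypothetical.

```python
from typing import NamedTuple, Optional


class StructVar(NamedTuple):
    ref_accno1: str
    ref_pos1: int
    var_side1: str
    region_name1: str
    ref_accno2: Optional[str]   # None when the other side is unknown ("?")
    ref_pos2: Optional[int]
    var_side2: Optional[str]
    region_name2: str
    total_depth: int
    var_freq: float
    deviation_length: Optional[int]
    var_type: str               # "Point" or "Region"


def _known(value: str) -> Optional[str]:
    """Map the '?' placeholder used for an unknown side to None."""
    value = value.strip()
    return None if value == "?" else value


def parse_structvar_summary(line: str) -> StructVar:
    """Parse columns 1-12 of one summary line (assumed tab-delimited)."""
    f = line.rstrip("\n").lstrip(">").split("\t")
    if len(f) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(f)}")
    pos2 = _known(f[5])
    deviation = f[10].strip()
    return StructVar(
        ref_accno1=f[0],
        ref_pos1=int(f[1]),
        var_side1=f[2],
        region_name1=f[3],
        ref_accno2=_known(f[4]),
        ref_pos2=int(pos2) if pos2 is not None else None,
        var_side2=_known(f[6]),
        region_name2=f[7],
        total_depth=int(f[8]),
        var_freq=float(f[9].rstrip("%")),
        deviation_length=int(deviation) if deviation.lstrip("-").isdigit() else None,
        var_type=f[11].strip(),
    )


if __name__ == "__main__":
    # Values copied from the example on this page, joined with tabs.
    cols = [">chrXX", "2954601", "-->", "XXXX", "chrXX", "2954783",
            "<--", "XXXX", "4", "75.00", "181", "Point"]
    print(parse_structvar_summary("\t".join(cols)))
```

The -fd columns (13 to 16) are ignored here because they are only present when that option was used.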

Example
 Ref     Ref      Var    Region  Ref     Ref      Var    Region  Total  Var    Deviation  Type
 Accno1  Pos1     Side1  Name1   Accno2  Pos2     Side2  Name2   Depth  Freq   Length
 >chrXX  2954601  -->    XXXX    chrXX   2954783  <--    XXXX    4      75.00  181        Point
 Other Reads:
 chrXX                 2954561+ GGGAGGTGGAGGTTGCAGTGACCTGAGATTGCACCACTGCA-C-TCCA-GCCTGGGTGACAGAGCAAGACTCTGTCTCAG 2954637
 *
 GKF3EFN01DHY8H  (2)       357+ GGGAGGTGGAGGTTGCAGTGACCTGAGATTGCACCACTGCA-C-TCCA-GCCTGGGTGACAGAGCAAGACTCTGTCTCAG 433
 *
 Reads with Difference:
 *
 GKF3EFN01CZ26M            301- GGGAGGTGGAGGTTGCAGTGACCTGAGATTGCACCACTGCA                                        261
 GKF3EFN01AYBQ5            374- GGGAGGTGGAGGTTGCAGTGACCTGAGATTGCACCACTGCA                                        334
 GKF3EFN01AOL7K             67+ GGGAGGTGGAGGTTGCAGTGACCTGAGATTGCACCACTGCA                                        107
 *
 Reads with Difference:
 chrXX                 2954743+ CCCAGGCTGGAGTGCAGTGGTGCAATCATGACCCACTGCAGCCTCAACCTCCTCCCATGCTCAAGTGATCCTCCCGCCTC 2954822
 *
 GKF3EFN01AOL7K            108+                                         GCCTCAACCTCCTCCCATGCTCAAGTGATCCTCCCGCCTC 147
 GKF3EFN01CZ26M            260-                                         GCCTCAACCTCCTCCCATGCTCAAGTGATCCTCCCGCCTC 221
 GKF3EFN01AYBQ5            333-                                         GCCTCAACCTCCTCCCATGCTCAAGTGATCCTCCCGCCTC 294
 *
 Other Reads:
 *
 GKF3EFN01CIQDH             70+ CCCAGGCTGGAGTGCAGTGGTGCAATCATGACCCACTGCAGCCTCAACCTCCTCCCATGCTCAAGTGATCCTCCCGCCTC 149
 GKF3EFN01BR05H            436- CCCAGGCTGGAGTGCAGTGGTGCAATCATGACCCACTGCAGCCTCAACCTCCTCCCATGCTCAAGTGATCCTCCCGCCTC 357
 *

content of GKF3EFN01BFPR7: