Etchevers:Notebook/STRA6 in eye development/2009/07/21

{| width="800"
 * style="background-color: #EEE"|[[Image:C14.jpg|128px]] Genetics of human eye development
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Parsing FindPeaks
Anthony's poster, FindPeaks-Transcripter, explains the difference between the "triangle" and "real" distributions, so favor -dist_type 3 in the next trial.

I can't figure out why no common points turn up between the two Chr 17 files when the vast majority are common and Fasteris calculated a number of significant peaks.
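One way to check how many positions the two files really share, independently of FindPeaks, would be to compare the BED intervals directly. A minimal Python sketch, assuming plain tab-separated BED lines (chrom, start, end, ...) with half-open coordinates; the function names are hypothetical and the comparison is a simple quadratic scan, not anything FindPeaks does internally.

```python
def read_intervals(lines):
    """Parse BED lines into a sorted list of (chrom, start, end) tuples."""
    intervals = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return sorted(intervals)


def count_overlapping(sample, control):
    """Count sample intervals that overlap at least one control interval."""
    by_chrom = {}
    for chrom, start, end in control:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in sample:
        # Half-open intervals overlap when each starts before the other ends.
        if any(s < end and start < e for s, e in by_chrom.get(chrom, [])):
            hits += 1
    return hits
```

With real data, each file would be opened and passed in, e.g. `read_intervals(open("PAX6_Chr17.bed"))`, and the count compared with the number of reads in the file.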
 * Heather 03:18, 21 July 2009 (EDT):

Note this! -max_pet_size 


 * When processing reads from bed files or other sources, it is possible to find erroneous reads which span long sections of the chromosome. These may be the result of inversions or translocations, but are frequently just misalignments to one end of a PET. Leaving these tags in a ChIP-Seq experiment can result in massively long - and incorrect - areas of enrichment. They have also been noted to cause FindPeaks to run out of memory, as they require large segments of chromosomes to be processed as a single area of enrichment. To filter out reads over a certain length, use the -max_pet_size flag.


 * For a ChIP-Seq experiment with reads expected between 100 and 300 bp, we recommend filtering tags that span more than 1000 bp: -max_pet_size 1000
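To see in advance how many reads such a filter would drop, the same length cut can be applied to the BED file by hand. A rough Python sketch, assuming tab-separated BED lines and assuming that a span of exactly the limit is kept (whether FindPeaks treats the boundary the same way is not checked here); `filter_long_reads` is a hypothetical helper, not part of FindPeaks.

```python
def filter_long_reads(bed_lines, max_size=1000):
    """Yield BED lines whose span (end - start) is at most max_size bp."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank, header, or malformed lines that lack numeric coordinates.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        start, end = int(fields[1]), int(fields[2])
        if end - start <= max_size:
            yield line
```

Counting the lines before and after filtering gives the number of over-long tags in a file.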

Meanwhile, could compare the peak generation for PAX6 and RAX on Chr 17, each using the two one-sample FDR tools. But first run again using the above flag and the native lengths, although in theory we are '''not''' doing paired-end reads. If you do the LW or MC FDR calculations, they will generate files that can then be used as "controls" with the '''-control''' flag (as opposed to -compare, I think).

-landerwaterman 


 * Enable the Lander-Waterman based FDR calculation. It is a probabilistic (analytical) approach that models a uniform distribution of Poisson-like events. We use it to model the number of background peaks at each height. It outputs an FDR table per chromosome.


 * The Lander-Waterman parameter requires an FDR threshold value between 0 and 1. A good starting value is 0.01.


 * This parameter should be used only with fixed Xsets (-dist_type 0), as it performs poorly with weighted (-dist_type 1) distributions and the native/PET (-dist_type 3) distributions.
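To get a feel for what a uniform Poisson background implies, here is a small Python sketch of the Poisson tail probability for pile-up depth at a single position when fixed-length reads are scattered uniformly over a region. This is only the textbook Poisson model with mean depth n_reads × read_len / region_len; it is not FindPeaks's actual Lander-Waterman FDR calculation, and the function names and the example numbers are made up for illustration.

```python
import math


def poisson_tail(lam, h):
    """P(X >= h) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(h))
    return max(0.0, 1.0 - below)


def min_height_for_tail(n_reads, read_len, region_len, alpha):
    """Smallest depth h whose Poisson tail probability is at most alpha."""
    lam = n_reads * read_len / region_len  # expected depth under uniform placement
    h = 1
    while poisson_tail(lam, h) > alpha:
        h += 1
    return h
```

For example, `min_height_for_tail(1000, 100, 100000, 0.01)` uses a mean depth of 1.0 and returns the first depth at which a random pile-up becomes rarer than 1 in 100 positions.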

Or: (Monte Carlo FDR calculation)

-iterations [ ] -eff_frac 0.7


 * This command runs the MC FDR for estimating background noise. It is highly suggested that a null control or other control be provided instead using the -compare or -control flags. This method should only be used when there are no other alternatives.


 * The number of iterations used should be in the range of 3-10. More iterations may help, depending on your data set.


 * It currently provides an estimation of the fraction of reads likely to be noise.
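The general idea of a Monte Carlo background estimate can be sketched as follows: scatter the same number of fixed-length reads at random over a region several times, and record the tallest pile-up seen in each round. This Python sketch is an illustration under simple assumptions (uniform placement, fixed read length, half-open intervals); it does not reproduce FindPeaks's own MC FDR code or its -eff_frac handling, and all names are hypothetical.

```python
import random


def max_pileup(starts, read_len):
    """Maximum number of overlapping reads, via a sweep over start/end events."""
    events = []
    for s in starts:
        events.append((s, 1))              # read begins
        events.append((s + read_len, -1))  # read ends (half-open interval)
    # At equal positions, ends (-1) sort before starts (+1), so reads that
    # merely touch are not counted as overlapping.
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


def simulate_background(n_reads, read_len, region_len, iterations, seed=0):
    """Return the maximum random pile-up height from each simulation round."""
    rng = random.Random(seed)
    heights = []
    for _ in range(iterations):
        starts = [rng.randrange(region_len - read_len + 1) for _ in range(n_reads)]
        heights.append(max_pileup(starts, read_len))
    return heights
```

Observed peaks that are no taller than the heights returned by the simulation would be hard to distinguish from noise under these assumptions.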

Possibly try adding in:

-saturation, but cf. the WorkFlows page, which says the calculation is memory- and time-hungry.

-min_ht_process 


 * This flag can be used together with the -minimum flag. If a -minimum peak size is provided, it may be of some value to spend less time processing smaller peaks. This flag will prevent peaks with a size smaller than that provided from having their profile map stored. Profile maps are integer arrays which hold the wig information for each peak, and are used to calculate peak max locations and are dumped to the wig file during subsequent processing. Peak data is retained when this flag is used, but peak max locations may not be accurate for these peaks.


 * For runs where -compare and -control are not used, this flag can be set up to the value provided by -minimum. For runs with -compare or -control, it is suggested that this not be set higher than 3. (At 3, this flag still provides a significant memory saving without significant loss of accuracy of the control/compare methods.)


 * Heather 03:50, 21 July 2009 (EDT):

time java -Xms2G -Xmx11G -jar ~/trunk/jars/fp4/FindPeaks.jar -name PAX6-RAX-FP-Chr17 -input /home/oeil/Documents/Fasteris_7_2009/GDZ5_PAX6_bed/PAX6_Chr17.bed -compare /home/oeil/Documents/Fasteris_7_2009/GDZ4_RAX_bed/RAX_Chr17.bed -aligner bed -dist_type 3 -minimum 20 -min_ht_process 3 -compare /home/oeil/Documents/Fasteris_7_2009/GDZ4_RAX_bed/RAX_Chr17.bed -alpha 0.05 -max_pet_size 1000 -output /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_PAX6-RAX_chr17


 * Version: Initializing class Log_Buffer                       $Revision: 1145 $
 * Version: Initializing class FindPeaks                        $Revision: 1335 $
 * Info:   Note: all output now goes to log file.
 * Info:   Log file: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_PAX6-RAX_chr17/PAX6-RAX-FP-Chr17.log
 * Version: Initializing class Parameters                       $Revision: 1298 $
 * Info:    * MC simulation        : Off
 * Info:    * Chr name prepend     : none
 * Info:    * Min. reported pk ht  : 20
 * Info:    * Minimum ht to process: 3
 * Info:    * Lander-Waterman FDR  : Off
 * Info:    * Output Sequence      : Off
 * Info:    * Output directory     : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_PAX6-RAX_chr17/
 * Info:    * Control files in use : Off
 * Info:    * Compare files in use : On
 * Info:    * Peak ht transform    : false
 * Info:    * Compare window size  : 100
 * Info:    * Auto-threshold       : Off
 * Info:    * Filter on PET flags  : Off
 * Info:    * Maximum PET frag size: 1000
 * Info:    * Aligner              : bed
 * Info:    * PET/native mode   : On
 * Info:    * One file per chr.    : Off
 * Info:    * Naming files as      : PAX6-RAX-FP-Chr17
 * Info:    * Sub-peaks            : Off
 * Info:    * Trim                 : Off
 * Info:    * Saturation Analysis  : Off
 * Info:    * Compare alpha value  : 0.05	(Confidence Interval: 95.0)
 * Info:    * Histogram length     : 30
 * Info:    * Histogram precision  : 1
 * Info:    * Peaks File Header    : On
 * Info:    * Bedgraph/Wigfile     : wig file
 * Info:    * R mode               : Off
 * Info:    * Filter Duplicates    : Off
 * Info:    * Filter quality       : Off
 * Version: Initializing class PeakWriter                       $Revision: 1299 $
 * Version: Initializing class Generic_AlignRead_Iterator       $Revision: 1318 $
 * Version: Initializing class BedIterator                      $Revision: 1317 $
 * Info:   Running Peak Processor
 * Version: Initializing class PeakDataSet Peak Locator         $Revision: 1335 $
 * Version: Initializing class PeakStore                        $Revision: 1335 $
 * Version: Initializing class MapStore                         $Revision: 1335 $
 * Info:   Current chromosome : chr17
 * Info:   Reads used: 8601
 * Version: Initializing class PeakStats                        $Revision: 1335 $
 * Version: Initializing class Histogram                        $Revision: 1197 $
 * Info:   Current chromosome : chr17
 * Info:   Reads used: 2735
 * Version: Initializing class ApplyCompare                     $Revision: 1332 $
 * Info:   Linear Regresion: Total:	9
 * Info:   Linear Regresion: Used:	0
 * Version: Initializing class LinearRegressionPerpendicular    $Revision: 1285 $
 * Warning: Can't run a Linear Regression calculation with zero points.
 * Warning: Can not apply filter. A valid slope was not obtained from the analysis.
 * Info:   Linear Regresion: Remaining:	9
 * Version: Initializing class RegionWriter                     $Revision: 1229 $
 * Info:   writing to : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_PAX6-RAX_chr17/PAX6-RAX-FP-Chr17_mode_3_standard_chr17.regions
 * Version: Initializing class Wigwriter                        $Revision: 1329 $
 * Info:   writing sample to:  /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_PAX6-RAX_chr17/PAX6-RAX-FP-Chr17_mode_3_standard_chr17_filtered_sample.wig.gz
 * Info:   writing comtrol to: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_PAX6-RAX_chr17/PAX6-RAX-FP-Chr17_mode_3_standard_chr17_filtered_control.wig.gz
 * Info:   writing to: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_PAX6-RAX_chr17/PAX6-RAX-FP-Chr17_mode_3_standard_chr17.wig.gz
 * ^C
 * real	189m0.345s
 * user	2m1.724s
 * sys	2m33.370s


 * Heather 07:40, 21 July 2009 (EDT):

time java -Xms2G -Xmx11G -jar ~/trunk/jars/fp4/FindPeaks.jar -name PAX6-FP-Chr17 -input /home/oeil/Documents/Fasteris_7_2009/GDZ5_PAX6_bed/PAX6_Chr17.bed -aligner bed -dist_type 0 71 -landerwaterman 0.01 -auto_threshold 0.05 -eff_frac 0.7 -output /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/21-7_PAX6-LW_chr17


 * Version: Initializing class Log_Buffer                       $Revision: 1145 $
 * Version: Initializing class FindPeaks                        $Revision: 1335 $
 * Info:   Note: all output now goes to log file.
 * Info:   Log file: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/21-7_PAX6-LW_chr17/PAX6-FP-Chr17.log
 * Version: Initializing class Parameters                       $Revision: 1298 $
 * Info:    * MC simulation        : Off
 * Info:    * Chr name prepend     : none
 * Info:    * Min. reported pk ht  : 0
 * Info:    * Minimum ht to process: Off
 * Info:    * Lander-Waterman FDR  : On (0.01)
 * Info:    * Output Sequence      : Off
 * Info:    * Output directory     : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/21-7_PAX6-LW_chr17/
 * Info:    * Control files in use : Off
 * Info:    * Compare files in use : Off
 * Error:    -auto_threshold parameter may not be used without a null control or in a saturation run with -iterations.  please provide a control to use this option.


 * real	0m0.156s
 * user	0m0.056s
 * sys	0m0.028s

time java -Xms2G -Xmx11G -jar ~/trunk/jars/fp4/FindPeaks.jar -name PAX6-FP-Chr17 -input /home/oeil/Documents/Fasteris_7_2009/GDZ5_PAX6_bed/PAX6_Chr17.bed -aligner bed -dist_type 0 71 -landerwaterman 0.01 -minimum 20 -eff_frac 0.7 -output /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/21-7_PAX6-LW_chr17


 * Version: Initializing class Log_Buffer                       $Revision: 1145 $
 * Version: Initializing class FindPeaks                        $Revision: 1335 $
 * Info:   Note: all output now goes to log file.
 * Info:   Log file: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/21-7_PAX6-LW_chr17/PAX6-FP-Chr17.log
 * Version: Initializing class Parameters                       $Revision: 1298 $
 * Info:    * MC simulation        : Off
 * Info:    * Chr name prepend     : none
 * Info:    * Min. reported pk ht  : 20
 * Info:    * Minimum ht to process: Off
 * Info:    * Lander-Waterman FDR  : On (0.01)
 * Info:    * Output Sequence      : Off
 * Info:    * Output directory     : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/21-7_PAX6-LW_chr17/
 * Info:    * Control files in use : Off
 * Info:    * Compare files in use : Off
 * Info:    * Auto-threshold       : Off
 * Info:    * Filter on PET flags  : Off
 * Info:    * Maximum PET frag size: Off
 * Info:    * Aligner              : bed
 * Info:    * Fixed width dist.    : 71 xset
 * Info:    * One file per chr.    : Off
 * Info:    * Naming files as      : PAX6-FP-Chr17
 * Info:    * Sub-peaks            : Off
 * Info:    * Trim                 : Off
 * Info:    * Saturation Analysis  : Off
 * Info:    * Histogram length     : 30
 * Info:    * Histogram precision  : 1
 * Info:    * Peaks File Header    : On
 * Info:    * Bedgraph/Wigfile     : wig file
 * Info:    * R mode               : Off
 * Info:    * Filter Duplicates    : Off
 * Info:    * Filter quality       : Off
 * Info:   Inititing fixed width with max_len : 71
 * Version: Initializing class PeakWriter                       $Revision: 1299 $
 * Version: Initializing class Generic_AlignRead_Iterator       $Revision: 1318 $
 * Version: Initializing class BedIterator                      $Revision: 1317 $
 * Info:   Running Peak Processor
 * Version: Initializing class PeakDataSet Peak Locator         $Revision: 1335 $
 * Version: Initializing class PeakStore                        $Revision: 1335 $
 * Version: Initializing class MapStore                         $Revision: 1335 $
 * Info:   Current chromosome : chr17
 * Info:   Reads used: 103765
 * Version: Initializing class PeakStats                        $Revision: 1335 $
 * Version: Initializing class Histogram                        $Revision: 1197 $
 * Version: Initializing class FileOut                          $Revision: 468 $
 * Info:   threshold for sample estimated (LW) at: 0.0
 * Version: Initializing class Wigwriter                        $Revision: 1329 $
 * Info:   writing to: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/21-7_PAX6-LW_chr17/PAX6-FP-Chr17_fixed_71_standard_chr17.wig.gz
 * ^Z
 * [1]+ Stopped            (it was taking forever and nothing was generated)


 * real	116m52.080s
 * user	0m0.000s
 * sys	0m0.000s


 * Adeline Vigouroux 11:30, 21 July 2009 (EDT):


|}