Etchevers:Notebook/STRA6 in eye development/2009/07/15

{| width="800"
|style="background-color: #EEE"|[[Image:C14.jpg|128px]] Genetics of human eye development
|style="background-color: #F2F2F2" align="center"|Main project page
|-
| colspan="2"|

==Running FindPeaks finally==
First, all five sequencing results look extremely similar when the .bed files are visualized in the UCSC Genome Browser, so much so that we wondered whether one of the following unintended things had happened:


 * Chromatin conformation in this particular preparation made some regions more likely than others to be immunoprecipitated non-specifically. In that case, we are REALLY sorry not to have performed and sequenced an IgG IP as well, though we could still do pair-wise comparisons between banks.
 * All four transcription factors recognize exactly the same sites. Highly unlikely, especially for non-specific background, or in combination with the above.
 * Something else was sequenced: the libraries whose clones we saw and approved were not the ones that were sequenced. Highly unlikely, if not impossible.
 * A single IP was bar-coded separately five times, sequenced, then split apart bioinformatically. Highly unlikely.
 * All five samples were mixed before being labeled separately with the bar codes. Also highly unlikely.
 * Each sample carried all five bar codes. Highly unlikely, if not impossible.

Given that Anthony pushes for an IgG control and discourages relying on simulated controls, we conclude that the first option is the most likely, and also that ChIP-Seq analysis is still new enough that planning the experiment properly from the start (with an IgG control) is not yet standard practice.

So, I will try the following pair-wise comparisons. The first compares OTX2-1 with OTX2-2, to see whether these two are more similar to each other than to any of the other TF ChIPs. I REALLY hope so.

OTX2-1 vs:


 * OTX2-2
 * RAX
 * SOX2
 * PAX6

RAX vs:
 * OTX2-1
 * OTX2-2
 * SOX2
 * PAX6

SOX2 vs:
 * OTX2-1
 * OTX2-2
 * RAX
 * PAX6

Not all combinations will be run, because we think that most of these TFs will co-IP with PAX6 and possibly SOX2. By "pair" I mean using the bank at the top of each list as the "control" and each bank listed under it as the sample.
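To keep the twelve runs consistent, here is a minimal sketch that only prints (does not run) one FindPeaks command per planned pair. It reuses the flags from the RAX/PAX6 chr21 test further down this entry (-aligner bed -dist_type 1 200 -minimum 2 -alpha 0.05) and follows that test's naming, where RAX-PAX6-FP passes the "control" (RAX) as -compare and the sample (PAX6) as -input. The bed paths for OTX2-1, OTX2-2 and SOX2 are hypothetical placeholders; only the RAX and PAX6 chr21 paths appear in this entry, and the helper names are made up for this sketch.

```sh
# Dry run: print one FindPeaks -compare command per planned pair.
# Flags are copied from the RAX/PAX6 chr21 test run in this entry.
FP_JAR="$HOME/trunk/jars/fp4/FindPeaks.jar"
BED_DIR=/home/oeil/Documents/Fasteris_7_2009
OUT_DIR="$BED_DIR/FindPeaks4_Results/"

# Map a bank name to its chr21 bed file.
bed_for() {
  case "$1" in
    RAX)  echo "$BED_DIR/GDZ4_RAX_bed/RAX_Chr21.bed" ;;
    PAX6) echo "$BED_DIR/GDZ5_PAX6_bed/PAX6_Chr21.bed" ;;
    *)    echo "$BED_DIR/${1}_Chr21.bed" ;;  # HYPOTHETICAL path (OTX2-1, OTX2-2, SOX2)
  esac
}

# Each heredoc line: the control bank first, then the banks used as samples.
print_pair_commands() {
  while read -r control samples; do
    for sample in $samples; do
      echo "java -Xmx2G -jar $FP_JAR -name ${control}-${sample}-FP" \
        "-input $(bed_for "$sample") -output $OUT_DIR -aligner bed" \
        "-dist_type 1 200 -minimum 2 -compare $(bed_for "$control") -alpha 0.05"
    done
  done <<EOF
OTX2-1 OTX2-2 RAX SOX2 PAX6
RAX OTX2-1 OTX2-2 SOX2 PAX6
SOX2 OTX2-1 OTX2-2 RAX PAX6
EOF
}
```

Once the placeholder paths are fixed, `print_pair_commands | sh` would run the comparisons one after another; checking the dry-run output against the lists above first costs nothing.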

Questions to ask Fasteris:
 * What is their empirical experience comparing multiple IPs from the same chromatin preparation? Do they see the same sort of near-identical read distributions across the human genome?
 * Is the range of peak sizes comparable?
 * Is there a way to filter out certain parts of the genome (not just repeated sequences, but perhaps even exons) to avoid regions that are particularly "sticky" and tend to precipitate in experiment after experiment, in human chromatin generally?
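For the last question, a local stopgap would be to drop reads falling in a list of "sticky" regions before peak calling. A minimal sketch in awk, assuming a hypothetical mask file with BED-style columns (chrom, start, end, half-open as in BED) and skipping any line whose second column is not an integer (browser/track headers included). Its nested loop is only practical for one chromosome at a time; genome-wide, `bedtools intersect -v` is the usual tool for this. `mask_reads` is a hypothetical helper name.

```sh
# Print the reads in $2 that overlap NO region of the mask in $1.
# Columns 1-3 are assumed to be chrom, start, end (BED, half-open).
mask_reads() {
  awk '
    $2 !~ /^[0-9]+$/ { next }              # skip headers / non-data lines
    FILENAME == ARGV[1] {                  # first file: load mask regions
      chr[++n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next
    }
    {                                      # second file: test each read
      for (i = 1; i <= n + 0; i++)
        if (chr[i] == $1 && s[i] < $3 + 0 && $2 + 0 < e[i]) next
      print
    }
  ' "$1" "$2"
}
```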

Are all the coverage-gap-filtered peaks in all five banks actually the same or not?
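To put a number on this question, a minimal sketch that counts how many peaks of one bank overlap at least one peak of another. It assumes chrom, start and end are the first three whitespace-separated columns of the peak files, and it skips any line whose second column is not an integer, which should also skip the header FindPeaks writes when "Peaks File Header" is On; both points need checking against a real .peaks file. `count_shared_peaks` is a hypothetical helper name.

```sh
# Report "<shared> of <total>": peaks in $2 overlapping any peak in $1.
count_shared_peaks() {
  awk '
    $2 !~ /^[0-9]+$/ { next }              # skip headers / non-data lines
    FILENAME == ARGV[1] {                  # bank A: load its peaks
      chr[++n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next
    }
    {                                      # bank B: count overlapping peaks
      total++
      for (i = 1; i <= n + 0; i++)
        if (chr[i] == $1 && s[i] < $3 + 0 && $2 + 0 < e[i]) { shared++; break }
    }
    END { printf "%d of %d\n", shared, total }
  ' "$1" "$2"
}
```

The count is not symmetric, so it is worth running both ways (A vs B and B vs A) for each pair of banks.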
 * Heather 07:12, 15 July 2009 (EDT):

oeil@cornee:~/trunk/jars/fp4$ time java -Xmx2G -jar FindPeaks.jar -name RAX-PAX6-FP -input /home/oeil/Documents/Fasteris_7_2009/GDZ5_PAX6_bed/2009-07-01_GDZ-5_export_Chr1.bed -output /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/ -aligner bed -one_per -dist_type 1 200 -minimum 2 -compare /home/oeil/Documents/Fasteris_7_2009/GDZ4_RAX_bed/2009-07-01_GDZ-4_export_Chr1.bed -alpha

 Version: Initializing class Log_Buffer                       $Revision: 1145 $
 Version: Initializing class FindPeaks                        $Revision: 1335 $
 Info:   Note: all output now goes to log file.
 Info:   Log file: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/RAX-PAX6-FP.log
 Version: Initializing class Parameters                       $Revision: 1298 $
 Info:    * MC simulation        : Off
 Info:    * Chr name prepend     : none
 Info:    * Min. reported pk ht  : 2
 Info:    * Minimum ht to process: Off
 Info:    * Lander-Waterman FDR  : Off
 Info:    * Output Sequence      : Off
 Info:    * Output directory     : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/
 Info:    * Control files in use : Off
 Info:    * Compare files in use : On
 Info:    * Peak ht transform    : false
 Info:    * Compare window size  : 100
 Info:    * Auto-threshold       : Off
 Info:    * Filter on PET flags  : Off
 Info:    * Maximum PET frag size: Off
 Info:    * Aligner              : bed
 Info:    * Triangle dist.       : 100 low
 Info:    * Triangle dist.       : 200 median
 Info:    * Triangle dist.       : 300 high
 Info:    * One file per chr. : On
 Info:    * Naming files as      : RAX-PAX6-FP
 Info:    * Sub-peaks            : Off
 Info:    * Trim                 : Off
 Info:    * Saturation Analysis  : Off
 Error:    Unexpected number of parameters for -alpha:
 real	0m1.077s
 user	0m0.096s
 sys	0m0.036s

First, I tried again giving -alpha a value (-alpha 0.05). This got a little further, but:

 Version: Initializing class Log_Buffer                       $Revision: 1145 $
 Version: Initializing class FindPeaks                        $Revision: 1335 $
 Info:   Note: all output now goes to log file.
 Info:   Log file: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/RAX-PAX6-FP.log
 Version: Initializing class Parameters                       $Revision: 1298 $
 Info:    * MC simulation        : Off
 Info:    * Chr name prepend     : none
 Info:    * Min. reported pk ht  : 2
 Info:    * Minimum ht to process: Off
 Info:    * Lander-Waterman FDR  : Off
 Info:    * Output Sequence      : Off
 Info:    * Output directory     : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/
 Info:    * Control files in use : Off
 Info:    * Compare files in use : On
 Info:    * Peak ht transform    : false
 Info:    * Compare window size  : 100
 Info:    * Auto-threshold       : Off
 Info:    * Filter on PET flags  : Off
 Info:    * Maximum PET frag size: Off
 Info:    * Aligner              : bed
 Info:    * Triangle dist.       : 100 low
 Info:    * Triangle dist.       : 200 median
 Info:    * Triangle dist.       : 300 high
 Info:    * One file per chr. : On
 Info:    * Naming files as      : RAX-PAX6-FP
 Info:    * Sub-peaks            : Off
 Info:    * Trim                 : Off
 Info:    * Saturation Analysis  : Off
 Info:    * Compare alpha value  : 0.05	(Confidence Interval: 95.0)
 Info:    * Histogram length     : 30
 Info:    * Histogram precision  : 1
 Info:    * Peaks File Header    : On
 Info:    * Bedgraph/Wigfile     : wig file
 Info:    * R mode               : Off
 Info:    * Filter Duplicates    : Off
 Info:    * Filter quality       : Off
 Version: Initializing class PeakWriter                       $Revision: 1299 $
 Version: Initializing class Generic_AlignRead_Iterator       $Revision: 1318 $
 Version: Initializing class BedIterator                      $Revision: 1317 $
 Warning: Not enough fields: browser position chr1:1-1000000
 Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
     at src.lib.ioInterfaces.BedIterator.next(BedIterator.java:110)
     at src.lib.ioInterfaces.BedIterator.next(BedIterator.java:23)
     at src.lib.ioInterfaces.Generic_AlignRead_Iterator.hasNext(Generic_AlignRead_Iterator.java:103)
     at src.projects.findPeaks.FindPeaks.core_routine_one_file(FindPeaks.java:458)
     at src.projects.findPeaks.FindPeaks.main(FindPeaks.java:843)

And then nothing else happens. The run created "RAX-PAX6-FP.log" and an empty "RAX-PAX6-FP_triangle_standard.peaks"; the log contains the output shown above.
 * Heather 07:38, 15 July 2009 (EDT):

It looked like the .bed files were not in a format FindPeaks accepts: they contain optional header lines (such as the "browser position chr1:1-1000000" line named in the warning) that only make the tracks display more nicely in the genome browser. As a test, I removed these lines in gedit (Ubuntu's text editor) and saved the files as PAX6_Chr21 and RAX_Chr21.
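The same cleanup can be scripted so it is repeatable on every chromosome file of all five banks. A minimal sketch, assuming the only non-data lines are the UCSC "browser" and "track" lines plus "#" comments; `strip_bed_headers` and the input file name in the usage line are hypothetical.

```sh
# Copy $1 to $2, keeping only lines that do NOT start with
# "browser", "track" or "#" (UCSC header/comment lines).
strip_bed_headers() {
  grep -v -E '^(browser|track|#)' "$1" > "$2"
}
```

For example, `strip_bed_headers raw_Chr21.bed PAX6_Chr21.bed`, where raw_Chr21.bed stands for whichever exported chr21 file is being cleaned.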

oeil@cornee:~$ time java -Xmx2G -jar ~/trunk/jars/fp4/FindPeaks.jar -name RAX-PAX6-FP -input /home/oeil/Documents/Fasteris_7_2009/GDZ5_PAX6_bed/PAX6_Chr21.bed -output /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/ -aligner bed -dist_type 1 200 -minimum 2 -compare /home/oeil/Documents/Fasteris_7_2009/GDZ4_RAX_bed/RAX_Chr21.bed -alpha 0.05

 Version: Initializing class Log_Buffer                       $Revision: 1145 $
 Version: Initializing class FindPeaks                        $Revision: 1335 $
 Info:   Note: all output now goes to log file.
 Info:   Log file: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/RAX-PAX6-FP.log
 Version: Initializing class Parameters                       $Revision: 1298 $
 Info:    * MC simulation        : Off
 Info:    * Chr name prepend     : none
 Info:    * Min. reported pk ht  : 2
 Info:    * Minimum ht to process: Off
 Info:    * Lander-Waterman FDR  : Off
 Info:    * Output Sequence      : Off
 Info:    * Output directory     : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/
 Info:    * Control files in use : Off
 Info:    * Compare files in use : On
 Info:    * Peak ht transform    : false
 Info:    * Compare window size  : 100
 Info:    * Auto-threshold       : Off
 Info:    * Filter on PET flags  : Off
 Info:    * Maximum PET frag size: Off
 Info:    * Aligner              : bed
 Info:    * Triangle dist.       : 100 low
 Info:    * Triangle dist.       : 200 median
 Info:    * Triangle dist.       : 300 high
 Info:    '''* One file per chr. : Off'''
 Info:    * Naming files as      : RAX-PAX6-FP
 Info:    * Sub-peaks            : Off
 Info:    * Trim                 : Off
 Info:    * Saturation Analysis  : Off
 Info:    * Compare alpha value  : 0.05	(Confidence Interval: 95.0)
 Info:    * Histogram length     : 30
 Info:    * Histogram precision  : 1
 Info:    * Peaks File Header    : On
 Info:    * Bedgraph/Wigfile     : wig file
 Info:    * R mode               : Off
 Info:    * Filter Duplicates    : Off
 Info:    * Filter quality       : Off
 Version: Initializing class PeakWriter                       $Revision: 1299 $
 Version: Initializing class Generic_AlignRead_Iterator       $Revision: 1318 $
 Version: Initializing class BedIterator                      $Revision: 1317 $
 Info:   Running Peak Processor
 Version: Initializing class PeakDataSet Peak Locator         $Revision: 1335 $
 Version: Initializing class PeakStore                        $Revision: 1335 $
 Version: Initializing class MapStore                         $Revision: 1335 $
 '''Info:   Current chromosome : chr21'''
 '''Info:   Reads used: 39684'''
 Version: Initializing class PeakStats                        $Revision: 1335 $
 Version: Initializing class Histogram                        $Revision: 1197 $
 Info:   Current chromosome : chr21
 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
     at src.projects.findPeaks.objects.Map_f.<init>(Map_f.java:9)
     at src.projects.findPeaks.objects.MapStore.put(MapStore.java:42)
     at src.projects.findPeaks.PeakDataSetParent.process_float_based(PeakDataSetParent.java:383)
     at src.projects.findPeaks.PeakDataSetParent.process_peaks_from_iterator2(PeakDataSetParent.java:505)
     at src.projects.findPeaks.PeakDataSetParent.<init>(PeakDataSetParent.java:156)
     at src.projects.findPeaks.FindPeaks.core_routine_one_file(FindPeaks.java:619)
     at src.projects.findPeaks.FindPeaks.main(FindPeaks.java:843)
 ^C
 real	1m55.855s
 user	0m4.124s
 sys	0m1.492s
 oeil@cornee:~$

Well, I don't really know what that's about. Check out this solution tomorrow.

Tried again with -Xmx3G (seems like a lot!!) and got exactly the same error, just with slightly longer user and system times:

 real	0m21.437s
 user	0m5.164s
 sys	0m2.260s
 oeil@cornee:~$
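Not a fix, just a sanity check worth doing before the next attempt: the log reports "Reads used: 39684" for chr21 and then runs out of heap while on chr21 a second time, which may (a guess) be while it reads the -compare file. Counting reads per chromosome in both hand-edited test files would confirm that each contains only chr21 and that the counts are in the expected range. `reads_per_chrom` is a hypothetical helper; it assumes BED-style columns and skips lines whose second column is not an integer.

```sh
# Print "<chrom><TAB><read count>" for every chromosome in a BED file,
# ignoring header lines (second column not an integer), sorted by name.
reads_per_chrom() {
  awk '$2 ~ /^[0-9]+$/ { n[$1]++ }
       END { for (c in n) printf "%s\t%d\n", c, n[c] }' "$1" | sort
}
```

Running it on PAX6_Chr21.bed and RAX_Chr21.bed should each give a single chr21 line if the manual edit went as intended.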


 * Heather 12:29, 15 July 2009 (EDT):


|}