User:Karmella Haynes/Notebook/PcTF Genomics/2014/02/10

From OpenWetWare


02/10/14

  • ChIP-seq analysis: STAR



ChIP-seq analysis via STAR

  • Significance Tester for the Accumulation of Reads (STAR) is a program that Bramswig et al. used to identify regions enriched for H3K27me3, because "the broad architecture of the repressive H3K27me3 mark requires more sensitive peak-calling algorithms for precise analysis".
  • Reference for STAR: Lefterova MI, et al. Cell-specific determinants of peroxisome proliferator-activated receptor gamma function in adipocytes and macrophages. Mol Cell Biol. 2010;30(9):2078–2089.
  • Website for STAR is http://www.cbil.upenn.edu/STAR/
  • Bramswig et al. used the following settings in their STAR analysis: a sliding window of 5,000 bp, a step size of 1,000 bp, and an FDR of 0.5%.
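To make the Bramswig et al. window settings concrete, here is a minimal sketch of generic sliding-window tiling with a 5,000 bp window advanced in 1,000 bp steps. It assumes windows start at coordinate 0, are half-open, and that a final partial window at the chromosome end is dropped; how STAR itself handles chromosome ends is not described in these notes. The function names `tile_windows` and `count_windows` are hypothetical.

```python
def tile_windows(chrom_length: int, window: int, step: int):
    """Yield half-open (start, end) windows of `window` bp, advancing `step` bp.

    Assumption: a window that would run past the chromosome end is dropped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start + window <= chrom_length:
        yield (start, start + window)
        start += step


def count_windows(chrom_length: int, window: int, step: int) -> int:
    """Closed-form count of the full windows produced by tile_windows."""
    if chrom_length < window:
        return 0
    return (chrom_length - window) // step + 1


if __name__ == "__main__":
    # Bramswig et al. settings: 5,000 bp window, 1,000 bp step.
    windows = list(tile_windows(12_000, 5_000, 1_000))
    print(windows[:3])
    print(len(windows), count_windows(12_000, 5_000, 1_000))
```

Because the step is one fifth of the window, every interior base falls inside five overlapping windows (5,000 / 1,000).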


PROCEDURE

  • Download the STAR zip file and move the unzipped STAR_1.0 folder to Applications. The STAR_1.0 folder contains:
    • README
    • STAR.jar
    • STAR.java
    • test_data.txt (Note: running the program with this file name gave an error; renaming it to test_data1.txt fixed the problem.)
  • Mac: Open a Terminal window in the STAR_1.0 folder. COMMAND: open -a Terminal /Applications/STAR_1.0
    • This opens a new Terminal window called STAR-1.0 -- bash -- [size of window]
  • The README instructs the user to try COMMAND: java -jar STAR.jar 10 1 true .5 10 true test_data1.txt false false test_out 1 true 10 5 0
  • Everything appears to be working.
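The test run above can be scripted so it is easy to repeat. This is a minimal sketch, assuming `java` is on the PATH and the STAR_1.0 folder contains the files listed above; the helper names `build_star_command`, `run_readme_test`, and `README_TEST_ARGS` are hypothetical. Unlike the manual workaround, it copies test_data.txt to test_data1.txt instead of renaming it, so the original file is kept.

```python
import shutil
import subprocess
from pathlib import Path

# Positional arguments from the README test command, in the order STAR expects.
README_TEST_ARGS = [
    "10", "1", "true", ".5", "10", "true", "test_data1.txt",
    "false", "false", "test_out", "1", "true", "10", "5", "0",
]


def build_star_command(args, jar="STAR.jar"):
    """Return the argv list for `java -jar STAR.jar <args...>`."""
    return ["java", "-jar", jar, *map(str, args)]


def run_readme_test(star_dir):
    """Copy test_data.txt to test_data1.txt if needed, then run the README test."""
    star_dir = Path(star_dir)
    src, dst = star_dir / "test_data.txt", star_dir / "test_data1.txt"
    if not dst.exists():
        shutil.copyfile(src, dst)
    cmd = build_star_command(README_TEST_ARGS)
    # check=True raises CalledProcessError if STAR exits with a non-zero status.
    return subprocess.run(cmd, cwd=star_dir, check=True)
```

For example, `run_readme_test("/Applications/STAR_1.0")` would repeat the test from the folder used in the procedure above. Passing an argument list rather than a single shell string avoids quoting problems with file names.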


USAGE

  • The COMMAND java -jar STAR.jar 10 1 true .5 10 true test_data1.txt false false test_out 1 true 10 5 0 passes STAR a string of positional parameter values, in this order:

Parameter | Description | Example
--- | --- | ---
<extension length> | total fragment length minus the read length | 10
<start coordinate infile> | 0 or 1, depending on whether the first position in each chromosome is denoted by 0 or 1 | 1
<right-end-inclusion infile> | whether the read end position is included: 'true' for included, 'false' for not included | true
<FDR cutoff> | false discovery rate cutoff (scale ???) | .5
<numperms> | ??? | 10
<remove identical reads> | 'true' to remove identical reads, 'false' to keep them | true
<sample file> | the IP sample file (see INPUT FILES) | test_data1.txt
<control> | 1) 'false', 2) the name of the control file, or 3) the name of a file of regions to mask; in case 3 the file name must end in '_mask' | false
<repeats file> | 'false' to skip repeat masking; otherwise the name of a tab-delimited repeats file with the format: chr start end (download here: cbil.upenn.edu/STAR/repeats) | false
<output file name> | name of the output file | test_out
<start coordinate outfile> | ??? | 1
<right-end-inclusion outfile> | ??? | true
<window size> | number of base pairs over which enrichment is scored (sliding window size) | 10
<window displacement> | number of base pairs the window is moved at each step (step size) | 5
<normalize to this many reads> | an integer; reads are thrown out at random until this many remain; set to 0 to keep all reads | 0
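The <normalize to this many reads> option describes random downsampling. The sketch below shows that idea on a list of read lines; it is not STAR's code. The function name `downsample_reads` is hypothetical, and two behaviors are assumptions: a fixed `seed` makes the result reproducible, and a target larger than the number of reads keeps every read.

```python
import random


def downsample_reads(reads, target, seed=None):
    """Randomly discard reads until `target` remain; target == 0 keeps every read.

    Assumption: if there are no more than `target` reads, all of them are kept.
    """
    if target < 0:
        raise ValueError("target must be a non-negative integer")
    if target == 0 or target >= len(reads):
        return list(reads)
    rng = random.Random(seed)
    # Sample indices without replacement, then sort them so the kept reads
    # stay in their original (e.g. genomic) order.
    keep = sorted(rng.sample(range(len(reads)), target))
    return [reads[i] for i in keep]
```

One plausible use is bringing the IP sample and the input control to comparable read depth before scoring windows.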


INPUT FILES

  • Sample file: BED data from an immunoprecipitation (IP) experiment. It should have at least four tab-delimited columns: chromosome name, start position, end position, and strand.
  • Control file: BED data from a non-IP input sample (DNA purified from chromatin). It should have at least four tab-delimited columns: chromosome name, start position, end position, and strand.
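The four-column layout above differs from the standard UCSC BED6 format, where column 4 is a name, column 5 a score, and strand is column 6, so aligner output in BED6 would need its columns rearranged. This is a minimal sketch of that conversion; the function name `bed6_to_star` is hypothetical, and rejecting a '.' (unknown) strand is an assumption. Standard BED coordinates are 0-based with the end position excluded, so converted files would presumably be run with <start coordinate infile> = 0 and <right-end-inclusion infile> = false.

```python
def bed6_to_star(line: str) -> str:
    """Convert one BED6 line (chrom, start, end, name, score, strand)
    to the four-column layout (chrom, start, end, strand).

    Coordinates are copied unchanged: still 0-based, end excluded.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 BED columns, got {len(fields)}")
    chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval {start}-{end}")
    # Assumption: mapped reads always carry a strand, so '.' is rejected.
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand {strand!r}")
    return f"{chrom}\t{start}\t{end}\t{strand}"
```

Applied line by line to a BED6 file, this produces input in the four-column form that the sample and control files expect.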