User:Karmella Haynes/Notebook/PcTF Genomics/2014/02/10

Pc-TF Genomics

02/10/14

ChIP-seq analysis via STAR

Significance Tester for the Accumulation of Reads (STAR) is a program used by Bramswig et al. to determine H3K27me3 enrichment, since "the broad architecture of the repressive H3K27me3 mark requires more sensitive peak-calling algorithms for precise analysis."
Paper reference for STAR is Lefterova MI, et al. Cell-specific determinants of peroxisome proliferator-activated receptor gamma function in adipocytes and macrophages. Mol Cell Biol. 2010;30(9):2078–2089.
Website for STAR is http://www.cbil.upenn.edu/STAR/
Bramswig et al. used the following settings in their STAR analysis: a sliding window of 5,000 bp, a step size of 1,000 bp, and an FDR of 0.5%.
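
To make the window and step settings concrete, here is a minimal Python sketch of how 5,000 bp windows moved in 1,000 bp steps tile a region, with reads counted per window. This is not STAR's implementation (STAR also does significance testing, which is not shown). The function names, the half-open coordinates, and the read positions are all hypothetical and for illustration only.

```python
def sliding_windows(region_start, region_end, window_size, step_size):
    """Yield (start, end) half-open windows that fit inside the region."""
    start = region_start
    while start + window_size <= region_end:
        yield (start, start + window_size)
        start += step_size


def count_reads_per_window(read_starts, windows):
    """Count how many read start positions fall inside each window."""
    return [
        sum(1 for pos in read_starts if w_start <= pos < w_end)
        for (w_start, w_end) in windows
    ]


# Window and step sizes from Bramswig et al.; the region and reads are made up.
windows = list(sliding_windows(0, 10_000, window_size=5_000, step_size=1_000))
read_starts = [500, 1500, 5200, 9999]
counts = count_reads_per_window(read_starts, windows)

for (w_start, w_end), n in zip(windows, counts):
    print(f"{w_start}-{w_end}\t{n}")
```

Because the step is smaller than the window, neighboring windows overlap, so a single read can contribute to several windows.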

PROCEDURE

Download the zip file. Move unzipped STAR_1.0 folder to Applications. STAR_1.0 folder contains:
- README
- STAR.jar
- STAR.java
- test_data.txt (Note: running the program with this file name gave an error. Renaming the file to test_data1.txt fixed the problem.)
Mac: Run the Terminal utility. COMMAND: open -a Terminal /applications/STAR_1.0
- This opens a new Terminal window called STAR-1.0 -- bash -- [size of window]
README instructs user to try COMMAND: java -jar STAR.jar 10 1 true .5 10 true test_data1.txt false false test_out 1 true 10 5 0
- This generated two files, test_out_spans and test_out_wig, in the STAR_1.0 folder
Everything appears to be working.

USAGE

The COMMAND java -jar STAR.jar 10 1 true .5 10 true test_data1.txt false false test_out 1 true 10 5 0 is a string of positional parameter values, given in the order listed below.

Parameter	Description	Example
<extension length>	total fragment length minus the read length	10
<start coordinate infile>	0 or 1, depending on whether the first position in each chromosome is denoted by 0 or 1	1
<right-end-inclusion infile>	whether the read end coordinate is included in the read: 'false' for not included, 'true' for included	true
<FDR cutoff>	???	.5
<numperms>	???	10
<remove identical reads>	'false' to keep identical reads, 'true' to remove them	true
<sample file>	BED data from the immunoprecipitation experiment (see INPUT FILES)	test_data1.txt
<control>	1) 'false', or 2) the name of the control file, or 3) the name of a file of regions to mask; in the last case the file name must end in '_mask'	false
<repeats file>	'false' if you don't want to mask repeats; otherwise the name of a file of repeats in tab-delimited format: chr start end (download here: cbil.upenn.edu/STAR/repeats)	false
<output file name>	prefix for the output files (the example run produced test_out_spans and test_out_wig)	test_out
<start coordinate outfile>	???	1
<right-end-inclusion outfile>	???	true
<window size>	basepairs over which enrichment is scored (aka sliding window size)	10
<window displacement>	number of base pairs the window is moved over (step size)	5
<normalize to this many reads>	an integer; reads are discarded at random until this many remain. Set to 0 to keep all reads	0
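
Since the parameters are positional, it is easy to put a value in the wrong slot. One way to guard against that is to name each value and assemble the command in the table's order. The Python sketch below does this; the dictionary keys and function names are my own labels (hypothetical, not part of STAR), and the values are the README example from above.

```python
# Positional order, following the parameter table above.
# The key names are illustrative labels, not STAR option names.
PARAMETER_ORDER = [
    "extension_length",
    "start_coordinate_infile",
    "right_end_inclusion_infile",
    "fdr_cutoff",
    "numperms",
    "remove_identical_reads",
    "sample_file",
    "control",
    "repeats_file",
    "output_file_name",
    "start_coordinate_outfile",
    "right_end_inclusion_outfile",
    "window_size",
    "window_displacement",
    "normalize_to_reads",
]


def format_value(value):
    """Render Python booleans as the lowercase 'true'/'false' used in the command."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_star_command(params, jar="STAR.jar"):
    """Return the command as a list of arguments in the documented order."""
    missing = [name for name in PARAMETER_ORDER if name not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    values = [format_value(params[name]) for name in PARAMETER_ORDER]
    return ["java", "-jar", jar] + values


readme_example = {
    "extension_length": 10,
    "start_coordinate_infile": 1,
    "right_end_inclusion_infile": True,
    "fdr_cutoff": ".5",  # kept as text so it matches the README exactly
    "numperms": 10,
    "remove_identical_reads": True,
    "sample_file": "test_data1.txt",
    "control": False,
    "repeats_file": False,
    "output_file_name": "test_out",
    "start_coordinate_outfile": 1,
    "right_end_inclusion_outfile": True,
    "window_size": 10,
    "window_displacement": 5,
    "normalize_to_reads": 0,
}

command = build_star_command(readme_example)
print(" ".join(command))
```

The returned list could be passed to subprocess.run with cwd set to the STAR_1.0 folder, though I have only run the command directly in Terminal.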

INPUT FILES

Sample file: This is BED data from an immunoprecipitation experiment. It should have at least four tab-delimited columns: chromosome name, start position, end position, strand.
Control file: This is BED data from a non-IP input (DNA purified from chromatin). It should have at least four tab-delimited columns: chromosome name, start position, end position, strand.
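
Before running STAR on real data, it may help to check that the sample and control files match the four-column layout described above. The following Python sketch is one possible check. The function names are hypothetical, and it assumes that strand is written as "+" or "-" and that start and end are non-negative integers; STAR's README may be stricter or looser than this.

```python
def check_bed_line(line, line_number):
    """Return a list of problems found in one line (empty list if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        return [
            f"line {line_number}: expected at least 4 tab-delimited columns, "
            f"found {len(fields)}"
        ]

    problems = []
    chrom, start, end, strand = fields[:4]
    if not chrom:
        problems.append(f"line {line_number}: empty chromosome name")
    if not (start.isdigit() and end.isdigit()):
        problems.append(f"line {line_number}: start and end must be integers")
    elif int(start) > int(end):
        problems.append(f"line {line_number}: start is greater than end")
    if strand not in ("+", "-"):  # assumption: strand is '+' or '-'
        problems.append(f"line {line_number}: unexpected strand value")
    return problems


def check_bed_text(text):
    """Check every non-blank line of the text and collect all problems."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            problems.extend(check_bed_line(line, number))
    return problems


# Made-up example lines; the third uses spaces instead of tabs.
example = "chr1\t100\t136\t+\nchr1\t300\t250\t-\nchr2 100 136 +\n"
for problem in check_bed_text(example):
    print(problem)
```

For a file on disk, the same check can be applied to the result of open(path).read().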