02/10/14
ChIP-seq analysis via STAR
- Significance Tester for the Accumulation of Reads (STAR) is a program Bramswig et al. used to identify regions enriched for H3K27me3, because "the broad architecture of the repressive H3K27me3 mark requires more sensitive peak-calling algorithms for precise analysis".
- Paper reference for STAR is Lefterova MI, et al. Cell-specific determinants of peroxisome proliferator-activated receptor gamma function in adipocytes and macrophages. Mol Cell Biol. 2010;30(9):2078–2089.
- Website for STAR is http://www.cbil.upenn.edu/STAR/
- Bramswig et al. used the following settings in their STAR analysis: a sliding window of 5,000 bp, a step size of 1,000 bp, and an FDR of 0.5%.
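To make the window and step settings concrete, here is a minimal Python sketch of generic sliding-window read counting with a 5,000 bp window advanced 1,000 bp at a time. It only illustrates the windowing idea and is not STAR's actual scoring or FDR procedure. The function names, the 0-based half-open [start, end) windows, and counting each read by its start coordinate are all assumptions.

```python
from bisect import bisect_left
from typing import Iterable, Iterator, List, Tuple


def sliding_windows(chrom_length: int, window: int = 5000,
                    step: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) windows of `window` bp, moved by `step` bp.

    Windows that would run past the chromosome end are not emitted, except that a
    chromosome shorter than one window gets a single window spanning all of it.
    """
    if chrom_length <= window:
        yield 0, chrom_length
        return
    for start in range(0, chrom_length - window + 1, step):
        yield start, start + window


def count_reads_per_window(read_starts: Iterable[int], chrom_length: int,
                           window: int = 5000,
                           step: int = 1000) -> List[Tuple[Tuple[int, int], int]]:
    """Count reads whose start coordinate falls inside each window (assumed rule)."""
    starts = sorted(read_starts)
    counts = []
    for w_start, w_end in sliding_windows(chrom_length, window, step):
        # bisect_left returns how many starts are < x, so the difference
        # counts the starts lying in [w_start, w_end).
        n = bisect_left(starts, w_end) - bisect_left(starts, w_start)
        counts.append(((w_start, w_end), n))
    return counts


if __name__ == "__main__":
    # Toy data (hypothetical): a 10 kb "chromosome" carrying five reads.
    result = count_reads_per_window([100, 1500, 4999, 5000, 9999], 10_000)
    print([n for _, n in result])  # → [3, 3, 2, 2, 2, 2]
```

Because the windows overlap by 4,000 bp, a single read is counted in up to five windows. Dropping a trailing partial window at the chromosome end is a choice made in this sketch, not documented STAR behavior.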
PROCEDURE
- Download the zip file from the website and move the unzipped STAR_1.0 folder to /Applications. The STAR_1.0 folder contains:
  - README
  - STAR.jar
  - STAR.java
  - test_data.txt (Note: running the program with this file name gave an error; renaming it to test_data1.txt fixed the problem.)
- Mac: open a Terminal window in the STAR_1.0 folder. COMMAND: open -a Terminal /Applications/STAR_1.0
- This opens a new Terminal window titled STAR_1.0 -- bash -- [window size].
- The README suggests testing the installation with the following COMMAND (using the renamed test file): java -jar STAR.jar 10 1 true .5 10 true test_data1.txt false false test_out 1 true 10 5 0
- Everything appears to be working.
USAGE
- The COMMAND java -jar STAR.jar 10 1 true .5 10 true test_data1.txt false false test_out 1 true 10 5 0 passes STAR a fixed sequence of positional parameter values, in this order:
| Parameter | Description | Example (README test command) |
| --- | --- | --- |
| <extension length> | total fragment length minus the read length | 10 |
| <start coordinate infile> | 0 or 1, depending on whether the first position in each chromosome is denoted by 0 or 1 in the input files | 1 |
| <right-end-inclusion infile> | whether the read's end coordinate is included in the read: 'true' if included, 'false' if not | true |
| <FDR cutoff> | false discovery rate cutoff; units (fraction vs. percent) ??? | .5 |
| <numperms> | number of permutations; exact role ??? | 10 |
| <remove identical reads> | 'true' to remove identical reads, 'false' to keep them | true |
| <sample file> | BED file of IP reads (see INPUT FILES) | test_data1.txt |
| <control> | 'false' for no control, the name of a control file, or the name of a file of regions to mask (this file name must end in '_mask') | false |
| <repeats file> | 'false' to skip repeat masking; otherwise the name of a tab-delimited repeats file with columns chr, start, end (download: cbil.upenn.edu/STAR/repeats) | false |
| <output file name> | name of the output file | test_out |
| <start coordinate outfile> | presumably 0 or 1, as for <start coordinate infile> but for the output file ??? | 1 |
| <right-end-inclusion outfile> | presumably 'true' or 'false', as for <right-end-inclusion infile> but for the output file ??? | true |
| <window size> | number of base pairs over which enrichment is scored (sliding window size) | 10 |
| <window displacement> | number of base pairs the window is moved at each step (step size) | 5 |
| <normalize to this many reads> | an integer; reads are thrown out at random until this many remain; set to 0 to keep all reads | 0 |
INPUT FILES
- Sample file: BED data from the immunoprecipitation (IP) experiment. It should have at least four tab-delimited columns: chromosome name, start position, end position, strand.
- Control file: BED data from a non-IP input sample (DNA purified from chromatin). It should have the same four tab-delimited columns: chromosome name, start position, end position, strand.
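Before a long run, it can help to check that the sample and control files match this four-column layout. Below is a minimal Python sketch. It assumes integer coordinates, '+'/'-' strand symbols, and that blank lines and '#' comment lines can be skipped; none of these assumptions is confirmed by the notes above. It also shows one plausible reading of the <remove identical reads> and <normalize to this many reads> options, treating reads with the same chr, start, end, and strand as identical. STAR's own behavior may differ, and all function names here are hypothetical.

```python
from __future__ import annotations

import random
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


def parse_star_bed(lines: Iterable[str]) -> Iterator[Read]:
    """Yield Reads from tab-delimited lines with at least chr, start, end, strand."""
    for lineno, raw in enumerate(lines, 1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):  # assumed: blank/comment lines are ignorable
            continue
        cols = line.split("\t")
        if len(cols) < 4:
            raise ValueError(f"line {lineno}: expected >= 4 tab-delimited columns, got {len(cols)}")
        chrom, start, end, strand = cols[:4]
        if strand not in ("+", "-"):  # assumed strand encoding
            raise ValueError(f"line {lineno}: unexpected strand {strand!r}")
        yield Read(chrom, int(start), int(end), strand)


def remove_identical_reads(reads: Iterable[Read]) -> list[Read]:
    """Keep the first occurrence of each (chrom, start, end, strand), preserving order."""
    seen: set[Read] = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


def downsample(reads: list[Read], n: int, seed: int | None = None) -> list[Read]:
    """Randomly keep n reads (original order preserved); n == 0 keeps everything."""
    if n == 0 or n >= len(reads):
        return list(reads)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(reads)), n))
    return [reads[i] for i in keep]


if __name__ == "__main__":
    demo = ["chr1\t100\t136\t+\n", "chr1\t100\t136\t+\n", "chr2\t5\t41\t-\n"]
    reads = remove_identical_reads(parse_star_bed(demo))
    print(len(reads))                          # → 2
    print(len(downsample(reads, 1, seed=0)))   # → 1
```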