Haynes:ChIPDataMining1

Basic View-an-Interval Approach

Use the graphical Genome Browser to visualize chromatin mapping data within the human genome sequence.

Go to the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgGateway.
Enter a gene or region under search term. For instance, try “chr21:34398216-34401503” (these are the coordinates for the RefSeq annotation of the Polycomb-silenced OLIG2 gene). Click “submit.”
A human genome browser window that shows the selected region should appear. Click the “hide all” button to hide all of the information.
Under Genes and Gene Prediction Tracks, set RefSeq Genes to “full” and click the “refresh” button. The annotation of genes in the region you selected should appear in the browser window. Repeat this step for other features, if desired.
To find ChIP-seq data from a publicly shared ChIP-seq experiment,
1. Click the “track search” button. You will navigate away from the browser, but your coordinates will remain the same. In the next window, click the Advanced search tab. For the first “and,” set the menu to “Antibody or target protein,” and for “is among,” select the chromatin mark of interest. For instance, try H3K27me3 (07-449).
2. For the second “and,” set the menu to “Cell, tissue, or DNA sample,” and for “is among,” select the cell type of interest. For instance, try H1-hESC.
3. Click the “search” button. In the list of results, select one or more tracks and set each to “full.” Tracks designated as “Peaks” or “Hotspots” will show the mapping data as low-resolution, horizontal bars. Tracks designated as “Signal” will show signal intensities as high-resolution, vertical bars.
4. Click the “View in Browser” button.
If you are viewing any Signal tracks and high values are cut off by a red bar, right-click the track in the genome browser and select “configure.” Select “auto-scale to data view” from the drop-down list for “Data view scaling.”
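
If you expect to revisit the same interval often, the region string can be turned into a direct browser link. The following is a minimal sketch, not part of the original protocol. It assumes the hg19 assembly (the one named in the Table Browser section) and relies on the `db` and `position` URL parameters of the Genome Browser's hgTracks page. The function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Accepts strings such as "chr21:34398216-34401503" or "chr21:34,398,216-34,401,503".
REGION_PATTERN = re.compile(r"^(chr[0-9A-Za-z_]+):([\d,]+)-([\d,]+)$")


def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end)."""
    match = REGION_PATTERN.match(region.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


def browser_url(region, assembly="hg19"):
    """Build a Genome Browser link for the region (assembly is an assumption)."""
    chrom, start, end = parse_region(region)
    query = urlencode(
        {"db": assembly, "position": f"{chrom}:{start}-{end}"}, safe=":"
    )
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + query


if __name__ == "__main__":
    # The OLIG2 coordinates used in the example above.
    print(browser_url("chr21:34398216-34401503"))
```

Opening the printed link should show the same interval as entering the coordinates by hand. Track visibility still has to be set in the browser as described above.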

Getting Fancy: Retrieve ChIP values for Many Regions at Once

Use the Table Browser to retrieve chromatin mapping data values for a subset of genomic regions (500 bp surrounding transcription start sites of twenty control genes).

1. Go to the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=356574357.
2. Set clade to mammal, genome to human, and assembly to Feb. 2009 (GRCh37/hg19).
3. To add a track that contains the chromatin mapping data of interest,
   1. Set group to Regulation.
   2. Set track to a type of interest. For instance, try Broad Histone or UW Histone to analyze a histone mark.
   3. Set table to a specific chromatin mark. For instance, “H1-hESC H3K27me3” (with Broad Histone selected as the track) contains the locations of trimethylated histone H3 lysine 27 (H3K27me3) in human embryonic stem cells (H1-hESC).
4. On the region line, click the “define regions” button. In the text field, paste in the text below (tab-delimited), then click the “submit” button.

chr7	5569982	5570482	ACTB
chr3	52231849	52232349	ALAS1
chr15	45003435	45003935	B2M
chrX	153774983	153775483	G6PD
chr12	6643335	6643835	GAPDH
chr7	65447051	65447551	GUSB
chrX	133593925	133594425	HPRT1
chrX	77359416	77359916	PGK1
chr7	44835991	44836491	PPIA
chr6	170863171	170863671	TBP
chr1	212738426	212738926	ATF3
chr12	4382652	4383152	CCND2
chr9	21974882	21975382	CDKN2A
chr2	38303073	38303573	CYP1B1
chr4	107957203	107957703	DKK2
chr7	27224585	27225085	HOXA11
chr16	56701727	56702227	MT1G
chr20	62795577	62796077	MYT1
chr21	34397966	34398466	OLIG2
chr3	25469584	25470084	RARB

Note: You can define your own regions of interest by setting the chromosome, start position, end position, and gene name (if applicable) accordingly. Gene names are optional.

5. Set output format to “selected fields from primary and related tables.” Click the “get output” button.
6. On the next page (“Select Fields”), select chrom, chromStart, chromEnd, name, and signalValue. Click the “get output” button. The output lists the coordinates and signalValues of the track entries (from the track chosen in step 3) that fall within the regions of interest (defined in step 4). Note: coordinates for which the chromatin-mapping signalValue is zero will be absent from the output list.
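
Because regions with no signal are missing from the output, it can help to map the returned rows back onto the full list of regions and fill in zeros. The sketch below is one way to do that, not part of the original protocol. It assumes the output is tab-delimited with the five selected fields in the order chrom, chromStart, chromEnd, name, signalValue, and that header lines start with "#". The two data rows are made up for illustration, and the function names are hypothetical.

```python
def read_regions(text):
    """Parse the tab-delimited region list (chrom, start, end, name)."""
    regions = []
    for line in text.strip().splitlines():
        chrom, start, end, name = line.split("\t")[:4]
        regions.append((chrom, int(start), int(end), name))
    return regions


def max_signal_per_region(regions, output_text):
    """Return the highest signalValue overlapping each region (0.0 if none)."""
    best = {name: 0.0 for _, _, _, name in regions}
    for line in output_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and the assumed "#" header line
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        signal = float(fields[4])
        for r_chrom, r_start, r_end, r_name in regions:
            # Two intervals overlap when each one starts before the other ends.
            if chrom == r_chrom and start < r_end and end > r_start:
                best[r_name] = max(best[r_name], signal)
    return best


# Two of the regions from the list above.
regions_text = "chr7\t5569982\t5570482\tACTB\nchr21\t34397966\t34398466\tOLIG2"

# Made-up rows standing in for real Table Browser output.
output_text = (
    "#chrom\tchromStart\tchromEnd\tname\tsignalValue\n"
    "chr21\t34397000\t34398100\t.\t12.5\n"
    "chr21\t34398300\t34399000\t.\t20.0\n"
)

summary = max_signal_per_region(read_regions(regions_text), output_text)

if __name__ == "__main__":
    for gene, value in summary.items():
        print(gene, value)
```

Taking the maximum is only one possible summary. A mean or a count of overlapping entries may suit other questions better.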

GALAXY: Intersecting Chromatin Mark Coordinates with Promoter Regions

Getting even fancier: use an automated workflow that finds all ~26,000 human “promoters,” then retrieves the ChIP values at all of these regions.

Part 1: Load chromatin mark coordinates into GALAXY

In GALAXY (after creating an account) select “Create New” from the gear’s drop down list (located in the top right corner).
In “Get Data” select “Upload File” or “UCSC Main”.
If you have selected “Upload File,” make sure the file format is “interval” and that your genome matches the assembly used in ENCODE. If you have selected “UCSC Main,” follow steps 1, 2, 4, and 5 from “Filtering,” but do not include signalValue this time. Click “done with selections,” then click “send query to Galaxy.”
After the data appears under “History,” edit it by clicking the pencil. Rename the data as needed and make sure Chrom column, Start Column, End column, Name/Identifier column, and Strand column are set to 1, 2, 3, 4, and 5 respectively. Select the “Datatype” tab and make sure it is set to interval. Click “Save.”
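
Before uploading a file of your own, it may be worth checking that each line already follows the five-column layout described above. This is a small sketch of such a check, not a Galaxy feature. It assumes tab-delimited text, UCSC-style chromosome names beginning with "chr", and a strand column that holds "+", "-", or "." (for unstranded data). The function name is hypothetical.

```python
def check_interval_line(line):
    """Return a list of problems found in one tab-delimited interval line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return [f"expected at least 5 columns, found {len(fields)}"]

    chrom, start, end, name, strand = fields[:5]
    problems = []
    if not chrom.startswith("chr"):
        problems.append("chromosome name does not start with 'chr'")
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end must be non-negative integers")
    elif int(start) >= int(end):
        problems.append("start must be smaller than end")
    if strand not in {"+", "-", "."}:
        problems.append("strand must be '+', '-', or '.'")
    return problems


if __name__ == "__main__":
    example = "chr21\t34397966\t34398466\tOLIG2\t+"
    print(check_interval_line(example) or "line looks fine")
```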

Part 2: Load promoter regions into GALAXY

Here, “promoter region” is defined as the 500 bp interval centered at each annotated transcription start site from the RefSeq dataset.
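
The arithmetic behind this definition can be sketched in a few lines. The example below is illustrative only and is not the Galaxy tool itself. It assumes the transcription start site is the interval start for "+" strand genes and the interval end for "-" strand genes. The function name and the minus-strand example gene are hypothetical.

```python
def promoter_window(chrom, start, end, name, strand, width=500):
    """Return the interval of `width` bp centered on the transcription start site."""
    # Assumption: the TSS is the start on the "+" strand and the end on the "-" strand.
    tss = start if strand == "+" else end
    half = width // 2
    # Clamp at zero so windows near a chromosome start stay valid.
    return (chrom, max(0, tss - half), tss + half, name, strand)


if __name__ == "__main__":
    # OLIG2 coordinates from the Genome Browser example earlier on this page.
    print(promoter_window("chr21", 34398216, 34401503, "OLIG2", "+"))
    # A made-up minus-strand gene, for illustration only.
    print(promoter_window("chr1", 1000, 5000, "example_minus", "-"))
```

For OLIG2 this gives chr21:34397966-34398466, which matches the OLIG2 entry in the region list of the Table Browser section.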

In “Get Data” select “Upload File” and upload the RefSeq_genes.txt file (included in Supplemental Material).
Repeat step 4 of Part 1 for the RefSeq data, but rename the data as “RefSeq genes.”
In “Operate on Genomic Intervals” select “Get Flanks”.
Set “Select data” to RefSeq genes, “Region” to around start, “Location of the flanking region” to both, “Offset” to 0, and the length of the flanking region to 500. (Need to debug this.)
To edit the dataset, repeat step 4 of Part 1 for the new data and rename it as “Promoters.”
In “Operate on Genomic Intervals” select “Join” and join “Promoters” with your first uploaded data set.
In “Text Manipulation” select “Cut”.
Under “Cut columns:” make sure you are cutting Chrom, Start, End, Name, and Strand (columns are specified as c1, c2, and so on).
To edit the dataset, repeat step 4 of Part 1 for the new data and rename it as “Clean Promoters.”
To visualize your data, select “Build custom track” in “Graph/Display Data” and add your first uploaded dataset, RefSeq genes, and Clean Promoters.
Under your built custom track select “display at UCSC main” to visualize your data.
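
For readers who want to see what the “Join” and “Cut” steps do to the data, here is a rough Python sketch. It only loosely mirrors the Galaxy tools and is not their implementation. It assumes that two intervals are joined when they share a chromosome and overlap by at least 1 bp, and that only joined pairs are kept. The mark interval in the example is made up, and the function names are hypothetical.

```python
def join_intervals(promoters, marks):
    """Pair each promoter with every mark interval that overlaps it."""
    marks_by_chrom = {}
    for mark in marks:
        marks_by_chrom.setdefault(mark[0], []).append(mark)

    joined = []
    for promoter in promoters:
        p_chrom, p_start, p_end = promoter[:3]
        for mark in marks_by_chrom.get(p_chrom, []):
            # Overlap test: each interval starts before the other one ends.
            if mark[1] < p_end and mark[2] > p_start:
                joined.append(promoter + mark)  # promoter columns, then mark columns
    return joined


def cut_columns(rows, columns):
    """Keep only the listed 1-based columns, like choosing c1, c2, ... in Cut."""
    return [tuple(row[i - 1] for i in columns) for row in rows]


# Promoter windows taken from the region list in the Table Browser section.
promoters = [
    ("chr21", 34397966, 34398466, "OLIG2", "+"),
    ("chr7", 5569982, 5570482, "ACTB", "-"),
]
# A made-up chromatin mark interval, for illustration only.
marks = [("chr21", 34398000, 34399000, "peak1", ".")]

joined = join_intervals(promoters, marks)
clean_promoters = cut_columns(joined, [1, 2, 3, 4, 5])

if __name__ == "__main__":
    for row in clean_promoters:
        print("\t".join(str(value) for value in row))
```

In this toy example only the OLIG2 promoter overlaps a mark, so it is the only row left after the join and cut.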
