Haynes:ChIPDataMining1

From OpenWetWare

Revision as of 17:03, 15 April 2014 by Karmella Haynes (Talk | contribs)


Basic View-an-Interval Approach

Use the graphical Genome Browser to visualize chromatin mapping data within the human genome sequence.

  1. Go to the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgGateway.
  2. Enter a gene or region under search term. For instance, try “chr21:34398216-34401503” (these are the coordinates for the RefSeq annotation of the Polycomb-silenced OLIG2 gene). Click “submit.”
  3. A human genome browser window that shows the selected region should appear. Click the “hide all” button to hide all of the information.
  4. Under Genes and Gene Prediction Tracks, set RefSeq Genes to “full” and click the “refresh” button. The annotation of genes in the region you selected should appear in the browser window. Repeat this step for other features, if desired.
  5. To find ChIP-seq data from a publicly shared ChIP-seq experiment,
    1. Click the “track search” button. You will navigate away from the browser, but your coordinates will remain the same. In the next window, click the Advanced search tab. For the first “and,” set the menu to “Antibody or target protein,” and for “is among,” set the menu to the chromatin mark of interest. For instance, try H3K27me3 (07-449).
    2. For the second “and,” set the menu to “Cell, tissue, or DNA sample,” and for “is among,” set the menu to the cell type or sample of interest. For instance, try H1-hESC.
    3. Click the “search” button. In the list of results, select one or more tracks and set each to “full.” Tracks designated as “Peaks” or “Hotspots” will show the mapping data as low-resolution, horizontal bars. Tracks designated as “Signal” will show signal intensities as high-resolution, vertical bars.
    4. Click the “View in Browser” button.
  6. If you are viewing any Signal tracks and high values are cut off by a red bar, right-click on the track in the genome browser and select “configure.” Select “auto-scale to data view” in the drop-down list for “Data view scaling.”
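Steps 1 through 4 can also be reached with a direct link. The sketch below builds such a link in Python. It assumes the hg19 assembly (the one used in the Table Browser section), and it assumes that the browser accepts the URL parameters `db`, `position`, `hideTracks`, and a track name such as `refGene` set to a display mode. Check these parameter and track names against the UCSC documentation before relying on them. The helper name `browser_url` is made up for this example.

```python
from urllib.parse import urlencode

def browser_url(position, db="hg19", tracks=None):
    """Build a Genome Browser link for one region.

    position: coordinates such as "chr21:34398216-34401503"
    db:       assembly name (assumed here to be hg19)
    tracks:   optional dict of track name -> display mode (assumed names)
    """
    # hideTracks=1 is assumed to mimic the "hide all" button from step 3.
    params = {"db": db, "position": position, "hideTracks": 1}
    params.update(tracks or {})
    return "http://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(params)

# OLIG2 region from step 2, with RefSeq Genes set to "full" as in step 4.
url = browser_url("chr21:34398216-34401503", tracks={"refGene": "full"})
print(url)
```

Paste the printed link into a web browser to open the region. Other tracks can be added to the `tracks` dictionary once their internal names are known.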


Getting Fancy: Retrieve ChIP values for Many Regions at Once

Use the Table Browser to retrieve chromatin mapping data values for a subset of genomic regions (500 bp surrounding transcription start sites of twenty control genes).
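Each region pasted in step 4 below is a 500 bp window centered on one position (250 bp on either side). The sketch below shows one way to build such lines in Python for your own genes. The function names are made up for this example. The two center positions are the OLIG2 start coordinate from the first section and the midpoint of the ACTB region listed in step 4.

```python
def tss_window(chrom, center, name="", width=500):
    """Return (chrom, start, end, name) for a window centered on `center`."""
    half = width // 2
    start = center - half
    return (chrom, start, start + width, name)

def to_line(region):
    """Format one region as a tab-delimited line for the "define regions" box."""
    return "\t".join(str(field) for field in region)

# Center positions for two of the genes in the list in step 4.
centers = [("chr7", 5570232, "ACTB"), ("chr21", 34398216, "OLIG2")]
lines = [to_line(tss_window(chrom, pos, name)) for chrom, pos, name in centers]
print("\n".join(lines))
```

The printed lines match the ACTB and OLIG2 entries in the list in step 4.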

  1. Go to the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=356574357.
  2. Set clade to mammal, genome to human, and assembly to Feb. 2009 (GRCh37/hg19).
  3. To add a track that contains the chromatin mapping data of interest,
    1. Set group to Regulation.
    2. Set track to a type of interest. For instance, try Broad Histone or UW Histone to analyze a histone mark.
    3. Set table to a specific chromatin mark. For instance, "H1-hESC H3K27m3" (with Broad Histone selected as the track type) is data on the location of histone H3 lysine 27 trimethylation (H3K27me3) marks in human embryonic stem cells (H1-hESC).
  4. On the region line, click the “define regions” button. In the text field, paste in the text below (tab-delimited), then click the “submit” button.
chr7 5569982 5570482 ACTB
chr3 52231849 52232349 ALAS1
chr15 45003435 45003935 B2M
chrX 153774983 153775483 G6PD
chr12 6643335 6643835 GAPDH
chr7 65447051 65447551 GUSB
chrX 133593925 133594425 HPRT1
chrX 77359416 77359916 PGK1
chr7 44835991 44836491 PPIA
chr6 170863171 170863671 TBP
chr1 212738426 212738926 ATF3
chr12 4382652 4383152 CCND2
chr9 21974882 21975382 CDKN2A
chr2 38303073 38303573 CYP1B1
chr4 107957203 107957703 DKK2
chr7 27224585 27225085 HOXA11
chr16 56701727 56702227 MT1G
chr20 62795577 62796077 MYT1
chr21 34397966 34398466 OLIG2
chr3 25469584 25470084 RARB

Note: You can define your own regions of interest by setting the chromosome, start position, end position, and gene name (if applicable) accordingly. Gene names are optional.

  5. Set output format to “selected fields from primary and related tables.” Click the “get output” button.
  6. On the next page (“Select Fields”), select chrom, chromStart, chromEnd, name, and signalValue. Click the “get output” button. The output lists the coordinates and signalValues of features from the track (step 3) that fall within the regions of interest (step 4). Note: coordinates for which the chromatin-mapping signalValue is zero will be absent from the output list.
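Because regions with no signal are absent from the output, it can help to match the output back to the region list and fill in zeros. The sketch below is one possible way to do this in Python, reporting the highest signalValue that overlaps each region. It assumes the output is tab-delimited with the five fields selected in step 6, in that order, that header lines begin with `#`, and that coordinates are half-open. The signal values in `example_output` are invented placeholders, not real data.

```python
def load_regions(text):
    """Parse region lines: chrom, start, end, optional name."""
    regions = []
    for line in text.strip().splitlines():
        chrom, start, end, *rest = line.split()
        name = rest[0] if rest else f"{chrom}:{start}-{end}"
        regions.append((chrom, int(start), int(end), name))
    return regions

def max_signal_per_region(regions, output_text):
    """Return {region name: highest overlapping signalValue, or 0.0 if none}."""
    best = {name: 0.0 for _, _, _, name in regions}
    for line in output_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip the header line
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        signal = float(fields[-1])  # signalValue is the last selected field
        for r_chrom, r_start, r_end, name in regions:
            # Two half-open intervals overlap when each starts before the other ends.
            if chrom == r_chrom and start < r_end and r_start < end:
                best[name] = max(best[name], signal)
    return best

regions = load_regions("""
chr7 5569982 5570482 ACTB
chr21 34397966 34398466 OLIG2
""")

# Placeholder output with invented values, for illustration only.
example_output = (
    "#chrom\tchromStart\tchromEnd\tname\tsignalValue\n"
    "chr21\t34397000\t34399000\t.\t12.5\n"
    "chr21\t34398400\t34398900\t.\t20.0\n"
)

summary = max_signal_per_region(regions, example_output)
print(summary)
```

Taking the maximum is only one choice. A mean or a sum weighted by overlap length may suit other analyses better.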

GALAXY: Intersecting Chromatin Mark Coordinates with Promoter Regions

Getting even fancier: use an automated workflow that finds all ~26,000 human "promoters," then retrieves the ChIP values at all of these regions.


Part 1: Load chromatin mark coordinates into GALAXY

  1. In GALAXY (after creating an account) select “Create New” from the gear’s drop down list (located in the top right corner).
  2. In “Get Data” select “Upload File” or “UCSC Main”.
  3. If you selected “Upload File,” make sure the file format is “interval” and that your genome matches the assembly used in ENCODE. If you selected “UCSC Main,” follow steps 1, 2, 4, and 5 from “Filtering”; however, do not include signalValue this time. Click “done with selections,” then click “send query to Galaxy.”
  4. After the data appears under “History,” edit it by clicking the pencil. Rename the data as needed and make sure the Chrom, Start, End, Name/Identifier, and Strand columns are set to 1, 2, 3, 4, and 5, respectively. Select the “Datatype” tab and make sure it is set to interval. Click “Save.”


Part 2: Load promoter regions into GALAXY

Here, “promoter region” is defined as the 500 bp interval centered at each annotated transcription start site from the RefSeq dataset.

  1. In “Get Data” select “Upload File” and upload the RefSeq_genes.txt file (included in Supplemental Material).
  2. Repeat step 4 of Part 1 for the RefSeq data, but rename the data as “RefSeq genes.”
  3. In “Operate on Genomic Intervals” select “Get Flanks”.
  4. Set Select data as RefSeq genes, Region as around start, Location of the flanking region as both, and Offset as 0 with the length of the flanking region set to 500. (need to debug this)
  5. To edit the dataset, repeat step 4 of Part 1 for the new data and rename it as “Promoters”.
  6. In “Operate on Genomic Intervals” select “Join” and join “Promoters” with your first uploaded data set.
  7. In “Text Manipulation” select “Cut”.
  8. Under “Cut columns:” make sure you are cutting Chrom, Start, End, Name, and Strand (columns are specified as c1, c2, and so on).
  9. To edit the dataset, repeat step 4 of Part 1 for the new data and rename it as “Clean Promoters”.
  10. To visualize your data, select “Build custom track” in “Graph/Display Data” and add your first uploaded dataset, RefSeq genes, and Clean Promoters.
  11. Under your built custom track select “display at UCSC main” to visualize your data.
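The logic behind “Get Flanks” and “Join” can be sketched in a few lines of Python. This is only an illustration of the idea, not Galaxy's implementation. It assumes that the transcription start site is the start coordinate for genes on the + strand and the end coordinate for genes on the - strand, and that intervals are half-open. The gene and mark coordinates below are invented placeholders.

```python
def promoter(chrom, tx_start, tx_end, strand, name, width=500):
    """Return a window of `width` bp centered on the strand-aware start site."""
    tss = tx_start if strand == "+" else tx_end
    half = width // 2
    return (chrom, max(0, tss - half), tss + half, name, strand)

def join(promoters, marks):
    """Pair every promoter with every chromatin mark interval it overlaps."""
    pairs = []
    for p in promoters:
        for m in marks:
            same_chrom = p[0] == m[0]
            # Half-open intervals overlap when each starts before the other ends.
            if same_chrom and p[1] < m[2] and m[1] < p[2]:
                pairs.append((p, m))
    return pairs

# Invented genes: (chrom, txStart, txEnd, strand, name)
genes = [("chr1", 1000, 5000, "+", "geneA"), ("chr1", 20000, 26000, "-", "geneB")]
promoters = [promoter(*g) for g in genes]

# Invented chromatin mark intervals: (chrom, start, end, name)
marks = [("chr1", 1200, 1800, "mark1"), ("chr1", 24000, 25000, "mark2")]

pairs = join(promoters, marks)
for p, m in pairs:
    print(p[3], "overlaps", m[3])
```

The nested loop is fine for a few thousand intervals. For all ~26,000 promoters against a genome-wide track, sorting both lists or using Galaxy's own tools will be much faster.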