Preliminary Analysis: Using the UCSC Genome Browser and ENCODE Data
- The first bioinformatics step in this project is to use the genome browser to visualize chromatin mapping data within the human genome sequence. As an example, the following protocol was followed to find annotated H3K27me3 chromatin marks in human embryonic stem cells.
- Go to the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgGateway.
- Enter a gene or region under search term. For instance, try “chr21:34398216-34401503” (these are the coordinates for the RefSeq annotation of the Polycomb-silenced OLIG2 gene). Click “submit.”
- A human genome browser window that shows the selected region should appear. Click the “hide all” button to hide all of the information.
- Under Genes and Gene Prediction Tracks, set RefSeq Genes to “full” and click the “refresh” button. The annotation of genes in the region you selected should appear in the browser window. Repeat this step for other features, if desired.
- To find ChIP-seq data from a publicly shared ChIP-seq experiment:
- Click the “track search” button. You will navigate away from the browser, but your coordinates will remain the same. In the next window, click the “Advanced search” tab. For the first “and,” set the menu to “Antibody or target protein”; for “is among,” set the menu to the chromatin mark(s) of interest. For instance, try H3K27me3 (07-449).
- For the second “and,” set the menu to “Cell, tissue, or DNA sample”; for “is among,” set the menu to the cell type of interest. For instance, try H1-hESC (human embryonic stem cells). For detailed information on cell types, click the “Cell, tissue, or DNA sample” link.
- Click the “search” button. In the list of results select one or more tracks and set each to “full.” Tracks designated as “Peaks” or “Hotspots” will show the mapping data as low-resolution, horizontal bars. Tracks designated as “Signal” will show signal intensities as high-resolution, vertical bars.
- Click the “View in Browser” button.
- If you are viewing Signal tracks where high values are cut off by a red bar, right-click on the track in the genome browser and select “configure.” Select “auto-scale to data view” in the drop-down list titled “Data view scaling.”
- The search for H3K27me3 returned two tracks. With both tracks set to “full,” the view in the genome browser should look something like the following figure.
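As a scripted alternative to typing the coordinates into the gateway page, the browser can be opened directly at a region through a URL. The following is a minimal sketch, assuming the `db` and `position` URL parameters of the `hgTracks` page and assuming the hg19 assembly (this protocol names hg19 only for the Table Browser step). The function name `browser_url` is hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

# Assumed base address of the browser display page on the same UCSC host.
HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(position, db="hg19"):
    """Build a Genome Browser link that opens at `position`.

    `position` uses the same "chrom:start-end" form entered under
    "search term". `db` is the assembly name; hg19 is an assumption here.
    """
    # Keep ":" unescaped so the link stays readable.
    query = urlencode({"db": db, "position": position}, safe=":")
    return HGTRACKS + "?" + query


# Example: the OLIG2 RefSeq coordinates used in this protocol.
print(browser_url("chr21:34398216-34401503"))
```

Opening the printed link only sets the assembly and position. The track visibility and track search steps above still have to be done in the browser.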
- The next step in becoming familiar with the genome browser was to use the Table Browser to retrieve chromatin mapping data values for a preliminary subset of genomic regions. This makes it possible to collect ChIP enrichment values from several regions quickly, instead of recording the information manually in the Genome Browser. The following protocol was followed.
- Go to the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=356574357.
- Set clade to mammal, genome to human, and assembly to Feb. 2009 (GRCh37/hg19).
- To add a track that contains the chromatin mapping data of interest, set group to Regulation, and track to a type of interest. For instance, try Broad Histone or UW Histone to analyze a histone mark.
- On the region line, click the “define regions” button. In the text window, paste in the text below (tab-delimited), then click the “submit” button.
chr7 5569982 5570482 ACTB
chr3 52231849 52232349 ALAS1
chr15 45003435 45003935 B2M
chrX 153774983 153775483 G6PD
chr12 6643335 6643835 GAPDH
chr7 65447051 65447551 GUSB
chrX 133593925 133594425 HPRT1
chrX 77359416 77359916 PGK1
chr7 44835991 44836491 PPIA
chr6 170863171 170863671 TBP
chr1 212738426 212738926 ATF3
chr12 4382652 4383152 CCND2
chr9 21974882 21975382 CDKN2A
chr2 38303073 38303573 CYP1B1
chr4 107957203 107957703 DKK2
chr7 27224585 27225085 HOXA11
chr16 56701727 56702227 MT1G
chr20 62795577 62796077 MYT1
chr21 34397966 34398466 OLIG2
chr3 25469584 25470084 RARB
- Set output format to “selected fields from primary and related tables.” Click the “get output” button.
- On the following “Select Fields” page, select chrom, chromStart, chromEnd, name, and signalValue. Click the “get output” button. The output will list the coordinates and signalValues of track features (from the track chosen in step 3) that overlap the regions of interest (defined in step 4). Note: coordinates for which the chromatin-mapping signalValue is zero will be absent from the output list.
- The obtained table should look something like this:
chrom chromStart chromEnd name signalValue
chr15 44997836 45024078 . 2.89004
chr15 45002305 45003965 . 12.4601
chr15 45003383 45003505 . 12.452
chr12 6643737 6645039 . 3.11452
chrX 77356139 77365636 . 1.90484
chr12 4369203 4402228 . 2.29872
chr2 38303256 38304176 . 16.0045
chr16 56252416 58646068 . 1.57571
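The name column of the output is “.”, so the rows do not say which region of interest they belong to. The sketch below is one way to assign each output row back to a named region by coordinate overlap. It is an illustration, not part of the original protocol: the function names are hypothetical, the two text blocks are copied from the region list and example output above, and both coordinate sets are assumed to be half-open intervals (start included, end excluded).

```python
from collections import defaultdict

# Regions of interest, copied from the "define regions" step.
REGIONS_TEXT = """\
chr7 5569982 5570482 ACTB
chr3 52231849 52232349 ALAS1
chr15 45003435 45003935 B2M
chrX 153774983 153775483 G6PD
chr12 6643335 6643835 GAPDH
chr7 65447051 65447551 GUSB
chrX 133593925 133594425 HPRT1
chrX 77359416 77359916 PGK1
chr7 44835991 44836491 PPIA
chr6 170863171 170863671 TBP
chr1 212738426 212738926 ATF3
chr12 4382652 4383152 CCND2
chr9 21974882 21975382 CDKN2A
chr2 38303073 38303573 CYP1B1
chr4 107957203 107957703 DKK2
chr7 27224585 27225085 HOXA11
chr16 56701727 56702227 MT1G
chr20 62795577 62796077 MYT1
chr21 34397966 34398466 OLIG2
chr3 25469584 25470084 RARB
"""

# Example Table Browser output rows (header line removed).
SIGNALS_TEXT = """\
chr15 44997836 45024078 . 2.89004
chr15 45002305 45003965 . 12.4601
chr15 45003383 45003505 . 12.452
chr12 6643737 6645039 . 3.11452
chrX 77356139 77365636 . 1.90484
chr12 4369203 4402228 . 2.29872
chr2 38303256 38304176 . 16.0045
chr16 56252416 58646068 . 1.57571
"""


def parse_regions(text):
    """Return (chrom, start, end, gene) tuples from whitespace-separated lines."""
    regions = []
    for line in text.strip().splitlines():
        chrom, start, end, gene = line.split()
        regions.append((chrom, int(start), int(end), gene))
    return regions


def parse_signals(text):
    """Return (chrom, start, end, signalValue) tuples; the name field is ignored."""
    rows = []
    for line in text.strip().splitlines():
        chrom, start, end, _name, value = line.split()
        rows.append((chrom, int(start), int(end), float(value)))
    return rows


def signals_by_gene(regions, signals):
    """Collect the signalValues of all output rows overlapping each region."""
    hits = defaultdict(list)
    for r_chrom, r_start, r_end, gene in regions:
        for s_chrom, s_start, s_end, value in signals:
            # Assumption: both intervals are half-open [start, end).
            if r_chrom == s_chrom and s_start < r_end and r_start < s_end:
                hits[gene].append(value)
    return dict(hits)


regions = parse_regions(REGIONS_TEXT)
signals = parse_signals(SIGNALS_TEXT)
hits = signals_by_gene(regions, signals)

# Report every region, including those with no overlapping output row.
for _chrom, _start, _end, gene in regions:
    values = hits.get(gene, [])
    print(gene, values if values else "no signalValue reported")
```

For the example output shown above, six regions (B2M, GAPDH, PGK1, CCND2, CYP1B1, and MT1G) have at least one overlapping row, and B2M has three. The remaining regions have none, which is consistent with the note that coordinates with a signalValue of zero are absent from the output.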