Haynes:GOEnrichment

From OpenWetWare
Revision as of 16:38, 11 September 2014


Intro: Gene Ontology

So you discovered that a set of genes all become activated when you treat cells with a drug. What do the genes "do"? How will the phenotypes of the cells change as a consequence of activating these genes?

To help answer such questions, a group of scientists built a large list of standard terms to describe the functions of genes. A standard vocabulary is essential when many scientists share information. For instance, one scientist might write about "secretion of extracellular matrix proteins" while another, studying the same gene, reports its function as "cell surface matrix component delivery." Agreeing on which phrase to use matters even more now that most scientists work with hundreds or thousands of genes that all need to be described.

Another problem arises when several genes cooperate to control a single function: if that function goes by many different names, it is hard to classify those genes into a single functional group.

"The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products across databases." Read more at the Gene Ontology Consortium home page at http://geneontology.org/

The three major categories of the Gene Ontology are:

  1. "Biological Process" - describes the process in which the gene product is involved
  2. "Molecular Function" - describes the biochemical function of the gene product
  3. "Cellular Component" - describes the cellular structure(s) that contain the gene product
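Every GO term belongs to exactly one of these three categories (the GO files call them namespaces: biological_process, molecular_function and cellular_component), while a single gene can carry terms from all three. A minimal sketch, assuming annotations are held as simple (gene, GO term) pairs: GO:0007186 comes from the example table on this page, GO:0003677 (DNA binding) and GO:0005634 (nucleus) are standard GO IDs, and the genes GENE_A/GENE_B and the helper `split_by_category` are hypothetical.

```python
from collections import defaultdict

# Namespace (category) of each GO term used in this illustration.
TERM_NAMESPACE = {
    "GO:0007186": "biological_process",  # G-protein coupled receptor signaling pathway
    "GO:0003677": "molecular_function",  # DNA binding
    "GO:0005634": "cellular_component",  # nucleus
}

# Hypothetical (gene, GO term) annotation pairs; GENE_A and GENE_B are placeholders.
ANNOTATIONS = [
    ("GENE_A", "GO:0007186"),
    ("GENE_A", "GO:0005634"),
    ("GENE_B", "GO:0003677"),
    ("GENE_B", "GO:0005634"),
]


def split_by_category(annotations, term_namespace):
    """Group annotations into one gene -> set-of-terms map per GO category."""
    by_category = defaultdict(lambda: defaultdict(set))
    for gene, term in annotations:
        category = term_namespace[term]
        by_category[category][gene].add(term)
    return by_category


if __name__ == "__main__":
    groups = split_by_category(ANNOTATIONS, TERM_NAMESPACE)
    for category in ("biological_process", "molecular_function", "cellular_component"):
        print(category, dict(groups[category]))
```

Keeping the categories separate mirrors the advice in the procedure to run GOrilla once per ontology.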


Tool: GOrilla

Intro: These instructions will help you use the Gene Ontology enRIchment anaLysis and visuaLizAtion tool (GOrilla) to search for GO terms that are enriched in a target list of genes compared with a background list of genes. The software tests each GO term for enrichment in the target set relative to the background set using standard hypergeometric statistics. Significant enrichment of a GO term suggests that your group of genes is associated with that biological process, function, or component, and that the association is unlikely to be due to chance alone.
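The hypergeometric test behind this comparison can be sketched as follows. Assume N genes in the background, B of them annotated with a given GO term, n genes in the target set, and b of those annotated with the term. The P-value is the probability of drawing b or more annotated genes when n genes are picked at random from the background. This is a minimal illustration of the standard statistic, not GOrilla's own code; the function names and the toy counts are hypothetical, and the fold-enrichment ratio (b/n)/(B/N) is assumed here as the usual definition.

```python
from math import comb


def hypergeom_pvalue(N: int, B: int, n: int, b: int) -> float:
    """P(X >= b) for X ~ Hypergeometric(population N, annotated B, draws n).

    N: genes in the background set
    B: background genes annotated with the GO term
    n: genes in the target set
    b: target genes annotated with the GO term
    """
    total = comb(N, n)
    upper = min(n, B)
    # Sum the probabilities of seeing exactly i annotated genes, for i = b .. upper.
    tail = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return tail / total


def fold_enrichment(N: int, B: int, n: int, b: int) -> float:
    """Observed annotated fraction in the target divided by the background fraction."""
    return (b / n) / (B / N)


if __name__ == "__main__":
    # Toy counts: 20 background genes, 5 annotated; 4 target genes, 3 annotated.
    p = hypergeom_pvalue(20, 5, 4, 3)
    fe = fold_enrichment(20, 5, 4, 3)
    print(f"P = {p:.4f}, fold enrichment = {fe:.1f}")  # P = 0.0320, fold enrichment = 3.0
```

The exact integer arithmetic of `math.comb` avoids the rounding problems that appear when very large binomial coefficients are computed with floating point.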

Procedure:

  1. Go to http://cbl-gorilla.cs.technion.ac.il/
  2. Set "Choose organism" to the relevant organism (e.g., Homo sapiens = human, Mus musculus = mouse)
  3. Set "Choose running mode" to "Two unranked lists of genes (target and background lists)"
  4. In the "Target Set" field, paste or upload the list of genes you want to analyze. For the upload option, a plain-text (.txt) file with one gene symbol per line is recommended.
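  5. For the "Background Set," paste or upload a complete list of all gene symbols for your organism. Use your own list or one of the following:
    1. Human genes - GOBg_Human_092014.txt (http://openwetware.org/images/6/6c/GOBg_Human_092014.txt)
    2. Mouse genes - GOBg_Mouse_092014.txt (http://openwetware.org/images/8/80/GOBg_Mouse_092014.txt)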
  6. Set "Choose an Ontology" to one of the following three options. For publishable results, run a separate analysis for each ontology rather than selecting "All".
    1. "Process" corresponds to "Biological Process"
    2. "Function" corresponds to "Molecular Function"
    3. "Component" corresponds to "Cellular Component"
  7. Click the "Search Enriched GO Terms" button to run the analysis.
  8. After reviewing the results, use your browser's back button and repeat the analysis with a different "Choose an Ontology" setting.
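Before uploading, it can help to clean the target and background lists the same way. The sketch below assumes your gene symbols are already in Python lists: it strips whitespace, drops blank entries and duplicates, reports target symbols missing from the background (comparison is case-sensitive), and writes each list as plain text with one symbol per line, the format recommended for the upload option. The helper names, the file names target.txt and background.txt, and the placeholder GENE_X are hypothetical. How GOrilla itself handles target genes absent from the background is not covered here, so treat the missing-gene report only as a sanity check.

```python
from pathlib import Path


def clean_symbols(symbols):
    """Strip whitespace, drop blanks, and remove duplicates while keeping first-seen order."""
    seen = set()
    cleaned = []
    for raw in symbols:
        symbol = raw.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            cleaned.append(symbol)
    return cleaned


def missing_from_background(target, background):
    """Return target symbols that do not appear in the background (case-sensitive)."""
    background_set = set(background)
    return [s for s in target if s not in background_set]


def write_one_per_line(symbols, path):
    """Write one gene symbol per line to a plain-text file."""
    Path(path).write_text("\n".join(symbols) + "\n")


if __name__ == "__main__":
    # Hypothetical example lists; replace them with your own gene symbols.
    target = clean_symbols(["TP53", " MYC", "TP53", "", "GENE_X"])
    background = clean_symbols(["TP53", "MYC", "EGFR", "GAPDH"])
    print("Missing from background:", missing_from_background(target, background))
    write_one_per_line(target, "target.txt")
    write_one_per_line(background, "background.txt")
```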

Results:

  • The analysis outputs three important types of data:
    • A GO term hierarchy tree, where GO terms are shown in boxes connected with lines. Most GO terms are specific sub-classes of parent terms.
    • The color scale indicates P-values. The P-value is the probability that a random list of genes of the same size would show at least as much enrichment for that GO term. Therefore, the smaller the P-value, the more significant the enrichment.
    • A ranked table, where the GO terms with the smallest P-values are at the top. Click the "Show Genes" link to see the gene symbols that are associated with the GO term in that row.
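The hierarchy tree reflects the fact that a gene annotated with a specific GO term is implicitly annotated with all of that term's parents. The sketch below assumes a small, simplified child-to-parents map (intermediate levels are omitted, and real GO terms can have several parents, so this is illustrative rather than copied from the current ontology); the hypothetical `ancestors` helper walks the graph upward to list every broader term a specific term rolls up into.

```python
# Simplified, illustrative child -> parents map; GO is a directed acyclic graph,
# so a term may list more than one parent. Intermediate levels are omitted here.
PARENTS = {
    "G-protein coupled receptor signaling pathway": ["cell surface receptor signaling pathway"],
    "cell surface receptor signaling pathway": ["signal transduction"],
    "signal transduction": ["biological_process"],
    "biological_process": [],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upward from `term`."""
    found = set()
    stack = list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(parents.get(parent, []))
    return found


if __name__ == "__main__":
    for term in sorted(ancestors("G-protein coupled receptor signaling pathway", PARENTS)):
        print(term)
```

This upward propagation is why a significant specific term is often accompanied by significant broader parent terms in the tree.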

There are many ways to use these results in figures. The following are suggestions from Dr. Haynes.

Bar Charts - small P-values are converted into positive numbers (the negative log10 of the P-value) for intuitive comparison:

  1. Run an analysis for "Process" and get the results.
  2. Open an Excel spreadsheet.
  3. Make a table like the hypothetical example below.

| Target list | GO Category | Term ID | Term | P-value | No. genes | Neg Log 10 |
|---|---|---|---|---|---|---|
| U2OS | Bio process | GO:0007186 | G-protein coupled receptor signaling pathway | 1.38E-47 |  | 46.86012091 |
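The "Neg Log 10" column is the negative base-10 logarithm of the P-value, so 1.38E-47 becomes about 46.86 and smaller P-values give taller bars. In Excel the same conversion is =-LOG10(cell). The Python sketch below does the conversion for a list of (term, P-value) rows and sorts them tallest first for plotting; the row structure and the helper names are assumptions for illustration, and only the U2OS row comes from the example table. Running it reproduces the 46.86012091 shown there.

```python
import math


def neg_log10(p_value: float) -> float:
    """Convert a P-value into -log10(P); smaller P-values give larger numbers."""
    if not 0 < p_value <= 1:
        raise ValueError("P-value must be in (0, 1]")
    return -math.log10(p_value)


def bar_chart_rows(rows):
    """Attach -log10(P) to each (term, P-value) pair and sort tallest bar first."""
    scored = [(term, p, neg_log10(p)) for term, p in rows]
    return sorted(scored, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    rows = [("G-protein coupled receptor signaling pathway", 1.38e-47)]
    for term, p, score in bar_chart_rows(rows):
        print(f"{term}\t{p:.2E}\t{score:.8f}")
```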