Richard Brous Week 9

From OpenWetWare

Richard Brous

Electronic Lab Notebook - Analysis of Vibrio cholerae microarray data project for Week 9

Getting Started

  • Partnered with Zeb (who used an older gene database)
  • I have selected the newest Vc-Std_External10201022 gene database
    • Extracted to Desktop

Starting GenMAPP and getting organized

  • Launched GenMAPP (thank god it's not looking for the update server!!! =D)
  • Ensure the correct gene database is loaded: Vc-Std_External10201022
    • If not, load the correct one
      • Data > Choose Gene Database menu item to select the Gene Database you need
  • Select the Data menu, then Expression Dataset Manager
    • Select the new dataset, which is the tab-delimited text file formatted for GenMAPP made during week 8.
    • Since the tab-delimited file contains only data, DO NOT select any column to exclude in the Data Type Specification window
    • Let the Expression Dataset Manager convert the data
      • This may take a while, so don't assume the system has hung... there is only minimal screen feedback
        • A complete conversion leaves the data active in the Expression Dataset Manager window AND creates a converted file named *.gex in the same location as your tab-delimited source file.
      • Errors (almost certainly) occurred where the Expression Dataset Manager could not convert one or more lines of data
        • An exception file is created which contains all the raw data with the addition of a column called "~Error~"
          • The "~Error~" column holds either an error message or, if the program found no error for that row, a single space character
        • I specifically found 121 errors out of 5221 records
        • Zeb should get more errors since his gene database is older and likely has more genes labeled as hypothetical.
          • Zeb found 772 errors of 5221 records
    • Customize the new Expression Dataset by creating new Color Sets
      • Color Sets contain the GenMAPP instructions for displaying data from an Expression Dataset on MAPPs.
        • Create a color set by filling in the fields:
          • Name for the Color Set: pathogenic_vs_lab
          • gene value: Avg_LogFC_all
          • Criteria that determine how a gene object is colored on the MAPP
            • increased gene expression (red)
              • [Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05
            • decreased gene expression (green)
              • [Avg_LogFC_all] < -0.25 AND [Pvalue] < 0.05
            • These expressions work like the conditions in queries performed in PostgreSQL
      • After completing a new criterion, add the criterion entry (label, criterion, and color) to the Criteria List by clicking the Add button
      • Two criteria were created: "Increased" is [Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05 and "Decreased" is [Avg_LogFC_all] < -0.25 AND [Pvalue] < 0.05
      • Can always add more criteria by following the previous steps
      • Save the entire Expression Dataset by selecting Save from the Expression Dataset menu
    • Exit Expression Dataset Manager to view the Color Sets on a MAPP and then close it
  • Keep your *.gex file safe prior to wiki upload (save it to a USB drive or email it to yourself)
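
The two Color Set criteria above can be sketched outside of GenMAPP as a small Python function. This is only an illustration of the logic, not how GenMAPP itself evaluates criteria; the function name classify and the example rows are hypothetical, while the 0.25 and 0.05 cutoffs come from the criteria listed above.

```python
def classify(avg_logfc_all, pvalue, cutoff=0.25, alpha=0.05):
    """Return the Color Set label for one gene, or None if no criterion matches."""
    # "Increased": [Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05
    if avg_logfc_all > cutoff and pvalue < alpha:
        return "Increased"
    # "Decreased": [Avg_LogFC_all] < -0.25 AND [Pvalue] < 0.05
    if avg_logfc_all < -cutoff and pvalue < alpha:
        return "Decreased"
    # Anything else gets no criterion color
    return None


# Hypothetical rows: (gene label, Avg_LogFC_all, Pvalue)
rows = [
    ("geneA", 1.65, 0.01),
    ("geneB", -0.40, 0.03),
    ("geneC", 0.90, 0.20),
    ("geneD", 0.10, 0.001),
]

labels = {gene: classify(fc, p) for gene, fc, p in rows}
print(labels)
```

Note that a gene with a large fold change but a p value of 0.05 or higher (geneC here) matches neither criterion, which is the same behavior the AND in the GenMAPP expression gives.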

MAPPFinder Procedure

  • Zeb and I were in the front row, so we selected: INCREASED expression
  • Launch the MAPPFinder program directly, or from within GenMAPP select Tools -> MAPPFinder
  • Ensure the correct Gene Database is loaded!!!
    • If not, choose File -> Choose Gene Database and select the correct one.
      • Located in: C:\GenMAPP 2 Data\Gene Databases\
  • Press "Calculate New Results" button
    • Select the *.gex file you created previously.
      • RAB_10_23_2010_Merrell_Compiled_Raw_Data_Vibrio.gex
    • Click OK
    • Choose the Color Set and Criteria with which to filter the data
      • Chose Increased since that is what we were assigned
      • Check Gene Ontology and Calculate p values
      • Click the "Browse" button and create a meaningful filename for the results
      • Click "Run MAPPFinder".
        • The analysis will take several minutes.
          • It may look like the computer has stalled, but the hourglass should be on the screen, indicating it's working
  • When the results have been calculated, a Gene Ontology browser will open showing your results
    • All of the Gene Ontology terms that have at least 3 genes measured and a p value of less than 0.05 will be highlighted yellow.
      • A term with a p value less than 0.05 is considered a "significant" result.
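
The highlighting rule described above (at least 3 genes measured and a p value below 0.05) can be written as a simple filter. This is a minimal sketch of the rule as stated in these notes, not MAPPFinder's own code; the function name is_highlighted and the example terms are hypothetical.

```python
def is_highlighted(genes_measured, p_value, min_genes=3, alpha=0.05):
    """True if a GO term meets the highlighting rule described above."""
    return genes_measured >= min_genes and p_value < alpha


# Hypothetical GO terms: (name, genes measured, p value)
terms = [
    ("term A", 12, 0.001),  # enough genes, significant p value
    ("term B", 2, 0.010),   # significant p value, but too few genes measured
    ("term C", 40, 0.300),  # enough genes, p value not significant
]

highlighted = [name for name, measured, p in terms if is_highlighted(measured, p)]
print(highlighted)
```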

Evaluating MAPPFinder results

  • List most significant Gene Ontology terms
    • Click on "Show Ranked List" from Menu Bar for list ranked by Z score and p value
      • Zeb's GO terms and mine are different because I am working from the newer gene database. GO term assignments for specific genes in his database had likely been disproven and corrected in my version of the gene database.
  • RAB Top 10 Gene Ontology terms:
  1. branched chain family amino acid metabolic process
  2. branched chain family amino acid biosynthetic process
  3. IMP biosynthetic process
  4. IMP metabolic process
  5. arginine metabolic process
  6. cellular nitrogen compound biosynthetic process
  7. leucine biosynthetic process
  8. leucine metabolic process
  9. amine biosynthetic process
  10. arginine biosynthetic process
  • ZEB Top 10 Gene Ontology terms:
  1. localization
  2. cellular biopolymer biosynthetic process
  3. biopolymer biosynthetic process
  4. cellular macromolecule biosynthetic process
  5. macromolecule biosynthetic process
  6. cellular macromolecule metabolic process
  7. macromolecule metabolic process
  8. cell projection organization
  9. biopolymer metabolic process
  10. transporter activity
  • MAPPFinder lets you find Gene Ontology (GO) terms with which a listed gene is associated.
    • First collapse the tree
    • Type the gene identifier into the gene ID search field
      • RAB - genes mentioned in Merrell et al. (2000)
        • VC0028
          • metal ion binding
          • iron-sulfur cluster binding
          • 4 iron, 4 sulfur cluster binding
          • catalytic activity
          • lyase activity
          • dihydroxy-acid dehydratase
        • VC0941
          • pyridoxal phosphate binding
          • catalytic activity
          • glycine hydroxymethyltransferase
        • VC0869
          • nucleotide binding
          • ATP binding
          • catalytic activity
          • ligase activity
          • phosphoribosylformylglycinamidine synthase activity
        • VC0051
          • nucleotide binding
          • ATP binding
          • catalytic activity
          • lyase activity
          • carboxy-lyase activity
          • phosphoribosylaminoimidazole carboxylase activity
        • VC0647
          • nucleotidyltransferase activity
          • polyribonucleotide nucleotidyltransferase activity
        • VC0468
          • metal ion binding
          • nucleotide binding
          • ATP binding
          • catalytic activity
          • ligase activity
          • glutathione synthase activity
        • VC2350
          • catalytic activity
          • lyase activity
          • deoxyribose-phosphate aldolase activity
        • VCA0583
          • outer membrane-bounded periplasmic space
      • Are they the same as your buddy who is using a different Gene Database? Why or why not? - They are not the same, because his gene database had only two of the genes present. Also, where there were matches, it seems my later database version had eliminated some functions originally thought to be associated with the genes.
      • ZEB - genes mentioned in Merrell et al. (2000)
        • VC0028 - not found
        • VC0941 - not found
        • VC0869 - not found
        • VC0051 - not found
        • VC0647
          • mRNA catabolic process
          • RNA processing
          • cytoplasm
          • RNA binding
          • 3' - 5' exoribonuclease activity
          • transferase activity
          • nucleotidyltransferase activity
          • polyribonucleotide nucleotidyltransferase activity
        • VC0468 - not found
        • VC2350 - not found
        • VCA0583
          • transport
          • outer membrane-bounded periplasmic space
          • transporter activity
  • Click on one of the GO terms that are associated with one of the genes you looked up in the previous step.
    • A MAPP will open listing all of the genes (as boxes) associated with that GO term. Moreover, the genes on the MAPP will be color-coded with the gene expression data from the microarray experiment.
    • List in your journal entry the name of the GO term you clicked on and whether the expression of the gene you were looking for changed significantly in the experiment.
      • Looking for VC0028
        • GO term: 4 iron, 4 sulfur cluster binding
        • ILVD_VIBCH increased expression 1.65
        • LEUC_VIBCH increased expression 0.52
        • Q9KM58_VIBCH increased expression 1.01
        • RUMB_VIBCH increased expression 0.45
        • THIC_VIBCH increased expression 1.61
          • VC0028 = ILVD_VIBCH from UniProt db
            • pathogenic strain 1.65 increased expression compared to 1.27 lab strain
      • Links out to other db (VC0028)
  • Compare Excel files: The numbers differ because the recent gene database covers more genes than the older gene database. Also, because different genes are associated with GO terms in the recent database, it yields a different set of top 20 GO terms. This should (and does) produce different groupings of parent-child GO terms when comparing our results.
    • RAB Excel file (column I filter >= 5) (column L filter >= 26%)
      • 339 probes met the [Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05 criteria.
      • 338 probes meeting the filter linked to a UniProt ID.
      • 219 genes meeting the criterion linked to a GO term.
      • 5221 Probes in this dataset
      • 5100 Probes linked to a UniProt ID.
      • 2475 Genes linked to a GO term.
      • The z score is based on an N of 2475 and an R of 219 distinct genes in the GO.
    • Zeb Excel file
      • 339 probes met the [Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05 criteria.
      • 291 probes meeting the filter linked to a UniProt ID.
      • 184 genes meeting the criterion linked to a GO term.
      • 5221 Probes in this dataset
      • 4449 Probes linked to a UniProt ID.
      • 1990 Genes linked to a GO term.
      • The z score is based on an N of 1990 and an R of 184 distinct genes in the GO.
    • Are any of the filtered GO terms closely related to each other (parent-child)? Compare this in the MAPPFinder browser.
      • Yes, there are several relationships of this type, which I highlighted in the Excel file.
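
To see how the different N and R values in the two Excel files feed into the z score, here is a rough sketch. It assumes the hypergeometric-based z score that I understand MAPPFinder to use, where N is the number of genes linked to a GO term, R is the number of those meeting the criterion, n is the number of genes measured in one GO term, and r is the number of those meeting the criterion. The formula is my assumption about the method, the function name mappfinder_z is hypothetical, and the n and r values below are made up for illustration; only the N and R values come from the results above.

```python
import math


def mappfinder_z(r, n, R, N):
    """Z score for one GO term (assumed hypergeometric form; requires 1 <= n < N)."""
    expected = n * R / N  # genes expected to meet the criterion by chance
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


# Hypothetical GO term: 20 genes measured, 8 of them meeting the criterion
n, r = 20, 8

z_rab = mappfinder_z(r, n, R=219, N=2475)  # N and R from the RAB Excel file
z_zeb = mappfinder_z(r, n, R=184, N=1990)  # N and R from the Zeb Excel file

print(f"RAB database z score: {z_rab:.2f}")
print(f"Zeb database z score: {z_zeb:.2f}")
```

Even with identical n and r for a term, the two databases give different z scores because the background (N and R) differs, which is one reason the ranked lists do not match.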

Interpret your results

  • I was able to organize my GO terms into 3 groups:
  • As a noobish but curious biologist, I look at the 3 categories and think about their function as related to pathogenic Vibrio cholerae. I'll start with flagellum, since to me it is obviously about bacterial locomotion: to find a new host, search for food, move to a more hospitable environment, etc. All these advantages could lead to the increased survivability of pathogenic Vibrio cholerae outside of a host. IMP increases would likely enable the use of many different types of resources as food and, in relation to tRNA, enable polypeptide creation to do whatever is needed in the cell (sorry, don't really know). Lastly, I believe the amino acid GO terms, specifically for glutamine and arginine, are also used for protein synthesis for adaptability purposes.

Richard Brous 02:46, 1 November 2010 (EDT)