Dahlquist:GEO Submission 2016
This page will document the preparation of microarray data for submission to the NCBI GEO database.
- The instruction page is found here: http://www.ncbi.nlm.nih.gov/geo/info/submission.html
- The in progress Excel file is in the DahlquistLab GitHub repository here.
Tasks
Submitted about an hour ago. — Kam D. Dahlquist 19:52, 22 June 2016 (EDT)
Here is the list of tasks that I think I need to complete for my submission.
- Series--done
- Samples--done
- Protocols--done
- Platforms
- According to the GEO support e-mail, I need to create a new platform that merges the GCAT and Ontario chips and that corresponds to the IDs in my processed data.
- Draft is done; need to review.
- Processed data
- Verify workflow and documentation on OWW--done
- Copy over the rounded, normalized data to the Matrix worksheet--done
- NA values should either be left blank or labeled as "null"--left blank, done
- Raw data
- Collect and rename .gpr files--done
- Changed GCAT and Ontario Targets files and re-ran normalization with all new names. Verified that results were the same, so file renaming is done.
- Note that the name of the GCAT and Ontario Targets files is hardcoded into the normalization script. I modified a local copy of the script, but I will need to make sure that I update the version on GitHub, along with the documentation of the workflow.--done
- Collect and rename .tif files for submission to LMU Digital Commons
- Collect and rename .gpr files--done
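The "Processed data" task above records that NA values were left blank in the Matrix worksheet. The following is a minimal Python sketch of that convention, not the procedure actually used (which was done in Excel). The function name `write_matrix` and the sample column names are hypothetical.

```python
import csv
import io


def write_matrix(header, rows):
    """Return a tab-delimited matrix as text; None (NA) values become empty cells."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Replace missing values with an empty string so the cell is left blank
        writer.writerow(["" if value is None else value for value in row])
    return buf.getvalue()


# Hypothetical sample names and values, for illustration only
example = write_matrix(
    ["ID_REF", "sample_1", "sample_2"],
    [["YAL001C", 0.1234, None]],
)
print(example)
```

Leaving the cell empty, rather than writing a placeholder such as "NA", follows the option chosen in the task list.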
2016-06-08
- Downloaded GA_merged_dye_swap.xls and GA_merged_dye_swap_w_platf.xls Excel template files from GEO.
- Sent e-mail question to geo@ncbi.nlm.nih.gov:
- Lines 13 and 14 of the "Metadata Template" sheet from either the "GA_merged_dye_swap.xls" or "GA_merged_dye_swap_w_platf.xls" file seem to be the same. They are both labeled "summary" and have the same information in the popup. Is this a mistake in the template or should I fill out both fields? Should I put identical information in both fields?
- Received the following reply:
- You are correct, the "summary" field is repeated in the Metadata templates. It is not a mistake; it is intended to show that the 'summary' can hold very large volumes of text and can be divided into paragraphs (one paragraph per line). You need to add text to only one of the "summary" lines. You can leave the other "summary" line blank.
- Verified using the match function in Excel that the IDs matched between the GPL17965 platform and my data.
- This platform has all the spots on the array and gives the block, column, and row designation of each ID for 13056 spots.
- Verified using match function in Excel that the IDs matched between the GPL13945 platform and my processed data for all 6189 genes.
- This platform just has the list of 6189 unique genes plus the 3XSSC and Arabidopsis control spots for a total of 6191. It gives the systematic name, the standard name, and the GO component, process, and function terms.
- GCAT information is found here: http://bio.davidson.edu/gcat/GCATprotocols.html#yeast
- GCAT chips used in study were received Fall 2005.
- 2005/2006 Yeast Chips produced at Washington Univ. (St. Louis); 70mer oligos printed on epoxy slides.
- Looked at GPL7882, but it is not the right one (list of IDs do not match gal file).
- Looks like Laura Hoopes submitted a platform for this here: GPL5947
- I used Excel match to confirm the IDs matched the gal file (filename: GPL4947_platform_verification_20160608.xlsx). All the IDs present in the platform are present in the gal file, but some of them are in a different order.
- I have written to the GEO help to clarify which platform IDs to use for the Ontario chips and to point out that the GCAT platform will not exactly match my data because of the normalization/processing merge of GCAT and Ontario data.
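The ID checks in this entry were done with the match function in Excel. As a rough equivalent, the sketch below compares two ID lists and reports IDs missing from either side, plus whether the shared IDs appear in the same order. The function name `compare_ids` and the example IDs are hypothetical and are not taken from the actual platform files.

```python
def compare_ids(platform_ids, data_ids):
    """Compare two ID lists.

    Returns (IDs in data but not platform, IDs in platform but not data,
    True if the IDs common to both lists appear in the same order).
    """
    platform_set = set(platform_ids)
    data_set = set(data_ids)
    missing_from_platform = sorted(data_set - platform_set)
    missing_from_data = sorted(platform_set - data_set)
    # Compare ordering only over the IDs that both lists share
    shared_in_platform_order = [i for i in platform_ids if i in data_set]
    shared_in_data_order = [i for i in data_ids if i in platform_set]
    same_order = shared_in_platform_order == shared_in_data_order
    return missing_from_platform, missing_from_data, same_order


# Hypothetical IDs: same content, different order
platform = ["YAL001C", "YAL002W", "YAL003W"]
data = ["YAL003W", "YAL001C", "YAL002W"]
print(compare_ids(platform, data))
```

Two empty lists with `same_order` set to False would correspond to the situation described for GPL5947, where all IDs are present but some are in a different order.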
2016-06-10
- E-mail response from GEO
- My question: Which platform should I use for my data? I will be providing raw GenePix gpr files that are better described by the first platform, but my processed data merges duplicates and is potentially better represented by the second platform.
- The response: the Platform data table should align with the normalized/processed data table. Please note that the Platform IDs and the processed data ID_REFs must match.
- My question: The third ID (GPL5947) matches the other chip I used in my experiment. However, my processed data will not match the list of IDs provided because when I merged the two chip types in my processed dataset, I removed some IDs from the second platform that were not present in the first and included some IDs from the first platform that were not present in the second.
- The response: Please create a new Platform to represent your processed data. In the 'hyb protocol' and 'data processing' fields of the Metadata sheet, please be sure to describe the steps that were performed to generate the processed data.
- The "Generic single channel submission template, including Platform" or "Generic dual channel submission template, including Platform" may be suitable for your study; these templates can be downloaded from this web page: http://www.ncbi.nlm.nih.gov/geo/info/spreadsheet.html#GAtemplates
- Table of number of replicates per strain per timepoint with the number of dye swaps in parentheses.
Total Number of Replicates (Number Swapped) | t15/t0 | t30/t0 | t60/t0 | t90/t0 | t120/t0 | Strain Total |
wt | 4 (2) | 5 (2) | 4 (1) | 5 (2) | 5 (2) | 23 |
dCIN5 | 4 (2) | 4 (2) | 4 (2) | 4 (2) | 4 (2) | 20 |
dGLN3 | 4 (2) | 4 (2) | 4 (2) | 4 (2) | 4 (2) | 20 |
dHAP4 | 4 (2) | 4 (2) | 4 (2) | 3 (2) | 3 (2) | 18 |
dHMO1 | 4 (2) | 4 (2) | 4 (2) | 4 (2) | 4 (2) | 20 |
dSWI4 | not collected | 4 (2) | 4 (2) | 4 (2) | 4 (2) | 16 |
dZAP1 | 4 (2) | 4 (2) | 4 (2) | 4 (2) | 4 (2) | 20 |
Grand total | | | | | | 137 |
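As a quick arithmetic check on the replicate table, the sketch below sums the per-timepoint counts for each strain and confirms the grand total of 137 microarrays. The counts are copied from the table; `None` marks the dSWI4 t15 timepoint, which the overall design text says was not collected. The variable names are only illustrative.

```python
# Replicate counts per timepoint in the order t15, t30, t60, t90, t120.
# None marks a timepoint that was not collected (dSWI4 at t15).
replicates = {
    "wt":    [4, 5, 4, 5, 5],
    "dCIN5": [4, 4, 4, 4, 4],
    "dGLN3": [4, 4, 4, 4, 4],
    "dHAP4": [4, 4, 4, 3, 3],
    "dHMO1": [4, 4, 4, 4, 4],
    "dSWI4": [None, 4, 4, 4, 4],
    "dZAP1": [4, 4, 4, 4, 4],
}

# Sum each strain's counts, skipping uncollected timepoints
strain_totals = {
    strain: sum(n for n in counts if n is not None)
    for strain, counts in replicates.items()
}
grand_total = sum(strain_totals.values())

for strain, total in strain_totals.items():
    print(strain, total)
print("Grand total", grand_total)
```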
Field Entries
SERIES
SERIES #This section describes the overall experiment
- title
- Global transcriptional response of wild type and transcription factor deletion strains of Saccharomyces cerevisiae to the environmental stress of cold shock and subsequent recovery
- summary #Thorough description of the goals and objectives from this study.
- Previous studies on the global transcriptional response of budding yeast, Saccharomyces cerevisiae, to cold shock have revealed that the response can be divided into a set of early response genes (after 15 minutes to 2 hours of cold temperatures) and late response genes (after 12 to 60 hours of cold temperatures). The late response genes include the ESR genes induced by many environmental stresses and are regulated by the Msn2 and Msn4 transcription factors, as they are during other environmental stresses (Kandror et al. 2004 PMID:15053871; Schade et al. 2004 PMID:15483057). However, the transcription factors responsible for the induction of the early response genes, the overall regulatory mechanism governing this early response, and the transcriptional response to recovery after cold shock remain largely unknown. Thus, we measured the early transcriptional response of S. cerevisiae to cold shock and subsequent recovery using DNA microarrays. To determine which transcription factors were responsible for these changes in expression, the same cold shock and recovery microarray experiments were then performed on six strains individually deleted for the transcription factors Cin5, Gln3, Hap4, Hmo1, Swi4, and Zap1.
- overall design #Indicate how many Samples are analyzed, if replicates are included, are there control and/or reference Samples, etc.
- Yeast cells were grown to early log phase at 30°C, then shifted to 13°C for 60 minutes (cold shock), and then shifted back to 30°C for another 60 minutes (recovery). Samples from independently grown replicate cultures (flasks) were collected before cold shock (t0), after 15 (t15), 30 (t30), and 60 minutes (t60) of cold shock, and after 60 minutes of cold shock followed by 30 (t90) and 60 minutes (t120) of recovery at 30°C. Data was collected for the following strains: wild type (BY4741 or BY4739), BY4741 Δcin5, BY4741 Δgln3, BY4741 Δhap4, BY4741 Δhmo1, BY4741 Δswi4, and BY4741 Δzap1. Generally, 4 independent biological replicates were performed for each strain (independent culture flasks), although for some timepoints and strains there are 3 or 5 replicates instead. Note that the t15 sample was not collected for the BY4741 Δswi4 strain. The sample from each experimental timepoint (t15, t30, t60, t90, and t120) was labeled with Cy5, and the t0 sample (before cold shock control) was labeled with Cy3. Experimental and control samples from the same flask were co-hybridized onto the same microarray slide. The dye orientation was swapped for two of the biological replicates per timepoint per strain (experimental sample labeled with Cy3 and control sample labeled with Cy5), except in one instance when there was only one dye swap performed for a particular strain and timepoint. Technical replicates were not performed. There were a total of 137 microarrays hybridized in this study.
- contributor #Firstname,Initial,Lastname, each contributor on a separate line.
- Kam,D,Dahlquist
- Hassan,Abdulla
- Ashley,J,Arnell
- Cybele,Arsan
- Jessa,M,Baker
- Ross,M,Carson
- Wesley,T,Citti
- Steven,E,De Las Casas
- Lauren,G,Ellis
- Kevin,C,Entzminger
- Stephanie,D,Entzminger
- Ben,G,Fitzpatrick
- Samantha,P,Flores
- Nicolette,S,Harmon
- Kathleen,P,Hennessy
- Andrew,F,Herman
- Monica,V,Hong
- Heather,L,King
- Lauren,N,Kubeck
- Okensama,M,La-Anyane
- Douglas,L,Land
- Michael,J,Leon Guerrero
- Elizabeth,M,Liu
- Minh,D,Luu
- Kevin,P,McGee
- Matthew,R,Mejia
- Sheila,N,Melone
- Nicole,T M,Pepe
- Kenny,R,Rodriguez
- Nicholas,A,Rohacz
- Robert,J,Rovetti
- Olivia,S,Sakhon
- Jorrel,T,Sampana
- Katrina,Sherbina
- Laura,H,Terada
- Alondra,J,Vega
- Anthony,J,Wavrin
- Kevin,W,Wyllie
- Brianne,B,Zapata
SAMPLES
SAMPLES # The Sample names in the first column are arbitrary but they must match the column headers of the matrix table. If you have more than one raw data file for each Sample, include additional "raw data file" columns.
PROTOCOLS
PROTOCOLS #This section includes protocols and fields which are common to all Samples. Protocols which are applicable to specific Samples or specific channels should be included in additional columns of the SAMPLES section instead.
- growth protocol: Describe the conditions that were used to grow or maintain organisms or cells prior to extract preparation.
- 5 mL liquid overnight cultures of the appropriate strain of S. cerevisiae were inoculated with a single colony into YEPD (wild type strain) or YEPD + 200 μg/mL G418 (gene deletion strains) medium and incubated overnight at 30°C, shaking at 250 rpm. The next morning, these cultures were used to inoculate larger 750 mL cultures by adding a 1/500th volume of the overnight culture to a 2800 mL Fernbach flask containing YEPD (wild type strain) or YEPD + 200 μg/mL G418 (gene deletion strains) medium. Two independent biological replicate experiments (flasks) of the same strain were carried out on the same day. Growth was monitored by reading the OD600 of the cultures at appropriate time intervals.
- treatment protocol: Describe the treatments applied to the biological material prior to extract preparation.
- Once the culture reached an OD600 of >= 0.2 at 30°C, the cold shock and recovery experiment was begun. Three 40 mL samples were collected at time zero (t0) before cold shock, and the flask was shifted to a 13°C water bath incubator/shaker, shaking at 250 rpm. Three 40 mL samples each were collected after 15 minutes (t15), 30 minutes (t30), and 60 minutes (t60) of cold shock at 13°C. The flask was then shifted to a 30°C water bath and manually swirled for 5 minutes, before being dried off and returned to the 30°C incubator/shaker at 250 rpm. Three 40 mL samples were collected after a total of 30 minutes at 30°C (t90) and 60 minutes at 30°C (t120). Immediately after collection, the samples were spun at 4000 rpm in a table-top centrifuge for 3 minutes at room temperature. The supernatant was discarded and the pellets were immediately frozen in a dry ice/ethanol bath and stored at -80°C.
- extract protocol: Describe the protocol used to isolate the extract material.
- Total RNA was extracted from a single cell pellet from each timepoint using the Ambion (now LifeTechnologies) RiboPure Yeast Kit, catalog #AM1926, according to the manufacturer's instructions. The DNase treatment from the kit was substituted with the Turbo DNA-free Kit (Ambion/LifeTechnologies catalog #AM1907), using the "rigorous DNase treatment" option of the manufacturer's protocol. Quality of the total RNA was checked by UV spectroscopy using Bio-Rad trUView Cuvettes and SmartSpec 3000 or on an Implen Nanophotometer and by non-denaturing agarose gel electrophoresis (Lonza Reliant Precast RNA Gel, catalog #54948). Total RNA was stored at -80°C.
- label protocol: Describe the protocol used to label the extract.
- 1 μg aliquots of total RNA were converted to aRNA using the Amino Allyl MessageAmp II aRNA Amplification Kit (Ambion/LifeTechnologies catalog #AM1753), according to the manufacturer's instructions. Five aliquots of each t0 sample were amplified and then mixed together and one aliquot of each of the other timepoints was amplified. 20 μg aliquots of aRNA (5 aliquots of the mixed t0, one aliquot each for the other timepoints) were then labeled with either Cy3 or Cy5 (CyDye Post-Labeling Reactive Dye Packs, GE Healthcare Life Sciences catalog #RPN5661), using the Amino Allyl MessageAmp II aRNA Amplification Kit, according to the dye orientation chosen for the particular replicate experiment (see SAMPLES table). Labeling efficiency was checked by spectroscopy using Bio-Rad trUView Cuvettes and SmartSpec 3000 or on an Implen Nanophotometer. The t0 labeled samples were mixed together and 5 μg aliquots of the labeled, mixed t0 sample were then combined with 5 μg aliquots of labeled aRNA from each of the other timepoints for that replicate flask. The combined aRNA samples were then fragmented using Ambion (now LifeTechnologies) RNA Fragmentation Reagents (catalog #AM8740) according to the manufacturer's instructions and dried in a speedvac to <= 5 μL. A complete protocol is found here: http://www.openwetware.org/wiki/Dahlquist:DNA_Microarray_Protocol.
- hyb protocol: Describe the protocol used for hybridization, blocking and washing, and any post-processing steps such as staining.
- 85 μL of hybridization master mix was added to each labeled, paired, and dried aRNA sample. The hybridization master mix contained 100 μL DIG Easy Hyb solution (Roche Applied Science, catalog #11796895001), 0.38 μL of 1 μg/μL oligo dA (10-20mer) solution (Invitrogen, catalog #POLYA.GF), and 5 μL of 10 mg/mL DNA, MB-grade from fish sperm (Roche Applied Science, catalog #11467140001) and was heated to 65-70°C for two minutes and cooled to room temperature before adding 85 μL of it to the labeled aRNA. After the hybridization master mix was added to the aRNA, it was heated to 65-70°C for two minutes and cooled to room temperature for two minutes before being applied to the microarray chip. Hybri-slips (Sigma-Aldrich, catalog #H0784) cover slips were used and the chip was hybridized in a Corning Hybridization Chamber (catalog #2551) with 2 X 11 μL nuclease-free water inside the wells of the chamber. Microarrays were incubated in a 37°C water bath for 15-16 hours. Cover slips were removed from the microarray by dunking in 1X SSC buffer at room temperature. Slides were immediately transferred to a 50 mL conical tube containing 1X SSC/1% SDS buffer and incubated in a 50°C water bath for 15 minutes; slides were transferred and incubated twice more in 1X SSC/1% SDS at 50°C for 15 minutes each time. Slides were then washed by plunging up and down 4-6 times in 1X SSC at room temperature and plunging up and down 4-6 times in 0.1X SSC at room temperature. Slides were pulled out slowly from this final wash and any remaining water droplets were chased off with a compressed air duster. Slides were optionally dipped in a slide coating solution of 2% PEG (MW cut-off 2000) in 1:1 solution of acetone and toluene (see "description" column of SAMPLES table) before scanning. A complete protocol is found here: http://www.openwetware.org/wiki/Dahlquist:DNA_Microarray_Protocol
- scan protocol: Describe the scanning and image acquisition protocols, hardware, and software.
- Slides were scanned with an Axon GenePix 4000B Scanner (Molecular Devices). Preview scans were performed to manually or automatically set the PMT gain for the 635 wavelength channel and for the 532 wavelength channel; power was at 100% for both channels. For auto-PMT scans, the saturation tolerance was set to the default value of 0.05%. For the data scan, pixel size was set to 10 μm, lines to average was set to 2, and focus position was set to 0 μm. The GenePix Pro 6.1 software (v6.1.0.2) was used with the AxGenePixDemo.dll (v2.1.1.39) driver. When detecting features, the following settings were used: "Find irregular features", "Resize features during alignment to minimum diameter of 33% and maximum diameter of 200%", "Flag features that fail background threshold criteria as Not Found", "Estimate warping and rotation when finding blocks", and "Automatic Image Registration: Max translation of 10 pixels". Note that two chip types were used in this study: "GCAT" and "Ontario" (see SAMPLES table). The GCAT chips had two sets of blocks representing the full genome of S. cerevisiae at the top and bottom of the chip. The .gal file provided for this chip only had information for one set of blocks, so each of the GCAT chips had its features found twice, once for the top and once for the bottom, generating two .gpr results files (see SAMPLES table). The GCAT .gal file can be found here: https://github.com/kdahlquist/DahlquistLab/raw/master/GEO_submission/yeast_GCAT_gal.gal. The Ontario chip had a single .gal file for the entire chip; duplicate spots for genes were placed side-by-side. The Ontario .gal file can be found here: https://github.com/kdahlquist/DahlquistLab/raw/master/GEO_submission/Ontario_Y6.4Kv7.GAL. Complete details are found here: http://www.openwetware.org/wiki/Dahlquist:GenePix_Pro_Software_Protocol.
- data processing: Provide details of how data in the matrix table were generated and calculated, i.e., normalization method, data selection procedures and parameters, transformation algorithm, etc.
- A full description of the data processing is found here: http://www.openwetware.org/wiki/Dahlquist:Microarray_Data_Analysis_Workflow. Briefly, an R script was written to perform the data processing steps. Follow the links to download the script (https://github.com/kdahlquist/DahlquistLab/raw/master/normalization/GCAT-and-Ontario_normalization_script.R) and required GCAT (https://github.com/kdahlquist/DahlquistLab/raw/master/normalization/GCAT_Targets_20160616.csv) and Ontario (https://github.com/kdahlquist/DahlquistLab/raw/master/normalization/Ontario_Targets_wt-dCIN5-dGLN3-dHAP4-dHMO1-dSWI4-dZAP1_20160616.csv) "Targets" files to run it. Put the script, Targets files, and all .gpr files in the same folder to run. A script to automatically generate MA and box plots showing the results of the normalization can be downloaded from here (https://github.com/kdahlquist/DahlquistLab/raw/master/normalization/generate_MA_and_box_plots.R). The results reported here were produced with Version x64 3.1.0 of R. Version 3.20.1 of the limma package (Smyth, G. K., 2005, Limma: linear models for microarray data. In Bioinformatics and computational biology solutions using R and Bioconductor, pp. 397-420, Springer New York.) was used to perform within-chip normalization. Briefly, the script extracts the "Log Ratio (635/532)" values (column AS in Excel) of the .gpr file for each spot and removes bad spots (flagged as -50 in the .gpr file), and the control spots labeled "Arabidopsis" and "3XSSC". It performs the loess normalization using the "normexp" method. It performs between-chip normalization using the median absolute deviation (MAD). Ratios of the dye-swapped samples were corrected so that the ratio reported is experimental timepoint/t0 control. Then duplicate spots corresponding to the same gene are averaged and the data from the two chips are merged. The log2 ratios (experimental/control) are rounded to four decimal places.
The processed data file only reports genes that were present on the Ontario chip type. Genes that were only found on the GCAT chip type and not the Ontario chip type were discarded because there was not enough replicate data for those genes to make downstream analysis meaningful.
- value definition: Provide a short description of the values in the matrix table, for example: loess normalized log2 ratio (test/reference).
- loess normalized log2 ratio (experimental/control), an average of duplicate spots on the chip
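The last steps of the data processing description (dye-swap correction, averaging of duplicate spots, and rounding to four decimal places) can be illustrated with a minimal Python sketch. This is not the lab's R script, and it deliberately omits the loess and MAD normalization steps. It assumes that dye-swap correction amounts to negating the log2 ratio, which follows from log2(a/b) = -log2(b/a). The function name `process_chip` and the example values are hypothetical.

```python
from collections import defaultdict


def process_chip(spots, dye_swapped):
    """Average duplicate spots per gene after dye-swap correction.

    spots: list of (gene_id, log2_ratio) pairs; log2_ratio is None for
           spots that were removed as bad.
    dye_swapped: True if the experimental sample was labeled with Cy3.
    Returns {gene_id: mean log2 ratio (experimental/control), 4 decimals}.
    """
    # Assumption: a swapped chip reports control/experimental, so negate it
    sign = -1.0 if dye_swapped else 1.0
    by_gene = defaultdict(list)
    for gene_id, ratio in spots:
        if ratio is not None:
            by_gene[gene_id].append(sign * ratio)
    # Genes with no usable spots are simply absent (left blank downstream)
    return {gene: round(sum(vals) / len(vals), 4) for gene, vals in by_gene.items()}


# Hypothetical spot values for one dye-swapped chip
spots = [
    ("YAL001C", 0.5),
    ("YAL001C", 0.7),
    ("YAL002W", None),
    ("YAL002W", -1.23456),
]
print(process_chip(spots, dye_swapped=True))
```

A gene whose spots were all removed does not appear in the result, which matches the choice to leave NA values blank in the matrix table.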
PLATFORM
PLATFORM # If your array is not deposited in GEO, please include platform annotation columns in your matrix table (see example) and complete the fields below.
- title: Provide a unique title that describes your Platform. We suggest that you use the convention [institution/lab]-[species]-[number of features]-[version], e.g. FHCRC Mouse 15K v1.0.
- Merge of GCAT: WashU Operon Array-Ready Yeast Oligo set (Yeast_V1.1.2) and Ontario: UHNMAC Yeast 6.4 K array (Y6.4K) version 7 chips
- technology: Select the category that best describes the Platform technology: spotted DNA/cDNA, spotted oligonucleotide, in situ oligonucleotide, antibody, tissue, SARST, RT-PCR, or MPSS
- GCAT: spotted oligonucleotide, Ontario: spotted DNA/cDNA
- distribution: Microarrays are 'commercial', 'non-commercial', or 'custom-commercial' in accordance with how the array was manufactured.
- non-commercial
- organism: Identify the organism(s) from which the features on the Platform were designed or derived.
- Saccharomyces cerevisiae
- manufacturer: Provide the name of the company, facility or laboratory where the array was manufactured or produced.
- GCAT: Washington University, Ontario: Microarray Centre - University Health Network - Toronto - Canada
- manufacture protocol: Describe the array manufacture protocol. Include as much detail as possible, e.g., clone/primer set identification and preparation, strandedness/length, arrayer hardware/software, spotting protocols.
- manufacture protocol: GCAT: Operon Array-Ready Yeast Oligo set was printed on epoxy slides according to the yeast gal file for 2005-6 on the GCAT website at: http://www.bio.davidson.edu/projects/GCAT/GCATprotocols.html#yeast.
- manufacture protocol: Ontario: Microarrays are printed on UltraGAPS slides (Corning Inc.) using high-precision robotics. Printed arrays are processed following Corning's protocol and are ready-to-use.
- manufacture protocol: Ontario: 1. Acquisition of bacterial clone set — our Yeast 6.4k clone set is derived from Research Genetics Genepairs. Detailed information about the ORFs and sequences can be found at http://www.yeastgenome.org/
- manufacture protocol: Ontario: 2. Clone set is PCR amplified and DNA purified in working 96 well plates.
- manufacture protocol: Ontario: 3. The purified DNA is then transferred to 384 well source plates with DNA concentration ranging from 0.10 μg/μL to 0.25 μg/μL. These are used for spotting.
- manufacture protocol: Ontario: 4. Microarrays are printed on UltraGAPS ™ slides (Corning Inc.) using high-precision robotics (up to 48 pins or print heads). Printed arrays are processed following Corning's protocol and are ready-to-use. Up to 120 slides are printed in each batch and batches that result in greater than 95% printing fidelity are selected for post printing quality control tests (measuring spot morphology, quality of background and signal intensity in a real hybridization using yeast total RNA, vector probe labeling method and terminal transferase tailing method).
- description
- GCAT: Operon Array-Ready Yeast Oligo set was printed on epoxy slides, twice, once on the top and once on the bottom of the slide resulting in two sets of grids on the top and bottom of the slide. For each, grid rows: 4; grid columns: 4; rows: 21; columns: 22; 6772 unique genes and 620 empty spots.
- Ontario: spotted DNA/cDNA (double-spotted array containing 6240 yeast ESTs derived from PCR amplification of synthetic EST clones, plus 320 control spots of Arabidopsis totalling 6.4K). Grid rows: 12; grid columns: 4; rows: 17; columns: 16; grid to grid distance: 4500 μm; spot to spot distance: 200 μm; spot size: 100 μm.
- support
- glass
- coating
- GCAT: epoxy; Ontario: amino silane
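The grid dimensions given in the description field can be checked with simple arithmetic: the number of spots in one set of grids is grid rows × grid columns × rows × columns. The sketch below applies this to both chip types. The function name `spots_per_layout` is hypothetical, and the numbers are taken from the description entries above.

```python
def spots_per_layout(grid_rows, grid_cols, rows, cols):
    """Total spots in a layout of grid_rows x grid_cols blocks of rows x cols spots."""
    return grid_rows * grid_cols * rows * cols


# GCAT: one set of grids (the layout is printed twice per slide)
gcat_spots = spots_per_layout(4, 4, 21, 22)
# Ontario: the whole chip
ontario_spots = spots_per_layout(12, 4, 17, 16)

print(gcat_spots)     # 7392
print(ontario_spots)  # 13056

# The GCAT count agrees with 6772 unique genes plus 620 empty spots
print(gcat_spots == 6772 + 620)
```

The Ontario result equals the 13056 spots recorded in the 2016-06-08 notes for GPL17965.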
# Platform Column Definitions: describe the contents of each platform column of the matrix table
- header 1
- header 2