GEO Features Analysis
From OpenWetWare
BENEFIT ANALYSIS OF THE GENE EXPRESSION OMNIBUS (GEO)
- Architecture
- Datasets are organized by Series, Samples, and Platforms, which is advantageous for sorting data.
- Data can be browsed if one does not want to perform a search.
- Datasets are encouraged to be MIAME-compliant, which helps standardize public data.
- Easy-to-use interface with a JavaScript help box (the help box changes when the cursor is moved over different links)
- .CEL files clearly labeled and downloadable directly off results page of search
- Search
- Search functions include limits (Boolean Operators e.g. AND, OR, NOT), history, clipboard, and help
- Search available by platform, series or sample, as well as organism
- EVS (Enterprise Vocabulary Service) integrated search
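The Boolean and field-limited searching described above can be sketched as a query string for the NCBI E-utilities `esearch` endpoint against the GEO DataSets database (`db=gds`). This is a minimal sketch, not GEO's own code: the helper names `build_geo_query` and `build_esearch_url` are hypothetical, the search terms are illustrative, and the field tags `[Organism]` and `[Entry Type]` are assumed to be the ones accepted by GEO DataSets. The block only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint; "gds" is the GEO DataSets database.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_query(organism, entry_type, keyword, exclude=None):
    """Combine field-limited terms with Boolean operators (hypothetical helper)."""
    terms = [
        f'"{organism}"[Organism]',      # limit by organism
        f"{entry_type}[Entry Type]",    # assumed values: gse, gsm, gpl, gds
        f'"{keyword}"',                 # free-text phrase
    ]
    query = " AND ".join(terms)
    if exclude:
        query += f' NOT "{exclude}"'    # drop records matching this phrase
    return query


def build_esearch_url(query, retmax=20):
    """Return the esearch URL for a query without sending a request."""
    params = {"db": "gds", "term": query, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


query = build_geo_query("Homo sapiens", "gse", "breast cancer", exclude="cell line")
url = build_esearch_url(query)
print(query)
print(url)
```

Keeping query construction separate from URL construction makes it easy to inspect the Boolean expression before any request is sent.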
- DataSets
- Each dataset given a unique accession number for easy search access.
- Entrez GEO search works in conjunction with other NCBI databases, including GenBank, PubMed, Gene, UniGene, OMIM, HomoloGene, Taxonomy, SAGEmap, and Map Viewer.
- “Hold until published” feature available for private data
- Login access can be given to reviewers for private data
- DataSet SOFT files available for download contain a table with each patient and the expression values for each probe set.
- Clearly lists the array platform, PubMed ID, and organism studied
- Web-based Analysis
- Cluster heat maps: Hierarchical and K-means clustering algorithms provided. Regions of interest can be selected, enlarged, downloaded, plotted as line charts, or linked directly to Entrez GEO-Profiles.
- Query subset A vs. B: Helps identify genes that display differences in expression level between two sets of samples in a DataSet, calculated using t-tests or fold difference. Genes that meet the defined significance criteria are displayed in Entrez GEO-Profiles.
- Subset effects: Retrieves all profiles flagged as having significant effects with respect to a specific experimental variable, e.g. 'age' or 'strain'.
- Value distribution: plots average gene expression for patients for comparison and outlier detection
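The "Query subset A vs. B" comparison above can be illustrated with a small sketch that scores each gene by a t-statistic and by fold difference between two sample subsets. Several things here are assumptions rather than GEO's actual implementation: Welch's unequal-variance form of the t-test is used, genes are selected by thresholds on the statistic rather than on a p-value, and the gene names, expression values, and cut-offs are made-up toy values.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups of expression values."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def fold_difference(a, b):
    """Ratio of mean expression in subset B to subset A."""
    return mean(b) / mean(a)


def select_genes(profiles, min_abs_t=4.0, min_fold=2.0):
    """Return genes passing either the t or the fold threshold (hypothetical cut-offs)."""
    hits = []
    for gene, (subset_a, subset_b) in profiles.items():
        t = welch_t(subset_a, subset_b)
        fold = fold_difference(subset_a, subset_b)
        if abs(t) >= min_abs_t or fold >= min_fold or fold <= 1 / min_fold:
            hits.append(gene)
    return hits


# Toy data: gene -> (values in subset A, values in subset B)
profiles = {
    "GENE_1": ([10.0, 12.0, 11.0], [20.0, 22.0, 21.0]),
    "GENE_2": ([5.0, 6.0, 7.0], [5.5, 6.5, 6.0]),
}

print(select_genes(profiles))
```

In this toy data only `GENE_1` shows a clear shift between the subsets; `GENE_2` has identical means in both groups and is not selected.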
- Deposit formats supported
- SOFTtext file
- Multiple platform, sample and series data may be submitted as concatenated SOFT-formatted records.
- Suitable when the data are already in a database, or when there are many samples to submit.
- SOFT (Simple Omnibus Format in Text) is designed for rapid batch submission (and download) of data
- SOFTmatrix (spreadsheet or text)
- Multiple sample and series data may be submitted as a SOFT-formatted data matrix.
- Suitable when data from multiple hybridizations are contained in one spreadsheet, e.g. multi-chip Affymetrix Pivot files.
- Less suitable if sample data tables have many columns, e.g. GenePix data
- MINiML
- Multiple platform, sample and series data may be submitted as MINiML XML files.
- MINiML (MIAME Notation in Markup Language) is a data exchange format for microarray gene expression data and other types of high-throughput molecular abundance data.
- MAGE-ML
- Can accept MAGE-ML formatted data, but because MAGE-ML data can be structured in different ways, the format and content of the files should be reviewed to confirm that they are valid XML documents containing all the minimum information required for submission.
- Processing times for MAGE-ML submission can be substantially longer than for our other deposit routes
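To make the SOFT text format listed above more concrete, the following sketch parses a small SOFT-style record. It assumes the commonly documented line prefixes: `^` opens an entity, `!` carries an attribute, `#` describes a data column, and the tab-delimited data table sits between `!sample_table_begin` and `!sample_table_end`. The sample name, attribute values, probe identifiers, and numbers are invented for illustration, and `parse_soft` is a hypothetical helper rather than an official GEO tool.

```python
# A tiny SOFT-style record with made-up content.
SOFT_TEXT = """\
^SAMPLE = example_sample_1
!Sample_title = hypothetical control replicate
!Sample_organism_ch1 = Homo sapiens
#ID_REF = probe set identifier
#VALUE = normalized signal
!sample_table_begin
ID_REF\tVALUE
probe_a\t120.5
probe_b\t98.1
!sample_table_end
"""


def parse_soft(text):
    """Parse concatenated SOFT-style records into a list of dictionaries."""
    records, current, in_table = [], None, False
    for line in text.splitlines():
        if not line.strip():
            continue
        lower = line.lower()
        if line.startswith("^"):                      # new entity
            kind, _, name = line[1:].partition("=")
            current = {"entity": kind.strip(), "name": name.strip(),
                       "attributes": {}, "columns": {}, "header": [], "rows": []}
            records.append(current)
            in_table = False
        elif current is None:
            continue                                  # ignore text before first entity
        elif line.startswith("!") and lower.endswith("_table_begin"):
            in_table = True
        elif line.startswith("!") and lower.endswith("_table_end"):
            in_table = False
        elif in_table:                                # tab-delimited data table
            cells = line.split("\t")
            if not current["header"]:
                current["header"] = cells
            else:
                current["rows"].append(dict(zip(current["header"], cells)))
        elif line.startswith("!"):                    # attribute, may repeat
            key, _, value = line[1:].partition("=")
            current["attributes"].setdefault(key.strip(), []).append(value.strip())
        elif line.startswith("#"):                    # column description
            key, _, value = line[1:].partition("=")
            current["columns"][key.strip()] = value.strip()
    return records


records = parse_soft(SOFT_TEXT)
print(records[0]["name"], records[0]["rows"])
```

Because each `^` line starts a new record, the same function handles several platform, sample, and series records concatenated into one file.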
- Our database
- Should be able to search data by gene and by biomarker (including whether they are up-regulated or down-regulated) for cross-study comparisons.
- Search should be integrated with EVS
- List co-expressed genes and locations on chip
- No external data: .CEL files and data should be on FTP for easy access
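One possible shape for the gene and biomarker search requirement above is a small in-memory index keyed by gene, where each entry records the study, the direction of regulation, and where the raw .CEL file lives. This is only a sketch under stated assumptions: the class and field names (`Finding`, `ExpressionIndex`, `cel_path`) are hypothetical, the studies, genes, and paths are placeholders, and a real system would sit on a database rather than a Python dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    study: str       # study identifier (placeholder values below)
    gene: str        # gene or biomarker symbol
    direction: str   # "up" or "down"
    cel_path: str    # location of the raw .CEL file on the local FTP area


class ExpressionIndex:
    """Index findings by gene so they can be compared across studies."""

    def __init__(self):
        self._by_gene = defaultdict(list)

    def add(self, finding):
        if finding.direction not in ("up", "down"):
            raise ValueError("direction must be 'up' or 'down'")
        self._by_gene[finding.gene.upper()].append(finding)

    def search(self, gene, direction=None):
        """Return findings for a gene, optionally filtered by direction."""
        hits = self._by_gene.get(gene.upper(), [])
        return [f for f in hits if direction is None or f.direction == direction]

    def studies_agreeing(self, gene, direction):
        """List the studies that report the gene in the given direction."""
        return sorted({f.study for f in self.search(gene, direction)})


index = ExpressionIndex()
index.add(Finding("study_A", "GENE_X", "up", "ftp/study_A/sample1.CEL"))
index.add(Finding("study_B", "GENE_X", "up", "ftp/study_B/sample7.CEL"))
index.add(Finding("study_C", "GENE_X", "down", "ftp/study_C/sample3.CEL"))

print(index.studies_agreeing("gene_x", "up"))
```

Storing the file location alongside each finding reflects the requirement that .CEL files be reachable locally rather than through external links.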