Proportal
- ProPortal Prochlorococcus Portal is a web analytical tool for Prochlorococcus, a model system for Integrative Systems Biology.
Project Timelines
Timelines | Plan | Status | Comments |
---|---|---|---|
Jul 22 | Finish the modification of Proportal DB Schema. Re-create all tables in a new database with all the missing foreign keys and their cascade properties. | The new database schema has been created by adding all missing foreign keys in the current version. A number of errors are to be fixed in next week to synchronize the datasets stored in the development and production databases. | Add your comments |
Jul 29 | Finish the update of datasets in Proportal public website. Migrate/merge all datasets stored in the development and production databases. Verify the integrity and consistency between the development and production databases and websites. | Ongoing | Add your comments |
Aug 5 | Finish the modification of cluster analysis using static/manual processing. | Ongoing | Add your comments |
Aug 15 | Due date for submitting the database paper | TBD | Add your comments |
Proportal DB Schema
User Module
Project Module
Table: data_project
A list of projects
* 72 projects, as of 07-21-2011 (To be updated: 58 in production DB) * Last updated: 2010-12-10 * No foreign key
Notes
The following distinct "type" can be moved into a separated table for a clear definition,
* cpm, Cyanophage genomes part 1 (To be updated: 18 records in PRO DB, 28 in DEV DB) * cpp, Cyanophage genomes part 2 (To be upadted: 8 records in PRO DB, 11 in DEV DB) * cps, Cyanophage genomes part 3 (2 records in both DBs) * ma, physiology experiments (4 records in both DBs): Light Sensing, Nitrogen Availability, Phage Infection, and Phosphate Starvation) * mt, expression experiment (1 record in both DBs): Microbial community gene expression in ocean surface waters * p, Prochlorococcus genomes(13 genomes in both DBs) * pb, Prochlorococcus Publications (1 record in both DBs) * s, Synechococcus genomes(11 genomes in both DBs)
The link for "tax_id" is defined in data_url table.
* type_id = 59919 * source = tax * url = http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=59919
Table: data_projectpub
A list of publications from various projects.
* 32 publications as of 07-21-2011 * Foreign keys: o project_id: = data_project.id o pubmed_id: = data_publication.id
Notes
This table is used for mapping projects with related publications, which is displayed on the strain page in the "Genomes" section, for instance: MED4.
Table: data_genepub
This table is empty. Consider to use data_publication table instead?
* Foreign key: data_project
Table: data_publication
A list of publications related to Prochlorococcus, Cyanophage, and Synechococcus.
* 2528 publications listed as of 08-1-2011 (both DEV and PRO DBs are updated) * Not refered by any other table * pubmed_id can be used as a foreign key. * "year": last updated 2010
Table: data_url_map
This table is empty.
Table: data_url
The list of data links or data folders.
Meta Data Module
Table data_bats_ts
Information about field investigation.
* No foreign key
Table: data_meta_data
Information about field investigation for each project
* 66 meta data sets, as of 07-22-2011 * Foreign key: data_project.id, to be fixed.
Error
* One project_id 26 is missing in data_project table. * Six projects defined in data_project table (all in year 2008) do not have meta data defined in this table.
Genome Data Module
Table: data_scaffold
A list of strains/genomes used in various projects.
* Last updated: 12-10-2010 * 213 strains, as of 07-22-2011 * Fireigh key: data_project.id
Questions
* "refseg_id" not defined * "seq" field can be removed because its content is further defined in data_dna and data_protein tables.
Table: data_position
List of start and end positions of gene/DNA for each strain defined in Table data_scaffold.
* 67516 pair of positions, as of 07-22-2011 * 9 types of sequences are defined: 16s, 23s, 5s, as, m, n, orf, ps, t * Foreign key: data_scaffold.id.
Table: data_dna
A list of DNA sequesnces in correspondence to sequence postion information defined in data_position table.
* 67516 pieces of DNA sequences stored, as of 07-22-2011 * Three foreign keys: data_position.id, data_scaffold.id and data_protein.id
Error
* Foreign key pos_id has error: o Two position ids in data_position table: 37163 and 46814 are missing in this table o Two pos_id: 36978 and 37113 do not exist in data_position table.
Table: data_protein
A list of protein sequences.
* 65909 proteins defined, as of 07-22-2011 (1607 DNA sequences are not present in this table) * Two foreign keys: data_scaffold.id and data_protein.id
Notes
* "cluster_id" should be removed from this table
Table: data_ortholog
Protein orthologs.
* 830944 orthology pairs defined, as of 07-22-2011 * Foreign keys: protein_id and ortholog_id
Table: data_protein_xref
Definition: ?
* 36774 records stored, as of 07-22-2011 * Foreign key: data_protein.id, to be fixed,
Error
* Two records have missing protein_id: 36950 and 45482 in data_protein table
Affychip Expression Module
Table: data_affychip
Information about each affychip used.
* 1 chip defined, as of 07-22-2011 * No foreign key
Table: data_affyexp
A list of affychip experiments.
* 20 affychip experiments, as of 07-22-2011 * Foreign key: project_id, only three projects involved affychip experiments.
Table: data_affyprobeset
A list of probe sets for various affychip experiments.
* 9966 records, as of 07-22-2011 * Three foreign keys: o chip_id: o scaffold_id: has missing keys o feature_id: not defined
Notes
* feature_id not defined * Use "begin" and "end" to match DNA\gene\protein?
Table: data_affyprobe
A list of probes for various affychip experiments.
* 89749 records, as of 07-22-2011 * Foreign key: probeset_id
Table: data_affydata
The expression results of Affychip experiments.
* 110848 records, as of 07-22-2011 * Foreign keys, o exp_id o probeset_id
Notes
* No DNA\gene\protein info, use probeset_id?
Table: data_diel
The results of Affychip time course experiments.
* 1695 records, as of 07-22-2011 * Foreign keys, o probeset_id o protein_id o gene_id: not defined
Notes
* gene_id not defined
Table: data_dieltimepoint
Time courses of Affychip experiemnts.
* 42375 records, as of 07-22-2011 * Foreign key: diel_id
Cog Module
Table: data_cog_fun
A list of Cog gene functions.
* 24 funtion categoriess, as of 07-22-2011 * No foreign key
Table: data_cog
A list of Cog genome annotations
* 4874 records, as of 07-22-2011 * Foreign key: data_cog_fun.funcode ?
Notes
* data_cog_fun.funcode can't be regarded as a foreign key becuase some of funcodes in this table are missing in data_cog_fun table.
Table: data_protein_cog
The mapping between Cog genome and proteins.
* 18498 records, as of 07-22-2011 * Foreign keys: data_protein.id and data_cog.id
Microarray Module
Table: data_gos_site
A list of Gos field experiments, such as sites of experiments etc.
* 78 records, as of 07-22-2011 * No foreign key
Table: data_gos_read
A list of field reads for various Gos experiments.
* 9893120 records, as of 07-22-2011 * Foreign key: data_gos_site.id, no error
Table: data_gos_to_protein
The mapping between Gos genomes and proteins.
* 926072 records, as of 07-22-2011 * Foreign keys: o data_protein.id, has error, to be fixed o data_gos_read.id, has error, to be fixed
Error
* The foreign key: read_id=0 is not defined in data_gos_read table for id=1 and id=705172 in this table * The foreign key: protein_id=0 is not defined in data_protein table for id=1 and id=705172 in this table
Table: data_gos_blastn
A list of sequences from Gos experiments.
* 8666847 records, as of 07-22-2011 * Foreign keys: o data_scaffold.id, has error, to be fixed o data_gos_read.id, has error, to be fixed
Error
* The foreign key: scaffold_id=0 is not defined in data_gos_read table for 211 records in this table * The foreign key: read_id=0 is not defined in data_gos_read table for 56438 records in this table
Cluster Module
Table: data_protein_cluster
A list of protein clusters.
* 5597 records, as of 07-22-2011 * No foreign key
Notes
* Two distinct "type": phCOG and CyCog * "gene_name" not in use
Table: data_protein_cluster_synonym
The table is empty.
Table: data_protein_cluster_xref
* 1100 records, as of 07-22-2011 * Foreign key: data_protein_cluster.id, has error, to be fixed
Notes
* Only one "type": c * "xref": COG reference id, which may correspond to multiple cluster ids
Error
* The foreign key: some cluster_ids are not defined in data_protein_cluster table for about 880 records.
Table: data_protein_cluster_cog
This table is empty.
Table: data_clusterlink
A list of pairs of clusters.
* 71 records, as of 07-22-2011 * Foreign key: data_protein_cluster.id,has error, to be fixed
Notes
* "evidence" is not in use
Error
* The foreign key: cluster_id=0 is not defined in data_protein_cluster table.