Proportal: Difference between revisions

From OpenWetWare
Jump to navigationJump to search
No edit summary
No edit summary
Line 8: Line 8:
|-
|-
! Jul 22
! Jul 22
| Finish the modification of Proportal DB Schema. Re-create all tables in a new database with all the missing foreign keys and their cascade properties. || The new database schema has been created by adding all missing foreign keys in the current version. A number of errors are to be fixed in next week to synchronize the datasets stored in the development and production databases. || Add your comments
| Finish the modification of Proportal DB Schema. Re-create all tables in a new database with all the missing foreign keys and their cascade properties. || The new [[Proportal DB Schema]] has been created by adding all missing foreign keys in the current version. A number of errors are to be fixed in next week to synchronize the datasets stored in the development and production databases. || Add your comments
|-
|-
! Jul 29
! Jul 29
| Finish the update of datasets in Proportal public website. Migrate/merge all datasets stored in the development and production databases. Verify the integrity and consistency between the development and production databases and websites. || Ongoing || Add your comments
| Finish the update of datasets in Proportal public website. Migrate/merge all datasets stored in the development and production databases. Verify the integrity and consistency between the development and production databases and websites. || Refer to the [[Release Notes]] for a number of issues resolved. || Add your comments
|-
|-
! Aug 5
! Aug 5
| Finish the modification of cluster analysis using static/manual processing. || Ongoing || Add your comments
| Finish the modification of cluster analysis using static/manual processing. || Refer to the [[Release Notes]] for a number of issues resolved. || Add your comments
|-
|-
! Aug 15
! Aug 15
| Due date for submitting the database paper || TBD || Add your comments
| Due date for submitting the database paper || Refer to the [[To-do List]] for a number of issues that should be addressed before the submission. || Add your comments
|}
|}


=[[Release Notes]]=
=Proportal DB Schema=
The new [[Proportal DB Schema]] is created by adding all the missing foreign keys back into the database. Some orphan records have been identified from this process and will be fixed/removed from next release of Proportal.
 
=Release Notes=
Some issues in the current version of Proportal have been resolved. Refer to the [[Release Notes]] for more details.


=[[To-do List]]=
=[[To-do List]]=
=[[Proportal DB Schema]]=
[[Image:Ocean-DBschema-v2.gif|alt text]]
==User Module==
==Project Module==
===Table: data_project===
A list of projects
    * 72 projects, as of 07-21-2011 (To be updated: 58 in PRO DB while 72 in DEV DB)
    * Last updated: 2010-12-10
    * No foreign key
Notes
The following distinct "type" can be moved into a separated table for a clear definition,
    * cpm, Cyanophage genomes part 1 (<B>To be updated: 18 records in PRO DB, 28 in DEV DB</B>)
    * cpp, Cyanophage genomes part 2 (<B>To be upadted: 8 records in PRO DB, 11 in DEV DB</B>)
    * cps, Cyanophage genomes part 3 (2 records in both DBs)
    * ma, physiology experiments (4 records in both DBs): Light Sensing, Nitrogen Availability, Phage Infection, and Phosphate Starvation)
    * mt, expression experiment (1 record in both DBs): Microbial community gene expression in ocean surface waters
    * p, Prochlorococcus genomes(13 genomes in both DBs)
    * pb, Prochlorococcus Publications (1 record in both DBs)
    * s, Synechococcus genomes(11 genomes in both DBs)
The link for "tax_id" is defined in data_url table. 
    * type_id = 59919
    * source = tax
    * url = http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=59919
===Table: data_projectpub===
A list of publications from various projects.
    * 32 publications as of 07-21-2011
    * Foreign keys:
          o project_id: = data_project.id
          o pubmed_id: = data_publication.id
Notes
This table is used for mapping projects with related publications, which is displayed on the strain page in the "Genomes" section, for instance: [http://proportal.mit.edu/genome/id=1/ MED4].
===Table: data_genepub===
This table is empty. Consider to use data_publication table instead?
    * Foreign key: data_project
===Table: data_publication===
A list of publications related to Prochlorococcus, Cyanophage, and Synechococcus.
    * 2528 publications listed as of 08-1-2011 (both DEV and PRO DBs are updated)
    * Not refered by any other table
    * pubmed_id can be used as a foreign key.
    * "year": last updated 2010
===Table: data_url_map===
This table is empty.
===Table: data_url===
The list of data links or data folders.
==Meta Data Module==
===Table data_bats_ts===
Information about field investigation.
    * No foreign key
===Table: data_meta_data===
Information about field investigation for each project
    * 66 meta data sets, as of 07-22-2011
    * Foreign key: data_project.id, to be fixed.
<B>Error</B>
    * One project_id 26 is missing in data_project table.
    * Six projects defined in data_project table (all in year 2008) do not have meta data defined in  this table.
==Genome Data Module==
===Table: data_scaffold===
A list of strains/genomes used in various projects.
    * Last updated: 12-10-2010
    * 213 strains, as of 07-22-2011
    * Fireigh key: data_project.id
Questions
    * "refseg_id" not defined
    * "seq" field can be removed because its content is further defined in data_dna and data_protein tables.
===Table: data_position===
List of start and end positions of gene/DNA for each strain defined in Table data_scaffold.
    * 67516 pair of positions, as of 07-22-2011
    * 9 types of sequences are defined: 16s, 23s, 5s, as, m, n, orf, ps, t
    * Foreign key: data_scaffold.id.
===Table: data_dna===
A list of DNA sequesnces in correspondence to sequence postion information defined in data_position table.
    * 67516 pieces of DNA sequences stored, as of 07-22-2011
    * Three foreign keys: data_position.id, data_scaffold.id and data_protein.id
Error
    * Foreign key  pos_id has error:
          o Two position ids in data_position table: 37163 and 46814 are missing in this table
          o Two pos_id: 36978 and 37113 do not exist in data_position table.
===Table: data_protein===
A list of protein sequences.
    * 65909 proteins defined, as of 07-22-2011 (1607 DNA sequences are not present in this table)
    * Two foreign keys: data_scaffold.id and data_protein.id
Notes
    * "cluster_id" should be removed from this table
===Table: data_ortholog===
Protein orthologs.
    * 830944 orthology pairs defined, as of 07-22-2011
    * Foreign keys: protein_id and ortholog_id
===Table: data_protein_xref===
Definition: ?
    * 36774 records stored, as of 07-22-2011
    * Foreign key: data_protein.id, to be fixed,
Error
    * Two records have missing protein_id: 36950 and 45482 in data_protein table
==Affychip Expression Module==
===Table: data_affychip===
Information about each affychip used.
    * 1 chip defined, as of 07-22-2011 (same for DEV and PRO DBs)
    * No foreign key
===Table: data_affyexp===
A list of affychip experiments.
    * 20 affychip experiments, as of 07-22-2011 (same for DEV and PRO DBs)
    * Foreign key: project_id, only three projects involved affychip experiments.
===Table: data_affyprobeset===
A list of probe sets for various affychip experiments.
    * 9966 records, as of 07-22-2011
    * Three foreign keys:
          o chip_id:
          o scaffold_id: has missing keys
          o feature_id: not defined
Notes
    * feature_id not defined, but is actually in one to one correspondence to gene_id in data_diel table, which is further linked to data_protein table using protein_id.
    * Use "begin" and "end" to match DNA\gene\protein?
===Table: data_affyprobe===
A list of probes for various affychip experiments.
    * 89749 records, as of 07-22-2011
    * Foreign key: probeset_id
===Table: data_affydata===
The expression results of Affychip experiments.
    * 110848 records, as of 07-22-2011
    * Foreign keys,
          o exp_id
          o probeset_id
Notes
    * No DNA\gene\protein info, use probeset_id?
===Table: data_diel===
The mapping between gene id and protein id, which is used in Affychip expression experiments.
    * 1695 records, as of 07-22-2011
    * Foreign keys,
          o probeset_id
          o protein_id
          o gene_id: not defined
Notes
    * gene_id not defined, but is actually in one to one correspondence to feature_id in data_probeset
===Table: data_dieltimepoint===
Time courses of Affychip experiemnts.
    * 42375 records, as of 07-22-2011
    * Foreign key: diel_id
==Cog Module==
===Table: data_cog_fun===
A list of Cog gene functions.
    * 24 funtion categoriess, as of 07-22-2011
    * No foreign key
===Table: data_cog===
A list of Cog genome annotations
    * 4874 records, as of 07-22-2011
    * Foreign key: data_cog_fun.funcode ?
Notes
    * data_cog_fun.funcode can't be regarded as a foreign key becuase some of funcodes in this table are missing in data_cog_fun table.
===Table: data_protein_cog===
The mapping between Cog genome and proteins.
    * 18498 records, as of 07-22-2011
    * Foreign keys: data_protein.id and data_cog.id
==Microarray Module==
===Table: data_gos_site===
A list of Gos field experiments, such as sites of experiments etc.
    * 78 records, as of 07-22-2011
    * No foreign key
===Table: data_gos_read===
A list of field reads for various Gos experiments.
    * 9893120 records, as of 07-22-2011
    * Foreign key: data_gos_site.id, no error
===Table: data_gos_to_protein===
The mapping between Gos genomes and proteins.
    * 926072 records, as of 07-22-2011
    * Foreign keys:
          o data_protein.id, has error, to be fixed
          o data_gos_read.id, has error, to be fixed
Error
    * The foreign key: read_id=0 is not defined in data_gos_read table for id=1 and id=705172 in this table
    * The foreign key: protein_id=0 is not defined in data_protein table for id=1 and id=705172 in this table
===Table: data_gos_blastn===
A list of sequences from Gos experiments.
    * 8666847 records, as of 07-22-2011
    * Foreign keys:
          o data_scaffold.id, has error, to be fixed
          o data_gos_read.id, has error, to be fixed
Error
    * The foreign key: scaffold_id=0 is not defined in data_gos_read table for 211 records in this table
    * The foreign key: read_id=0 is not defined in data_gos_read table for 56438 records in this table
==Cluster Module==
===Table: data_protein_cluster===
A list of protein clusters.
    * 5597 records, as of 07-22-2011
    * No foreign key
Notes
    * Two distinct "type": phCOG and CyCog
    * "gene_name" not in use
===Table: data_protein_cluster_synonym===
The table is empty.
===Table: data_protein_cluster_xref===
    * 1100 records, as of 07-22-2011
    * Foreign key: data_protein_cluster.id, has error, to be fixed
Notes
    * Only one "type": c
    * "xref": COG reference id, which may correspond to multiple cluster ids
Error
    * The foreign key: some cluster_ids are not defined in data_protein_cluster table for about 880 records.
===Table: data_protein_cluster_cog===
This table is empty.
===Table: data_clusterlink===
A list of pairs of clusters.
    * 71 records, as of 07-22-2011
    * Foreign key: data_protein_cluster.id,has error, to be fixed
Notes
    * "evidence" is not in use
Error
    * The foreign key: cluster_id=0 is not defined in data_protein_cluster table.

Revision as of 08:07, 8 August 2011

  1. ProPortal Prochlorococcus Portal is a web analytical tool for Prochlorococcus, a model system for Integrative Systems Biology.

Project Timelines

Schedule
Timelines Plan Status Comments
Jul 22 Finish the modification of Proportal DB Schema. Re-create all tables in a new database with all the missing foreign keys and their cascade properties. The new Proportal DB Schema has been created by adding all missing foreign keys in the current version. A number of errors are to be fixed in next week to synchronize the datasets stored in the development and production databases. Add your comments
Jul 29 Finish the update of datasets in Proportal public website. Migrate/merge all datasets stored in the development and production databases. Verify the integrity and consistency between the development and production databases and websites. Refer to the Release Notes for a number of issues resolved. Add your comments
Aug 5 Finish the modification of cluster analysis using static/manual processing. Refer to the Release Notes for a number of issues resolved. Add your comments
Aug 15 Due date for submitting the database paper Refer to the To-do List for a number of issues that should be addressed before the submission. Add your comments

Proportal DB Schema

The new Proportal DB Schema is created by adding all the missing foreign keys back into the database. Some orphan records have been identified from this process and will be fixed/removed from next release of Proportal.

Release Notes

Some issues in the current version of Proportal have been resolved. Refer to the Release Notes for more details.

To-do List