Proportal: Difference between revisions

From OpenWetWare
Jump to navigationJump to search
No edit summary
Line 8: Line 8:
|-
|-
! Jul 22
! Jul 22
| Finish the modification of Proportal DB Schema. Re-create all tables in a new database with all the missing foreign keys and their cascade properties. || The new database schema has been created by adding all missing foreign keys in the current version. A number of errors are to be fixed in next week to synchronize the datasets stored in the development and production databases. || NA
| Finish the modification of Proportal DB Schema. Re-create all tables in a new database with all the missing foreign keys and their cascade properties. || The new database schema has been created by adding all missing foreign keys in the current version. A number of errors are to be fixed in next week to synchronize the datasets stored in the development and production databases. || Add your comments
|-
|-
! Jul 29
! Jul 29
| Finish the update of datasets in Proportal public website. Migrate/merge all datasets stored in the development and production databases. Verify the integrity and consistency between the development and production databases. || Ongoing || NA
| Finish the update of datasets in Proportal public website. Migrate/merge all datasets stored in the development and production databases. Verify the integrity and consistency between the development and production databases. || Ongoing || Add your comments
|-
|-
! Aug 5
! Aug 5
| Finish the modification of cluster analysis using static/manual processing. || Ongoing || NA
| Finish the modification of cluster analysis using static/manual processing. || Ongoing || Add your comments
|-
|-
! Aug 15
! Aug 15
| Due date for submitting the database paper || TBD || NA
| Due date for submitting the database paper || TBD || Add your comments
|}
|}


=Proportal DB Schema=
=Proportal DB Schema=

Revision as of 13:19, 27 July 2011

  1. ProPortal Prochlorococcus Portal is a web analytical tool for Prochlorococcus, a model system for Integrative Systems Biology.

Project Timelines

Schedule
Schedule Plan Status Comments
Jul 22 Finish the modification of Proportal DB Schema. Re-create all tables in a new database with all the missing foreign keys and their cascade properties. The new database schema has been created by adding all missing foreign keys in the current version. A number of errors are to be fixed in next week to synchronize the datasets stored in the development and production databases. Add your comments
Jul 29 Finish the update of datasets in Proportal public website. Migrate/merge all datasets stored in the development and production databases. Verify the integrity and consistency between the development and production databases. Ongoing Add your comments
Aug 5 Finish the modification of cluster analysis using static/manual processing. Ongoing Add your comments
Aug 15 Due date for submitting the database paper TBD Add your comments

Proportal DB Schema

alt text

User Module

Project Module

Table: data_project

A list of projects

   * 72 projects, as of 07-21-2011
   * Last updated: 2010-12-10
   * No foreign key

Notes

   * "type": cpm, cpp, cps, ma, mt, p, pb, s
   * "tax_id": the link for "tax_id" is defined in data_url table. 

For instance,

   * type_id = 59919
   * source = tax
   * url = http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=59919

Table: data_projectpub

A list of publications from various projects

   * 32 publications as of 07-21-2011
   * No publication in this table is listed in data_publication table
   * Foreign key: data_project

Table: data_genepub

This table is empty. Consider to use data_publication table instead?

   * Foreign key: data_project

Table: data_publication

A list of publications related to Prochlorococcus, Cyanophage, and Synechococcus.

   * 2526 publications listed as of 07-21-2011
   * Not refered by any other table
   * pubmed_id can be used as a foreign key.
   * "year": last updated 2010

Table: data_url_map

This table is empty.

Table: data_url

The list of data links or data folders.

Meta Data Module

Table data_bats_ts

Information about field investigation.

   * No foreign key

Table: data_meta_data

Information about field investigation for each project

   * 66 meta data sets, as of 07-22-2011
   * Foreign key: data_project.id, to be fixed.

Error

   * One project_id 26 is missing in data_project table.
   * Six projects defined in data_project table (all in year 2008) do not have meta data defined in  this table.

Genome Data Module

Table: data_scaffold

A list of strains/genomes used in various projects.

   * Last updated: 12-10-2010
   * 213 strains, as of 07-22-2011
   * Fireigh key: data_project.id

Questions

   * "refseg_id" not defined
   * "seq" field can be removed because its content is further defined in data_dna and data_protein tables.

Table: data_position

List of start and end positions of gene/DNA for each strain defined in Table data_scaffold.

   * 67516 pair of positions, as of 07-22-2011
   * 9 types of sequences are defined: 16s, 23s, 5s, as, m, n, orf, ps, t
   * Foreign key: data_scaffold.id.

Table: data_dna

A list of DNA sequesnces in correspondence to sequence postion information defined in data_position table.

   * 67516 pieces of DNA sequences stored, as of 07-22-2011
   * Three foreign keys: data_position.id, data_scaffold.id and data_protein.id

Error

   * Foreign key  pos_id has error:
         o Two position ids in data_position table: 37163 and 46814 are missing in this table
         o Two pos_id: 36978 and 37113 do not exist in data_position table.

Table: data_protein

A list of protein sequences.

   * 65909 proteins defined, as of 07-22-2011 (1607 DNA sequences are not present in this table)
   * Two foreign keys: data_scaffold.id and data_protein.id

Notes

   * "cluster_id" should be removed from this table

Table: data_ortholog

Protein orthologs.

   * 830944 orthology pairs defined, as of 07-22-2011
   * Foreign keys: protein_id and ortholog_id

Table: data_protein_xref

Definition: ?

   * 36774 records stored, as of 07-22-2011
   * Foreign key: data_protein.id, to be fixed,

Error

   * Two records have missing protein_id: 36950 and 45482 in data_protein table

Affychip Expression Module

Table: data_affychip

Information about each affychip used.

   * 1 chip defined, as of 07-22-2011
   * No foreign key

Table: data_affyexp

A list of affychip experiments.

   * 20 affychip experiments, as of 07-22-2011
   * Foreign key: project_id, only three projects involved affychip experiments.

Table: data_affyprobeset

A list of probe sets for various affychip experiments.

   * 9966 records, as of 07-22-2011
   * Three foreign keys:
         o chip_id:
         o scaffold_id: has missing keys
         o feature_id: not defined

Notes

   * feature_id not defined
   * Use "begin" and "end" to match DNA\gene\protein?

Table: data_affyprobe

A list of probes for various affychip experiments.

   * 89749 records, as of 07-22-2011
   * Foreign key: probeset_id

Table: data_affydata

The expression results of Affychip experiments.

   * 110848 records, as of 07-22-2011
   * Foreign keys,
         o exp_id
         o probeset_id

Notes

   * No DNA\gene\protein info, use probeset_id?

Table: data_diel

The results of Affychip time course experiments.

   * 1695 records, as of 07-22-2011
   * Foreign keys,
         o probeset_id
         o protein_id
         o gene_id: not defined

Notes

   * gene_id not defined

Table: data_dieltimepoint

Time courses of Affychip experiemnts.

   * 42375 records, as of 07-22-2011
   * Foreign key: diel_id

Cog Module

Table: data_cog_fun

A list of Cog gene functions.

   * 24 funtion categoriess, as of 07-22-2011
   * No foreign key

Table: data_cog

A list of Cog genome annotations

   * 4874 records, as of 07-22-2011
   * Foreign key: data_cog_fun.funcode ?

Notes

   * data_cog_fun.funcode can't be regarded as a foreign key becuase some of funcodes in this table are missing in data_cog_fun table.

Table: data_protein_cog

The mapping between Cog genome and proteins.

   * 18498 records, as of 07-22-2011
   * Foreign keys: data_protein.id and data_cog.id

Microarray Module

Table: data_gos_site

A list of Gos field experiments, such as sites of experiments etc.

   * 78 records, as of 07-22-2011
   * No foreign key

Table: data_gos_read

A list of field reads for various Gos experiments.

   * 9893120 records, as of 07-22-2011
   * Foreign key: data_gos_site.id, no error

Table: data_gos_to_protein

The mapping between Gos genomes and proteins.

   * 926072 records, as of 07-22-2011
   * Foreign keys:
         o data_protein.id, has error, to be fixed
         o data_gos_read.id, has error, to be fixed

Error

   * The foreign key: read_id=0 is not defined in data_gos_read table for id=1 and id=705172 in this table
   * The foreign key: protein_id=0 is not defined in data_protein table for id=1 and id=705172 in this table

Table: data_gos_blastn

A list of sequences from Gos experiments.

   * 8666847 records, as of 07-22-2011
   * Foreign keys:
         o data_scaffold.id, has error, to be fixed
         o data_gos_read.id, has error, to be fixed

Error

   * The foreign key: scaffold_id=0 is not defined in data_gos_read table for 211 records in this table
   * The foreign key: read_id=0 is not defined in data_gos_read table for 56438 records in this table

Cluster Module

Table: data_protein_cluster

A list of protein clusters.

   * 5597 records, as of 07-22-2011
   * No foreign key

Notes

   * Two distinct "type": phCOG and CyCog
   * "gene_name" not in use

Table: data_protein_cluster_synonym

The table is empty.

Table: data_protein_cluster_xref

   * 1100 records, as of 07-22-2011
   * Foreign key: data_protein_cluster.id, has error, to be fixed

Notes

   * Only one "type": c
   * "xref": COG reference id, which may correspond to multiple cluster ids

Error

   * The foreign key: some cluster_ids are not defined in data_protein_cluster table for about 880 records.

Table: data_protein_cluster_cog

This table is empty.

Table: data_clusterlink

A list of pairs of clusters.

   * 71 records, as of 07-22-2011
   * Foreign key: data_protein_cluster.id,has error, to be fixed

Notes

   * "evidence" is not in use

Error

   * The foreign key: cluster_id=0 is not defined in data_protein_cluster table.