Proportal ToDoList

From OpenWetWare


Revision as of 18:37, 30 January 2012

To-do List
id | Description | Status | Comments
1 | Orphan records in DB | To be confirmed: whether to remove them or fix the wrong links. | Add your comment
2 | Add/update 13 cyanophage genome strains on the production server | To be confirmed: published or unpublished data? | Add your comment
3 | Modify the search page | Complete: systematically modified for accurate results. | Add your comment
4 | Datasets download | Complete: waiting for new datasets to be released or published. | Add your comment
5 | Datasets upload | Open for suggestions: mechanisms for incorporating community efforts. | Add your comment
6 | Pipeline for cluster analysis | Ongoing. | Add your comment
7 | Dynamic presentation of cluster network | Ongoing. | Add your comment
8 | Annotation pipeline | On hold. | Add your comment

Cluster Analysis

The current COG clustering pipeline is under review. New COG clusters are being generated on the internal development website and will soon be published on the public Proportal website.

January 30, 2012

SSSM7 is a phage; the rest are Prochlorococcus. Should SSSM7_186 be an orphan phage gene in the SSSM7 genome rather than a member of cluster 3728?

>PMED4_13831|3728 >P9303_01001|3728 >A9601_14421|3728 >SS120_13441|3728 >SSSM7_186|3728

To verify this problem, use the query:

    SELECT *
    FROM `ocean-dev`.`data_protein` A
    LEFT JOIN data_scaffold B ON A.scaffold_id = B.id
    LEFT JOIN data_project C ON C.id = B.project_id
    WHERE cluster_id = 3728;

To find all hmmscan cases, use:

    SELECT *
    FROM `ocean-dev`.`data_protein` A
    LEFT JOIN data_scaffold B ON A.scaffold_id = B.id
    LEFT JOIN data_project C ON C.id = B.project_id
    WHERE cluster_evi LIKE '%hmmscan'
    ORDER BY cluster_id;
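The mixed-membership check raised for cluster 3728 can also be sketched outside the database, working directly from the cluster headers. This is a minimal sketch, assuming each header has the form `>LOCUS|CLUSTER_ID` and that the genome name is everything before the last underscore of the locus tag. The `PHAGE_PREFIXES` set and both function names are hypothetical; only SSSM7 is identified as a phage in these notes.

```python
from collections import defaultdict

# Hypothetical set of phage genome prefixes; only SSSM7 is named as a phage in the notes.
PHAGE_PREFIXES = {"SSSM7"}


def parse_header(header):
    """Split a header such as '>SSSM7_186|3728' into (genome, locus, cluster_id)."""
    locus, cluster = header.lstrip(">").split("|")
    genome = locus.rsplit("_", 1)[0]
    return genome, locus, int(cluster)


def mixed_clusters(headers):
    """Return {cluster_id: [phage loci]} for clusters holding both phage and host genes."""
    members = defaultdict(list)
    for header in headers:
        genome, locus, cluster_id = parse_header(header)
        members[cluster_id].append((genome, locus))

    flagged = {}
    for cluster_id, genes in members.items():
        phage = [locus for genome, locus in genes if genome in PHAGE_PREFIXES]
        # Flag only clusters that mix phage and non-phage members.
        if phage and len(phage) < len(genes):
            flagged[cluster_id] = phage
    return flagged


headers = (
    ">PMED4_13831|3728 >P9303_01001|3728 >A9601_14421|3728 "
    ">SS120_13441|3728 >SSSM7_186|3728"
).split()
print(mixed_clusters(headers))  # {3728: ['SSSM7_186']}
```

A check like this only reports candidates; whether a flagged gene is a true orphan or a legitimate shared gene still has to be decided by looking at the evidence in the database.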

January 24, 2012

Add the P-SSP3 genome to Proportal and run the cluster pipeline again. Include the following genomes in the output:

76 P-GSP1
54 P-HP1
75 P-RSP2
55 P-RSP5
71 P-SSP10
72 P_SSP6_G2088
24 P_SSP7
57 P-SSP9_G2089
49 SYN5_gp01 [NC_009531]
56 P-SSP2

Should 58 P-SSP5 (the old P-SSP3) be removed from Proportal?

Annotation Pipeline

October, 2011

Blast2GO provides another annotation pipeline.

B2G4Pipe (the Blast2GO pipeline version) runs Blast2GO without a graphical interface.

For more information, refer to http://www.blast2go.com/b2glaunch/resources

September 30, 2011

Kat: Since Matt already offered his pipeline and it sounded like it has been continuously maintained and developed, it does sound like a good option. However, pay attention to how they train the gene calling program and what program(s) are used. The old method (described in the T4 paper) was dependent on a gene calling program, GeneMark. I think Matt's pipeline's improvement was mostly on the start sites... But it's perhaps not that critical to get the start sites right depending on the focus of your project.

The general idea of a pipeline is simple if you'd rather build one yourself:

1. Evaluate the gene calling programs and figure out the best way to train the programs for phage genomes.
2. Combine the results into a final set.
3. Filter false positives. For Prochlorococcus genomes, I filter the short orphan gene models (< 50 aa without any homologs in sequenced genomes).
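The false-positive filter described for Prochlorococcus genomes can be sketched as follows. This is only an illustration, assuming each predicted gene carries its protein length and a count of homologs found in other sequenced genomes. The `GeneModel` record, its field names, and `filter_short_orphans` are hypothetical; the 50 aa threshold is the one quoted above.

```python
from dataclasses import dataclass

MIN_ORPHAN_LENGTH_AA = 50  # threshold quoted in the notes for Prochlorococcus genomes


@dataclass
class GeneModel:
    """Hypothetical record for one predicted gene; field names are illustrative."""
    gene_id: str
    protein_length_aa: int
    homolog_count: int  # homologs found in other sequenced genomes


def filter_short_orphans(models, min_length=MIN_ORPHAN_LENGTH_AA):
    """Drop gene models that are both short (< min_length aa) and without homologs."""
    return [
        m for m in models
        if not (m.protein_length_aa < min_length and m.homolog_count == 0)
    ]


models = [
    GeneModel("gene_a", 42, 0),   # short orphan: removed
    GeneModel("gene_b", 42, 3),   # short but has homologs: kept
    GeneModel("gene_c", 310, 0),  # long orphan: kept
]
kept = filter_short_orphans(models)
print([m.gene_id for m in kept])  # ['gene_b', 'gene_c']
```

Both conditions must hold for a model to be dropped, so short genes with homologs and long orphans survive the filter.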

For step 1, this has to be a continuous effort, and it's the most time-consuming step: new programs and better algorithms continue to be developed, so any annotation pipeline requires constant maintenance and re-evaluation.

September 21, 2011

Simon: I met Matt Henn last Friday and we talked about the phage annotation pipeline. We can send them our sequences for annotation, but both of us would prefer to have the pipeline independent. The problem is that there are in-house dependencies linked to the annotation pipeline, so to make it public we would need to remove or move them. Matt estimates that it could take 3-4 months of work for one person.

September 9, 2011

Katya: Would you guys be available next week to discuss setting up a pipeline for reannotating some of our newer phages, e.g. the strange new siphos, which were pitifully annotated by the Broad pipeline? (I'd also like to revisit a couple of the myos that were annotated by Matt's group once we have a pipeline we're happy with in place.)

Data Download

September 30, 2011

We should add the iron microarray data since it is published. The Supp Info of the paper does not include the entire microarray dataset, only the differentially expressed genes in MED4/MIT9313.

Here's the data as log2 fold change. The 70 (and 72) hour time points come after an iron rescue of the experimental (-Fe) treatment.
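For anyone reading the dataset, a log2 fold change can be interpreted as in the sketch below. It assumes the value is the ratio of expression in the -Fe treatment to a control at the same time point; the notes do not state the direction of the ratio or any normalization, and the function name and numbers are purely illustrative.

```python
import math


def log2_fold_change(treatment, control):
    """Return log2(treatment / control); both values must be positive."""
    if treatment <= 0 or control <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(treatment / control)


print(log2_fold_change(400.0, 100.0))  # 2.0  (four-fold increase)
print(log2_fold_change(50.0, 100.0))   # -1.0 (two-fold decrease)
```

On this scale, 0 means no change, and each unit corresponds to a doubling or halving.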

September 23, 2011

The data posted for the different papers should look much more professional, or we should take it down. For one thing, the file names are hokey and not transparent (that would be easy to fix).

More importantly, the spreadsheets for the temp and light data have those messy graphs on them; we should delete the graphs. There is no annotation on the spreadsheets, so they would not be useful to anyone, and they don't have units. They also have too many significant figures. They are just not ready for the public eye: too "raw" to have out there for the whole world to see.

The data we have under the different publications: http://proportal.mit.edu/download/ We probably should take some of it down for now until we can figure out how to clean it up. We should discuss in the next lab meeting.

Data Upload

A number of new strains should be uploaded into the DB. Refer to the Strain Discussion for more detail.

Broken Links

September 30, 2011

For instance: from our UI, we can query a specific Pro/Syn/phage read, and see which genome it is recruited to and what gene(s) it overlaps with: http://proportal.mit.edu/gosread/JCVI_READ_1105499780090/

But, strangely, the FASTA report isn't displayed correctly.
