Question: ProPortal vs. IMG, CAMERA and other websites?
While there are several excellent resources for exploring and comparing microbial genomes (e.g. CyanoBase, IMG and MicrobesOnline), the unique strength of ProPortal is its comprehensive nature: it includes genomic, transcriptomic, metagenomic and population data from both domesticated and wild populations of cyanobacteria and phage.
Having all data in one place makes large-scale data retrieval and cross-linking between different types of data easier. ProPortal also provides its own gene clustering (also described in Kettler et al.), which is perhaps more suitable for the genomes in the ProPortal database. Where available, ProPortal links from the gene page to MicrobesOnline (MO), where users can browse the genomes by KEGG pathway, use MO's comparative genome browser and view the precomputed BLAST results. In addition, ProPortal links to the NCBI BLAST page so that users can run a BLAST search on the fly for the gene in view.
Question: How to use the Search page?
The current keyword search engine in ProPortal has the simplest possible implementation and is quite aggressive. It is also case sensitive. A keyword is first searched against gene name, locus tag and gb tag, and is then broken into separate terms to search the gene descriptions.
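The two-stage lookup described above can be sketched roughly as follows. This is only an illustration, not ProPortal's actual code: the `Gene` record, the `keyword_search` function, the use of substring matching, the whitespace splitting of the keyword and the toy gene records are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical record; field names are illustrative only.
    name: str
    locus_tag: str
    gb_tag: str
    description: str


def keyword_search(genes, keyword):
    """Return identifier matches first, then description matches."""
    # Stage 1: case-sensitive substring match on the identifier fields.
    id_hits = [
        g for g in genes
        if any(keyword in field for field in (g.name, g.locus_tag, g.gb_tag))
    ]
    # Stage 2 (assumed behaviour): split the keyword on whitespace and
    # match any single term against the description, case-sensitively.
    terms = keyword.split()
    desc_hits = [
        g for g in genes
        if g not in id_hits and any(t in g.description for t in terms)
    ]
    return id_hits + desc_hits


# Made-up toy records, not real ProPortal entries.
genes = [
    Gene("geneA", "LOCUS_0001", "GB_0001", "phosphate transporter"),
    Gene("geneB", "LOCUS_0002", "GB_0002", "photosystem protein"),
]

print(keyword_search(genes, "LOCUS_0002"))  # identifier hit
print(keyword_search(genes, "phosphate"))   # description hit
print(keyword_search(genes, "Phosphate"))   # no hit: search is case sensitive
```

Because matching is case sensitive, it is worth trying a keyword in the capitalization used in the annotation if the first attempt returns nothing.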
Question: Will more microarray data become available?
Microarray-based readouts of transcript levels in Prochlorococcus strains MED4 and MIT9313 exposed to various phosphate, nitrogen, iron and ambient light conditions have been integrated into ProPortal. Transcript data for changing O2/CO2 ratios will soon be added. Datasets from other groups describing transcriptional responses in Synechococcus are not currently integrated, but could be in future releases.
Question: How to understand the metagenome data?
The CAMERA dataset was the first metagenomic data set used as a model for how to store Prochlorococcus metagenomes in ProPortal. However, unlike CAMERA, ProPortal includes only the metagenome data of interest, i.e. Pro/Syn/cyanophage data.
The bar graphs provided on the website report the raw counts of reads assigned to the currently available host/phage genomes. The read counts should also be normalized to genome size. Reporting the raw read counts is intended to answer the simplest questions, such as "Is this gene/genomic region represented at all in the metagenomes?", and to give users a quick answer on whether it is worth proceeding further.
The metagenomic part of ProPortal has a very user-friendly UI but is still very much under development. For instance, from the UI we can query a specific Pro/Syn/phage read and see which genome it is recruited to and which gene(s) it overlaps, e.g. http://proportal.mit.edu/gosread/JCVI_READ_1105499780090/.
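As a rough illustration of normalizing read counts to genome size, the sketch below converts raw counts into reads per megabase of genome. The function name `reads_per_mb`, the choice of megabases as the unit, and the genome names, counts and sizes are all hypothetical; ProPortal itself reports only the raw counts.

```python
def reads_per_mb(read_counts, genome_sizes_bp):
    """Convert raw recruited-read counts to reads per megabase of genome."""
    return {
        genome: count * 1e6 / genome_sizes_bp[genome]
        for genome, count in read_counts.items()
    }


# Made-up example values, not real ProPortal data.
raw_counts = {"genome_A": 1500, "genome_B": 1500}
genome_sizes = {"genome_A": 1_500_000, "genome_B": 3_000_000}

normalized = reads_per_mb(raw_counts, genome_sizes)
print(normalized)  # {'genome_A': 1000.0, 'genome_B': 500.0}
```

In this toy case both genomes recruit the same number of reads, but after normalization the smaller genome shows twice the read density, which is why raw counts alone can mislead when genomes differ in size.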
Question: Future development?
Finally, it would be really nice if some of the features that will only appear in a future ProPortal update (e.g. phylogenetic trees for gene clusters, and linking of GOS reads to gene clusters and genomes) were actually included from the outset.