User:James Estevez/Notebook/Spring 2011: Bdellovibrio Independent Study/2011/01/28

= Sequence acquisition and manipulation =

This step turned out to be much easier than I thought. Only strain W will require any prediction or annotation. B. marinus SJ and Bdellovibrio HD 100 can go to pSORTdb extraction immediately, which will leave much more time for analysis.

B. bacteriovorax strain W
CloVR-Microbe requires that each contig be represented by its own sequence file, but the contigs from microgen are concatenated into a single .fasta file, so that'll have to be broken up. This seems trivial, so it'll make a good first script. I'm working from the Biopython cookbook, so there's already a small set of example code to modify. I modified the script to point it at the file, then moved the contigs to another directory. Pretty simple.


 * Split large file script: http://openwetware.pastebin.com/V8AW6bz4
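For reference, a minimal sketch of the splitting step. This is not the pastebin script: it uses only the Python standard library rather than Biopython, and the function name `split_fasta` and the one-file-per-record naming scheme are assumptions made for illustration.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file.

    Returns the list of paths written, in input order.
    Assumption: the first whitespace-delimited token of each header
    line is a usable, unique record ID. Duplicate IDs would overwrite
    earlier files.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    try:
        with open(fasta_path) as source:
            for line in source:
                if line.startswith(">"):
                    # A new header closes the previous record's file.
                    if handle is not None:
                        handle.close()
                    tokens = line[1:].split()
                    record_id = tokens[0] if tokens else f"record_{len(written) + 1}"
                    # Replace characters that are awkward in filenames.
                    safe_id = "".join(
                        c if c.isalnum() or c in "._-" else "_" for c in record_id
                    )
                    path = out_dir / f"{safe_id}.fasta"
                    handle = open(path, "w")
                    written.append(path)
                # Lines before the first header are skipped.
                if handle is not None:
                    handle.write(line)
    finally:
        if handle is not None:
            handle.close()
    return written
```

Calling it would look something like `split_fasta("strainW_contigs.fasta", "strainW_contigs")`, where both names are placeholders for the real input file and output directory.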

B. marinus SJ
Turns out the annotations were available on GenBank after all. I'm going to split this file anyway, just to leave my options open for the R stage of the computation. Same script as above, modified title and filename.

Bdellovibrio HD 100
Already available.

= Next steps =
 * 1) Set up CloVR-Microbe for strain W.
 * 2) Convert SJ and HD 100 to amino acid (AA) sequences, or locate them online
 * 3) Set up pSORTdb server, on EC2 or locally
 * 4) Literature review for expanded cost and location analysis
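If the AA sequences for step 2 can't be located online, translating the coding sequences by hand is straightforward. The following is a rough sketch only, using the standard library: the `translate` function and the choice to mark unrecognized codons with `X` are my assumptions, and it expects in-frame coding sequences that have already been extracted. The codon-to-amino-acid mapping is the standard one, which the bacterial table (NCBI table 11) shares. Alternative start codons such as GTG or TTG, which are read as Met when they initiate a gene, are not handled here.

```python
from itertools import product

# Standard codon table, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, stop_symbol="*"):
    """Translate an in-frame nucleotide coding sequence to amino acids.

    Codons containing ambiguous bases (e.g. N) become 'X'; a trailing
    partial codon is ignored. Translation does not halt at stop codons;
    they are emitted as stop_symbol.
    """
    cds = cds.upper().replace("U", "T")
    usable_length = len(cds) - len(cds) % 3
    protein = []
    for i in range(0, usable_length, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)
```

Since the GenBank annotations normally carry protein translations for each CDS, this would only be a fallback.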