User:James Estevez/Notebook/Spring 2011: Bdellovibrio Independent Study/2011/01/28
Sequence acquisition and manipulation
This step turned out to be much easier than I expected. Only strain W will require any prediction or annotation. B. marinus SJ and Bdellovibrio HD 100 can go to pSORTdb extraction immediately, which will leave much more time for analysis.
B. bacteriovorax strain W
CloVR-Microbe requires that each contig be represented by its own sequence file. The contigs from microgen are concatenated into a single .fasta file, so that file has to be broken up. This seems trivial, so it'll make a good first script. I started from the Biopython cookbook, which already has a small set of example code to modify. I pointed the script at the file, ran it, then moved the resulting contig files to another directory. Pretty simple.
- Split large file script: HTTP://openwetware.pastebin.com/V8AW6bz4
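The splitting step described above can be sketched roughly as follows. This is not the script linked above; it is a dependency-free stand-in that uses plain Python file handling rather than Biopython. The function name split_fasta and the numbered output filenames are assumptions made for illustration.

```python
import os
import re


def split_fasta(multi_fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file.

    Returns the list of output paths in the order the records appear.
    Hypothetical helper, not the notebook's actual script.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(multi_fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # A header line starts a new record: close the previous file.
                    if out is not None:
                        out.close()
                    # Use the first whitespace-delimited token of the header as the ID.
                    tokens = line[1:].split()
                    record_id = tokens[0] if tokens else "record"
                    # Replace characters that are awkward in filenames.
                    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", record_id)
                    # A running counter keeps names unique and preserves order.
                    path = os.path.join(
                        out_dir, f"{len(written) + 1:04d}_{safe_id}.fasta"
                    )
                    out = open(path, "w")
                    written.append(path)
                # Lines before the first header are skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

A call such as split_fasta("strain_W_contigs.fasta", "contigs") would produce one .fasta file per contig in the contigs directory; both names here are placeholders, not the actual files used.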
B. marinus SJ
Turns out the annotations were available on GenBank after all. I'm going to split this file anyway, just to leave my options open for the R stage of the computation. Same script as above, with the title and filename modified.
Bdellovibrio HD 100
Already available.
Next steps
- Set up CloVR-Microbe for strain W
- Convert SJ and HD 100 to amino acid (AA) sequences, or locate them online
- Set up pSORTdb server, on EC2 or locally
- Literature review for expanded cost and location analysis
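For the conversion-to-AA step in the list above, a minimal sketch of translating a coding sequence might look like the following. It assumes the input is already an in-frame coding sequence on the forward strand, and it uses the standard codon assignments. The names CODON_TABLE and translate_cds are hypothetical. It does not handle alternative bacterial start codons such as GTG or TTG, which are read as methionine at the start of a gene, so those would need separate treatment.

```python
# Build the standard codon table from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(nucleotides):
    """Translate an in-frame coding sequence into a protein string.

    Translation stops at the first stop codon. Codons containing
    ambiguous bases are reported as 'X'. Trailing bases that do not
    form a full codon are ignored.
    """
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCAAATAA"))  # MAK
```

If the GenBank records already carry protein translations for the annotated genes, locating those online would make this step unnecessary.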