Sequence acquisition and manipulation

This step turned out to be much easier than I thought. Only strain W will require any prediction or annotation. B. marinus SJ and Bdellovibrio HD 100 can go to pSORTdb extraction immediately, which will leave much more time for analysis.

B. bacteriovorax strain W

CloVR-Microbe requires that each contig be represented by its own sequence file. The contigs from microgen are concatenated into a single .fasta file, so that’ll have to be broken up. This seems trivial, so it’ll make a good first script. The Biopython cookbook covers this, so there’s already a small set of example code to modify. I modified the script to point it towards the file, then moved the contigs to another directory. Pretty simple.

  • Split large file script: HTTP://
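For reference, here is a minimal sketch of the splitting step. It is not the Biopython cookbook script itself: it uses only the Python standard library, and the function name split_fasta and the output naming scheme (one file per record, named after the first word of the header line) are assumptions made for illustration. It also assumes record IDs are unique, since a repeated ID would overwrite the earlier file.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-record FASTA file to its own file.

    Returns the list of files written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    try:
        with open(fasta_path) as source:
            for line in source:
                if line.startswith(">"):
                    # A header line starts a new record: close the previous file.
                    if handle is not None:
                        handle.close()
                    fields = line[1:].split()
                    record_id = fields[0] if fields else f"record_{len(written) + 1}"
                    # Replace characters that are awkward in file names.
                    safe = "".join(
                        c if c.isalnum() or c in "._-" else "_" for c in record_id
                    )
                    path = out_dir / f"{safe}.fasta"
                    handle = open(path, "w")
                    written.append(path)
                # Lines before the first header (if any) are skipped.
                if handle is not None:
                    handle.write(line)
    finally:
        if handle is not None:
            handle.close()
    return written
```

Calling it would look something like split_fasta("strain_w_contigs.fasta", "contigs"), where both names are placeholders. The Biopython equivalent would loop over SeqIO.parse(path, "fasta") and call SeqIO.write on each record.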

B. marinus SJ

Turns out the annotations were available on GenBank after all. I'm going to split this file anyway, just to leave my options open for the R stage of the computation. Same script as above, with the title and filename modified.

Bdellovibrio HD 100

Already available.

Next steps

  1. Set up CloVR-Microbe for strain W.
  2. Convert SJ and HD 100 to amino acid (AA) sequences, or locate them online
  3. Set up pSORTdb server, on EC2 or locally
  4. Literature review for expanded cost and location analysis
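
If the AA conversion in step 2 ends up being done by hand, it could look roughly like the sketch below. This is an illustration under assumptions, not a settled method: the function name translate is hypothetical, the input is assumed to be an in-frame coding sequence on the forward strand, and the codon table is the standard one written in NCBI's TCAG ordering. GenBank records often already carry protein translations, so locating them may make this unnecessary.

```python
from itertools import product

# Standard genetic code in NCBI order: first base varies slowest, bases as T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip(("".join(codon) for codon in product(BASES, repeat=3)), AMINO_ACIDS)
)


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon.

    Codons containing ambiguous bases become "X". Alternative bacterial start
    codons (e.g. GTG, TTG) are NOT converted to M here; they are translated
    as ordinary internal codons.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Biopython's Seq.translate does the same job with a choice of codon tables and would probably be the better option for real data.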
