User:James Estevez/Notebook/Spring 2011: Bdellovibrio Independent Study/2011/01/22

From OpenWetWare


Friday night.

Proposal draft is up the food chain for review. Lab notebook is set up. Next, need to prep the analysis pipeline:

CloVR

  1. Getting familiar with the interface and navigating Hadoop. The VM doesn't seem to work under VirtualBox, so I need to switch to VMware for the project. Open questions:
    1. AMI prep from inside the VM?
    2. Which version?
  2. Sample datasets from CloVR and MM.
  3. Documentation's pretty thin.

Running the first 16S pipeline

  • Editing clovr_16s.config:
## Template configuration file for
#########################################################
## Input information.
## Configuration options for the pipeline.
#########################################################
[input]
GROUP_COUNT=1
FASTA_FILES=/mnt/AMP_Lung.small.fasta
MAPPING_FILE=/mnt/IGS.qmap
 
PIPELINE_NAME=clovr_16S_pipeline 
FASTA_TAG=16S_FASTA
MAPPING_TAG=MAPPING
DB_TAG=clovr-core-set-aligned-imputed-fasta
 
##########################################################
## Cluster info.
## If the cluster_tag is present, the script will first
## check for the presence of this cluster and if it's not
## running will start a cluster with the default settings
##########################################################
[cluster]
CLUSTER_NAME=local
EXEC_NODES=1
CLOVR_CONF=clovr.conf
CLUSTER_CREDENTIAL=local
 
#key=/mnt/devel1.pem
#host=localhost
 
#########################################################
## Output info.
## Specifies where locally the data will end up and also
## logging information
#########################################################
[output]
OUTPUT_DIRECTORY=/mnt/output
log_file=/mnt/clovr_16S_run.log
## the higher, the more output (3 = most verbose)
debug_level=3
 
[pipeline]
PIPELINE_TEMPLATE=clovr_16S
PIPELINE_ARGS=--FASTA_FILES=${input.FASTA_TAG} --MAPPING_FILE=${input.MAPPING_TAG} --DB_PATH=${input.DB_TAG} --GROUP_COUNT=${input.GROUP_COUNT} 
 
#prestart,prerun,postrun are all run locally. Use noop.xml for no operation
#Prestart is run before cluster start
#Possible actions: tag input data and do QC metrics
PRESTART_TEMPLATE_XML=/opt/clovr_pipelines/workflow/project_saved_templates/clovr_16S/clovr_16S.prestart.xml
#Prerun is run after cluster start but before pipeline start
#Possible actions: tag and upload data sets to the cluster
PRERUN_TEMPLATE_XML=/opt/clovr_pipelines/workflow/project_saved_templates/clovr_16S/clovr_16S.prerun.xml
#Postrun is run after pipeline completion and after download data
#Possible actions: load a local database, web browser. Reorganize data for local ergatis
POSTRUN_TEMPLATE_XML=/opt/clovr_pipelines/workflow/project_saved_templates/clovr_16S/clovr_16S.postrun.xml
  • Editing clovr_ec2.conf:
[cluster]
ami=ami-0c7b8d65
key=vappio_00
master_type=c1.xlarge
master_groups=vappio,web
 
#Uncomment to use spot pricing
master_bid_price=0.68
exec_type=c1.xlarge
exec_groups=vappio,web
 
#Uncomment to use spot pricing
exec_bid_price=0.68
 
# Maximum number of exec instances a cluster is allowed to have.
exec_max_instances=5
#availability_zone=us-east-1b
 
 
#
# Include the base clovr configuration
[]
-include /mnt/vappio-conf/clovr_base.conf
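
With both bid prices uncommented, it's worth knowing the worst-case hourly spend before starting an EC2 cluster. A rough sketch of the arithmetic, using the values from clovr_ec2.conf above. The assumptions are mine, not from the CloVR docs: one master node, at most exec_max_instances exec nodes, bids in USD per instance-hour, and no instance ever billed above its bid. Storage and data transfer are not counted, and the function name is hypothetical.

```python
# Values copied from the [cluster] section of clovr_ec2.conf above.
MASTER_BID_PRICE = 0.68    # master_bid_price
EXEC_BID_PRICE = 0.68      # exec_bid_price
EXEC_MAX_INSTANCES = 5     # exec_max_instances


def max_hourly_spend(master_bid, exec_bid, exec_max):
    """Upper bound on hourly instance cost.

    Assumes one master plus at most `exec_max` exec nodes,
    each billed at no more than its bid price.
    """
    return master_bid + exec_bid * exec_max


ceiling = max_hourly_spend(MASTER_BID_PRICE, EXEC_BID_PRICE, EXEC_MAX_INSTANCES)
print(f"Worst-case instance cost: {ceiling:.2f} per hour")
```

That works out to six c1.xlarge instances at 0.68 each as the ceiling. The actual spot price should usually sit below the bid.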

The first pass errored out. According to the mailing list, a Vappio script is needed to handle the cluster config; this set-up only started the local cluster. I expect there'll be upstream problems modifying the pipeline, too.
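
Before the next attempt, a quick local check of clovr_16s.config can at least rule out unresolved placeholders in PIPELINE_ARGS. This is only a sketch: it assumes `${section.KEY}` refers to KEY under [section] in the same file, which is my reading of the template rather than documented Vappio behaviour, and it is not expected to fix the cluster issue. The helper names (`load_config`, `expand`) are hypothetical, and the config text is an excerpt of the file above.

```python
import configparser
import re

# Excerpt of clovr_16s.config from this entry (values copied verbatim).
CONFIG_TEXT = """
[input]
GROUP_COUNT=1
FASTA_FILES=/mnt/AMP_Lung.small.fasta
MAPPING_FILE=/mnt/IGS.qmap
FASTA_TAG=16S_FASTA
MAPPING_TAG=MAPPING
DB_TAG=clovr-core-set-aligned-imputed-fasta

[pipeline]
PIPELINE_TEMPLATE=clovr_16S
PIPELINE_ARGS=--FASTA_FILES=${input.FASTA_TAG} --MAPPING_FILE=${input.MAPPING_TAG} --DB_PATH=${input.DB_TAG} --GROUP_COUNT=${input.GROUP_COUNT}
"""

# Matches ${section.KEY}; the dotted form is assumed from the template.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def load_config(text):
    """Parse INI-style text without configparser's own interpolation."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys case-sensitive (FASTA_TAG, log_file)
    parser.read_string(text)
    return parser


def expand(value, parser):
    """Replace ${section.KEY} references; return (expanded, unresolved)."""
    missing = []

    def substitute(match):
        section, key = match.group(1), match.group(2)
        if parser.has_option(section, key):
            return parser.get(section, key)
        missing.append(match.group(0))
        return match.group(0)  # leave unknown references untouched

    return PLACEHOLDER.sub(substitute, value), missing


parser = load_config(CONFIG_TEXT)
expanded, missing = expand(parser.get("pipeline", "PIPELINE_ARGS"), parser)
print(expanded)
print("unresolved:", missing)
```

An empty unresolved list means every reference in PIPELINE_ARGS points at a key that exists under the named section. It says nothing about whether the tags themselves are registered on the cluster.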

