MicrobeDB


MicrobeDB is the local storage repository for all RefSeq microbial genomes (Bacteria and Archaea) from NCBI. All genomes are downloaded from NCBI once a month, stored locally on Edhar, and parsed into a MySQL database. A Perl API is available for querying the MySQL database.

MicrobeDB Flat Files

All genomes on the NCBI FTP site are downloaded once a month and stored in a new folder on Edhar under: /var/opt/iseem/NCBI_genomes/.

Each monthly folder is named "Bacteria_YYYY-MM-DD", where the date is the download date. The most recent download directory is always symlinked to "Bacteria".
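
As a minimal sketch of how a script might locate these folders, the Python snippet below lists the dated download directories and resolves the "Bacteria" symlink. The function names are hypothetical and the root path is passed in as a parameter; only the "Bacteria_YYYY-MM-DD" naming and the "Bacteria" link come from the description above.

```python
import os
import re
from datetime import date

# Folder naming described above: "Bacteria_YYYY-MM-DD".
DATED_DIR = re.compile(r"^Bacteria_(\d{4})-(\d{2})-(\d{2})$")


def list_downloads(root):
    """Return (download_date, path) pairs for every dated folder, oldest first."""
    found = []
    for name in os.listdir(root):
        match = DATED_DIR.match(name)
        path = os.path.join(root, name)
        if match and os.path.isdir(path):
            year, month, day = (int(part) for part in match.groups())
            found.append((date(year, month, day), path))
    return sorted(found)


def current_download(root):
    """Resolve the "Bacteria" symlink to the folder it points at."""
    return os.path.realpath(os.path.join(root, "Bacteria"))
```

On Edhar, the root argument would presumably be "/var/opt/iseem/NCBI_genomes".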

MicrobeDB MySQL

  • MySQL database that stores genome project, replicon (chromosome and plasmid), and gene information.
  • The database can be accessed using:
    • username: perlapi
    • password: microbedb
    • database name: microbedb

Connect via the command line: "mysql -uperlapi -pmicrobedb"

or use the web interface: phpMyAdmin

Information

  • Version - Each monthly download from NCBI is given a new version number
    • Advantages
      • Data will not change as long as you always use the same MicrobeDB version number
      • Version date can be cited for any method publications
    • Disadvantages
      • Data is redundant in the database (e.g. multiple versions of the same gene)
      • A version number (version_id) must always be specified when retrieving information; otherwise multiple copies of each record will be returned
  • Genomeproject
    • Contains information about the genome project and the organism that was sequenced
    • E.g. taxon_id, org_name, lineage, gram_stain, genome_gc, patho_status, disease, genome_size, pathogenic_in, temp_range, habitat, shape, arrangement, endospore, motility, salinity, etc.
    • Each genomeproject contains one or more replicons
  • Replicon
    • A chromosome or a plasmid
    • E.g. rep_accnum, definition, rep_type, rep_ginum, cds_num, gene_num, protein_num, genome_id, rep_size, rna_num, rep_seq (complete nucleotide sequence)
    • Each replicon contains one or more genes
  • Gene
    • Contains gene annotations as well as the DNA sequence and, for protein-coding genes, the protein sequence
    • E.g. gid, pid, protein_accnum, gene_type, gene_start, gene_end, gene_length, gene_strand, gene_name, locus_tag, gene_product, gene_seq, protein_seq
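
To illustrate why version_id matters, here is a small self-contained Python sketch. It uses an in-memory SQLite table as a stand-in for the MySQL database. The table layout (a single gene table carrying a version_id column), the locus tag, and the rows are simplified assumptions for illustration, not the actual MicrobeDB schema.

```python
import sqlite3

# Stand-in for the MySQL database: an in-memory SQLite table.
# ASSUMPTION: the real schema is richer; only the column names
# version_id, locus_tag and gene_name are taken from the text above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (version_id INTEGER, locus_tag TEXT, gene_name TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [
        (1, "TAG_0001", "dnaA"),  # made-up gene, as stored by download 1
        (2, "TAG_0001", "dnaA"),  # the same gene, stored again by download 2
    ],
)


def find_gene(conn, locus_tag, version_id=None):
    """Look up a gene by locus tag, optionally pinned to one version."""
    sql = "SELECT version_id, gene_name FROM gene WHERE locus_tag = ?"
    params = [locus_tag]
    if version_id is not None:
        sql += " AND version_id = ?"
        params.append(version_id)
    return conn.execute(sql + " ORDER BY version_id", params).fetchall()


# Without a version, the gene comes back once per stored version.
unpinned = find_gene(conn, "TAG_0001")
# With a version, exactly one copy is returned.
pinned = find_gene(conn, "TAG_0001", version_id=2)
```

The same idea should carry over to the real database: add a version_id condition to every query so that results stay tied to a single monthly download.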

MicrobeDB Perl API

Information and Perl modules are located on Edhar at: /var/opt/iseem/perl_modules/MicrobeDB

For an example script, see Retrieve_16s_RNA_seqs.pl in "/var/opt/iseem/perl_modules/MicrobeDB/information/example_scripts".

Deletion of old versions

Old flat files and database records are deleted unless their version has been "saved".

To keep a version from being deleted, run the script: /var/opt/iseem/microbedb/scripts/save_version.pl