MicrobeDB
MicrobeDB is the local storage repository of all RefSeq microbial genomes (Bacteria and Archaea) from NCBI. All genomes are downloaded monthly from NCBI, stored locally on Edhar, and parsed into a MySQL database. A Perl API is also available for querying the MySQL database.
MicrobeDB Flat Files
All genomes on the NCBI FTP site are downloaded once a month and stored in a new folder on Edhar under: /var/opt/iseem/NCBI_genomes/.
Each monthly folder is named "Bacteria_YYYY-MM-DD", where the date is the download date. The most recent download directory is always symlinked to "Bacteria".
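The folder naming scheme above can be parsed to recover the download date of each monthly folder. The following is a minimal sketch, assuming only the "Bacteria_YYYY-MM-DD" pattern described above; the function names are hypothetical and are not part of MicrobeDB.

```python
from datetime import date

PREFIX = "Bacteria_"


def snapshot_date(folder_name):
    """Return the download date encoded in a 'Bacteria_YYYY-MM-DD' name, or None."""
    if not folder_name.startswith(PREFIX):
        return None  # e.g. the plain "Bacteria" symlink
    try:
        return date.fromisoformat(folder_name[len(PREFIX):])
    except ValueError:
        return None  # suffix is not a valid date


def latest_snapshot(folder_names):
    """Return the folder name with the most recent download date, or None."""
    dated = [(snapshot_date(name), name) for name in folder_names]
    dated = [(d, name) for d, name in dated if d is not None]
    return max(dated)[1] if dated else None


# Made-up folder names for illustration only.
names = ["Bacteria", "Bacteria_2012-02-01", "Bacteria_2012-03-01"]
newest = latest_snapshot(names)
```

On Edhar itself, the names could come from os.listdir("/var/opt/iseem/NCBI_genomes/"), and the result should match the target of the "Bacteria" symlink.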
MicrobeDB MySQL
- A MySQL database stores genome project, replicon (chromosome and plasmid), and gene information.
- The database can be accessed using:
username: perlapi
password: microbedb
database name: microbedb
Connect via command line: "mysql -uperlapi -pmicrobedb"
or using the web interface: phpMyAdmin
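For scripted, non-interactive access, the same command-line client can be given a single query to run. The sketch below only builds the argument list from the credentials listed above; the helper name is hypothetical, and because no host is given in this page, none is passed (the mysql client default is assumed).

```python
import shlex

# Credentials and database name as listed above. No host is specified in the
# text, so the mysql client default is assumed.
USER, PASSWORD, DATABASE = "perlapi", "microbedb", "microbedb"


def mysql_command(query):
    """Build the argument list for running one query with the mysql client."""
    # --batch prints tab-separated rows; -e executes the statement and exits.
    return ["mysql", f"-u{USER}", f"-p{PASSWORD}", "--batch", "-e", query, DATABASE]


cmd = mysql_command("SHOW TABLES")
print(shlex.join(cmd))  # the equivalent shell command line
```

On a machine with the mysql client and access to the server, the list can be passed to subprocess.run(cmd, capture_output=True, text=True) to collect the output.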
Information
- Version: each monthly download from NCBI is given a new version number
  - Advantages
    - Data will not change as long as you always use the same MicrobeDB version number
    - The version date can be cited in method publications
  - Disadvantages
    - Data is redundant in the database (e.g. multiple versions of the same gene)
    - A version number (version_id) must always be specified when retrieving information; otherwise, a copy from every stored version is returned
- Genomeproject
  - Contains information about the genome project and the organism that was sequenced
  - E.g. taxon_id, org_name, lineage, gram_stain, genome_gc, patho_status, disease, genome_size, pathogenic_in, temp_range, habitat, shape, arrangement, endospore, motility, salinity, etc.
  - Each genomeproject contains one or more replicons
- Replicon
  - A chromosome or plasmid
  - E.g. rep_accnum, definition, rep_type, rep_ginum, cds_num, gene_num, protein_num, genome_id, rep_size, rna_num, rep_seq (complete nucleotide sequence)
  - Each replicon contains one or more genes
- Gene
  - Contains gene annotations and the DNA sequence, plus the protein sequence for protein-coding genes
  - E.g. gid, pid, protein_accnum, gene_type, gene_start, gene_end, gene_length, gene_strand, gene_name, locus_tag, gene_product, gene_seq, protein_seq
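The effect of omitting version_id, and the genomeproject, replicon, gene hierarchy, can be illustrated with a small stand-in database. This is a sketch only: it uses an in-memory SQLite database instead of the real MySQL server, the lower-case table names and the link columns gp_key and rep_key are assumptions made for the example (the actual key columns are not listed on this page), and all rows are made up.

```python
import sqlite3

# Stand-in for the MySQL database. Table names and the link columns
# gp_key / rep_key are assumptions for this sketch, not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genomeproject (gp_key INTEGER, version_id INTEGER, org_name TEXT);
CREATE TABLE replicon (rep_key INTEGER, gp_key INTEGER, version_id INTEGER,
                       rep_type TEXT);
CREATE TABLE gene (rep_key INTEGER, version_id INTEGER,
                   locus_tag TEXT, gene_name TEXT);
""")

# The same organism, replicon and gene stored under two monthly versions.
for version in (1, 2):
    conn.execute("INSERT INTO genomeproject VALUES (?, ?, ?)",
                 (version, version, "Example organism"))
    conn.execute("INSERT INTO replicon VALUES (?, ?, ?, ?)",
                 (version, version, version, "chromosome"))
    conn.execute("INSERT INTO gene VALUES (?, ?, ?, ?)",
                 (version, version, "EX_0001", "exA"))


def genes_for_organism(conn, org_name, version_id=None):
    """Walk genomeproject -> replicon -> gene, optionally for one version."""
    sql = """
        SELECT g.locus_tag, g.gene_name, r.rep_type
        FROM genomeproject gp
        JOIN replicon r ON r.gp_key = gp.gp_key
        JOIN gene g ON g.rep_key = r.rep_key
        WHERE gp.org_name = ?
    """
    params = [org_name]
    if version_id is not None:
        sql += " AND gp.version_id = ?"
        params.append(version_id)
    return conn.execute(sql, params).fetchall()


all_versions = genes_for_organism(conn, "Example organism")
one_version = genes_for_organism(conn, "Example organism", version_id=2)
print(len(all_versions), len(one_version))  # 2 1
```

Without the version filter the same gene comes back once per stored version; with it, exactly one copy is returned.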
MicrobeDB Perl API
Information and Perl modules are located on Edhar at: /var/opt/iseem/perl_modules/MicrobeDB
For an example script, see Retrieve_16s_RNA_seqs.pl in "/var/opt/iseem/perl_modules/MicrobeDB/information/example_scripts".
Deletion of old versions
Old files and database records are deleted unless their version has been "saved".
To keep a version from being deleted, run the script: /var/opt/iseem/microbedb/scripts/save_version.pl