Data Files

Drosophila 16S Paper

Sequence files

 * These are the quality-trimmed reads that did not have enough overlap to assemble (thus "frags").
 * These are the quality-trimmed reads that did not have enough overlap to assemble and had a significant BLAST hit to the "left" side of a reference 16S sequence.
 * These are the quality-trimmed reads that did not have enough overlap to assemble and had a significant BLAST hit to the "right" side of a reference 16S sequence.


 * These are the reads that assembled into complete clones using the JGI's 16S pipeline (genelib).


 * NAST-aligned sequences from the Corby-Harris paper.


 * NAST-aligned sequences from the Cox and Gilmore paper.

[[Media:Unclassified.fasta.gz]]
 * A FASTA file of all the clean, chimera-checked sequences that were unclassified at the genus level --James Angus Chandler 19:16, 24 April 2009 (EDT)

Taxonomy Assignments

 * OLD, non-quality-trimmed data: there are three files here, each corresponding to one of the three chimera-checked sequence files above (i.e., clean, sub-threshold, and putative).


 * 1) clean sequences [[Image:Classifications All NAST.Bclean.fasta30593.xls]]
 * 2) sub-threshold chimeric sequences [[Image:Classifications All NAST.Bambig.fasta27959.xls]]
 * 3) putative chimeric sequences [[Image:Classifications All NAST.Bchimera.fasta28223.xls]]

Alignment

 * Below is the most current version of the NAST-formatted alignment file.

This is the older, non-quality-trimmed file. It contains all of our sequences plus the Corby-Harris and Cox-Gilmore sequences.




 * This is a concatenated alignment of the quality-trimmed data. Each half was aligned separately with the NAST aligner, and the two halves were then concatenated. The alignment includes a reference sequence (testseq) that was used to decide where the left half ends and the right half begins before concatenation. This alignment does not include all of the full-length sequences that were assembled with genelib.
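The concatenation step can be sketched in Python. This is a minimal illustration, not the script that produced the file: it assumes each half is held as a dict of aligned sequences keyed by an ID shared by a read's left and right halves, and that the split point is given as the index of the last testseq residue kept from the left half. The `boundary` parameter and both function names are hypothetical.

```python
GAP_CHARS = set("-.")  # gap symbols used in NAST/Infernal-style alignments


def column_of_residue(aligned: str, n: int) -> int:
    """Return the 0-based column holding the n-th (0-based) non-gap residue."""
    count = 0
    for col, ch in enumerate(aligned):
        if ch not in GAP_CHARS:
            if count == n:
                return col
            count += 1
    raise ValueError(f"only {count} residues; cannot find residue {n}")


def concatenate_halves(left: dict[str, str], right: dict[str, str],
                       boundary: int, ref_id: str = "testseq") -> dict[str, str]:
    """Join two half-alignments at a boundary residue of the reference sequence.

    Assumes (hypothetically) that the reference is present in both halves, that a
    read's left and right halves share the same key, and that `boundary` is the
    0-based, ungapped index of the last reference residue kept from the left half.
    """
    left_end = column_of_residue(left[ref_id], boundary) + 1
    right_start = column_of_residue(right[ref_id], boundary + 1)
    right_width = len(right[ref_id]) - right_start
    merged = {}
    for seq_id in sorted(set(left) | set(right)):
        # Pad a missing half with gaps so every row has the same width.
        l_part = left[seq_id][:left_end] if seq_id in left else "-" * left_end
        r_part = right[seq_id][right_start:] if seq_id in right else "-" * right_width
        merged[seq_id] = l_part + r_part
    return merged
```

Using the reference residue index rather than a raw column number means the cut lands at the same biological position in both halves even when their gap patterns differ.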


 * Infernal-aligned reads and sequences
 * Left half [[Media:Infernal_aligned_reads.left.fa.zip]]
 * Right half [[Media:Infernal_aligned_reads.right.fa.zip]]
 * Merged [[Media:Infernal_aligned_reads.merged.fa.zip]]
 * Merged and masked across the whole length [[Media:Infernal_aligned_reads.merged.masked.0.8.bz2]]
 * Merged and cleaned [[Media:Infernal_aligned_reads.merged.cleaned.fa.zip]]: selected columns (1-11, 642-806, 1400-1524) and sequences shorter than 300 were removed.
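A minimal Python sketch of the cleaning step. It assumes the column ranges are 1-based and inclusive and that the 300 cutoff counts non-gap characters remaining after the columns are removed; both are assumptions, and `clean_alignment` is a hypothetical name.

```python
GAP_CHARS = set("-.")

# Columns removed from the merged alignment (1-based, inclusive), as listed above.
REMOVED_RANGES = [(1, 11), (642, 806), (1400, 1524)]
MIN_LENGTH = 300  # assumed: counts non-gap characters left after column removal


def clean_alignment(aln, ranges=REMOVED_RANGES, min_length=MIN_LENGTH):
    """Drop the given alignment columns, then drop sequences that are too short."""
    drop = set()
    for start, end in ranges:
        drop.update(range(start - 1, end))  # 1-based inclusive -> 0-based indices
    cleaned = {}
    for seq_id, seq in aln.items():
        kept = "".join(ch for col, ch in enumerate(seq) if col not in drop)
        residues = sum(ch not in GAP_CHARS for ch in kept)
        if residues >= min_length:
            cleaned[seq_id] = kept
    return cleaned
```

With the default ranges, 301 columns are dropped in total (11 + 165 + 125), so a 1,600-column row shrinks to 1,299 columns.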

Redoing

 * Files were reprocessed to convert sequence IDs into a uniform format
 * Format: LIB_SNO[L/R]
 * Sequence ID key [[Media:Fly.seqid.key.xls.zip]]
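A sketch of how IDs in the LIB_SNO[L/R] format could be parsed and built in Python. It assumes SNO never contains an underscore; library names can (e.g. DmW_TurPh), so the ID is split at the last underscore. The function names are hypothetical.

```python
import re

# LIB_SNO[L/R]: library name, "_", a sequence number, then L or R for the read half.
# Library names may contain "_" (e.g. DmW_TurPh), so the greedy <lib> group makes the
# split at the last underscore. SNO is assumed to contain no underscore.
SEQ_ID = re.compile(r"^(?P<lib>.+)_(?P<sno>[^_]+?)(?P<half>[LR])$")


def parse_seq_id(seq_id: str) -> tuple[str, str, str]:
    """Split a renamed sequence ID into (library, sequence number, half)."""
    m = SEQ_ID.match(seq_id)
    if m is None:
        raise ValueError(f"not in LIB_SNO[L/R] format: {seq_id!r}")
    return m.group("lib"), m.group("sno"), m.group("half")


def make_seq_id(lib: str, sno: str, half: str) -> str:
    """Build an ID in the LIB_SNO[L/R] format."""
    if half not in ("L", "R"):
        raise ValueError("half must be 'L' or 'R'")
    return f"{lib}_{sno}{half}"
```

For a hypothetical ID, `parse_seq_id("DmW_TurPh_0042R")` returns `("DmW_TurPh", "0042", "R")`, and `make_seq_id` reverses it.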


 * Raw files with new sequence IDs
 * Left half [[Media:FLY.new.left.fa.zip]]
 * Right half [[Media:FLY.new.right.fa.zip]]
 * All [[Media:FLY.new.seqs.zip]]
 * Alignments with renamed sequence IDs
 * Left half [[Media:Infernal_aligned_reads.new.left.fa.zip]]
 * Right half [[Media:Infernal_aligned_reads.new.right.fa.zip]]
 * Merged [[Media:Infernal_aligned_reads.new.merged.fa.zip]]
 * Selected libraries removed
 * Libraries removed: NEG, N/XXN, DmW_TurPh/TPH, 3A/03A, 3B/03B
 * Left half [[Media:Infernal_aligned_reads.new.sellib.left.fa.zip]]
 * Right half [[Media:Infernal_aligned_reads.new.sellib.right.fa.zip]]
 * Merged [[Media:Infernal_aligned_reads.new.sellib.merged.fa.zip]]
 * Chimeras removed
 * Left half [[Media:]]
 * Right half [[Media:]]
 * Merged [[Media:]]
 * Alignment cleaned
 * The missing middle chunk and the terminal columns were removed to reduce bias from non-overlapping sequence coverage
 * Left half [[Media:]]
 * Right half [[Media:]]
 * Merged [[Media:]]
 * Taxonomy file
 * Sequence renamed
 * Selected libraries removed (see above)
 * Chimeras removed
 * Discrepancies between the left and right reads flagged
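The library removal and the left/right discrepancy check can be sketched in Python. Assumptions: IDs follow LIB_SNO[L/R]; each "old/new" pair in the removed-libraries list names the same library, so both spellings are dropped; and the taxonomy file provides one assignment (e.g. genus) per read. The function names and example IDs are hypothetical.

```python
import re

SEQ_ID = re.compile(r"^(?P<lib>.+)_(?P<sno>[^_]+?)(?P<half>[LR])$")

# Libraries removed above; "old/new" name pairs are assumed to be aliases,
# so both spellings are listed.
REMOVED_LIBRARIES = {"NEG", "N", "XXN", "DmW_TurPh", "TPH", "3A", "03A", "3B", "03B"}


def split_id(seq_id):
    """Split LIB_SNO[L/R] into (library, sequence number, half)."""
    m = SEQ_ID.match(seq_id)
    if m is None:
        raise ValueError(f"not in LIB_SNO[L/R] format: {seq_id!r}")
    return m.group("lib"), m.group("sno"), m.group("half")


def drop_libraries(seqs, removed=REMOVED_LIBRARIES):
    """Keep only sequences whose library is not in the removed set."""
    return {sid: s for sid, s in seqs.items() if split_id(sid)[0] not in removed}


def flag_discrepancies(taxon_by_read):
    """Return clones (LIB_SNO) whose left and right reads got different taxa."""
    halves = {}
    for sid, taxon in taxon_by_read.items():
        lib, sno, half = split_id(sid)
        halves.setdefault(f"{lib}_{sno}", {})[half] = taxon
    # A clone is flagged only when both halves are present and disagree.
    return sorted(
        clone for clone, h in halves.items()
        if "L" in h and "R" in h and h["L"] != h["R"]
    )
```

Clones with only one surviving half are not flagged, since there is nothing to compare them against.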

Metadata Files
Here is the environment file you asked for. Please look it over and tell me if I need to add anything. I left ??? for the Cox-Gilmore samples because I cannot find my copy of that paper and so do not know what they collected over.

[[Media:MainEnvFile.xls]] --James Angus Chandler 20:45, 19 May 2009 (EDT)


 * This file translates JGI clone IDs to our sample IDs.

Nearly Complete Taxonomy and Alignment
[[Media:NoWolb_noNNs_cleaned_noTurrs_noDescrepsAKAfinal.fasta]] [[Media:Infernal_RDP_Correct%_noWolb_noNNs_noTurrs_noDescreps_noMissingsAKAfinal.xlsx]]

CalTech Presentation
[[Media:CalTech_Presentation.ppt]] --James Angus Chandler 00:56, 11 June 2009 (EDT)