User:Morgan G. I. Langille/Notebook/Unknown Genes/2010/09/23

{| width="800"
 * style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Unknown Genes
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Filtering pfam vs metagenomic sample counts

 * 9810 Pfams (out of 11K?) have at least one protein in at least one sample from the "Camera Proteins" dataset.
 * However, when calculating correlations or ecological distance measures, Pfams with very low counts appear to be highly correlated.
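
A minimal sketch of why low-count Pfams look highly correlated. The count vectors below are made up for illustration (they are not from the Camera Proteins dataset), and Pearson correlation is only an assumed stand-in for whichever correlation or distance measure is actually used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)

# Hypothetical counts across six samples (illustrative only).
# Two rare Pfams that each have a single hit, by chance in the same sample.
rare_a = [0, 0, 1, 0, 0, 0]
rare_b = [0, 0, 1, 0, 0, 0]
# Two abundant Pfams with plenty of counts in every sample.
abundant_a = [120, 95, 210, 80, 150, 60]
abundant_b = [100, 130, 160, 90, 70, 110]

r_rare = pearson(rare_a, rare_b)              # perfect correlation from one shared hit
r_abundant = pearson(abundant_a, abundant_b)  # much weaker, despite far more data

print("rare pair:", r_rare)
print("abundant pair:", r_abundant)
```

A single co-occurring hit is enough to give the rare pair a perfect score, which is the kind of artifact the filtering is meant to remove.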


 * To start filtering out these low-count Pfams, I plotted the sum of each Pfam's counts across all samples (sums range from 1 to 209446; the maximum is ABC_trans, of course).
 * The plot doesn't give a clear cutoff for either row sum or diversity index (e.g. requiring row sum > 50 will remove many Pfams that have a high diversity index, and vice versa for a diversity cutoff).
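
A rough sketch of how the two filters can disagree. Assumptions: the diversity index is taken to be the Shannon index of a Pfam's counts across samples (the notes don't say which index was used), the Pfam names and counts are hypothetical, and the diversity threshold of 1.0 is arbitrary. Only the row sum > 50 cutoff comes from the notes above.

```python
import math

def row_sum(counts):
    """Total count of one Pfam across all samples."""
    return sum(counts)

def shannon_index(counts):
    """Shannon index (natural log) of one Pfam's counts across samples."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

# Hypothetical Pfam x sample counts (eight samples), for illustration only.
pfam_counts = {
    "even_but_rare":       [5, 5, 5, 5, 5, 5, 5, 5],          # low sum, high diversity
    "abundant_but_patchy": [0, 0, 400, 0, 0, 0, 0, 0],        # high sum, zero diversity
    "abundant_and_even":   [60, 70, 55, 65, 80, 50, 75, 45],  # high sum, high diversity
    "singleton":           [0, 1, 0, 0, 0, 0, 0, 0],          # low sum, zero diversity
}

MIN_ROW_SUM = 50    # example cutoff from the notes
MIN_SHANNON = 1.0   # arbitrary, assumed cutoff

kept_by_sum = {name for name, c in pfam_counts.items() if row_sum(c) > MIN_ROW_SUM}
kept_by_diversity = {name for name, c in pfam_counts.items() if shannon_index(c) > MIN_SHANNON}
kept_by_both = kept_by_sum & kept_by_diversity

for name, c in pfam_counts.items():
    print(name, row_sum(c), round(shannon_index(c), 3))
print("kept by row sum:", sorted(kept_by_sum))
print("kept by diversity:", sorted(kept_by_diversity))
print("kept by both:", sorted(kept_by_both))
```

In this toy table the row-sum filter drops the evenly spread but rare Pfam, while the diversity filter drops the abundant but patchy one; requiring both keeps only the Pfam that is abundant and evenly spread. Whether combining the two is appropriate for the real data is an open question.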




 * }