User:Morgan G. I. Langille/Notebook/Unknown Genes/2010/09/23

From OpenWetWare

Filtering pfam vs metagenomic sample counts

  • 9810 Pfams (out of 11K?) have at least one protein in at least one of the samples from the "Camera Proteins" dataset.
  • However, when calculating correlations or ecological distance measures, pfams with very low counts appear to be highly correlated.
  • To start filtering out these low-count pfams, I plotted the sum of each pfam's counts across all samples (sums range from 1 to 209446; the maximum is ABC_trans, of course).
  • This doesn't give a clear cutoff for either row sum or diversity index (e.g. requiring row sum > 50 will remove many pfams that have a high diversity index, and vice versa for a diversity cutoff).
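The low-count problem noted above can be reproduced with a small sketch. The matrix below is made-up toy data (not the Camera Proteins counts), the pfam names are hypothetical, and Pearson correlation via `cor()` is an assumption, since the notebook does not say which correlation measure was used.

```r
# Toy pfam-by-sample count matrix (hypothetical values, not the real dataset).
counts <- rbind(
  rare_a   = c(0, 0, 1, 0, 0, 0),
  rare_b   = c(0, 0, 2, 0, 0, 0),
  common_a = c(120, 95, 130, 88, 101, 140),
  common_b = c(60, 150, 75, 110, 90, 45)
)
colnames(counts) <- paste0("sample", 1:6)

# cor() correlates columns, so transpose to compare pfams (rows).
pfam.cor <- cor(t(counts))

# Two pfams that each occur in a single shared sample are perfectly
# correlated (1, up to floating point), despite having almost no data.
pfam.cor["rare_a", "rare_b"]

# Well-sampled pfams give a correlation that reflects real variation.
pfam.cor["common_a", "common_b"]
```

The two rare pfams look like the most strongly associated pair in the matrix, which is why some minimum-count filter is needed before interpreting correlations.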

# R code

# Sum the counts for each pfam (row of x) across all samples
row.sums <- apply(x, 1, sum)

# To filter a list in R, use the "Filter" function

# Filtering on row sum > 100 still leaves 4018 pfams
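The exact filtering command is not preserved in this entry, so the following is only a sketch of how `Filter` could be used to count pfams above a row-sum cutoff. The `row.sums` values here are hypothetical stand-ins (only the ABC_trans total comes from the notes above), and the other pfam names are invented for illustration.

```r
# Hypothetical named row sums standing in for the real row.sums vector.
row.sums <- c(ABC_trans = 209446, pfam_b = 350, pfam_c = 100,
              pfam_d = 12, pfam_e = 1)

# Filter() keeps the elements for which the predicate returns TRUE.
kept <- Filter(function(s) s > 100, row.sums)

length(kept)  # number of pfams passing the cutoff
names(kept)   # which pfams passed

# Equivalent vectorised forms, usually preferred for numeric vectors.
n.kept <- sum(row.sums > 100)
kept.vec <- row.sums[row.sums > 100]
```

Note that the cutoff is strict: a pfam with a row sum of exactly 100 is dropped.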

[Figure: row sum vs. Shannon diversity for pfams (Sum vs shannon diversity for pfams.png)]
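A comparison like the one in the figure could be computed roughly as follows. This is a sketch under assumptions: the toy matrix `x` and its row names are hypothetical, Shannon diversity is taken across samples with the natural log, and the cutoffs (row sum > 50, diversity > 1) are only examples.

```r
# Toy pfam-by-sample count matrix (hypothetical values).
x <- rbind(
  even_low    = c(5, 5, 5, 5, 5, 5),
  skewed_high = c(500, 1, 0, 0, 0, 0),
  even_high   = c(80, 90, 100, 85, 95, 110)
)

# Shannon diversity H = -sum(p * log(p)) over samples with non-zero counts.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

row.sums  <- rowSums(x)             # same values as apply(x, 1, sum)
diversity <- apply(x, 1, shannon)

# Compare which pfams pass each candidate cutoff.
pass.sum <- row.sums > 50
pass.div <- diversity > 1
result <- data.frame(row.sums, diversity, pass.sum, pass.div)
result

# Scatter plot of the two measures (only drawn in an interactive session).
if (interactive()) {
  plot(row.sums, diversity, log = "x",
       xlab = "Row sum (log scale)", ylab = "Shannon diversity")
}
```

In this toy data the evenly spread but low-count pfam fails the row-sum cutoff while passing the diversity cutoff, and the high-count pfam concentrated in one sample does the opposite, which mirrors the lack of a single clean cutoff described above.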