Eigencluster: Project Description
Professor Santosh Vempala (MIT), Professor R. Kannan (Yale), and G. Wang (graduate student, MIT), through the Deshpande Center
Massive amounts of data are now available in many fields, e.g., a company's sales records or, in biology, the output of the Human Genome Project. Valuable information undoubtedly lies in this data, but its sheer size makes it hard to find the underlying trends and patterns. Our work addresses this problem with an effective method for clustering data, that is, organizing it into a small number of distinct, homogeneous groups. For instance, clustering genes can yield groups of genes that are similar in function. Our method overcomes obstacles faced by existing clustering techniques: (i) it views the data in a global (rather than local) manner; (ii) its performance comes with rigorous mathematical guarantees of finding a good clustering of the data, whereas existing techniques have no such guarantee and can output poor clusterings; and (iii) it is designed as a general clustering method, applicable to a wide range of data types, e.g., the WWW or portions of it, a library's or company's database, or market data. For a biotech company, the impact of an effective clustering method might be the discovery of important correlations among drugs. For a retail business, clustering can reveal new customer groups, allowing business decisions to be based on aggregate patterns in the data. Our business model is to provide enterprises with both the clustering method and the ability to tailor its options and parameters to specific goals.