User:Pedrobeltrao/Notebook/Structural analysis of phosphorylation sites/data collection and stats

Data collection
Phosphorylation data for S. cerevisiae were collected from previously published phosphoproteomic datasets. Protein domain predictions were obtained from Pfam, disorder predictions from DisEMBL, and homology models covering a large fraction of the S. cerevisiae proteome from ModBase. Only models with more than 30% sequence identity to the template structure were kept. Conservation of each phosphorylation site was determined by aligning the S. cerevisiae protein with its orthologs in other species for which phosphorylation data are available. For the general analysis below, an S. cerevisiae phosphorylation site was considered conserved if the ortholog had a phosphorylation site within 10 alignment positions of it.
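The conservation criterion can be sketched as follows. This is a minimal illustration, not the pipeline actually used here. It assumes a pairwise alignment given as two gapped strings with "-" for gaps, phosphosite positions given as 0-based indices into the ungapped sequences, and it reads "within 10 alignment positions" as an inclusive distance of at most 10 alignment columns. The function names and the toy sequences are hypothetical.

```python
from bisect import bisect_left


def residue_to_column(aligned_seq):
    """Return a list mapping each 0-based ungapped residue index to its alignment column."""
    return [col for col, ch in enumerate(aligned_seq) if ch != "-"]


def conserved_sites(yeast_aln, ortho_aln, yeast_sites, ortho_sites, window=10):
    """Flag each yeast site as conserved if any ortholog site lies within `window` columns.

    Assumption: the window is inclusive (|column difference| <= window).
    """
    yeast_cols = residue_to_column(yeast_aln)
    ortho_map = residue_to_column(ortho_aln)
    # Alignment columns of the ortholog phosphosites, sorted for binary search.
    ortho_cols = sorted(ortho_map[i] for i in ortho_sites)

    result = {}
    for site in yeast_sites:
        col = yeast_cols[site]
        # First ortholog site column >= col - window; conserved if it is also <= col + window.
        i = bisect_left(ortho_cols, col - window)
        result[site] = i < len(ortho_cols) and ortho_cols[i] <= col + window
    return result


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not real data); window=0 demands an exact column match.
    print(conserved_sites("MS-TKPL", "MSAT-PL", yeast_sites=[1, 2], ortho_sites=[3], window=0))
    # -> {1: False, 2: True}
```

Sorting the ortholog columns and using binary search keeps each lookup logarithmic, which matters little for a single protein but helps when the check is repeated over a whole proteome.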

Below is the current number of S. cerevisiae phosphorylation sites, the fraction that can be mapped to domains or structures, and their conservation in other species. This will be updated as necessary. As expected from previous studies, phosphorylation sites within domains or structures are more likely to be conserved than phosphorylation sites overall. This could be explained by lower alignment uncertainty within these regions and, potentially, a higher fraction of functional sites.
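A minimal sketch of how such summary statistics could be tallied is shown below. The `Site` record, its `in_domain` flag (assumed to mean the site falls in a Pfam domain or a ModBase model) and the toy records are hypothetical placeholders, not the actual data behind the numbers on this page.

```python
from dataclasses import dataclass


@dataclass
class Site:
    protein: str
    position: int
    in_domain: bool   # hypothetical flag: site maps to a Pfam domain or a ModBase model
    conserved: bool   # outcome of the ortholog-window conservation criterion


def summarize(sites):
    """Count sites, the fraction mapped to domains/structures, and conservation rates inside and outside them."""
    def frac(num, den):
        # Return NaN rather than raising when a category is empty.
        return num / den if den else float("nan")

    inside = [s for s in sites if s.in_domain]
    outside = [s for s in sites if not s.in_domain]
    return {
        "total": len(sites),
        "fraction_mapped": frac(len(inside), len(sites)),
        "conserved_inside": frac(sum(s.conserved for s in inside), len(inside)),
        "conserved_outside": frac(sum(s.conserved for s in outside), len(outside)),
    }


if __name__ == "__main__":
    # Hypothetical toy records for illustration only.
    toy = [
        Site("PROT_A", 10, True, True),
        Site("PROT_A", 55, True, True),
        Site("PROT_B", 3, False, True),
        Site("PROT_B", 80, False, False),
        Site("PROT_C", 12, False, False),
        Site("PROT_C", 40, False, False),
    ]
    print(summarize(toy))
```

Comparing `conserved_inside` with `conserved_outside` is the comparison described above; a formal test (for example a Fisher exact test on the 2x2 counts) would be needed before reading much into a difference.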