CHARGE DISTRIBUTIONAL ANALYSIS

The distribution of charges in the protein sequence is evaluated in terms of clusters, high scoring segments, and runs and periodic patterns. Clusters indicate regions of typically 30 to 60 residues exhibiting a relatively high charge concentration. For high scoring charge segments, positive scores are assigned to charge residues of the appropriate type and negative scores to all other residues. A significant cumulative positive score again indicates a region of high charge concentration. The cluster method and the scoring method will generally pick out the same segments (with the scoring method often delimiting the segment to a narrower range), conferring robustness to the results. Short segments of high charge concentration are displayed as runs (with errors). Periodic patterns focus on those with charges every second or third position, with possible relevance to amphipathic secondary structures; other periodic patterns are displayed in the general periodicity analysis section of the output.

   1  00+00+0++0 0000000000 0000000000 0000-00+00 000+-00000 0-000000-0
  61  0000000000 0+0000-000 +00-0-0-0- 0000-0++00 0000000000 00000-0000
 121  0-0000+000 +0-00+000- 0000-00-00 0000-0000+ 000+000-00 000000000-
 181  -00+000+00 0000+0-000 000+00-000 -00-000+00 00000-000- --0-00-00+
 241  0+0-00-0+0 000+000++0 00000-0000 -+-00000-0 --00+0+00- 0000+0000+
 301  000000+000 000000-000 +000000000 0-0000000+ -000-00000 -000000000
 361  000-0--000 00000000+0 00000000-0 0000--0000 0000000++0 0+000+-0+0
 421  000000++00 0000-00000 00000+0000 0000000000 00000000-- 00000-0000
 481  0000000000 00000000-0 0000000000 0000000000 000++0-00- 00000000+0
 541  -000000000 0-+-0---00 00000+000+ -0+0+00-+- 0+00000000 000--00-+0
 601  00000+0000 000-000000 00000000+0 +000+0-000 00000+0-00 000000-000
 661  000000000- -0-000+000 00000-+000 0-000-0+00 000000-000 0000000000
 721  0+0000000+ 000-0+0000 000-0+0000 0+-0000000 00+0--0000 0000-00000
 781  000000-000 00+0000000 000000000- 000-000000 00-0+-00+0 0000000000
 841  000+-00000 00000000+- ++000000+0 00+000-00+ 00+0000000 00000--000
 901  0000000000 00000+00-0 -000000000 0-000000+0 -0000000-0 0000000000
 961  0000000000 -0-000-000 0-0000+000 0000000000 -000000000 0000000000
1021  00-0000000 --0+000000 00+0000000 0000000000 0000000-00 0000000000
1081  0000000000 00+0000000 0000000000 000-000000 0-00000000 0000000000
1141  0000000000 0000000000 -000000000 0000000000 00000+000+ 0000000000
1201  -+000+0000 -+0000000- 00000-0000 00000-0-00 0000000000 00-0000000
1261  -000000+
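The charge map above can be reproduced from a residue sequence with a few lines of code. The following is a minimal sketch, not the SAPS implementation: it assumes the residue classes shown in the scoring tables of section B (K and R positive, E and D negative, everything else, including H, neutral). The function names `charge_string` and `format_charge_map` are hypothetical.

```python
# Residue classes as listed in the scoring tables of this report.
POSITIVE = set("KR")
NEGATIVE = set("ED")


def charge_string(seq):
    """Map each residue to '+', '-', or '0'."""
    out = []
    for aa in seq.upper():
        if aa in POSITIVE:
            out.append("+")
        elif aa in NEGATIVE:
            out.append("-")
        else:
            out.append("0")
    return "".join(out)


def format_charge_map(charges, per_block=10, per_line=60):
    """Lay the charge string out in blocks of 10, 60 symbols per row,
    with the 1-based position of the first symbol at the start of each row."""
    lines = []
    for start in range(0, len(charges), per_line):
        row = charges[start:start + per_line]
        blocks = [row[i:i + per_block] for i in range(0, len(row), per_block)]
        lines.append(f"{start + 1:>4}  " + " ".join(blocks))
    return "\n".join(lines)
```

Applied to the 31-residue segment reported for the mixed charge cluster in section A, `charge_string` yields the same symbol line that the report prints beneath that segment.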

A. CHARGE CLUSTERS.

Positive, negative, and mixed charge clusters are distinguished. In each case, cmin indicates the minimum number of charges required for a significant charge cluster corresponding to the given window size; e.g., cmin = 9/30 or 12/45 or 15/60 means that significance requires at least 9 charges in a segment of 30 (or fewer) residues, or 12 charges in a segment of length 45, or 15 charges in a segment of length 60. In the case of positive and negative charge clusters, these counts refer to net charge, i.e., charges of the opposite sign within the window are counted as -1. The sizes of the clusters are optimized for display to indicate the segment of highest charge concentration, but a minimum size of 20 residues is required. A mixed charge cluster that begins and ends within 15 residues of the endpoints of a pure charge cluster is not displayed (since its significance rests mostly on the charged residues comprising the displayed pure charge cluster), unless the -v (verbose output) flag is set, in which case both the pure and the mixed charge cluster are displayed. On the other hand, pure charge clusters that are embedded in mixed charge clusters are displayed separately (indicated by a * preceding the specification of location). For each cluster are given its location in the sequence (from, to), the quartile of the location (1st, 2nd, 3rd, or 4th quarter of the sequence), length, count, and t-value (standard deviations above the mean; to accommodate the multiple tests performed, the t-value significance threshold is set to 4.0 for sequences up to 750 residues, to 4.5 for sequences of length 750-1500 residues, and to 5.0 for longer sequences); also indicated are residues comprising at least 10% of the cluster.
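The counting rule and the length-dependent t-value threshold described above can be sketched as follows. This is an illustration only, not the SAPS algorithm: it uses a fixed-width sliding window (the report also allows shorter segments), it does not compute t-values, and the handling of the exact boundary lengths 750 and 1500 is an assumption, since the text lists 750 in two ranges. The names `t_value_threshold` and `max_window_count` are hypothetical.

```python
def t_value_threshold(seq_len):
    """t-value significance threshold by sequence length.
    Assumption: lengths of exactly 750 and 1500 fall in the lower band."""
    if seq_len <= 750:
        return 4.0
    if seq_len <= 1500:
        return 4.5
    return 5.0


def max_window_count(charges, window, kind="*"):
    """Largest charge count over all windows of the given width.

    kind="*": every '+' or '-' counts +1 (mixed charge).
    kind="+" or "-": net charge, the opposite sign counts -1.
    """
    def weight(c):
        if kind == "*":
            return 1 if c in "+-" else 0
        other = "-" if kind == "+" else "+"
        if c == kind:
            return 1
        if c == other:
            return -1
        return 0

    w = [weight(c) for c in charges]
    if len(w) <= window:
        return sum(w)
    current = sum(w[:window])
    best = current
    for i in range(window, len(w)):
        # Slide the window one position to the right.
        current += w[i] - w[i - window]
        best = max(best, current)
    return best
```

For the charge line of the mixed cluster reported below (positions 552 to 582), a 30-residue window holds 14 charges, which meets the mixed cmin of 14/30, while the net positive and net negative counts stay far below their respective minima.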

Positive charge clusters (cmin = 8/30 or 11/45 or 13/60):  none

Negative charge clusters (cmin = 10/30 or 13/45 or 16/60): none

Mixed charge clusters (cmin = 14/30 or 18/45 or 23/60):

1) From 552 to  582:   ERDGEEEAAAQYGSKLNGREYKVKVLDKDGK
                       -+-0---0000000+000+-0+0+00-+-0+
   quartile: 2; size: 31, +count:  7, -count:  8, 0count: 16; t-value:  4.70 *
   G:  4 (12.9%);  K:  5 (16.1%);  E:  5 (16.1%);

B. HIGH SCORING (UN)CHARGED SEGMENTS.

For each scoring scheme (scores assigned to residues as displayed), SAPS displays segments of the sequence with aggregate score exceeding the particular threshold values M_0.01 (1% significance level, segments labeled with **), M_0.05 (5% significance level, segments labeled *), or otherwise as indicated. A minimal segment length is set as shown. The expected score/letter should be sufficiently large negative, and the average information per letter should be sufficiently large positive in order for the scoring statistics to apply properly (the program prints out when the conditions are not met and skips evaluations).
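The expected score per letter is the frequency-weighted mean of the scores in a scheme. The sketch below recomputes it from the rounded frequencies printed in the score tables of this section, so the last digit can differ slightly from the value SAPS prints (which presumably uses unrounded frequencies). The function name `expected_score` and the list layout are hypothetical.

```python
def expected_score(scheme):
    """Frequency-weighted mean score; scheme is a list of (score, frequency)."""
    return sum(score * freq for score, freq in scheme)


# (score, frequency) pairs copied from the score tables in this section.
positive_scheme = [(2.0, 0.072), (0.0, 0.000), (-1.0, 0.832), (-2.0, 0.096)]
negative_scheme = [(2.0, 0.096), (0.0, 0.000), (-1.0, 0.832), (-2.0, 0.072)]
mixed_scheme = [(1.0, 0.168), (0.0, 0.000), (-1.0, 0.832)]
uncharged_scheme = [(1.0, 0.832), (0.0, 0.000), (-8.0, 0.168)]

for name, scheme in [("positive", positive_scheme),
                     ("negative", negative_scheme),
                     ("mixed", mixed_scheme),
                     ("uncharged", uncharged_scheme)]:
    print(f"{name:>9}: expected score/letter = {expected_score(scheme):.3f}")
```

All four schemes give a negative expected score, as the text requires; the uncharged scheme is nevertheless skipped in this report because its average information per letter is too small.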

______________________________________
High scoring positive charge segments:

score=  2.00 frequency=   0.072  ( KR )
score=  0.00 frequency=   0.000  ( BZX )
score= -1.00 frequency=   0.832  ( LAGSVTIPNFQYHMCW )
score= -2.00 frequency=   0.096  ( ED )

Expected score/letter: -0.881;    Average information/letter:   1.973
Minimal length of displayed segments set to: 20

M_0.01= 9.33  (cv=  6.16, lambda=  1.15953, k=  0.39565, x=  3.17;
               90% confidence interval for segment length:   8 +-   6)
M_0.05= 7.92  (x=  1.76)


# of segments (>=20 residues) exceeding M_0.05: none

______________________________________
High scoring negative charge segments:

score=  2.00 frequency=   0.096  ( ED )
score=  0.00 frequency=   0.000  ( BZX )
score= -1.00 frequency=   0.832  ( LAGSVTIPNFQYHMCW )
score= -2.00 frequency=   0.072  ( KR )

Expected score/letter: -0.783;    Average information/letter:   1.431
Minimal length of displayed segments set to: 20

M_0.01= 10.95 (cv=  7.33, lambda=  0.97470, k=  0.34200, x=  3.62;
               90% confidence interval for segment length:  11 +-   9)
M_0.05= 9.28  (x=  1.95)


# of segments (>=20 residues) exceeding M_0.05: none

___________________________________
High scoring mixed charge segments:

score=  1.00 frequency=   0.168  ( KEDR )
score=  0.00 frequency=   0.000  ( BZX )
score= -1.00 frequency=   0.832  ( LAGSVTIPNFQYHMCW )

Expected score/letter: -0.664;    Average information/letter:   1.533
Minimal length of displayed segments set to: 20

M_0.01= 6.94  (cv=  4.47, lambda=  1.60000, k=  0.52997, x=  2.48;
               90% confidence interval for segment length:  10 +-   7)
M_0.05= 5.93  (x=  1.46)


# of segments (>=20 residues) exceeding M_0.05: none

________________________________
High scoring uncharged segments:

score=  1.00 frequency=   0.832  ( LAGSVTIPNFQYHMCW )
score=  0.00 frequency=   0.000  ( BZX )
score= -8.00 frequency=   0.168  ( KEDR )

Expected score/letter: -0.512
Average information/letter:  0.065 < .10; too small !
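The printed values of lambda, cv, x, and the thresholds M_0.01 and M_0.05 are numerically consistent with standard extreme-value statistics for maximal segment scores. The sketch below is a reconstruction under assumptions, not a statement of how SAPS computes them: it assumes that lambda is the positive root of sum(p * exp(lambda * s)) = 1, that cv = ln(N) / lambda, and that x solves 1 - exp(-k * exp(-lambda * x)) = p. The sequence length N = 1268 is read off the charge map (last row starts at 1261 and holds 8 symbols). The names `solve_lambda` and `score_threshold` are hypothetical.

```python
import math


def solve_lambda(scheme, lo=1e-6, hi=50.0, tol=1e-10):
    """Positive root of sum(p * exp(lam * s)) = 1, found by bisection.
    scheme is a list of (score, frequency); the expected score must be negative."""
    def f(lam):
        return sum(p * math.exp(lam * s) for s, p in scheme) - 1.0

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2.0


def score_threshold(lam, k, seq_len, p):
    """Score M_p exceeded by chance with probability p (assumed formula)."""
    cv = math.log(seq_len) / lam
    x = -math.log(-math.log(1.0 - p) / k) / lam
    return cv + x


# Mixed charge scheme, with the rounded frequencies printed above.
mixed_scheme = [(1.0, 0.168), (0.0, 0.000), (-1.0, 0.832)]
print(f"lambda (mixed)    = {solve_lambda(mixed_scheme):.4f}")

# Positive charge scheme: lambda and k as printed in the report.
print(f"M_0.01 (positive) = {score_threshold(1.15953, 0.39565, 1268, 0.01):.2f}")
print(f"M_0.05 (positive) = {score_threshold(1.15953, 0.39565, 1268, 0.05):.2f}")
```

Under these assumptions the recomputed values agree with the report to the printed precision, which supports the reading of cv as the centering value ln(N)/lambda and x as the offset for the chosen significance level.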

C. CHARGE RUNS AND PATTERNS.

The table below shows the charge runs and patterns searched for (* stands for + or -) and the required minimum number of matches to the pattern allowing for at most 0 (lmin0), 1 (lmin1), or 2 (lmin2) mismatches or insertions/deletions (1% significance level). Occurrences are arranged in the order in which they appear in the sequence. For each run or pattern are displayed its length (number of matches) and a triplet giving the number of mismatches, insertions and deletions. 0-runs are further characterized by their composition (residues comprising more than 10% of the run). Run count statistics are compiled for runs of lengths at least 2/3 of the minimal significant length (lmin0); given are the number and locations of such runs.

pattern (+)|  (-)|  (*)|  (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)| (H.)|(H..)|
lmin0    4 |   5 |   6 |  54 |   9 |  10 |  13 |  11 |  13 |  16 |   5 |   5 |
lmin1    6 |   6 |   8 |  65 |  11 |  12 |  15 |  14 |  15 |  19 |   6 |   7 |
lmin2    7 |   8 |   9 |  73 |  13 |  14 |  17 |  16 |  17 |  21 |   7 |   8 |
(Significance level: 0.010000; Minimal displayed length: 6)
There are no charge runs or patterns exceeding the given minimal lengths.

Run count statistics:

+ runs >=   3:   0
- runs >=   3:   2, at  230;  556;
* runs >=   4:   1, at  859;
0 runs >=  36:   1, at 1123;
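The run counts above can be checked directly against the charge map. The following minimal sketch finds exact runs only (no mismatches, insertions, or deletions), so it corresponds to the run count statistics rather than to the full lmin1/lmin2 pattern search. The function name `find_runs` and its `offset` parameter are hypothetical.

```python
def find_runs(charges, kind, min_len, offset=1):
    """Return (start_position, length) for each maximal run of `kind`
    that is at least min_len long.

    kind is '+', '-', '0', or '*' (either charge sign).
    offset is the sequence position of charges[0].
    """
    def matches(c):
        return c in "+-" if kind == "*" else c == kind

    runs = []
    start = None
    for i, c in enumerate(charges):
        if matches(c):
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start + offset, i - start))
            start = None
    # Close a run that extends to the end of the string.
    if start is not None and len(charges) - start >= min_len:
        runs.append((start + offset, len(charges) - start))
    return runs


# Positions 221-240 and 851-870, copied from the charge map.
print(find_runs("00000-000---0-00-00+", "-", 3, offset=221))
print(find_runs("00000000+-++000000+0", "*", 4, offset=851))
```

The two fragments reproduce the reported negative run at position 230 and the mixed-sign run at position 859.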