Keating:ProcedeLog

From OpenWetWare
Jump to navigationJump to search
Information concerning the
Keating Lab
Research

Lab Members

Resources:
Publications
Experimental Protocols
Coder's Corner
Internal

Here the ongoing changes to version d of ProCEDe will be documented. Users are encouranged to read through this and this and ask questions and make suggestions.









Module/Script Ver Date Nature of Change Affected scripts/modules
ProCEDe

versioning

d 12/19/05 Before ProCEDe versioning was done manually (mainly versions 2.0 and d were separately maintained). Now, I am switching to using Subversion 1.2.3. I downloaded it from subversion.tigris.or and installed it on my machine (keating9). A subversion repository was created under /bionet/keating/local/energy/procede_svn_repository/ using the FSFS protocol. The development cycle will work as follows:
  • A developer downloads a local copy of the current code using svn update and works on it.
  • After changes are made and verified (svn stats and svn diff are useful), they are commited to the repository using svn commit.
  • The actual versions of ProCEDe that users can access (i.e. version 2.0 and d) are going to be created from time to time by svn update-ing the appropriate directions under /bionet/keating/local/energy/.
This will make it easier to develop and use the code at the same time as well as allow to go back and examine snapshots of the code at different points in time.
ENERGY.pm d 12/15/05 There was an issue of occasional slight disagreements between CHARMM and EPIC self calculations (pair calculations were always dead on). This turned out to be a concequence of how the self template was defined in EPIC/CHARMM. In both cases the self template was supposed to be whatever is specified in the template PDB file except the side chains on design sites. The way this was done in CHARMM was that design residue were put into the template with the command "patch", but removed by simply deleting their side chains. What this meant was that when the design residue was PRO or GLY, this changed the template after visiting these residue (in the case of PRO, deleting the side chain does not add amide H back; in case of GLY, deleting the empty side chain does not change the CA atom from CH2E to CH1E). This was already noticed before and taken care of for PRO (by re-loading the original structure after visiting a PRO), but for GLY this was not done. Now, I changed ENERGY::writeSelfDDE, ENERGY::writeSelfDDE_min and ENERGY::writeSelfEEF such that the same thing is also done for GLY. Now the agreement between EPIC and CHARMM self energies is great. Changes self vdW and EEF energies (slightly).
Energy.pl d 12/14/05 Energy.pl no longer calls ENERGY::writeRefSolv(). This is a completely unnecessary calculation - each aa has a constant reference, which cancels exactly with the unfolded state (ENERGY::writeRefSolv() creates both the self and unfold terms, which are the same table). The only possible use for this term is to change the definition of EEF's reference state (i.e. for each aa there is some kind of a reference desolvation energy, such that when you look at just the folded or just the unfolded energies, desolvation is relative to this reference state). However, EEF1 already has a built-in small-molecule reference state, which is quite convenient (plus, we rarely look at just folded or just unfolded energies). The bad thing about this term is that it messes up our intuition for the meaning of composition-dependent unfolded state weights (if we use one). Total energy for any structure (f minus uf) does not change, but the folded and unfolded energies will change.
toph19_eef1.inp

patchtoph19_eef1.inp

d 12/13/05 Made a change to the PRO entry in EEF1's topology file (toph19_eef1.inp). CD in PRO used to belong to the {N, CA, CD} group (i.e. was acting in place of amide H). This caused incorrect desolvation energies to be reported by inte when {sidechain} - {template} was asked for. Because EEF1 is group based when asking for sidechain-template desolvation what was really calculated was {CB CG CD N CA} - {template minus N and CA}. I moved the CD atom from its original group to the group with other side chain atoms (CB, CG, CD). A similar change was made in EEF1's patch topology file (patchtoph19_eef1.inp). Now {CB CG CD} - {template} means exactly that. The discrepancy was discovered in the process of comparing EPIC and CHARMM self energies. Affects PRO self and unfold EEF1 energies (in a way that does NOT cancel between the folded and unfolded states).
ENERGY.pm d 12/13/05 Implemented function ENERGY::writeUnfoldEPIC. It calculates all self terms that EPIC can handle. It is called from the main script (Energy.pl) and the actual terms calculated depend on the energy scheme. It works very similar to ENERGY::writeSelfEPIC (the same residue-specific hacks) except that a different template is used (and different for each design site). New function, so no behavior changes.
PDB.pm d 12/13/05 Changed PDB::writeUnfoldFragment such that the PDB files of unfoldmers have all the sites (including the central one) replaced with GLY. Also, in addition to outputting PDB and CRD files for unfoldmers, the function also outputs DUMMY formatted files. For this purpose PDB::writeDUMMY was written. All of these changes are for making communication with EPIC easier. Upon close examination, none of unfold functions rely on having a specific residue in the central position of penta-peptides. Thus, this change should not affect anything. This has been verified on a small test case for es1 and es2.
ENERGY.pm d 12/12/05 Implemented function ENERGY::writeSelfEPIC. It calculates all self terms that EPIC can handle. It is called from the main script (Energy.pl) and the actual terms calculated depend on the energy scheme. The way the function works, is it creates a template structure and writes its coordinates in DUMMY format to a file (currently rots/template.out). After this the task is: for each residue at each design site {1. figure out the exclusion lists for i-t and i-i interactions; 2. ask EPIC to calculate these given previously generated residue-specific .out files (all rotamers in one file) }. A few residue-specific hacks in this procedure are worth noticing. 1) For PRO the template does change (the amide H is supposed to be removed). But instead of doing this explicitly, I simply add {amide H} - {all sidechain atoms of PRO} atom pairs into the 1-2 exclusion list such that EPIC skips over these. Also, since some parameters for the amide H have to be given to EPIC (it expects one set of parameters for each atom in template.out), I temporarily add an amide H to the topology structure for PRO (such that it has some parameters). Which properties it has is not important because no interactions with it are ever calculated (the topology entry is removed upon finishing the PRO calculation). 2) For TYR, PHE and TRP CHARMM does not calculate any non-bond ring-ring interactions (for TRP any ring to any ring interactions are omitted). All {out of ring} - {ring} interactions are treated normally (i.e. subject to regular exclusion rules). To replicate this behavior all 1-4 ring-ring interactions for TYR, PHE and TRP are added to the 1-2 exclusion lists such that EPIC skips these interactions. New function, so no behavior changes.
ENERGY.pm d 12/11/05 Implemented functions ENERGY::interfaceEPIC and ENERGY::writePairwiseEPIC. The earlier serves as an interface to EPIC. The second is used to calculate all pairwise terms that EPIC can handle. It is called from the main script (Energy.pl) and the actual terms calculated will depend on the energy scheme. Currently, only one new energy scheme involves EPIC terms - 1p, where EEF1, vdW and DDE terms are to be handled by EPIC. Agreement with CHARMM thoroughly checked. New function, so no behavior changes.
EPIC d 12/11/05 Implemented a new package called EPIC (Engine for Pairwise Interaction Calculations) that efficiently calculates all pairwise interactions for a pair of conformation sets (i.e. a pair of amino acids with all rotamers). It can do either self or pair calculations. This package is to replace multe and much of other CHARMM stuff in ProCEDe. EPIC can handle exclusion lists and is very general and extendible. Currently, it can calculate EEF1, vdW and DDE terms. New package, so no behavior changes.
PDB.pm d 11/15/05 Implemented PDB::replaceSidechain - a function that changes the identity of an amino acid by changing its name and sidechain atoms (backbone atoms are untouched). This is useful for rotamer manipulations. NOTE: for prolines, the amide hydrogen has to be taken care of manually. Currently, for self calculations I am planning to manually add all H - {side chain} pairs into the exclusion list whenever a proline's energy is calculated. New function, so no behavior changes.
PDB.pm d 11/13/05 Implemented PDB::exclusionList - a function that can generate exclusion lists for a given set of atoms and topology information. It can handle 1-2, 1-3 and 1-4 lists. This is going to be used for moving self calculations outside of CHARMM. New function, so no behavior changes.
PDB.pm d 10/25/05 Since I am planning to start building exclusion lists in perl I need a faster way to find atoms in residues by name. PDB::getAtomInRes was modified such that it creates an atom name hash the first time it is called for each residue. Thus, after the first time each next call is a simple lookup. This new entry into the reidue has is called "atomnamehash". It is not copied by PDB::copyResidue or by PDB::cloneResidue, but this is less messy since the hash will be created later for the new residue, if it is necessary. Behavior does not change so old stuff should not be affected.
CHARMM.pm d 10/25/05 I am planning to start implementing more of the terms outside of CHARMM - like the van der Waals term. The pair components are always not a problem because there is no need for an exclusion list. However, self energies do need the bonding information. For this reason, I modified CHARMM::readCharmmTopology such that it also reads the bond list for every residue. Bonds are stored as hashes. For example, defined($res->{bonds}->{A}->{B}) means that in residue $res atom A is bonded to atom B. By construction, $res->{bonds}->{B}->{A} also has to exist. This new information is just an extra entry into the topology hash - will not change any of the old behavior.
ENERGY.pm
GENEREAL.pm
2.0 9/20/05 After testing the changes proposed below (perioding "touching" of the local directory) in the version d and making sure that it all works fine, I made the same changes in version 2.0.
ENERGY.pm
GENEREAL.pm
d 9/14/05 Currently GENERAL::createLocalSpace is supposed to check whether any of the previously made local directories have not been written to for a certain period of time (now 60 minutes). If true, one of these directories is reused. The problem is that now pair energy functions do not write CHARMM scripts, there could be no local output for quite some time. Thus if two ProCEDe jobs are running on the same machine, one launched some time after the other, the second one may end up wiping and reusing the local directory of the first one. To avoid this I wrote a function called GENERAL::touchLocalSpace, which basically does a UNIX touch on the local directory if it does not exist. This function should be periodically called from within all functions that run the risk of not producing local output for a long time. Also, a new global variable was created - $LOCAL_TIMEOUT, which keeps track of the number of minutes that need to be passed since the last write for a directory to be reused. Currently calls to GENERAL::touchLocalSpace have been placed in ENERGY::writePairwiseDDE_min, writePairwiseDDE_fast, writePairwiseEEFnoCHARMM, writePairwiseEEF_fast, and writePairwiseGB.
ALL d
2.0
09/07/05 Copied all of the development version into version 2.0 (older version 2.0 was archived and put into /bionet/keating/local/energy/old_versions/).
getStructureEnergy.pl d 09/06/05 Added the capability to break DelPhi energies (flag --dp) into sci-scj, sci self (self here means sci rf + sci-sci), sci-t and t self (again t rf + t-t). When no --fs is specified or if --fs is 0 (i.e. do a total calculation on the whole structure without breaking terms out), --dp will still calculate the total energy, which has to be equal to the sum of sci-sci, sci self, sci-t, t self (to within calculation error). It used to be that --dp output only the total energy regardless of the state of --fs. That has changed (see above).
ENERGY.pm d 09/02/05 Added two routines - ENERGY::writeSelfDDE_min and ENERGY::writePairwiseDDE_min. These are similar to their corresponding "_min"-less versions, except that a short minimization step in the context of the template is performed in CHARMM before evaluating the energies. This makes the routines pretty slow. Additionally, these routines use a constant rdie for self as well as pair interactions. The rdie can be specified as the last argument. The default is specified under "epsilon" in CHARMM.param (currently set to 4). These routines are invoked under two new energy schemes - 1s and vdws (s stands for "smooth"). 1s is the corresponding "smoothed" version of 1 and vdws is that for vdw. When 1s is used and it is desired to eliminate rotamers in stage 1 based on vdW energy, ecut_type should be specified as vdws (since straight vdw is not calculated and specifying it will cause an error). Under energy scheme 1s the following routines are also invoked: ENERGY::writeUnfoldDDE (no too much point in minimizing in the unfolded state, but can be done in the future if proves to be a problem); ENERGY::writeUnfoldDDE_fast; ENERGY::writeSelfEEF_fast; ENERGY::writePairwiseEEF_fast and ENERGY::writePairwiseEEFnoCHARMM. Under energy scheme vdws, the following routines are executed: ENERGY::writeUnfoldDDE; ENERGY::writeSelfDDE_min and ENERGY::writePairwiseDDE_min. Default behavior did not change, but now energy scheme 1s can be used.
Energy.pl d 09/02/05 Change the calls to all DDE-calculating routines to include the last parameter ($parm->{rdie}). This parameter can now be specified in the energy.conf file. If not specified, undef values will be passed down to the energy routines and the default behavior will be exhibited. Since the default behavior did not change, nothing should be effected.
ENERGY.pm d 09/02/05 Moved ENERGY::writeUnfoldDDE to ENERGY::writeUnfoldDDE_old and writeUnfoldDDE_fast to writeUnfoldDDE. This change was reflected in Energy.pl. Put a deprication statement at the top of the old routine (it no longer supports all the possible parameters). Changed ENERGY::writeUnfoldDDE, writeSelfDDE, writePairwiseDDE, and writePairwiseDDE_fast to have an additional optional parameter - the rdie to use with DDE. If not specified, undef values are sent to CHARMM::setupMulteNonbonded and CHARMM::setupMultePairNonbonded and the default behavior is exhibited (see below). The default behavior did not change, so nothing should be effected.
CHARMM.pm d 09/02/05 Changed CHARMM::setupMulteNonbonded and CHARMM::setupMultePairNonbonded such that they can optionally take the rdie constant to use with DDE. By default they use the constants specified in the CHARMM.param file (currently set to 4.0 for self and unfolded and 4/8/16 for pair depending on the amino acid identities). Since this is only useful/applicable for coiled coils, a general way of specifying an rdie is useful. The default behavior is as before, so nothing should be effected.
CHARMM.pm d 08/13/05 Communication with CHARMM is now done with a pipe. So when CHARMM dies for some reason, the OS sends a SIGCHLD signal to the perl wrapper and the process is not removed from the process table until the parent examines it with waitpid or ignores the SIGCHLD signal - this is how zombies are made. To avoid zombies I used to set $SIG{CHLD} to "IGNORE" just before printing to CHARMM in CHARMM::send such that if the child process died as a result of the print, the SIGCHLD signal would be ignored and the child process would be removed from the proccess table (and upon the next print or closing of the pipe the fact that the child had died would be discovered). At the end of CHARMM::send $SIG{CHLD} was set to "". This only worked sometime and I think this is because CHARMM would not always die immediately after a bad print statement (it maye take some time for the buffer to be flushed or the command to be processed). So if CHARMM dies when $SIG{CHLD} is "", the signal does not get taken care of and so a zombie is made. Now, I though that if in this condition you set $SIG{CHLD} to ignore, the undelivered signal should now be ignored and the zombie be cleared, but apparently this is not so (if there is no handler at the time of the signal, it does not matter if you make a handler later - the signal will not be processed). To deal with this I simply made a real child reaper (GENERAL::REAPER) according to recipe 16.9 from the Perl Cookbook and set $SIG{CHLD} to its reference in CHARMM::new. Additionally, before doing this I save the previous SIGCHLD handler in $self->{_prev_reaper} (if it is defined) and in CHARMM::DESTROY I restore the handler back to it (again, if it exists). Anything using CHARMM. This change should not affect anything on the user's side.
naccess d 08/10/05 I changed the naccess shell script such that error and warning messages are printed to STDERR. It used to be printed to STDOUT and this was an issue since ENERGY::getAtomicSA executed naccess with "> /dev/null" meaning that STDOUT was ignored. So if there was an error, it would never be seen. Printing errors and warnings to STDERR is in general better than having to examine STDOUT after every call. Anything calling naccess (ENERGY::getAtomicSA)
CHARMM.pm d 08/10/05 Now that communication with CHARMM is interactive, every write MUST be followed by a proper title ("* something\n*\n"). Otherwise, CHARMM tries to read the title and when it does not find one, it does BACKSPACE on the input stream (really stupid, but that is how it's implemented), which works for regular files, but may fail for pipes. I looked through all occurances of writes in CHARMM.pm and made sure that a proper title follows. CHARMM.pm and everything depending on it.