Wikiomics:PDB File Howto
This is a short guide to the manipulation of the files of the Protein Data Bank (PDB). Several issues related to the format of these files and to the data they contain are not always easy to understand without some experience. The idea here is to give some hints and guidelines on what to expect from these files, whether they come from the PDB or were generated by a program, and on how to write software that manipulates these data.
About the PDB format
The official PDB format, which should be followed by all PDB entries deposited after 1996, is described on the official site of the PDB. However, one of the main concerns of the maintainers of the PDB is not to modify the data deposited by the original authors without their explicit consent. In practice, data are usually deposited once and never touched again. So there are files containing mistakes, incorrect annotations or misinterpretations of the data which will stay there forever.
That said, the format is a clear text format which contains some minimal description of the macromolecule being studied and some notes on how the 3D model of the structure was obtained. Some of these notes are not formatted very strictly and are pretty difficult to parse reliably.
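As an illustration of how the coordinate part of such a file can be read, here is a minimal Python sketch. It assumes the fixed column layout of the ATOM and HETATM records given in the official format description, and it simply skips any record that does not fit that layout. The names `Atom` and `parse_atoms` are made up for this example, and the loosely formatted notes (REMARK records) are not handled at all.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    record: str    # "ATOM" or "HETATM"
    serial: int
    name: str
    res_name: str
    chain_id: str  # may be empty in some files
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> List[Atom]:
    """Collect coordinate records and ignore every other line."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        try:
            atom = Atom(
                record=record,
                serial=int(line[6:11]),        # columns 7-11
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain_id=line[21:22].strip(),  # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            )
        except ValueError:
            # Truncated or misaligned record: skip it rather than guess.
            continue
        atoms.append(atom)
    return atoms
```

Since a Python file object iterates over its lines, passing an open file to `parse_atoms` should work the same way as passing a list of strings. Skipping bad records silently is a choice made for brevity; a real tool would probably want to report them.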
The format itself is sometimes not respected and is often ambiguous: for instance, which chain identifier should be given to a small ligand when there are several chains in a protein structure? This reflects the fact that there is no single best way of describing the properties of macromolecules. Having a clear text format for PDB files is nevertheless useful in practice, for two reasons:
- they are easily read by a human
- they are easy to write from a program, and the format is easily used for representing non-standard properties such as pseudoatoms. That may be dirty, but it is still very useful in many cases, and it does not require sophisticated software.
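To illustrate the second point, here is a minimal Python sketch that writes a pseudoatom as a HETATM line which most viewers should accept. The function name `format_hetatm`, the residue name `PSE` and the atom name `PS1` are hypothetical choices for this example, not part of any standard. The column widths follow the official description of coordinate records, but the atom name is simply left-justified in its field, which ignores the usual alignment convention for element symbols.

```python
def format_hetatm(serial, name, res_name, chain_id, res_seq, x, y, z,
                  occupancy=1.0, b_factor=0.0, element=""):
    """Return one fixed-column HETATM line (78 characters, no newline)."""
    return (
        f"HETATM{serial:5d} {name:<4s} {res_name:>3s} {chain_id:1s}"
        f"{res_seq:4d}    "                      # columns 23-26, then 27-30 blank
        f"{x:8.3f}{y:8.3f}{z:8.3f}"              # columns 31-54
        f"{occupancy:6.2f}{b_factor:6.2f}"       # columns 55-66
        f"          {element:>2s}"               # columns 67-76 blank, 77-78
    )


# A pseudoatom placed, for example, at the center of a binding site.
print(format_hetatm(1, "PS1", "PSE", "X", 1, 1.0, -2.5, 3.25))
```

No check is made here for values that overflow their field, such as a serial number above 99999 or a coordinate too large for eight characters; a program that may produce such values would need to handle them.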
There are plenty of macromolecular viewers. Normally they all support the standard PDB format and are usually flexible enough to support a range of malformed PDB files that are generated by various programs that we will not enumerate.
Feel free to complete and rearrange the list of programs below:
- RasMol is a simple, fast, free and popular program for the visualization of macromolecules. It is however not suited to producing high quality pictures for publications. It uses special tricks for the projection of spheres, which usually make it faster than OpenGL-based programs. Platforms: Unix, Windows, MacOS
- RasTop: molecular visualization software adapted from the program RasMol. It wraps a user-friendly graphical interface around the "RasMol molecular engine".
- Jmol is a viewer that inherits most of RasMol functionality and script language. It's free, open source, and written in Java. It has better graphics quality than RasMol and it's been very actively developed, so its feature list is surpassing that of RasMol. Platforms: Unix, Windows, MacOS, any that runs Java 1.4. Jmol Wiki; Jmol at Wikipedia.
- VMD: free, more sophisticated than RasMol, runs on Unix, Windows and MacOS. It can produce input files for many different ray tracers.
- DeepView - Swiss-PdbViewer: free, allows simultaneous analysis and superposition of selected macromolecules. Reads electron density maps. Useful in conjunction with POV-Ray.
- Accelrys Discovery Studio Visualizer (formerly Accelrys Viewer Lite, formerly WebLab Viewer Lite): free of charge, runs on Windows and Linux, suitable for making pictures of good quality, especially if used in conjunction with the POV-Ray ray tracer.
- PyMOL is a sophisticated molecular visualization system, free software. Runs on Unix, Windows and MacOS.
- Chimera is a modular, interactive molecular graphics program capable of producing high quality pictures. It has a large number of extensions, allowing for example molecular modelling sessions to be shared over the internet. Platforms: Unix, MacOS, SGI, Windows.
Other sources of information
The World Index of Molecular Visualization Resources provides some valuable information about visualizing molecules, making nice figures, etc.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, and Bourne PE. The Protein Data Bank. Nucleic Acids Res. 2000 Jan 1;28(1):235-42. doi:10.1093/nar/28.1.235
- Bond CS. Easy editing of Protein Data Bank formatted files with EMACS. J. Appl. Cryst. 2003;36:350-1. doi:10.1107/S0021889803001651
- Hooft RW, Sander C, Scharf M, and Vriend G. The PDBFINDER database: a summary of PDB, DSSP and HSSP information with added value. Comput Appl Biosci. 1996 Dec;12(6):525-9.
- Martin Jambon started this page with some general considerations about the PDB format
- links to molecular viewers were added by various Wikiomics contributors