Bioinformatics Group
University of Cambridge 
Protein Structure Modelling  

X-ray refinement HOW-TO

RAPPER can be used to automatically identify and rebuild poorly fit regions in a protein structure given an electron density map. Additionally, the rappermc.py script can be configured (with the appropriate arcane commands) to iteratively rebuild and refine structures against reflection data.

Please join the RAPPER modelling Yahoo! group to keep up to date with the continually evolving RAPPER project. If you have experienced problems with RAPPER or have questions, please take a look at the RAPPER FAQ.

We have finally finished a more user-friendly version of the rappermc.py scripts. You can obtain the newest version of RAPPER, including the rappermc.py scripts, from the downloads section of this website. Nevertheless, rappermc.py requires a serious investment of time and experience to configure and run correctly. Please don't become discouraged or give up if you hit some snags! Instead, please email us at rapper@cryst.bioc.cam.ac.uk and we'll try to help you through setting up and running RAPPER for refinement.

Introduction

This document describes how to install, set up, and run RAPPER-based refinement with CNS. If you experience problems or have questions, please email us at rapper@cryst.bioc.cam.ac.uk.

Note that these protocols are those used in DePristo et al., Structure, 2005, but may be updated to reflect the continuing development of RAPPER.

Outline

This HOW-TO details how to set up your crystallographic data for rebuilding and refinement with RAPPER. Fundamentally, refinement with RAPPER requires the items listed below. This example is based on refinement of the PDB structure 9ILB. The 9ILB example files and rappermc.py scripts are provided in the current RAPPER distribution, or you can fetch just the data files for this structure here: 9ilb.tgz [6.29 MB].

An installed RAPPER distribution (see downloads), including the scripts directory rappermc/bin. Let's suppose you have installed RAPPER into your home directory, so that ~/rappermc is the RAPPER root. You also need a version of Python to run the automated refinement scripts. We use 2.3 here, but it should work with 2.4 too (I hope).

An initial structure, encoded in a file in the PDB format. Let's call it 9ilb/9ilb.pdb [117.23 KB].

A structure options file, called 9ilb/9ilb_options.inp [271 B], that describes basic crystallographic parameters of your structure, such as its resolution, unit cell dimensions, and symmetry group. This file is in the CNS format, and is prepended by the rappermc.py scripts to run CNS refinement:

{===>} sg="P4(3)";
{* unit cell parameters in Angstroms and degrees *}
{+ table: rows=1 "cell" cols=6 "a" "b" "c" "alpha" "beta" "gamma" +}
{===>} a=55.02;
{===>} b=55.02;
{===>} c=77.16;
{===>} alpha=90.00;
{===>} beta=90.00;
{===>} gamma=90.00;
{===>} high_res= 2.30;

Finally, you need a CNS HKL file containing the crystallographic reflections, 9ilb/9ilb.hkl [754.58 KB].
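
If you would rather generate the options file than edit it by hand, the following is a minimal Python sketch. It only reproduces the assignments shown above for 9ILB. The function name format_cns_options and its arguments are hypothetical names chosen for illustration, not part of the RAPPER distribution, and the sketch is written for a current Python 3 rather than the 2.3 used by rappermc.py.

```python
def format_cns_options(space_group, cell, high_res):
    """Return the text of a CNS options file like 9ilb_options.inp.

    cell is a sequence (a, b, c, alpha, beta, gamma) in Angstroms and degrees.
    This helper is an illustration, not part of RAPPER.
    """
    names = ("a", "b", "c", "alpha", "beta", "gamma")
    if len(cell) != len(names):
        raise ValueError("cell needs six values: a, b, c, alpha, beta, gamma")
    lines = [
        '{===>} sg="%s";' % space_group,
        "{* unit cell parameters in Angstroms and degrees *}",
        '{+ table: rows=1 "cell" cols=6 "a" "b" "c" "alpha" "beta" "gamma" +}',
    ]
    for name, value in zip(names, cell):
        lines.append("{===>} %s=%.2f;" % (name, value))
    lines.append("{===>} high_res= %.2f;" % high_res)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values taken from the 9ILB example above.
    print(format_cns_options("P4(3)", (55.02, 55.02, 77.16, 90.0, 90.0, 90.0), 2.3))
```

Redirect the printed text into a file such as 9ilb_options.inp, then compare it with the example file from the distribution before relying on it.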

The rappermc.py script and configuration file

Although the RAPPER binary -- a C++ program -- is good for rebuilding structures, we use a high-level Python script called rappermc.py to manage rebuilding and refinement. This program reads in a configuration file provided by the user (you) and executes a series of rebuilding and refinement steps based on the commands in this file. We have conventionally called this file defaults.cfg. Most options in the file do not need to be changed, but some are essential and must be tuned to your particular structure and needs.

There are some basic parameters in defaults.cfg that must be set up properly for rappermc.py to function at all. They are those listed below, exactly as they appear in the provided 9ilb/defaults.cfg [17.53 KB] file in the RAPPER distribution:

;; -------------------------------------------------------------------
;; Basic configuration options for rapper refinement
;;

;; The path to the rappermc distribution
root: /Users/mdepristo/research/rapper-dist/rappermc

;; Path to the CNS binary. Note that these must be the full paths, not aliases
cns-binary: /sw/share/xtal/cns_solve_1.1/mac-ppc-darwin_g77/bin/cns_solve

;;
;; The name of the input PDB file. This is the structure to be refinement
;; and rebuilt, or, if initial model sampling is enabled, the structure
;; that is sampled, and than rebuilt/refined
;;
pdbfile: 9ilb.pdb

;; The resolution of the crystal -- needed for CNS calculations
resolution: 2.3

;; A file that contains sg and a,b,c,alpha,beta,gamma assignments (see examine in rappermc/bin)
cns-options-file: 9ilb_options.inp

;; the CNS reflection file containing the crystallographic observations
cns-reflection-file: 9ilb.hkl
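
Since a wrong path here stops rappermc.py from working at all, it can help to sanity-check these basic options before starting a long run. The sketch below is an independent helper, not part of RAPPER: it assumes the simple "key: value" layout with ";" comments shown above, ignores multi-line values, and the names read_simple_cfg and check_basic_settings are hypothetical. It is written for a current Python 3.

```python
import os


def read_simple_cfg(text):
    """Parse 'key: value' lines, skipping blank lines and ';' comments.

    Assumption: one option per line; continuation lines are ignored.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


REQUIRED = ("root", "cns-binary", "pdbfile", "resolution",
            "cns-options-file", "cns-reflection-file")


def check_basic_settings(settings, workdir="."):
    """Return a list of readable problems; an empty list means none found."""
    problems = []
    for key in REQUIRED:
        if key not in settings:
            problems.append("missing option: %s" % key)
    # The config comments ask for full paths, not aliases.
    for key in ("root", "cns-binary"):
        path = settings.get(key)
        if path and not os.path.isabs(path):
            problems.append("%s should be a full path: %s" % (key, path))
        elif path and not os.path.exists(path):
            problems.append("%s does not exist: %s" % (key, path))
    # Input files are looked up relative to the working directory.
    for key in ("pdbfile", "cns-options-file", "cns-reflection-file"):
        name = settings.get(key)
        if name and not os.path.exists(os.path.join(workdir, name)):
            problems.append("%s not found in %s: %s" % (key, workdir, name))
    resolution = settings.get("resolution")
    if resolution is not None:
        try:
            float(resolution)
        except ValueError:
            problems.append("resolution is not a number: %s" % resolution)
    return problems
```

A typical use would be to read defaults.cfg, pass its text to read_simple_cfg, and print whatever check_basic_settings returns for the directory holding your structure, reflection, and options files.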

To configure RAPPER for your needs, copy the provided defaults.cfg into a directory where your structure, reflections, and option files reside. Change the parameters to reflect your setup (root, cns-binary, file names). After those changes, you should decide what you want RAPPER to do during refinement. There are two basic approaches provided by the defaults.cfg file.

One, RAPPER takes the provided PDB structure, and runs 20 cycles of RAPPER rebuilding and CNS refinement. Two, RAPPER generates a number of alternate structures near your provided PDB structure, and runs refinement/rebuilding as above on each structure. The first process can easily take overnight, depending on the size of your structure and processor speed, while the second may take weeks, depending on how long a single refinement/rebuilding process requires and how many samples you consider. By default RAPPER refines and rebuilds only the provided structure. If you want to enable initial sampling, change the build-models parameter to 1 (i.e., true) and models to the number of samples to generate.

;; IMPORTANT PARAMETERS --
;; should we RAPPER to generate initial samples of the structure which are
;; each rebuilt/refined with RAPPER?
;;
;; The number of models to generate during the build-models stage. If
;; build-models is 1, this should be around > 0. The Structure 2005 paper
;; generated 5 models. Note that rebuilding/refinement is performed on each
;; sample independently, so 5 initial samples requires 5x as long to run.
build-models: 0
models: 5

Running rappermc.py

Once you have installed rappermc.py, created your options file, and modified the defaults.cfg configuration file to suit your needs, you are ready to run rappermc.py. Continuing with our example, you can simply execute rappermc.py with the configuration file:

./rappermc.py -c defaults.cfg

You can see the output of rappermc.py for the refinement of 9ILB here: 9ilb/RESULTS/log [16.75 KB]. During the course of a refinement and rebuilding series, rappermc.py will execute CNS to perform simulated annealing and gradient descent refinement, calculate simulated annealing omit maps, and call upon the stand-alone RAPPER program to rebuild sections of your structure that don't fit well into the omit map.

Whoa, that's a lot of output...

rappermc.py is fairly verbose, saving all intermediate log files, structure files, and maps. There is a scheme to understanding the output files, though. First, if you enabled sampling, then all of the sampled structures will be in the TESTRUNS subdirectory, called looptest-0.pdb, looptest-1.pdb, etc. for the number of models you requested.

From this moment onwards, i.e., during refinement, rappermc.py names your output files based on their ID. The IDs of the above samples are 0, 1, etc., and your input PDB structure has ID 'pdb'. The initial refinement file is called refine-ID-0.pdb, which is just a copy of your PDB structure or of the corresponding looptest-X.pdb file. Each subsequent refinement or rebuilding step produces files named refine-ID-(i+1).pdb. So the next refinement step of your PDB structure produces files starting with refine-pdb-2. In the 9ILB example, this is:

9ilb/RESULTS/refine-pdb-2_makemap.log [336.12 KB]
9ilb/RESULTS/refine-pdb-2.map [7.49 MB]
9ilb/RESULTS/refine-pdb-2_edm_rebuild [4 KB]
9ilb/RESULTS/refine-pdb-2.log [166.59 KB]

Here refine-pdb-2_makemap.log is the SA omit map output from CNS, refine-pdb-2.map is the SA omit map, refine-pdb-2_edm_rebuild is the directory where RAPPER puts the rebuilding output, and refine-pdb-2.log is the RAPPER output log.
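
The naming scheme above can be written down as a small helper, which is handy when scripting over many steps or models. This is only a sketch of the pattern described in this section: the function name and dictionary keys are hypothetical, and not every step necessarily produces every one of these files.

```python
def refinement_step_files(model_id, step, results_dir="RESULTS"):
    """Return the output names described above for one model ID and step.

    model_id is 'pdb' for the input structure or 0, 1, ... for samples.
    The dictionary keys are labels invented for this illustration.
    """
    stem = "%s/refine-%s-%d" % (results_dir, model_id, step)
    return {
        "structure": stem + ".pdb",             # refined/rebuilt structure
        "omit_map_log": stem + "_makemap.log",  # CNS SA omit map output
        "omit_map": stem + ".map",              # the SA omit map itself
        "rebuild_dir": stem + "_edm_rebuild",   # RAPPER rebuilding output
        "rapper_log": stem + ".log",            # RAPPER output log
    }


if __name__ == "__main__":
    for label, path in sorted(refinement_step_files("pdb", 2, "9ilb/RESULTS").items()):
        print(label, path)
```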

Summary output

Once rappermc.py finally finishes, it writes out several summary files that contain the R and Rfree factors for each structure, as well as the mainchain RMSD, all-atom RMSD, and chi1 < 60 degrees values of each structure with respect to the provided PDB file:

9ilb/RESULTS/rfree_cns.dat [310 B]
9ilb/RESULTS/r_cns.dat [310 B]
9ilb/RESULTS/mc-rmsd.dat [310 B]
9ilb/RESULTS/all-atom-rmsd.dat [310 B]
9ilb/RESULTS/chi1-percent.dat [319 B]

Really, the 9ilb/RESULTS/rfree_cns.dat [310 B] file is the most useful, as it lists in a simple tabular format the free R factor during each step of refinement.
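
If you want to pull the numbers out of such a table in a script, a generic reader is enough. The exact column layout of rfree_cns.dat is not documented here, so the sketch below assumes only whitespace-separated columns, and that lines starting with "#" (if any) are comments. Both are assumptions, as are the function names; look at your own file before trusting any column index.

```python
def read_table(text):
    """Split a whitespace-separated table into rows of string tokens.

    Assumption: blank lines and lines starting with '#' carry no data.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rows.append(line.split())
    return rows


def numeric_cells(row):
    """Return the tokens of one row that parse as floats, in order."""
    values = []
    for token in row:
        try:
            values.append(float(token))
        except ValueError:
            pass  # non-numeric label, e.g. a model ID such as 'pdb'
    return values
```

To use it, read RESULTS/rfree_cns.dat into a string and loop over read_table(text), calling numeric_cells on each row. Note that a numeric model ID such as 0 will also parse as a number.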

rappermc.py also writes out a number of aggregated model files:

9ilb/RESULTS/refine-models-of-0.pdb [393.85 KB]
9ilb/RESULTS/refine-models-of-pdb.pdb [387.95 KB]
9ilb/RESULTS/refine-models-0.pdb [195.39 KB]
9ilb/RESULTS/refine-models-1.pdb [193.86 KB]
9ilb/RESULTS/refine-models-2.pdb [196.28 KB]
9ilb/RESULTS/refine-models-3.pdb [196.41 KB]
9ilb/RESULTS/final-models.pdb [196.41 KB]

9ilb/RESULTS/final-models.pdb [196.41 KB] contains the final model produced by RAPPER refinement/rebuilding for each initial model as a multi-model PDB file, numbered according to their order in the rappermc.py models list (i.e., model 0, 1, etc. to the PDB structure). The file series refine-models-STEP.pdb contains all of the models generated at refinement step STEP, with the same model numbering as above. In the simple 9ILB example, these run from the initial models (i.e., PDB model and RAPPER sample) in 9ilb/RESULTS/refine-models-0.pdb [195.39 KB] to the third and final step in 9ilb/RESULTS/refine-models-3.pdb [196.41 KB]. Finally, there is a multi-model file that contains the models from each successive refinement step, called refine-models-of-ID.pdb, where the first step produces model 0, the second 1, etc., showing you in effect the refinement trajectory followed by RAPPER. In the 9ILB example, we have two such files, 9ilb/RESULTS/refine-models-of-0.pdb [393.85 KB] and 9ilb/RESULTS/refine-models-of-pdb.pdb [387.95 KB].
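
Some viewers and tools prefer one model per file. The sketch below splits a multi-model PDB file into its individual models, assuming the models are delimited by the standard PDB MODEL and ENDMDL records. We have not checked this against every file rappermc.py writes, and the function name split_models is hypothetical.

```python
def split_models(pdb_text):
    """Return a list of (serial, lines) for each MODEL ... ENDMDL block.

    serial is the integer on the MODEL record, or None if it is absent.
    Lines outside any MODEL block (headers, END) are ignored.
    """
    models = []
    serial, current = None, None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            fields = line[6:].split()
            serial = int(fields[0]) if fields else None
            current = []
        elif record == "ENDMDL":
            if current is not None:
                models.append((serial, current))
            serial, current = None, None
        elif current is not None:
            current.append(line)
    return models
```

For a trajectory file such as refine-models-of-pdb.pdb, each returned entry would then correspond to one refinement step, in file order.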

rappermc.py takes forever, or how I learned to stop worrying and go on vacation

rappermc.py runs a lot of computationally intensive programs on your behalf, including simulated annealing refinement with CNS, creation of SA omit maps with CNS, and x-ray restrained conformational sampling with RAPPER. It does these three things up to 20 times for each model generated, from 1 if you just provided a single structure, to 10+ models if you ask RAPPER to generate initial CA-trace samples of your structure. On my dual G5, it takes around 10-20 hours to run the 9ILB example with a single CA-trace sample and only 1 round of rebuilding!

There are two important things to remember. One, the computer is your slave, and never grows tired of running calculations. So you can always fire up rappermc.py and go on vacation, while still technically working. Don't say we didn't do our part to improve your quality of life.

If you are not this patient, however, there are several things working in your favor and several active interventions you can attempt. First, as described in the DePristo et al. Structure paper, the amount of rebuilding work goes down exponentially with each rebuilding/refinement cycle. In other words, the first step is always the longest, with each successive step getting shorter and shorter. So maybe it takes a day to do the first step, but only a week to do 25 steps.
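
To get a feel for what an exponentially shrinking cost means for the total run time, you can sum a geometric series. The sketch below is a back-of-the-envelope model only: the per-cycle ratio is a hypothetical parameter you would have to estimate from your own log timestamps, since no value is given here, and real cycles also include CNS refinement and map calculation, which need not shrink at the same rate.

```python
def total_time(first_step, ratio, steps):
    """Sum first_step * ratio**k for k = 0 .. steps-1 (a geometric series).

    first_step: cost of the first cycle, in any time unit.
    ratio: assumed cost of each cycle relative to the previous one.
    steps: number of cycles to add up.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    total, cost = 0.0, float(first_step)
    for _ in range(steps):
        total += cost
        cost *= ratio
    return total


if __name__ == "__main__":
    # Illustrative ratio only: each cycle costs half of the previous one.
    print(total_time(1.0, 0.5, 3))  # 1 + 0.5 + 0.25 = 1.75
```

For any ratio below 1, the total can never exceed first_step / (1 - ratio), however many cycles you run, which is why the later cycles add comparatively little to the wait.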

If this still isn't good enough and you need something sooner, and are willing to sacrifice some quality, you can play with some parameters in the defaults.cfg file. Among the biggest time costs is RAPPER sampling, the behavior of which is governed by the parameter additional-edm-rebuild-args. The default value is the mildly-hardcore parameter set. If you want more performance, step it down to the good set with the Lovell library or even the fast and loose values.

;;
;; super-hardcore parameter set
;;
;;additional-edm-rebuild-args: --sidechain-radius-reduction 0.75
--fix-mislabeled-atoms false --write-user-remarks false
--mainchain-restraint-threshold 2 --restraints-are-pass-optional true
--chi-squared-electron-density-scoring true --cryst-d-high %(resolution)s
--edm-poor-region-stddev-threshold %(edm-poor-region-stddev-threshold)s
--edm-poor-region-buffer-size 0 --edm-fit true --sidechain-library
RAPPER-DIR/data/scl-B30-occ1.0-rmsd0.4-chi60.pdb
--enforce-mainchain-min-sigma-restraints true
--enforce-sidechain-min-sigma-restraints false --edm-mainchain-min-sigma 0.5
--optional-edm-mainchain-restraints true

;;
;; mildly-hardcore parameter set
;;
additional-edm-rebuild-args: --sidechain-radius-reduction 0.75
--fix-mislabeled-atoms false --write-user-remarks false
--mainchain-restraint-threshold 2 --restraints-are-pass-optional true
--chi-squared-electron-density-scoring true --cryst-d-high %(resolution)s
--edm-poor-region-stddev-threshold %(edm-poor-region-stddev-threshold)s
--edm-poor-region-buffer-size 0 --edm-fit true --sidechain-library
RAPPER-DIR/data/scl-B30-occ1.0-rmsd0.4-chi60.pdb

;;
;; use good fitting parameters, but with the Lovell rotamer library
;;
;; additional-edm-rebuild-args: --sidechain-radius-reduction 0.75
--fix-mislabeled-atoms false --write-user-remarks false
--mainchain-restraint-threshold 2 --restraints-are-pass-optional true
--chi-squared-electron-density-scoring true --cryst-d-high %(resolution)s
--edm-poor-region-stddev-threshold %(edm-poor-region-stddev-threshold)s
--edm-poor-region-buffer-size 0 --edm-fit true

;;
;; fast and loose parameters -- not explicit side chain fitting
;;
;; additional-edm-rebuild-args: --sidechain-radius-reduction 0.75
--fix-mislabeled-atoms false --write-user-remarks false
--mainchain-restraint-threshold 2 --restraints-are-pass-optional true
--chi-squared-electron-density-scoring true --cryst-d-high %(resolution)s
--edm-poor-region-stddev-threshold %(edm-poor-region-stddev-threshold)s
--edm-poor-region-buffer-size 0
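
The %(resolution)s and %(edm-poor-region-stddev-threshold)s placeholders in these parameter sets look like Python's mapping-based string interpolation, as also used by the standard ConfigParser module. Assuming that is how rappermc.py expands them (we state this as an assumption, not as a description of its internals), the substitution behaves like the sketch below. The function name and the threshold value are hypothetical.

```python
def expand_args(template, values):
    """Fill %(name)s placeholders using Python's mapping-based % formatting.

    Raises KeyError if the template names an option missing from values.
    """
    return template % values


if __name__ == "__main__":
    template = ("--cryst-d-high %(resolution)s "
                "--edm-poor-region-stddev-threshold "
                "%(edm-poor-region-stddev-threshold)s")
    values = {
        "resolution": "2.3",  # from the 9ILB example
        # Hypothetical value for illustration; use the one in your defaults.cfg.
        "edm-poor-region-stddev-threshold": "1.0",
    }
    print(expand_args(template, values))
```

The practical upshot is that options such as resolution only need to be set once in defaults.cfg; whichever parameter set you enable picks the value up through its placeholder.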

© 2001-2006 The RAPPER Team 