Can one apply a principle-based method for protein structure ...
Protein Structure Prediction by Global Optimization and its Application to Biological Systems
Jooyoung Lee
http://lee.kias.re.kr
Center for In Silico Protein Science
Korea Institute for Advanced Study, Seoul, Korea
Oct. 12, 2009

Physics-Based Protein Modeling vs. Template-Based Modeling -- CASP7 & CASP8
High-Accuracy Protein Modeling by Global Optimization
Accurate Protein 3D Modeling: Better Understanding of Biology?

KIAS Protein Folding Laboratory http://lee.kias.re.kr

Proteins are important
- Antibodies, enzymes, hormones
- Proteins control all cellular processes
- sequence (genome) -> structure (post-genome) -> function
- Scientific bottleneck: the most challenging problem of this century (Protein Folding Problem)
- Example: human hemoglobin

Protein folding problem
- For a given amino acid sequence (of size n), find the native structure of the protein.
- Total number of protein structures: 10^n
- Mathematically NOT a well-defined problem
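
The 10^n count on the slide above can be made concrete with a little arithmetic. The sketch below takes 10 states per residue (the slide's base) and assumes a purely hypothetical sampling rate of 10^12 conformations per second. The function names and the rate are illustrative assumptions, not figures from the talk.

```python
def conformation_count(n_residues: int, states_per_residue: int = 10) -> int:
    """Size of the search space if every residue independently takes one of a few states."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues: int, rate_per_second: float = 1e12) -> float:
    """Years needed to enumerate every conformation at a hypothetical sampling rate."""
    seconds = conformation_count(n_residues) / rate_per_second
    return seconds / (365.25 * 24 * 3600)


if __name__ == "__main__":
    for n in (20, 50, 100):
        print(n, conformation_count(n), exhaustive_search_years(n))
```

Even under this generous assumed rate, the enumeration time grows by a factor of 10 with every added residue, which is why the talk turns to global optimization rather than exhaustive search.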
Protein folding problem
1. Protein Structure Prediction: for a given protein sequence, determine its 3D structure by computation.
2. Protein-Folding Mechanisms: by what process does a protein fold into its native, biologically active conformation?
3. Inverse Folding: for a given protein structure, design its 1D sequence.

Protein Structure Prediction
1. Physics-based approaches: principle-based modeling
- Accurate potential energy function
- Powerful global optimization method (what we can do better than others)
- Ab initio, de novo, new-fold targets (10-20%)
2. Informatics-based approaches: template-based modeling
- Map the original problem to a problem with a known solution (mapping problem: alignment problem)
- Use templates (problems with solutions) to obtain the solution of the original problem (multiple alignment)
- Comparative modeling, fold recognition (80-90%)

Conformational Space Annealing
- Narrows the search while maintaining diversity of sampling.
- Annealing in conformational space.

Examples of successful optimizations
- Optimization of ECEPP/3 for a 20-residue membrane-bound portion of melittin [Biopolymers 46, 103-115 (1998)]
- Unbiased global optimization of Lennard-Jones clusters up to N=201 [Phys Rev Lett 91, 080201 (2003)]
- Ground state in the frustrated XY model and lattice Coulomb gas with f=1/6 [Physica A 315, 314-320 (2002)]
- Conformational space annealing and an off-lattice frustrated model protein [J Chem Phys 119, 10274-10279 (2003)]
- Structure optimization of an off-lattice AB protein model [Phys Rev E 72, 011916 (2005), Submitted]
- Efficient molecular docking using conformational space annealing [J Comput Chem 26, 78-87 (2005)]
- Ground-state energy and energy landscape of the Sherrington-Kirkpatrick spin glass [Phys. Rev. B 76, 184412 (2007)]
- Successful High Accuracy Template Based Modeling in the CASP7 experiments [Proteins, Vol. 69, 83-89, Suppl. 8 (2007)]
- Multiple sequence alignment by conformational space annealing [Biophysical J. 95, 4813-4819 (2008)]: att532

What is CASP?
- Critical Assessment of Techniques for Protein Structure Prediction (http://predictioncenter.gc.ucdavis.edu/).
- The goal is to help advance the methods of identifying protein structure from sequence.
- Community-wide experiments held every two years starting in 1994, to prepare for the post-genomic era.
- Blind prediction (and blind assessment).
- Since CASP1 (1994), a total of 514 protein sequences have been predicted.
- Since CASP5 (2002), ~200 methods have been tested in each CASP.

HDEA
RMSD=4.2 for 61 residues (80%, residues 25-85)
HDEA segment: RMSD=2.9 for 27 residues (36%, residues 16-42)
PNAS 96, 5482

Past CASP Performances of KIAS protein folding lab
- CASP5 (2002): 18th out of 165 teams in the new-fold category
- CASP6 (2004): selected as one of 12 elite teams in the new-fold category
- CASP6 example: T0199_D3 (FR/A, Nres=82, residues 145-226)
[Figure: native structure and Model 4]

Physics & Protein Structure Prediction (I)
1. Proteins are polypeptide chains containing many atoms, and the interaction between atoms is considered to be reasonably well described by physics and chemistry.
2. However, there are only a few anecdotal examples of successful physics-based protein modeling (compared to the informatics-based method).
3. Currently, protein structure prediction methods relying only on physics-based approaches do not work as well as informatics-based methods.

Physics & Protein Structure Prediction (II)
1. The goal is to achieve better protein modeling by fusing informatics-based methods with a principle of physics (global optimization).
2. The task was to map template-based protein modeling onto a series of combinatorial optimization problems.
3. The reality was to learn TBM (template-based modeling) by making lots of mistakes in a real situation (CASP7).

CASP7 Experiment
- May to August 2006
- About 200 prediction methods were tested
- Total of 104 targets (9 cancelled)
- Three major categories:
1. High Accuracy Template Based Modeling (28 domains)
- Fine-resolution measures are used for backbone assessment
- Side chains are also assessed
- Only model 1s are considered
2. Template Based Modeling (108 domains)
3. Free Modeling (16 domains)
- Physics-based methods have a chance of providing competitive protein models
- Official results are available from the CASP7 conference homepage (11/26-11/30/2006) and the Proteins CASP7 issue

Homology modeling (template-based modeling) methods in the literature
Conventional methods:
- Minimal amount of computing resources
- Human-power intensive (several days per target)
- A series of decision-making procedures requires human expertise
More advanced (and successful) methods:
- Require some/significant computing power
- Fragments are reassembled
- Complicated score functions (not available to others) are optimized
- TASSER by Zhang and Skolnick & ROSETTA by Baker
Our approach:
- Problems are all mapped onto combinatorial optimization problems
- Computing-power intensive (CSA is used)
- Requires no human expertise (this is our first-ever TBM attempt in CASP)
- Score functions are assembled from publicly available ones
- Goal was to learn TBM by making mistakes in a real situation

We formulate protein modeling as a series of combinatorial optimization problems:
1. Multiple Sequence Alignment (MSA): optimization of a frustrated system [Biophysical J. 95, 4813-4819 (2008)]
- generate pair-wise alignments between all pairs
- from each pair-wise alignment, generate residue-to-residue restraints
- a library of restraints: a frustrated system
2. All-atom chain building from MSA: another combinatorial problem, that of the Modeller energy function [Proteins 75, 1010-1023 (2009)]
- Modeller energy is a collection of competing terms, including distance restraint terms from MSA and stereochemistry terms
- inherent frustration when dealing with more than one template
- Modeller energy is treated as a black box for optimization

CASP Strategy
- Proteins, Vol. 69, 83-89, Suppl. 8 (2007)
- Biophysical J. 95, 4813 (2008)
- Proteins 75, 1010-1023 (2009)

[Figure: CASP7 High Accuracy Template Based Modeling; Proteins 69, Issue S8, 27-37 (2007)]
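
The "Our approach" and "CASP Strategy" slides above rely on conformational space annealing (CSA), which an earlier slide characterizes as narrowing the search while maintaining diversity of sampling. The following is a minimal toy sketch of that idea, not the group's implementation: it keeps a bank of locally minimized solutions, generates trials by mixing bank members, and shrinks a distance cutoff that decides whether a trial replaces its nearest neighbor or the worst bank member. The Rastrigin test function, the pattern-search minimizer, the bank size and the cutoff schedule are all illustrative assumptions.

```python
import math
import random


def energy(x):
    """Rastrigin function: many local minima, global minimum 0 at the origin."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)


def local_minimize(x, step=0.1, max_sweeps=200):
    """Crude downhill pattern search; returns (point, energy) of a nearby local minimum."""
    x, e = list(x), energy(x)
    for _ in range(max_sweeps):
        if step < 1e-6:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = x[:]
                y[i] += delta
                ey = energy(y)
                if ey < e:
                    x, e, improved = y, ey, True
        if not improved:
            step *= 0.5
    return x, e


def csa(dim=4, bank_size=20, rounds=60, seed=0):
    """Simplified CSA loop on the toy energy; returns the best (point, energy) in the bank."""
    rng = random.Random(seed)
    bank = [local_minimize([rng.uniform(-5, 5) for _ in range(dim)])
            for _ in range(bank_size)]
    # Assumed schedule: cutoff starts at half the average bank distance, ends at one fifth.
    dists = [math.dist(a[0], b[0]) for i, a in enumerate(bank) for b in bank[i + 1:]]
    d_avg = sum(dists) / len(dists)
    d_cut, d_min = d_avg / 2, d_avg / 5
    shrink = 0.4 ** (1.0 / rounds)
    for _ in range(rounds):
        for _ in range(bank_size):
            parent, partner = rng.choice(bank)[0], rng.choice(bank)[0]
            # Trial: mix coordinates of two bank members, then perturb one coordinate.
            trial = [p if rng.random() < 0.5 else s for s, p in zip(parent, partner)]
            trial[rng.randrange(dim)] += rng.gauss(0, 0.5)
            tx, te = local_minimize(trial)
            nearest = min(range(bank_size), key=lambda i: math.dist(bank[i][0], tx))
            if math.dist(bank[nearest][0], tx) < d_cut:
                # Same region as an existing member: keep whichever is lower in energy.
                if te < bank[nearest][1]:
                    bank[nearest] = (tx, te)
            else:
                # New region: it displaces the highest-energy member if it is better.
                worst = max(range(bank_size), key=lambda i: bank[i][1])
                if te < bank[worst][1]:
                    bank[worst] = (tx, te)
        d_cut = max(d_min, d_cut * shrink)  # anneal the cutoff
    return min(bank, key=lambda member: member[1])


if __name__ == "__main__":
    best_x, best_e = csa()
    print(best_x, best_e)
```

A large cutoff early on forces bank members to stay far apart (diversity); as the cutoff shrinks, replacements concentrate on refining the low-energy regions already found. In the talk's setting the energy would be an alignment score or the Modeller energy treated as a black box, rather than this toy function.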
CASP7 High Accuracy Template Based Modeling (a total of 174 groups)
Columns: Group, nHA, GDT-HA, AL0, 1, 1/2, nMR, LLG, Sum
Groups: TS556 (LEE), TS020 (Baker), TS249 (taylor), TS186 (CaspIta-FOX), TS004 (ROBETTA)
Values:
0.907 0.997 0.768 0.396 0.271 0.561
0.924 0.687 0.752 0.449 0.333 0.679
12 12 12 2 12 12
0.510 0.883 0.688 1.105 1.016 0.411
2.022 2.016 2.015 2.001 1.954 1.928
Proteins 69, Issue S8, 27-37 (2007)

Conclusion of the official CASP7 assessment for HA/TBM targets (Proteins 69, Issue S8, 38-56 (2007)) reads:
"A number of groups did well in the HA/TBM category. Group 556 (LEE) stood out as the only group that performed near the top according to all criteria investigated: fold quality (particularly GDT-HA), side-chain rotamer quality, and molecular replacement model quality."

Template Based Modeling (Skolnick)
Proteins 69, Issue S8, 38-56 (2007)

CASP8
- May 5 to Aug 23, 2008
- Over 200 prediction methods were tested.
- Total of 128 targets (6 cancelled).
- We tried 2 methods: LEE and LEE-SERVER (server)
- Partial assessment data were released during the CASP8 meeting (Sardinia, Italy, Dec 3-7, 2008).
Highlights of LEE & LEE-SERVER prediction:
- 50 HA-TBM targets: LEE & LEE-SERVER are the 2 best methods
- Binding site prediction: LEE & LEE-SERVER are the 2 best methods
- Refinement category: best model 1 prediction by LEE
[Figure labels: LEE 5, LEE 4.5, LEE-SERVER 4]
[Figure: Top 20 methods sorted by GDT-HA for HA-TBM targets; http://casp.kias.re.kr]
[Figure: Predictor Group Rankings (CASP8 TBM category)]

What can one do better with more accurate protein models?
- Predict protein functions:
- CASP8 performance (best HA-TBM prediction by LEE and LEE-S)
- Best binding site prediction
- Protein design
- Suggest working mechanisms of proteins at atomic resolution (insulin analogs; collaboration with Prof. H Shin)
- Screen natural proteins to find more efficient enzymes: discovery of more efficient aminotransferases by protein modeling and docking simulation, confirmed by wet experiments in which a 30-60-fold increase in the reaction rate was validated (collaboration with Prof. BG Kim)
- Determine a protein complex structure by combining X-ray diffraction data and protein modeling (Cell 136, 85-96, Jan 9 2009)

X-tal structure of condensin complex MukBEF
Cell 136, 85-96, 2009

Screening of ω-aminotransferase for the asymmetric synthesis of chiral amine
Columns: Sequence name; Lowest distance among 100 docking poses; Initial rate for forward reaction (Uf, mol/mg·min); Initial rate for reverse reaction (Ur, mol/mg·min); Ur / Uf; Produced (S)-MBA (mM)
Atu4761  0.018  9.13

1. Caulobacter ω-TA was selected and PSI-BLAST was run: 250 sequences were selected.
2. The 250 sequences were multiply aligned and 4 subgroups were identified.
3. 51 sequences belong to ω-TA, and all of them were used for model building.
4. The models were docked with aminodiphenylmethane (ADPM), and the distance between PLP and the N atom of ADPM was measured.

CASP8 Binding Site Prediction
T0391 (Human/Server)
[Figure: magenta, X-ray (3d89A), PDB code 3d89; blue, LEE. Hetero atoms: FES]
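
Step 4 of the ω-TA screening procedure above, together with the table column "Lowest distance among 100 docking poses", amounts to a minimum over per-pose distances. The sketch below shows that bookkeeping under stated assumptions: each model supplies one reference coordinate for PLP and one coordinate for the ADPM nitrogen per docking pose. Which PLP atom was used is not stated in the talk, and the function names, data layout and coordinates are hypothetical.

```python
import math


def lowest_pose_distance(plp_atom, ligand_n_positions):
    """Smallest distance between the PLP reference atom and the ligand N over all poses."""
    return min(math.dist(plp_atom, pos) for pos in ligand_n_positions)


def screen(models):
    """Map each model name to its lowest PLP-to-ligand-N distance.

    `models` maps a name to (plp_atom_xyz, [ligand_n_xyz for each docking pose]).
    """
    return {name: lowest_pose_distance(atom, poses)
            for name, (atom, poses) in models.items()}


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    demo = {
        "modelA": ((0.0, 0.0, 0.0), [(3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]),
        "modelB": ((1.0, 1.0, 1.0), [(1.0, 1.0, 4.0)]),
    }
    print(screen(demo))
```

In a real pipeline the coordinates would be read from the docking output files, and each pose list would hold the 100 poses mentioned in the table header.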
Protein Structure Determination by X-ray crystallography & MR

Conclusions
- We have successfully mapped template-based protein modeling onto three layers of combinatorial optimization problems: MSACSA, ModellerCSA and ROTCSA.
- We have demonstrated that high-accuracy protein 3D modeling can be achieved simply by rigorous optimization of relevant score functions.
- The proposed method requires a large amount of computational resources (100 CPU days per 300-aa protein), but produces significantly better results.
- There is room for improvement in template detection and loop modeling.
- Application to real/experimental systems is in the preliminary stage but quite promising.

Acknowledgements
- 3D modeling: Keehyoung Joo; Jinwoo Lee, Dept. of Math., Kwangwoon U.; Sung Jong Lee, Dept. of Phys., Suwon U.
- Function: Mina Oh
- NMR Structure Optimization: Jinhyuk Lee
- Collaboration with experimental groups: Byung-Gee Kim, School of Chemical and Biological Engineering, SNU; Byung-Ha Oh, POSTECH (moved to KAIST); H Shin, Soongsil U.; DH Shin, Ewha Womans Univ.
- Cluster computers: KIAS

Thank You!