How to cite this WIREs title:
WIREs Comput Mol Sci
Impact Factor: 25.113

Constructing Markov State Models to elucidate the functional conformational changes of complex biomolecules


The function of complex biomolecular machines relies heavily on their conformational changes. Investigating these functional conformational changes is therefore essential for understanding the corresponding biological processes and for promoting bioengineering applications and rational drug design. Constructing Markov State Models (MSMs) based on large‐scale molecular dynamics simulations has emerged as a powerful approach to model the functional conformational changes of biomolecular systems with sufficient resolution in both time and space. However, the rapid development of theory and algorithms for constructing MSMs has made it difficult for nonexperts to understand and apply the MSM framework, creating a need for a comprehensive guide to its theory and practical use. Here, we introduce the MSM theory of conformational dynamics based on the projection operator scheme. We further propose a general protocol for constructing MSMs to investigate functional conformational changes, which integrates state‐of‐the‐art techniques for building and optimizing initial pathways, performing adaptive sampling, and constructing MSMs. We anticipate that this protocol will be widely applicable and will guide nonexperts in studying the functional conformational changes of large biomolecular systems via the MSM framework. We also discuss the current limitations of MSMs and some alternative methods to alleviate them.

WIREs Comput Mol Sci 2018, 8:e1343. doi: 10.1002/wcms.1343

This article is categorized under:
Structure and Mechanism > Computational Biochemistry and Biophysics
Theoretical and Physical Chemistry > Statistical Mechanics

Suggested protocol for constructing Markov State Models (MSMs) to investigate functional conformational changes. The workflow consists of three stages: (a)–(c) generating the minimum free energy path(s) among the known functional states; (d)–(g) adaptive sampling and microstate MSM construction/validation; (h) elucidating the slowest kinetics of the system via the validated microstate MSM and interpreting the mechanism by lumping the microstate MSM into a macrostate MSM.
(a) Identify the known functional states from experimental structures or molecular modeling.
(b) Build a preliminary transition path between the known states via morphing (e.g., the Climber algorithm) or biased molecular dynamics (MD) simulation (e.g., steered MD, targeted MD).
(c) Optimize the preliminary path to locate the closest minimum free energy path via the string method or extensive MD sampling.
(d) Initiate an ensemble of short unbiased MD simulations from representative conformations along the optimized path.
(e) Select kinetically slow reaction coordinates using time‐lagged independent component analysis (tICA).
(f) Partition the collected samples into microstates based on their geometric proximity in the reduced tIC space.
(g) Build and validate the microstate MSM; if local equilibrium is not reached within the microstates, perform further unbiased sampling seeded by the representative structures of each microstate.
(h) Predict kinetic properties of the system via the microstate MSM and build the macrostate MSM via kinetic lumping for mechanism visualization and interpretation.
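
Step (g) of the protocol can be illustrated with a minimal NumPy sketch. It assumes the MD frames have already been assigned to microstates, so each trajectory is a list of integer state indices. The function names, the toy 3‐state chain, and the symmetrized‐count estimator are illustrative assumptions, not the estimator used in the article; production work would normally rely on a dedicated MSM package.

```python
import numpy as np

def count_matrix(dtrajs, n_states, lag):
    """Count transitions i -> j separated by `lag` frames (sliding window)."""
    counts = np.zeros((n_states, n_states))
    for traj in dtrajs:
        traj = np.asarray(traj)
        np.add.at(counts, (traj[:-lag], traj[lag:]), 1.0)
    return counts

def transition_matrix(counts):
    """Row-normalize symmetrized counts (a simple way to enforce reversibility)."""
    sym = counts + counts.T
    return sym / sym.sum(axis=1, keepdims=True)

def implied_timescales(T, lag, k=2):
    """t_i = -lag / ln(lambda_i) for the k largest eigenvalues below 1."""
    eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -lag / np.log(eigvals[1:k + 1])

# Toy data (assumption): 20 short discrete trajectories from a known 3-state chain.
rng = np.random.default_rng(0)
T_true = np.array([[0.95, 0.05, 0.00],
                   [0.05, 0.90, 0.05],
                   [0.00, 0.05, 0.95]])
dtrajs = []
for _ in range(20):
    traj = [int(rng.integers(3))]
    for _ in range(999):
        traj.append(int(rng.choice(3, p=T_true[traj[-1]])))
    dtrajs.append(traj)

lag = 1  # in frames
T_est = transition_matrix(count_matrix(dtrajs, 3, lag))
its = implied_timescales(T_est, lag, k=2)
print(T_est.round(3))
print(its)
```

Repeating the estimate over a range of lag times and checking that the implied timescales level off is one common way to test whether the model is Markovian at the chosen lag.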
The multiensemble Markov Model (MEMM) for the protein‐ligand binding of trypsin‐benzamidine. (a) The coarse‐grained kinetic network of the MEMM. All transition rates between macrostates are labeled in ms⁻¹. (b) The efficiency of the transition‐based reweighting analysis method (TRAM) and of the Markov State Model (MSM) in computing the unbinding rate k_off (Figure adapted with permission from Ref . Copyright 2016 National Academy of Sciences, USA).
The backtracking process of RNA Polymerase II (Pol II) revealed by MSMs. (a) The stepwise process occurs among four metastable states. The equilibrium populations of the states and the mean first passage times (MFPTs) among them are labeled. These values are calculated from ultra‐long macrostate chains that are simulated by an 800‐state microstate MSM after bootstrapping the original 480 molecular dynamics (MD) trajectories. (b) A cartoon model of the backtracking mechanism. In S1 → S2, the motion of the RNA 3′‐end nucleotide is triggered by the bending of the Bridge Helix (BH). In S2 → S3, the BH residue Y836 stacks with the DNA transition nucleotide and the Rpb2 residue Y769 stacks with the RNA 3′‐end nucleotide through their aromatic rings. In S3 → S4, the movement of the RNA:DNA hybrid finally delivers Pol II to the backtracked state (Figure adapted with permission from Ref . Copyright 2016 Nature Publishing Group).
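
As a rough illustration of how an MFPT follows from a transition matrix, the sketch below solves the standard linear system m_i = τ + Σ_j T_ij m_j over the non‐target states. This is an alternative to the chain‐simulation approach described in the caption, and the function name and the 2‐state matrix are assumptions made for the example.

```python
import numpy as np

def mfpt_to_target(T, target, lag=1.0):
    """Mean first passage time from every state to the set `target`.

    Solves (I - T_AA) m_A = lag * 1, where A is the set of non-target states.
    The result is in the same time unit as `lag`.
    """
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    target = set(target)
    others = [i for i in range(n) if i not in target]
    A = np.eye(len(others)) - T[np.ix_(others, others)]
    m = np.zeros(n)  # MFPT is zero for states already in the target set
    m[others] = np.linalg.solve(A, lag * np.ones(len(others)))
    return m

# Toy 2-state model (assumption): state 0 leaves with probability 0.1 per step.
T2 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
m = mfpt_to_target(T2, target=[1], lag=1.0)
print(m)
```

For this toy matrix the MFPT from state 0 to state 1 is 1/0.1 = 10 lag times, which gives a quick sanity check on the solver.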
Comparison of different kinetic lumping methods for the 1‐residue alanine dipeptide, 35‐residue villin headpiece, and 263‐residue β‐lactamase systems. (a) Crystal structures of alanine dipeptide (all‐atom), villin (ribbon), and β‐lactamase (ribbon). (b) Bayes factors of the five lumping methods (a less negative value means a better model). (c) Metastability values of the five lumping methods (a larger value means a better model) (Figure reprinted with permission from Ref . Copyright 2013 AIP Publishing LLC).
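
The metastability score in panel (c) can be sketched as the trace of the coarse‐grained transition matrix, that is, the sum of macrostate self‐transition probabilities. The example below assumes the common population‐weighted coarse‐graining of a microstate matrix; the function names and the 4‐state toy matrix are hypothetical, and the five lumping methods compared in the figure are not implemented here.

```python
import numpy as np

def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(np.asarray(T, dtype=float).T)
    v = evecs[:, np.argmax(evals.real)].real
    return v / v.sum()

def lumped_transition_matrix(T, pi, assignment):
    """Coarse-grain T: T_IJ = sum_{i in I, j in J} pi_i T_ij / sum_{i in I} pi_i."""
    assignment = np.asarray(assignment)
    n_micro, n_macro = len(assignment), assignment.max() + 1
    M = np.zeros((n_micro, n_macro))
    M[np.arange(n_micro), assignment] = 1.0   # microstate -> macrostate membership
    flux = M.T @ (pi[:, None] * T) @ M        # population-weighted fluxes
    return flux / flux.sum(axis=1, keepdims=True)

def metastability(T, assignment):
    """Sum of macrostate self-transition probabilities (larger is better)."""
    pi = stationary_distribution(T)
    return np.trace(lumped_transition_matrix(T, pi, assignment))

# Toy 4-state model (assumption): states {0,1} and {2,3} interconvert quickly.
T4 = np.array([[0.89, 0.10, 0.01, 0.00],
               [0.10, 0.89, 0.00, 0.01],
               [0.01, 0.00, 0.89, 0.10],
               [0.00, 0.01, 0.10, 0.89]])
q_good = metastability(T4, [0, 0, 1, 1])  # lumps the fast-mixing pairs together
q_bad = metastability(T4, [0, 1, 0, 1])   # splits the fast-mixing pairs
print(q_good, q_bad)
```

In this toy case the kinetically sensible lumping scores higher because each macrostate retains most of its population over one lag time.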
Markov state model identifies key intermediate states along activation pathways of c‐Src kinase. (a) Crystal structures of the inactive (left) and active (right) states of c‐Src. The differences lie in the activation loop (A‐loop; red), the C‐helix (orange), and the switching of the electrostatic network among Lys295, Glu310, Arg409, and Tyr416. (b) Two intermediate states are identified on the potential of mean force calculated from the stationary population of a 2000‐state microstate Markov State Model (MSM) over two reaction coordinates: the root‐mean‐square deviation (RMSD) of the A‐loop residues and the difference in distance between the residue pairs E310‐R409 and K295‐E310. (c) The variation of four structural metrics along a long trajectory, synthesized from the MSM via the kinetic Monte Carlo scheme, provides a rough estimate of the timescale of the activation and deactivation processes. Here, the inactive state, the active state, and the intermediate states I1 and I2 are shown in magenta, blue, green, and black, respectively (Figure adapted with permission from Ref 4. Copyright 2014 Nature Publishing Group).
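
Two operations mentioned in this caption, converting stationary populations into free energies and synthesizing a long trajectory from the MSM, can be sketched as follows. The 2‐state matrix and function names are assumptions for illustration; free energies are reported per state in units of kT rather than projected onto the two reaction coordinates used in the figure.

```python
import numpy as np

def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(np.asarray(T, dtype=float).T)
    v = evecs[:, np.argmax(evals.real)].real
    return v / v.sum()

def free_energies(pi):
    """F_i = -ln(pi_i / max(pi)) in units of kT; the most populated state is 0."""
    return -np.log(pi / pi.max())

def synthesize_trajectory(T, start, n_steps, seed=0):
    """Draw a discrete-state trajectory; each step corresponds to one lag time."""
    rng = np.random.default_rng(seed)
    T = np.asarray(T, dtype=float)
    traj = np.empty(n_steps + 1, dtype=int)
    traj[0] = start
    for t in range(n_steps):
        traj[t + 1] = rng.choice(T.shape[0], p=T[traj[t]])
    return traj

# Toy 2-state model (assumption).
T2 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
pi = stationary_distribution(T2)
F = free_energies(pi)
traj = synthesize_trajectory(T2, start=0, n_steps=20000)
frac0 = np.mean(traj == 0)
print(pi, F, frac0)
```

The fraction of time the synthetic trajectory spends in each state should approach the stationary population, which makes the long trajectory a convenient consistency check on the model.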
The quality of the putative path is important for the adaptive sampling scheme. (a) The translocation process of RNA Polymerase II: Isomap representation of the initial paths generated by the Climber algorithm and the samples from the final Markov State Model (MSM) (colored dots). The MSM samples clearly deviate from the initial paths, indicating the necessity of path optimization before adaptive sampling and MSM construction (Figure adapted with permission from Ref 37. Copyright 2014 National Academy of Sciences, USA). (b) The initial path can be optimized via the string method, as exemplified by the study of the activation pathway of c‐Src kinase: the initial targeted molecular dynamics (MD) path can be optimized using a limited amount of sampling (Figure adapted with permission from Ref 79. Copyright 2009 Elsevier).
An example of choosing the input structural features for time‐lagged independent component analysis (tICA). In all subgraphs (a)–(f), the left panel shows the set of atoms (blue) among which the pair‐wise distances are selected as input features for tICA; the right panel plots the implied timescales (ITS) of the 1000‐state MSM built by k‐centers clustering on the slowest four tICs obtained from those features. The correlation lag time for tICA is 40 ns. The Markov State Model (MSM) lag time is 8 ns. The error bars of the MSM ITS are calculated from 100 bootstrap resamplings of all molecular dynamics (MD) trajectories. The distance set (f) is chosen as the optimal one because its top MSM ITS is the highest among all sets while it uses considerably fewer input distances (Figure adapted with permission from Ref 2. Copyright 2016 Nature Publishing Group).
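
A minimal NumPy sketch of tICA may help clarify what is being computed from the input features. It solves the generalized eigenproblem C(τ) v = λ C(0) v by whitening with C(0). The `tica` function, the symmetrized covariance estimator, and the synthetic two‐feature data are assumptions for illustration, and the lag here is counted in frames rather than in ns.

```python
import numpy as np

def tica(X, lag, n_components=2):
    """Return tICA eigenvalues and projections for a (frames x features) array."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                      # remove the mean of each feature
    A, B = X[:-lag], X[lag:]
    C0 = (A.T @ A + B.T @ B) / (2 * len(A))     # instantaneous covariance
    Ct = (A.T @ B + B.T @ A) / (2 * len(A))     # symmetrized time-lagged covariance
    s, U = np.linalg.eigh(C0)
    keep = s > 1e-10 * s.max()                  # drop near-singular directions
    W = U[:, keep] / np.sqrt(s[keep])           # whitening transform
    evals, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(evals)[::-1][:n_components]
    return evals[order], X @ (W @ V[:, order])

# Synthetic features (assumption): a slow, low-variance coordinate hidden
# behind a fast, high-variance one.
rng = np.random.default_rng(1)
n = 20000
slow = np.zeros(n)
noise = rng.normal(size=n)
for t in range(1, n):
    slow[t] = 0.99 * slow[t - 1] + 0.1 * noise[t]
fast = 5.0 * rng.normal(size=n)
X = np.column_stack([fast, slow])

evals, tics = tica(X, lag=10, n_components=2)
corr = abs(np.corrcoef(tics[:, 0], slow)[0, 1])
print(evals, corr)
```

The first tIC follows the slow coordinate even though the fast coordinate carries most of the variance, which is why tICA rather than a variance‐based projection is used to select kinetically slow reaction coordinates.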
