How to cite this WIREs title:
WIREs Comput Mol Sci
Impact Factor: 8.127

Essentials of de novo protein design: Methods and applications


The field of de novo protein design has undergone a rapid transformation in the last decade and now enables the accurate design of protein structures with exceptional stability and in a large variety of folds, not necessarily restricted to those seen in nature. Before de novo protein design existed, traditional protein engineering strategies relied exclusively on modifying existing proteins that already had a function similar to the desired one or, at least, a suitable geometry and enough stability to tolerate the mutations needed to incorporate the desired function. De novo computational protein design overcomes this limitation by giving access to a virtually unlimited number of protein shapes that can be suitable candidates for engineering function. Recently, we have seen the first examples of such functionalization in the form of de novo proteins custom designed to bind specific targets or small molecules, with novel medical and biotechnological applications. Despite this progress, entering this nascent field can be difficult because of the plethora of approaches available and their constant evolution. Here, we review the most relevant computational methods for de novo protein design with the aim of compiling a comprehensive guide for researchers embarking on this field. We illustrate most of the concepts from the perspective of Rosetta, which is the most extensively developed software for de novo protein design, but we also highlight relevant work with other protein modeling software packages. Finally, we give an overview of the current challenges and future opportunities in the field. This article is categorized under: Computer and Information Science > Computer Algorithms and Programming; Structure and Mechanism > Computational Biochemistry and Biophysics; Software > Molecular Modeling; Structure and Mechanism > Molecular Structures
Complementary strategies to control the Packer during sequence design. (a) Layer design. Example of the classification of each residue position into three different layers: core, surface, and the boundary between the two. (b) HBNet method for designing hydrogen‐bonded networks. (Reprinted with permission from Ref (). Copyright (2016) American Association for the Advancement of Science (AAAS)). (c) Consensus loop design generates sequence profiles from naturally occurring loops with structures similar to the design. As an example, the local conformation of the loop is described by its sequence of ABEGO torsion bins (left), and natural loops with the same ABEGO torsion string are used to identify the most likely amino acids at each position
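The ABEGO torsion string mentioned in panel c can be illustrated with a minimal sketch. The bin boundaries below are an assumption: they follow the convention commonly used for ABEGO classification and are not stated in this caption. The function names `abego_bin` and `abego_string` are hypothetical helpers, not part of any library.

```python
def abego_bin(phi, psi, omega=180.0):
    """Assign one ABEGO letter to a residue from its backbone torsions.

    Angles are in degrees, in the range [-180, 180].
    Bin boundaries are assumed (commonly used convention, not from the caption).
    """
    if abs(omega) < 90.0:
        return "O"  # cis peptide bond
    if phi < 0.0:
        # Left half of the Ramachandran map: helical (A) or extended (B)
        return "A" if -75.0 <= psi < 50.0 else "B"
    # Right half of the map: left-handed helical (G) or extended (E)
    return "G" if -100.0 <= psi < 100.0 else "E"


def abego_string(torsions):
    """Concatenate ABEGO letters for a list of (phi, psi) or (phi, psi, omega) tuples."""
    return "".join(abego_bin(*t) for t in torsions)


if __name__ == "__main__":
    # Three illustrative residues: helical, left-handed, extended
    loop = [(-60.0, -45.0), (60.0, 40.0), (-120.0, 130.0)]
    print(abego_string(loop))
```

Natural loops whose torsions map to the same string as the designed loop would then be collected to build the per-position amino acid profile described in the caption.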
Sequence‐independent rules. (a) Loop length determines the relative orientation of consecutive secondary structure elements. (b) Example of the application of the rules to build a three‐stranded antiparallel β‐sheet with a C‐terminal helix on top, with the following requirements: β‐strands should have between five and nine residues, strands 1 and 2 are in register, β‐hairpin loops have two residues, and the loop between the last β‐strand and the helix can have two or three residues. In principle, there are five possible lengths for the first strand pair, and the third strand can only be as long as or shorter than the second strand (otherwise the extra residues would not belong to the strand, as they cannot be paired with strand 2). This gives 15 combinations which, together with the two possible loop lengths for the helix, make a total of 30 combinations (see bottom left table). In contrast, according to the sequence‐independent rules, the two‐residue β‐hairpin loops preceding and following strand 2 are only compatible with an even number of residues (six or eight) for this strand. Additionally, depending on the side chain direction of the last residue of strand 3, it will be necessary to insert a loop of two or three residues to place the helix on top: odd and even positions of strand 3 will require a two‐ and three‐residue loop, respectively (see bottom right table). Overall, with the sequence‐independent rules, there are only six possible combinations, just 20% of what would be theoretically possible. Such a large difference in the number of combinations for such a simple protein topology highlights the importance of considering these rule‐based constraints when targeting larger and more complex folds
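The counting argument in panel b can be checked with a short enumeration. This is a sketch under stated assumptions: strands 1 and 2 share one length because they are in register, and the "odd or even position" of the last residue of strand 3 is taken to be the parity of the strand 3 length. The function names are hypothetical.

```python
from itertools import product

STRAND_LENGTHS = range(5, 10)   # strands of five to nine residues
HELIX_LOOP_LENGTHS = (2, 3)     # loop between strand 3 and the helix


def unconstrained():
    """All (strand 1/2 length, strand 3 length, helix loop length) choices.

    Strand 3 cannot be longer than strand 2, since the extra residues
    would have no pairing partner.
    """
    return [
        (s12, s3, loop)
        for s12, s3, loop in product(STRAND_LENGTHS, STRAND_LENGTHS, HELIX_LOOP_LENGTHS)
        if s3 <= s12
    ]


def rule_compliant():
    """Subset that also satisfies the sequence-independent rules."""
    allowed = []
    for s12, s3, loop in unconstrained():
        if s12 % 2 != 0:
            continue  # two-residue hairpins on both sides need an even strand 2
        # Assumption: an odd last position needs a 2-residue loop, an even one 3
        required_loop = 2 if s3 % 2 == 1 else 3
        if loop == required_loop:
            allowed.append((s12, s3, loop))
    return allowed


if __name__ == "__main__":
    total, kept = len(unconstrained()), len(rule_compliant())
    print(total, kept, f"{kept / total:.0%}")  # prints: 30 6 20%
```

The enumeration reproduces the 30 unconstrained and six rule‐compliant combinations quoted in the caption.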
Computational methods for building de novo protein backbones. (a) Fragment assembly scheme for building helix–loop–helix–loop repeat modules based on blueprint definitions exploring different combinations of secondary structure lengths (i,j,k,l represent lengths for each secondary structure) (Reprinted with permission from Ref (). Copyright (2015) Nature Publishing Group (NPG)). (b) Geometric parameters describing the relative orientation of pairs of interacting helices (e.g., for the parametric design of helical bundles). (Reprinted with permission from Ref (). Copyright (2011) Elsevier). (c) Scheme of a kinematic closure move for sampling a backbone conformation around two fixed ends (pivots) and its use for building cyclic peptides. (Reprinted with permission from Ref (51). Copyright 2013 PLOS, published under CC‐BY license). (d) Scheme of the SEWING method for recombining pieces of existing protein structures to build new protein backbones—continuous (top) and discontinuous (bottom). (Reprinted with permission from Ref (). Copyright (2016) American Association for the Advancement of Science (AAAS))
De novo computational protein design workflow. It starts with the definition of the target protein topology, including all length combinations to be explored. Depending on the size and fold type, the most suitable backbone generation method is chosen to generate thousands of models compatible with the target topology that are subsequently filtered before running full‐sequence design calculations. The top‐ranked backbone‐sequence pairs are evaluated by their sequence‐structure compatibility, and those exhibiting funnel‐shaped energy landscapes are selected for experimental characterization. Different choices relevant at each design stage are provided
Examples of recent de novo designed proteins of different classes. (a,b) ferredoxin and Rossmann folds (PDB ids: 2kpo and 2lnd); (c) four‐fold symmetric TIM barrel (PDB id: 5bvl); (d) curved β‐sheet fold (PDB id: 5l33); (e) helical repeat (PDB id: 5cwb); (f) parametric four‐helix bundle (PDB id: 4uos); (g) coiled‐coil helical barrel (PDB id: 4pna); (h) helical fold designed with SEWING (PDB id: 5e6g); (i) ββαββ miniprotein (PDB id: 5up1); (j) two‐helix peptide with right‐ and left‐handed helices (PDB id: 5kx0); (k) βαβ peptide (PDB id: 5jhi); (l) macrocyclic peptide with mixed chiralities (PDB id: 6bet); (m) complex of the three‐helix bundle BINDI (cyan) with its target BHRF1 (green) (PDB id: 4oyd); (n) complex of a mini three‐helix bundle (cyan) with botulinum neurotoxin (green) (PDB id: 5vid); (o) four‐helix bundle in complex with its target porphyrin ligand (PDB id: 5tgy). Colors highlight different secondary structure types (α‐helices in red, β‐strands in yellow, and loops in green). Disulfide bonds designed for increased stability are highlighted in panels j and k
Time evolution of experimentally solved structures of de novo computationally designed proteins. Stacked bar plot of the cumulative number of structures belonging to one of three classes: all‐α, mixed αβ and others (i.e., all‐β and macrocyclic peptides)
Molecular dynamics (MD) simulations as an orthogonal qualitative test of de novo protein designs. (a) Top left, multiple MD replicas (100 ns each) for a macrocycle demonstrate that the designed structure has very high stability (i.e., low RMSD compared with the designed structure). Bottom left, mutants predicted in Rosetta to greatly destabilize the design (by an extensive sampling of the energy landscape) also show great destabilization by MD. Right, histogram of the MD simulated fluctuations for the design (blue color) and mutants (pink color). (Reprinted with permission from Ref (). Copyright (2017) American Association for the Advancement of Science (AAAS)). (b) Extensive MD simulations of hundreds of miniprotein binders (143 designs for BoNT and 146 for influenza HA) show that designs with smaller interface residue fluctuations are more likely to be high‐affinity binders (green color) than nonbinders (pink color)
Compatibility between sequence and structure evaluation. (a) Fragment quality test comparing the 9‐mers of the design with those of natural structures with similar sequences (200 fragments are picked at each residue position). Biased (b) and ab initio (c) folding simulations from extended chain. Red dots represent the lowest‐energy structures obtained in each simulated folding trajectory. Green dots represent the lowest‐energy structures obtained from relaxing the design model
Disulfide staples to stabilize de novo peptides and miniproteins. (a) The work by Bhardwaj et al. presented various examples of peptides and miniproteins with diverse topologies that are stabilized by means of disulfide bridges (one to three disulfides are necessary to stabilize the diverse de novo peptides and miniproteins) and demonstrated that chemical disruption of the disulfides greatly affects (or destroys) the designed proteins. (b) Schematic representation of a four‐helix protein (blue cylinders connected by loops) showing that the binding sites (orange cylinder) of nondisulfided de novo miniprotein binders can be stabilized (and the binding improved) by using single‐disulfide staples (yellow color). However, the improvement is only observed if the functional/binding site is included in the protein region enclosed by the disulfide. (c) Illustration of the concept in panel b: only designs with disulfides that improve the Rosetta monomer energy and, at the same time, enclose four to nine hotspots (of the binding site) show improved binding (green dots), while fewer hotspots mainly result in no change in binding (gray dots), and disulfided protein monomers with worse energies mostly result in decreased binding (compared with the nondisulfide design). (d) Comparison of the crystallographic structure of one of the designs with improved binding from the data shown in panel c (cartoon representation, green color, two monomers in the asymmetric unit) versus its computational design (cartoon representation, pink color). The region enclosed by the disulfide is much more similar between design and crystal structure than the region outside, and the same holds for the variability between the two structures in the asymmetric unit, supporting the idea that the contribution of the disulfide is to decrease the entropy of the elements enclosed by it
