How to cite this WIREs title:
WIREs Comput Mol Sci
Impact Factor: 25.113

Surviving the deluge of biosimulation data

Abstract

New hardware, massively parallel and graphics processing unit‐based computers in particular, has boosted molecular simulations to levels that would have been unthinkable just a decade ago. At the classical level, it is now possible to perform atomistic simulations of systems containing over 10 million atoms and to collect trajectories extending to the millisecond range. Such achievements are moving biosimulations into the mainstream of structural biology research, where they complement experimental studies. The drawback of this impressive development is the management of data, especially at a time when the inherent value of data is becoming more apparent. In this review, we summarize the main characteristics of (bio)simulation data, how we can store them, how they can be reused for new, unexpected projects, and how they can be transformed to make them FAIR (findable, accessible, interoperable and reusable).

This article is categorized under:
Molecular and Statistical Mechanics > Molecular Dynamics and Monte‐Carlo Methods
Computer and Information Science > Databases and Expert Systems

Example of ontology metadata required for molecular dynamics (MD) simulations uploaded into the BigNASim database. Information about the study, the system, the setup conditions, the version of the software, and the force field used to run the trajectory is included in the ontology. The figure presents the complete BigNASim ontology metadata as an interactive graph, with a zoomed‐in area showing the simulation condition ontology class together with its closest relations (red square). Images from the website have been modified for visualization purposes; the original interactive version is available at: http://www.visualdataweb.de/webvowl/#iri=http://mmb.irbbarcelona.org/BigNASim/htmlib/help/onto/ParmRNAOwl.owl
Example of interactive graphics captured from the website http://mmb.irbbarcelona.org/MoDEL-CNS. (a). RMSd values of a protein in the membrane (see image in panel c) along the molecular dynamics (MD) simulation trajectory, at a resolution of one frame per ns. (b). Plot of RMSd values zooming into the range between 54 and 55 ns of the trajectory. When a shorter timescale is selected, the resolution is adapted automatically and the precision (level shown in the blue square) increases up to the maximum resolution of one frame per ps. (c). NGL viewer, which displays the biological complex at the particular snapshot selected from the RMSd plot. We selected a point with a high RMSd value (0.35 nm), corresponding to the frame at time 54.03 ns of the simulation; the 3D visualization, in the panel on the right, shows the conformation of the protein–membrane complex at that precise time along the simulation. Video showing the interactivity of the graphics: https://www.youtube.com/watch?v=bbmhAUuHKto
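The automatic adaptation of resolution described in this caption can be illustrated with a minimal sketch. It assumes the per-frame values are stored at the finest resolution and that the plot shows at most a fixed number of points; the function names and the `max_points` parameter are hypothetical and do not describe the actual MoDEL-CNS implementation.

```python
import math


def choose_stride(n_frames_in_window: int, max_points: int) -> int:
    """Return the sampling stride that keeps a plot at or below max_points."""
    if n_frames_in_window <= 0 or max_points <= 0:
        raise ValueError("window size and max_points must be positive")
    # A narrow window gives stride 1 (full resolution); a wide one is thinned.
    return max(1, math.ceil(n_frames_in_window / max_points))


def downsample(values, start, stop, max_points):
    """Return the values in [start, stop) thinned to at most max_points."""
    stride = choose_stride(stop - start, max_points)
    return values[start:stop:stride], stride


if __name__ == "__main__":
    # Hypothetical series stored at one value per ps (indices stand in for RMSd).
    series = list(range(100_000))
    overview, s1 = downsample(series, 0, len(series), max_points=100)
    zoomed, s2 = downsample(series, 54_000, 55_000, max_points=1000)
    print(len(overview), s1)  # wide window: coarse stride
    print(len(zoomed), s2)    # 1 ns window: every ps frame is kept
```

Under these assumptions, narrowing the selected time window lowers the stride until every stored frame is shown.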
Captures from the website https://mmb.irbbarcelona.org/MCDNA/, showing an example of interactive graphics for trajectory flexibility analysis of DNA. In the end‐to‐end analysis, the user can move the dotted line along the snapshots in the plot of the end‐to‐end distance of a chromatin fiber, and the corresponding structure appears as an image. The same can be done with the sliding bar at the top of the page. (a). The lowest distance value in the end‐to‐end plot has been selected; the corresponding image of the DNA, with the distance between the two ends, appears on the right. (b). Another example, selecting the highest distance; the related image on the right shows the stretched conformation of the DNA. In this way, the user can click on a given time point in the simulation to analyze the conformation of the system in the 3D visualizer, where they can zoom, rotate, and explore all the details. Images of the DNA were modified from the website for the sake of visibility. Video showing the interactivity of the graphics: https://www.youtube.com/watch?v=gCJMqhJ9d8U
Example of analysis and data available in BigNASim, the database for nucleic acids. The top panel shows a representation of DNA with the base pair and base pair step parameters that have been calculated for each nucleic acid trajectory stored in BigNASim. (a). Distribution of the base pair step twist values found in the database for the tetramers ACGC (black) and GCGA (red), retrieved from the different stored trajectories. This collection of data, obtained from the analysis of 160,000 and 55,000 occurrences, respectively, reveals the very different behavior of the two tetramers in terms of twist deformation. (b). Meta‐analysis comparing the values of the base pair parameter opening for G·C and A·T base pairs. The distributions, derived from 1,075,000 occurrences for G·C (black) and 1,225,000 occurrences for A·T (red), clearly demonstrate the different breathing probability of G·C and A·T pairs. Image for parameters from: https://x3dna.org/highlights/simple-base-pair-parameters
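The kind of pooling behind panel (a) can be sketched as follows. This assumes a single-strand sequence with one twist value per base pair step, where each step is classified by its tetramer context (the two stepping bases plus one neighbor on each side). The function name and the input values are hypothetical, and the sketch does not merge a tetramer with its reverse complement, as a real double-stranded analysis might.

```python
from collections import defaultdict


def twist_by_tetramer(sequence, step_twist):
    """Group the central-step twist values by their tetramer context.

    step_twist[i] is the twist of the step between bases i and i+1
    (0-based), so a sequence of N bases has N-1 steps.
    """
    if len(step_twist) != len(sequence) - 1:
        raise ValueError("expected one twist value per base pair step")
    grouped = defaultdict(list)
    # Only steps with a flanking base on both sides have a full tetramer.
    for i in range(1, len(sequence) - 2):
        tetramer = sequence[i - 1:i + 3]
        grouped[tetramer].append(step_twist[i])
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical single frame; a database would pool many frames and
    # trajectories before drawing one distribution per tetramer.
    print(twist_by_tetramer("GACGCA", [30.0, 31.0, 32.0, 33.0, 34.0]))
```

Calling the function over every frame of every stored trajectory and concatenating the lists per tetramer would give the pooled samples from which such distributions can be plotted.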
Scheme of a modern molecular dynamics (MD) database architecture. From bottom to top, the MD database combines two NoSQL engines: one for storing trajectory coordinates (e.g., Cassandra or MongoDB GridFS) and another for storing trajectory analysis results and simulation metadata (e.g., MongoDB documents) (Databases, bottom). The databases are queried through RESTful APIs (Server, middle) from the backend of the server (red box inset, bottom part). Backend information is then transferred to the frontend (red box inset, top part), where modern web technologies transform the data into interactive web‐based graphical user interfaces (GUIs) (Client, top), finally making MD simulations available to the end user through a current web browser. Figure adapted from the BigNASim scheme (https://mmb.irbbarcelona.org/BigNASim/help.php)
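The separation between a coordinate store and a metadata store behind a REST-style interface can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: the two dictionaries replace the NoSQL engines, and the route layout, identifiers, and field names are hypothetical rather than taken from the BigNASim API.

```python
import json

# Stand-in for the coordinate store (e.g., Cassandra or GridFS in the scheme).
TRAJECTORY_STORE = {
    "traj001": [[0.0, 0.1, 0.2], [0.0, 0.1, 0.3]],  # one coordinate list per frame
}
# Stand-in for the document store holding metadata and analysis results.
METADATA_STORE = {
    "traj001": {"force_field": "hypothetical-ff", "n_frames": 2},
}


def handle_get(path):
    """Resolve a GET-style path to (status_code, JSON body).

    Hypothetical routes:
      /trajectories/<id>/metadata
      /trajectories/<id>/frames/<index>
    """
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "trajectories":
        return 404, json.dumps({"error": "unknown route"})
    traj_id = parts[1]
    if traj_id not in METADATA_STORE:
        return 404, json.dumps({"error": "unknown trajectory"})
    if parts[2] == "metadata" and len(parts) == 3:
        return 200, json.dumps(METADATA_STORE[traj_id])
    if parts[2] == "frames" and len(parts) == 4:
        try:
            index = int(parts[3])
        except ValueError:
            return 400, json.dumps({"error": "frame index must be an integer"})
        frames = TRAJECTORY_STORE[traj_id]
        if not 0 <= index < len(frames):
            return 404, json.dumps({"error": "frame out of range"})
        return 200, json.dumps({"frame": index, "coordinates": frames[index]})
    return 404, json.dumps({"error": "unknown route"})


if __name__ == "__main__":
    print(handle_get("/trajectories/traj001/metadata"))
    print(handle_get("/trajectories/traj001/frames/1"))
```

In this sketch, lightweight metadata queries never touch the bulky coordinate store, which is one plausible reason for pairing two engines in such a design.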
Disk space (GB) occupied by an example molecular dynamics (MD) trajectory of 500 ns for the protein with PDB ID 1UOT. Solvated system: 93,101 atoms; dry system: 1,840 atoms. Storage is shown for the solvated and dry systems and for different data compression formats: gzip, NetCDF, and PCAzip.
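A rough, uncompressed estimate shows why removing the solvent matters for storage. The sketch below assumes coordinates only, three single-precision (4-byte) values per atom, and a hypothetical output rate of one frame every 10 ps (50,000 frames over 500 ns). The saving frequency is not stated in the caption, so the absolute numbers are illustrative and are not the values plotted in the figure.

```python
def raw_size_gb(n_atoms: int, n_frames: int, bytes_per_coord: int = 4) -> float:
    """Uncompressed coordinate storage in GB (10**9 bytes), 3 values per atom."""
    return n_atoms * 3 * bytes_per_coord * n_frames / 1e9


if __name__ == "__main__":
    n_frames = 50_000  # assumption: 500 ns saved every 10 ps
    solvated = raw_size_gb(93_101, n_frames)
    dry = raw_size_gb(1_840, n_frames)
    print(f"solvated: {solvated:.2f} GB, dry: {dry:.2f} GB, "
          f"ratio: {solvated / dry:.1f}x")
```

Whatever the saving frequency, the ratio between the two estimates depends only on the atom counts given in the caption (93,101 / 1,840, roughly 50-fold); compression then acts on top of that reduction.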
Descriptors used in the MoDEL protein molecular dynamics (MD) database that a trajectory has to satisfy in order to be validated. Otherwise (global and local descriptors not satisfied), the trajectory is labeled with warnings.
