How to cite this WIREs title:
WIREs Comp Stat

Challenges and opportunities beyond structured data in analysis of electronic health records


Abstract
Electronic health records (EHRs) contain valuable information about both individual patients and whole populations. Besides structured data, the unstructured data in EHRs can provide additional valuable information, but analyzing it is complex, time‐consuming, and often requires excessive manual effort. Among unstructured data, clinical text and images are the two most common and important sources of information. Advanced statistical algorithms from natural language processing, machine learning, deep learning, and radiomics are increasingly used to analyze clinical text and images. Many challenges that hinder the use of unstructured data have not been fully addressed, yet there are clear opportunities for well‐designed diagnosis and decision support tools that efficiently combine structured and unstructured data to extract useful information and improve outcomes. Access to clinical data remains very restricted because of data sensitivity and ethical issues. Data quality is another important challenge, and methods for improving data completeness, conformity, and plausibility are needed. Further, generalizing and explaining the results of machine learning models are important problems for healthcare that remain open. One possible way to improve the quality and accessibility of unstructured data is to develop machine learning methods that generate clinically relevant synthetic data, and to accelerate research on privacy‐preserving techniques such as deidentification and pseudonymization of clinical text.
This article is categorized under: Applications of Computational Statistics > Health and Medical Data/Informatics
Example of a chest X‐ray with the attached radiology report written by a radiologist (ground truth) and three automatically generated, so‐called synthetic, texts created by three different machine learning methods (our‐coattention, our‐no‐attention, and soft attention). Source: Extracted from figure 3 in Jing et al. (2017). Licensed under Creative Commons.