How to cite this WIREs title:
WIREs Clim Change
Impact Factor: 5.124

# Data assimilation in the geosciences: An overview of methods, issues, and perspectives


State estimation theory in the geosciences is commonly referred to as data assimilation (DA). The term encompasses the entire sequence of operations that, starting from observations of a system and from additional statistical and dynamical information (such as a dynamical evolution model), provides an estimate of its state. DA is standard practice in numerical weather prediction, and its application is becoming widespread in many other areas of climate, atmosphere, ocean, and environment modeling; that is, in all circumstances where one intends to estimate the state of a large dynamical system from limited information. The complexity of DA and of its methods stems from its interdisciplinary nature, spanning statistics, dynamical systems, and numerical optimization; in the geosciences, an additional difficulty arises from the continually increasing sophistication of environmental models. Thus, although DA is now ubiquitous in the geosciences, it has so far remained a topic mostly reserved for experts. We aim this overview article at geoscientists with a background in mathematical and physical modeling who are interested in the rapid development of DA and its growing domains of application in environmental science, but who have not yet delved into its conceptual and methodological complexities.

• Climate Models and Modeling > Knowledge Generation with Models
Time‐ and ensemble‐averaged angle (in degrees) between an anomaly from the ensemble Kalman filter (EnKF) ensemble and the unstable‐neutral subspace as a function of the ensemble size N (left y‐axis), and the corresponding time‐averaged root mean square error of the EnKF (right y‐axis). The numerical experiments are performed on the Lorenz‐96 model with m = 40 variables (Lorenz & Emanuel, ) and the data assimilation (DA) setup is $H = I_d$, $R = I_d$, observation interval Δt = 0.05, and error variance σ = 1
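The Lorenz‐96 model used in these experiments can be sketched in a few lines. The following is a minimal illustration rather than the authors' code: the forcing value F = 8 and the fourth-order Runge-Kutta integrator are assumptions (neither is stated in the caption), and the function names `lorenz96_tendency` and `rk4_step` are hypothetical. Using Δt = 0.05 as the integration step is also an assumption; the caption gives it only as the interval between observations.

```python
import numpy as np


def lorenz96_tendency(x, forcing=8.0):
    """Lorenz-96 tendency with periodic indices:
    dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F.
    The default F = 8 is an assumption, not taken from the caption.
    """
    # np.roll(x, -1)[i] == x[i+1]; np.roll(x, 2)[i] == x[i-2]; np.roll(x, 1)[i] == x[i-1]
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing


def rk4_step(x, dt=0.05, forcing=8.0):
    """Advance the state by one step of size dt with classical RK4 (assumed scheme)."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_tendency(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
```

A typical use is to start from the uniform state `np.full(40, 8.0)` (a fixed point when F = 8), perturb one component slightly, and call `rk4_step` repeatedly to generate a reference trajectory for synthetic observations.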
Average root mean square error (RMSE) of several data assimilation (DA) methods computed from synthetic experiments with the Lorenz‐96 model. The left panel shows the filtering analysis RMSE of optimally tuned ensemble Kalman filter (EnKF), 4DVar, iterative ensemble Kalman smoother (IEnKS) assimilation experiments, as a function of the length of the data assimilation window (DAW). The right panel shows the smoothing analysis RMSE of optimally tuned ensemble Kalman smoother (EnKS), 4DVar, and IEnKS as a function of the length of their data assimilation window. The optimal RMSE is chosen within the window for 4DVar and it is taken at the beginning of the window for the IEnKS. The EnKF, EnKS, and IEnKS use an ensemble of N = 20, which avoids the need for localization but requires inflation. The length of the DAW is L × Δt, where Δt = 0.05
Panel a: True covariance matrix. Panel b: Sample covariance matrix. Panel c: Gaspari–Cohn correlation matrix used for covariance localization. Panel d: Regularized covariance matrix obtained from a Schur product
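The regularization in panel d can be illustrated with a short sketch. It builds a Gaspari–Cohn correlation matrix and applies it to a sample covariance through a Schur (element-wise) product. The periodic one-dimensional grid, the localization half-width parameter `c`, and the function names `gaspari_cohn` and `localize` are assumptions made for illustration and are not taken from the article.

```python
import numpy as np


def gaspari_cohn(dist, c):
    """Gaspari-Cohn fifth-order piecewise rational taper.

    dist: array of distances; c: half-width, so the taper vanishes for dist >= 2c.
    """
    r = np.abs(np.asarray(dist, dtype=float)) / c
    taper = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    rn, rf = r[near], r[far]
    taper[near] = (-0.25 * rn**5 + 0.5 * rn**4 + 0.625 * rn**3
                   - (5.0 / 3.0) * rn**2 + 1.0)
    taper[far] = (rf**5 / 12.0 - 0.5 * rf**4 + 0.625 * rf**3
                  + (5.0 / 3.0) * rf**2 - 5.0 * rf + 4.0 - 2.0 / (3.0 * rf))
    return taper


def localize(sample_cov, c):
    """Schur product of a sample covariance with a Gaspari-Cohn correlation matrix.

    Assumes variables live on a periodic 1-D grid with unit spacing.
    """
    m = sample_cov.shape[0]
    idx = np.arange(m)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, m - d)  # periodic distance between grid points
    return gaspari_cohn(d, c) * sample_cov
```

Because the taper equals one at zero distance, the variances on the diagonal are left unchanged, while spurious long-range sample covariances beyond distance 2c are set to zero.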
A linear advection equation on a periodic domain (wave traveling from left to right) illustrates how the Kalman filter (KF) and/or ensemble Kalman filter (EnKF) estimates the state at three different times, namely, (top) t = 5, (middle) t = 150, and (bottom) t = 300. The plots show the reference/true solution (red line), measurements (yellow circles), and error estimate (error bars of one standard deviation of the forecast error). The initial condition, observation, and model error variances are 1.0, 0.01, and 0.0001, respectively, and the ensemble size is N = 500. In each panel the error increases downstream; however, downstream of an observation its effect is propagated by the wave, so the errors are smaller to the right of the observation point than to the left. Similarly, comparing the three panels for increasing time shows how the errors decrease with time, indicating good performance of the filter
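The error reduction at an observation point can be reproduced with the standard Kalman filter analysis step. The sketch below is a generic textbook form under the assumption of a linear observation operator and Gaspari-free, full covariance matrices; it is not the code behind the figure, and the function name `kalman_analysis` and its argument names are hypothetical.

```python
import numpy as np


def kalman_analysis(xf, Pf, y, H, R):
    """Kalman filter analysis step for a linear observation operator H.

    xf, Pf: forecast mean (m,) and covariance (m, m)
    y, R:   observation vector (d,) and its error covariance (d, d)
    Returns the analysis mean and covariance.
    """
    S = H @ Pf @ H.T + R                 # innovation covariance
    # Kalman gain K = Pf H^T S^{-1}; S and Pf are symmetric, so solve then transpose
    K = np.linalg.solve(S, H @ Pf).T
    xa = xf + K @ (y - H @ xf)           # analysis mean
    Pa = (np.eye(len(xf)) - K @ H) @ Pf  # analysis covariance
    return xa, Pa
```

With the caption's variances for a single directly observed variable (forecast variance 1.0, observation variance 0.01), the analysis variance is 0.01/1.01, which is smaller than both the forecast and the observation variance.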
Illustration of the three variational problems: Weak‐constraint 4DVar (w4DVar, top panel), strong‐constraint 4DVar (s4DVar, mid panel), and 3DVar (bottom panel). In the w4DVar, the control variable is the entire trajectory within the window (from $t_k$ to $t_{k+2}$ in Figure ), $x_{k+2:k}$, so the corrections (continuous red arrow) are computed and utilized at the observation times. The analysis trajectory (red line) is moved toward the observations. In the s4DVar, the control variable is the state at the beginning of the window, $x_k$, and the corrections are computed at observation times (black arrows) but then propagated back to $t_k$ (red arrows) using the adjoint model (see equation (31)). Once the analysis at the initial time is computed, it is used as the initial condition to run a forecast until $t_{k+2}$. In the 3DVar, the corrections are computed and utilized at each observation time $t_k$ sequentially, so that the analysis at $t_k$, $x_k^a$, takes into account only the observations at $t_k$ and is then used as the initial condition for a forecast until the next observations at $t_{k+1}$, and so on. Variational methods do not explicitly compute an estimate of the uncertainty (cf. Appendix B), so the colorful ellipsoids of Figure are not present here
Time series of data assimilation diagnostics across the 24‐year reanalysis for all temperature profiles at depths of 300–800 m in the whole Arctic. The blue line is the average of all innovations, the green line is the related standard deviation (root mean square error, RMSE), the red line is the ensemble spread, and the gray line is the number of temperature observations. The International Polar Year (IPY) officially took place between the two vertical lines, but the number of observations increased progressively
Illustration of the three estimation problems: Prediction (top), filtering (middle), and smoothing (bottom). The true unknown signal is represented by the blue line. Observation (blue), forecast (green), and analysis (red) pdfs are displayed as ellipsoids of proportional size, that is, the smaller the size, the smaller the estimated uncertainty and the larger the confidence; observational error is assumed constant. The associated blue stars for the observations, green squares for the forecast, and red squares for the analysis are to be understood as point estimators based on the corresponding pdfs; one possible, but not unique, choice is the mean of the pdfs (cf. Section 3 for a discussion on the choice of the estimator). Prediction (top panel): An analysis is produced at $t_k$ using the forecast and the observation at $t_k$; the analysis uncertainty is smaller than the forecast uncertainty. From the analysis at $t_k$, a prediction is issued until $t_{k+2}$. The prediction error grows in time (as an example of the chaotic behavior typical of geophysical systems; cf. Section 5.1) and the forecast uncertainty at $t_{k+2}$ (the green ellipsoid) is larger than the analysis uncertainty at $t_k$ (red ellipsoid). Information is propagated only forward from $t_k$, as depicted in the information flow diagram (top right). Filter (mid panels): A prediction is issued from the analysis at $t_k$ until the next observations at $t_{k+1}$; the forecast uncertainty at $t_{k+1}$ is larger than that of the analysis at $t_k$. At time $t_{k+1}$, a new analysis is performed by combining the forecast and the observations at $t_{k+1}$. The analysis uncertainty (red ellipsoid) is smaller than both the forecast and the observational uncertainties (green and blue ellipsoid/circle). From the analysis at $t_{k+1}$, the process is repeated: a new forecast until $t_{k+2}$ is issued, and a new analysis is performed using the observation at $t_{k+2}$.
The information flow diagram (mid right) depicts how the information is carried both from the past (as in the prediction problem) and from the present using current data. Smoother (bottom panels): All observations between $t_k$ and $t_{k+2}$ contribute simultaneously to the analysis, which is now the entire trajectory within the smoothing interval [$t_k$, $t_{k+2}$]. At the final time, $t_{k+2}$, the smoother and filter have assimilated the same number of observations, so their analyses at $t_{k+2}$, and their associated estimated uncertainties, are approximately the same (compare the red ellipsoids at $t_{k+2}$ for smoother and filter), but the smoother is more accurate at any other time in the window. The smoother solutions at $t_k$ and $t_{k+1}$ provide initial conditions for predictions until $t_{k+2}$ (dotted and solid green lines, respectively). At the final time, $t_{k+2}$, there are three forecasts initialized respectively by the analyses at $t_{k-1}$ (not shown), $t_k$, and $t_{k+1}$; the associated uncertainties (green ellipsoids) grow with the length of the prediction, with the forecast initialized at $t_{k+1}$ being the most accurate
Principle of the sequential importance resampling (SIR) particle filter (here N = 19). The lower panel curves are the pdfs of the prior and the observation. The initial equal‐weight particles are also displayed. The middle panel shows the updated unequal weights of the particles as computed by the likelihood. The upper panel shows the outcome of resampling with multiple copies of several of the initial particles
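The weighting and resampling steps shown in the middle and upper panels can be sketched as follows. This is a minimal illustration under stated assumptions: a scalar state observed directly, a Gaussian observation error, and multinomial resampling. None of these choices, nor the function name `sir_update`, is taken from the article.

```python
import numpy as np


def sir_update(particles, y, obs_std, rng):
    """One weighting-plus-resampling step of a SIR particle filter.

    particles: equal-weight prior samples, shape (N,)
    y:         scalar observation of the state itself (assumed)
    obs_std:   standard deviation of the Gaussian observation error (assumed)
    rng:       a numpy.random.Generator
    Returns the resampled (again equal-weight) particles and the normalized weights.
    """
    n = len(particles)
    # Log-likelihood of each particle; subtract the max before exponentiating
    # to avoid underflow when all likelihoods are tiny.
    log_w = -0.5 * ((y - particles) / obs_std) ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Multinomial resampling: high-weight particles are duplicated,
    # low-weight particles tend to be dropped.
    idx = rng.choice(n, size=n, p=w)
    return particles[idx], w
```

For instance, drawing N = 19 prior particles with `rng.standard_normal(19)` and calling `sir_update(particles, 1.0, 0.5, rng)` returns 19 particles, several of which are copies of the prior particles closest to the observation.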
Required data assimilation (DA) method versus model resolution and prediction time horizon; examples of corresponding natural phenomena are also shown for illustrative purposes. The degree of sophistication of the DA grows commensurately with the increase in prediction time horizon and the decrease of the model grid size
Example of sea surface temperature (in color) and sea ice concentration (in white) real‐time analysis by the TOPAZ system on November 28, 2009

