# A Simulation Study Using a Local Ensemble Transform Kalman Filter for Data Assimilation in New York Harbor

By Hoffman, Ross N Ponte, Rui M; Kostelich, Eric J; Blumberg, Alan; Szunyogh, Istvan; Vinogradov, Sergey V; Henderson, John M

ABSTRACT Data assimilation approaches that use ensembles to approximate a Kalman filter have many potential advantages for oceanographic applications. To explore the extent to which this holds, the Estuarine and Coastal Ocean Model (ECOM) is coupled with a modern data assimilation method based on the local ensemble transform Kalman filter (LETKF), and a series of simulation experiments is conducted. In these experiments, a long ECOM “nature” run is taken to be the “truth.” Observations are generated at analysis times by perturbing the nature run at randomly chosen model grid points with errors of known statistics. A diverse collection of model states is used for the initial ensemble. All experiments use the same lateral boundary conditions and external forcing fields as in the nature run. In the data assimilation, the analysis step combines the observations and the ECOM forecasts using the Kalman filter equations. As a control, a free-running forecast (FRF) is made from the initial ensemble mean to check the relative importance of external forcing versus data assimilation on the analysis skill. Results of the assimilation cycle and the FRF are compared to truth to quantify the skill of each.

The LETKF performs well for the cases studied here. After just a few assimilation cycles, the analysis errors are smaller than the observation errors and are much smaller than the errors in the FRF. The assimilation quickly eliminates the domain-averaged bias of the initial ensemble. The filter accurately tracks the truth at all data densities examined, from observations at 50% of the model grid points down to 2% of the model grid points. As the data density increases, the ensemble spread, bias, and error standard deviation decrease. As the ensemble size increases, the ensemble spread increases and the error standard deviation decreases. Increases in the size of the observation error lead to a larger ensemble spread but have a small impact on the analysis accuracy.

(ProQuest: … denotes formulae omitted.)

1. Introduction

Large advances in data gathering and ocean modeling capabilities in the last decade have started to make “operational oceanography” more than just a concept. In this new reality in oceanography, it is becoming more and more important that nowcasting and forecasting methods be both fast and accurate, provide information about uncertainties, make best use of disparate in situ and satellite datasets, and be implemented under different data constraints and dynamical regimes. Such ocean data assimilation systems (DASs) have a broad range of applications in areas as diverse as fisheries, oil, and tourism industries, search and rescue operations, oceanographie field research, and national security. Of particular interest is the coastal zone, including harbors and estuaries.

The implementation of a fully functional DAS for regional application in the oceanic coastal zones is a challenging task for several reasons, among them the large dimensionality of the problem, the need for accommodating a variety of asynchronous data in both dense and sparse sampling regimes, the stringent requirements for estimating uncertainties of analyses and forecasts, the portability and modularity needed to allow integration in an operational environment, and timeliness requirements. A number of global- and basin-scale assimilation efforts, some at eddy-resolving resolutions, are currently underway (e.g., Lermusiaux et al. 2006). Most recent efforts are focused on advanced DASs related to four- dimensional variational data assimilation (4DVAR) methods (e.g., Stammer et al. 2002; Wunsch and Heimbach 2007) or the various implementations of Kalman filter methods (e.g., Fukumori 2002; Lermusiaux et al. 2006). Application of these advanced methods for data assimilation in coastal regions is an active area of research, but optimal interpolation schemes (Mellor and Ezer 1991; Fana et al. 2004) are still in use in operational systems.

One geographical region with considerable ongoing efforts in modeling and data collection is the New York Harbor and adjacent littoral zone (Fig. 1). A quasioperational DAS based on the Estuarine and Coastal Ocean Model (ECOM) and a simple optimal interpolation scheme has been developed as part of the New York Harbor Observing and Prediction System (NYHOPS; Blumberg et al. 1999; Bruno et al. 2006). NYHOPS collects observations of the harbor and surrounding waters, but these data are not yet assimilated to improve forecasts. To explore the potential of an advanced DAS for the coastal zone, we have used the NYHOPS as a test bed for a state- of-the-art data assimilation method made possible by recent advances in adapting the Kalman filter for very large nonlinear dynamical systems. Our DAS is based on the local ensemble transform Kalman filter (LETKF), originally developed for atmospheric applications by Ott et al. (2004) and Hunt et al. (2007).

As a first step, this paper describes some simulation experiments and results. The experimental results presented here are based on a number of simplifications and assumptions appropriate for a proof- of-concept study. A long run of the model-called the nature runis taken to be the truth. To begin our data assimilation, an initial ensemble of model states is created by choosing snapshots from the nature run prior to the start of the assimilation experiment and treating them as realizations valid at the nominal synoptic time. Then, each ensemble member is advanced to the next synoptic time using the forecast model, and the observations are combined with the forecasts (i.e., the background ensemble) to produce an ensemble of analyses. This process is iterated. As it proceeds, the process fills gaps in sparsely observed regions, converts observations to improved estimates of model variables, and filters observation noise. All this is done in a manner that is physically consistent with the dynamics of the ocean as represented by the model. In our experiments, the simulated observations are created by sampling the nature run and adding random errors to those values. At each step, to monitor the quality of the system, we compare results to the truth and to a free-running forecast without any data assimilation.

In what follows, we describe our methodology (section 2), including the ocean model (ECOM), the assimilation method (LETKF), and the interface between these components (in sections 2a, 2b, and 2c, respectively). Results are discussed in section 3, including the experimental setup (section 3a), a detailed look at the baseline experiment (section 3b), and an overview of sensitivity experiments (section 3c) that explore how the quality of the analysis depends on ensemble size, data density, and observation error magnitudes. A summary of our experiments and key findings is provided in section 4, and suggested directions for future work are discussed in section 5.

2. Methodology

The goal of a DAS is to combine all available information and provide an optimal estimate of the state of the system and its respective uncertainty (e.g., Daley 1991; Kalnay 2002; Evensen 2006). Information is of two forms: past and present observations of the system and the dynamics of the system (as expressed by a model). In this section, we describe the ECOM, which we use to represent the dynamics of the ocean, the analysis algorithm for updating the state estimate based on the latest observations, and the interface between the model and analysis.

a. Estuarine and Coastal Ocean Model

The ECOM is a state-of-the-art, three-dimensional, hydrodynamic ocean model developed by Blumberg (1996) as a derivative of the Princeton Ocean Model (Blumberg and Mellor 1987). The model realistically computes water circulation, temperature, salinity, and mixing and transport in rivers, lakes, bays, estuaries, and the coastal ocean. Recent enhancements include generalized open boundary conditions, tracers, and bottom boundary layer submodels. The overall ECOM framework fully integrates sediment transport, water quality, particle tracking, heat flux, and wave modules. Anything predicted or diagnosed by the ECOM or its submodels can potentially be used by the LETKF.

The model numerically solves the continuity and nonlinear Reynolds momentum equations under hydrostatic and Boussinesq approximations. The free surface elevation is computed prognostically, and tides and storm surges can be easily simulated with minimal computational cost due to a mode-splitting technique in which the volume transport and vertical velocity are solved separately. The external-mode shallow-water equations, which are obtained from vertically integrating the three-dimensional equations, are solved by a leapfrog explicit scheme. The vertically dependent terms are solved less frequently using a leapfrog scheme with an Asselin time filter. The vertical model coordinate sigma is defined by the ratio of the depth and the local height of the water column such that the free surface is at sigma = 0 and the bottom of the ocean is at sigma = -1. A sigma-coordinate system (Phillips 1957) is preferable to a z-coordinate system in the vicinity of large bathymetric irregularities that are common in coastal areas. The parameterization of turbulence uses a secondorder closure scheme (Mellor and Yamada 1982). Successful applications of the ECOM to oceanic, coastal, and estuarine regions include studies in Chesapeake Bay (Blumberg and Goodrich 1990), Delaware Bay (Galperin and Mellor 1990), Massachusetts Bay (Signell et al. 1994), the Oregon Continental Shelf (Alien et al. 1995), New York Harbor (Blumberg et al. 1999), Onondaga Lake (Ahsan and Blumberg 1999), and Mississippi Sound (Blumberg et al. 2000). Extensive comparisons with data have shown that the model has good predictive capabilities, which suggests that the important physical processes are realistically reproduced (e.g., Vinogradova et al. 2005).

Input parameters for the ECOM include bathymetry, the initial ocean state (temperature, salinity, and surface elevation), and time- variable boundary conditions and atmospheric forcing. River discharges can also be introduced as time-variable fluxes. For the experiments described here, the domain is the New York Harbor and adjacent littoral zone, an implementation used quasi-operationally and known as the NYHOPS (Blumberg et al. 1999; Bruno et al. 2006). The spatial extent of the NYHOPS domain (see Fig. 1) incorporates the New York-New Jersey Harbor and extends beyond to include the Hudson River Estuary up to the Troy Dam, all of Long Island Sound, and the New York Bight out to the continental shelf. A 59 x 94 computational grid employs an orthogonal-curvilinear coordinate system that resolves the complex and irregular shoreline of the New York/New Jersey Harbor-New York Bight region. The resolution of the computational grid varies from 500 m in the rivers to about 42 km in the New York Bight. The local height of the water column varies from approximately 150 m to less then 2 m in the NYHOPS domain.

b. The local ensemble transform Kalman filter

The local ensemble transform Kalman filter belongs to the larger family of ensemble Kalman filter (EnKF) data assimilation schemes. Its design allows for efficient implementation on parallel, high- performance computing architectures.

1) THE KALMAN FILTER AND EXTENDED KALMAN FILTER

The goal of data assimilation is to determine the trajectory (time series of states) that best fits a set of noisy observations of the system from the past and the present. Let x be an m- dimensional vector representing the state of the system at a given time. For a grid point model, such as the ECOM, the components of ? are the state variables at the different grid points. Suppose that we have a time series of observations and assume both that the observational error is Gaussian with zero mean and a known error covariance matrix and that the observations depend on x in a known way. There are three pieces of information associated with the set of observations collected at time t^sub j^: the vector y^sup o^^sub j^, whose components are the observations; the observation operator H^sub j^, which defines the functional relation between x and y^sup o^^sub j^; and the observation error covariance matrix R^sub j^. That is,

y^sup o^^sub j^ = H^sub j^[x(t^sub j^)] + epsilon^sub j^, (1)

where epsilon^sub j^ is a Gaussian random variable with mean 0 and covariance matrix R^sub j^. Let the present time be t^sub n^. It can be shown that when the dynamics and the observation operator are linear, the present state associated with the most likely trajectory can be obtained by finding the minimum, x^sup a^^sub n^, of the cost function

… (2)

where the first term reflects the effects of all observations collected up to time t^sub n-1^ and the second term reflects the effects of observations collected at tn. The Kalman filter equations solve this least squares problem using the state estimate x^sup a^^sub n-1^ and the estimate of the analysis error covariance matrix P^sup a^^sub n-1^ from time t^sub n-1^ in the following manner:

(i) The background state x^sup b^^sub n^ is obtained by propagating the state estimate x^sup a^^sub n-1^ from t^sub n-1^ to t^sub n^ using the dynamics

… (3)

where M^sub t^sub n-1^,t^sub n^^ is the (linear) operator of the dynamics.

(ii) The background error covariance matrix P^sup b^^sub n^ is obtained by propagating the estimate of the analysis error covariance matrix P^sup a^^sub n-1^ from t^sub n-1^ to t^sub n^ using the dynamics

P^sup b^^sub n^ = M^sub t^sub n-1^,t^sub n^^P^sup a^^sub n- 1^M^sup T^^sub t^sub n-1^,t^sub n^^. (4)

(iii) The analysis error covariance matrix P^sup a^^sub n^ is given by

P^sup a^^sub n^ = (I – K^sub n^H^sub n^)P^sup b^^sub n^, (5)

where the Kalman gain matrix K is defined by

K^sub n^ = P^sup b^^sub n^H^sup T^^sub n^(H^sub n^P^sup b^^sub n^H^sup T^^sub n^ + R^sub n^)^sup -1^. (6)

(iv) The state estimate is x^sup a^^sub n^ obtained by

… (7)

This formulation of the Kalman filter assumes that M^sub t^sub n- 1^,t^sub n^^ provides a perfect representation of the true dynamics- a condition which is not satisfied when an imperfect numerical model is employed to simulate the true dynamics. Thus, in addition to the effects of the initial condition uncertainties at the beginning of the forecast step, model errors introduced during the forecast step also contribute to the uncertainty in the background x*. The effect of model errors is often taken into account by adding Q to the right- hand side of (4), where Q is the model error covariance matrix. This formulation assumes that the effect of the model errors can be represented by a bulk Gaussian random error term that has zero mean and error covariance Q and is uncorrelated with the errors hi the initial conditions (e.g., Evensen 2006, 28-29). Because the results of this paper are for the perfect model scenario, we make only two brief comments on the representation of model errors. First, in practice Q is often parameterized with a multiplicative variance inflation, that is, by modifying the right hand side of (4) by a multiplicative factor of (1 + gamma), 0

The extended Kalman filter (EKF; Jazwinski 1970) extends the applicability of the Kalman filter equations to nonlinear systems by using the nonlinear model in (3), …, and substituting the linearized (tangent linear) model for M^sub t^sub n-1^,t^sub n^^ in (4). Heuristically, this approach assumes that although the model dynamics is nonlinear, nonetheless the uncertainties in the state estimates are small; thus, their evolution can be approximated by the linearized dynamics. However, implementation of the EKF on a state-of-the-art ocean or numerical weather prediction (NWP) model is problematic for the following reasons:

(i) Explicitly forecasting P^sup b^^sub n^ for a high- dimensional nonlinear system using the tangent linear model in (4) and then calculating (5) and (7) is so computationally expensive that it is totally unfeasible without major approximations (e.g., Fukumori and Malanotte-Rizzoli 1995). This is true even for relatively small domain systems.

(ii) The use of the tangent linear model in (4) can potentially lead to unbounded linear instability of the filter (e.g., section 4.2.3 in Evensen 2006).

The most popular approach to avoid the difficulties posed by the EKF is to use a three- (3DVAR) or fourdimensional variational (4DVAR) method (see, e.g., Courtier 1997; Lorenc 1997). These techniques apply standard unconstrained optimization methods to directly minimize (2), thus eliminating the need for calculating (5) and (7), but do require an adjoint calculation of the gradient of (2). In the variational schemes the state estimate is updated by (3), but (4) is not used. Instead, 3DVAR simply uses a precomputed, time-independent P^sup b^^sub n^, whereas 4DVAR obtains P^sup b^^sub n^ starting from the same time-independent P^sup a^^sub n-1^ in each analysis cycle. It is important to note that the minimization of the 4DVAR cost function, like the EKF, requires the use of the tangent linear model (and its adjoint). Although some groups (e.g., Stammer et al. 2002; Wunsch and Heimbach 2007) use similar methods for ocean state estimation, here we consider a different approach: an ensemble Kalman filter.

2) ENSEMBLE KALMAN FILTERS

The EnKF approach makes Kalman filtering feasible by replacing (4) with a much cheaper approach for the calculation of P^sup b^^sub n^: at time t^sub n-1^, a k-member ensemble of initial conditions, [x^sup a(i)^^sub n-1^, i = 1, 2,..., k], is selected such that the spread of the ensemble around the ensemble mean x^sup a^^sub n-1^ accurately represents P^sup b^^sub n-1^; then the members of the ensemble are propagated using the nonlinear model to generate a background ensemble [x^sup b(i)^^sub n^, t = 1, 2,..., k]. Typically, the state space dimension m is orders of magnitude larger than the ensemble size k. Then the EnKF estimates of x^sup b^^sub n^ and P^sup b^^sub n^ are

… (8)

… (9)

where X^sup b^ is the m x k matrix whose ith column is x^sup b(i)^ – x^sup b^. The improvement in computational efficiency results from the fact that although the evaluation of (4) requires m linearized model integrations in the EKF, the evaluation of (8) requires only k integrations of the full model. Although the smaller number of model integrations significantly reduces the computational burden, there are a number of issues any EnKF scheme has to address before it can be implemented on a state-of-the-art ocean or NWP model, namely,

(i) Athough the rank of P^sup b^ is m, the rank of its estimate in (9) is k – 1, which could make solving (5) and (7) problematic; (ii) solving (5) and (7) for a typical model and a large number of observations can still be computationally prohibitive;

(iii) the EnKF is more sensitive to model errors than 3DVAR and 4DVAR because it uses the model dynamics to propagate the error statistics through many analysis cycles;

(iv) a computational algorithm, such as that given by Hunt et al. (2007), is needed to generate the analysis ensemble, [x^sup a(i)^, = 1, 2,..., k], such that

… (10)

… (11)

where X^sup a^ is the m x k matrix whose ith column is x^sup a(i)^ – x^sup a^; and

(v) P^sup b^ typically underestimates the actual uncertainty. One way to compensate is to multiply P^sup b^ by a constant factor (1 + gamma) before each analysis, where gamma is called the variance inflation factor.

Fortunately, all these issues can be satisfactorily addressed. Extensive numerical investigations with operational weather forecast models show that the approximations embodied in the EnKF work well and that some of the EnKF schemes are scalable to very large systems (e.g., Keppenne and Rienecker 2002; Whitaker et al. 2004, 2007; Szunyogh et al. 2005, 2008). Many different variants of the EnKF have been developed since the publication of the first EnKF scheme (Evensen 1994), and the main differences between these schemes are essentially in how the abovementioned challenges are met. A chronological history of ensemble Kalman filtering is provided by Evensen (2006, 255-265), and a detailed mathematical analysis of the relationship between the different schemes and a summary of the latest developments are presented in Hunt et al. (2007).

3) LOCAL ENSEMBLE TRANSFORM KALMAN FILTER

The goal of the local ensemble transform Kalman filter (LETKF) is to optimize the computational performance of the EnKF without a loss of accuracy. The LETKF has been shown to be a computationally efficient algorithm for NWP applications (Whitaker et al. 2007; Szunyogh et al. 2008).

The terms “local” and “transform” refer to the features of the LETKF approach described below.

* Local: The analysis ensemble members and their mean (the state estimate) are obtained independently for each model grid point G using observational information from a local volume containing G. The analysis ensemble members, [x^sup a(i)^^sub n-1^, = 1. 2,..., k], and the analysis x^sup a^ are assembled from the grid point estimates. A similar approach was first used in the local ensemble Kalman filter of Ott et al. (2004).

* Transform: The analysis perturbations x^sup a^ are obtained by linearly transforming the background perturbations x^sup b^ using the method of Hunt et al. [2007, Eqs. (22) and (23)]. Essentially, the cost function is minimized in the space spanned by the background ensemble. A similar approach was first used in the ensemble transform Kalman filter scheme of Bishop et al. (2001).

The LETKF has been coupled to the Global Forecast System (GFS), which is the global atmospheric model of the National Centers for Environmental Prediction (NCEP) and the National Weather Service (NWS; Szunyogh et al. 2008), and to the National Aeronautics and Space Administration (NASA)-National Oceanic and Atmospheric Adminstration (NOAA) finitevolume general circulation model (fvGCM) developed by Lin and Rood (1996). Tests of the implementation of the LETKF on the NCEP GFS with real observations show that the scheme is about as accurate as the operational data assimilation system of NCEP-NWS in data-dense regions and considerably more accurate than the operational system in data-sparse regions (Szunyogh et al. 2008). The interface between the ECOM and the LETKF, which we describe in the next section, has been developed based on the interface that was originally designed and coded for the NCEP GFS.

c. ECOM/LETKF interface

As noted above, the initial applications of the LETKF have been to numerical weather prediction. There is a one-to-one mapping between the prognostic variables of the GFS weather model and the ECOM (Table 1). In both models, a sigma-coordinate system is used, and the surface (top of the ocean, bottom of the atmosphere) is level 1. The two-dimensional prognostic variables that define sigma- the surface pressure for the GFS and the free surface for the ECOM- are at level 1. As a result of these similarities, only a few changes to the LETKF software were required, as will be noted in various contexts in this section.

To handle tide- or wind-forced gravity waves, the ECOM uses a time-splitting technique (Blumberg and Mellor 1987). In the NYHOPS, the time step is 5 s for the external (barotropic) mode and 50 s for the internal (baroclinic) mode. Although the model can represent very high-frequency phenomena, we do not expect to predict or analyze these frequencies accurately. Therefore, to deal with more predictable and observable quantities, we work in terms of time- averaged quantities. In fact, real observations are often time averaged over a period ranging from a few minutes to an hour. Experience shows that, in spite of the small time steps used by the ECOM, the interesting features of the NYHOPS domain in Fig. 1 can be represented by hourly averages. Therefore, in these preliminary experiments, 1-h averages of the forecasts of the prognostic variables are used as the background. Also, the observations are generated by adding random observational noise from the specified error distribution to 1-h averages of the fields from the nature run at randomly selected model grid points. The analysis solutions are then given in terms of these time-averaged quantities.

Once initialized with a first ensemble of analyses, x^sup a(i)^^sub 0^, i = 1, 2,…, k, the data assimilation cycles through the following forecast and analysis steps:

(i) the ensemble of ECOM forecasts from t^sub n-1^ to t^sub n^ provides the background ensemble, x^sup b(i)^^sub n^, i = 1, 2,…,k; and

(ii) the LETKF analysis combines the background ensemble and the simulated observations to produce the analysis ensemble x^sup a(i)^^sub n^, i = 1, 2,…, k.

After each analysis, we must restart the model after making (relatively small) changes to the model prognostic variables. For this purpose, we also run a forecast from the ensemble average and use the “hot restart” dataset that results as a template to create hot-restart files from the ensemble of analyses that only include time-averaged prognostic variables at the analysis time. In a hot restart, all fields necessary to exactly continue a model integration are stored. For example, in the ECOM, to accommodate the time stepping scheme, model prognostic variables at different times are included.1 The overall data flow in our numerical expertments is illustrated for a single data assimilation cycle in Fig. 2 and described in some detail in the appendix. Note that we carry out k + 1 forecast runs with the ECOM at every analysis time: one for each of the k ensemble members and an additional one from the ensemble mean analysis. The conversion procedures shown in Fig. 2 are needed because the LETKF and the ECOM require the ensemble data to be hi slightly different formats.

3. Results

Here we describe our baseline experiment and results in detail. We then summarize the results of sensitivity experiments in which we vary three aspects of the system: density (i.e., amount) of observations, ensemble size (i.e., number of ensemble members), and expected observation error (i.e., the standard deviation of the observation errors).

a. Baseline experiment description

The ECOM nature run was started from climatological values at 0000 UTC 1 Jan 2004. Thereafter external forcing was provided by meteorological inputs from operational NCEP Eta 12-km analyses, observed U.S. Geological Survey (USGS) river discharge data, NOAA water level observations, and lateral open ocean salinity and temperature boundary conditions from climatology and water level from a global ocean tidal model. We archived values from the ECOM nature run as hourly averages, valid at the half hour, beginning at 0030 UTC 1 March 2004 and continuing for 90 days. Our experiments have focused on the first half of an interesting event from 26 April through 4 May 2004, when strong discharge from local rivers, associated with heavy local rainfall and spring melting in the northern reaches of the Hudson River, forced a plume of warmer and fresher water to emerge from the New York Harbor and spread southward along the New Jersey coast.

Figure 3 shows two frames each from animations of T and S in the uppermost model layer (layer 1). The baseline experiment begins at 0530 UTC 26 April and the frames shown in Fig. 3 correspond to 0600 UTC 27 April and 1600 UTC 28 April 2004. Note how the plume of low- 5, high- T water from New York Harbor expands southward during this interval. (Although not described here, experiments for a period earlier in April, when the New York Harbor nature run was more quiescent, show qualitatively similar results to those presented here.)

The baseline experiment runs for 96 h and contains 32 3-h forecasts, each followed by an analysis. As mentioned earlier, 1-h average quantities are used throughout-for the background (i.e., forecast), the observations (and thus also the output from the LETKF), and in all plots presented here. The nominal ensemble size is k = 16. The LETKF analyzes the principal prognostic variables (h, T, u, v, S). At each analysis time, synthetic observations are generated by randomly selecting, on average, 10% of the elements in the model state vector x. This selection method implies that if temperature is observed at a given map location, then there may be zero or a few other observations of temperature or other prognostic variables in the vertical column at that map location. The standard deviations of the observation errors are uniform in the vertical and are 0.1 m, 0.5[degrees]C, 0.05 m s^sup -1^, and 1 psu for surface height, temperature, currents, and salinity, respectively. The variance inflation factor is gamma = 0.09. The localization is done in terms of grid distances, and the search volume extends out to two grid lengths in both horizontal directions and one to two grid lengths in the vertical direction, increasing with depth. This localization strategy provides a natural search volume because observations are generated randomly at a specified fraction of grid points. At any given grid point, between a few and a few dozen observations are used (Fig. 4).

The initial ensemble is made up of data sampled regularly in time from the nature run during March and the first half of April 2004. For the baseline experiment, nature is sampled roughly every fifth high tide (approximately every 62 h), beginning with 1030 UTC 3 March, 0030 UTC 6 March, and 1330 UTC 8 March, and ending with 0530 UTC 11 April. The times for the start of the experiment, the initial ensemble members, and the template used to create the first ensemble of initial conditions for the ECOM are all close to the time of high tide at the Battery station at the southern tip of Manhattan. In this way, the component of the flow due to the tidal forcing is approximately the same, and in fact minimized, in all these states. This should limit spurious large-amplitude transients.

The time required to perform the analysis for all grid points is typically an order of magnitude less than the time required for a single 3-h ECOM forecast. Thus, in contrast to the case reported for the NCEP GFS with large operational observational datasets by Szunyogh et al. (2008), we did not need to use domain decomposition and multiprocessing within the LETKF, but we did find it useful to run the ECOM for different ensemble members on different processors.

b. Baseline experiment results

In the baseline experiment, the LETKF analyses very quickly asymptote to the nature run (i.e., to the “truth”). For comparison, a free-running forecast (FRF) was started from the mean of the initial analysis ensemble. This comparison is important to determine the extent to which the ensemble tracks the truth because of the data assimilation and not simply because the system evolution is largely determined by the forcing fields (surface winds, freshwater inputs, fluxes of heat and moisture at the surface, and inflow boundary conditions).

We begin by examining the impact of the data assimilation at a single location indicated in Fig. 1-the model grid point at the head of Hudson Canyon, located southeast of Sandy Hook beach, New Jersey, and south of Jamaica Bay, Long Island. At this location, Fig. 3 shows that there are large temporal changes in surface temperature and salinity. Figure 5 shows timedepth cross sections at this location for T and 5 for the analysis, FRF, and nature run. The FRF begins with a state that is representative of the cooler and saltier conditions during the first half of the nature run. The FRF recovers neither the temperature nor the salinity structure of the nature run. In contrast, the baseline analysis does a very good job of representing the true state of the ocean at this location after roughly 36 h of assimilation.

The colder conditions in the FRF seen in Fig. 5 hold over the entire domain. Figure 6 shows the temporal behavior of domain- averaged temperature for the nature run, the analysis, the forecast (or background), and the FRF. The mean values are calculated over all ocean grid points or over all observed locations-roughly 10% of all ocean points for the baseline experiment. (For the rest of the paper, each statistic that is presented-here, mean temperature-is calculated for a particular sample. This sample might include one time or a range of times, a single level or all levels, and a single location or the entire horizontal domain. For all statistics presented, all grid points are given equal weight even though there is a great variation in the volume associated with a grid point.) Fig. 6 shows that the FRF has a roughly 3[degrees]C cold bias and shows little improvement even at four days. The assimilation of data quickly eliminates the domain-averaged bias from the background and the analysis.

The analysis provides unbiased estimates for the other model variables as well as for temperature. For example, Fig. 7a shows the evolution of domainaveraged bias for salinity. Note how smoothly the analysis settles down to the truth and does not track the jitter in the bias between the observations and background (see curves T – A and O – B in Fig. 7a). By design, the data errors are Gaussian and unbiased, and the expected observation errors are uniform: 1 psu for salinity. This is clearly seen in the O – T curves that follow, but here there is a slight positive (O > T) salinity bias due to resetting simulated observations to zero psu when simulated observation errors would result in negative salinity. Similarly, Fig. 7b shows the evolution of domain-averaged bias for the w- current. Here there is a clear oscillation in the FRF bias during the first day or so of the experiment, which is also seen in h and v (not shown).

Not only is the analysis unbiased, it is also accurate. Figure 8a shows the temperature error evolution. Here, and in what follows, we will use the term “error” to mean the standard deviation of the difference between the truth and some other quantity, such as the ensemble mean analysis. Note the very fast approach of the analysis error to its asymptotic value: after just a few 3-h cycles, the errors are greatly reduced and are smaller than the observation errors and much smaller than the errors in the FRF. The fact that the analysis and background errors quickly asymptote to very small values relative to the observation errors is a key indicator that the observed information is assimilated efficiently. Insofar as the analysis errors are smaller than the FRF errors, one can conclude that the small errors in the state estimates are not simply due to the model response to the strong external forcing. In Fig. 8a, the small deviations of the observation errors from the expected value (curve O – T) are due to the finite nature of the sample.

It is apparent that different variables in the ocean model respond to the external forcing on different time scales. Figure 8b shows the evolution of the surface height error. The sample sizes are roughly a tenth of those for temperature because the surface height is a two-dimensional field, whereas three-dimensional fields like temperature have 10 layers. The behavior of the assimilation is again very accurate, with negligible errors after 48 h, but the FRF is also very good and asymptotes to similar very small errors after 72 h. Salinity errors are more like temperature in that the FRF very slowly approaches the truth, whereas velocity errors are more like surface height in that the FRF quickly approaches the truth. These differences relate to the fact that u, v, and h are for the most part tidally forced and can adjust rapidly to the perfect forcing, but adjustments of the T and S fields involve thermodynamic processes that have longer time scales and depend not only on the forcing but also on advection and diffusion processes.

The fast approach of the data assimilation system to asymptotic behavior is also seen in the ensemble spread. Here, and in what follows, the ensemble spread is the rms value for the tune interval and domain under consideration of the ensemble standard deviation calculated at each grid point and synoptic time. Figure 9 shows the evolution of ensemble spread for the temperature in the entire domain, which shows that the initial ensemble spread is reduced quickly by the assimilation of observed information. Also, there is negligible growth or decay of the ensemble spread during the forecast step. This is consistent with the perfect model, perfect external forcing experiment design and suggests that the model dynamics have only a weak sensitivity to initial condition perturbations.

We began this discussion at a single point and then turned to domain-wide statistics. We now show that the spatial variation of analysis error is fairly smooth. For example, analysis errors vary quite smoothly in the vertical. Figure 10 shows vertical profiles of the errors as defined in the discussion of Fig. 8, calculated at each model layer for temperature and salinity and for the time interval from 48 to 96 h after the start of the experiment. Again, a Une is plotted that corresponds to the expected observation errors. Except for layers 2 and 3 for temperature, all other analysis errors are smaller than the expected observation errors. There is a general decrease of error with depth. Vertical profiles of errors of u- and v-currents show a monotonic decrease of error as depth increases, which mirrors the decrease of variability in the currents with depth, likely because of bottom friction (not shown).

Over most of the domain, the analysis errors are also much more spatially homogeneous than those of FRF, as evident in Fig. 11. The top two rows in Fig. 11 show the spatial dependence of the salinity error in layer 1. The analysis is doing a good job everywhere, but especially in the freshwater plume, inner harbor, estuaries, rivers, and Long Island Sound. These areas, especially the rivers, are much easier to see in the gridpoint coordinate version of the maps. Note the particularly large errors of the FRF within the range 1

All results described so far have been for our baseline experiment with observation density d = 0.10, ensemble size k = 16, and nominal error levels. We now examine the sensitivity of the results to the choice of d, k, and observation error standard deviation. The range of choices for the parameters in the sensitivity experiments is listed in Table 2. The nominal observation error standard deviations are multiplied by a factor e in these experiments. In this section, we discuss only the sensitivity of the analysis errors to the parameters.

Figures 12-13 show results for different data densities ranging from d = 0.50 to d = 0.02. As the data density increases, the analysis error and the ensemble spread decrease and the time required to reach asymptotic behavior decreases. Figure 12 shows this for temperature error. Note that in the sequence of decreasing data density, the tune to reach asymptotic behavior for d = 0.02 is much greater than for d = 0.05. However, the differences between these analyses are all small after 48 h of data assimilation. Figure 13 shows vertical profiles of the ensemble spread for hours 48 through 96 for the varying data densities. As the data density increases, the information content of the analysis increases and the ensemble spread decreases. The variation of ensemble spread is large relative to the variation of the analysis error. Furthermore, the magnitude of the ensemble spread in all cases is smaller than that of the analysis error. One reason for this is that the ensemble estimates the analysis uncertainty in the reduced dimensional space spanned by the ensemble members. Although this should be the correct uncertainty estimate for the purpose of the LETKF, for other purposes better estimates of analysis error might be based on the correlations between the analysis error and the ensemble spread observed in simulation experiments like these.

Figure 14 presents summary statistics for all sensitivity experiments listed in Table 2. There are three groupings in Fig. 14 for variations in d, k, and e. The ensemble size experiments have d = 0.50. The other experiments are variations about the baseline experiment. All statistics in Fig. 14 are calculated for the sample composed of the entire domain and hours 48 through 96 of the experiments. The values are then normalized by the baseline observation errors and presented as percentages. Of course, these are summary statistics that cover a variety of different regimes in the New York Harbor (see Fig. 11).

The analysis error is reduced for T, u, v, and S as k or d increases. Variations of analysis error are small and show evidence of saturation as the ensemble size or data density increases. Typically, doubling k reduces the error by 10%; doubling d reduces error by 2%. The percentage reduction of error for doubling k tends to decrease as the ensemble size increases. A similar relationship holds for temperature and increasing data density. If the error for 2k were reduced by a constant factor relative to the error for k, then the log of the error would be proportional to the log of k, and similarly for the observation density. Comparing the d = 0.50, k = 8 case (“k = 8″) to the d = 0.10, k = 16 (“d = 0.10″) cases, we see that even though the second experiment has only 20% of the observations, it provides an equivalent or superior analysis by doubling the ensemble size. Increasing the ensemble size from k = 16 to k = 32 further improves error levels. Figure 15 shows how the ensemble size affects the approach to asymptotic behavior as well as the quality of the analysis for T and h. In particular, for T, when k = 4, the filter still converges but a much longer time is required to reach asymptotic behavior. In this case, the slow improvement with time may be an effect of the correct external forcing acting over time. The k = 4 time evolution of the T analysis error parallels the FRF evolution of forecast error after the first few steps. For h, the FRF and k = 4 experiments converge to asymptotic behavior at approximately the same rate. Otherwise, h analysis errors vary little from experiment to experiment. This result is consistent with the interpretation that h is to a large extent externally (tidally) forced and that the external forcing is specified to exactly match that of the nature run.

The analysis bias is quite small for h, u, and v. For T and 5, bias decreases with more or better data and with increasing ensemble size. For bias, we do not discern any clear trends in the summary statistics.

The ensemble spread statistics show clear trends. The spread decreases when the number of observations or the accuracy of the observations is increased, but it increases when the number of ensemble members is increased. Relative to the observing errors, spreads are largest in the d = 0.02 case for currents, and are very small for h. Except for the comparison of d = 0.02 to d = 0.05, doubling d results in approximately constant decreases in spread. Similarly, with every doubling of ensemble size, the ensemble spread increases by a similar amount (i.e., the difference between successive experiments is approximately constant). These constants are different for each variable and for k and d. If the spread were reduced by a constant factor for doubling of d or k, then the spread would have a linear relationship with log d and log k.

In all comparisons, except for h and especially for T, the error and bias of the FRF are much larger than any of the data assimilation experiment results. The differences between the statistics of different experiments are small by comparison to the difference between any one experiment and the FRF. In the sensitivity experiments, the largest impacts are noticeably larger errors and biases for k = 4 and d = 0.02 than for other choices of the parameters. Variations in the rms error and the bias are surprisingly small when the observation error is doubled or halved. Apparently, at a data density of 10% the LETKF is very efficient at filtering out uncorrelated, unbiased observation errors. Figure 16 shows this for T. With larger observation errors, it does take more time to reach asymptotic levels of analysis error. Even though the analysis errors change little with observation error size, the ensemble spread that indicates an estimate of the analysis uncertainty is very sensitive to the specified observation error levels. Although the analysis is robust to the specification of the a priori observation error statistics, as mentioned before, estimates of the analysis uncertainty might be improved by a statistical postprocessing of the information provided by the ensemble spread. In general, the ensemble spread is sensitive to all three factors-observation density, ensemble size, and observation error magnitude.

4. Summary

We have coupled a modern coastal ocean model (ECOM) with a modern data assimilation method (LETKF) and conducted a series of simulation experiments taking a long ECOM nature run to be the truth. Observations are generated at analysis times by sampling the nature run at model grid points with a specified density of observations and perturbing these values with random errors with specified statistics (normal; unbiased; with given standard deviations). A diverse collection of model states is used for the initial ensemble. As a control, a free-running forecast is made from the initial ensemble mean. The FRF is an important point of comparison because, to a large extent, the coastal ocean is forced by tides, inflows at open lateral boundaries, and fluxes of momentum, heat, and moisture at the surface; and in all experiments described here the external forcing is fixed and identical to that used in the nature run. During the data assimilation, the ECOM advances the ensemble 3h to provide a background for the analysis step. The analysis step combines the observations and the background, using the ensemble to estimate the background uncertainty and using the specified observation standard deviations to estimate the observation uncertainty in the Kalman filter equations. The state estimation errors of the analysis and the FRF are quantified by comparing each to the nature run.

The following are some of the findings from our experiments, which may be dependent on the particulars of our study-the domain, the season, and the external forcing. The assimilation quickly eliminates the domain-averaged bias of the initial ensemble. The FRF is unable to do this for temperature or salinity. After just a few analysis cycles, errors are greatly reduced by the assimilation of observations. Analysis errors are mostly smaller than observation errors and are much smaller than the errors of the FRF. The fact that the analysis and background errors quickly asymptote to small values relative to the observation errors is a key indicator that observed information is assimilated efficiently. Insofar as the analysis errors are smaller than the FRF errors, one can conclude that the result that the analysis asymptotes to the nature run is not simply the effect of the model response to the external forcing. FRF temperature and salinity very slowly approach the truth, whereas current and surface height very quickly approach the truth. These differences relate to the different dependencies and adjustment time scales of dynamic and thermodynamic variables to external forcing. Spatially, the analysis does a good job everywhere, especially in the inner harbor, estuaries, and Long Island Sound. The southeast part of the domain is relatively quiescent, and therefore the analysis and FRF are similar there. As the data density increases, ensemble spread, bias, and error standard deviation all decrease. The filter accurately tracks the truth at all data densities examined: from observing 50% down to 2% of the model grid points. As the ensemble size increases, ensemble spread increases and error standard deviation decreases. For an ensemble of just four members, the filter still converges, but a much longer time is required to reach asymptotic behavior. Comparing the d = 0.50, k = 8 case to the d = 0.10, k = 16 case, we see that even though the second experiment has only 20% of the observations it still provides superior analyses by doubling the ensemble size. Increasing the ensemble size from k = 16 to k = 32 provides still smaller analysis errors. Increases in the size of the observation error lead to a larger ensemble spread but have a small impact on the analysis accuracy. 5. Future work

To further define the characteristics of the DAS and gain more experience in the choice of adjustable DAS parameters such as the ensemble size, localization, inflation factor, or initialization, further idealized sensitivity experiments of the kind performed here will be useful. Extra experiments might include assimilating different sets of observations with different data densities (e.g., relatively dense surface coverage from SST and very sparse subsurface observations), examining the effect of incorrect external forcing during the data assimilation, or examining the sensitivity to different lengths of the data assimilation step. Such experiments will be a helpful guide when working with real data and uncertain external forcing. In addition, continuing experiments with “simulated” data will allow flexible choices of domains and data types that are not easily realized with current actual data collection systems. Such experiments should be useful in assessing the robustness and scalability of the DAS for dense observing systems.

To demonstrate the skill of the ECOM/LETKF system in realistic and diversified settings, experiments need to make use of real observations taken at any time and location within the domain and to include the effects of uncertainties in surface forcing, model physics, open boundary conditions, and any other factors contributing to model error. The NYHOPS setup provides a focus on the dynamics of very shallow regions with convoluted coastal geometries and strong boundary forcing by river discharge, typical of estuarine and harbor domains. Studying a second domain with a focus on deeper shelf and outer shelf zones and the frontal dynamics typical of shelf-break fronts would be useful to further test the methodology in different dynamical conditions.

Without having the benefit of knowing the truth, as with the experiments in this paper, one needs validation and verification metrics (e.g., Wilks 2006) to assess the behavior of the DAS. Other tools required to work with real datasets include quality control procedures. Simple data quality control methods like the “background check,” which compares the observation minus forecast to the expected forecast error, are quite suitable for the LETKF. Model errors resulting from poorly known forcing fields, open lateral boundary conditions, or missing physics can all introduce biases in the forecasts, and methods to correct such problems can be implemented within the LETKF framework (Baek et al. 2006). Finally, with many available submodels for biogeochemistry, sediment transport, water quality, waves, and particle tracking, there are opportunities to extend the assimilation to nonstandard data such as ocean color and turbidity, chemical tracers, wave energy, and locations of drifting buoys and autonomous underwater vehicles (a.k.a. gliders). These opportunities exist because the LETKF method is completely general in the sense that when the observation errors can be assumed to be Gaussian, any observation of a physical parameter that has a known functional dependence on the variables of the dynamical model, can potentially be usefully assimilated regardless of the magnitude of the observation error.

Acknowledgments. The authors thank Gregg Jacobs and Craig Bishop for helpful discussions. This work was supported by the U.S. Navy SPAWAR SBIR program (Contract N00039-06-C-0050).

1 Many models have a so-called “warm restart” capability that uses an Euler forward time step from a given initial state. A warm- restart capability may also include some special balancing or initialization procedures to reduce undesirable initial behavior (e.g., the excitation of spurious gravity waves). Because the present version of the ECOM model does not have a warm-restart capability, we use the modified hot-restart procedure described here.

REFERENCES

Ahsan, Q., and A. Blumberg, 1999: Three-dimensional hydrothermal model of Onondaga Lake, New York. J. Hydrol. Eng., 125, 912-923.

Allen, J. S., P. A. Newberger, and J. Federiuk, 1995: Upwelling circulation on the Oregon continental shelf. Part I: Response to idealized forcing. J. Phys. Oceanogr., 25, 1843-1866.

Baek, S.-J., B. R. Hunt, E. Kalnay, E. Ott, and I. Szunyogh, 2006: Local ensemble Kalman filtering in the presence of model bias. Tellus, 58A, 293-306.

Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420-436.

Blumberg, A. F., 1996: An estuarine and coastal ocean version of POM. Proc. Princeton Ocean Model Users Meeting (POM96), Princeton, NJ, Princeton University, 9.

_____, and G. L. Mellor, 1987: A description of a three- dimensional coastal ocean circulation model. Three-Dimensional Coastal Ocean Models, N. Heaps, Ed., American Geophysical Union, 1- 16.

_____, and D. M. Goodrich, 1990: Modeling of wind-induced destratification in Chesapeake Bay. Estuaries Coasts, 13, 236-249.

_____, L. A. Khan, and J. P. St. John, 1999: Three-dimensional hydrodynamic simulations of the New York Harbor, Long Island Sound, and the New York Bight. J. Hydraul. Eng., 125, 799-816.

_____, Q. Ahsan, and J. K. Lewis, 2000: Modeling hydrodynamics of the Mississippi Sound and adjoining rivers, bays, and shelf waters. Proc. OCEANS 2000 MTS/IEEE Conf. and Exhibition, Vol. 3, Providence, RI, IEEE, 1983-1989.

Bruno, M. S., A. F. Blumberg, and T. O. Herrington, 2006: The urban ocean observatory-Coastal ocean observations and forecasting in the New York Bight. J. Mar. Sci. Environ., C4, 1-9.

Courtier, P., 1997: Variational methods. J. Meteor. Soc. Japan, 75, 211-218.

Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, 457 pp.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 143-10 162.

_____, 2006: Data Assimilation: The Ensemble Kalman Filter. Springer, 280 pp.

Fan, S., L.-Y. Oey, and P. Hamilton, 2004: Assimilation of drifter and satellite data in a model of the northeastern Gulf of Mexico. Cont. Shelf Res., 24, 1001-1013.

Fukumori, I., 2002: A partitioned Kalman filter and smoother. Mon. Wea. Rev., 130, 1370-1383.

_____, and P. Malanotte-Rizzoli, 1995: An approximate Kalman filter for ocean data assimilation: An example with an idealized Gulf Stream model. J. Geophys. Res., 100 (C4), 6777-6793.

Galperin, B., and G. L. Mellor, 1990: A time-dependent, three- dimensional model of the Delaware Bay and River system. Part 1: Description of the model and tidal analysis. Estuarine Coastal Shelf Sci., 31, 231-253.

Hunt, B., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112-126.

Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.

Kalnay, E., 2002: Atmospheric Modeling, Data Assimilation, and Predictability. Cambridge University Press, 364 pp.

Keppenne, C., and M. Rienecker, 2002: Initial testing of a massively parallel ensemble Kalman filter with the Poseidon isopycnal ocean general circulation model. Mon. Wea. Rev., 130, 2951- 2965.

Lermusiaux, P. F. J., and Coauthors, 2006: Quantifying uncertainties in ocean predictions. Oceanography, 19, 90-103.

Lin, S.-J., and R. B. Rood, 1996: Multidimensional flux-form semi- Lagrangian transport schemes. Mon. Wea. Rev., 124, 2046-2070.

Lorenc, A. C., 1997: Development of an operational variational assimilation scheme. J. Meteor. Soc. Japan, 75, 339-346.

Mellor, G. L., and T. Yamada, 1982: Development of a turbulence closure model for geophysical fluid problems. Rev. Geophys. Space Phys., 20, 851-875.

_____, and T. Ezer, 1991: A Gulf Stream model and an altimetry assimilation scheme. J. Geophys. Res., 96, 8779-8795.

Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415-428.

Phillips, N. A., 1957: A coordinate system having some special advantages for numerical forecasting. J. Meteor., 14, 184-185.

Signell, R. P., H. L. Jenter, and A. F. Blumberg, 1994: Modeling the seasonal circulation in Massachusetts Bay. Proc. Third Int. Conf. on Estuarine and Coastal Modeling, ASCE, 578-590.

Stammer, D., and Coauthors, 2002: Global ocean circulation during 1992-1997, estimated from ocean observations and a general circulation model. J. Geophys. Res., 107, 3118, doi:10.1029/ 2001JC000888.

Szunyogh, I., E. J. Kostelich, G. Gyarmati, D. J. Patil, B. R. Hunt, E. Kalnay, E. Ott, and J. A. Yorke, 2005: Assessing a local ensemble Kalman filter: Perfect model experiments with the National Centers for Environmental Prediction global model. Tellus, 57A, 528- 545.

_____, _____, _____, E. Kalnay, B. R. Hunt, E. Ott, E. Satterfield, and J. A. Yorke, 2008: A local ensemble transform Kalman filter data assimilation system for the NCEP global model. Tellus, 60A, 113-130, doi:10.1111/j.1600-0870.2007.00274.x.

Vinogradova, N., S. Vinogradov, D. Nechaev, V. Kamenkovich, A. F. Blumberg, Q. Ahsan, and H. Li, 2005: Evaluation of the northern Gulf of Mexico littoral initiative (NGLI) model based on the observed temperature and salinity in the Mississippi Bight. Mar. Tech. Soc. J., 39, 25-38. Whitaker, J. S., G. P. Compo, X. Wei, and T. M. Hamill, 2004: Reanalysis without radiosondes using ensemble data assimilation. Mon. Wea. Rev., 132, 1190-1200.

_____, T. M. Hamill, X. Wei, Y. Song, and Z. Toth, 2007: Ensemble data assimilation with the NCEP global forecast system. Mon. Wea. Rev., 136, 463-482.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 648 pp.

Wunsch, C., and P. Heimbach, 2007: Practical global ocean state estimation. Physica D, 230, 197-208, doi:10.1016/ j.physd.2006.09.040.

ROSS N. HOFFMAN AND RUI M. PONTE

Atmospheric and Environmental Research, Inc., Lexington, Massachusetts

ERIC J. KOSTELICH

Department of Mathematics and Statistics, Arizona State University, Tempe, Arizona

ALAN BLUMBERG

Stevens Institute of Technology, Hoboken, New Jersey

ISTVAN SZUNYOGH

University of Maryland, College Park, College Park, Maryland

SERGEY V. VINOGRADOV AND JOHN M. HENDERSON

Atmospheric and Environmental Research, Inc., Lexington, Massachusetts

(Manuscript received 11 April 2007, in final form 31 December 2007)

Corresponding author address: Dr. Ross N. Hoffman, Atmospheric and Environmental Research, Inc., 131 Hartwell Avenue, Lexington, MA 02421-3126.

E-mail: ross.n.hoffman@aer.com

APPENDIX

Interface Definitions

a. File formats

The LETKF analysis requires all ensemble members, whereas the ECOM operates on a single ensemble member. The interface requirements exist therefore to convert the LETKF analysis file to an ensemble of ECOM analysis files and, conversely, to convert an ensemble of ECOM forecast files into the LETKF background file. We call these functions LETKF to ECOM (L2E) and ECOM to LETKF (E2L).

Input to the ECOM must be in the ECOM restart format (ERF), which contains a single snapshot (i.e., instantaneous, not time averaged) of the model state and all auxiliary information required for a hot restart. The ECOM output includes an ERF file and an ECOM nature format (ENF) file that holds the time-averaged model state at the end of the forecast interval. The LETKF ensemble format (LEF) files contain ensembles of forecast or analysis state vectors. Here, each of these files contains data associated with a single time.

b. The forecast-analysis procedure

As defined in Fig. 2 the forecast-analysis procedure starts with a current analysis and advances to the next analysis. We denote the initial and final times within the procedure as t^sub i^- = t^sub n- 1^ and t^sub f^ = t^sub n^ = t^sub i^ + Deltat, where Deltat is the time increment between analyses. At the end of such an assimilation cycle we increment t^sub i^ by Deltat and continue.

To begin the forecast-analysis procedure, L2E converts the previous analysis (i.e., at t^sub i^) to ECOM initial conditions (in ERF), using as a template the ERF file from the previous ECOM forecast initialized with the ensemble mean conditions. Some of the quantities taken from the template are strictly constants for the experiment, and the others provide reasonable estimates. In particular, the turbulence parameters for each ensemble member’s initial conditions are taken from the ensemble mean forecast. Other quantities taken from the template that are not constant will be recalculated before the second time step begins. A test showed little difference resulting from setting some of these to zero instead of copying these from the template.

Next, using the ERF files created by L2E, each ensemble member is advanced in time by the ECOM from t^sub i^ to t^sub f^ + t^sub a^/ 2, where t^sub a^ is the averaging time (1 h). The resulting forecasts valid at t^sub f^ and averaged over the interval from t^sub f^ – t^sub a^/2 to t^sub f^ + t^sub a^/2 are then combined by E2L for use as the background ensemble by LETKF. At the same time (see leftmost part of Fig. 2), the ensemble mean is advanced from t^sub i^ to t^sub f^ to provide the template in the next cycle.

A LETKF observation format (LOF) file is generated at the analysis time from the nature run. In this procedure, a random number generator is seeded by making use of the current experiment time. For each variable at each potential observation location, a random normal number with zero mean and the given standard deviation is taken as the observation error. Then the observation is kept if a uniform random number on the interval (0, 1) is smaller than the specified data density (e.g., d = 0.10). With this experimental design, the errors have the same structure in all experiments. First, the data locations from an experiment with higher data density form a superset of the data for any experiment with lower data density. second, if two experiments share an observation of a particular variable at a particular grid point and time, then the observation errors scaled by the specified observation standard deviation for each experiment will be the same. This approach makes the interpretation of sensitivity experiments clearer. For diagnostic calculations of statistics, at each analysis time a second LOF file with complete coverage (d = 1.00) and with zero errors is created.

Copyright American Meteorological Society Sep 2008

(c) 2008 Journal of Atmospheric and Oceanic Technology. Provided by ProQuest LLC. All rights Reserved.

**Topics:**Estimation theory, Bayesian statistics, Control theory, Assimilation, Kalman filter, Ensemble Kalman filter, Linear filters, Data assimilation, Forcing, Likelihood function, Robotics, Statistical hypothesis testing, ALAN BLUMBERG