The objective of this project is to improve understanding of the biological integrity of stream and river systems in the United States Mid-Atlantic Region by combining information from separate monitoring surveys, available contextual information on hydrologic units and remote sensing information.

We propose to develop spatial statistical models for measures of biotic integrity on the streams and rivers that capture the spatial variation in the measures. These models will be used to estimate the biotic integrity through the riverine system based on the information from multiple sources and scales. We will also quantify the uncertainty in the estimates and develop methods to visualize the resulting estimates and uncertainties.

We hope that the project will be used as a case study of how state-of-the-art statistical methodology can be used to leverage the information in existing monitoring surveys.

The United States Mid-Atlantic Region

The study area is the Mid-Atlantic region of the eastern United States and its watersheds. This region is defined by the EPA to be the land and near coastal area that includes all of EPA Region III and parts of Regions II and IV. An elevation map of this region is given in Figure 1. The region extends from southern New York into northeastern North Carolina. The region includes EPA Region III (i.e., Pennsylvania, West Virginia, Maryland, Delaware, and Virginia); the Susquehanna and Allegheny River basins, which extend into New York; the Delaware River basin, which extends into New Jersey; and the Chowan-Roanoke and Neuse-Pamlico basins, which extend into North Carolina. The Mid-Atlantic region encompasses the area from the Mid-Appalachian highlands to the estuaries.

This region was chosen for a number of scientific and practical reasons. The Mid-Atlantic region has been extensively studied by the EPA and other scientific groups. The region is one of the most data-rich areas in the country, in part because of its dense population and proximity to Washington, D.C.

Figure 2 represents the hydrography of the region, that is, the major rivers, streams and watersheds. The watersheds are represented by the (8-digit) hydrological units within the region. The information is based on the reach file (survey d). Figure 3 provides an overview of the land cover and land use type. The information is based on the Landsat Thematic Mapper data (survey c).

The project is to be a case-study for the combination of information from multiple sources. The accessibility of the case-study will be improved by basing it on readily available, compatible and mature data sets. Most of the data are available in formats readily adapted to standard GIS and statistical analysis packages (e.g., ARC/Info, SAS and S-PLUS).

The project will use as its foundation the work of the following four interrelated initiatives:

  1. The EPA/ORD Mid-Atlantic Integrated Assessment (MAIA)
    This multi-year project undertakes an ecosystem-based evaluation of the Mid-Atlantic region and its watersheds. A major objective is to produce a "State-of-the-Region" report that addresses strategic environmental management needs.
    MAIA incorporates numerous state, regional, and national environmental monitoring programs into an assessment process specifically targeted to the management needs of EPA Region III. Examples of programs with which there are specific cooperative efforts include the Environmental Monitoring and Assessment Program (EMAP), the Mid-Atlantic Highlands Assessment, the National Biological Service's Gap Analysis Program, the Chesapeake Bay Program, the Delaware Estuary Program, the Virginia Coastal Bays Program, the U.S. Geological Survey's National Water Quality Assessment Program, the Forest Service's Forest Inventory and Analysis Program, and the National Oceanic and Atmospheric Administration's Coastal Change Analysis Program.
    For a description of MAIA go to
  2. An Ecological Assessment of the United States Mid-Atlantic Region: A Landscape Atlas
    The Atlas is a freely available EPA report assessing relative ecological conditions across the Mid-Atlantic region, and was published in April 1998. The Atlas identifies, with never-before achieved detail and comparability, patterns of land cover and land use across the region. The Atlas represents one of the first regional-scale ecological assessments of the Environmental Monitoring and Assessment Program (EMAP).
    The report is based on data from satellite imagery and spatial databases on biophysical features such as soils, elevation, and human population patterns. It compares nine landscape indicators on a watershed-by-watershed basis for the lower 48 states (at a relatively coarse resolution of 1 km), placing the mid-Atlantic region in the context of the rest of the country. Using finer-scale spatial resolution (e.g., 30-90 meters), the report then analyzes and interprets environmental conditions of the 125 watersheds in the mid-Atlantic region based on 33 landscape indicators. Results are presented relative to four general themes identified by stakeholders in the region: (1) people (potential human impacts), (2) water resources, (3) forests (forest habitat), and (4) landscape change.
    The data underlying this Atlas will prove invaluable for the project. For a description, see the
  3. Mid-Atlantic Regional Assessment (MARA) of Climate Change Impacts conducted at the Pennsylvania State University
    The study is being conducted as part of the U.S. National Assessment, under the auspices of the U.S. Global Change Research Program. The PSU study is supported by the EPA and is scheduled for completion by January 2000. Four questions guide MARA:
1. What are the region's current stresses and related issues?
2. How would climate change and variability affect these stressors, or create new ones?
3. What actions would increase the region's resilience to climate variability, reducing negative impacts and taking advantage of opportunities created by climate change?
4. What new information is needed to better answer questions 1) and 2) and to evaluate adaptation options?

The assessment builds on data from many sources, especially the above two. It provides additional knowledge about the region that will prove useful for the project. Of particular interest is work done by the Water Resource Impacts group. For a description, see MARA and the Water Resource Impacts group.

Specific Component Surveys

The component surveys we use are:

a) EMAP Mid-Atlantic Integrated Assessment (MAIA) Survey (Larsen and Christie 1993)
We will use the Stream and River Survey that has data on 100-200 sites from 1993-96. Some of the sites are repeat visits.
For a description of the EMAP Surface Waters Mid-Atlantic Streams 1993-96 data set see data.
For a description of the Fish Metrics data see
b) Maryland Biological Streams Survey (MBSS) (Heimbuch, Seibel, Wilson and Kazyak 1998)
To provide much needed information about the ecological consequences of acid deposition and other human-related impacts, the Maryland Department of Natural Resources designed the MBSS. The MBSS is a long-term monitoring program designed to describe the current status of aquatic biota, physical habitat and water quality in first, second and third order non-tidal streams within the state of Maryland. The MBSS was implemented as a three-year study in 1995. Sampling is probability-based, and stratification is based on stream order and drainage basin. Approximately 1000 sites were sampled during 1995-1997. A "State of the Streams" report which summarizes the initial round of the MBSS will be completed in May, 1999.
For a description of the MBSS see
c) EMAP Mid-Atlantic Landscape Indicators (Kepner, Jones, Chaloud and Wickham 1995)
This project will supply landscape classification data with a resolution of 30 meters based on satellite imagery.
For a description of the Mid-Atlantic Landscape Indicators, see

These surveys will be supplemented by:

d) EMAP Streams network database (RF3) (Dewald and Olsen 1994).
This is the primary database for the locations of the rivers and streams in the Mid-Atlantic region. We will use the River Reach File Version 3, derived from the U.S. Geological Survey Digital Line Graph - streams, 1:100,000-scale.
e) USDA Natural Resources Conservation Service soils database (USDA 1994)
This data set is a digital general soil association map developed by the National Cooperative Soil Survey and distributed by the Natural Resources Conservation Service (formerly Soil Conservation Service) of the U.S. Department of Agriculture. It consists of a broad based inventory of soils and nonsoil areas that occur in a repeatable pattern on the landscape and that can be cartographically shown at the scale mapped.
For a description of the soils database, see

We are currently creating a GIS incorporating these components, based on ESRI's ARC/INFO GIS software on a network of Sun SPARC workstations running Unix. The GIS is being developed at the Department of Statistics, The Pennsylvania State University. As the final GIS will contain information only for the Mid-Atlantic Region, its size will be substantially smaller than the combined size of the component surveys.

Overview of Statistical Modeling

In this section we briefly describe how the data sources will be combined. Let R, a subset of the plane R^2, be the set of locations on rivers and streams in the Mid-Atlantic region. We will define R operationally as the set of locations in the River Reach File Version 3 (RF3). Let HUC(x) represent the hydrologic unit to which location x belongs, and let {HUC_i: i = 1, ..., H} represent the set of all hydrologic units. The units form a partition of R. Let Z(x) be a measure at each location x in R. We consider a number of indicators of condition and stress related to fish or water chemistry. Most of the measures are continuous. However, we also consider derived measures of exceedances:

E(x) = I[Z(x) <= L] for given L

where L is a pre-specified limit on the measure. The same fundamental modeling framework will be applied to both forms of response. We use a generalized linear modeling framework that reduces to a linear modeling framework for most of the continuous variables. Throughout we will use as an example the fish index of biotic integrity (IBI) (Karr 1986). Other fish metrics can also be used. First we describe a model for the measure at each location. We write:

Z(x) = X(x)beta_1 + C(x)beta_2 + S(x)beta_3 + phi(x;theta) + eta(HUC(x);gamma) + epsilon(x)

where the first three terms capture variation due to differences in covariates, the phi and eta terms capture residual spatial variation and the last term the unexplained variation. The first three terms are:

X(x): row vector of point measured covariates at location x.
These measures are required to be known at each location in R. Examples of covariates are latitude, longitude, and elevation. This set is restricted because we need to know it at each location in R.
C(x): row vector of contextual covariates related to location x.
These measures are required to be known at each location in R. Examples of covariates are characteristics of the reaches from RF3, such as stream order and stream level. Variables on soil types from the USDA Natural Resources Conservation Service soils database are included here. This set is restricted because we need to know it at each location in R.
S(x): row vector of complete coverage covariates related to location x.
These measures are assumed to be known at each location in the Mid-Atlantic region (MAR), including each location in R. Examples of covariates are biophysical features such as soils, elevation, and human population patterns available from the Landscape Atlas database and other satellite-based landscape indicators.
Each of these terms appears in a linear functional form with regression coefficient vectors (beta_1, beta_2, beta_3). The covariate vectors themselves may need to be transformed so that this linear form is appropriate.
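The covariate part of the model can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions: the covariates are random stand-ins rather than RF3 or Landscape Atlas variables, the within-unit term phi is set to zero, the between-unit effects eta are taken as independent, and the function names `simulate_measure` and `fit_fixed_effects` are hypothetical. It stacks X(x), C(x) and S(x) into one design matrix, adds one indicator column per hydrologic unit, and recovers (beta_1, beta_2, beta_3) by ordinary least squares.

```python
import numpy as np


def simulate_measure(n_sites=200, n_huc=5, seed=0):
    """Simulate Z(x) = X beta_1 + C beta_2 + S beta_3 + eta(HUC(x)) + epsilon(x).

    Assumption: phi(x; theta) is omitted and all covariates are synthetic.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_sites, 3))  # stand-in for latitude, longitude, elevation
    C = rng.normal(size=(n_sites, 2))  # stand-in for stream order, stream level
    S = rng.normal(size=(n_sites, 2))  # stand-in for landscape indicators
    beta = (np.array([0.5, -0.3, 1.0]),
            np.array([0.8, -0.6]),
            np.array([0.2, 0.4]))
    huc = rng.integers(0, n_huc, size=n_sites)   # hydrologic unit of each site
    eta = rng.normal(scale=0.5, size=n_huc)      # independent unit-level effects
    eps = rng.normal(scale=0.1, size=n_sites)    # unexplained variation
    Z = X @ beta[0] + C @ beta[1] + S @ beta[2] + eta[huc] + eps
    return X, C, S, huc, Z, beta, n_huc


def fit_fixed_effects(X, C, S, huc, Z, n_huc):
    """Least-squares estimate of (beta_1, beta_2, beta_3) with one dummy per unit."""
    dummies = np.eye(n_huc)[huc]                 # indicator columns for HUC(x)
    design = np.column_stack([X, C, S, dummies])
    coef, *_ = np.linalg.lstsq(design, Z, rcond=None)
    n_beta = X.shape[1] + C.shape[1] + S.shape[1]
    return coef[:n_beta]


X, C, S, huc, Z, beta, n_huc = simulate_measure()
beta_hat = fit_fixed_effects(X, C, S, huc, Z, n_huc)
```

Treating the unit effects as fixed dummies is a simplification chosen for brevity; the model described here treats them as a random field.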
The spatial variation terms represent the effects of unadjusted for or unobserved covariates as well as the effects of spatial proximity.
phi(x;theta): latent hydrologic effects within the hydrologic unit of location x.
It is assumed that {phi(x;theta): x in HUC_i} forms a spatial random field within each HUC_i, i = 1, ..., H, and the values are dependent between HUCs. The parameter theta defines the structure of the spatial variation. The model within each HUC will be of spatial covariance form based on a neighborhood system. Consider a neighborhood system for x based on being on the same stream segment (according to RF3). That is, two locations are neighbors if, and only if, they belong to the same stream segment. One would expect that, all else being equal, two locations on the same stream are more likely to have close values of the measure than two locations on separate streams. We expect that a number of neighborhood schemes will drive the spatial variation. For example: locations that belong to the same stream segment; locations that belong to the same stream segment at the same order; locations that belong to the same stream but at different orders of the stream; and locations that belong to different streams but have the same order and source. The above model can be generalized to this case.
eta(HUC(x);gamma): latent hydrologic effects between the hydrologic units.
Each location within the same hydrologic unit receives the same effect. It represents the overall level differences between the units. It is assumed that {eta(i;gamma): i = 1, ..., H} forms a spatial lattice random field. The simplest model has the values independent of each other. We use a neighborhood-based lattice model.
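To make the neighborhood idea concrete, the sketch below builds a same-segment neighbor matrix from segment labels and turns a unit adjacency matrix into a precision matrix for the lattice effects. The proposal does not name a specific lattice model, so the proper conditional autoregressive (CAR) form Q = tau (D - rho W) used here is an assumption, as are the function names and the parameters `rho` and `tau`. The form requires every unit to have at least one neighbor.

```python
import numpy as np


def same_segment_neighbors(segment_ids):
    """W[j, k] = 1 if locations j and k lie on the same stream segment, else 0."""
    seg = np.asarray(segment_ids)
    W = (seg[:, None] == seg[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)  # a location is not its own neighbor
    return W


def car_precision(W, rho=0.9, tau=1.0):
    """Precision matrix of a proper CAR model (an assumed choice of lattice model).

    Q = tau * (D - rho * W), where D is diagonal with the neighbor counts.
    Q is positive definite when |rho| < 1 and every unit has a neighbor.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)


# Hypothetical example: four hydrologic units arranged in a chain.
huc_adjacency = np.array([[0, 1, 0, 0],
                          [1, 0, 1, 0],
                          [0, 1, 0, 1],
                          [0, 0, 1, 0]])
Q = car_precision(huc_adjacency)

# Hypothetical segment labels for four locations within one unit.
W_segments = same_segment_neighbors(["a", "a", "b", "a"])
```

Setting rho = 0 gives a diagonal precision matrix, which corresponds to the independent-effects case mentioned above.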
For a discrete response we use the formulation of Diggle, Moyeed and Tawn (1998) for pi(x) = E[E(x)], the expected value of the exceedance indicator E(x), that is, the probability that Z(x) <= L.
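A small sketch may help fix the notation for the discrete case. It computes the indicator I[Z(x) <= L] and maps a linear predictor (covariate terms plus latent effects) to a probability pi(x). The logit link is an assumption made here for illustration; the proposal does not state which link function will be used, and the function names are hypothetical.

```python
import math


def exceedance_indicator(z, limit):
    """Return 1 if the measure z is at or below the limit L, else 0."""
    return 1 if z <= limit else 0


def exceedance_probability(linear_predictor):
    """Map a linear predictor to pi(x) with the inverse logit (assumed link).

    Written in a numerically stable form so large negative values do not overflow.
    """
    if linear_predictor >= 0:
        return 1.0 / (1.0 + math.exp(-linear_predictor))
    e = math.exp(linear_predictor)
    return e / (1.0 + e)


# Hypothetical IBI scores at three sites and a hypothetical limit of 40.
scores = [32.0, 40.0, 57.5]
flags = [exceedance_indicator(z, 40.0) for z in scores]
```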
Based on this model, we use likelihood-based inference for Z(x) to infer the parameters, and also to determine posterior predictive distributions for {Z(x): x in R}. Based on these, we aggregate up to determine the spatial cumulative distribution function in each hydrologic unit, that is, F_i(r), the integral over x in HUC_i of I[Z(x) <= r]. These distribution functions, and numerical summary measures derived from them, will be the basis of output from the model.
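The aggregation step can be sketched as follows, assuming predictive draws of Z(x) are already available on a finite set of prediction locations. Two simplifications are assumptions of this sketch rather than part of the proposal: the integral is approximated by an equally weighted average over the prediction locations in each unit (so F_i is reported as a proportion), and uncertainty is summarized by 5% and 95% quantiles across draws. The function name `spatial_cdf` is hypothetical.

```python
import numpy as np


def spatial_cdf(draws, huc, r_grid):
    """Spatial CDF per hydrologic unit from predictive draws.

    draws  : array (n_draws, n_sites) of simulated Z(x) values
    huc    : array (n_sites,) giving the hydrologic unit of each site
    r_grid : thresholds r at which to evaluate F_i(r)
    Returns {unit: (mean, lower, upper)}, each an array over r_grid.
    """
    draws = np.asarray(draws, dtype=float)
    huc = np.asarray(huc)
    r_grid = np.asarray(r_grid, dtype=float)
    result = {}
    for unit in np.unique(huc):
        z = draws[:, huc == unit]                       # (n_draws, n_sites_in_unit)
        below = z[:, :, None] <= r_grid[None, None, :]  # I[Z(x) <= r]
        F = below.mean(axis=1)                          # proportion of sites, per draw
        result[int(unit)] = (F.mean(axis=0),
                             np.quantile(F, 0.05, axis=0),
                             np.quantile(F, 0.95, axis=0))
    return result


# Tiny hypothetical example: two draws, four sites, two hydrologic units.
example = spatial_cdf(draws=[[1, 2, 3, 4], [2, 3, 4, 5]],
                      huc=[0, 0, 1, 1],
                      r_grid=[2.5])
```

If prediction locations are not evenly spread along the stream network, weights proportional to reach length would be a natural replacement for the equal weighting used here.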


Heimbuch, D., Seibel, J., Wilson, H., and Kazyak, P. 1998. A Multi-Year Lattice Sampling Design for Maryland-Wide Fish Abundance Estimation. Presented at the "Conference on Environmental Monitoring Surveys over Time," April 20-22, 1998, University of Washington.
Kepner, W. G., Jones, K. B., Chaloud, D. J., and Wickham, J. D. 1995. Mid-Atlantic Landscape Indicators Project Plan. EPA/620/R-95/003. Washington, D.C.: U.S. Environmental Protection Agency.
Larsen, D. P., and Christie, S. J., Eds. 1993. EMAP-Surface Waters 1991 Pilot Report. EPA/620/R-93/003. Corvallis, Oregon: U.S. Environmental Protection Agency.
U.S. Dept. of Agriculture, Natural Resources Conservation Service. 1994. State Soil Geographic (STATSGO) Data Base: Data Users Guide. The National Cartography and Geospatial Center, Fort Worth, TX.
Dewald, T. and Olsen, M. 1994. The EPA Reach File: A National Spatial Data Resource. U.S. Environmental Protection Agency, Office of Water.