

This section provides a brief overview of bioassessment theory and application. Much more detailed treatments are available elsewhere, and interested individuals should consult those web sites (e.g., EPA OWOW) and the relevant literature. To facilitate access to this large literature, we have provided a list of publications organized by the following topics:

General Concepts and Treatments

Multimetric Approaches

Effects of Sample Processing

RIVPACS-type Predictive Models

Diagnosing the Causes of Biological Impairments

Effects of Taxonomic Resolution and Rare Species

Other Predictive Models

Survey Design

Data Quality Control

Artificial Neural Networks

Sample Representativeness and Comparability

Sampling Protocols.

You can find the list of literature in our Bioassessment Literature page.

Developing and implementing a bioassessment or monitoring project requires that the specific objectives of the project be clearly defined. Once objectives are defined, it is possible to consider which sampling design, sample methods, and statistical analyses are most appropriate.

Questions of general interest to many practitioners include:

  1. What is the current status of biological conditions at specific sites and what is the trend in condition over time?
  2. What is the overall status of sites within a region and how is that condition changing over time?
  3. What are the causes of observed biological impairment?
  4. Are restoration programs improving the biological condition at a site or within a region?

We and others are actively addressing basic science questions to help practitioners answer these management questions. The questions we are specifically interested in include:

  1. How can we best predict (e.g., classification and modeling) the natural biological potential of specific sites to most effectively account for natural variability in ecological systems and thus allow for the most accurate and precise assessments possible?
  2. How do different biotic indicators (endpoints) compare in their accuracy and precision in detecting and quantifying known or assumed biological impairment?
  3. How does sampling design and sample processing affect the accuracy and precision of bioassessments?
  4. How can we most effectively estimate the tolerances of different taxa to specific stressors?
  5. How can we best summarize the tolerances of taxa observed at a site to infer the likely cause of biological impairment?
  6. How can we measure and control data quality in bioassessment?

To evaluate biological conditions at a site, we need to know what biological assemblage should be expected at the site if no human impact has occurred (see Hawkins et al. 2010 for review). However, unlike in toxicity tests, true controls are seldom available for natural ecosystems. Stream assemblages naturally change across space and over time. Undisturbed sites upstream or downstream of potentially impaired sites, even if available, have proven inadequate as controls in many cases because of the effect of natural variation along longitudinal gradients (Hughes et al. 1986) and the inability to avoid pseudo-replication of samples used as controls (i.e., sites within a common reach are not statistically independent of one another). A group of sites that are geographically independent from one another and whose biological assemblages are under minimal or least stress in a region can serve as controls or reference conditions against which we can judge the effect of human activities on a site (Hughes et al. 1986). However, because the biota found at reference sites vary with local and regional environmental conditions (e.g., natural vegetation, hydrology, biogeographical history, habitat characteristics, etc.), reference sites need to be classified so that assessed sites are compared with only those sites that best describe the biological potential of each assessed site. Many classification systems are possible (e.g., geographic region, ecoregion, stream size, thermal regime, hydrologic regime, dominant geology, etc.), but it is not yet clear which system is most effective in partitioning natural variation in biotic composition among reference sites and how accurately the expected biotic composition of a site can be predicted (Hawkins et al. 2000b).

There is also considerable confusion in the use of the term ‘reference condition’ in the literature. Stoddard et al. (2004) differentiate four types of reference conditions:

Historical Condition is the condition of an ecosystem at some point in its historical development prior to significant alteration by agriculture or industry.

Minimally-Disturbed Condition is the condition of an ecosystem in the absence of significant human disturbance.

Least-Disturbed Condition is the condition found in conjunction with the best available physical, chemical, and biological habitat conditions given the current state of the landscape. The best condition will vary from one region to another and over time.

Best Attainable Condition is that condition equivalent to the ecological condition that would obtain under best possible management practices and can fall anywhere between Minimally-Disturbed condition and Least-Disturbed condition.


More details regarding the definition of reference condition and the application of the concept can be found in Hughes et al. (1986), Reynoldson and Wright (2000), Bailey et al. (2004), Stoddard et al. (2004), and Hawkins et al. (2010).

Human disturbances influence the structure and function of ecosystems in a variety of ways, depending on the type, magnitude, and scale of disturbance and the environmental and biological characteristics of individual ecosystems. Responses of ecosystems to disturbances include changes in species composition, species relative abundance, species diversity, food-web structure, productivity, decomposition, and nutrient recycling (e.g., Rapport et al. 1985, Schindler 1987, Schindler 1990, Karr 1991, Cao et al. 1997). Of the numerous methods that have been proposed to summarize ecosystem responses, most focus on measuring aspects of assemblage structure. The methods on which these assessment tools are based range from characterization of simple biotic indexes to sophisticated multivariate analysis and statistical modeling (Rosenberg and Resh 1993, Wright et al. 2000).

The assessment endpoints produced by these methods (e.g., Index of Biotic Integrity values, O/E values derived from predictive models such as the River InVertebrate Prediction And Classification System (RIVPACS), or statistical tests associated with multivariate comparisons of assemblage structure based on the BEnthic Assessment of SedimenT (BEAST) model or Analysis of Similarities (ANOSIM)) are assumed to be correlated with overall ecosystem health. However, we do not know at this time if these methods lead to the same conclusions regarding the status of biological conditions. We need to understand how well each method measures the intended biological attribute and how comparable their assessment endpoints are (e.g., Cao and Hawkins 2005, Hawkins et al. 2010).

After evaluating the biological condition of an ecosystem, a logical next step is to determine the causes of ecological degradation for the purpose of targeting specific management and restoration activities. Stressor identification is a critical first step in conducting a meaningful Total Maximum Daily Load (TMDL) analysis (NRC 2001). Given the spatial and temporal scales at which stressors occur and the complexity of ecosystem response to stressors, laboratory toxicity tests and mesocosm experiments alone are of limited use for diagnosing the cause of ecosystem impairment, although these approaches can be effectively employed in combination with field surveys (e.g., Clements 2004). Several frameworks have been developed for stressor identification (Norton 2000, 2001, Suter et al. 2002, de Zwart et al. 2004).

These procedures typically start by establishing a list of possible causes of impairment based on correlative statistical analyses. The level or frequency of exposure of a potential stressor (e.g., heavy-metal concentration or temperature) at a site is then compared with the levels required to cause biological impairment in laboratory tests (Suter et al. 2002). These approaches depend heavily on weighting multiple lines of evidence, much of which may not be direct evidence. We are working on methods based on stressor-specific responses of different taxa to more directly identify the stressors that are likely affecting biotic condition (e.g., Yuan and Hawkins 2004).


The U.S. Clean Water Act (section 303(d)) requires state environmental agencies to identify which water bodies (e.g., a lake or a stream reach) have been impaired and to take action to restore their ecological integrity. Restoration programs also need to determine the effectiveness of management practices designed to restore ecological integrity. Two conditions are critical for site-scale bioassessment: (1) establishing appropriate reference conditions and (2) adequately characterizing the biological indicator of interest.

The two most commonly used approaches for establishing reference condition are Before-After, Control-Impact (BACI) and Impact versus Reference Sites (IVRS) designs (Stewart-Oaten and Bence 2001). BACI designs estimate error variance by measuring variation in indicator values over time, whereas error in IVRS designs is based on variation among reference sites. The IVRS approach is equivalent to the Reference Condition Approach discussed earlier and is more commonly used in assemblage-based bioassessment because before-impact data are often not available.

A meaningful assessment is not possible without adequately characterizing both the observed and expected biotic assemblage. Adequate characterization is important regardless of whether assessments are based on samples taken from targeted habitat types (e.g., riffles) or the range of habitats that occur in a reach (e.g., Ostermiller and Hawkins 2004). We define adequacy in terms of both the precision and accuracy required by a particular project. Most biological monitoring programs in the US collect multiple samples from a reach, pool them together, and then randomly pick a given number of individuals in the laboratory from the composite sample (i.e., fixed-count subsampling). The length of reach that is sampled also varies among different programs. Barbour et al. (1999) recommend 100 m long reaches, whereas the USEPA’s EMAP sampling protocols target reaches that are 40 times as long as wet-channel widths (Lazorchak et al. 2001). Much debate has occurred regarding how many individuals should be subsampled (e.g., Barbour and Gerritsen 1996, Vinson and Hawkins 1996, Courtemanch 1996, Walsh 1997, Growns et al. 1997, Cao et al. 1998, 2002, Somers et al. 1998, Doberstein et al. 1999, King and Richardson 2002, Ostermiller and Hawkins 2004). Most of these studies evaluated how well different fixed counts detected differences in taxa richness, biotic indices, or community similarity between a set of reference sites and a set of assumed impaired sites. Cao et al. (2002, 2003) suggest that standardizing samples based on the percent of the taxa captured at a site is a better way of ensuring adequate and consistent characterization of an assemblage than fixed-count subsampling. This topic remains an active area of research.
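Fixed-count subsampling, as described above, can be sketched in a few lines of Python. This is only an illustration, not any agency's laboratory protocol: the function name `fixed_count_subsample`, the taxon counts, and the choice of 100, 300, and 500 individuals are all assumptions made for the example. It shows how the number of taxa detected can depend on the count chosen, because rare taxa are easily missed in small subsamples.

```python
import random
from collections import Counter

def fixed_count_subsample(composite, count, rng):
    """Draw `count` individuals without replacement from a composite sample.

    `composite` maps taxon name -> number of individuals in the pooled sample.
    """
    # Expand the composite into one list entry per individual.
    individuals = [taxon for taxon, n in composite.items() for _ in range(n)]
    # If the sample holds fewer individuals than requested, keep them all.
    k = min(count, len(individuals))
    return Counter(rng.sample(individuals, k))

# Hypothetical composite sample: two abundant taxa and several rare ones.
composite = {"Baetis": 400, "Chironomidae": 350, "Hydropsyche": 30,
             "Epeorus": 5, "Pteronarcys": 2, "Rhyacophila": 1}

rng = random.Random(1)
for count in (100, 300, 500):
    sub = fixed_count_subsample(composite, count, rng)
    print(count, "individuals ->", len(sub), "taxa detected")
```

Repeating the draw many times for each count would give the distribution of richness estimates rather than a single value.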

Assuming that we can establish meaningful reference expectations and can adequately characterize both the observed and expected biota, site-specific assessments generally involve answering two somewhat different questions.

Question 1 - Are conditions at a site outside the expected range of natural variability, i.e., is the site impaired? We can answer this question with known confidence with a single sample taken at a site by determining if the indicator value estimated from this sample falls within or outside the distribution of values expected under reference conditions. This is a simple statistical test requiring only that we agree on what probability threshold we will use to infer if a new observation is different than expected. For example, we might choose to define any new observation that is less than the 5th or greater than the 95th percentile of reference values as different from reference. Note that in this case, we would mistakenly conclude that a reference-quality site was impaired 10% of the time, something called a type I error in statistical jargon and, at face value, something we would want to minimize. However, we also know that type II statistical errors occur, in which we conclude that a site is in reference condition when it is actually impaired. The probabilities of committing these two types of errors are inversely related to one another, and if we set type I error rates too low, significant impairment can occur before we detect it. We therefore need to balance these errors in such a way that we ensure that continual resource degradation does not occur while also being fair to the regulated community.
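The percentile test described in Question 1 can be written out as a short Python sketch. The reference O/E values below are invented for illustration, and the function names are hypothetical; the 5th and 95th percentile thresholds follow the example in the text, and how percentiles are interpolated (here the `inclusive` method of the standard library) is an assumption that a real program would need to settle.

```python
import statistics

def percentile_bounds(reference_values, lower_pct=5, upper_pct=95):
    """Return the lower and upper percentile cut-offs of the reference values."""
    # With n=100, quantiles() returns the 1st through 99th percentiles.
    cuts = statistics.quantiles(reference_values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

def differs_from_reference(value, reference_values, lower_pct=5, upper_pct=95):
    """True if `value` falls outside the chosen reference percentiles."""
    low, high = percentile_bounds(reference_values, lower_pct, upper_pct)
    return value < low or value > high

# Hypothetical indicator values (e.g., O/E) observed at 20 reference sites.
reference = [0.82, 0.88, 0.91, 0.93, 0.95, 0.97, 0.98, 0.99, 1.00, 1.01,
             1.02, 1.03, 1.05, 1.06, 1.08, 1.10, 1.12, 1.15, 1.18, 1.22]

for site_value in (0.60, 1.00):
    print(site_value, differs_from_reference(site_value, reference))
```

Widening the band (say, 1st to 99th percentile) lowers the type I error rate at the cost of a higher type II error rate, which is the trade-off the paragraph describes.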

Question 2 - What is the magnitude of impairment, i.e., how far from reference is a site’s current condition? Managers often classify the magnitude of biological impairment into categories based on the value of the index used (Karr 1991, Furse 2000). Although an estimate based on a single sample will provide insight regarding the degree of impairment at a site, we cannot describe how confident we are in that estimate and hence the ‘class’ of impairment the observation belongs in. If we need to know the confidence with which we have estimated biotic condition, we can use replicate samples to both generate a more robust estimate of the actual condition at a site as well as an estimate of the confidence associated with the estimate.
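One way to attach confidence to an estimate built from replicate samples is a bootstrap interval around the mean, sketched below. This is an assumption made for illustration rather than a method named in the text: the replicate values are invented, `bootstrap_mean_interval` is a hypothetical helper, and with only a handful of replicates a bootstrap interval is rough at best.

```python
import random
import statistics

def bootstrap_mean_interval(replicates, n_boot=2000, alpha=0.10, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap of the mean."""
    rng = random.Random(seed)
    # Resample the replicates with replacement and record each resample mean.
    means = sorted(
        statistics.fmean(rng.choices(replicates, k=len(replicates)))
        for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(replicates), lower, upper

# Hypothetical index values from five replicate samples at one site.
replicates = [0.71, 0.64, 0.78, 0.69, 0.74]
mean, lower, upper = bootstrap_mean_interval(replicates)
print(f"mean={mean:.2f}, 90% interval=({lower:.2f}, {upper:.2f})")
```

If the whole interval falls inside one impairment class, the classification is better supported than when the interval straddles a class boundary.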

These questions cannot be answered in the same way with all bioassessment approaches. The tests described above apply to both IBI-type indices and RIVPACS-type assessment values (O/E), which summarize biological condition into one value. However, assessments that simultaneously compare entire observed and expected assemblages with multivariate statistical models, such as the BEAST approach developed by Reynoldson et al. (1995), base inferences of impairment on whether a new observation falls outside pre-defined confidence ellipses (90%, 95%, etc.) that characterize the variability among appropriate reference sites in multivariate ordination space.

Research Needs - One of the critical research areas that the Center is pursuing is the extent to which a site’s assessed condition varies with either sampling effort (e.g., Ostermiller and Hawkins 2004) or the bioassessment method used (e.g., IBI versus RIVPACS or different RIVPACS models), particularly when the impairment is subtle or intermediate (Cao and Hawkins 2005).

Regional Assessments


Assessment of biological conditions at regional and national levels is critical for informing the public of the condition of the Nation’s aquatic resources and providing objective data with which to inform environmental policy (Heinz Center 2002). We need to estimate both average conditions and how biological conditions vary in time and space within a region. An effective monitoring design is critical in achieving these goals. Two types of survey designs have been commonly used for sampling within a region: empirical and statistical designs (Stevens 1994). Empirical designs target sites of different types in proportion to their occurrences in the environment. In a statistical design, every element in a ‘population’, such as all streams or lakes in a region, has some chance to be sampled, and site selection is carried out by a probability-based design (Stevens 1994, Hughes et al. 2000).

Number of sampling sites - Neither of these designs informs us of how many sites should be sampled (Cao et al. 2003). Just as it is critical that we adequately characterize the biota at a site, it is important that regional sampling designs adequately characterize the distribution of resource conditions within a region. One way of determining how many sites are needed in a regional survey is to calculate the average compositional similarity among pooled randomly selected sites in which the number of sites pooled in comparisons increases until all sites have been pooled into two large samples (Cao et al. 2003). This ‘autosimilarity’ value will increase with the number of sites that are pooled, and the number of sites needed to reach an agreed upon autosimilarity value can be extracted from this relationship. In general, the autosimilarity value is interpreted as the proportion of taxa collected across all samples that are encountered when N sites are sampled. Although the number of sites needed to ensure collection of all taxa will exceed the resources available for such surveys, an autosimilarity analysis based on pilot data can identify the number of sites needed to achieve a consistent characterization across all regions of interest. Not surprisingly, more biologically heterogeneous regions require more sites than less heterogeneous regions.
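The autosimilarity calculation can be sketched as follows. This is a simplified reading of the procedure, not the published implementation: it assumes presence/absence data, uses the Jaccard coefficient as the similarity measure (the text does not specify one), and runs on randomly generated pilot data. The names `autosimilarity_curve` and `sites_needed` are hypothetical.

```python
import random

def jaccard(a, b):
    """Jaccard similarity between two sets of taxa."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def autosimilarity_curve(site_taxa, n_draws=200, seed=0):
    """Return {n: mean similarity between two pooled groups of n random sites}."""
    rng = random.Random(seed)
    max_n = len(site_taxa) // 2
    curve = {}
    for n in range(1, max_n + 1):
        total = 0.0
        for _ in range(n_draws):
            # Pick 2n distinct sites and pool the taxa of each half.
            picked = rng.sample(site_taxa, 2 * n)
            pool_a = set().union(*picked[:n])
            pool_b = set().union(*picked[n:])
            total += jaccard(pool_a, pool_b)
        curve[n] = total / n_draws
    return curve

def sites_needed(curve, target):
    """Smallest n whose autosimilarity reaches `target`, or None if never reached."""
    for n in sorted(curve):
        if curve[n] >= target:
            return n
    return None

# Hypothetical pilot data: 12 sites, each with 12 taxa drawn at random
# from a regional pool of 30 taxa (taxa are represented by integers).
data_rng = random.Random(42)
sites = [set(data_rng.sample(range(30), 12)) for _ in range(12)]

curve = autosimilarity_curve(sites)
for n, value in curve.items():
    print(n, round(value, 2))
print("sites needed for 0.8:", sites_needed(curve, 0.8))
```

A more heterogeneous region (a larger regional pool relative to per-site richness) would produce a curve that rises more slowly, and hence a larger number of sites for the same target.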

Combining data sets – Because regional assessments are expensive, water resource managers may be interested in combining data sets collected by different agencies within a region. Combining data sets can be problematic because data may not be comparable, i.e., samples may have been collected with different methods or targeted different habitats, etc. At the national level, many state and several federal agencies (NPS, USGS, USDA, EPA) conduct independent monitoring and assessment programs. These programs differ considerably in sampling design, sample collection and processing, and data analysis (Carter and Resh 2001, Houston et al. 2002). The comparability of these data is largely unknown. This problem has prevented us from estimating the overall biological condition of freshwater bodies in individual states and the United States as a whole (GAO 2002). There is an urgent need to standardize bioassessment methods at national levels as much as possible, and determine what data sets are comparable enough that they can be combined. The Center is collaborating with others to determine ways of quantifying the comparability of existing data sets and is working with both state and federal agencies to communicate the value of sampling standardization (Cao et al. 2004).

Regional characterization of the status of individual taxa – A complementary way to assess the impact of human disturbances on ecosystems is to compare observed frequencies of detection for individual taxa with their expected frequencies of detection under reference conditions. For a given number of randomly chosen sites (N), if a taxon is recorded at n sites, n/N estimates its frequency of detection given the sampling effort used at a site. Only a few taxa tend to occur in many samples (common taxa), and the vast majority of taxa occur only in a small number of samples (rare taxa). To determine whether and how severely human disturbances have altered the frequency of detection of a taxon, we have to estimate its expected frequency of detection under reference conditions (Hawkins and Yuan 2004). The departure of the observed frequency of detection estimated from a set of randomly selected sites from the expected frequency measures the effect of human disturbances on that taxon. These estimates can be aggregated by taxonomic group to identify types of taxa that are especially sensitive to human-caused alterations in environmental conditions. We are assessing the utility of RIVPACS-type models in generating robust estimates of expected frequencies of detection.
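The comparison of observed and expected frequencies of detection reduces to simple arithmetic, sketched here in Python. The site lists and expected frequencies are invented, and `frequency_departures` is a hypothetical helper; in practice the expected frequencies would come from a model of reference conditions rather than being typed in.

```python
def frequency_departures(observed_sites, expected_freq):
    """Return {taxon: observed n/N minus expected frequency of detection}."""
    n_sites = len(observed_sites)
    departures = {}
    for taxon, expected in expected_freq.items():
        # n = number of randomly chosen sites at which the taxon was recorded.
        n = sum(taxon in taxa for taxa in observed_sites)
        departures[taxon] = n / n_sites - expected
    return departures

# Hypothetical survey of N = 4 randomly selected sites.
observed_sites = [
    {"Baetis", "Chironomidae"},
    {"Chironomidae"},
    {"Baetis", "Chironomidae", "Epeorus"},
    {"Chironomidae"},
]
# Hypothetical expected frequencies of detection under reference conditions.
expected_freq = {"Baetis": 0.75, "Chironomidae": 1.0, "Epeorus": 0.75}

for taxon, delta in frequency_departures(observed_sites, expected_freq).items():
    print(taxon, delta)
```

Negative departures flag taxa detected less often than expected; averaging departures within a taxonomic group gives the aggregated view mentioned above.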


Many methods have been devised for measuring aquatic biological conditions (reviews by Metcalfe 1989, Johnson et al. 1993). The Saprobic System was developed in the early twentieth century (Kolkwitz and Marsson 1909), and diversity indices were widely used in the 1950s and 1960s because of their simplicity and mathematical elegance. However, in the past few decades, diversity indices have been increasingly criticized for failing to detect and quantify human disturbances (Washington 1984, Cao et al. 1997), and now they are rarely used alone. Several biotic indices were developed during the same period, including the Trent Biotic Index, the Chandler Score System, and the Biological Monitoring Working Party (BMWP) scoring system. These indices are basically an extension or simplification of the Saprobic System and are used in many European countries. In North America, two biotic indices, Hilsenhoff’s Biotic Index (Hilsenhoff 1987) and the North Carolina Biotic Index (Lenat 1993), are in use. The Hilsenhoff Biotic Index is most commonly used as one of several metrics in an IBI, whereas the North Carolina Index is often used alone.

In the early 1980s, two new bioassessment techniques were developed in the United Kingdom and the United States, respectively. The method developed in the United Kingdom (River Invertebrate Prediction and Classification System or RIVPACS) assesses site condition based on a comparison of observed and predicted taxa. The method developed in the United States (the Index of Biotic Integrity or IBI) assesses condition based on a comparison of observed values of an index that is the sum of several indices with values expected under reference conditions. Both methods require that reference condition be estimated from samples collected at an appropriate set of reference sites.

Ecologists and statisticians at the Institute of Ecology, UK developed the RIVPACS approach (e.g., Moss et al. 1987, Wright 1995, 2000). This approach starts by classifying reference sites based on similarity in their taxonomic composition. The class membership of an assessed site, and hence information regarding that site’s expected taxa, is predicted with discriminant function models (DFMs) based on a set of time-invariant environmental features (e.g., catchment size, elevation). By weighting frequencies of detection of each taxon within reference classes by DFM-derived probabilities of class membership, RIVPACS predicts the probabilities of detecting specific taxa at individual sites under reference conditions. Assessments are based on a comparison of the number of predicted taxa that are observed at a site (O) with the number of taxa that were expected (E). This approach has been adopted or explored in several other countries including Australia (e.g., Pearson and Norris 1996, Turak et al. 1999, Marchant et al. 2002, Smith et al. 2001), Canada (Reynoldson et al. 1995, 2001), New Zealand (Joy and Death 2000, 2002, 2003), Sweden (Johnson 2003), and the United States (Hawkins et al. 2000, Ostermiller and Hawkins 2004).
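The O/E arithmetic described above can be sketched in Python. This covers only the final weighting and counting step: the class-membership probabilities would normally come from a fitted discriminant function model, whereas here they are simply supplied as hypothetical numbers, as are the within-class frequencies of detection and the taxon names. The function names and the optional probability-of-capture `threshold` argument are assumptions for the example.

```python
def capture_probabilities(class_probs, class_freqs):
    """Weight each taxon's within-class frequency by class-membership probability."""
    taxa = {t for freqs in class_freqs.values() for t in freqs}
    return {
        t: sum(p * class_freqs[c].get(t, 0.0) for c, p in class_probs.items())
        for t in taxa
    }

def o_over_e(observed_taxa, class_probs, class_freqs, threshold=0.0):
    """O/E using only taxa whose probability of capture exceeds `threshold`."""
    pc = capture_probabilities(class_probs, class_freqs)
    predicted = {t: p for t, p in pc.items() if p > threshold}
    expected = sum(predicted.values())  # E: sum of capture probabilities
    observed = sum(1 for t in predicted if t in observed_taxa)  # O
    return observed / expected

# Hypothetical probabilities that the assessed site belongs to each reference class.
class_probs = {"A": 0.75, "B": 0.25}
# Hypothetical frequencies of detection of each taxon within each reference class.
class_freqs = {
    "A": {"Baetis": 1.0, "Epeorus": 0.75, "Drunella": 0.5},
    "B": {"Baetis": 1.0, "Drunella": 0.25, "Tubifex": 0.5},
}
observed = {"Baetis", "Tubifex"}

print(o_over_e(observed, class_probs, class_freqs))                 # all taxa
print(o_over_e(observed, class_probs, class_freqs, threshold=0.5))  # common taxa only
```

Values near 1 indicate that the taxa expected under reference conditions were found; values well below 1 indicate missing taxa.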

During the same period, the IBI (a multimetric index) was developed in the United States (Karr 1981, 1991). The method was first developed for fish but has been extended to stream macroinvertebrates (Plafkin et al. 1989, Barbour et al. 1999), periphyton (Bahls 1993, Barbour et al. 1999, Hill et al. 2000, Hill et al. 2003), and birds (Bryce et al. 2002). This approach starts by calculating values of multiple biotic metrics (e.g., total richness; combined mayfly, stonefly, and caddisfly richness; proportion of tolerant individuals), and rescales these values against the values obtained from reference sites (Barbour et al. 1999). These rescaled metrics are then summed to estimate the IBI, a measure of overall biological integrity. The value estimated at an assessed site is then compared to the distribution of values observed at reference sites. Ecoregions (Omernik 1987) are often used to classify reference sites into biologically similar groups and thus make predictions about IBI values at assessed sites within an ecoregion. An ecoregion is a geographic area within which stream assemblages are assumed to be relatively homogeneous. Ecoregions are delimited based on land cover, land surface form, potential natural vegetation, and soil type. In the United States, level III and IV ecoregions are most commonly used in bioassessment. The multimetric approach has been widely implemented in the United States and several other countries.
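The rescale-and-sum logic of a multimetric index can be sketched as follows. Scoring schemes differ among programs, so this is one possible scheme chosen for illustration: each metric is scaled continuously to 0-1 between a floor and a ceiling, metrics that increase with stress are reversed, and the scores are summed. The floors and ceilings stand in for values that would be derived from reference-site data; they, the metric names, and the function names are all hypothetical.

```python
def score_metric(value, floor, ceiling, decreases_with_stress=True):
    """Rescale a raw metric to 0-1; reverse it if the metric rises with stress."""
    score = (value - floor) / (ceiling - floor)
    score = min(1.0, max(0.0, score))  # clamp to the 0-1 range
    return score if decreases_with_stress else 1.0 - score

def multimetric_index(site_metrics, scoring):
    """Sum the rescaled metric scores for one site."""
    return sum(
        score_metric(site_metrics[name], *scoring[name]) for name in scoring
    )

# Hypothetical scoring table: metric -> (floor, ceiling, decreases_with_stress).
scoring = {
    "total_richness": (10, 50, True),
    "ept_richness": (0, 20, True),
    "prop_tolerant": (0.0, 1.0, False),
}
# Hypothetical raw metric values at an assessed site.
site = {"total_richness": 30, "ept_richness": 20, "prop_tolerant": 0.25}

print(multimetric_index(site, scoring))
```

The resulting index value would then be compared with the distribution of index values at reference sites in the same class or ecoregion.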

Several other approaches have also been developed. These methods include comparisons of observed assemblages with those predicted with neural networks (Spitz et al. 1999, Brosse et al. 2001, Hoang et al. 2001), E-Ball (Linke et al. 2004), and direct multiple regression (Chessman 1999). These methods have the potential to supplement, support, or replace existing methods, but they have yet to be rigorously and thoroughly evaluated.


Multivariate versus multimetric approaches. With many methods available for bioassessment, it is no surprise that disagreements have arisen among ecologists regarding which methods should be used. In particular, the relative merits of multivariate and multimetric approaches have been debated actively in the past few years (Gerritsen 1996, Fore and Karr 1996, Reynoldson et al. 1997, 2001, Karr and Chu 1999, Karr and Chu 2000, Norris and Hawkins 2000). Much of the debate concerns conceptual and technical frameworks and practicability (Gerritsen 1995, Karr and Chu 2000, Norris and Hawkins 2000). However, empirical evidence for or against either approach is limited and inconclusive because the real biological impairment at a site is almost always unknown. Rigorous scrutiny of ideas and methods is a critically important aspect of the scientific method, something that ultimately assures that we achieve the best understanding of ecological phenomena possible and the most effective tools to address applied questions. The Center is committed to evaluating the performance of both existing and developing assessment methods. We are especially interested in the use of simulation models to create known biological impairment against which we can examine how accurately and precisely different methods detect and quantify true impairment. Cao and Hawkins (2005) describe a simple model to simulate biological impairment and a procedure to evaluate how well different methods quantify known taxa loss and assemblage-composition changes, and Hawkins et al. (2010) applied this model to evaluate the performance of several MMIs and O/E indices.

Inclusion or exclusion of rare taxa. The bioassessment community has also vigorously debated the role of rare taxa in detecting and quantifying ecological impairment (e.g., Cao et al. 1998, 1999, 2001, Marchant 1999, 2002, Karr and Chu 1999, Lenat and Resh 2001, King and Richardson 2002, Turak and Koop 2003, Ostermiller and Hawkins 2004). This debate involves several questions. First, is it logical to remove taxa of a given rarity at all sites? Minimally-disturbed sites often support more rare taxa than disturbed sites. Intentionally removing rare taxa from analyses or effectively removing them by using a small sampling effort may therefore affect the characterization of richness and assemblage structure more strongly at minimally or least-disturbed sites than impaired sites, therefore introducing bias into assessments (Cao et al. 1998, 2002). Second, are rare taxa generally more sensitive to disturbance than common ones? Cao et al. (1999) suggested such a rarity-sensitivity relationship based on general observations that (1) rare taxa are typically associated with restricted distributions and abundant ones with wide geographic ranges or environmental conditions and (2) extinction risk increases with decreasing population size. However, empirical evidence supporting this relationship is scarce, particularly for freshwater ecosystems, and many rare taxa have been shown to increase in response to disturbance (Hawkins et al. 2000a). It is likely that the sensitivity of specific taxa differs with the type and magnitude of disturbances (Cao et al. 2001), and this question therefore needs much more work. Third, does removal of rare taxa really change the assessment of biological conditions at a site? 
Marchant (2002) examined the ability of O/E values derived from several different probability of capture thresholds to detect the effect of dams and concluded that rare taxa played no useful role in predictive models, because O/E based on probabilities of capture > 0.5 (i.e., common taxa) adequately detected effects. In contrast, Turak and Koop (2003) demonstrated that the inclusion of rare taxa improved the capability of AusRivas to detect human impacts. Nijboer and Schmidt-Kloiber (2004) observed that the exclusion of rare taxa led to an underestimation of human disturbances. Ostermiller and Hawkins (2004) offered a new understanding of the rare taxa issue. They found that the standard deviation (SD) of O/E values at reference sites markedly decreased when fixed counts increased from 100 to 300. Because large samples basically mean more rare taxa are included in a sample, this observation implied that inclusion of rare taxa aided in detecting biological impairment. On the other hand, they showed that SD of O/E was lower at P > 0.5 than at P > 0.0, which suggested that rare taxa also added ‘noise’ in predicting taxa occurrences. They concluded that it was taxa detectability, rather than rarity per se, that was important in affecting the performance of RIVPACS-type models. Since then, Van Sickle et al. (2007) showed that O/E indices based on P > 0.5 tended to perform better (higher precision, greater sensitivity in detecting disturbance) than O/E indices based on all taxa (P > 0) for a variety of different models. More empirical studies are clearly needed to understand the role of rare taxa in bioassessments with particular attention paid to the interactions between observed rarity, sampling effort, and the type and magnitude of disturbances.

Low versus high taxonomic resolution. Rapid bioassessment aims to evaluate the biological conditions at a large number of sites quickly and economically. Because it takes increasing time and money to identify organisms to increasingly high levels of taxonomic resolution, many studies have examined whether family-level identifications are sufficient to detect impairment. These studies can be classified into three types. The first type of study focused on the classification of assemblages. Several studies report that family-level and species- or genus-level data yield similar site clusters or ordination patterns (e.g., Marchant et al. 1995, Bowman and Bailey 1997, Bailey et al. 2001, Hewlett 2000). These observations agree well with many studies conducted in the 1960-70s that show that variation in assemblage composition along strong, large-scale environmental gradients can be detected with coarse taxonomy, subsets of the assemblage, or abundant taxa only. However, the classification of assemblages is just one step in assessing the condition of a site and should not be equated with the actual assessment of a site (Cao et al. 2001). A second type of study examined how taxonomic resolution affected the relationships between assemblage structure and stress measures (e.g., heavy metals, pH, nutrients, land use) or environmental variables (e.g., stream width, substrate, and slope). For example, King and Richardson (2002) demonstrated that species-genus level data yielded stronger correlations between similarity matrices derived from biological and environmental data. Hill et al. (2001) and Waite et al. (2004) reached similar conclusions. A tight assemblage-environment relationship is required for precisely predicting the assemblage composition expected at a site and for estimating the environmental optima and tolerance ranges of taxa to specific environmental conditions, e.g., pH and nutrients (e.g., Pan et al. 1996). 
Hawkins and Norris (2000) showed that assemblage groups derived from species-genus data could be more accurately predicted from environmental variables than those derived from family-level data. The last type of study investigated how taxonomic resolution affects bioassessment per se. Hawkins et al. (2000a) showed that their RIVPACS-type model based on species-genus level data was more sensitive in detecting the effect of logging than one based on family-level data. Similarly, King and Richardson (2002) found that non-metric multidimensional scaling based on species-genus level data more accurately detected human disturbances than that based on family-level data. However, Reynoldson et al. (2001) observed that their predictive model based on family-level data performed better than one based on finer taxonomic resolution. Lenat and Resh (2001) discuss the importance of fine taxonomic resolution for general ecology, biodiversity conservation, and bioassessment.

Because the biology and environmental requirements of different species in the same family can differ greatly, there is no doubt that species- or genus-level data will usually provide more information about the assemblage of interest and its environment than family-level data. Whether this extra information is important depends on the objective of a particular study. For example, family-level data may be sufficient to classify assemblages in a large region (e.g., Hewlett 2000). Coarse identification might also be sufficient to detect severe impairment, but finer identification is probably required to detect slight to intermediate impairment or to differentiate between levels of impairment (Waite et al. 2004). Assuming that coarse identification is generally sufficient for bioassessment could therefore lead to a failure to detect and quantify actual biological impairments. For that reason, we generally recommend that taxa be identified to the lowest taxonomic level possible.
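The loss of information described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the site counts are invented, the function names `collapse_to_family` and `bray_curtis` are hypothetical, and Bray-Curtis dissimilarity is used only as one common choice of assemblage dissimilarity. Two sites that share no genera, but whose genera belong to the same families, appear completely different at genus level and identical at family level.

```python
from collections import defaultdict

# Small genus -> family lookup used only for this illustration.
GENUS_TO_FAMILY = {
    "Baetis": "Baetidae",
    "Acentrella": "Baetidae",
    "Hydropsyche": "Hydropsychidae",
    "Cheumatopsyche": "Hydropsychidae",
}


def collapse_to_family(genus_counts, lookup=GENUS_TO_FAMILY):
    """Sum genus-level counts into family-level counts."""
    family_counts = defaultdict(int)
    for genus, count in genus_counts.items():
        family_counts[lookup[genus]] += count
    return dict(family_counts)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: count} samples."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Invented counts for two hypothetical sites (assumption, not real data).
site_a = {"Baetis": 30, "Hydropsyche": 20}
site_b = {"Acentrella": 30, "Cheumatopsyche": 20}

genus_bc = bray_curtis(site_a, site_b)
family_bc = bray_curtis(collapse_to_family(site_a), collapse_to_family(site_b))

print(f"genus-level dissimilarity:  {genus_bc:.2f}")
print(f"family-level dissimilarity: {family_bc:.2f}")
```

In this contrived case the genus-level dissimilarity is 1 and the family-level dissimilarity is 0. Real data sets will fall between these extremes, which is why the adequacy of family-level data depends on the study objective.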

Bahls, L.L. 1993. Periphyton bioassessment methods for Montana streams. Montana Water Quality Bureau, Department of Health and Environmental Science, Helena, Montana.

Bailey, R.C., R.H. Norris, and T.B. Reynoldson. 2001. Taxonomic resolution of benthic macroinvertebrate communities in bioassessments. Journal of the North American Benthological Society 20:280-286.

Bailey, R.C., R.H. Norris, and T.B. Reynoldson. 2004. Bioassessment of Freshwater Ecosystems using the Reference Condition Approach. Kluwer Academic Publishers, New York.

Barbour, M.T., and J. Gerritsen. 1996. Sub-sampling of benthic samples: a defense of the fixed-count method. Journal of the North American Benthological Society 15:386-391.

Barbour, M.T., J. Gerritsen, B.D. Snyder, and J.B. Stribling. 1999. Rapid bioassessment protocols for use in wadeable streams and rivers: Periphyton, benthic macroinvertebrates, and fish. EPA 841-B-99-002. US-EPA, Washington, D.C.

Bowman, M.F., and R.C. Bailey. 1997. Does taxonomic resolution affect the multivariate description of the structure of freshwater benthic macroinvertebrate communities? Canadian Journal of Fisheries and Aquatic Sciences 54:1802-1807.

Brosse, S., S. Lek, and C.R. Townsend. 2001. Abundance, diversity, and structure of freshwater invertebrates and fish communities: an artificial neural network approach. New Zealand Journal of Marine and Freshwater Research 35:135-145.

Bryce, S.A., R.M. Hughes, and P.R. Kaufmann. 2002. Development of a bird integrity index: using bird assemblages as indicators of riparian condition. Environmental Management 30:294-310.

Cao, Y., and C.P. Hawkins. 2005. Simulating biological impairment to evaluate the accuracy of ecological indicators. Journal of Applied Ecology 42:954-965.

Cao, Y., C.P. Hawkins, and M.R. Vinson. 2003. Measuring and controlling data quality in biological assemblage surveys with special reference to stream benthic macroinvertebrates. Freshwater Biology 48:1898-1911.

Cao, Y., D.P. Larsen, and R. St-J. Thorne. 2001. Rare species in multivariate analysis for bioassessment: some considerations. Journal of the North American Benthological Society 20:144-153.

Cao, Y. and D.D. Williams. 1999. Rare species are important in bioassessment (Reply to the comment by Marchant). Limnology and Oceanography 44:1841-1842.

Cao, Y., W.P. Williams, and A.W. Bark. 1997. The change of macroinvertebrate communities along a pollution gradient: a framework for developing biotic indices. Water Research 31:805-813.

Cao, Y., D.D. Williams, and D.P. Larsen. 2002. Comparison of biological communities: the problem of sample representativeness. Ecological Monographs 72:41-56.

Cao, Y., D.D. Williams, and N.E. Williams. 1998. How important are rare species in community ecology and bioassessment? Limnology and Oceanography 43:1043-1049.

Carter, J.L., and V.H. Resh. 2001. After site selection and before data analysis: sampling, sorting, and laboratory procedures used in stream benthic macroinvertebrate monitoring programs by USA state agencies. Journal of the North American Benthological Society 20:658-682.

Chessman, B.C. 1999. Predicting the macroinvertebrate faunas of rivers by multiple regression of biological and environmental differences. Freshwater Biology 41:747-757.

Clements, W.H. 2004. Small-scale experiments support causal relationships between metal contamination and macroinvertebrate community responses. Ecological Applications 14:954-967.

Courtemanch, D.L. 1996. Commentary on the sub-sampling procedures used for rapid bioassessments. Journal of the North American Benthological Society 15:381-385.

de Zwart, D., S. D. Dyer, L. Posthuma, and C. P. Hawkins. 2006. Use of predictive models to attribute potential effects of mixture toxicity and habitat alteration on the biological condition of fish assemblages. Ecological Applications 16:1295-1310.

Doberstein, C.P., J.R. Karr, and L.L. Conquest. 1999. The effect of fixed-count subsampling on macroinvertebrate biological monitoring in small streams. Freshwater Biology 44:355-366.

Fore, L.S., and J.R. Karr. 1996. Assessing invertebrate responses to human activities: evaluating alternative approaches. Journal of the North American Benthological Society 15:212-231.

Furse, M.T. 2000. The application of RIVPACS procedures in headwater streams – an extensive and important national resource. Pages 72-79 in J.F. Wright, D.W. Sutcliffe, and M.T. Furse, editors. Assessing the biological quality of fresh waters: RIVPACS and other techniques. Freshwater Biological Association, The Ferry House, Far Sawrey, Cumbria, England.

GAO. 2002. Water quality: inconsistent state approaches complicate nation’s efforts to identify its most polluted waters. GAO-02-186. United States General Accounting Office, 441 G. Street NW, Washington, DC 20548.

Gerritsen, J. 1995. Additive biological indices for resource management. Journal of the North American Benthological Society 14:451-457.

Growns, J.E., B.C. Chessman, J.E. Jackson, and D.G. Ross. 1997. Rapid assessment of Australian rivers using macroinvertebrates: cost and efficiency of 6 methods of sample processing. Journal of the North American Benthological Society 16:682-693.

Hawkins, C. P. 2006. Quantifying biological integrity by taxonomic completeness: its utility in regional and global assessments. Ecological Applications 16:1277-1294.

Hawkins, C. P., Y. Cao, and B. Roper. 2010. Method of predicting reference condition biota affects the performance and interpretation of ecological indices. Freshwater Biology DOI: 10.1111/j.1365-2427.2009.02357.x.

Hawkins, C. P., J. R. Olson, and R. A. Hill. 2010. The reference condition: predicting baselines for ecological and water-quality assessments. Journal of the North American Benthological Society 29:312-358.

Hawkins, C.P., and R.H. Norris. 2000. Effects of taxonomic resolution and use of subsets of the fauna on the performance of RIVPACS-type models. Pages 217-228 in J.F. Wright, D.W. Sutcliffe, and M.T. Furse, editors. Assessing the biological quality of fresh waters: RIVPACS and other techniques. Freshwater Biological Association, Ambleside, Cumbria, UK.

Hawkins, C.P., R.H. Norris, J.N. Hogue, and J.W. Feminella. 2000a. Development and evaluation of predictive models for measuring the biological integrity of streams. Ecological Applications 10:1456-1477.

Hawkins, C.P., R.H. Norris, J. Gerritsen, R.M. Hughes, S.K. Jackson, R.K. Johnson, and R.J. Stevenson. 2000b. Evaluation of the use of landscape classifications for the prediction of freshwater biota: synthesis and recommendations. Journal of the North American Benthological Society 19:541-556.

Heinz Center. 2002. The State of the Nation's Ecosystems: measuring the lands, waters, and living resources of the United States. The H. John Heinz III Center for Science, Economics and the Environment, 1001 Pennsylvania Ave, NW Suite 735 South, Washington, DC 20004.

Hewlett, R. 2000. Implications of taxonomic resolution and sample habitat for stream classification at a broad geographic scale. Journal of the North American Benthological Society 19:352-361.

Hill, B.H., A.T. Herlihy, P.R. Kaufmann, R.J. Stevenson, F.H. McCormick, and C.B. Johnson. 2000. Use of periphyton assemblage data as an index of biotic integrity. Journal of the North American Benthological Society 19:50-67.

Hill, B.H., A.T. Herlihy, P.R. Kaufmann, S.J. DeCelles, and M.A. Vander Borgh. 2003. Assessment of streams of the eastern United States using a periphyton index of biotic integrity. Ecological Indicators 2:325-338.

Hill, B.H., R.J. Stevenson, Y.D. Pan, A.T. Herlihy, P.R. Kaufmann, and C.B. Johnson. 2001. Comparison of correlations between environmental characteristics and stream diatom assemblages characterized at genus and species levels. Journal of the North American Benthological Society 20:299-310.

Hilsenhoff, W.L. 1987. An improved biotic index of organic stream pollution. The Great Lakes Entomologist 20:31-39.

Hoang, H., F. Recknagel, J. Marshall, and S. Choy. 2001. Predictive modeling of macroinvertebrate assemblages for stream habitat assessments in Queensland (Australia). Ecological Modelling 146:195-206.

Houston, L., M.T. Barbour, D. Lenat, and D. Penrose. 2002. A multi-agency comparison of aquatic macroinvertebrate-based stream bioassessment methodologies. Ecological Indicators 1:279-292.

Hughes, R.M., D.P. Larsen, and J.M. Omernik. 1986. Regional reference sites: A method for assessing stream potentials. Environmental Management 10:629-635.

Hughes, R.M., S.G. Paulsen, and J.L. Stoddard. 2000. EMAP-Surface Water: a multiassemblage, probability survey of ecological integrity in the USA. Hydrobiologia 422/423:429-443.

Johnson, R.K. 2003. Development of a prediction system for lake stony-bottom littoral macroinvertebrate communities. Archiv für Hydrobiologie 158:517-540.

Johnson, R.K., T. Wiederholm, and D.M. Rosenberg. 1993. Freshwater biomonitoring using individual organisms, populations, and species assemblages of benthic macroinvertebrates. Pages 40-158 in D.M. Rosenberg and V.H. Resh, editors. Freshwater biomonitoring and benthic macroinvertebrates. Chapman and Hall, London.

Joy, M.K., and R.G. Death. 2000. Development and application of a predictive model of riverine fish community assemblages in the Taranaki region of the North Island, New Zealand. New Zealand Journal of Marine and Freshwater Research 34:241-252.

Joy, M.K., and R.G. Death. 2002. Predictive modeling of freshwater fish as a biomonitoring tool in New Zealand. Freshwater Biology 47:2261-2275.

Joy, M.K., and R.G. Death. 2003. Biological assessment of rivers in the Manawatu-Wanganui region of New Zealand using a predictive macroinvertebrate model. New Zealand Journal of Marine and Freshwater Research 37:367-379.

Joy, M.K., and R.G. Death. 2004. Predictive modeling and spatial mapping of freshwater fish and decapod assemblages using GIS and neural networks. Freshwater Biology 49:1036-1052.

Karr, J.R. 1981. Assessment of biotic integrity using fish communities. Fisheries 6(6): 21-27.

Karr, J.R. 1991. Biological integrity: a long-neglected aspect of water resource management. Ecological Applications 1:66-84.

Karr, J.R., and E.W. Chu. 1999. Restoring life in running waters: better biological monitoring. Island Press, Washington, D.C.

Karr, J.R., and E.W. Chu. 2000. Sustaining living rivers. Hydrobiologia 422/423:1-14.

King, R., and C. Richardson. 2002. Evaluating subsampling approaches and macroinvertebrate taxonomic resolution for wetland bioassessment. Journal of the North American Benthological Society 21:150-171.

Kolkwitz, R., and M. Marsson. 1908. Ökologie der pflanzlichen Saprobien. Berichte der Deutschen Botanischen Gesellschaft 26a:505-519. (Translated 1967. Ecology of plant saprobia. Pages 47-52 in L.E. Keup, W.M. Ingram, and K.M. Mackenthun, editors. Biology of water pollution. Federal Water Pollution Control Administration, Washington, DC.)

Lazorchak, J. M., D.K. Averill, D. J. Klemm, and D. V. Peck (editors). 2000. Environmental Monitoring and Assessment Program – Surface waters: field operations and methods for measuring the ecological condition of non-wadeable streams. U.S. Environmental Protection Agency, Cincinnati, Ohio.

Lenat, D.R. 1993. A biotic index for the southeastern United States: derivation and list of tolerance values, with criteria for assigning water-quality rating. Journal of the North American Benthological Society 12:279-290.

Linke, S., R.H. Norris, D. Faith, and D.P. Stockwell. 2005. ANNA: A new prediction method for bioassessment programs. Freshwater Biology 50:147-158.

Marchant, R. 1999. How important are rare species in aquatic community ecology and bioassessment? A comment on the conclusions of Cao et al. 1999. Limnology and Oceanography 44:1840-1841.

Marchant, R. 2002. Do rare taxa have any place in multivariate analysis for bioassessment? Journal of the North American Benthological Society 21:311-313.

Marchant, R., L.A. Barmuta, and B.C. Chessman. 1995. Influence of sample quantification and taxonomic resolution on the ordination of macroinvertebrate communities from running waters in Victoria, Australia. Marine and Freshwater Research 46:501-506.

Marchant, R., A. Hirst, R.H. Norris, R. Butcher, L. Metzeling, and D. Tiller. 2002. Classification and prediction of macroinvertebrate communities from running waters in Victoria, Australia. Journal of the North American Benthological Society 16:664-681.

Metcalfe, J.L. 1989. Biological water quality assessment of running water based on macroinvertebrate communities: history and present status in Europe. Environmental Pollution 69:101-139.

Moss, D., M.T. Furse, J.F. Wright, and P.D. Armitage. 1987. The prediction of the macroinvertebrate fauna of unpolluted running-water sites in Great Britain using environmental data. Freshwater Biology 17:41-52.

Nijboer, R.C., and A. Schmidt-Kloiber. 2004. The effect of excluding taxa with low abundances or taxa with small distribution ranges on ecological assessment. Hydrobiologia 516:347-363.

Norris, R.H., and C.P. Hawkins. 2000. Monitoring river health. Hydrobiologia 435:5-17.

Norton, S.B., S.M. Cormier, G.W. Suter II, B. Subramanian, E. Lin, D. Altfater, and B. Counts. 2000. Determining probable causes of ecological impairment in the Little Scioto River, Ohio, USA: Part 1. Listing candidate causes and analyzing evidence. Environmental Toxicology and Chemistry 21:1112-1124.

Norton, S.B., S.M. Cormier, M. Smith, and C. Jones. 2001. Can biological assessments discriminate among types of stress? A case study from the Eastern Corn Belt Plains Ecoregion. Environmental Toxicology and Chemistry 21:1112-1124.

NRC (National Research Council). 2001. Assessing the TMDL Approach to Water Quality Management. National Academy Press, Washington, D.C.

Omernik, J.M. 1987. Ecoregions of the conterminous United States. Annals of the Association of American Geographers 77:118-125.

Ostermiller, J.D., and C.P. Hawkins. 2004. Effects of sampling error on bioassessments of stream ecosystems: application to RIVPACS-type models. Journal of the North American Benthological Society 23:363-382.

Pan, Y., R. J. Stevenson, B. H. Hill, A. T. Herlihy, and C. B. Collins. 1996. Using diatoms as indicators of ecological conditions in lotic systems: A regional assessment. Journal of the North American Benthological Society 15: 481-494.

Parsons, M., and R.H. Norris. 1996. The effect of habitat-specific sampling on biological assessment of water quality using a predictive model. Freshwater Biology 36:419-434.

Plafkin, J.L., M.T. Barbour, K.D. Porter, and R.M. Hughes. 1989. Rapid bioassessment protocols for use in streams and rivers: benthic macroinvertebrates and fish. EPA/440/4-89/001, US-EPA, Washington, DC.

Rapport, D.J., H.A. Regier, and T.C. Hutchinson. 1985. Ecosystem behavior under stress. The American Naturalist 125:617-640.

Reynoldson, T.B., R.C. Bailey, K.E. Day, and R.H. Norris. 1995. Biological guidelines for freshwater sediment based on Benthic Assessment of SedimenT (the BEAST) using a multivariate approach for predicting biological state. Australian Journal of Ecology 20:198-219.

Reynoldson, T. B., R. H. Norris, V. H. Resh, K. E. Day, and D. M. Rosenberg. 1997. The reference condition: a comparison of multimetric and multivariate approaches to assess water-quality impairment using benthic macroinvertebrates. Journal of the North American Benthological Society 16:833-852.

Reynoldson, T.B., D.M. Rosenberg, and V.H. Resh. 2001. Comparison of models predicting invertebrate assemblages for biomonitoring in the Fraser River catchment, British Columbia. Canadian Journal of Fisheries and Aquatic Sciences 58:1395-1410.

Reynoldson, T.B., and J.F. Wright. 2000. The reference condition: problems and solutions. Pages 293-303 in J.F. Wright, D.W. Sutcliffe, and M.T. Furse, editors. Assessing the biological quality of fresh waters: RIVPACS and other techniques. Freshwater Biological Association, Ambleside, Cumbria, UK.

Rosenberg, D.M., and V.H. Resh. 1993. Freshwater biomonitoring and benthic macroinvertebrates. Chapman and Hall, New York.

Schindler, D.W. 1987. Detecting ecosystem response to anthropogenic stress. Canadian Journal of Fisheries and Aquatic Sciences 44(S1):6-25.

Schindler, D.W. 1990. Experimental perturbations of whole lakes as tests of hypotheses concerning ecosystem structure and function. Oikos 57:25-41.

Somers, K.M., R.A. Reid, and S.M. David. 1998. Rapid biological assessments: how many animals are enough? Journal of the North American Benthological Society 17:348-358.

Smith, M.J., W.R. Kay, D.H.D. Edward, P.J. Papas, K. St-J. Richardson, J.C. Simpson, A.M. Pinder, D.J. Cale, P.H. Horwitz, J.A. Davies, F.H. Yung, R.H. Norris, and S.A. Halse. 2001. AusRivAS: using macroinvertebrates to assess ecological condition of rivers in Western Australia. Freshwater Biology 41:269-282.

Spitz, F., and S. Lek. 1999. Environmental impact prediction using neural network modeling: an example in wildlife damage. Journal of Applied Ecology 36:317-326.

Stevens, D.L. 1994. Implementation of a National Monitoring Program. Journal of Environmental Management 42:1-29.

Stewart-Oaten, A., and J.R. Bence. 2001. Temporal and spatial variation in environmental impact assessment. Ecological Monographs 71:305-339.

Stoddard, J.L., D.P. Larsen, C.P. Hawkins, R.K. Johnson, and R.H. Norris. 2006. Setting expectations for the ecological condition of running waters: the concept of reference condition. Ecological Applications 16:1267-1276.

Suter II, G.W., S.B. Norton, and S.M. Cormier. 2002. A methodology for inferring the causes of observed impairments in aquatic ecosystems. Environmental Toxicology and Chemistry 21:1101-1111.

Turak, E., L.K. Flack, R.H. Norris, J. Simpson, and N. Waddell. 1999. Assessment of river condition at a large spatial scale using predictive models. Freshwater Biology 41:283-298.

Turak, E., and K. Koop. 2003. Use of rare macroinvertebrate taxa and multiple-year data to detect low-level impacts in rivers. Aquatic Ecosystem Health and Management 6:167-175.

Van Sickle, J., D. P. Larsen, and C. P. Hawkins. 2007. The effects of excluding rare taxa on the performance of RIVPACS-type predictive models. Journal of the North American Benthological Society 26:319-331.

Vinson, M.R., and C.P. Hawkins. 1996. Effects of sampling area and subsampling procedure on comparisons of taxa richness among streams. Journal of the North American Benthological Society 15:392-399.

Waite, I.R., A.T. Herlihy, D.P. Larsen, N.S. Urquhart, and D.J. Klemm. 2004. The effects of macroinvertebrate taxonomic resolution in large landscape bioassessments: an example from the Mid-Atlantic Highlands, U.S.A. Freshwater Biology 49:474-489.

Walsh, C.J. 1997. A multivariate method for determining optimal subsample size in the analysis of macroinvertebrate samples. Marine and Freshwater Research 48:241-248.

Washington, G.H. 1984. Diversity, biotic and similarity indices: a review with special relevance to aquatic ecosystems. Water Research 18:653-694.

Wright, J.F. 1995. Development and use of a system for predicting the macroinvertebrate fauna in flowing waters. Australian Journal of Ecology 20:181-197.

Wright, J.F., D.W. Sutcliffe, and M.T. Furse. 2000. Assessing the biological quality of fresh waters: RIVPACS and other techniques. Freshwater Biological Association, Ambleside, Cumbria, England.