Statistical Evaluation of Riparian Land Use on Virginia Water Quality

Victoria Graf
Thomas Jefferson High School for Science and Technology

Abstract

Careful land management in riparian areas can serve as a valuable tool in the mitigation of pollution in Virginia waterways. Prior research has shown that variation in the composition of riparian vegetation can lead to disparities in filtering non-point source pollution, but studies disagree on which natural cover is most effective. This empirical study evaluates the association between the types and diversity of riparian land use and observed pollutant levels around Virginia waterways for three pollution types chosen to provide a holistic view: lead sediments, E. coli and enterococci bacteria, and nitrogen and phosphorus nutrients. Specifically, three model types are considered using data from the Probabilistic Monitoring Program of the Virginia Department of Environmental Quality. The first two models use linear regression to assess the association of riparian land use, site attributes, and point pollution sources with absolute and log levels of each pollutant. The third model uses a logit framework treating pollution as a dichotomous variable reflecting whether measured pollution reached regulatory thresholds. These analyses revealed no single best land use as a policy prescription for Virginia planners. Forest land use was associated with lower lead sediments. While wetlands are typically considered sinks for nutrients, this study showed the opposite for phosphorus. In fact, both natural and agricultural land uses were associated with increased phosphorus. Unlike prior studies, this study found no significant relationship between non-agricultural land use and bacterial counts. Additionally, the diversity of land use was associated with pollutant levels for lead sediments and nutrients, but not for bacteria measures.

Introduction

The purpose of the study is to help understand how land use in riparian zones can mitigate pollution in Virginia waterways. Prior research has shown that variation in the composition of the riparian vegetation can lead to disparities in filtering non-point source pollution. However, researchers disagree on which natural covers and vegetation are most effective. Further, there exists very little research on how the diversity of land use affects pollution levels. This empirical research will address and evaluate the effectiveness of both the types and diversity of natural and artificial land use surrounding Virginia waterways.

Three criteria were chosen to provide a holistic view of pollution: metals, bacteria, and nutrients. Lead was chosen as the investigated metal because it is listed as an Environmental Protection Agency (EPA) priority pollutant. For bacteria, this research used E. coli and enterococci because they have been identified as key pollution indicators in the EPA’s Recreational Water Quality Criteria. Total nitrogen and phosphorus levels were used for the nutrient criteria to study the eutrophication of Virginia waterways.

Numerous prior scientific studies have considered the interaction of landscape factors with measures of surface water quality. This interaction can be complicated when considering non-point pollution including runoff from agricultural and urban areas. Thus, land use near waterways can contribute to rather than mitigate the delivery of pollution.

Prior studies have found substantial variation in the effects of riparian areas on water quality depending on the types of measured pollutants, the size and composition of riparian area vegetation, and land use in the area. Natural land covers, particularly forests and natural grasses, have been found to effectively trap sediment [2]. Various studies have shown that forest and grass riparian areas could reduce sediment runoff by 60 to 90 percent. However, these same studies found that the measured ability of riparian areas to mitigate sediment runoff is highly dependent on the extent and type of runoff. Riparian areas can become saturated and ineffective for further pollution mitigation. In contrast to the mitigation effects of forest land use, agricultural pasture and crop lands have been directly associated with increased bacteria and nutrient levels [5, 9].

Measuring Diversity of Land Use

To measure the diversity of land use, entropy was calculated for land use in each watershed. Entropy, as defined by Shannon [6], is a probabilistic measure of chaos and complexity that is often used as a diversity index, for example for species diversity. The common form of the equation for entropy is

$$H = -\sum_{i=1}^{n} p_i \log p_i$$

where $p_i$ is the relative occurrence of land use $i$ and $n$ is the number of possible land uses. Note that this index is not normalized to the range 0 to 1. To normalize it, the entropy level would be divided by the log of $n$ or, alternatively, by the maximum observed diversity level, $H_{max}$.

As seen in this equation, entropy increases with how evenly land uses (or other signals) are distributed. In other words, it captures an index of the rarity or commonness of land uses. If land use were spread completely evenly, land use would be unpredictable and entropy would be high. Conversely, low entropy would imply a high concentration of land use among one or a few dominant land uses. Note that entropy does not distinguish which land use is dominant. It captures information that is complementary to direct observation of the relative levels of each land use. Thus, this study includes land use entropy as an additional measure in conjunction with direct land use variables.
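The behavior described above can be checked with a short calculation. The following is a minimal sketch, assuming land use is supplied as a list of percentages for one site; the function names and the example values are hypothetical and are not taken from the ProbMon data.

```python
import math

def shannon_entropy(shares):
    """Shannon entropy H = -sum(p_i * ln p_i) of land use shares.

    `shares` may be percentages or fractions; they are rescaled to sum to 1.
    Zero shares contribute nothing, following the convention 0 * ln 0 = 0.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must have a positive sum")
    probs = [s / total for s in shares if s > 0]
    return -sum(p * math.log(p) for p in probs)

def normalized_entropy(shares):
    """Entropy divided by ln(n), where n counts all possible land uses."""
    n = len(shares)
    if n < 2:
        return 0.0
    return shannon_entropy(shares) / math.log(n)

# Hypothetical sites with four possible land uses each.
even = [25, 25, 25, 25]      # evenly spread land use
dominated = [97, 1, 1, 1]    # one dominant land use

print(shannon_entropy(even), shannon_entropy(dominated))
print(normalized_entropy(even), normalized_entropy(dominated))
```

The evenly spread site reaches the maximum entropy for four land uses, while the dominated site scores much lower regardless of which land use dominates.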

Hypotheses

This study focuses on two main areas of empirical hypotheses. In the evaluation of land use effectiveness in pollution mitigation, the first pair of hypotheses centers on which land uses were most closely associated with reduced measured pollutants. Among natural land covers, forest covers were expected to be most effective because of the extensive root systems of trees, which can affect both surface and subsurface water runoff. However, due to the complicated interactions between land use and water quality, agricultural land uses, though vegetated, were expected to contribute to rather than mitigate measured pollution levels. An important contribution of this research is the direct incorporation of land use diversity. More diverse land use was expected to provide multiple methods for the filtering of pollution that can work in conjunction with one another. Thus, high land use diversity was expected to be significantly associated with lower measured pollution.

Materials

This study investigates Virginia water quality using data collected by the Probabilistic Monitoring Program (ProbMon) of the Virginia Department of Environmental Quality (VDEQ). The ProbMon program was initiated in 2001 to provide new information on Virginia waterways to aid in monitoring regulatory compliance and tracking local pollution events. The program was intended to complement the traditional targeted monitoring programs that relied on fixed monitoring sites that are biased toward mountainous ecoregions, locations near bridges, and known point source pollution sources.

The design of ProbMon aimed to provide a randomized survey of identified sites where the findings could be extrapolated to overall Virginia waterway quality with statistical confidence. Since the Strahler stream order was expected to be an important determinant of aquatic conditions, the survey was stratified by stream order to further ensure broad applicability of the results. The ProbMon program collected water samples during spring and fall months between 2001 and 2009. Overall, approximately 60 sites were sampled each year so that the assembled data include roughly 600 total observations.

Figure 1 is a map generated in ArcGIS showing the ProbMon sample sites that are used for this study. For the purposes of illustration, this map includes land use as determined by the U.S. Department of Forestry. The highly localized land use in specific riparian areas around ProbMon sample sites was determined by the VDEQ. Figure 2 compares ProbMon sites with VDEQ targeted monitoring sites, and Figure 3 illustrates the variation in the diversity of land use by ProbMon site.

Figure 1. Map of ProbMon Sample Sites included in Statistical Analysis.

Figure 2. Map of ProbMon Sample Sites and Fixed Virginia Water Quality Monitoring Sites. Orange dots indicate ProbMon sites while green dots indicate fixed monitoring sites.

Figure 3. Map of ProbMon Sample Sites by Land Use Diversity. Larger dots indicate higher land use diversity.

Extensive data were collected for each ProbMon sample site. Table 1 shows descriptive statistics for the variables of interest for this empirical study. The first set of variables comprises the specific pollution measures for metals, bacteria, and nutrients. Lead was chosen because of its EPA designation as a priority pollutant used in the testing and regulation of waterway pollution [4]. As shown in Table 1, lead was found in sediments in the majority of ProbMon samples. However, most of these sites had low lead levels. In order to evaluate the levels of lead found, this study relied on lead sediment thresholds obtained from interim guidelines published by the Canadian Council of Ministers of the Environment (CCME). Based on these guidelines, a threshold of 35,000 μg/kg dry weight was set for identification of high lead findings [1]. Thus, only a fraction of ProbMon samples in the top quartile exhibited high lead levels.

The study used E. coli and enterococci colony forming units (cfu) to identify bacteria pollution. These measures were chosen because the EPA’s current guidelines for indicators of fecal contamination in water now favor enterococci and E. coli measurements over the prior emphasis on fecal coliform bacteria [3]. For this study, thresholds for hazardous bacteria levels were chosen to match the EPA recreational water quality criteria recommendations for the maximum geometric mean of sampled levels. These thresholds were 126 cfu/100 mL for E. coli and 35 cfu/100 mL for enterococci [3].

Nitrogen and phosphorus were chosen to represent nutrient pollution because of increasing concern that runoff of fertilizer made primarily of these elements is jeopardizing aquatic life through the eutrophication of waterways. The VDEQ guidelines for optimal measured levels of total nitrogen and total phosphorus were 1 mg/L and 0.02 mg/L, respectively [8]. Note that both nitrogen and phosphorus are found naturally in waterways even in the absence of pollution. Thus, to account for natural sources of the nutrients, optimal nitrogen and phosphorus levels cannot be zero.
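The threshold values quoted above can be gathered in one place and used to flag whether a measurement reaches its guideline level. The following is a minimal sketch under stated assumptions: the dictionary keys and the function name are hypothetical, the nutrient guideline levels are treated here as thresholds, and "reaches" is read as greater than or equal to the threshold.

```python
# Threshold values quoted in the text; key names are illustrative only.
THRESHOLDS = {
    "lead_ug_per_kg": 35000.0,           # CCME interim sediment guideline
    "ecoli_cfu_per_100ml": 126.0,        # EPA recreational criteria
    "enterococci_cfu_per_100ml": 35.0,   # EPA recreational criteria
    "total_nitrogen_mg_per_l": 1.0,      # VDEQ guideline level
    "total_phosphorus_mg_per_l": 0.02,   # VDEQ guideline level
}

def reaches_threshold(pollutant, value):
    """Return 1 if the measurement reaches its threshold, else 0.

    Assumption: a value exactly equal to the threshold counts as reaching it.
    """
    return 1 if value >= THRESHOLDS[pollutant] else 0

# Hypothetical measurements for a single site, not ProbMon observations.
sample = {
    "lead_ug_per_kg": 12000.0,
    "ecoli_cfu_per_100ml": 240.0,
    "total_phosphorus_mg_per_l": 0.02,
}
flags = {name: reaches_threshold(name, value) for name, value in sample.items()}
print(flags)
```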

To control for conditions around the sample sites, the study included several control variables summarized in Table 1 for watershed characteristics and known sources of point pollution. Since this study focused on the effects of land uses on mitigation of non-point source pollutants, it is important to account for site-level variation that arises from these point sources. The point pollution sources are based on permits issued by the Virginia Pollutant Discharge Elimination System (VPDES) to all industrial and municipal major and minor polluters. The distinction between major and minor permits is determined by the quantity and content of pollution discharged. The point pollution sources potentially impacted only a relatively small fraction of sites, and the majority of sites were not associated with known point pollution sources. For this study, the counts of minor municipal and industrial permits were combined into a single measure due to substantial collinearity that is explained later in this paper. Watershed characteristic controls also include road density and population density.

Land use was measured using several variables that reflect the percentage of land within 30 meters of the sample site that consists of a given land use. Virginia is a highly forested state. As shown in Table 1, nearly 67% of land use near ProbMon sites was forested. Agricultural pasture, urban land cover, and wetlands were also prominent. In contrast, many land uses such as shrubland and man-made barren were uncommon.

Not all ProbMon data observations could be used for statistical analysis. The data included extensive notes regarding potential data contamination, data reliability, observations where the value reflected only the measurement threshold of the sampling test, and whether resamples were taken. Some observations were simply noted as not meaningful, or that the sample was held beyond the standard holding time. Based on these data collection notes, unreliable data points were omitted from the analysis. As a result, statistical analysis for each pollutant was conducted on a subset of observations. The descriptive statistics shown in Table 1 for each pollutant reflect only those observations included in the statistical study.

Methods

For this study, three distinct regression models were employed to consider different mechanisms of land use and diversity effects on measured pollutants. The first two models were linear regressions and the third used a logistic framework. Model 1 was a linear regression in levels, specified for site i and land use type k as:

$$y_i = \beta_0 + \sum_k \beta_k x_{ik} + \gamma H_i + \delta X_i + e_i$$

Here $y_i$ is the measured pollutant level, $x_{ik}$ is the percentage of land use $k$, and $H_i$ is the land use entropy. The term $X_i$ includes the site controls and $e_i$ is an error term. This model exhibits constant level effects for each independent variable, so that a unit change in an independent variable was associated with a fixed change in the dependent variable equal to its coefficient. An alternative model including squared terms was considered to test for additional nonlinearity. However, this extension was not found to be useful and was omitted from the final analysis.

Model 2 used log levels of the dependent variable. Coefficients from this model should be interpreted as a constant percentage change in the dependent variable for each unit change in a given independent variable.
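To illustrate how a coefficient from a model in log levels translates into a percentage change, the sketch below computes the exact change implied by a coefficient. The function name and the coefficient value are hypothetical and are not taken from the study's tables.

```python
import math

def percent_change(beta, delta=1.0):
    """Exact percent change in y implied by a log-level coefficient.

    With ln(y) = ... + beta * x, raising x by `delta` multiplies y by
    exp(beta * delta), so the percent change is 100 * (exp(beta*delta) - 1).
    """
    return 100.0 * math.expm1(beta * delta)

beta = -0.01  # hypothetical coefficient, for illustration only
print(percent_change(beta))             # exact change for a one-unit increase
print(100.0 * beta)                     # common small-coefficient approximation
print(percent_change(beta, delta=10))   # change for a ten-unit increase
```

For small coefficients the exact value and the approximation of 100 times the coefficient are close; they diverge as the coefficient or the size of the change grows.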

Model 3 was a logistic regression that treated pollution as a dichotomous variable indicating whether the measured level reached the governmental guideline threshold discussed previously. It models the change in the odds of a pollution event for each independent variable.

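Logistic regressions can be used when the dependent variable is dichotomous, such as a binary variable (e.g., polluted=1, not polluted=0). The model transforms the probability of the event into a continuous function, the log of the odds of the event. The formula for a logistic regression is:

$$\ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \sum_k \beta_k x_{ik}$$

where $p_i$ is the probability that the pollutant at site $i$ reaches the threshold and $x_{ik}$ is the value of independent variable $k$ at that site. Estimated coefficients reflect the change in log odds, so that $e^{\beta_k}$ is the estimated odds ratio, that is, the multiplicative change in the odds, for a unit change in $x_{ik}$. Here, dependent variables were converted to binary with a value of one indicating pollution reaching the governmental threshold for each pollutant.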
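The relationship between a logit coefficient, the odds ratio, and the implied probability can be sketched as follows. The function names and example values are hypothetical and are used only to illustrate the arithmetic, not to reproduce the study's estimates.

```python
import math

def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds when x changes by `delta`."""
    return math.exp(beta * delta)

def probability_from_log_odds(log_odds):
    """Invert the logit transform: p = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def shifted_probability(p, beta, delta=1.0):
    """Probability after x changes by `delta`, starting from probability p."""
    odds = p / (1.0 - p) * odds_ratio(beta, delta)
    return odds / (1.0 + odds)

beta = 0.05   # hypothetical coefficient on a land use percentage
print(odds_ratio(beta))                   # odds ratio for a one-point increase
print(shifted_probability(0.20, beta))    # new probability from a 20% baseline
```

Because the odds ratio is constant while the probability is not, the same coefficient implies different changes in probability at different baseline levels.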

Multicollinearity

Multicollinearity is a statistical problem that can lead to imprecise coefficient estimates and potentially misleading statistical inference. It occurs when two or more of the independent variables in a regression are so highly related that a linear function of the other independent variables can predict these variables with high accuracy. Because these independent variables are so closely related, the regression cannot separately identify the contribution of each variable, which reduces the precision of the coefficient estimates.

The ProbMon survey data present multiple sources of multicollinearity. First, the percentage land use variables add to 100% by definition. In regression analysis, it is necessary to omit one land use variable. For this study, the regressions omitted urban land use so that urban land use reflects the base case. Each land use coefficient should be interpreted as the effect of that land use relative to urban land use.
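The reason one land use must be omitted can be seen in a small example: once the other shares are known, the omitted share is fully determined. The following is a minimal sketch, assuming land use shares are stored as a dictionary of percentages; the function name and the site values are hypothetical.

```python
def drop_baseline(shares, baseline="urban", tol=1e-6):
    """Return the land use regressors with the baseline category removed.

    Assumes `shares` maps land use names to percentages that sum to 100.
    """
    if baseline not in shares:
        raise KeyError(f"baseline land use {baseline!r} not found")
    total = sum(shares.values())
    if abs(total - 100.0) > tol:
        raise ValueError(f"land use shares sum to {total}, expected 100")
    return {name: pct for name, pct in shares.items() if name != baseline}

# Hypothetical site, for illustration only.
site = {"forest": 67.0, "pasture": 15.0, "wetland": 8.0, "urban": 10.0}
regressors = drop_baseline(site)

# The omitted share is implied by the others, so it carries no separate
# information once an intercept is included in the regression.
implied_urban = 100.0 - sum(regressors.values())
print(regressors, implied_urban)
```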

As noted earlier, VPDES permits were very highly correlated between municipal and industrial minor point sources. It would not be possible to precisely estimate separate effects between these sources. To avoid multicollinearity, municipal and industrial minor point sources were combined into one variable for the regressions.

Table 2. Correlation of Land Use Variables between Adjacent and 30 Meter Areas.

The final source of multicollinearity arises in measures of land use by distance from the survey site. The ProbMon data include land use measures for the adjacent (i.e., zero meter), 30 meter, and 120 meter riparian areas. These measures are highly correlated because the land use in the 30 meter area is closely related to the land use within 120 meters, which includes the 30 meter area. Table 2 illustrates this high correlation for the adjacent and 30 meter areas. For each land use, the correlations exceed 0.9 between variables measuring the same land use at different distances. Due to the high multicollinearity of the land use variables by distance, only the 30 meter areas were included in the regressions.
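A correlation check of this kind can be sketched with a plain Pearson correlation between the same land use measured at two distances. The function name and the data values below are hypothetical and do not come from the ProbMon survey.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical forest percentages at five sites, for illustration only.
forest_adjacent = [80, 60, 95, 20, 50]
forest_30m = [75, 65, 90, 25, 55]

r = pearson(forest_adjacent, forest_30m)
print(r)
```

When a pair of variables is this strongly correlated, keeping only one of them in the regression, as the study does with the 30 meter measures, avoids trying to separate effects the data cannot distinguish.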

Results

The results of the lead sediment pollution regression analysis are summarized in Table 3. Model 1 showed no statistically significant association between land uses and lead levels. However, lower lead was associated with higher land use diversity. In Model 2, forest land use was statistically significant. A 1 percentage point increase in forest land use was associated with a 0.013% decrease in measured lead. Similar to Model 1, higher land use diversity was associated with lower measured lead. In Model 3, neither land use nor its diversity was statistically significant.

As shown in Table 4, riparian land use and land use diversity were not generally associated with bacteria levels in any of the models. However, E. coli levels were higher by a statistically significant amount near agricultural pasture in Models 1 and 2.

Nutrient levels were significantly associated with natural land uses. As shown in Table 5, shrubland was associated with lower nitrogen while forests and wetlands were associated with higher phosphorus levels. Nutrient levels were elevated by a statistically significant amount near agricultural pasture and crop land. Higher land use diversity was associated with lower measured nutrient levels.

Conclusions

The study found no single best land use to propose as a policy prescription for Virginia planners. Different land uses were significantly associated with each of the measured pollutants. Consistent with the hypothesis, the study found that forest land use was associated with lower lead sediments. However, while wetlands and forests would typically be considered sinks for nutrients, this study showed the opposite for phosphorus, as indicated by the positive coefficients. In fact, both natural and agricultural land uses were associated with increased phosphorus, although the association was stronger for agriculture. Unlike prior studies of land use association with bacterial pollutants, this study found no significant relationship between non-agricultural land use and bacterial counts.

As hypothesized, this study found a statistically significant relationship between the diversity of watershed land use and measured lead sediment and nutrient pollutants. However, this relationship was not found for bacteria levels, which may be because these levels are associated with particular land uses rather than with the variety of land use. Thus, a goal of higher land use diversity to provide multiple filtering methods may not be effective against bacteria pollution.

Even with precautions taken to ensure reliability during the course of analysis, certain sources of potential error were identified. There may have been unobserved site factors affecting point and non-point source pollution. Additionally, different data collectors were used across locations and over time, which may have caused data inconsistencies. Possible improvements include controlling for seasonality in monitoring, notably in sampling non-point source nutrients. More precise and better documented quality thresholds could also improve the analysis. Finally, the study could use a larger sample size that includes or controls for targeted water quality monitoring stations throughout Virginia.

This study can be extended in several ways. Specific plant species or groups in riparian areas could be included in the controls. Bioavailability of metal sediments could be measured using Acid Volatile Sulfide/Simultaneously Extracted Metals (AVS/SEM) methods, which would provide a more general view of metal pollution than any one metal. Potential upstream pollution could be controlled for using watershed data. Finally, panel data could be assembled to control for riparian area evolution.

References

[1] Canadian Council of Ministers of the Environment. (n.d.). Sediment quality guidelines for the protection of aquatic life.

[2] Environmental Protection Agency. (1995, August). Water quality functions of riparian forest buffer systems in the Chesapeake Bay watershed (Report No. EPA-903-R-95-004).

[3] Environmental Protection Agency. (2012, December). 2012 recreational water quality criteria (Publication No. EPA-820-F-12-061).

[4] EPA Appendix A to Part 423—126 Priority Pollutants, 40 C.F.R. § 423 (1974).

[5] Nash, M., Heggem, D., Ebert, D., & Hall, R. (2008). Multi-scale landscape factors influencing stream water quality in the state of Oregon. Environmental Monitoring and Assessment, 156(1), 343-360. http://dx.doi.org/10.1007/s10661-008-0489-x

[6] Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379-423.

[7] Virginia Cooperative Extension. (2009, May). Understanding the science behind riparian forest buffers: effects on water quality (J. C. Klapproth & J. E. Johnson, Authors).

[8] Virginia Department of Environmental Quality. (2014). Virginia water quality assessment 305(b)/303(d) integrated report 2014.

[9] Zhu, W., Graney, J., & Salvage, K. (2008). Land-Use Impact on Water Pollution: Elevated Pollutant Input and Reduced Pollutant Retention. Journal of Contemporary Water Research & Education, (139), 15-21.

