Factors challenging our ability to detect long-term trends in ocean chlorophyll

C. Beaulieu, S. A. Henson, J. L. Sarmiento, J. P. Dunne, S. C. Doney, R. R. Rykaczewski, and L. Bopp

Program in Atmospheric and Oceanic Sciences, Princeton University, Princeton, New Jersey, USA
National Oceanography Centre, Southampton, UK
Geophysical Fluid Dynamics Laboratory, Princeton, New Jersey, USA
Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
Marine Science Program and Biological Sciences Department, University of South Carolina, Columbia, South Carolina, USA
Laboratoire des Sciences du Climat et de l’Environnement, Gif sur Yvette, France


Introduction
Global climate change is predicted to alter the ocean's biological productivity with implications for fisheries and climate. Results of coupled physical-biogeochemical models are sometimes inconsistent in their estimate of the magnitude and location of changes in marine primary production, depending on the region (e.g. Steinacher et al., 2010). Long-term (100 yr), rapidly declining trends in phytoplankton have been suggested through examination of shipboard measurements (Boyce et al., 2010). However, several authors have contested the methodology and implications of this study, some arguing that the long-term trend detected is an artifact of changes in the measurement techniques (Rykaczewski and Dunne, 2011; Mackas, 2011), and others reporting increases in chlorophyll a (hereafter chlorophyll) concentrations for regions that have been studied with consistent sampling methods over multi-decadal scales (e.g. Karl et al., 2001; Corno et al., 2007; Aksnes and Ohman, 2009; Lomas et al., 2010; Saba et al., 2010; McQuatters-Gollop et al., 2011).
Satellite-derived ocean color and temperature data allow comprehensive estimates of the global distribution of ocean primary productivity, estimated from data provided by the Coastal Zone Color Scanner (CZCS), Ocean Color and Temperature Sensor (OCTS), Sea-viewing Wide Field-of-view Sensor (SeaWiFS), Moderate Resolution Imaging Spectroradiometer (MODIS) and Medium Resolution Imaging Spectrometer (MERIS) ocean color instruments. These data sets have allowed many scientific advances over the past decades as illustrated, for example, in McClain (2009). The CZCS sensor generated the first satellite ocean color data from November 1978 to June 1986; although focused primarily on coastal regions, CZCS also provided a picture of global patterns. However, it is not possible to determine trends from the CZCS record; the mission was a proof-of-concept, and thus the sensor was not continuously validated and is suspected to have drifted after the first year of operation (Hooker and McClain, 2000; NRC, 2004). Ten years later the OCTS was launched and operated from July 1996 to June 1997. SeaWiFS became operational in September 1997 and remained remarkably stable for a decade, offering new opportunities for ocean biogeochemistry and climate research. Nevertheless, SeaWiFS began having telemetry problems in January 2008 and failed completely in December 2010. Three other ocean color sensors, MODIS-Terra, MERIS and MODIS-Aqua, have been operating since December 1999, March 2002 and July 2002, respectively, but MERIS failed in May 2012. Several years of overlap between SeaWiFS and MODIS-Aqua allowed successful cross-calibrations to merge data from the two sensors (e.g. Fargion and McClain, 2003; Maritorena and Siegel, 2005; Pottier et al., 2006; Meister et al., 2012), increasing our potential for the detection of secular trends in ocean chlorophyll. However, MODIS Aqua and Terra are now beyond their operational lifetimes. Another ocean color instrument, the Visible Infrared Imaging Radiometer Suite (VIIRS), is currently operational, but the data quality is, as yet, undetermined.
Several authors have studied these satellite records in order to investigate trends in global ocean chlorophyll concentration and primary productivity (Gregg et al., 2005; Antoine et al., 2005; Behrenfeld et al., 2006; Vantrepotte and Mélin, 2009; Siegel et al., 2013) and to examine natural variability at the interannual and decadal time scales (Yoder and Kennelly, 2003; Martinez et al., 2009). However, none of these studies explicitly considers how the presence of autocorrelation may bias the ability to detect significant trends in the data. In climate time series, the autocorrelation is often represented by a first-order autoregressive process (red noise). The red noise arises from temporal persistence and it roughly approximates internal variability in the climate system in which slower response components such as the ocean and large ice sheets provide memory by responding slowly to a white noise forcing coming from weather systems (Hasselmann, 1976).
Internal variability is also produced by coupled interactions between components of the climate system, such as climate oscillations (Hegerl et al., 2007). There is a great risk of misinterpreting changes in a relatively short time series when red noise is present, as it creates patterns that may be interpreted as trends or shifts with underlying mechanistic causality, but that are generated from a random process (e.g. Wunsch, 1999; Rudnick and Davis, 2003).
Long-term trends are detectable if the signal-to-noise ratio is large enough and a sufficient number of observations are available. Recent studies suggest that climate-change-driven trends in satellite ocean color and inferred productivity are not yet distinguishable from red noise (Henson et al., 2010; Yoder et al., 2010). Due to the degree of internal variability in ocean productivity time series, approximately 39 yr of continuous data could be necessary to detect global climate-change-driven trends in ocean chlorophyll concentration and primary production (with a probability of detection of 0.9 and a significance level of 5 %) (Henson et al., 2010). This time frame assumes no interruption in satellite data, an unlikely scenario given the age of the current ocean color satellites (MODIS) and the unproven potential for VIIRS to provide the necessary data quality. The European Space Agency and the European Organisation for the Exploitation of Meteorological Satellites are planning the launch of the Ocean and Land Colour Instrument (OLCI) in 2013. If this is unsuccessful or MODIS fails before OLCI is operational, a discontinuity due to the change of instrument in the time series could seriously inhibit our ability to detect trends in ocean chlorophyll and productivity. Similarly, when OLCI exceeds its lifetime, a gap before launching a future satellite would again affect our ability to detect trends. Additional satellites have been launched or planned, but if the measurements are not made available for cross-calibration, the issue of potential discontinuity remains. Discontinuity can be introduced in the satellite records when a change of instrument occurs without an overlapping period during which the sensors in orbit may be cross-calibrated. While not ideal, the discontinuity due to a change of sensor might be estimated with some degree of uncertainty even without a period of overlap through careful calibration in orbit. However, with a period of overlap in orbit, discontinuities between sensors could be more accurately characterized through cross-calibration. Then, the magnitude and uncertainty of the discontinuity can be incorporated in the regression model used to detect long-term trends.
Discontinuities challenge the detection of trends in climate data since the discontinuity effect represents an additional parameter that must be estimated along with the magnitude of the trend, and therefore, longer time series of observations are required to achieve the same level of statistical confidence (Box and Tiao, 1975; Tiao et al., 1990; Weatherhead et al., 1998). For example, Weatherhead et al. (1998) estimated that in the worst-case scenario, a discontinuity could increase the number of years of data necessary to detect a linear trend (with a probability of detection of 90 %) by as much as 50 %. Any potential discontinuity in satellite ocean color data must be taken into account in assessing the number of years of observations necessary to distinguish trends from internal variability in ocean chlorophyll and productivity.
The general objective of this study is to investigate the statistical factors that challenge the detection of trends in ocean color data and show why globally, and in most ocean basins, linear trends are not yet distinguishable from red noise (i.e., the internal variability) on a statistical basis. We use generalized least squares regression to detect trends in ocean chlorophyll satellite data and test the hypothesis that those trends detected are not an artifact of red noise. We quantify how a discontinuity in the time series would affect our ability to detect trends in ocean chlorophyll concentration given the observed variability and the expected trends estimated from a range of ocean models. More specifically, we assess how many additional years of satellite data would be needed to detect a trend if the current satellite fails before new satellite data are available. We also quantify how red noise affects the number of years of observations needed to detect trends in ocean chlorophyll concentration.

Data and models
We use monthly mean chlorophyll concentration data covering the January 1998-December 2007 period collected by SeaWiFS (version R2010.0; available at http://oceancolor.gsfc.nasa.gov/) averaged globally and in 14 biomes (shown in Fig. 1). The biome definition separates the regions where phytoplankton growth is seasonally light limited (for mid to high latitudes), regions where the ocean is gaining heat (equatorial regions) and oligotrophic regions (Henson et al., 2010). Observations taken after 2007 were not used as they are not continuous due to intermittent problems with the SeaWiFS instrument. The seasonal cycle was removed from the monthly means by subtracting from each month the mean of all observations taken during the same month for all years. To estimate long-term trends in surface ocean chlorophyll, we use the same three coupled physical-biogeochemical models as presented in Henson et al. (2010): GFDL-TOPAZ (Dunne et al., 2005, 2007), IPSL-PISCES (Aumont and Bopp, 2006) and NCAR-CCSM3 (Doney et al., 2009; Thornton et al., 2009). We estimate the trends in future climate change simulations forced with the IPCC A2 global warming scenario from 2001-2100 (28.9 Gt C yr$^{-1}$ from fossil fuel emissions by 2100). This scenario represents high cumulative carbon emissions due to human population growth and an increasing gap between the industrialized and developing nations (Nakicenovic et al., 2000). More details of the three model projections and biome definition are presented in Henson et al. (2010).
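The deseasonalization step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' code (the analyses in the paper were run in R); the function name `deseasonalize`, the `first_month` argument and the synthetic series are all hypothetical.

```python
import numpy as np

def deseasonalize(monthly_values, first_month=1):
    """Subtract from each observation the mean of all observations
    taken in the same calendar month (the monthly climatology)."""
    values = np.asarray(monthly_values, dtype=float)
    # Calendar month (0-11) of every observation, assuming no missing months.
    month_index = (np.arange(values.size) + first_month - 1) % 12
    anomalies = np.empty_like(values)
    for m in range(12):
        mask = month_index == m
        if mask.any():
            anomalies[mask] = values[mask] - values[mask].mean()
    return anomalies

# Synthetic example (illustrative values only): ten years of a pure
# seasonal cycle around a constant mean, so all anomalies should vanish.
months = np.arange(120)
series = 0.2 + 0.05 * np.sin(2 * np.pi * months / 12)
anomalies = deseasonalize(series)
```

Because the climatology is computed from the whole record, this approach assumes that the average seasonal cycle does not change from year to year.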

Trend detection in presence of autocorrelation and discontinuity
A linear temporal trend can be expressed as

$$ y_t = \mu + \omega t + N_t, \qquad (1) $$

where $y_t$ is the data (chlorophyll concentration) at time $t$, $\mu$ is the intercept, $\omega$ is the trend and $N_t$ represents the residual noise at time $t$. This regression model was used in Henson et al. (2010) to represent trends in monthly mean chlorophyll and productivity data. When this model is fitted using ordinary least squares regression (OLS), it is assumed that the residuals are independent (white noise). However, it is often not reasonable to assume that successive observations of monthly chlorophyll concentration are independent from each other since there is memory being carried from month to month (red noise). In the presence of red noise, OLS tends to underestimate the variance and therefore inflates the test statistics on the regression coefficients, so that a trend can appear statistically significant when it is not (e.g. Wunsch, 1999). Technically, we assume that the errors follow a first-order autoregressive process (AR(1)):

$$ N_t = \phi N_{t-1} + \varepsilon_t, \qquad (2) $$

where $\phi$ is the first-order autocorrelation and $\varepsilon_t$ are normally distributed random errors (white noise) with a mean of zero and a common variance of $\sigma_\varepsilon^2$. It should be noted that the variance of the white noise process ($\varepsilon_t$) is directly related to the variance of the noise ($N_t$) by

$$ \sigma_N^2 = \frac{\sigma_\varepsilon^2}{1 - \phi^2}, \qquad (3) $$

where $\sigma_N^2$ is the noise variance.

www.biogeosciences.net/10/2711/2013/ Biogeosciences, 10, 2711-2724, 2013

In this case, the first-order autoregressive process expresses the strength of the memory being carried from one month to the other. The decorrelation time (or the time it will take in months to forget the current state of the process) is a more tangible measure of the strength of the memory and is expressed as a function of the first-order autocorrelation value, $(1 + \phi) / (1 - \phi)$. The regression parameters and their associated variance can be estimated using generalized least squares regression (GLS) to account for the presence of autocorrelation in the errors.
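A minimal sketch of GLS trend estimation under AR(1) errors is given here for illustration. It assumes that the lag-1 autocorrelation and the noise standard deviation are already known, whereas in practice they are estimated from the data; the function names `ar1_covariance` and `gls_trend` are hypothetical and this is not the authors' R implementation.

```python
import numpy as np

def ar1_covariance(n, phi, sigma_noise):
    """n x n covariance matrix of an AR(1) process:
    entry (i, j) equals sigma_noise**2 * phi**|i - j|."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return sigma_noise**2 * phi**lags

def gls_trend(y, phi, sigma_noise=1.0):
    """Return the GLS estimate b = (intercept, trend) and its covariance,
    assuming AR(1) errors with known phi and sigma_noise."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(1, n + 1, dtype=float)
    X = np.column_stack([np.ones(n), t])      # design matrix: ones, time
    S = ar1_covariance(n, phi, sigma_noise)   # error-covariance matrix
    S_inv_X = np.linalg.solve(S, X)           # S^{-1} X without forming S^{-1}
    cov_b = np.linalg.inv(X.T @ S_inv_X)      # (X' S^{-1} X)^{-1}
    b = cov_b @ (S_inv_X.T @ y)               # (X' S^{-1} X)^{-1} X' S^{-1} y
    return b, cov_b

# Noise-free synthetic series with a known intercept and trend.
t = np.arange(1, 121)
y = 0.3 + 0.002 * t
b_red, cov_red = gls_trend(y, phi=0.5)
b_white, cov_white = gls_trend(y, phi=0.0)
```

With `phi=0` the estimator reduces to OLS. For the same noise variance, the trend variance `cov_red[1, 1]` is larger than `cov_white[1, 1]`, which is the sense in which ignoring positive autocorrelation overstates the significance of a trend.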
To simplify these expressions we use matrix notation here.
In matrix notation, the regression model presented in Eq. (1) is

$$ \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{N}, \qquad (4) $$

where $\mathbf{y}$ is the $n \times 1$ data vector, $\mathbf{X}$ is an $n \times 2$ design matrix with ones in the first column and the time in the second column, $\mathbf{b}$ is a $2 \times 1$ vector representing the intercept and trend and $\mathbf{N}$ is an $n \times 1$ vector representing the noise. The generalized least squares parameters ($\hat{\mathbf{b}}$) and their variance ($V(\hat{\mathbf{b}})$) are given by

$$ \hat{\mathbf{b}} = (\mathbf{X}^T \mathbf{S}^{-1} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{S}^{-1} \mathbf{y}, \qquad (5) $$

$$ V(\hat{\mathbf{b}}) = (\mathbf{X}^T \mathbf{S}^{-1} \mathbf{X})^{-1}, \qquad (6) $$

where $\mathbf{S}$ is the $n \times n$ error-covariance matrix. The entries in the error-covariance matrix represent the covariance of two errors depending on their separation in time $s$. For a first-order autoregressive process, the covariance of two residuals separated by $s$ time units (months in this case) can be expressed as

$$ \mathrm{cov}(N_t, N_{t+s}) = \sigma_N^2 \, \phi^{|s|} = \frac{\sigma_\varepsilon^2 \, \phi^{|s|}}{1 - \phi^2}. \qquad (7) $$

More details on GLS estimation can be obtained in Brockwell and Davis (2002). Statistical analyses of chlorophyll concentrations are sometimes performed on log-transformed data (e.g. Campbell, 1995). However, a log-transformation did not help in stabilizing the variance of the model errors or making them more normally distributed, so we used the untransformed chlorophyll concentration by principle of parsimony.

Tiao et al. (1990) developed a simple equation allowing the estimation of the number of observations required to distinguish a trend (with a specified magnitude of $\omega_0$) from red noise. The number of observations required depends on the signal-to-noise ratio and on the desired confidence level and power of detection for the test. It can be expressed as

$$ n^* = \left[ \frac{3.3 \, \sigma_N}{|\omega_0|} \sqrt{\frac{1 + \phi}{1 - \phi}} \right]^{2/3}, \qquad (8) $$

where $n^*$ is the number of observations required to detect the trend. The magnitude of the trend can be expressed in absolute terms (e.g. changes in chlorophyll concentration in mg m$^{-3}$ per year) or in relative terms (e.g. changes in percent per year or per decade). The factor 3.3 accounts for a power of detection of at least 0.90 and a significance level of 5 % (or a confidence level of 95 %) (Tiao et al., 1990; Weatherhead et al., 1998). The null hypothesis for the regression is that there is no trend. The significance level is the probability of incorrectly rejecting the null hypothesis when, in fact, it is true (false positive rate). The power of detection is the probability that the regression analysis will reject the null hypothesis when there is a trend (true positive rate). This means that if GLS regression is applied using a 5 % significance level to thousands of series having an approximate length of $n^*$ and with the same parameters as presented in Eq. (8) (i.e. a lag-1 autocorrelation of $\phi$, a variance of $\sigma_N^2$ and a trend with a magnitude of $\omega_0$), in at least 90 % of the cases the trend would be detected if it exists. For a smaller significance level and/or a larger probability of detection, $n^*$ would increase.
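The detection-time formula of Tiao et al. (1990) described above can be sketched as a short function. The unit handling is an assumption made for this illustration: the trend is converted to a per-month value and the result is counted in months before being converted back to years, which approximately reproduces the sensitivity values quoted later in the text for Fig. 4b. The function name `years_to_detect` is hypothetical.

```python
def years_to_detect(trend_per_year, sigma_noise, phi, factor=3.3):
    """Years of monthly observations needed to detect a linear trend
    against AR(1) noise (power 0.90, 5 % significance for factor=3.3).

    trend_per_year : trend magnitude, in data units per year
    sigma_noise    : standard deviation of the monthly noise, in data units
    phi            : lag-1 autocorrelation of the monthly noise (|phi| < 1)
    """
    trend_per_month = abs(trend_per_year) / 12.0   # assumption: monthly time step
    persistence = ((1.0 + phi) / (1.0 - phi)) ** 0.5
    n_months = (factor * sigma_noise / trend_per_month * persistence) ** (2.0 / 3.0)
    return n_months / 12.0

# Multi-model mean global trend magnitude and the two sensitivity cases
# discussed with Fig. 4b (standard deviation 0.02 mg m^-3).
white_noise_case = years_to_detect(1.53e-4, 0.02, 0.0)
red_noise_case = years_to_detect(1.53e-4, 0.02, 0.9)
```

Under these assumptions the two cases evaluate to roughly 25 yr and 66 yr, close to the approximately 25 yr and 65 yr reported in the text. Because of the 2/3 exponent, doubling the noise standard deviation lengthens the required record by a factor of about 1.6 rather than 2.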
If a change of instrumentation or measurement procedures occurs, a discontinuity might be introduced in the observations and should be taken into account in the trend detection. An indicator function representing the effect of the change can be added to the model:

$$ y_t = \mu + \omega t + \delta I_t + N_t, \qquad (9) $$

$$ I_t = \begin{cases} 0, & t \le T_0 \\ 1, & t > T_0 \end{cases} \qquad (10) $$

where $I_t$ is a binary variable having a value of zero before the discontinuity and one after, $T_0$ represents the number of observations before the discontinuity and $\delta$ represents the magnitude (or the effect) of the discontinuity. The fraction of data before the discontinuity can be represented by

$$ \tau = \frac{T_0}{n}, \qquad (11) $$

where $n$ is the total number of observations. Weatherhead et al. (1998) provided an estimate of $n^*$ (with a specified trend magnitude of $\omega_0$) in the presence of a discontinuity. They showed that $n^*$ is larger in comparison to continuous data due to the additional parameter ($\delta$) that needs to be estimated and depends on the time of the discontinuity:

$$ n^* = \left[ \frac{3.3 \, \sigma_N}{|\omega_0|} \sqrt{\frac{1 + \phi}{1 - \phi}} \right]^{2/3} \left[ 1 - 3\tau(1 - \tau) \right]^{-1/3}. \qquad (12) $$

It is important to note the distinction between a discontinuity and a gap. In the context of satellite data, a discontinuity would occur if no cross-calibration between one instrument and another was possible. In this case, $n^*$ can be estimated using Eq. (12). In the presence of a gap only (i.e. the measurements are taken with the same instrument, but there is a gap in the time series), $n^*$ can be estimated using Eq. (8) as there is no discontinuity effect to estimate.
Furthermore, it must be noted that the timing of the discontinuity ($T_0$) is known since we assume the discontinuity is caused by a change of instrument. However, if the timing of the discontinuity is unknown and needs to be estimated, Eq. (12) is not applicable and techniques developed to detect undocumented discontinuities would be required. Such techniques have been developed for in situ climate data (e.g. Peterson et al., 1998; Beaulieu et al., 2008).
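The effect of a discontinuity at a known time on the required record length can be sketched as a multiplicative factor applied to the continuous-data estimate, following the form given by Weatherhead et al. (1998). The function name `discontinuity_factor` is hypothetical, and the 27 yr input is the continuous-data estimate reported later in the text for the global mean.

```python
def discontinuity_factor(tau):
    """Factor by which n* grows when a discontinuity of known timing occurs
    after a fraction tau of the record: [1 - 3*tau*(1 - tau)]**(-1/3)."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie between 0 and 1")
    return (1.0 - 3.0 * tau * (1.0 - tau)) ** (-1.0 / 3.0)

# Continuous-data estimate for the global mean (27 yr), inflated for a
# discontinuity occurring halfway through the record.
n_continuous = 27.0
n_halfway = n_continuous * discontinuity_factor(0.5)
```

The factor is symmetric about tau = 0.5, where it reaches its maximum of about 1.59, so 27 yr of continuous data become roughly 43 yr. A discontinuity after 25 % of the record has the same effect as one after 75 %.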

Estimation of trends and noise in ocean color satellite data and biogeochemical models
To test for the presence of global and regional trends in ocean chlorophyll in SeaWiFS data, we fit the regression model presented in Eq. (1) to deseasonalized monthly mean SeaWiFS observations for 1998-2007, globally and in the 14 biomes presented in Fig. 1. We also estimate the first-order autocorrelation and standard deviation (Eqs. 2-3) in the ocean chlorophyll concentration from the deseasonalized monthly mean SeaWiFS observations. We estimate the trends predicted by the three ocean biogeochemical models also using the regression model presented in Eq. (1). For all the regression analyses, we verify the underlying assumptions of independently distributed normal errors with a constant variance using the Anderson-Darling normality test, the Breusch-Pagan homoscedasticity test and the Durbin-Watson independence test. All the statistical tests are performed using R (R Development Core Team, 2008) and using a 5 % significance level. We compute $n^*$ for discontinuities at different points in the time series. The range of expected magnitudes in trends is estimated from the annual global and regional means of ocean chlorophyll for the 2001-2100 period in the three model runs described in the data and models section. Furthermore, we compute the number of years necessary to detect a trend for no discontinuity and the multi-model mean trend for different values of autocorrelation and standard deviation to show their effects on trend detection.
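As an illustration of the residual diagnostics mentioned above, the lag-1 autocorrelation and the Durbin-Watson statistic can be computed directly from a residual series. This sketch is not the authors' R workflow: it returns only the statistics (no p-values), and the simulated AR(1) residuals, the seed and the function names are assumptions chosen for the example.

```python
import numpy as np

def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of a residual series."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return float(np.sum(r[1:] * r[:-1]) / np.sum(r * r))

def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 for independent residuals,
    below 2 when the lag-1 autocorrelation is positive."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(r) ** 2) / np.sum(r * r))

# Simulated AR(1) residuals with a lag-1 autocorrelation of 0.52
# (a long synthetic series, used only to make the estimates stable).
rng = np.random.default_rng(0)
phi_true = 0.52
eps = rng.normal(0.0, 0.01, size=5000)
noise = np.zeros(5000)
for i in range(1, noise.size):
    noise[i] = phi_true * noise[i - 1] + eps[i]

phi_hat = lag1_autocorrelation(noise)
dw = durbin_watson(noise)
```

For a long series the two quantities are linked by the approximation DW ≈ 2(1 − φ), so a Durbin-Watson value near 1 corresponds to a lag-1 autocorrelation near 0.5.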

Trends in satellite data and in ocean models
Table 1 presents the GLS trends for 1998-2007 in SeaWiFS satellite chlorophyll concentration for the global mean and average in 14 biomes. Globally and in most biomes, trends are not significant with the exception of the high-latitude North Atlantic (−1.3 % per year) and the Southern Ocean Pacific (0.65 % per year). In all cases, the Durbin-Watson test for the independence of the residual noise is significant, showing the necessity to take into account the autocorrelation in the analysis through GLS. Additional evidence for the presence of red noise is also presented in Appendix A. Figure 2 presents the standardized (the mean is subtracted from the time series and then the time series is divided by its standard deviation) deseasonalized SeaWiFS chlorophyll concentration monthly means for all the regions. We present the standardized chlorophyll concentration to display all regions on the same scale since they have different ranges of concentrations. For the two biomes with significant trends (high-latitude North Atlantic and Southern Ocean Pacific), the fits are also presented. The deseasonalized SeaWiFS chlorophyll concentration monthly anomalies (not standardized) for all the regions are presented in Appendix A.
Table 2 presents the GLS trend estimates of global and biome-specific annual chlorophyll concentration in the three model projections for the period 2001-2100 (Fig. 3). We use output from three different models to give an estimate of the mean and range of possible future trends in surface chlorophyll concentration. Globally, IPSL-PISCES shows a strong and significant decreasing global trend. GFDL-TOPAZ also has a significantly decreasing global trend, but its magnitude is weaker than that of IPSL-PISCES. The global trend in NCAR-CCSM3 is not significant. In many biogeochemical models, the global trend reflects a balance between decreasing trends in some regions and increases in other regions. Thus, we also analyze the biome trends, which may give a clearer signal than the global mean. In the high-latitude North Atlantic, the IPSL-PISCES and NCAR-CCSM3 models project decreasing trends, while the GFDL-TOPAZ model projects an increasing trend. In the Southern Ocean Pacific, the trends projected by the GFDL-TOPAZ and NCAR-CCSM3 models are increasing. The IPSL-PISCES model projects a decreasing trend for the Southern Ocean Pacific. Whether the trends detected in the high-latitude North Atlantic and in the Southern Ocean Pacific in the SeaWiFS data might represent climate change or decadal variability cannot be answered without a detection and attribution study. Answering this question is even more challenging since the models often do not agree on the sign of the projected trends. Other regions where the three biogeochemical models do not all agree on the sign of the trend are the Oligotrophic South Atlantic, Equatorial Pacific, Oligotrophic South Pacific and the Southern Ocean Indian.

Discontinuity and red noise effects on trend detection
Figure 4a presents the number of years of observation necessary to detect a global trend in satellite data for different trend magnitudes and different timing of a discontinuity. The range of the trend magnitude was set according to the global trend estimated from output of the three biogeochemical models.
It can be seen that $n^*$ increases with the fraction of data before the discontinuity, up to $\tau = 0.5$. A discontinuity that occurs halfway in the time series (same number of observations before and after) has the most negative impact on trend detection (the results are symmetric above and below $\tau = 0.5$). Furthermore, trends of a small magnitude also need more observations to be detectable.
Approximately 27 yr of continuous observations would be needed to identify a trend in globally averaged chlorophyll concentration given the trend magnitude projected by the multi-model mean ($-1.53 \times 10^{-4}$ mg m$^{-3}$ yr$^{-1}$) of the three biogeochemical models. This estimate is lower than the 39 yr estimate of Henson et al. (2010), because the latest SeaWiFS reprocessed version (R2010.0) used here has less variability than the version 5.2 used in the Henson et al. (2010) study. If a discontinuity occurs halfway through the time series, we estimate this would increase $n^*$ from 27 to 43 yr. If the "real" global trend in the chlorophyll concentration were best represented by the IPSL-PISCES trend, then it would take approximately 13 yr of observations without discontinuity. On the other hand, if the "real" trend in the chlorophyll concentration is closer to the GFDL-TOPAZ trend, then it would take much longer to detect (from 66 yr of continuous observations to 105 yr of observations with a discontinuity halfway through the time series). The NCAR-CCSM3 trend is not significantly different from zero, thus the estimation of the number of years of observations is not applicable here.
Figure 4b presents the effect of the first-order autocorrelation and standard deviation on the ability to detect trends of the same magnitude as the multi-model mean trend. One can see that strong, positive first-order autocorrelation and high standard deviation seriously inhibit the ability to detect trends. For example, in monthly observations with a standard deviation of 0.02 mg m$^{-3}$ and a first-order autocorrelation of 0 (each month independent from the others), it would take approximately 25 yr of observations to detect a trend (without discontinuity) with the same magnitude as the multi-model mean trend ($-1.53 \times 10^{-4}$ mg m$^{-3}$ yr$^{-1}$). In the presence of large monthly first-order autocorrelation (0.9), it would take approximately 65 yr of observations to detect the same trend. Figure 4b also presents the values observed in global monthly chlorophyll by SeaWiFS: a first-order autocorrelation of 0.52 and standard deviation of 0.01 mg m$^{-3}$. The observed satellite autocorrelation corresponds to a decorrelation time of approximately 3 months, which is high enough that it must be taken into account in trend analyses (Fig. A2).
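As a quick check of the decorrelation time quoted above, the observed first-order autocorrelation of 0.52 can be inserted into the expression $(1 + \phi)/(1 - \phi)$ given in the methods:

$$ \frac{1 + \phi}{1 - \phi} = \frac{1 + 0.52}{1 - 0.52} = \frac{1.52}{0.48} \approx 3.2 \ \text{months}. $$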
Figure 5 presents $n^*$ in satellite data in six different biomes for different trend magnitudes and different times of discontinuity. The biomes were chosen to represent larger and smaller variability (quantified by the autocorrelation and variance of the red noise) and weaker and stronger trend projections. In the Equatorial Atlantic, which has relatively large variability, detecting a trend with the same magnitude as the biome multi-model mean trend would require approximately 54 yr of continuous observations and up to more than 90 yr of observations in presence of a discontinuity. In the Equatorial Pacific, even though the projected multi-model mean trend is smaller than in the Equatorial Atlantic, we would still require fewer years of observations to detect the trend (between 45 continuous years and up to 75 yr in presence of a discontinuity) since the variability is smaller in this biome. In the high-latitude North Pacific, which has large variability, we would require approximately between 37 yr (continuous) and 58 yr (discontinuity halfway) of observations to detect the multi-model mean trend. In the Oligotrophic North Pacific, which has very small variability and a relatively small projected decline, the multi-model mean trend should be detectable with approximately 29 yr of observations (or as much as 45 yr in presence of a discontinuity). The number of observations necessary to detect a trend in the eight remaining biomes is presented in Appendix A (Fig. A4).

Discussion and conclusion
We have shown that ten-year trends in SeaWiFS ocean chlorophyll are distinguishable from the observed red noise in only two of our fourteen biomes: the high-latitude North Atlantic and the Southern Ocean Pacific (Fig. 1; Table 1).

Figure caption (fragment): [...] (Nakicenovic et al., 2000). The projections are standardized (the mean is subtracted from the time series and then the time series is divided by its standard deviation) to display concentrations on the same scale. The trend magnitudes are presented in Table 2.
The magnitude and sign of linear trends estimated in ten years of monthly SeaWiFS observations are in agreement with the trends detected by Vantrepotte and Mélin (2009) and Siegel et al. (2012), even though the analysis was performed on biome means in the present study. The high-latitude North Atlantic and the Southern Ocean Pacific biomes exhibit strong linear trend signals, but a formal detection and attribution study to distinguish between trends expected from natural variability (e.g. decadal oscillations, natural forcings) and climate change would be necessary to assign causality. Alternatively, a simple methodology that would facilitate the distinction of a long-term trend from a suspected dominant source of variability, such as the El Niño Southern Oscillation (ENSO) for example, could be applied. Such a method might include an ENSO index term in the regression model (Henson et al., 2010) to account for the variability related to ENSO and thus highlight any residual trend that may be attributable to other factors such as climate change.
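The idea of adding a climate-index term to the trend regression can be sketched as below. This is a simplified illustration only: it uses ordinary least squares for brevity (the analyses in this study use GLS to account for red noise), the index is a synthetic stand-in rather than a real ENSO index, and the function name `fit_trend_with_index` is hypothetical.

```python
import numpy as np

def fit_trend_with_index(y, climate_index):
    """Fit y_t = intercept + trend * t + index_effect * index_t by OLS."""
    y = np.asarray(y, dtype=float)
    idx = np.asarray(climate_index, dtype=float)
    t = np.arange(y.size, dtype=float)
    # Design matrix: constant, time, and the climate index as a covariate.
    X = np.column_stack([np.ones(y.size), t, idx])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {"intercept": coef[0], "trend": coef[1], "index_effect": coef[2]}

# Synthetic monthly series: a weak decline plus an oscillating index signal.
t = np.arange(120, dtype=float)
synthetic_index = np.sin(2 * np.pi * t / 45.0)   # stand-in for an ENSO index
y = 0.2 - 1e-5 * t + 0.03 * synthetic_index
fit = fit_trend_with_index(y, synthetic_index)
```

Including the index as a covariate lets the regression attribute the oscillating component to the index, so the `trend` coefficient reflects the residual linear change rather than a segment of the oscillation.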
We discussed the importance of accounting for the red noise in trend analyses of ocean productivity using ocean color satellite data and quantified its effect on the time necessary to detect a trend signal. In addition, we showed that a discontinuity in the satellite data measurements could have a large negative impact on our ability to understand ocean productivity's response to climate change. We estimate approximately 27 yr of continuous observations are required to detect global trends in surface chlorophyll concentration.

Figure caption (fragment): We present the fraction of data before the discontinuity between 0 and 0.5 only since the results are symmetric. For example, the number of years necessary to detect a trend of the same magnitude will be the same if the discontinuity occurs after 25 % or 75 % of the data were collected.
The above presents an idealized scenario. In reality, the scenario could be that the remaining ocean color sensors, MODIS Aqua and Terra, fail during the present year (2012) or next year (2013) before a new satellite is launched, and VIIRS fails to achieve the necessary data quality. During this period, no measurements would be taken, cross-calibration would not be possible and calibration would rely exclusively on in situ observations. Assuming that the real trend in chlorophyll is best represented by the mean trend of the three models, the trend would be distinguishable from the noise only after at least 40 yr (in 2037) of observations. This is because the discontinuity would occur at a time when approximately 43 % (17 yr) of the data were collected (as opposed to 27 yr of continuous observations, if there is no interruption). In this case, the discontinuity would amount to 13 additional years of observations necessary before a trend could be detected. This estimation does not include the duration of the gap (e.g. if the gap lasts two years, a trend would not be detectable until 2039). If the discontinuity occurs during 2018 instead, we would need 43 yr of observations, and the trend would not be detectable before 2040. In this case, the discontinuity occurs at a time when approximately 50 % of the data were collected. Overall, it could take an additional 13-16 yr of observations to detect a trend in satellite ocean chlorophyll under the idealized scenario if the two MODIS sensors fail and VIIRS remains of questionable reliability before OLCI is launched. Of course, this could be reduced if a cross-calibration was possible with another instrument that made overlapping measurements for a few years, so that a consistent, continuous time series was available. Other satellites have been launched or are planned, but if the data are not available for cross-calibration, the same problem will occur.
Several assumptions were made in this study and our results are valid only if they are reasonable. These assumptions include:
1. Following Henson et al. (2010), we made the assumption that the trends in ocean chlorophyll concentration are linear since the biogeochemical models used projected trends that are approximately linear over time. More development would be necessary in order to detect spatio-temporal trends, nonlinear trends or step-like behavior changes or to assess the number of years of observations required to do so.
2. To assess $n^*$, we used the trends projected by three ocean biogeochemical models coupled with climate models. We used several models to represent the uncertainty associated with the trend magnitude. The different trends in chlorophyll concentration estimated by each of the three models are due in part to their different representations of the ecosystem. In particular, changes in the relative proportions of large (diatoms) versus small phytoplankton can contribute substantially to the magnitude of the estimated climate-change-driven trend (Steinacher et al., 2010). Diatoms and other large phytoplankton are expected to decline more rapidly in response to increasing nutrient limitation than small phytoplankton (e.g. Bopp et al., 2005), and so models that exhibit a more substantial contribution by large phytoplankton under current conditions may project larger climate-driven declines in chlorophyll than those that exhibit larger contributions by small phytoplankton as nutrient limitation increases in response to increased vertical stratification under climate change (Sarmiento et al., 2004). If real observed trends are greater than or less than the range predicted by the ocean biogeochemical models, $n^*$ may be fewer or more than our estimates, respectively. For example, the trends could be different if other coupled climate-ocean biogeochemistry models were used. Coupled climate-ocean biogeochemistry model projections from the Coupled Model Intercomparison Project Phase 5 database that are or will be made available should be considered in future work.
3. Similarly, n* could vary if a different biome definition were used and if we assumed the biomes were also expanding as suggested in Polovina et al. (2008, 2011). For example, if we were using a different biome definition exhibiting smaller internal variability and/or larger signal, n* would decrease. For chlorophyll at greater depths, n* may also vary, as it cannot be estimated using satellite data. Gliders and floats provide complementary information about the vertical structure of ocean chlorophyll as well as surface ocean chlorophyll in cloudy conditions (Boss et al., 2008; Perry et al., 2008). A global network of bio floats could provide additional opportunity to detect long-term trends in ocean chlorophyll concentration.
4. The results are based on the assumption that the average seasonal cycle remains the same year after year. Violation of this hypothesis might confuse trend detection. Some studies have suggested that the seasonal cycle in chlorophyll concentration may be changing with time in some regions (Vantrepotte and Mélin, 2009; Henson et al., 2013). However, since we perform trend detection on biome means, changes in the seasonal cycle seem to cancel out when averaging, and it seems reasonable to assume that the cycle approximately repeats itself year after year.
5. The results are also based on the assumption that the red noise, characterized by the standard deviation and first-order autocorrelation estimated from ten years of satellite observations, is representative of long-term internal variability. We assumed that these statistical properties are stationary, but the results could vary slightly if these properties change in time or if a longer record is needed to estimate internal variability.
6. We have used SeaWiFS satellite data only to make sure that we analyze a self-consistent time series. However, the results could vary if longer merged time series from different satellites (e.g. SeaWiFS and MODIS-Aqua) were used. Furthermore, we ignore the possible utility of other platforms such as ships, moorings, floats, gliders, and aircraft for estimating long-term trends in ocean chlorophyll and the use of these platforms to eliminate or reduce potential discontinuities when merging satellite data records.
7. Finally, we fixed the significance level at 5 % and the probability of detection at 0.9 to estimate the number of years of observations necessary to detect a trend. If a larger significance level or a smaller probability of detection were targeted, fewer observations would be required to detect trends in ocean chlorophyll.
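As a rough illustration of how the quantities discussed in these assumptions interact, the following Python sketch computes an approximate n*. It assumes the widely used approximation of Weatherhead et al. (1998) for a linear trend in AR(1) noise, with a penalty factor for a single level shift after a fraction τ of the record; this may differ in detail from the equations used in the main text. The function name `years_to_detect`, its arguments, and the numerical values in the usage example are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def years_to_detect(trend, sigma_n, phi, tau=0.0, alpha=0.05, power=0.9):
    """Approximate number of years needed to detect a linear trend.

    Assumed form (Weatherhead-type approximation, not taken from this paper):
        n* ~ [z * sigma_n / |trend| * sqrt((1 + phi) / (1 - phi))]^(2/3)
             * [1 - 3 * tau * (1 - tau)]^(-1/3)

    trend   : trend magnitude per year (same units as sigma_n)
    sigma_n : standard deviation of the monthly noise
    phi     : lag-1 autocorrelation of the monthly noise
    tau     : fraction of the record before a discontinuity (0 = none)
    alpha   : two-sided significance level
    power   : desired probability of detection
    """
    if trend == 0:
        raise ValueError("trend must be non-zero")
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie between 0 and 1")

    normal = NormalDist()
    # For alpha = 0.05 and power = 0.9 this is about 3.24; the constant
    # often quoted in the literature is rounded up to 3.3.
    z = normal.inv_cdf(1.0 - alpha / 2.0) + normal.inv_cdf(power)

    n_continuous = (z * sigma_n / abs(trend)
                    * sqrt((1.0 + phi) / (1.0 - phi))) ** (2.0 / 3.0)
    # Penalty for one level shift; symmetric in tau and largest at tau = 0.5.
    shift_penalty = (1.0 - 3.0 * tau * (1.0 - tau)) ** (-1.0 / 3.0)
    return n_continuous * shift_penalty


if __name__ == "__main__":
    # Hypothetical inputs, not values from Table 1 or Table 2.
    for tau in (0.0, 0.25, 0.5, 0.75):
        n_star = years_to_detect(trend=0.001, sigma_n=0.01, phi=0.5, tau=tau)
        print(f"tau = {tau:.2f}: n* = {n_star:.1f} yr")
```

Under this assumed form, the penalty depends on τ only through τ(1 − τ), which is why a discontinuity after 25 % or 75 % of the record gives the same n*, and why relaxing the significance level or the probability of detection shortens the required record.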
This work demonstrates the necessity of continuous monitoring of global ocean chlorophyll. This requires ensuring overlap in operation between satellites and collecting consistent in situ observations, so that validation, monitoring of sensor degradation and cross-calibration of instruments are possible (NRC, 2004). In situ measurements have been successfully used to validate and reduce uncertainty in satellite ocean color data (e.g. McClain et al., 2009). Data from careful cross-calibrations that fully eliminate discontinuities across satellites should be capable of detecting trends with the same confidence as data from a single satellite. Cross-calibration methods allowing generation of unbiased time series from SeaWiFS-MODIS-VIIRS-OLCI would be especially valuable for increasing our potential to detect climate change effects on ocean productivity.

Appendix A Additional details on the data and models and additional results
In this appendix, we provide more details about the SeaWiFS data and the three model projections that were used.
In order to show differences in the variability of chlorophyll concentration anomalies among the three models, we also present the model projections in all biomes (Fig. A3). The variability depends on the biome and model. In general, the oligotrophic regions show the smallest variability, in agreement with the satellite data.
Figure A4 presents the number of observations necessary to detect a trend in the eight biomes that were not presented in the main text.

Fig. 2. Deseasonalized anomalies in SeaWiFS ocean chlorophyll concentrations from 1998-2007 averaged globally and in 14 biomes. Since the magnitude and variability vary among the different regions, the chlorophyll concentrations are standardized (the mean is subtracted from the time series and then the time series is divided by its standard deviation) to display on the same scale. The dotted lines represent the trends. The two panels with a star represent the trends that are significantly different from zero.

Fig. 3. Annual ocean chlorophyll concentrations projected for 2001-2100 averaged globally and in 14 biomes from three ocean biogeochemical models forced with the A2 scenario from the Special Report on Emission Scenarios (Nakicenovic et al., 2000). The projections are standardized (the mean is subtracted from the time series and then the time series is divided by its standard deviation) to display concentrations on the same scale. The trend magnitudes are presented in Table 2.
Fig. 4. (a) Number of years of observations (n*) necessary to detect a statistically significant trend in satellite monthly ocean chlorophyll according to the magnitude of the trend and the fraction of data before the discontinuity (τ). The standard deviation and autocorrelation used in the calculations were estimated from global SeaWiFS data from 1998-2007. The range for the trend magnitude was obtained from three biogeochemical models, and these magnitudes (in absolute value) are shown on the figure as well as the model mean trend (Table 2). We present the fraction of data before the discontinuity between 0 and 0.5 only since the results are symmetric. For example, the number of years necessary to detect a trend of the same magnitude will be the same if the discontinuity occurs after 25 % or 75 % of the data were collected. (b) Number of years of continuous observations necessary to detect a trend (with the magnitude of the multi-model mean trend) in satellite ocean chlorophyll according to the autocorrelation and standard deviation in the data. The star in the figure shows the values observed in global SeaWiFS data from 1998-2007.

Fig. 5. Number of years of observations (n*) necessary to detect a trend in satellite monthly ocean chlorophyll in six different biomes according to the magnitude of the trend and the fraction of data before the discontinuity (τ). The standard deviation and autocorrelation used in the calculations were estimated from SeaWiFS data from 1998-2007. The range for the trend magnitude was obtained from three biogeochemical models, and these magnitudes (in absolute value) are shown on the figure as well as the model mean trend (Table 2). We present the fraction of data before the discontinuity between 0 and 0.5 only since the results are symmetric. For example, the number of years necessary to detect a trend of the same magnitude will be the same if the discontinuity occurs after 25 % or 75 % of the data were collected.
Fig. A1. SeaWiFS ocean chlorophyll concentration anomalies from 1998-2007 averaged globally and in 14 biomes. The dotted lines represent the trends.

Figure A2 presents the sample autocorrelation function and partial autocorrelation function of the SeaWiFS globally averaged anomalies in chlorophyll concentration. The autocorrelation and partial autocorrelation functions are commonly used in autoregressive moving average model selection. For an autoregressive model of order p (AR(p)), the theoretical autocorrelation function tails off as an exponential decay or damped sine wave, and the theoretical partial autocorrelation function is equal to zero past lag-p (Wei, 1990). The exponential decay shape of the autocorrelation function (Fig. A2a) and the drop of the partial autocorrelation function after lag-1 (Fig. A2b) indicate that a first-order autocorrelation model appropriately fits the noise and justify the choice of GLS regression to study trends in ocean color chlorophyll concentration.
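The diagnostic described here can be sketched in a few lines of Python. The example below is only an illustration on simulated data, not the SeaWiFS anomalies: it generates AR(1) noise with a hypothetical coefficient, computes the sample autocorrelation function, and derives the sample partial autocorrelation function with the Durbin-Levinson recursion. The helper names `simulate_ar1`, `sample_acf` and `sample_pacf` are assumptions introduced for this sketch.

```python
import random


def simulate_ar1(phi, n, seed=0, burn_in=200):
    """Simulate n values of x[t] = phi * x[t-1] + e[t], e ~ N(0, 1)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the start-up transient
            out.append(x)
    return out


def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def sample_pacf(acf):
    """Partial autocorrelation at lags 1..len(acf)-1 (Durbin-Levinson)."""
    pacf, prev = [], []
    for k in range(1, len(acf)):
        if k == 1:
            phi_kk = acf[1]
            cur = [phi_kk]
        else:
            num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
            phi_kk = num / den
            cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


if __name__ == "__main__":
    series = simulate_ar1(phi=0.6, n=20000, seed=1)  # hypothetical phi
    acf = sample_acf(series, max_lag=5)
    pacf = sample_pacf(acf)
    bound = 2.0 / len(series) ** 0.5  # rough 95 % band for white noise
    for lag in range(1, 6):
        print(f"lag {lag}: acf = {acf[lag]: .3f}, pacf = {pacf[lag - 1]: .3f}")
    print(f"approximate significance bound: +/- {bound:.3f}")
```

For a series that is well described by an AR(1) process, the autocorrelations should decay roughly geometrically with lag while the partial autocorrelations beyond lag 1 stay within the approximate significance bound, which is the pattern the text describes for Fig. A2.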
Fig. A3. Ocean chlorophyll concentration anomalies projected for 2001-2100 averaged globally and in 14 biomes from three ocean biogeochemical models forced with the A2 scenario from the Special Report on Emission Scenarios (Nakicenovic et al., 2000). The trend magnitudes and significance are presented in Table 2.
Fig. A4. Number of years of observations (n*) necessary to detect a trend in satellite monthly ocean chlorophyll in eight different biomes according to the magnitude of the trend and the fraction of data before the discontinuity (τ). The standard deviation and autocorrelation used in the calculations were estimated from SeaWiFS data from 1998-2007. The range for the trend magnitude was obtained from three biogeochemical models, and these magnitudes (in absolute value) are shown on the figure as well as the model mean trend. We present the fraction of data before the discontinuity between 0 and 0.5 only since the results are symmetric. For example, the number of years necessary to detect a trend of the same magnitude will be the same if the discontinuity occurs after 25 % or 75 % of the data were collected, assuming a linear trend.

www.biogeosciences.net/10/2711/2013/ Biogeosciences, 10, 2711-2724, 2013

Table 1. Generalized least squares trends, variability and autocorrelation of the SeaWiFS monthly ocean chlorophyll data. The trends (ω), noise standard deviation (σ_N) and autocorrelation (φ) are computed for the global mean and for the mean of each biome from January 1998-December 2007. For each region, we test whether the trends are significantly different from zero at the 5 % significance level.
a Linear trend as expressed in Eq. (1). The trends are computed using monthly data, but expressed yearly to be consistent with the trends computed using the models that are presented in Table 2.
* The trend is significantly different from zero, 5 % significance level.
b First-order autocorrelation of the noise estimated as presented in Eq. (2).
c Standard deviation of the noise as presented in Eq. (3).
d Normality hypothesis rejected, Anderson-Darling normality test, 5 % significance level.
e Homoskedasticity hypothesis rejected, Breusch-Pagan test, 5 % significance level.
f Independence hypothesis rejected, Durbin-Watson test, 5 % significance level.

Table 2. Ocean chlorophyll generalized least squares trends for 2001-2100 estimated from the model outputs.

Table 3. List of notations.