Microplastic Effect Tests Should Use a Standard Heterogeneous Mixture: Multifarious Impacts among 16 Benthic Invertebrate Species Detected under Ecologically Relevant Test Conditions

Microplastics require a risk assessment framework that takes their multidimensionality into account while exclusively considering robust data. Therefore, effect tests should use a diverse, environmentally relevant microplastic (ERMP) standard material that adheres to high-quality requirements. In this study, we provide chronic dose–effect relationships and effect thresholds for 16 benthic species exposed to ERMP. The ERMP was created from plastic items collected from natural sources and cryogenically milled to represent the diversity of microplastics. The test design met 20 previously published quality assurance and quality control criteria. Adverse effect thresholds (EC10) were determined at ERMP concentrations of 0.11 ± 0.17% sediment dry weight (Gammarus pulex, growth), 0.49 ± 0.68% sediment dry weight (Lumbriculus variegatus, growth), and 1.90 ± 1.08% sediment dry weight (L. variegatus, reproduction). A positive effect of microplastics, such as decreased mortality, was observed for Cerastoderma edule (EC10 = 0.021 ± 0.027% sediment dry weight) and Sphaerium corneum (EC10 = 7.67 ± 3.41% sediment dry weight), respectively. Several of these laboratory-based single-species effect thresholds for ERMP occurred at concentrations lower than those found in the environment. For other species, no significant effects were detected up to an ERMP dose of 10% dry weight.

showed that for the experiments with A. aquaticus, H. azteca and C. edule, 80 to 90% of the nominal concentration was maintained throughout the tests (Figure S7). Because P. platycheles reworked the sediment substantially, 64 ± 5.4% of the nominal concentration was maintained in the sediment throughout its tests (Figure S7). The fact that bioturbation can lead to partial exposure to ERMP via the aqueous phase for some organisms was considered part of the desired ecological relevance of the tests.

Verification of background contamination
During the microplastic exposure experiments, additional experimental units (n=34) containing only DSW or filtered seawater were included to quantify background contamination from atmospheric deposition, if any. Water samples (n=11) were filtered over a 20 µm aluminium filter, after which the filter was rinsed and immersed in 15 ml H2O2 for 48 h at 37 °C. Subsequently, the solution was filtered over an Anodisc (pore size 0.2 µm, Ø=25 mm, Whatman) and dried in an oven at 37 °C for at least 2 days. The Anodisc filters were placed on a calcium fluoride crystal window and analysed with a Cary 620 FT-IR Imaging Microscope with Focal Plane Array detector and 4x objective.³ Data were analysed with Simple and MAPP software.⁴ To determine the recovery, 100-200 green spherical PE particles of ~90 µm were added to Milli-Q water (recovery 89.7% ± 4.5%, n=3). All handling took place in a laminar flow cabinet. Although measures were taken to avoid contamination, 26.8 ± 33.1 background particles were found per experimental unit (blank and recovery corrected). Background particle mass was calculated using particle-specific volume and polymer ID,⁴ which resulted in 3.14 (± 5.63) × 10⁻³ g/experimental unit (Table S4). This constitutes a negligible fraction (1.6%) of the lowest exposure concentration of 0.1 weight %.

Preparation of environmentally relevant microplastic particles
Microplastic particles with varying polymer type, size, shape and colour were created in the laboratory, in proportions that match those occurring in the environment, as specified below.¹,⁵ First, naturally aged macroplastic items such as bottles, buckets, packaging materials and rope were collected in the National park 'de Biesbosch', The Netherlands (Figure S1). The items were washed in a dishwasher and subsequently analysed with ATR-FTIR (Agilent Cary 630) to determine polymer identity and purity, and sorted accordingly. Per polymer type (PE, PP, PET), plastic items were manually cut into pieces of approximately 0.5 to 1.0 cm, after which the fragments were ground (Ultra Centrifugal Mill ZM 200) with a stainless steel 2 mm ring sieve (Figure S2). Liquid nitrogen was used to cool the mill and to prevent the plastics from melting. This process was repeated with half of the resulting particles, using ring sieve sizes of 1.0 and 0.5 mm, respectively, with the exception of the PP particles, which melted when using the 0.5 mm ring sieve size.
Additionally, it was not possible to mill the PP fibres or PET fragments with a 2.0 mm ring sieve, as these melted even when an ample amount of liquid nitrogen was added.
Per polymer, the microplastic size fractions < 0.5 mm, 0.5-2 mm and > 2 mm were analysed separately. To illustrate the shapes of the different size fractions, high resolution pictures were made with an Olympus SZX10 stereomicroscope (Figures S3, S4). Particles were analysed for size (width, length, height, diameter) with Laser Direct Infra-Red spectroscopy (LDIR 8700, Agilent).⁶ Smaller particles, resulting from grinding with a sieve size of 0.5 mm, were analysed with automatic particle characterization, with maximum particle sensitivity and a selected size range of 10 to 5000 μm. All resulting particle identifications were visually inspected to check for overlapping particles and false identifications. Particles that resulted from grinding with the 2.0 mm sieve were measured manually with the LDIR measurement tool. To enable weight to particle conversions, microplastic LDIR samples were accurately weighed before being measured and counted. The number of particles per gram in the final mix was approximately 5.9 × 10⁶. This translated to particle concentrations of 5.9 × 10⁶, 1.8 × 10⁷, 5.9 × 10⁷, 1.5 × 10⁸, 2.9 × 10⁸ and 5.9 × 10⁸ particles/kg dw sediment for the weight concentrations of 0.1, 0.3, 1.0, 2.5, 5.0 and 10.0% sediment dw used in the bioassays.
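The weight to particle conversion described above can be checked with a few lines of Python. This is a minimal sketch, assuming that a dose of 1% sediment dry weight corresponds to 10 g ERMP per kg dry sediment; the function and variable names are illustrative and do not come from the study.

```python
# Particle number per gram of the final ERMP mix, as reported in the text.
PARTICLES_PER_GRAM = 5.9e6


def particles_per_kg_sediment(weight_percent, particles_per_gram=PARTICLES_PER_GRAM):
    """Convert an ERMP dose in % sediment dry weight to particles per kg dry sediment."""
    # Assumption: 1% dw corresponds to 10 g ERMP per kg dry sediment.
    grams_ermp_per_kg = weight_percent / 100.0 * 1000.0
    return grams_ermp_per_kg * particles_per_gram


# Nominal doses used in the bioassays (% sediment dry weight).
doses = [0.1, 0.3, 1.0, 2.5, 5.0, 10.0]
for dose in doses:
    print(f"{dose:>4}% dw -> {particles_per_kg_sediment(dose):.2e} particles/kg dw")
```

Under this assumption the computed values agree with the reported particle concentrations to within rounding.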
After particle characterization of the separate size and polymer fractions, they were combined to form an environmentally relevant microplastic particle mixture, with relative abundances of polymers and sizes similar to those found globally in the aquatic environment.⁷ The weight distribution of polymers used to create the microplastic mix consisted of irregular PE fragments (34%), irregular PP fragments (15.9%), PP fibres (10.5%), and irregular PET fragments (20.6%). Additionally, irregular PS fragments (19%), acquired from Axalta Coating Systems GmbH, Cologne, Germany, were added to the mixture.² PS polymer type was confirmed with FTIR and the size distribution was measured with a Mastersizer 3000 particle size analyser (Malvern Instruments). Additionally, high resolution pictures were made with an Olympus SZX10 stereomicroscope and analysed with ImageJ⁸ for length and width of the PS particles. The additional analysis based on laser diffraction (Mastersizer 3000) was conducted to allow comparison of the PS powder with the PS powder used in our previous study (Redondo-Hasselerharm et al., ES&T, 2018). Note that laser diffraction assumes all particles are spheres and thus provides only equivalent sphere parameters. Such data cannot be directly compared with the PSD data reported by Kooi et al. (2021), which we used to mimic the power law slope for ERMP, as the latter data are based on particle length information from IR-based image analysis.
Therefore, LDIR data were used, as LDIR spectroscopy provides actual length, width, and shape (ir)regularity data, which were fit for the purpose of our study.
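As a bookkeeping aid for the polymer weight distribution listed above, the following sketch checks that the reported fractions sum to 100% and converts them to component masses for a batch of ERMP. The batch mass of 50 g and all names are hypothetical, chosen only for illustration.

```python
# Weight fractions (%) of the ERMP mix components as listed in the text.
ERMP_MIX = {
    "PE fragments": 34.0,
    "PP fragments": 15.9,
    "PP fibres": 10.5,
    "PET fragments": 20.6,
    "PS fragments": 19.0,
}


def component_masses(batch_mass_g, mix=ERMP_MIX):
    """Return the mass (g) of each component needed for a batch of the given total mass."""
    total = sum(mix.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"weight fractions sum to {total}, expected 100")
    return {name: batch_mass_g * pct / 100.0 for name, pct in mix.items()}


# Hypothetical batch of 50 g ERMP.
for name, mass in component_masses(50.0).items():
    print(f"{name}: {mass:.2f} g")
```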
To remove any additives present in the plastic, microplastics were washed three times with methanol and hexane and mixed on a shaker table for at least two hours per wash.¹,⁹ A 37 μm metal sieve was used to squeeze the methanol and hexane out of the particle mixture, after which the particles were gently dried in a fume hood for two days. Note that contamination precautions were not needed at this stage, because any addition from external sources would be minor and would only add to the desired diversity of the material, which was thoroughly characterised anyway.

Microplastic characterisation
Particles in the ERMP mix with PS ranged from 9 to 5386 μm in their longest dimension, with a mode around 48 μm (Figure S5). The 75th percentile of the ERMP mix with PS is situated at 97 μm. The size distribution of ERMP with PS was fitted to a power law, $y = b x^{-\alpha}$, where x is the particle size, b is a constant and α is the slope. For the complete ERMP mix with PS, a slope of 3.28 ± 0.02 was measured, which is almost identical to the slope found for microplastic particles in freshwater sediment (α = 3.25 ± 0.19).¹⁰,¹¹ The PET and PE particles that were ground with a 0.5 mm sieve had a mean particle size of 202 μm and a median of 128 μm, with sizes ranging from 15 to 2245 μm. The PP and PE particles that were ground with the 2.0 mm sieve had a mean of 281 μm and a median of 150 μm, with sizes ranging from 53 to 2165 μm. The PET particles ground with a 2.0 mm sieve had a mean particle size of 2230 μm and a median of 2456 μm, with sizes ranging from 96 to 4588 μm. Microplastic fibres produced from PP rope had a mean length of 2012 μm and a mean width of 224 μm, with sizes ranging from 329 to 5386 μm. The size distribution of PS MPs measured with ImageJ⁸ ranged from 9 to 366 μm, with a median of 67 μm and a mean of 78 μm. The size distribution of PS MPs measured with the Mastersizer ranged from 14.5 to 400 μm, with a mode centred at 32 μm (Figure S6).
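To illustrate how a power law slope can be estimated from a list of particle sizes, the sketch below uses the standard maximum-likelihood estimator for a continuous power law. This estimator is substituted here for illustration only; the text does not state which fitting procedure was used. The lower cut-off of 48 μm (the reported mode) and the synthetic sample are assumptions.

```python
import math


def powerlaw_alpha_mle(sizes, x_min):
    """Maximum-likelihood slope for a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    tail = [x for x in sizes if x >= x_min]
    if len(tail) < 2:
        raise ValueError("need at least two particles at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all particles equal x_min; slope is undefined")
    alpha = 1.0 + len(tail) / log_sum
    std_err = (alpha - 1.0) / math.sqrt(len(tail))  # asymptotic standard error
    return alpha, std_err


def synthetic_sizes(alpha, x_min, n):
    """Deterministic power-law sample: inverse CDF evaluated at evenly spaced quantiles."""
    return [x_min * (1.0 - (i + 0.5) / n) ** (-1.0 / (alpha - 1.0)) for i in range(n)]


# Synthetic sizes (μm) generated with the slope reported for the ERMP mix.
sizes = synthetic_sizes(alpha=3.28, x_min=48.0, n=5000)
alpha_hat, se = powerlaw_alpha_mle(sizes, x_min=48.0)
print(f"alpha = {alpha_hat:.2f} +/- {se:.2f}")
```

The estimate depends on the chosen lower cut-off, so for real LDIR data the cut-off would need to be justified, for example from the detection limit of the instrument.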

Sediment
Sediments were sieved with a 2 mm sieve and stored at -20 °C to preserve organic matter and kill any organisms present. A subsample was set aside to analyse background contamination. Prior to use in the experiments, the sediment was thawed, the overlying water was discarded, the sediment was homogenized, and total organic matter (TOM) content was analysed as loss on ignition. The freshwater sediment had a TOM content of 6.8 ± 0.42% (n=5). The marine sediment had a TOM content of 3.8 ± 0.16% (n=10).

Experimental set up
In total, 17 chronic, single species bioassays were performed. The systematic testing approach in this study is similar to the one followed by Redondo-Hasselerharm et al. (2018); however, in the previous study the sediment had a TOM content of 31.6 ± 3.5% (n=4).² While ecologically justifiable, this high TOM content could possibly mask adverse effects of microplastics, and hence we chose a sediment with a lower, more common TOM content. To maintain comparability between our systematic approaches, we repeated the previous experiment by Redondo-Hasselerharm et al. (2018) with G. pulex twice: once with the lower TOM content sediment and PS fragments as used by Redondo-Hasselerharm et al., and once with the lower TOM content sediment and ERMP instead of PS fragments. For experiments 1 and 2, experimental units were made by adding either PS fragments or ERMP without PS fragments to sediment (Tables S2, S3) at concentrations of 0, 0.5, 1, 3, 5, 10 and 20 weight %. For experiments 3 to 17, ERMP including PS fragments (Tables S2, S3) was added to the sediment at concentrations of 0, 0.1, 0.3, 1.0, 2.5, 5.0 and 10.0 weight %.
Concentrations ranging from environmentally relevant (0 to 1.0%) to high (2.5-20%) were included to cover criteria related to relevance as well as to statistical rigour in finding an effect threshold.¹,¹² Sediment-microplastic mixtures were thoroughly homogenized by hand with a stainless steel spoon, after which DSW or filtered seawater was gently added at a 3:1 water to sediment ratio. This ratio has been demonstrated to provide good water quality and habitat conditions during chronic exposures in earlier tests.²,¹² Each experimental unit was made in quadruplicate, and four additional blanks (containing only DSW or seawater) were added to measure background microplastic contamination, e.g. from atmospheric deposition. Systems were randomized and subsequently left to acclimatize for two weeks before the organisms were added. To each experimental unit, 11 to 22 organisms were added, depending on the size of the organisms (Tables S2, S3). Experimental units of C. riparius were covered with a 1 mm mesh after seven days to retain emerged adults. Freshwater organisms, excluding the crustaceans, were fed weekly with 0.1 g/bioassay organic nettle powder dissolved in DSW. In contrast, the crustaceans G. pulex, H. azteca and A. aquaticus were fed dried poplar leaves. Marine filter feeders were fed weekly with 20 ml high density algal solution (Halamphora coffeaeformis).
A. virens was fed feed pellets supplied by Topsy Baits (Wilhelminadorp, The Netherlands) at 5.6% of its wet weight. C. volutator was fed ground JBL Novo Prawn. Exposure duration for all organisms was 28 days. Dissolved oxygen, pH, temperature, conductivity/salinity and NH3 concentrations were measured twice a week. To keep water quality parameters optimal, DSW and seawater were refreshed every other day (Tables S5, S6).

Selection of best dose-response model and assessment of threshold effect parameters
All single species dose-response data were analysed using the dose-response curve (drc) package in R.¹³ This package contains a range of models, including 2- to 4-parameter log-logistic models and Weibull models. For the continuous response data (growth, reproduction and feeding rate), log-logistic models with a Gaussian distribution were fitted. To evaluate the quality of the model fit, the normality of residuals was tested with the Shapiro-Wilk test and visually inspected with a Q-Q plot. The homogeneity of variance was checked with Levene's test. If the assumption of homogeneity of variance was not satisfied, robust standard errors were provided (package lmtest).¹³,¹⁴ For the endpoints mortality and emergence, a log-logistic model with a binomial distribution was fitted. The best fitting model was selected based on the lowest AIC (mselect function in R)¹³ and visual inspection.
The best fitting type of dose-response model and the corresponding threshold effect concentrations are reported (Tables S7-S10).
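The analyses in this study were run with the drc package in R. Purely as an illustration of how a threshold such as the EC10 follows from a fitted log-logistic curve, the Python sketch below evaluates a three-parameter log-logistic model and inverts it for a given effect level. The parameterisation (slope b > 0, upper limit d, midpoint e, lower limit 0) is assumed, and the parameter values are hypothetical rather than fitted values from this study.

```python
import math


def ll3(x, b, d, e):
    """Three-parameter log-logistic curve: upper limit d, lower limit 0, slope b, midpoint e."""
    if x <= 0:
        return d  # for b > 0 the curve approaches d as the dose approaches zero
    return d / (1.0 + math.exp(b * (math.log(x) - math.log(e))))


def effect_concentration(p, b, e):
    """Dose giving a p% reduction from the upper limit (p=10 gives the EC10), for b > 0."""
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100")
    return e * (p / (100.0 - p)) ** (1.0 / b)


def aic(log_likelihood, n_params):
    """Akaike information criterion; n_params counts every estimated parameter."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical parameters, for illustration only.
b, d, e = 1.5, 2.0, 3.0
ec10 = effect_concentration(10, b, e)
print(f"EC10 = {ec10:.3f}, response at EC10 = {ll3(ec10, b, d, e):.3f}")
```

When candidate models are compared, the one with the lowest AIC is preferred, which mirrors the selection rule described above.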

Statistical significance of dose-dependency
The best fitting dose-response model selected in the previous step was compared to a linear regression model with a slope of 0, which corresponds to the absence of a dose-response relationship. This was done using the log-likelihood ratio test (noEffect function in R).¹³ If the p-value of the log-likelihood ratio test (p_noEffect) was < 0.05, the null hypothesis of 'no difference' was rejected and we concluded that there was a significant difference between the two models, implying that the dose-response relationship is statistically significant. If p_noEffect was > 0.05, we concluded, in combination with visual inspection, that the dose-effect relationship was not significant. The p-values of the log-likelihood ratio tests are reported for all endpoints and organisms (Tables S7-S10).
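The logic of the log-likelihood ratio test can be sketched in Python as follows. This is not the noEffect implementation from drc; it is a generic version for nested models, assuming that the degrees of freedom equal the difference in the number of fitted parameters. The log-likelihood values in the example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def likelihood_ratio_test(loglik_full, loglik_null, df):
    """Return the test statistic and p-value for two nested models."""
    statistic = max(0.0, 2.0 * (loglik_full - loglik_null))
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods: dose-response model vs horizontal line, 2 extra parameters.
stat, p_value = likelihood_ratio_test(loglik_full=-40.2, loglik_null=-45.0, df=2)
print(f"statistic = {stat:.2f}, p = {p_value:.4f}")
```

A p-value below 0.05 would, as in the text, be read as evidence that the dose-dependent model describes the data better than the dose-independent one.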

Testing for differences between effects on freshwater versus marine species, and for differences between feeding traits
To explore whether the effects of microplastics on mortality differed between species from the freshwater and marine environments, a Generalized Linear Mixed Model (GLMM) was used (lme4 package in R).¹⁵ As mortality is the response variable, the model was fitted using the binomial family with a logit link. The explanatory variables 'concentration' and 'environment' were added as an interaction term. To allow for the variability between organisms, a random effect for the different organisms was added.
Similarly, a GLMM was used to explore whether the effects of microplastics on mortality differed between feeding traits: filter feeders, sediment/deposit feeders, sediment grazers and facultative deposit feeders (Table S12). As mortality is the response variable, the model was fitted using the binomial family with a logit link. The explanatory variables 'concentration' and 'feeding trait' were added as an interaction term. To allow for the variability between organisms, a random effect for the different organisms was added. The results of the GLMMs were interpreted by examining the estimated coefficients of the interaction term and the accompanying standard errors and p-values. These are provided for each GLMM. All statistical analyses were performed, and all graphs produced, in RStudio.
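To make the role of the interaction term concrete, the sketch below evaluates the linear predictor of a logit-link model with a concentration × environment interaction and a species-level random intercept. It does not fit a GLMM; the coefficient values are hypothetical and serve only to show how the interaction changes the concentration slope for one environment.

```python
import math


def mortality_probability(concentration, is_marine, coef, species_effect=0.0):
    """Predicted mortality from a logit-link model with a concentration x environment interaction."""
    eta = (coef["intercept"]
           + coef["concentration"] * concentration
           + coef["environment"] * is_marine
           + coef["interaction"] * concentration * is_marine
           + species_effect)  # random intercept for the species
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients on the logit scale, for illustration only.
coef = {"intercept": -2.0, "concentration": 0.02, "environment": 0.1, "interaction": 0.15}

for conc in (0.0, 1.0, 10.0):
    p_fresh = mortality_probability(conc, 0, coef)
    p_marine = mortality_probability(conc, 1, coef)
    print(f"{conc:>4}% dw: freshwater {p_fresh:.3f}, marine {p_marine:.3f}")
```

In this parameterisation the concentration slope on the logit scale is the 'concentration' coefficient for the reference environment and the sum of the 'concentration' and 'interaction' coefficients for the other, which is why the interaction coefficient is the quantity examined when comparing groups.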
The feeding rate (mg dw leaf/organism/d) of G. pulex was calculated from the loss of poplar leaves, where L1 and L2 are the initial and final dry weight of poplar leaves, respectively, Cl is the decomposition factor measured in the control systems, Li1 and Li2 are the initial and final number of organisms in the system, and t is the exposure time.

Notes to Table S11:
b) The variable 'concentration' alone does not appear to be a significant predictor of mortality.
c) The variable 'environment' alone does not appear to be a significant predictor of mortality.
d) The interaction term indicates whether the mortality of marine or freshwater organisms is more affected by the concentration of microplastics; this interaction is statistically significant.

Notes to Table S13a:
b) The variable 'concentration' alone does not appear to be a significant predictor of mortality.
c) The variable 'feeding trait' alone does not appear to be a significant predictor of mortality.
d) The interaction term indicates whether one of the 'feeding traits' is more affected by the concentration of microplastics; this is statistically significant for the feeding trait Sediment/deposit feeder vs Filter feeders.

Notes to Table S13b:
b) The variable 'concentration' alone does not appear to be a significant predictor of mortality.
c) The variable 'feeding trait' alone does not appear to be a significant predictor of mortality.
d) The interaction term indicates whether one of the 'feeding traits' is more affected by the concentration of microplastics; this is statistically significant for the feeding trait Grazer/scavengers vs Filter feeders.

Notes to Table S13c:
b) The variable 'concentration' alone is a significant predictor of mortality for the group Filter feeders.
c) The variable 'feeding trait' alone does not appear to be a significant predictor of mortality.
d) The interaction term indicates whether one of the 'feeding traits' is more affected by the concentration of microplastics; this is statistically significant for the feeding trait Filter feeders vs Sediment grazer/scavengers, Sediment/deposit feeder and Facultative deposit feeders.

Notes to Table S13d:
b) The variable 'concentration' alone does not appear to be a significant predictor of mortality.
c) The variable 'feeding trait' alone does not appear to be a significant predictor of mortality.
d) The interaction term indicates whether one of the 'feeding traits' is more affected by the concentration of microplastics; this is statistically significant for the feeding trait Facultative deposit feeders vs Filter feeders.

Figure S1. Variety of plastic items collected in the Biesbosch, The Netherlands. Starting from the upper left corner, clockwise: brown PE bucket, blue PE bucket, black PP rope, orange PP rope, blue PP crate, and green, blue and transparent PET bottles.

Figure S7. Verification of exposure concentrations using loss on ignition (LOI) at the end of exposure for experiments with A. aquaticus, H. azteca, C. edule and P. platycheles.

Figure S8. Mean feeding rate ± s.d. of G. pulex, expressed in mg dw leaf consumed per organism during 28 days of exposure to A) PS and B) ERMP at concentrations up to 20% in sediment d.w. Note that concentrations are on a log scale; the zero concentration has been converted to 0.1 to allow plotting on the log scale.

Table S1. Quality Assurance/Quality Control (QA/QC) criteria score for testing effects of MP in aquatic

Table S2. Overview of endpoints for the tested benthic freshwater macroinvertebrates.

Table S3. Overview of the tested benthic marine macroinvertebrates.

Table S5. Mean water quality parameters of freshwater experiments (mean ± s.d.).

Table S6. Mean water quality parameters of marine experiments (mean ± s.d.).

Table S7. Estimates and statistical significance of effect threshold concentrations and best-fitting dose-response models for mortality of freshwater species.
Significant differences (p < 0.05) between the dose-dependent and the dose-independent model are highlighted in bold. The dose-response curve package in R provides the following models: Weibull type I model (W1.x) and log-logistic (LL.x), with x giving the number of parameters fitted.

Table S8. Estimates and statistical significance of effect threshold concentrations and best-fitting dose-response models for mortality of marine species.

Table S9. Estimates and statistical significance of effect threshold concentrations and best-fitting dose-response models for
Significant differences (p < 0.05) between the dose-dependent and the dose-independent model are highlighted in bold. The dose-response curve package in R provides the following models: Weibull type I model (W1.x) and log-logistic (LL.x), with x giving the number of parameters fitted.

Table S10. Estimates and statistical significance of effect threshold concentrations and best-fitting dose-response models for growth of marine species.

Table S11. Estimated coefficients of the Generalized Linear Mixed Model with binomial distribution, with response variable Mortality, and explanatory variables Concentration and Environment (Marine and Freshwater). The organisms Lumbriculus variegatus and Chironomus riparius were excluded from the analysis as they have different endpoints, reproduction and emergence, respectively.

Table S12. Division of organisms with different feeding traits.

Table S13a. Estimated coefficients of the Generalized Linear Mixed Model with binomial distribution, with response variable Mortality, and explanatory variables 'Concentration' and 'Feeding trait' with four levels: Filter feeders, Sediment/deposit feeder, Sediment grazer/scavengers, Facultative deposit feeders (see Table S12). The group Sediment/deposit feeder is used as a reference to compare the other groups. The organisms Lumbriculus variegatus and Chironomus riparius were excluded from the analysis as they have different endpoints, reproduction and emergence, respectively.

Table S13b. Estimated coefficients of the Generalized Linear Mixed Model with binomial distribution, with response variable Mortality, and explanatory variables 'Concentration' and 'Feeding trait' with four levels: Filter feeders, Sediment/deposit feeder, Sediment grazer/scavengers, Facultative deposit feeders (see Table S12). The group Grazer/scavengers is used as a reference to compare the other groups. The organisms Lumbriculus variegatus and Chironomus riparius were excluded from the analysis as they have different endpoints, reproduction and emergence, respectively.

Table S13c. Estimated coefficients of the Generalized Linear Mixed Model with binomial distribution, with response variable Mortality, and explanatory variables 'Concentration' and 'Feeding trait' with four levels: Filter feeders, Sediment/deposit feeder, Sediment grazer/scavengers, Facultative deposit feeders (see Table S12). The group Filter feeders is used as a reference to compare the other groups.

Table S13d. Estimated coefficients of the Generalized Linear Mixed Model with binomial distribution, with response variable Mortality, and explanatory variables 'Concentration' and 'Feeding trait' with four levels: Filter feeders, Sediment/deposit feeder, Sediment grazer/scavengers, Facultative deposit feeders (see Table S12). The group Facultative deposit feeders is used as a reference to compare the other groups.