Quantification of Photosynthetic Pigments in Neopyropia yezoensis Using Hyperspectral Imagery

Phycobilisomes and chlorophyll-a (Chla) play important roles in the photosynthetic physiology of red macroalgae and serve as the primary light-harvesting antennae and reaction center for photosystem II. Neopyropia is an economically important red macroalga widely cultivated in East Asian countries. The contents and ratios of its 3 main phycobiliproteins and Chla are visible traits used to evaluate its commercial quality. The traditional analytical methods used for measuring these components have several limitations. Therefore, in this study, a high-throughput, nondestructive, optical method based on hyperspectral imaging technology was developed for phenotyping the pigments phycoerythrin (PE), phycocyanin (PC), allophycocyanin (APC), and Chla in Neopyropia thalli. The average spectra from the region of interest were collected at wavelengths ranging from 400 to 1000 nm using a hyperspectral camera. Following different preprocessing methods, 2 machine learning methods, partial least squares regression (PLSR) and support vector machine regression (SVR), were used to establish the best prediction models for PE, PC, APC, and Chla contents. The prediction results showed that the PLSR model performed the best for PE (R²_Test = 0.96, MAPE = 8.31%, RPD = 5.21) and the SVR model performed the best for PC (R²_Test = 0.94, MAPE = 7.18%, RPD = 4.16) and APC (R²_Test = 0.84, MAPE = 18.25%, RPD = 2.53). The 2 models (PLSR and SVR) performed almost the same for Chla (PLSR: R²_Test = 0.92, MAPE = 12.77%, RPD = 3.61; SVR: R²_Test = 0.93, MAPE = 13.51%, RPD = 3.60). Further validation of the optimal models was performed using field-collected samples, and the results demonstrated satisfactory robustness and accuracy. The distribution of PE, PC, APC, and Chla contents within a thallus was visualized according to the optimal prediction models. The results showed that hyperspectral imaging technology was effective for fast, accurate, and noninvasive phenotyping of the PE, PC, APC, and Chla contents of Neopyropia in situ. This could benefit the efficiency of macroalgae breeding, phenomics research, and other related applications.


Introduction
Red macroalgae (Rhodophyta) are a group of ancient photosynthetic eukaryotes distributed worldwide [1,2]. Phycobilisomes and chlorophyll-a (Chla) are the main photosynthetic pigments of red algae. Phycobilisomes play essential roles in red algae, serving as the primary light-harvesting antennae for photosystem II, and can transfer energy to photosystem I [3,4]. Phycobilisomes are considered an indicator of total protein content because they make up ~40% of cell proteins [5]. As large protein complexes, phycobilisomes consist of phycobiliproteins, including phycoerythrin (PE), phycocyanin (PC), and allophycocyanin (APC), and linker proteins. Chla is a critical pigment in all photosynthetic organisms, and its content is commonly used to estimate the N content in higher plants [6].
Neopyropia, also known as nori or laver, is an economically important red macroalga widely farmed in East Asian countries, including China, Japan, and Korea [7]. The global production of Neopyropia was almost 3 million tons (fresh weight) and was valued at approximately US$2.66 billion in 2019 [7]. Neopyropia spp. thalli have a delicious flavor and contain numerous nutritional components. In particular, they have a high protein content of ~25 to 40% dry weight, which is even higher than that of soybean [8-10]. For Neopyropia, the contents and ratios of the 3 main phycobiliproteins (PE, scarlet color; PC, blue color; APC, indigo-blue color) and Chla (green) not only contribute to the total protein content but also determine the color of the blade and are visible traits used to evaluate its commercial quality [8,11,12].
The traditional analytical methods used for measuring phycobiliprotein and chlorophyll contents include ultraviolet-visible (UV-Vis) spectrophotometry [13], high-performance liquid chromatography [14], and liquid chromatography-mass spectrometry [15]. Although stable and accurate, these conventional methods are costly, laborious, time-consuming, destructive, and highly experience-dependent; they cannot be performed on a large scale and cannot satisfy the requirements of high-throughput and nondestructive phenotyping [16]. Thus, rapid and efficient approaches for evaluating phenotypic traits must be developed.
In recent decades, phenomics has emerged as an important tool for understanding the causal links among genotypes, environmental factors, and phenotypes [17,18]. However, the lack of fast, high-throughput, nondestructive, and accurate phenotyping methods is limiting the efficiency of phenomics research and modern plant breeding [19,20]. With the development of sensors and technology, hyperspectral imaging (HSI) has become a promising method for solving this bottleneck [6,21,22]. As an integration of imaging and conventional spectroscopy, HSI can generate data in the form of a 3-dimensional (3D) spatial map of spectral variation, with the first 2 dimensions providing spatial information and the third accounting for spectral information [23]. With this combination, the HSI system can simultaneously extract spatial and spectral information related to plants' structural texture and physiology [24,25]. In general, HSI systems record a wide range of reflectance spectra, including visible (Vis; 400 to 700 nm), near-infrared (NIR; 700 to 1000 nm), and shortwave infrared (SWIR; 1100 to 2500 nm) ranges. The strength and wavelength of the reflectance depend on the nature of the biological material [26]. Typically, for terrestrial plants, the Vis reflectance is mainly influenced by photosynthetic pigments, such as chlorophyll, carotenoids, and anthocyanins; the NIR reflectance is determined by light scattering by the tissue structure; and the SWIR reflectance is dominated by dry matter and water [23,27,28]. However, HSI is usually influenced by external factors, imaging components, and sample complexities. Spectral preprocessing methods are commonly used before applying any modeling tools to reduce the variations and obtain more homogeneous and less noisy data from HSI [26]. Multivariate analysis methods based on chemometrics or machine learning, such as partial least squares regression (PLSR), support vector machine regression (SVR), random forest, and convolutional neural networks, provide robust models relating plant biochemical data to spectral information [29-31].
Numerous studies have used HSI systems to accurately assess crop traits in a high-throughput manner [32-34]. As traits can be accurately measured across hundreds of individuals of a target species, quantitative genetic tools can be used to identify regions of the genome or specific genes controlling variation in the target trait [34]. Herzig et al. [35] applied HSI to estimate the concentrations of 15 grain elements in 1,420 barley lines and located 75 quantitative trait loci using a genome-wide association study (GWAS). Ikeogu et al. [36] predicted carotenoids in 173 cassava root samples using HSI combined with the random forest method. Using the developed models, GWAS was conducted to analyze the substantial genomic regions associated with carotenoids using 594 cassava clones. Sun et al. [37] used HSI combined with GWAS to dissect the protein content of rice seeds and identified 65 genes. Barnaby et al. [38] evaluated rice grain quality using high-throughput HSI for GWAS and revealed plausible candidate genes. Comparatively, algae have been studied less in high-throughput phenotyping research than terrestrial plants [39,40].
Most studies quantify algal biochemical contents using nonimaging optical spectroscopy, which provides information on a limited surface area [41-44]. The literature on the use of HSI technology to quantitatively measure the biochemical properties of algae is scant. Vahtmäe et al. [45] predicted the Chla + Chlb concentrations in the brown alga Fucus vesiculosus and the green algae Cladophora glomerata and Ulva intestinalis using HySpex (Norsk Elektro Optikk, Norway) with a spectral range from 410 to 988 nm. The lipid concentration and fatty acid unsaturation of the green microalga Scenedesmus obliquus were investigated using HSI [46,47]. Nevertheless, to our knowledge, no such study has been conducted on red macroalgae.
In this study, experiments were conducted to evaluate the potential of HSI for determining phycobiliprotein (PE, PC, and APC) and Chla contents in the red macroalga Neopyropia yezoensis. The aim was to establish a rapid, high-throughput, and nondestructive method to replace traditional time-consuming and destructive methods for macroalgal phenotyping. Two types of multivariate modeling analysis methods, PLSR and SVR, were evaluated to establish optimal prediction models.

Algal materials
The genetically pure line PYL201306440 (RZ) of N. yezoensis cultured in the laboratory was used in this study. Thalli were cultured in sterilized natural seawater with Provasoli's enrichment solution (PES) medium at 10 °C, with a light intensity of 60 μmol photons m⁻² s⁻¹ and a 12-h light:12-h dark photoperiod. The culture medium was renewed every 3 d until the thalli reached 5 cm in length. The thalli were then transferred to other containers for experimental incubation.
The N. yezoensis thalli were treated under different levels of nutrients and different light qualities to obtain different pigment concentrations. This was based on previous studies indicating that nutrient limitation and different light qualities change the concentration and composition of phycobiliproteins and Chla in red algae [48,49].
The thalli were subjected to 3 different nutrient levels (low, normal, and high) at 10 °C, with a light intensity of 60 μmol photons m⁻² s⁻¹ and a 12-h light:12-h dark photoperiod. Artificial seawater [50] without nitrogen and phosphorus supply was used for the low-nutrient treatment, and sterilized seawater with 1× PES and 3× PES was used for the normal- and high-nutrient levels, respectively. The culture medium was changed daily to ensure an adequate nutrient supply. The experiment was conducted over 9 d. Starting from the second day, 3 thalli were collected from each nutrient level and denoted as one sample. Three replicate samples for each nutrient level were collected daily for further analysis.
An MC 1000 test tube photobioreactor (PSI Photon Systems Instruments, Drasov, Czech Republic) was used to cultivate the thalli at different light qualities. The thalli were subjected to 8 light qualities with a light intensity of 60 μmol photons m⁻² s⁻¹ and a 12-h light:12-h dark photoperiod at 10 °C. The thalli were cultured in sterilized natural seawater with 1× PES medium, which was changed daily to ensure an adequate nutrient supply. Three thalli were collected from each light quality after 7 d of cultivation in the bioreactor and denoted as one sample. Three replicate samples of each light quality were collected for further analysis.

Hyperspectral image acquisition
An HSI system consisting mainly of a hyperspectral camera (Specim IQ, Specim Ltd., Oulu, Finland), two 150-W halogen lamps, and a 70% reflectance Teflon sheet (Jingyi Optoelectronics Technology Co. Ltd., Guangzhou, China) was designed to obtain the hyperspectral image data (Fig. 1). The spectral measurements were obtained with a line scanner and covered a wavelength range of 400 to 1000 nm (Vis-NIR) with a 7-nm spectral resolution and 204 spectral bands in total. The image size was 512 × 512 pixels [51]. The 2 halogen lamps were placed on the sides, pointing to the camera's viewing area. During the acquisition of the hyperspectral images, the thalli were spread on a glass plate on the Teflon sheet and brought into the camera's viewing area with a horizontal white reference panel (99% light reflection) beside the sample. In the present study, the lens-to-sample distance was approximately 40 cm, and the exposure time was 8 ms. The camera recorded the dark reference automatically, whereas the white reference was recorded simultaneously with the sample. Image correction was performed automatically using the software loaded into the equipment. The image data were processed and stored in the default recording mode. Hyperspectral images of 96 laboratory samples were acquired for further analysis.

Image segmentation and spectral data extraction
After the conversion of raw spectral radiance images to spectral reflectance images, the average reflectance of the region of interest (ROI) for each sample was extracted using the Environment for Visualizing Images (ENVI) 5.1 software (Exelis Visual Information Solutions, USA). When selecting ROIs, shadows and highlights in the image were excluded.
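The two steps described here, radiance-to-reflectance conversion and ROI averaging, can be sketched in a few lines. In the study these were handled by the camera software and ENVI; the sketch below only illustrates the commonly used flat-field formula (raw minus dark, divided by white minus dark), which is an assumption about the correction rather than a description of the vendor's implementation. The function names and the synthetic values are hypothetical.

```python
import numpy as np

def to_reflectance(raw, white, dark, eps=1e-12):
    """Flat-field correction per band: (raw - dark) / (white - dark).
    Assumed standard formula; the camera software may differ in detail."""
    return (raw - dark) / np.maximum(white - dark, eps)

def roi_mean_spectrum(cube, mask):
    """Average the spectra of all pixels where mask is True.
    cube: (rows, cols, bands) reflectance; mask: (rows, cols) boolean ROI."""
    return cube[mask].mean(axis=0)

# Tiny synthetic example (4 x 4 pixels, 3 bands); all values are made up.
dark = np.full(3, 10.0)          # dark reference per band
white = np.full(3, 110.0)        # white reference per band
raw = np.full((4, 4, 3), 60.0)   # raw signal
cube = to_reflectance(raw, white, dark)

# ROI mask: here the central 2 x 2 pixels; in practice shadows and
# highlights would be left out of the mask.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
spectrum = roi_mean_spectrum(cube, mask)
```

The result is one mean spectrum per sample, which is the input used by the regression models.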

Spectral data preprocessing
Radiometric calibration using a white reference was performed to compensate for the effect of horizontal in-plane inhomogeneous illumination [27]. However, the hyperspectral image collection process is usually influenced by many factors [52]. Preprocessing methods are valid tools for highlighting the desired spectral characteristics and for reducing the data variability before modeling [29,53]. In this study, 5 preprocessing methods, i.e., Savitzky-Golay (SG; polynomial order: 2; window points: 5) smoothing + standardization, SG + standard normal variate (SNV), SG + first derivative, SG + second derivative, and multiplicative scatter correction (MSC), were applied prior to building prediction models to improve the performance.
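As a rough illustration of these preprocessing steps, the following sketch applies SG filtering (window of 5 points, polynomial order 2, as stated above), SNV, column-wise standardization, and MSC to a matrix of spectra with one row per sample. It is not the authors' code: "standardization" is assumed here to mean per-band autoscaling, MSC is assumed to use the mean spectrum as reference, and the synthetic spectra are made up.

```python
import numpy as np
from scipy.signal import savgol_filter

def sg(X, deriv=0, window=5, poly=2):
    """Savitzky-Golay smoothing (deriv=0) or derivative along the band axis."""
    return savgol_filter(X, window_length=window, polyorder=poly,
                         deriv=deriv, axis=1)

def snv(X):
    """Standard normal variate: center and scale each spectrum (row)."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def standardize(X, mean=None, std=None):
    """Per-band autoscaling (assumed meaning of 'standardization').
    Pass training-set mean/std when transforming test spectra."""
    mean = X.mean(axis=0) if mean is None else mean
    std = X.std(axis=0) if std is None else std
    return (X - mean) / std

def msc(X, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    ref = X.mean(axis=0) if reference is None else reference
    out = np.empty_like(X, dtype=float)
    for i, row in enumerate(X):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ slope*ref + intercept
        out[i] = (row - intercept) / slope
    return out

# Synthetic spectra: one base shape with per-sample scale and offset.
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 3, 204)) + 2.0
X = rng.uniform(0.8, 1.2, size=(6, 1)) * base + rng.uniform(-0.1, 0.1, size=(6, 1))

X_sg_std = standardize(sg(X))   # SG + standardization
X_sg_snv = snv(sg(X))           # SG + SNV
X_sg_d1 = sg(X, deriv=1)        # SG + first derivative
X_sg_d2 = sg(X, deriv=2)        # SG + second derivative
X_msc = msc(X)                  # MSC
```

Because the synthetic rows differ only by scale and offset, MSC maps every row onto the mean spectrum, which is the behavior the method is designed to produce.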

Extraction and determination of the pigments
Phycobiliprotein and Chla extractions were performed immediately after hyperspectral data acquisition to obtain the pigment concentration of each sample. After removing all surface moisture, each fresh sample was weighed and then frozen in liquid nitrogen for grinding using a tissue lyser (TissueLyser24L, Jingxin Industrial Development Co. Ltd, Shanghai, China). Next, 4 ml of precooled phosphate-buffered saline (50 mM, pH 6.8) was added. The homogenized sample was maintained in an ice bath in the dark and frequently vibrated for 1 h. The sample was then centrifuged at 13,000 × g and 4 °C for 20 min. The supernatant was transferred to a new tube and maintained in an ice bath under dark conditions. Then, 4 ml of precooled phosphate-buffered saline (50 mM, pH 6.8) was added to the precipitate for repeated extraction. The mixture of the 2 supernatants was treated as the initial crude phycobiliprotein extract of each sample. Subsequently, 6 ml of 80% (v/v) acetone was added to the precipitate, and the sample was incubated in the dark for 20 min with continuous oscillation and then centrifuged at 10,000 × g and 4 °C for 10 min to obtain the crude chlorophyll extract. This process was performed in an ice bath. For each sample, the measurements were performed in triplicate. The absorbance of the phycobiliprotein extract was measured at 498, 614, and 651 nm, and the PE, PC, and APC contents (mg/g) were calculated according to the equations described by Kursar et al. [54]. The absorbance of the Chla extract was measured at 646 and 663 nm. The Chla content (mg/g) was calculated according to the equation described by Wellburn [55].

Multivariate modeling analysis
PLSR and SVR were used to predict phycobiliprotein and Chla content (mg/g) in the thalli of N. yezoensis using spectral information. The 96 laboratory samples were split into 2 groups: 79 (approximately 80%) for model training and 17 (approximately 20%) for testing.
PLSR is one of the most commonly used methods for hyperspectral image data analysis [53]. It maximizes the covariance between the label and linear combinations of features by simultaneously decomposing the multivariate input data [56]. A leave-one-out cross-validation scheme was employed to calibrate the model. Up to 50 latent factors were considered for the PLSR model establishment, and the size of each model was determined by the number of latent variables (n_LV) that gave the minimum root mean square error of cross-validation (RMSECV).
SVR is a nonlinear machine learning method that can efficiently perform nonlinear regression using a kernel trick [57]. SVR is appealing in the spectral regression field because of its ability to handle small training datasets successfully [58]. A radial basis function kernel with a Gaussian profile was used to reduce the computational complexity of the SVR training procedure. The parameters C and g were optimized to improve the performance of the SVR model. C controls the trade-off between minimizing model complexity and training error, and g is the width of the radial basis function [41]. The performance of the SVR model depends on an appropriate choice of these 2 parameters to determine a hyperplane with the minimum predictive error.
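A hedged sketch of tuning C and the kernel width for an RBF-kernel SVR is given below. It uses scikit-learn's `SVR` and `GridSearchCV`; the parameter g in the text is assumed to correspond to `gamma` in that library. The grid values, the 5-fold cross-validation, the scaling step, and the synthetic data are all illustrative assumptions, since the text does not report the search settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for 79 training spectra with a nonlinear target.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(79, 20))
y_train = (np.sin(X_train[:, 0]) + 0.5 * X_train[:, 1]
           + rng.normal(scale=0.05, size=79))

# Scale features, then fit an RBF-kernel SVR.
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# Hypothetical search grid for C and gamma (the paper's g).
param_grid = {
    "svr__C": [0.1, 1.0, 10.0, 100.0],
    "svr__gamma": [1e-3, 1e-2, 1e-1],
}
search = GridSearchCV(pipeline, param_grid, cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)

best_C = search.best_params_["svr__C"]
best_gamma = search.best_params_["svr__gamma"]
train_pred = search.predict(X_train)
```

A larger C penalizes training errors more strongly, and a larger gamma makes the kernel narrower, so the two are best tuned jointly rather than one at a time.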

Model performance evaluation
The model performance was statistically evaluated using the coefficient of determination (R²), RMSE, ratio of performance to deviation (RPD), and mean absolute percent error (MAPE). The RPD is a widely used criterion for predictive model evaluation in chemometrics modeling. Higher RPD values correspond to better analytical performance [59,60]. In this study, we divided the models into 3 groups according to their RPD values based on previous studies. An RPD > 3.0 indicates excellent quantitative prediction with high confidence. An RPD between 2.0 and 3.0 indicates a moderately good prediction that may not be suitable for quantitative analysis but could be used for qualitative analysis. If RPD < 2.0, then the model may not be useful and should be improved further. The MAPE value was used to assess the accuracy of the model prediction. If the value was <15%, then the model had high accuracy. The values of these indices were calculated using the following equations:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \mathrm{mean})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

$$\mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSE}}$$

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

where N is the number of N. yezoensis samples in the training set (79) or test set (17); y_i and ŷ_i are the laboratory-measured and model-predicted values, respectively; and SD and mean are the standard deviation and mean of the laboratory-measured values.
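The four indices can be computed directly from measured and predicted values. The sketch below uses their standard definitions, which are assumed to match those of the study; in particular, whether SD uses N or N - 1 in the denominator is not stated, and the sample form (N - 1) is chosen here. The function names and the example values are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R2, RMSE, RPD and MAPE (%) under their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rpd = np.std(y_true, ddof=1) / rmse          # sample SD: an assumption
    mape = 100.0 * np.mean(np.abs(resid / y_true))
    return {"R2": r2, "RMSE": rmse, "RPD": rpd, "MAPE": mape}

def rpd_class(rpd):
    """Three-group reading of RPD as described in the text."""
    if rpd > 3.0:
        return "quantitative"
    if rpd >= 2.0:
        return "qualitative"
    return "not useful"

# Made-up measured and predicted contents (mg/g).
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.2, 3.8, 6.2, 7.8])
label = rpd_class(metrics["RPD"])
```

Note that MAPE is undefined when a measured value is zero, so it suits pigment contents that are strictly positive.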

Validation of the optimal models using field samples
Twenty-four field samples collected from the Qingdao coast were used for further validation of the optimal models. The hyperspectral images were acquired under the same conditions as the laboratory samples, and pigment extraction was performed as described above. The R²_W, RMSE, MAPE, and residuals were used to evaluate the model performance for the field samples.

Prediction maps of phycobiliproteins and Chla contents in N. yezoensis
The best models for the 4 pigments were selected for prediction at the individual pixel level. Prediction maps are beneficial for quantitative observation of phycobiliproteins and Chla at any point in the thalli. The 4 pigments were visualized using MATLAB 2019b.
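Pixel-level prediction amounts to applying a fitted model to the spectrum of every thallus pixel and writing the results back into image coordinates. The study did this in MATLAB; the following is only a Python sketch of the idea, in which `predict_fn` is a hypothetical stand-in for a trained model together with its preprocessing, and the cube and weights are made up.

```python
import numpy as np

def predict_map(cube, mask, predict_fn):
    """Apply predict_fn to each masked pixel spectrum.
    cube: (rows, cols, bands); mask: (rows, cols) boolean thallus mask.
    Pixels outside the mask are returned as NaN."""
    rows, cols, _ = cube.shape
    out = np.full((rows, cols), np.nan)
    out[mask] = predict_fn(cube[mask])   # cube[mask] has shape (n_pixels, bands)
    return out

# Made-up 3 x 3 image with 2 bands and a linear stand-in "model".
cube = np.arange(18, dtype=float).reshape(3, 3, 2)
weights = np.array([1.0, 0.5])           # hypothetical coefficients
mask = np.eye(3, dtype=bool)             # pretend the thallus lies on the diagonal

pigment_map = predict_map(cube, mask, lambda spectra: spectra @ weights)
```

The resulting array can be shown with any colormap; masking the background as NaN keeps it out of the color scale.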

Spectral characteristics of N. yezoensis
The mean reflectance spectra of the ROI were extracted from 96 hyperspectral images to develop prediction models. Figure 2A shows the raw average spectra of the N. yezoensis samples under different treatments in the Vis-NIR range. Overall, consistent spectral patterns were observed for different treatments in the 400 to 1000-nm wavelength region. The typical absorption wavelengths of N. yezoensis in the Vis spectral region were 440, 500, 570, 620, and 660 nm (Fig. 2A). There were also 5 typical reflectance peaks at approximately 475, 522, 557, 595, and 651 nm (Fig. 2A). A dramatic increase in reflectance was observed at the transition from the Vis to the NIR wavelengths, and reflectance remained high throughout the NIR domain. Apparent differences in reflectance values were observed in the Vis-NIR spectral region among the samples. There were also overlaps and fluctuations in the original spectra, which adversely affected the model performance. To minimize the impact of interference and achieve an accurate and reliable model for prediction, 5 preprocessing combination methods, i.e., SG smoothing + standardization (Fig. 2B), SG + SNV (Fig. 2C), MSC (Fig. 2D), SG + first derivative preprocessing (Fig. 2E), and SG + second derivative preprocessing (Fig. 2F), were performed before model generation.

Reference data of phycobiliproteins and Chla contents in N. yezoensis
It is essential to ensure an adequate range and precision of trait data as a reference for developing spectral calibrations and modeling. In this study, fresh N. yezoensis thalli treated with different levels of nutrients and light qualities and collected from the field contained varied phycobiliprotein and chlorophyll contents. The descriptive statistics for the PE, PC, APC, and Chla content of the samples are summarized in Table 1.

PLSR models based on full spectra (400 to 1000 nm)
Linear PLSR models were built on the full spectra (400 to 1000 nm) with each preprocessing method to predict the content of PE, PC, APC, and Chla in N. yezoensis thalli. As listed in Table 2, the 5 preprocessing methods resulted in diverse prediction performances. Among them, the model established with SG smoothing followed by standardization predicted all 4 pigments best, especially PE, which was more predictable than the others in the Vis-NIR range. With this combination, the optimal number of latent variables (n_LV) for PE was 10, with the highest training R² (R²_Train) value of 0.92 and test R² (R²_Test) value of 0.96. The RMSE_Train and RMSE_Test values of PE content prediction were 0.6614 and 0.4804 mg/g, respectively. For PC and Chla, the prediction models were satisfactory, with R²_Test above 0.9. However, for APC prediction, the R² values were low at 0.74 and 0.79, with RMSE values of 0.2369 and 0.2520 mg/g for the training and test datasets, respectively. In this study, MAPE values for the best models of PE, PC, and Chla were 8.31%, 9.94%, and 12.77%, respectively. However, the MAPE value of APC was 21.03%, which indicated lower accuracy. The PE, PC, and Chla models developed in this study using the PLSR method had RPD values of more than 3.0 and could therefore be used for quantitative evaluation; however, the APC model could only be used for qualitative analysis.

SVR models based on full spectra (400 to 1000 nm)
The same training and test datasets as those of the PLSR model were used to construct the SVR model. The PE content was best predicted by the SVR model combined with MSC preprocessing, and the R²_Test, RPD, and test MAPE were 0.96, 4.50, and 9.08%, respectively (Table 3). The PC content was well predicted by the SVR model with SG + SNV preprocessing and the Chla content by SVR with SG + first derivative. The PE, PC, and Chla contents were all quantitatively predicted, as indicated by RPD > 3.0. The SVR models for APC were still insufficient for quantitative prediction, with an optimal RPD of 2.53.
Compared with the PLSR models, the best prediction models of the 4 pigments in SVR modeling showed a slight difference in model performance. The optimal SVR model of PE using MSC preprocessing showed lower accuracy than the optimal PLSR model of PE using SG + standardization preprocessing. However, for PC and APC, the optimal SVR models combined with SG + SNV preprocessing showed better performance than the PLSR models combined with SG + standardization preprocessing, with R²_Test improved from 0.93 to 0.94 and from 0.79 to 0.84, respectively. The RMSE_Test values of the optimal SVR models for PC and APC were reduced to 0.3068 and 0.2232 mg/g, corresponding to improvements in prediction accuracy with MAPE values of 7.18% and 18.25%, respectively. Meanwhile, the RPD values of PC and APC were further improved to 4.16 and 2.53. The Chla prediction models constructed by PLSR and SVR showed almost the same performance, indicating that the linear and nonlinear regression methods had no significant difference for Chla content prediction. The dispersions between the reference and predicted values of PE, PC, APC, and Chla in the optimal integrated PLSR and SVR models (bold in Tables 2 and 3) are shown in Fig. 3. In conclusion, the best prediction for PE and Chla was the PLSR model with spectra preprocessed by SG smoothing followed by standardization. The optimal method for PC and APC prediction was the SVR model with SG smoothing followed by SNV preprocessing.
Fig. 4. Scatter diagrams and residual analysis plots of reference value vs. predicted value of PE, PC, APC, and Chla contents in field samples using the PLSR and SVR models.

Validation of the optimal prediction models
On the basis of the optimal models established above, 24 field-collected N. yezoensis samples were used for validation. Among the 4 pigments, PE content was still predicted best by the PLSR model combined with SG + standardization, with R²_W, RMSE, and MAPE of 0.9481, 0.3507 mg/g, and 14.64%, respectively. The residuals of the PE prediction ranged from −0.5063 to 0.8265 mg/g. For PC and Chla, the prediction R²_W was also greater than 0.9, and MAPE was less than 15%, indicating high prediction accuracy. The prediction of APC was still unsatisfactory, with R²_W of less than 0.9 and MAPE significantly higher than 20%. The results showed that the optimal models established in this study could be used for the quantitative phenotyping of PE, PC, and Chla content in field samples of N. yezoensis.

Prediction maps of phycobiliproteins and Chla in N. yezoensis
The visualized maps representing phycobiliprotein and Chla contents predicted by the best models are shown in Fig. 5, along with an RGB image of the N. yezoensis thalli. The pigment content is presented as a color gradient from blue to red. For all 4 pigments, the content gradually increased from the holdfast to the tip across the entire thallus. With the established models, these maps allow the pigment content to be deduced at any point on the thallus rather than only as an average value for the whole thallus, as in traditional methods.

HSI for high-throughput phenotyping
Monitoring the physicochemical properties of plants and their growth while interacting with the surrounding environment is one of the most important aspects of plant phenotyping [61,62].
However, high-throughput phenotyping methods for investigating the phenotypes of macroalgae are still in the early stages of development [39]. Wet chemistry methods to evaluate phycobiliprotein and chlorophyll contents are time-consuming, laborious, destructive, and highly experience-dependent. Usually, it takes about 4 h to extract phycobiliproteins and chlorophyll from the thalli of Neopyropia using the traditional method. However, using HSI, it took only 2 to 3 min to spread the N. yezoensis thalli to avoid the influence of overlapping on reflectance and another approximately 2 min for image acquisition using a hyperspectral camera in a nondestructive manner. Overall, we estimated that the HSI approach for predicting phycobiliprotein and chlorophyll contents was at least 50 times faster than traditional methods in terms of data collection. In addition, with the standard methods established in this study, the optimal prediction models showed satisfactory robustness and accuracy for both laboratory and field samples (Figs. 3 and 4).
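As a rough check on the order of magnitude of this speed-up, the times quoted in this paragraph can be compared directly. The calculation below uses only those figures (about 4 h for extraction; 2 to 3 min for spreading plus about 2 min for imaging) and assumes that no other steps are counted on either side.

```latex
\[
\frac{t_{\text{wet}}}{t_{\text{HSI}}}
\approx \frac{4 \times 60\ \text{min}}{(2\ \text{to}\ 3)\ \text{min} + 2\ \text{min}}
= \frac{240}{5}\ \text{to}\ \frac{240}{4}
= 48\ \text{to}\ 60
\]
```

The ratio is therefore on the order of 50, in line with the estimate given above.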
Moreover, on the basis of HSI, it is possible to monitor the thallus over its lifetime and on a large scale. Once the traits can be measured accurately across large numbers of individual thalli, specific regions of the genome or genes controlling trait variations can be identified using quantitative genetic tools [34]. Compared with traditional destructive methods, multiple properties of any point of interest can be modeled and assessed simultaneously from one image using HSI, which could further improve the throughput and reduce the cost of the measurement [28]. Furthermore, this fast and nondestructive phenotyping method could also be used on other economically important macroalgae such as Neoporphyra haitanensis, Undaria pinnatifida, and Saccharina japonica. The hyperspectral data collection process is faster and more convenient for most macroalgae.
As one of the most important macronutrients, N forms part of proteins, free amino acids, and chlorophyll molecules [63]. In N. yezoensis thalli, the phycobiliproteins and Chla contribute approximately 40% of cell proteins [5] and could therefore be used to assess N content. The prediction models of phycobiliprotein and Chla content developed in this study, which use an HSI system detecting wavelengths of 400 to 1000 nm, could be extended to the evaluation of N content in N. yezoensis; in higher plants, N content usually correlates strongly with wavelengths of 1,300 to 2,500 nm [64,65].

HSI on nondestructive prediction of phycobiliproteins and Chla contents
HSI, which combines spatial information and high-resolution spectral reflectance data, is a promising and noninvasive method for detecting biochemical variations in plants [31,66]. This study introduced the HSI system to predict phycobiliprotein and chlorophyll content in Neopyropia. On the basis of multivariate modeling analysis, the established models could be used for the quantitative prediction of PE, PC, and Chla with high accuracy (Tables 2 and 3). The HSI system not only provides numerical information on the pigment contents of the whole thallus rapidly and nondestructively but also visualizes the distribution of pigment content across a thallus. The spatial distribution of PE, PC, APC, and Chla observed for Neopyropia was not homogeneous throughout the individual thallus, and values at any point of interest on the thallus could be deduced from our models (Fig. 5). In addition, lower pigment concentrations were observed near the holdfast than in other parts, consistent with the basal region near the holdfast being the region of rapid cell division in N. yezoensis. Similarly, in Laminariales, a lower pigment content was found in meristematic cells at the bottom of the blade than in cells in other parts of the blade [67]. Overall, the prediction maps of the 4 pigments provided a visual representation of spatial heterogeneity within the thallus of N. yezoensis. To our knowledge, this study is the first successful estimation of phycobiliprotein and chlorophyll content in red algae using an HSI system.
The stress response in algae is complex and involves a variety of physiological and biochemical adaptations at different levels, leading to pigment changes and reallocation [Asaari et al., 2018;68]. The composition of PE and PC varies significantly during chromatic adaptation [5]. The elimination of nitrogen from the growth medium of red algae leads to the rapid degradation of PE, followed by the degradation of the PC hexamer [49]. In this study, Neopyropia thalli responded to various light qualities and nutrient conditions by producing different concentrations and proportions of pigments, which supplied data with an adequate range for modeling in the HSI system. The variation in pigment content can also be used to evaluate different responses and adaptive capacities under different stress conditions [69]. It is possible to detect genotypic differences among different strains by nondestructively observing the response of physiological and biochemical phenotypes to various stresses using the HSI system.

Advantages of multivariate modeling analysis based on full spectra for HSI
The spectral characteristics of N. yezoensis were different from those of higher plants. Absorption bands at approximately 500 and 570 nm were due to the unique absorption of PE [5]; meanwhile, a salient absorption near 620 nm should be attributed to the presence of PC [70]. Moreover, absorption features appeared near 440 and 660 nm owing to chlorophylls [71,72]. The absorption at approximately 970 nm may be related to the O-H stretching vibration [60,73]. Thalli with low pigment concentrations exhibit relatively high reflectance values throughout the entire spectral region [69]. These differences revealed that the physicochemical properties of the thalli were altered during the experimental treatments [74].
The potential of hyperspectral sensing has not been fully realized in previous studies, which commonly relied on simple vegetation indices derived from only 2 or 3 bands [75].
Vahtmäe et al. [45] used 9 vegetation indices to predict chlorophyll-a, chlorophyll-b, and carotenoid concentrations in the marine macroalgae F. vesiculosus, C. glomerata, Chara aspera, and Chara horrida. The results showed that the coefficients of determination (R²) between the pigments (chlorophyll-a and chlorophyll-b) and the vegetation indices ranged from 0.41 to 0.67, depending on the index, across the 4 macroalgae. Choo et al. [76] used the normalized difference vegetation index to predict chlorophyll content, with a correlation coefficient of 0.70. However, vegetation indices consist of only a small number of spectral bands and may therefore miss the bands characteristic of a trait [68], and the indices might not be suitable for diverse species [6]. Moreover, all complex compounds (such as pigments, lipids, and carbohydrates) contribute to the spectral characteristics of the blade [28]; therefore, specific compounds might not be predicted accurately using vegetation indices.
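To make the limitation concrete, the following minimal sketch computes the normalized difference vegetation index, (NIR - Red) / (NIR + Red), from a single reflectance spectrum. The band centres (670 and 800 nm), the helper names, and the spectrum itself are illustrative assumptions and are not the settings used in the cited studies or in this work.

```python
def band_index(wavelengths, target_nm):
    """Return the index of the band whose centre is closest to target_nm."""
    return min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))


def ndvi(wavelengths, reflectance, red_nm=670.0, nir_nm=800.0):
    """Normalized difference vegetation index from one reflectance spectrum.

    red_nm and nir_nm are illustrative band centres, not values from this study.
    """
    red = reflectance[band_index(wavelengths, red_nm)]
    nir = reflectance[band_index(wavelengths, nir_nm)]
    return (nir - red) / (nir + red)


# Hypothetical spectrum sampled every 100 nm across the 400 to 1000 nm range.
wavelengths = [400, 500, 600, 700, 800, 900, 1000]
reflectance = [0.05, 0.04, 0.06, 0.10, 0.40, 0.42, 0.38]
value = ndvi(wavelengths, reflectance)
```

Only 2 of the 7 bands enter the calculation, so any information carried by the remaining bands, such as the PE and PC absorption features near 500 to 620 nm, is ignored by the index.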
In this study, multivariate modeling analysis based on full spectra was used to process the image data and establish models of the photosynthetic pigment contents in N. yezoensis that are more accurate than vegetation indices. The R²_Train, R²_Test, and R²_W of the optimal models for PE, PC, and Chla concentration predictions were all greater than 0.9, and the MAPE values of the 3 pigments were all less than 15%, meaning that the models could be used for quantitative analysis with satisfactory accuracy and robustness (Tables 2 and 3 and Figs. 3 and 4). The PE prediction model showed the best performance, with a prediction error of 0.4804 mg/g in the range of 1.840 to 13.450 mg/g. The prediction errors for PC and Chla content were 0.3068 mg/g in the range of 0.960 to 5.270 mg/g and 0.1530 mg/g in the range of 0.290 to 2.230 mg/g, respectively. For APC content, the model could only be used for qualitative analysis, with the best R²_Test and RPD values of 0.8399 and 2.53, respectively, and a prediction error of 0.2232 mg/g in the range of 0.430 to 2.350 mg/g. This may be attributed to the small variation in the APC content under the experimental conditions [77] and the overlap of the absorption peaks of APC (650 nm) and Chla (660 nm). Multivariate modeling analysis plays an essential role in extracting meaningful information from wide-range spectral data for qualitative and quantitative analyses [57].
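For readers unfamiliar with these evaluation metrics, the sketch below shows one common way to compute R², RMSE, MAPE, and RPD from paired measured and predicted values. It assumes that RPD is the standard deviation of the reference values divided by the RMSE of prediction and that the sample (n - 1) standard deviation is used; the exact formulas applied in this study are not restated in this section, and the data are hypothetical.

```python
import math


def regression_metrics(measured, predicted):
    """Return R2, RMSE, MAPE (%) and RPD for paired measured/predicted values."""
    n = len(measured)
    mean_y = sum(measured) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    rmse = math.sqrt(ss_res / n)
    # Sample standard deviation (n - 1) of the reference values; some authors use n.
    sd = math.sqrt(ss_tot / (n - 1))
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "mape": 100.0 / n * sum(abs((y - p) / y) for y, p in zip(measured, predicted)),
        "rpd": sd / rmse,
    }


# Hypothetical pigment contents in mg/g, for illustration only.
measured = [2.0, 4.0, 6.0, 8.0, 10.0]
predicted = [2.2, 3.8, 6.3, 7.9, 9.6]
metrics = regression_metrics(measured, predicted)
```

Because RPD scales the prediction error by the spread of the reference data, a narrow range of reference values lowers RPD even when the absolute error is small, which is in line with the explanation given above for the weaker APC model.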
In our study, the SVR model performed better than the PLSR model for PC and APC, indicating a more precise prediction for these pigments, whereas the PLSR model performed best for PE and the 2 models performed almost the same for Chla. Mountrakis et al. [56] also reported that SVR often produces higher accuracy than traditional methods because of its ability to handle small training datasets successfully. The SVR models performed better than the PLSR models in the quantitative determination of biochemicals, such as phosphorus in seafood [57], soluble solid content in sweet potato [78], and soluble solid content in Agaricus bisporus [79]. Ge et al. [28] used Vis-NIR-SWIR to determine leaf physiological traits in maize, and the SVR model performed slightly better than the PLSR model for 5 traits.
Although the optimal models showed satisfactory results, they performed better on the laboratory samples than on the field-collected samples. This might be because the range of pigment content in the laboratory samples did not cover that of the field-collected samples; content outside the modeling range has been found to cause additional prediction errors [80]. In addition, other machine learning approaches, such as random forest, convolutional neural network, artificial neural network, and deep learning methods, could also be considered for robustness testing in future work. Moreover, the genetic variation between laboratory and field-collected samples may increase the complexity of the analysis [63,81]. Genetics and breeding with high-throughput phenotyping generally involve hundreds of genotypes; therefore, more N. yezoensis genotypes should be used to improve the applicability of the model in further studies.
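A simple way to screen for such extrapolation is to flag samples whose reference pigment content lies outside the range covered by the calibration data. The sketch below assumes that the ranges reported earlier in this section describe the laboratory calibration samples; the function name and the field values are hypothetical.

```python
# Pigment content ranges (mg/g) reported earlier in this section,
# assumed here to describe the laboratory calibration samples.
CALIBRATION_RANGE = {
    "PE": (1.840, 13.450),
    "PC": (0.960, 5.270),
    "Chla": (0.290, 2.230),
    "APC": (0.430, 2.350),
}


def outside_calibration(pigment, values):
    """Return the values that fall outside the calibration range of a pigment."""
    low, high = CALIBRATION_RANGE[pigment]
    return [v for v in values if v < low or v > high]


# Hypothetical field-sample PE contents (mg/g), not data from this study.
field_pe = [1.2, 5.6, 9.8, 14.1]
flagged = outside_calibration("PE", field_pe)
```

Samples flagged in this way are those for which the model has to extrapolate, so larger prediction errors would be expected for them.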