Arabidopsis phenotyping through geometric morphometrics

Abstract Background Recently, great technical progress has been achieved in the field of plant phenotyping. High-throughput platforms and the development of improved algorithms for rosette image segmentation make it possible to extract shape and size parameters for genetic, physiological, and environmental studies on a large scale. The development of low-cost phenotyping platforms and freeware resources make it possible to widely expand phenotypic analysis tools for Arabidopsis. However, objective descriptors of shape parameters that could be used independently of the platform and segmentation software used are still lacking, and shape descriptions still rely on ad hoc or even contradictory descriptors, which could make comparisons difficult and perhaps inaccurate. Modern geometric morphometrics is a family of methods in quantitative biology proposed to be the main source of data and analytical tools in the emerging field of phenomics studies. Based on the location of landmarks (corresponding points) over imaged specimens and by combining geometry, multivariate analysis, and powerful statistical techniques, these tools offer the possibility to reproducibly and accurately account for shape variations among groups and measure them in shape distance units. Results Here, a particular scheme of landmark placement on Arabidopsis rosette images is proposed to study shape variation in viral infection processes. Shape differences between controls and infected plants are quantified throughout the infectious process and visualized. Quantitative comparisons between two unrelated ssRNA+ viruses are shown, and reproducibility issues are assessed. Conclusions Combined with the newest automated platforms and plant segmentation procedures, geometric morphometric tools could boost phenotypic features extraction and processing in an objective, reproducible manner.

A description of all resources used, including antibodies, cell lines, animals and software tools, with enough information to allow them to be uniquely identified, should be included in the Methods section. Authors are strongly encouraged to cite Research Resource Identifiers (RRIDs) for antibodies, model organisms and tools, where possible.
Have you included the information requested as detailed in our Minimum Standards Reporting Checklist?
Availability of data and materials All datasets and code on which the conclusions of the paper rely must be either included in your submission or deposited in publicly available repositories (where available and ethically appropriate), referencing such data using a unique identifier in the references and in the "Availability of Data and Materials" section of your manuscript.  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 Introduction developing field, could give rise to reproducibility issues, due to different growing substrate-segmentation 48 algorithms combinations. Moreover, different approaches give sometimes the same name to different 49 parameters (e.g. "roundness" in ImageJ [11] vs. [7]) or different names to the same parameter (e.g. "solidity" 50 in [8] equals "compactness" in [4,7] and "surface coverage" in [2]). The need of developing objective, 51 mathematically and statistically sound and more accurate shape descriptors in plants has been stressed out in 52 recent reviews on the topic [12][13][14]. 53 Nonetheless, image datasets analyses require a conceptual and statistical corpus of knowledge that is not 54 always present in a plant biologist's research field. Plant phenotyping relies on skills and technologies that 55 are used to characterize qualitative or quantitative traits regardless of the throughput of the analyses [1]. One 56 such knowledge corpus is morphometrics [15]. 57 Traditional morphometric analyses such as measures and ratios of length, depth and width were widely used 58 in Paleontological and Zoological studies throughout the 20 th century. To the end of that century the seminal 59 work of [16] was re-evaluated under the light of multivariate analysis and novel mathematical developments 60 [17,18], giving rise to modern geometric morphometrics (GM), in which was called a "revolution" in 61 morphometrics [19][20][21]. 62 GM combines geometry, multivariate morphometrics, computer science and imaging techniques for a 63 powerful and accurate study of organismal forms. This family of methods in quantitative biology is proposed 64 to be the main source of data and analytical tools in the emerging field of phenomics [22]. Formally, GM is 65 "a collection of approaches for the multivariate statistical analysis of Cartesian coordinate data, usually (but 66   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 not always) limited to landmark point locations" (http://life.bio.sunysb.edu/morph/glossary/gloss1.html). 67 Landmark methods have been successfully applied to various species, and have the advantage of being easy 68 to understand [23]. 69 Besides enhanced statistical power and better descriptive and graphical tools, GM allow researchers to 70 decompose form in size and shape, and the whole configuration of the organism under study is analyzed, 71 rather relying on the description of relative displacements of pairs of points. 72 GM is now a mature discipline that has been widely applied in biology [24][25][26] (see [27] for a review). In 73 plants, leaves of grapevine [28] and oak [29,30] were studied using GM methods. 74 Plant viruses cause important worldwide economic losses in crops [31]. Symptoms include plant stunting, 75 changes in leaf morphology, and sometimes plant death [32] and vary depending on various aspects 76 including genetic compatibility and environmental conditions. 77 Even given a particular host-virus interaction, different viral strains trigger different symptomatology, which 78 are more or less subtle for the observer to distinguish [33][34][35]. Comparing the severity of qualitative viral 79 symptoms is a difficult task performed mainly by visually rating symptoms (e.g. [36]). Consequently, 80 morphological differences could be difficult to describe and reproducibility issues could arise. 81 Arabidopsis thaliana (L.) Heynh. has been extensively used in studies of influences of environmental factors 82 on plants, paving the path to the development and testing of experimental techniques or data analysis 83 methods [37]. The Arabidopsis rosette is a nearly two-dimensional structure in the vegetative phase [8], 84 which facilitates image acquisition and interpretation. 85 Here, it is proposed a case study where GM tools are applied to study and quantitatively describe 86 morphometric changes triggered in A. thaliana plants by RNA viruses belonging to two unrelated families. It 87 is proposed a particular selection of landmarks to locate in the Arabidopsis rosette during its vegetative 88 phase. The study spans from the earlier stages of viral infection to later ones, when symptoms are already 89 visually detectable by naked eye. Comparisons are made between discriminant power of computer-assisted 90 classification and expert human eye. Symptoms severity provoked by both viruses is also compared, based 91 on the relative morphometric changes induced relative to healthy controls. Changes in allometric growth, 92 phenotypic trajectories and morphospace occupation patterns are also investigated. Size analyses are also 93 performed and the problem of growth and development modeling is discussed in the context of viral 94 infections. Throughout this work, several bioinformatics resources are applied, in order both to extract the 95 higher degree of information available, but also to exemplify different and complementary possibilities that 96 nowadays GM offers for the accurate description of shape in Arabidopsis. 97 98 99   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 Plant growth conditions 100 A. thaliana Col-0 seeds were stratified at 4°C for 3 days. Plants were grown under short days conditions (10 101 h light/14 h dark cycle, T(°C)= 23/21, Hr(%)= 60/65, and a light intensity of 150 μE m-2 s-1) in a controlled 102 environmental chamber ( landmark. Landmarks 1, 2, 3, 4 and 5 (which are located at the tip of leaves #8 to #12 and are the maxima of 128 curvature of that structure) and landmarks 6, 7, 8, 9 and 10 (which are located at the intersection of the 129 petiole and the lamina of each leaf from #8 to #12) cannot be unambiguously assigned due to the continuous 130 nature of the leaf curvature and are Type 2 landmarks. Each specimen was digitized in less than 1 minute. 131

Materials and Methods
The Output of TPSDig2 is a .TPS file containing information about specimen name, scale factor, and raw 132 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 coordinates of each landmark for all specimens digitized. Landmark digitization was repeated to estimate 133 measurement error for each specimen a week after the first digitization. 134 Validation of the tangent space approximation 135 For a given M-dimensional structure with K landmarks (here, M = 2 and K = 11) it can be imagined an 136 individual´s shape as a point in an M X K multidimensional space (a hypersphere). After centering and 137 rescaling, 3 dimensions are lost and shapes are said to be in a pre-shape space; they are not rotated yet. The 138 distance in the surface of the hypersphere at which rotation differences between shapes are minimal, is called 139 Procrustes distance. Afterwards, a reference (average) shape is selected and all other shapes are rotated to 140 minimize distances relative to it, generating a shape space and losing one more dimension (remaining 2K-4). 141 Because distances over curved multidimensional spaces are non-Euclidean, conventional tools of statistical 142 inference cannot be used. Fortunately, for most biological shapes an approximation to Euclidean distances is 143 valid, by projecting shape points to a tangent Euclidean space (for a visual explanation see [44]). This 144 assumption should, however, be tested when new forms are being analyzed. The TPSSmall program is used 145 to determine whether the amount of variation in shape in a data set is small enough to permit statistical 146 analyses to be performed in the linear tangent space approximate to Kendall

163
This work aims to introduce the use of GM tools for the analysis of Arabidopsis rosette phenotypes in an 164 objective and repeatable way. As such, it is not intended to offer a complete introductory explanation of each 165   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 GM tool, an objective that is beyond the scope of this paper. Such a task was already performed by [30] and 166 for a complete introductory explanation of GM tools applied in biological systems it is recommended the 167 lecture of [44]. Software used in this work frequently has its own user´s manual and informative examples 168 [42,46,47,50]. Nevertheless, with the purpose to facilitate the comprehension of this work to newcomers in 169 the field of GM, each tool is briefly described prior to its application throughout the Results section. 170 Morphometrics aims at analyzing the variation and covariation of the size and shape of objects, defining 171 altogether their form. Shape and form might be confusing words, used as synonyms in many languages [10]. 172 Hereafter, it will be followed the GM definition of shape in the sense of [17] that it is "all the geometric 173 information that remains when location, scale and rotational effects are filtered out from an object". 174

Landmarks digitization, Procrustes fit and outliers detection 175
At the heart of GM analyses is the concept of landmarks. Landmarks are discrete anatomical loci that can be 176    ones, but also be placed in regions that experience dramatic changes upon infection. A relatively reduced 196 number of landmarks can be used to describe complex forms [30,54]. 197 An 11-landmark configuration for the Arabidopsis rosette is shown in Figure 1A (see Materials and 198 Methods). Plants were inoculated in their third true leaf (24 plants were mock-inoculated and 17 were 199 ORMV-infected) and images were acquired starting from three days post-inoculation (DPI) to 12 DPI (see 200 Materials and Methods). Leaves below number 8 were not chosen for landmark placement for three main 201 reasons: a) they are hidden for younger leaves at later stages of infection b) these old leaves had almost 202 finished their growth by the time the first photographs were taken (and the form covered would be a less 203 informative one for the process of shape and size change upon viral infection) and c) the senescence process 204 of older leaves lead to morphological changes derived from dehydration and death. Younger leaves (beyond 205 leaf number 12) were not chosen because they were not present at the earlier stages of infections, therefore 206 violating the requisite of repeatability of landmarks. 207 A flowchart of data analyses throughout this paper is shown in Figure 1B were also done to perform further grouped analyses. 216 The first step of shape analysis in GM consists in extracting shape coordinates from raw data obtained at the 217 digitization step. The procedure that has become standard in GM studies is the Generalized Procrustes 218 Analysis (GPA). 219 The purpose of Procrustes procedures is to remove from the specimens all information that is not relevant for 220 shape comparisons, including size. Specimens are firstly translated at the origin ("superimposed") by 221 subtracting the coordinates of its centroid from the corresponding (X or Y) coordinates of each landmark. 222 Then, differences in size are removed rescaling each specimen to the mean centroid size (CS) (CS is 223 calculated as the square root of the summed squared distances of each landmark from the centroid, giving a 224 linearized measure of size). Differences in rotation are eliminated by rotating specimens minimizing the 225 summed squared distances between homologous landmarks (over all landmarks) between the forms. The 226 process starts with the first specimen, and then an average shape is found that now serves as a reference. It 227 proceeds iteratively over all specimens until no further minimization of average distances are found [55]. 228 MorphoJ performs a full Procrustes fit, which is a variant of the analysis that is more conservative and 229 resistant to outliers of shape. 230 In Arabidopsis, the arrangement of organs along the stem (phyllotaxy) follows a predictable pattern, the 231 Fibonacci series. Phyllotaxy orientation can be either clockwise or counter-clockwise [56]. This should be 232 taken into account because clockwise and counter-clockwise rosettes are biological enantiomorphs like right 233 and left hands and must not be superimposed by GPA. Opportunely, MorphoJ automatically performs 234 reflections on every specimen when executing a GPA and therefore it is not a problem at this stage, but care 235 must be taken when using different software. 236 After executing a full Procrustes fit of each dataset, they were inspected for the presence of outliers. The 237 shape of one Mock-inoculated plant (M2) diverted the most from the rest in 11 out of 16 datasets. Therefore, 238 it was excluded from all datasets for successive analyses. 239 Afterwards, datasets were combined and the "Combined dataset 3-12 DPI" was created with 640 240 observations included following a common GPA. Then, a wireframe was created that connects consecutive 241 landmarks. This tool aids visualization, as will be explained later. Next, the "Combined dataset 3-12 DPI" 242 was subdivided by DPI. This creates one dataset for each DPI, each one with the two digitization outputs for 243 each plant. 244

Validation of the tangent (Euclidean) space approximation 245
Prior to conducting further analyses, a basic assumption of GPA-based GM analysis should be tested: that the 246 projections of shapes in Kendall´s shape space onto a tangent Euclidean shape space are good 247 approximations for the studied shapes. This task is performed by basically comparing the Procrustes 248 distances (the conventional measure of a morphometric distance in geometric morphometrics [57]) obtained 249 using both shape spaces (see Materials and Methods). Two subsets of data were created for each DPI, one 250 with Mock-inoculated and the other with ORMV-infected plants. Next Figure 1). This results are in 261 line with several similar analysis performed onto a variety of biological forms [30,[58][59][60]. 262 Testing measurement error and variation between treatments using Procrustes ANOVA 263 As mentioned before, two digitizations were performed on each plant at each DPI, in order to evaluate 264 measurement error. This procedure is important because digitization error should always account for far less 265 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 variance in the subsequent analyses than specimens and treatments do [30]. There are the differences 266 between specimens and particularly between treatments that are worth investigating, not human error in 267 landmark placement. Purposely, datasets for each DPI were combined and subjected to a hierarchical 268 analysis of variance (ANOVA). In MorphoJ this is a Procrustes ANOVA, with "Treatment" as an additional 269 main effect, "ID" for the individuals and "Digitization" as the Error1 source. In Procrustes ANOVA, 270 variance is partitioned by means of hierarchical sum of squares (SS) which implies that each effect is 271 adjusted for effects that appear earlier in the hierarchy. This is taking into account the nested structure of the 272 data (an issue that is crucial if the design is unbalanced, i.e., with unequal sample sizes as is here the case), 273 thus allowing one to quantify differences in Treatments and plants regardless of Treatment. The variance 274 unexplained by any of these effects is measurement error and it is estimated using the differences between 275 digitizations. Hence, total variance was decomposed into main (treatment) and random (ID and digitization) 276 components and was expressed as a percentage of total variance for each DPI. The analysis is executed 277 simultaneously for both size and shape. Results are shown in Supplementary Table 2. Explained variance (as 278 a % SS) for which measurement error accounted for was in the range of 0.01 and 0.12 for size and 0.40 and 279 1.15 for shape over all DPIs. Thus, measurement error was negligible throughout the digitization process. 280 Detailed analysis of results shown in Supplementary Table 2 revealed that for size, the Individual (ID) effect 281 was highly significant at each DPI as evidenced by Goodall´s F-test (p < 0.0001). Treatment effect was 282 insignificant from 3 to 5 DPI but starting from 6 DPI the virus affected the plant´s size (0.0001 < p < 0.03). 283 For shape, the Individual effect was also highly significant at each DPI as evidenced both by Goodall´s F-test 284 (p < 0.0001) and by MANOVA (standing for Multivariate Analysis of Variance) results (p < 0.0001). 285 Treatment impacted earlier shape than size, since as soon as 5 DPI differences in shape were detected (p < 286 0.001). 287

PCA 289
Once shape variables (the 22 Procrustes Coordinates) are extracted for all specimens at each DPI, it is useful 290 to plot differences between individuals and treatments. However, patterns of variation and covariation 291 between lots of variables are difficult to envision particularly because geometric shape variables are neither 292 biologically nor statistically independent [44]. PCA is a technique that allows simplifying those patterns and 293 making them easier to interpret. By performing a PCA, shape variables are replaced with complex variables 294 (principal components, PCs) that do not covary but carry all the information. Moreover, as PC axis are 295 orthogonal and independent, and most of the variation in the sample usually can be described with only a few 296 PCs, shape analysis could be restricted to very few axes, avoiding the need of jointly interpret dozens of 297 variables. It is important to keep in mind that PCA is useful for the comparison between individuals, not 298 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 groups, and though a powerful descriptive tool, it does not involve any statistical test. Therefore, the relative 299 separation of groups in a PCA plot does not allow one to extract conclusions about significant differences (or 300 its absence). 301 Firstly, this technique was used to inspection error measurement (previously quantified by Procrustes 302 ANOVA, Supplementary Table 2). A covariance matrix was created for the "Combined dataset 3-12 DPI" 303 and then a PCA was performed. Scatterplots were generated for the first 4 PCs, which together account for 304 87.2 % of the total variance ( Figure 2). The proximity of equally colored points indicates a small digitization 305 error. 306 As measurement error explained a negligible percentage of variance, digitizations were averaged within 307 specimens and DPIs. From the "Combined dataset 3-12 DPI" it was created the "Combined dataset 3-12 DPI, 308 averaged by ID DPI" dataset, which contains all the 320 averaged observations. The averaged data were used 309 to find the directions of maximal variance between individuals. A covariance matrix was generated and a 310 PCA performed. PC1 accounted for 64.2 % of total variance and the first 4 PCs summed up to 87.4 % of it. 311 PCs 4 and beyond accounted for less than 5 % of variation each and are therefore of little biological interest. 312 Afterwards, PCA output was used for which is one of its main purposes in GM: visualization of shape 313 change. Three types of graphs were obtained: PC shape changes (a diagram showing the shape changes 314 associated with the PCs); Eigenvalues (a histogram showing the percentages of total variance for which the 315 PCs account) and PC scores (a scatterplot of PC scores). 316 PC Scatterplots show the distribution of specimens along the axes of maximum variance ( Figure 3A, B). To 317 aid visualization, dots corresponding to early stages (3-6 DPI) were lightly colored and later ones (7-12 DPI) 318 were darker colored. Results evidenced that PC1 is a development-related axis, because clearly separated 319 early (mostly negative values) from late (positive values) stages of the experiment ( Figure 3A). Moreover, at 320 later stages ORMV-infected plants scored less positive in this axis, suggesting that infected plants retained a 321 more juvenile (pedomorphic) shape. Positive extremes of PC2-4 are related to ORMV shapes. 322 To this point, GM visualization tools are used to better understand what these relative positions on 323 scatterplots mean respect to shape differences. 324

Visualization of shape changes 325
Prior to showing graphs from the "PC shape changes" tab, a brief description of common GM visualization 326 tools is needed in order to accurately interpret the results. After the GPA, every configuration in the sample 327 is optimally aligned to the average configuration and nearly optimally aligned to every other configuration in 328 the sample. GPA removed differences attributable to size, position and orientation from configurations. All 329 differences that remain are shape variation. Accordingly, shape differences are found using the relative 330 displacements of the landmarks from one shape to another shape nearby in shape space [61]. By 331   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 superimposing a target shape to a starting shape and looking for the relative displacement of homologous 332 landmarks from one shape to another, insights on how variation between shapes occurs can be obtained and 333 hypotheses about the underlying mechanisms, proposed. 334 A key concept to bear in mind is that it is fundamentally wrong to consider landmarks displacements in an 335 isolated manner [44,61]. This is because all the landmarks included in the GPA jointly determine the 336 alignment of each configuration in relation to the mean shape. Then, variation in the position of each 337 landmark after superimposition is relative to the positions of all other landmarks. Though a shift is shown at 338 every landmark, these shifts are relative to all other landmarks. Lollipop and wireframe graphs are based on 339 these assumptions (see below). 340 Shape variation could be depicted by means of transformation grids. Transformation grids are 341 mathematically constructed following the thin-plate spline technique, whose detailed explanation is far from 342 the objective of this work and has been explained elsewhere (Klingenberg, 2013;Zelditch et al., 2012). 343 Briefly, landmarks of a starting shape are placed on a grid of an imaginary infinitely thin metal plate. 344 Landmarks of a target configuration are placed on another grid of equal characteristics, and both metal sheets 345 are superimposed. Each landmark in the starting shape (e.g., mean shape) is linked to its homologous to 346 reach the target configuration, and the deformation caused in the spline is calculated finding the smoothest 347 interpolating function that estimates energy changes in the spline between landmarks. Importantly, Hence, by depicting the -PC1 component, target shapes have negative values ( Figure 3C). It can be seen that 357 -PC1 explains the relative shortening of leaves #11 (landmarks 4 and 9) and #12 (landmarks 5 and 10). It 358 makes sense, since younger plants have yet to develop these relatively new leaves. Petioles of leaves #10, 359 #11 and #12 are particularly shortened. Relative to these shortenings, older leaves (#8 and #9) are relatively 360 longer but, interestingly, only its laminae, since its petioles are not relatively elongated. Taken together, PC1 361 reveals that ORMV impaired the elongation of newer leaves to their normal extent. PC2 ( Figure 3D) 362 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 associates with relative radial displacements of leaves; tips of leaves #8 and #9 (landmarks 1 and 2) come 363 close together, lowering the typical angle between successive leaves from near 137.5° to close to 90°. These 364 relative displacements determine that leaves #9 and #10 form an exaggerated angle of near 180°. PC3 ( Figure  365 3E) also mostly relates to radial changes in the infected rosette: leaf #10 is relatively displaced towards leaf 366 #8 and the main effect is, again, the increase of the angle between leaves #9 and #10 to near 180°. PC4 367 ( Figure 3F) explains less proportion of total variance (4.5%) and its effect is less clear; it is mostly related to 368 the relative displacement of the lamina of leaf #11 towards leaf #9, almost without altering its petiole, which 369 functions as a hinge. Leaves #9 and #10 are, as a combination of the effects depicted by PC2 and PC3, both 370 relatively displaced towards leaf #8. Taken together, the wireframe visualization of the first four PCs (which 371 account for more than 87% of total variance) show that ORMV induces the relative shortening of laminae 372 and (especially) petioles of newest leaves, which relates to a pedomorphic shape, and provokes the relative 373 displacement of leaves #9 and #10 towards leaf #8.  Figure 3G). It can be directly compared with Figure 3C. 380 Lastly, transformation grids are exemplified for -PC1 in Figure 3H-I. Figure 3H depicts the starting (mean) 381 shape. Figure 3I show the transformed grid for -PC1. The compression of the grid in the central zone is the 382 result of the relative displacement of landmarks 3, 8 (leaf #10), 4, 9 (leaf #11) and 5, 10 (leaf #12) towards 383 the center of the rosette, whereas grid stretching is detected around landmarks 1 and 2 (revealing the relative 384 expansion of laminae of leaves #8 and #9, since its petioles remain relatively immobile, landmarks 6 and 7). 385 As stated above, visualization with these grids should be cautiously interpreted since the interpolation 386 function deforms the grid between places where no landmarks are placed (and no information about even the 387 existence of tissue is guaranteed). Therefore, only regions near landmarks should be discussed when viewing 388 these graphs. To see these changes in more detail, PCA analyses were performed for each DPI. The 389 "Combined dataset 3-12 DPI, averaged by ID DPI" was subdivided by DPI performing a common Procrustes 390 fit, creating eight new datasets (DPIs) (raw data in Supplementary file ORMV.morphoJ) . Covariance 391 matrices were generated and a PCA performed for each DPI dataset. PC1 accounted for between 27-43 % of 392 total variance and the first 4 PCs summed up from 78 to 84 % of it. PCs beyond PC4 accounted for 5 % or 393 less of variation each. Shape change visualization showed that PC1 gradually separated specimens belonging 394 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 to different treatments. Mock-inoculated plants were progressively more aligned with positive PC1 values. 395 PC2 was more generally related to ORMV-infected plants in its positive values. Relative shortening of 396 younger leaves and petioles, and relative displacement of leaves towards leaf #8 were progressively more 397 accentuated (Supplementary Figure 2). 398

Discriminant Analysis 399
So far, differences between individuals were addressed with the aid of PCA. Afterwards a Discriminant 400 Analysis (DA) was performed to test whether differences between treatments exist. 401 DA is a technique mathematically related to PCA. It finds the axes that optimize between-group differences 402 relative to within group variation. It can be used as a diagnostic tool [44]. It is here used for testing 403 treatments using a series of tests for sample mean differences including an estimate of the accuracy of shape 404 in predicting groups. The capability of DA to correctly assign specimens to treatments was assessed along 405 the experiment using the averaged datasets for each DPI. In MorphoJ, Discriminant Function Analysis was 406 requested selecting "Treatment" as classification criterion. By default, DA in MorphoJ performs a parametric 407 Hotelling's T-square test, and here there were also requested permutations tests for the Procrustes and 408 Mahalanobis distances with 1000 random runs. Hotelling's test is the multivariate equivalent of the common 409 Student´s t-test. Procrustes and Mahalanobis distances show how far shapes from one group are from the 410 mean of the other group. Results of the tests are shown in Table 1. At 5 DPI the three tests found shape 411 differences between treatments (0.001<p<0.005). From 6 DPI and beyond, p-values were extremely 412 significant (p<0.0001). These results coincide with those obtained by Procrustes ANOVA of shape 413 (Supplementary Table 2). DA maximizes group separation for plotting their differences and predicting group 414 affiliation (classification). The classification of a given specimen (through the discriminant axis) is done 415 using functions that were calculated on samples that included that same specimen (resubstituting rate of 416 assignment). Then, a degree of over-fitting is unavoidable and leads to an overestimate of the effectiveness 417 of the DA. To overcome this problem, one solution is to use a cross-validation or jackknife procedure 418 [30,44]. Jackknife procedure leaves one specimen at a time not used for constructing the Discriminant 419 function and then tests the rate of correct specimen assignment. Only jackknife cross-validated classification 420 tables provide reliable information on groups. Results of DA in group assignment are shown in Figure 4 for 421 3, 7 and 12 DPI and detailed for all DPIs in Supplementary Table 3. As expected, resubstitution rates of 422 assignment (Figure 4A, D, G) were higher than jackknifed counterparts ( Figure 4B, E, H), but the latter 423 reached high levels of accuracy (≥90 %) from 6 DPI and beyond (Supplementary Table 3). To test whether 424 this level of accuracy was indeed good, these results were compared with classification/misclassification 425 tables completed by human observers. The entire image dataset of 7 DPI was given to three expert 426 researchers working with Arabidopsis (one of the authors (S. Asurmendi) and two other researchers from 427 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 another Institution). They were all blind to the assignment of treatments to each plant, except for one Mock-428 inoculated and one ORMV-infected plant that were given as references. Wireframe graphs for 3, 7 and 12 DPI ( Figure 4C, F, I) show the difference from Mock group to ORMV 433 group. There is little difference at 3 DPI, if any ( Figure 4C), consistently with nonsignificant differences 434 found by DA at this stage. At 7 DPI (Figure 4F), the relative shortening of leaf #11 (landmarks 4 and 9) is 435 evident, as is the relative increase in the angle between leaves #9 and #10. These tendencies persisted at 12 436 DPI ( Figure 4I). At this stage, petioles of leaves #11 and #12 are strongly relatively shortened. These results 437 resemble those obtained in Figure 3C-F and approximately summarize shape changes explained by the first 4 438 PCs, indicating that these shape differences not only separated juveniles from adults but are also hallmarks of 439 shape change induced by ORMV. These results are interesting because discriminant axes not necessarily 440 resemble PCA axes [44]. 441

Allometric patterns and size correction 442
As ORMV induced not only changes in shape, but also in size (Supplementary Table 2) it is worth 443 investigating whether shape changes are associated to size differences. In principle, group differences could 444 arise if individuals of one group are different in shape because they grew faster than the other group´s 445 individuals and reached earlier a more advanced developmental stage. The association between a size 446 variable and the corresponding shape variables is called allometry. Isometry, by contrast, is the condition 447 where size and shape are independent of each other and usually serves as the hypothesis null. These concepts 448 are rooted in the Gould-Mosimann school of allometry that conceptually separates size and shape [62]. 449 Though size had been removed from forms after GPA, leaving shape differences free of it, there could be a 450 linear relationship between them. Allometry can be statistically tested for by tests of multiple correlation. 451 When groups are present, a single regression line through all groups cannot be fit to test allometry because 452 lines could have group-specific slopes or intercepts [30]. As TPSRegr (see below) uses raw data coordinates 453 and averaged by ID DPI datasets in MorpohoJ do not have them, these analyses were carried on with 454 individual datasets from only the 1 st Digitization. As proven earlier, differences between digitizations were 455 negligible (Figure 2 and  regressions (which correspond to allometric variation of shape) are shown in Figure 5A. Allometry 460   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 accounted for moderate to high proportions of the total shape variation since SS reached values of 36% at 6 461 DPI (Mock). ORMV induced a reduction in the allometric component of shape variation as evidenced by 462 lower predicted SSs along the experiment and non-significant values of allometry for all except 4 and 5 463 DPIs. For both treatments and particularly for healthy controls, a bell-shaped curve is detected and a 464 maximum of allometry is seen at 6 DPI for Mock plants but a day before for ORMV. Differences between 465 treatments start sharply at 5 DPI, when allometry accounts for 32% of predicted SS for Mock but only 20% 466 for ORMV. This analysis shows that for ORMV, shape variation is much less driven by size heterogeneity 467 (at a given DPI) and that for Mock plants this situation (isometry) occurs at later stages of development (10-468 12 DPI). 469 When at least one group has regression slopes different from zero a series of tests could be done in order to 470 control for size and repeat analyses to assess whether differences in shape are actually the result of size 471 variation only [29,30,44,62]. TPSRegr (v. 1.41) was used firstly to determine whether group-specific slopes 472 were parallel at each DPI (3-8). Only at 3 and 4 DPI this occurred (p > 0.05, slopes not statistically 473 significant). As slopes were found to be parallel, it is possible to test whether they are separate parallel slopes 474 or coincident (same Y-intercept). TPSRegr tests demonstrated that slopes are coincident (p > 0.05). Then, 475 size-corrections could only be done for 3 and 4 DPI, since from 5 to 8 DPI slopes were different (p < 0.05) 476 and groups follow its own allometric pattern and for 10 and 12 DPI there are isometry and size do not  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  However, as pointed out by [63], no one method of disparity measurement is sufficient for all purposes. 520 Using a combination of techniques should allow a clearer picture of disparity to emerge. With this aim, 521 another available approach to compare shape trajectories through multivariate morphospace was used. 522 Originally developed to study unequal morphological diversification in a clade of South American fishes 523 [64], this approach is useful because allows investigating whether a group "explores" different amount of 524 morphospace than others, additionally to possible differences in magnitude of phenotypic evolution. 525 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 Moreover, density parameters (D) could be calculated to determine whether the amount of morphological 526 change is more or less constrained in the morphospace. 527 The method was adapted to the present case study: as there is not a phylomorphospace and both treatments 528 lack a "common ancestor" but each plant follow its own independent ontogenetic path, nodes and branches 529 do not exist. Rather, each plant possesses its own trajectory without points in common. Taken 3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 calculations were not statistically significantly different (D2(Mock) = 111.47 vs. D2(ORMV) = 146.34, p = 559 0.25051, Mann-Whitney test), although ORMV-infected plants had a higher average density. These 560 differences in density measures could arise from the fact that hypervolume calculations can produce values 561 that are extremely small and variable. Since the hypervolume is calculated by taking the product of univariate 562 variances, any axis or axes with negligible variance will produce a value of hypervolume close to zero. 563 Moreover, all multiplied variances are given the same weight and consequently, PC axes that represent a 564 minimal percentage of the total variance could distort conclusions obtained with more informative axes. 565 Thus, hypervolume can be very sensitive to variation in a single character. To avoid this issue, only the axes 566 with significant variances are chosen to represent the disparity among points in morphospace [63]. Therefore, 567 the analysis was repeated including only the first three PCs, which accounted for more than 95% of variance. 568 Results were similar to previously obtained for all parameters but hyperellipse´s volumes were found to be Mann-Whitney test) ( Table 2). 572 Taken together, PTA showed that Mock-inoculated and ORMV-infected plants follow separate paths through 573 morphospace. They differ in length, direction and shape (Figure 6), but also explore distinct regions of 574 morphospace in a disparate quantity. Control plants experience more diversification of shape, as evidenced 575 by the comparative length of trajectories (MD and MPL), have a higher amount of difference between shape 576 states through the experiment in morphospace (∑Var) and explore more ample regions of morphospace (D1, 577 D2) ( Table 2). On the whole, ORMV infection not only alters the direction of ontogenetic shape development 578 but also diminishes shape change. 579

Growth and Development modeling 580
Even after finding that Mock-and ORMV-infected plants follow different ontogenetic trajectories of 581 shape it is possible to compare their rates and timings of growth and development. When groups have 582 different ontogenetic trajectories of shape, it is necessary to use a formalism that can be used when 583 treatments follow group-specific ontogenetic trajectories [44]. One such possibility is to compare the rates 584 and timings at which groups depart from their own juvenile forms [65], an approach that can be applied to 585 compare growth [66] and development [67] rates between groups with different ontogenetic trajectories. To 586 linearize the relationship between size and age, ln(CS) was regressed on ln(DPI) and growth rates were 587 compared. Results showed higher growth rate for Mock (1.06, CI95%=1.00-1.12) than for ORMV (0.72 588 CI95%=0.64-0.79) (p(same slope)= 3,0309E-12) ( Figure 7A). Lack-of -fit was assessed for both regressions and 589 rejected (p= 0.9975 and p= 0.3144 respectively), thus indicating the goodness of fit for both linear 590 regressions. To compare developmental rates, it was measured the rate at which shape progressively 591 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 differentiates away from that of the youngest age class (3 DPI) from 4 to 12 DPI. The degree of 592 differentiation is measured by the morphometric distance between each individual and the average of the 593 youngest age class [67], using Euclidean distances as approximations of Procrustes distances (Supplementary 594 Figure 1 and Supplementary Table 1). Linear regressions with Euclidean distances (D) as a dependent 595 variable and ln(DPI) as a regressor indicated a higher developmental rate for Mock-(0.34, CI95%=0.32-0.36) 596 relative to ORMV-infected plants (0.24, CI95%=0.22-0.26) (p(same slope)= 5,4657E-13) ( Figure 7B). Lack-of -fit 597 was rejected (p= 0.1626 and p= 0.3278 respectively). Healthy controls depart more from its own juvenile 598 shape from 8 DPI and beyond ( Figure 7C), indicating that developmental change was relatively impaired by 599 ORMV. Together, these results indicated that ORMV reduced both growth and morphological change. Watson panel test (Appendix S1). When the residuals are autocorrelated, it means that the current value is 611 dependent of the previous (historic) values and that there is a definite unexplained pattern in the Y variable 612 (Euclidean distance in this case study) that shows up in the disturbances. As a basic assumption of these 613 analyses is the independence of residuals and particularly, their absence of autocorrelation, neither analyzed 614 model fitted the development accurately. The same problem was found when the five growth models were 615 applied to study growth, regardless choosing CS or ln (CS) as the dependent variable and DPI or ln(DPI) as 616 the regressor (data not shown). An explanation for the autocorrelations of residuals is that data from 617 successive DPIs (intra-group analyses) are not independent: every specimen is recorded at each DPI. This 618 leads to a multivariate longitudinal data analysis situation, a branch of statistical analysis that has been 619 recently addressed following different approaches [68], and whose level of complexity is beyond the scope 620 of this work. 621 However, intra-treatment paired comparisons of shape are possible using a paired Hotelling´s test (a 622 multivariate analog of the paired t-test). It was found a strong effect of time on shape and differences are 623 extremely statistically significant for Mock plants (Supplementary Table 4).  624   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 Basic rosette growth and development parameters were studied (Figure 7). Rosette Growth, 625 Absolute Growth Rate (AGR) and Relative Growth Rate (RGR) had been proposed as measures of rosette 626 expansion and its velocity and rate, respectively [2,4]. Mock-inoculated plants were statistically significantly 627 bigger from 6 DPI and beyond ( Figure 7D). The graphic lacks the typical sigmoidal shape of growth curves, 628 probably because early stages of development (when landmarks used in this work were not present yet) were 629 not included in the analysis and plant growth had not reached its plateau phase at 12 DPI yet. AGR and RGR 630 analyses revealed an early change in growth tendencies between treatments. As early as between 3 and 4 631 DPI, (two days before the detection of significant differences in rosette area) ORMV started to slow rosette 632 growth relative to healthy controls ( Figure 7E-F). AGR graphic ( Figure 7E)  before the mean CS is found to be statistically significantly different from healthy controls, and that the 645 acceleration of growth, which is characteristic from several growth models, is impaired by the virus. 646 A similar approach was followed to investigate developmental differences between groups. Mean 647 developmental rates were calculated between consecutive DPIs ( Figure 7H). Mock -inoculated plants 648 showed a higher mean developmental rate from 5 DPI to the end of the experiment. The Mean 649 Developmental Acceleration ( Figure 7I) showed a more complex pattern: Mock-inoculated plants peaked at 650 5 DPI and after that, a deceleration of development was detected until 12 DPI. In infected plants, though, 651 acceleration reached a maximum at 6 DPI and then sharply decreased towards more negative values than 652 control plants, indicating a relative stagnation in morphological change. At 12 DPI ORMV induced a less 653 negative value of Mean Acceleration of development than controls, although its velocity remained lower 654 (( Figure 7H-I). 655 Taken together, these results show that ORMV impacts both growth and development very early after 656 infection. Whereas a direct measure (CS) detected differences between treatments at 6 DPI, more elaborated 657 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 parameters (AGR, RGR and Mean Developmental Acceleration) allowed differences to be detected as soon 658 as 4 DPI. Growth and developmental patterns differed between treatments in a dissimilar manner: AGR 659 showed differences in growth velocities at 4 DPI, whilst Mean Developmental Rate was clearly different 660 later on. Acceleration graphics ( Figure 7G,I) indicated that ORMV has an early effect in decelerating both 661 growth and development, but the latter was more dramatically affected in comparison with relative growth 662 deceleration whose decrease was more or less stepwise. Mock-inoculated plants peaked developmental 663 acceleration at 5 DPI and growth acceleration the following day. ORMV-infected plants peaked 664 developmental acceleration at 6 DPI but lost the subsequent growth acceleration phase ( Figure 7G,I). This 665 and other comparisons indicate that ORMV does not just induce delayed growth or morphological change 666 patterns, but a more radical change in the coordination of both parameters. 667

Comparison with TuMV infections 668
As stated earlier, one goal of applying the GM approach to the study of Arabidopsis is to make 669 phenotypic comparisons in a more objective and repeatable manner. To this end, the same experimental setup 670 was applied to the study of viral infections of A. thaliana with TuMV, an ssRNA+ virus unrelated to ORMV 671 (http://viralzone.expasy.org/). The experiment spanned from 4 to 10 DPI since at 12 DPI excessive curling of 672 some leaves induced by TuMV impaired the correct assignment of landmarks (Supplementary file TuMV  673 1st.morphoj). Individual datasets were created for each DPI and Procrustes Coordinates extracted. A 674 combined dataset was created and PCA carried on. After outliers exclusion, 27 Mock and 14 TuMV-675 inoculated plants remained. PCA revealed that PC1 accounted for 49.2% of total variance (much less than 676 the ORMV experiment accounted for) and PC1 plus PC2 accounted for 69.3% of total variance. Again, PC1 677 mostly separates juveniles from adult rosettes and negative values related predominantly to infected plants 678 which retained a more immature phenotype ( Figure 8A). It was supported by the associated wireframe graph 679 which depicts a relative shortening of leaves #11 and #12, similarly to ORMV-infected plants ( Figure 3C). 680 PC2 was strongly positively related to infected plants and, similarly to the ORMV case ( Figure 3D showed that, similarly as observed with ORMV, group means were statistically significantly different 684 starting from 5 DPI. Wireframe graphs also evidenced a strong relative shortening of the petioles, similarly 685 to that had been found under ORMV infections ( Figure 4F,I), indicating that more compact rosettes are a 686 common outcome of these viral infections. Discriminant power was slightly higher for almost all DPIs in the 687 case of TuMV (Supplementary Table 5, Supplementary Table 3). Moreover, Procrustes Distances were 688 higher for every DPI in the case of TuMV, which induced a Procrustes separation at 8 DPI only matched at 689 12 DPI for ORMV-infected plants (Supplementary Table 5, Table 1). These results suggest that TuMV is a 690 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 more severe virus than ORMV is in Arabidopsis, since it induces a more pronounced departure from Mock 691 mean shape. 692 PTA supported this evidence: A subset of 4-10 DPI datasets were selected to compare ORMV with TuMV 693 infections (Table 2, Figure 9A-B). Whilst trajectory size difference (MDMock,TuMV) was similar to the obtained 694 with ORMV, the multivariate angle (θMock,TuMV) that separates infected form healthy trajectories more than 695 doubled that of ORMV. Shape differences (DShapeMock, TuMV) between trajectories almost doubled. The 696 majority of the other measures indicated a slower rate of shape change relative to Mock plants, similarly to 697 ORMV infection, but relatively less marked (Table 2). To visualize and compare shape changes, 698 transformation grids with Jacobian expansion factors and lollipops were done in PAST for 10 DPI plants 699 ( Figure 9C-D). Both viruses induced relative contraction of the rosette around leaf #11 (the most affected), 700 but TuMV induced more severe deformations. To confirm these results and to test for the reproducibility of 701 the analysis, an independent experiment of TuMV infection was executed (Supplementary file TuMV  702 2nd.morphoj). PTA analyses were run and trajectory attributes compared (Table 2). There were obtained 703 very similar results relative to the first TuMV experiment. 704 Together, these results indicated that both TuMV and ORMV induced relative developmental arrest as well 705 as shape change, but symptoms triggered by ORMV are mainly driven by developmental arrest whereas 706 TuMV also promotes shape change in a relatively higher extent, thus impacting more strongly on overall 707 shape. 708 709