Differences in post-translational modifications of proteins in milk from early and mid-lactation dairy cows as studied using total ion chromatograms from LC-ESI/MS

Liquid chromatography top-down mass-spectrometry combined with an in-house library with 5180 entries of known milk protein masses was used to identify common and rare isoforms of major milk proteins from individual cow's milk samples from earlyand mid-lactating (EL and ML, respectively) cows. The milk samples were from 24 Danish Holstein-Friesians, with 12 cows each in EL and ML. A Latin square design with four periods was used to obtain one composite milk sample from each cow per period. The deconvoluted masses were used for identifying common and rare isoforms of the major milk proteins (aS1-CN, aS2-CN, b-CN, k-CN, a-LA, and b-LG). aS1-CN splice variants were found in all samples, aSand k-CN isoforms differed between EL and ML. k-CN post-translational modifications were found to relate to k-CN genotype. This approach could identify 66 isoforms of major milk proteins, where 15 were of the aS-CNs and 39 from k-CN. © 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Bovine milk is a complex mixture of minor and major proteins, even when focusing on the four caseins (a S1 -CN, a S2 -CN, b-CN and k-CN) and two major whey proteins (a-LA and b-LG) alone, a wide range of isoforms are present. One part of the complexity is caused by the different genetic variants of these proteins (Farrell et al., 2004), while another is related to different combinations of posttranslational modifications (PTMs). Within the Bos genus, a total of 49 common or rare genetic variants of the major milk proteins have been identified (8 for a S1 -CN, 4 for a S2 -CN, 12 for b-CN, 11 for k-CN, 11 for a-LA and 3 for b-LG) (Farrell et al., 2004). PTMs are added to these proteins during biosynthesis, including disulphide bond formation, phosphorylations and glycosylations, which are all of known importance for milk protein functionalities (Bijl, Holland, & Boland, 2020;Le, Deeth, & Larsen, 2017). Natural casein glycosylations are found as O-glycosylation on k-CN, and five different structural glycosylation forms, denoted a, b, c, d and e are commonly found in bovine milk (Saito & Itoh, 1992). These include: a ¼ GalNAc; b ¼ Galb(1e3)GalNAc; c ¼ NeuAca(2e3)Galb(1e3) GalNAc; d ¼ Galb(1e3)[NeuAca(2e6)]GalNAc; e ¼ NeuAca(2e3) Galb(1e3)[NeuAca(2e6)]GalNAc.
In relation to phosphorylation of Ser/Thr residues in the milk proteins, the casein kinase motif has been identified as Ser/Thr-X-Glu/Asp/SerP, where X can be any amino acid (Mercier, 1981). Genetic polymorphisms may influence the PTMs attached, as exemplified by the a S2 -CN C and D variants (Farrell et al., 2004). The a S2 -CN C variant has a Thr residue at residue 47, which grants the C variant an additional potential site for phosphorylation matching this required motif. In the a S2 -CN D variant, residues 51e59 have been deleted comprising three of the known sites of phosphorylation (Ser56, Ser57 and Ser58). For a S1 -CN, it has been shown that different genetic marker regions are linked to the expression of the a S1 -CN 8P and a S1 -CN 9P isoforms, possibly reflecting the involvement of different CN kinases (Bijl, van Valenberg, Huppertz, van Hooijdonk, & Bovenhuis 2014b).
Even though genetic factors are crucial governing milk protein heterogeneity, environmental and physiological factors, like lactation stage and dry period, have also been linked to glycosylation type and glycosylation degree (GD) of k-CN (Bonfatti, Chiarot, & Carnier, 2014;Maciel et al., 2017;Poulsen, Giagnoni, Johansen, Lund, & Larsen, 2021;Poulsen, Jensen, & Larsen, 2016). GD of k-CN has been associated with CN micelle size (Bijl, de Vries, van Valenberg, Huppertz, & van Hooijdonk, 2014a) and rennetinduced coagulation properties (Poulsen et al., 2016). Furthermore, the phosphorylation degrees (PD) of both a S1 -and a S2 -CN have been shown to be affected by lactation stage (Fang et al., 2017). Using LC-MS, Fang et al. (2017) demonstrated the presence of as many as six phosphorylation isoforms of a S2 -CN (9e15P) along with the two major phosphorylation isoforms of a S1 -CN (8 and 9P). The relative abundance of a S1 -CN 9P and of a S2 -CN 13 and 14P was further found to increase over lactation, with the corresponding relative abundances of a S1 -CN 8P and of a S2 -CN 10 and 11P to decrease, thereby significantly affecting their PDs.
A common approach for the detailed analysis of isoforms of the major milk proteins and how they may vary is to use LC coupled with either a UV (normally at 214 nm) or a mass spectrometer (MS), or both (Jensen et al., 2012;Nilsson et al., 2020). This allows for the detection of the UV and/or the total ion chromatogram (TIC) data from top-down MS data. Quantification based on UV is widely used, but has its limitations, as it is not always possible to chromatographically separate PTM isoforms and genetic variants, like, e.g., the a S2 -CN phosphorylation isoforms, complex glycosylation patterns of k-CN or even rare splice variants, which have recently been identified and quantified using the TIC by top-down MS using a reverse phase HPLC coupled to a ESI-TOF (Miranda, Bianchi, Krupova, Trossat, & Martin, 2020).
The aim of the present study was to build on this approach and investigate presence and amount of isoforms of the six major proteins in individual cow's milk based on the TICs from liquid chromatography electrospray mass spectrometry (LC-ESI/MS) in relation to lactation stage (EL and ML).

Milk samples
In total 24 Danish Holstein cows, 6 primiparous and 6 multiparous in EL, and 12 multiparous in ML, were subjected to an experimental feeding trial in a 6 Â 4 incomplete Latin square design with 4 periods (Giagnoni, Lund, Sehested, & Johansen, 2021). At the beginning of the experiment, EL cows were 23.3 ± 6.7 (mean ± SD) days in milk (DIM) and ML cows were 176 ± 15 DIM. In the current study, cows are only assigned to lactation group and MS data are not explored relative to feeding, which was explored by Poulsen et al. (2021). The cows were milked twice daily (at 06:00 and 17:00 h) in a herringbone milking parlour, where milk yield was recorded at each individual milking. By the end of each period, an afternoon and a morning milk sample were collected from each cow and, immediately after morning milking, a representative sample was prepared by mixing the 2 milk samples proportional to milk yield. These composited samples were used for further milk analyses giving a total of 96 individual cow's samples. After mixing the morning and afternoon samples, skim milk samples were prepared by removal of fat after centrifugation at 2643Âg for 30 min at 4 C. Skim milk samples were frozen for later analysis (À80 C).

Protein compositional analysis by LC-ESI/MS
As previously described by Jensen et al. (2012), the protein composition of the skim milk samples was determined using reversed phase LC-ESI/MS Single Q. Proteins were separated using an HPLC 1100 system (Agilent Technologies, Santa Clara, CA) with a Jupiter C4 column (250 mm Â 2 mm, 5 mm particle size, 300 Å pores; Phenomenex, Torrance, CA) operated at 40 C with a G1315A diode-array detector for UV detection at 214 nm coupled to a mass selective detector for identification and relative quantification of the milk proteins. The LC gradient used was a linear gradient from 33% B (B ¼ 0.05% trifluoroacetic acid (TFA) in acetonitrile (ACN), A ¼ 0.05% TFA in LCMS grade water) to 50% B over 25 min.
A database was constructed using amino acid sequences representing a S1 -CN, a S2 -CN, b-CN, k-CN, as well as a-LA and b-LG, as downloaded from Uniprot.org (accessed on 27-04-2020) and modified to match and represent the different genetic variants previously identified in Danish Holstein (Jensen et al., 2012). Postprocessing of imported sequences included removal of signal peptides, representation of single point mutations (Farrell et al., 2004) and inclusion of pyroGlu as the N-terminal of k-CN. As all samples were reduced with DTE prior to LC-ESI/MS, cysteine bridges were not included, and all masses were based on reduced forms. The database contained the known possible phosphorylation variants; a S1 -CN (7e10P), a S2 -CN (9e15P), b-CN (4e6P), k-CN (0e3P), and a-LA (0e1P) (Fang et al., 2017;Farrell et al., 2004;Miranda et al., 2020). For k-CN, up to 3 glycosylations were allowed, representing possible combinations of the five reported glycosylation forms: a, b, c, d, and e (Saito & Itoh, 1992). The masses of the c and d glycoforms are identical, therefore they are referred to as c/ d glycoforms. Since lactosylations on b-LG have been reported in milk (Miranda et al., 2020), one lactosylation (þ324 Da each) for b- LG was included in the database. In addition, earlier reported fragments of plasmin cleavage (a S1 -CN 1P fragment 80e199) or splice variants (a S1 -CN 8P eexon 12, a S1 -CN 9P eexon 12, a S1 -CN 8P eQ59 or Q78 and a S1 -CN 9P eQ59 or Q78) of a S1 -CN were included as well (Gaiaschi et al., 2000;Martin, Bianchi, Cebo, & Miranda, 2013a;Martin, Cebo, & Miranda, 2013b). In total, 518 different masses were included in the database. To allow for common method-induced modifications, the database was further expanded to include loss of up to three water molecules, deamidations or sodium adducts per mass. The database was thus expanded to 5180 entries.
MassHunter 10 Bioconfirm software (Agilent) was used to generate the deconvoluted masses from the TICs of the recorded data (a representative TIC is shown in Supplementary material Fig. S1). The default data processing method for intact reduced proteins was used with the following modifications: charge stages allowed from 7 to 40, deconvolution mass range 10,000 to 35,000 Da, mass step 0.05 Da, 50% of peak height was selected for calculation of average mass, and mass tolerance ±1.5 Da. Results were exported to Excel, parts per million (ppm) error and abundance of the deconvoluted masses of identified biomolecules relative to total area of each sample were calculated. The milk samples were genotyped based on deconvoluted masses of observed protein variants using the constructed database. To make isoforms comparable across homozygotes and heterozygotes, the relative abundances of homozygotes were divided by two.

Statistical analysis
To investigate if differences in relative abundance were significant between EL and ML for a specific isoform, the following model was applied in R.
here Y ijkl is the dependent variable, m is the overall mean, L is the fixed effect of lactation stage (i ¼ EL, ML), P is the fixed effect of period (j ¼ 1 to 4), A was the fixed effect of parity (k ¼ primiparous, multiparous), C is the random effect of the cow (l ¼ 1 to 27, as 3 cows were substituted during the experiment) and e ijkl is the random residual error. Only isoforms with at least five observations in both EL and ML were tested. To test for the effect of genotype for some isoforms, the model was expanded to include genotypes.
(2) here G is the fixed effect of genotype (m ¼ 1 to 3). In all cases when genotypes were compared, the former division of relative abundances by two for homozygotes was first reversed to make abundances comparable.

Results
After assessment of the data, six samples were removed, resulting in 90 samples being used for further analysis. Across the 90 skimmed milk samples, the obtained TIC from the LC-ESI/MS revealed a total of 170 unique isoforms. These represented genetic variants, splice variants, natural PTMs, like phosphorylation and glycosylation, as well as losses of water, deamidation and sodium adducts. In total, 1870 identifications could be assigned to these isoforms across all milk samples. Of these, 195 identifications e assigned as method-induced modifications (loss of water, deamidations and sodium adducts) e were discarded due to very low relative abundances. Thus, a total of 1675 identifications, corresponding to 66 isoforms of the six major milk proteins were kept and included in the analysis. An overview of all isoforms and their distribution between EL and ML can be found in Supplementary material Table S1.
The whole palette of these unique 66 isoforms identified, representing genetic variants and their natural PTMs (glycosylated and phosphorylated isoforms), splice-and cleavage-variants of a S1 -CN, as well as lactosylations of b-LG, as shown in Fig. 1. Furthermore, the range of chromatographic RT covering these are shown, including not only expected variations in RT between isoforms, but also some variations within isoforms. Overall, the first protein to elute was k-CN, with its various glycosylation and phosphorylation isoforms (4e10 min). In total, 39 isoforms were assigned to k-CN. The first k-CN molecules to elute are the heavily glycosylated k-CN A and B isoforms with up to three O-glycans attached. Of these, based on the mass identifications, the first isoforms to elute are rich in mainly e glycoforms and some c/d glycoforms. These are followed by k-CN isoforms with glycosylations of the a and b type. Then, the non-glycosylated k-CN A with 1 or 2P elutes in the same time range as k-CN B with one or two glycans, one is the e or c/d, glycotypes, while the other can be of the a/b glycotypes with 0, 1 and 2P. These were followed by k-CN B with one glycosylation, then k-CN A without any PTMs, k-CN B 0 or 1P with 1e3 glycoforms of the a, b and c/d type and then by non-glycosylated k-CN B with 1 or 2P. Next is a S2 -CN (10e12 min), with all a S2 -CN isoforms representing genetic variant A, resulting in six isoforms assigned to a S2 -CN, overall, in the order of lower to higher number of phosphorylations. This is followed by a S1 -CN isoforms (13e16 min), all representing a S1 -CN B, including some splice variants and a plasmin cleavage product. The first a S1 -CN molecule to elute was the a S1 -CN B 1P plasmin cleavage product 80e199, then a S1 -CN B 7 and 8P, followed by the splice variants a S1 -CN eQ59 or Q78, then a S1 -CN B 9P, followed by splice variants a S1 -CN B 8P eexon12 and a S1 -CN B 9P eexon 12, and lastly a S1 -CN B 10P. Thus, nine isoforms were assigned to a S1 -CN.
The last CN to elute was b-CN (17e19 min), the genetic variants of which eluted in the order: b-CN B, A 1 , A 2 and then I. b-CN was found as the major 5P isoform and a minor 4P isoform. The 4P isoform was found only for b-CN A 1 and A 2 , where they eluted before their 5P isoforms. In total, six isoforms were assigned to b-CN. The CNs were followed by b-LG and a-LA; b-LG eluted first (19e22 min), with the b-LG genetic variant B before A. In addition, b-LG with addition of one lactose (þ324 Da) was identified for both variants; in both cases the lactosylated isoform eluted prior to the non-lactosylated form, resulting in a total of four isoforms being assigned to b-LG. The final protein to elute (21e22 min) was a-LA variant B, where a low abundant a-LA 1P isoform eluted after a-LA B.
3.1. Presence and abundance of a S1 -CN isoforms and effect of lactation stage Across the 90 milk samples, 438 observations were assigned to the nine isoforms of a S1 -CN, of these 207 represented the four different phosphorylation isoforms (7e10 P). The relative abundances and percentage of samples with an observed isoform within each lactation group are displayed in Fig. 2. The major phosphorylation isoforms a S1 -CN B 8P and 9P were found in all samples. In addition, the low abundant a S1 -CN B 7P and 10P isoforms were observed, though with a lower frequency (27.7% and 2.2%, respectively). When comparing the relative abundances between lactation groups, a S1 -CN B 8P (P < 0.05) were significantly more abundant in EL, a S1 -CN B 10P was found only in ML. Apart from the four phosphorylation isoforms of a S1 -CN B, five additional masses, matching masses of a S1 -CN B splice-or plasmin cleavage-products were found. These splice or cleavage products were observed 231 times in the dataset. The identified plasmin cleavage product was a fragment of a S1 -CN, corresponding to a loss of the first 79 amino acids, i.e., a S1 -CN B 1P fragment (80e199) with a mass of 14107.95 Da. The splice products were a S1 -CN B of 8 and 9P isoforms, where exon 12 was missing, i.e., a S1 -CN B 8P eexon 12 with a mass of 21876.73 Da and a S1 -CN B 9P eexon 12 with mass of 21956.72 Da. Furthermore, a S1 -CN B 8P and 9P isoforms, where a single glutamine (Q or Gln) residue at 59 or 78 are missing, i.e. a S1 -CN B 8P eQ59 or Q78, with mass of 23486.76 Da or a S1 -CN B 9P eQ59 or Q78 with a mass of 23566.75 Da were also observed. The relative abundances of these splice variants were relatively low. The splice products with the highest relative abundances were those with a missing Gln at positions 59 or 78. In addition, splice products representing a S1 -CN B 8P were more abundant and frequent than those representing a S1 -CN B 9P. The most abundant splice product was a S1 -CN B 8P eQ59 or Q78, which was slightly more abundant than a S1 -CN B 7P, while the lowest abundant splice product was a S1 -CN B 9P eexon 12. Statistical analysis of these products revealed that a S1 -CN B 8P eexon 12 was significantly more abundant in EL than ML (P < 0.001). Further analysis of the splice-and cleavage-products (results not shown) tracked these to all but one sample. Thus, these splice-and cleavage-products were expressed alongside an otherwise intact a S1 -CN B protein profile and each sample typically contained two of these products.

Presence and abundance of a S2 -CN isoforms and effect of lactation stage
In total, 312 of the observed masses were assigned to either of the six a S2 -CN isoforms across the dataset, comprising a S2 -CN variant A 10e15P (Fig. 3). The most frequent of the a S2 -CN A isoforms were those with 11 and 12P, which were also the isoforms of highest abundance. However, also the 13P form, and, to some extend the 10P form, were both present in relatively high abundance, and were identified in more than half of the samples. The 14 and 15P isoforms were present, but in relatively low abundance, as well as with a lower frequency (14P EL ¼ 6.98%, 14P ML ¼ 25.53%, 15P EL ¼ 2.33% and 15P ML ¼ 2.13%). Only a S2 -CN A 11P and a S2 -CN A 12P had enough observations to be considered for statistical analysis, these were both significantly more abundant in EL (P 11P < 0.01 and P 12P < 0.05). In terms of frequency, it seems that a S2 -CN's with higher P counts are more frequent in ML while the lower P counts are more frequent in EL.

Presence and abundance of b-CN isoforms and effect of lactation stage
A total of 134 observations was assigned to the six b-CN isoforms identified, representing four different genetic variants (A 1 , A 2 , B and I). These were found as six different genotypes in the dataset; A 2 A 2 , A 1 A 2 , A 2 I, A 1 I, II and A 2 B. The b-CN B was found only in the ML group, and, in general, the genetic variants were not balanced between the EL and ML groups, which is clearly reflected in their frequencies. The major b-CN 5P isoform was found in all samples and for all genetic variants, whereas the low abundant 4P isoform was found only in the ML group for the A 1 and A 2 variants only (Fig. 4). Due to the number of observations, only the b-CN A 2 5P isoform qualified for Relative abundances and frequency of observations for all isoforms of a S1 -CN within EL or ML. The primary y-axis shows the relative abundance in percentage of total protein identified (bars with the first and third quartile as whiskers), while the secondary y-axis indicates observations of each specific isoform (dots) for EL and ML as a frequency of all available samples within either EL and ML. Light grey represents EL while dark grey represents ML. Asterisks denote the level of significance in abundance between lactation groups: *, P 0.05; ***, P 0.001. statistical analysis, and here no significant difference was observed between EL and ML.

Presence and abundance of k-CN isoforms and effect of lactation stage
A total of 512 observations were assigned to 39 different k-CN isoforms, including both k-CN A (19 isoforms) and k-CN B (20 isoforms), and were represented as the genotypes k-CN AA, k-CN BB and k-CN AB. This by far makes k-CN the protein with the largest isoform heterogeneity, as expected. Each genetic variant is found in isoforms with up to two phosphorylations and up to three glycosylations using the previously described glycoforms (a, b, c/d and e).
Furthermore, among the k-CN isoform molecules, only the A variant was found in the unmodified form. For k-CN AA milk, 4.15 isoforms were, on average, observed per sample, whereas 5.70 Fig. 4. Relative abundances and frequency observations for all isoforms of b-CN within either EL or ML. The primary y-axis shows the relative abundance in percentage of total protein identified (bars with the first and third quartile as whiskers), while the secondary y-axis indicates observations of each specific isoform (dots) for EL and ML as a frequency of all available samples within either EL and ML. Light grey represents EL while dark grey represents ML. isoforms on average were detected per k-CN BB milk sample and 6.26 isoforms for k-CN AB milk samples. The most frequent PTM on k-CN is the 1P modification, followed by 2P, both without glycosylations. Among the glycoforms, the e glycoform was confirmed to be most frequent, while the most frequent isoform with P and G was 1P 2G (e, e), as observed for both k-CN A and B genetic variants (Fig. 5). Among the many isoforms of k-CNs, not many of the isoforms had enough observations to qualify for statistical analysis.
The isoforms that qualified were; k-CN A 1P, k-CN A 2P, k-CN A 1P 2G (e, e), k-CN B 1P, k-CN B 2P, k-CN 1P 1G (e), k-CN B 1P 2G (c/d, e), Fig. 5. Relative abundances and frequency observations for all isoforms of k-CN within either EL or ML. The primary y-axis shows the relative abundance in percentage of total protein identified (bars with the first and third quartile as whiskers), while the secondary y-axis indicates observations of each specific isoform (dots) for EL and ML as a frequency of all available samples within either EL and ML. Light grey represents EL while dark grey represents ML. An asterisk denotes the level of significance in abundance between lactation groups: *, P 0.05. To explore the presence of phosphorylations and glycosylations across k-CNs genotypes (AA, AB and BB), the former division by two for the homozygotes are reversed to represent actual conditions in milk. In addition, the isoforms were merged into groups representing the different combinations of P and G (Fig. 6). From this it Fig. 7. Relative abundances and frequency observations for all isoforms of a-LA within either EL or ML. The primary y-axis shows the relative abundance in percentage of total protein identified (bars with the first and third quartile as whiskers), while the secondary y-axis indicates observations of each specific isoform (dots) for EL and ML as a frequency of all available samples within either EL and ML. Light grey represents BB, medium grey represents AA while dark grey represents AB. Asterisks the level of significance in abundance between lactation groups: **, P 0.01. became clear that the PTM profiles of k-CN were different between genotypes. Among these groups, only k-CN 1P, k-CN 1P 1G, k-CN 1P 2G, k-CN 1P 3G and k-CN 2P had enough observations for statistical analysis. The relative abundances of the total sum of glycosylations across all groupings were strongly affected by genotype (P < 0.001), while the comparison of total k-CN among genotypes was insignificant. Comparing genotypes for groups of k-CN with 1G, 2G and 3G, respectively, were all significant (P 1G < 0.05, P 2G < 0.001 and P 3G < 0.01). Combined this indicates that the total abundance of k-CN is similar between genotypes, the difference between genotypes is to be found in the abundance of the glycosylated k-CN, among these glycosylated k-CNs, the 2G group seems to vary the most between genotypes. To explore if these differences in glycosylation were linked to any particular combination of phosphorylation and glycosylation, groupings of P and G content were generated (Fig. 6). It was found that genotype was of significance for all the glycosylated groups of k-CN with 1P (P 1P 1G < 0.05, P 1P 2G < 0.001 and P 1P 3G < 0.05). It is thus clear that differences between k-CN genotypes are to be found in the pool of glycosylated isoforms, specifically those with 1P and 2G. Interestingly, for all glycosylated 1P groups, it seems as if the BB genotype is the most frequent and abundant, followed by AB and AA. The only exception is the abundance of 1P 1G, where the order is BB > AA > AB (Fig. 6).

Presence and abundance of major whey protein isoforms and effect of lactation stage
For a-LA, a total of 139 observations were assigned. All were representing the B variant, which was found in two isoforms, a major non-phosphorylated isoform, identified in all samples, and a minor singly phosphorylated form, observed in 54.44% of the samples (Fig. 7). Based on relative abundances, a-LA was significantly more abundant in the EL group compared with the ML group (P < 0.01). A total of 140 observations was assigned to four different isoforms of b-LG. These were assigned to the two genetic variants, b-LG A and B representing three genotype combinations, b-LG AA, b- LG AB and b-LG BB (Fig. 8). Of the 140 observations, some were b-LG with a single lactosylation (þ324Da), for b-LG A and B, 25.80% and 3.33% of samples contained lactosylations, respectively. It is interesting to note that the majority of lactosylations were linked to b-LG A. In addition, by comparing the non-lactosylated b-LG genotypes (AA, AB and BB) a significant difference was found (P < 0.001), and thus the genotypes are expressed in different relative quantities (data not shown).

Discussion
In this study, isoform heterogeneity of major milk proteins, including some less studied low abundant PTM isoforms and splice and cleavage variants, were investigated in relation to the lactation stages EL (DIM at the beginning of the experiment of 23.3 ± 6.7) and ML (DIM at the beginning of the experiment of 176 ± 15). Using deconvoluted data from the TIC of the LC-ESI/MS, isoforms eluting at similar RTs could be identified, which otherwise would be difficult or impossible to distinguish. In total, this approach allowed us to identify a total of 66 different isoforms with 1675 observations throughout the 90 samples. From the observed RP-HPLC RT profiles (Fig. 1), it is clear that most of the isoforms cluster around the same RT. However, some of the isoforms belonging to k-CN seem to have a wider distribution than those of the other proteins, which is consistent with the larger PTM heterogeneity of k-CN. Similar observations were made by Jensen et al. (2015), who suggested that one of the explanations behind this could be that the PTMs in question are attached to different sites on the protein backbone. This could also be an explanation for some of the RT variations observed here. In addition, Fig. 1 demonstrates that a higher ACN concentration is needed to elute isoforms with higher levels of phosphorylation. This is particularly evident for a S1 -CN, a S2 -CN, b-CN and a-LA, and to a lesser degree for k-CN. The RP-HPLC RT profiles also demonstrate that glycosylated/lactosylated isoforms elute at lower ACN concentrations. This was observed for b-LG and k-CN. For free glycans, their RT have been correlated with NeuAC content (Dallas et al., 2011); however, for attached glycans, this relation was reported to be less clear (Goonatilleke et al., 2019). The combined effect of glycosylation(s), types thereof, phosphorylation(s), sites of PTMs and genotypes, could explain the deviations observed for k-CN relative to these general patterns.

Phosphorylation isoforms of a S -CN and b-CN
From our data, multiple phosphorylation isoforms were observed. This was particularly true for a S -CNs, where isoforms with 7 to 10P were observed for a S1 -CN and 10 to 15P for a S2 -CN.
Since the analysis was carried out using top-down LC-ESI/MS proteomics, the specific sites of phosphorylation in the protein backbones are not known, but are expected to be within the commonly accepted casein phosphorylation motifs; Ser/Thr-X-Glu/Asp/SerP, with SerP referring to an already phosphorylated Ser residue (Bijl et al., 2020;Fang et al., 2016;Mercier, 1981). Mercier (1981) used a database of known phosphorylation sites to investigate known kinases, and it was suggested that the kinase(s) involved in casein phosphorylation would have different preferences for specific motifs. The following order was proposed: Ser-X-Glu/SerP > Thr-X-Glu/SerP > Ser-X-Asp/SerP > Thr-X-Asp/SerP. One of the suspected primary kinases behind phosphorylation was isolated from the Golgi apparatus and identified as the FAM20C kinase (Tagliabracci et al., 2012). This kinase was responsible for adding phosphorylations to the Ser-X-Glu/SerP motif, but not to the Ser/Thr-X-Asp or Thr-X-Glu/SerP motifs. This could indicate that more than one kinase is involved in the complete phosphorylation of milk proteins, and possibly related to different phosphorylated isoforms. This seemingly differential preference for phosphorylation sites could indicate, that different kinases are involved in the generation of the low and high phosphorylation isoforms. In the present study, different frequencies of phosphorylation isoforms of a S2 -CN were observed. These frequencies could be the product of different kinase compositions or activities, and thus that different motifs could be preferred at different lactation stages. This may also explain the difference in abundance observed between specific phosphorylation isoforms within the same protein (Fang et al., 2016(Fang et al., , 2017. Mercier (1981) suggested that the phosphorylation sites should be relatively close to or part of a b-turn. This would ensure that the site is present near the surface of the protein, and thus that a kinase could interact with the sequence. A consequence of this could be that a phosphorylation remains on the outside of a protein and is thus available for interactions. Combining this would allow for up to 10P for a S1 -CN, 16P for a S2 -CN and 6P for b-CN (Bijl et al., 2020;Fang et al., 2016;Miranda et al., 2020;Nilsson et al., 2020). In spite of these theoretical possibilities, neither a S2 -CN 16P nor b-CN 6P have so far, to our knowledge, been observed.
For a S1 -CN, multiple masses corresponding to splice products or a plasmin cleavage product have earlier been reported (Gaiaschi et al., 2000;Martin et al., 2013a,b). Not all these earlier reported isoforms were observed here. The plasmin cleavage product observed in this work corresponded to what would be a loss of the first 79 AA in a S1 -CN. This loss was also observed by Gaiaschi et al. (2000) and Miranda et al. (2020). As these first 79 AA in a S1 -CN contain all phosphorylation sites, except one, the remaining fragment [a S1 -CN 1P fragment (80e199)] holds only one possible phosphorylation. The two observed splice products of a S1 -CN corresponded to a loss of either exon 12 or to a deletion of a Gln residue at position 59 or 78. According to Martin et al. (2013a,b), these splice products originate from two different events. The a S1 -CN splice variant form corresponds to the loss of exon 12 (representing deletion of residues 96e109). The amino acid stretch represented by exon 12 does not contain any potential phosphorylation sites, and therefore this exon skipping does not affect the phosphorylation status. The loss of Gln at either position 59 or 78 is caused by a cryptic splice event. However, simultaneous loss of both these Gln residues was not observed in our data. The deletion of Gln at either position 59 or 78 does not interfere with any phosphorylation motif, and these splice variants are thus found in both the 8P and 9P forms.
Also, for a S2 -CN, considerable variation within phosphorylation forms was observed throughout the dataset. Our database allowed identification of isoforms from 9 to 15P, and the dataset was later searched for 16P as well. This resulted in the observation of a S2 -CN with 10 to 15P. The measured relative abundances of a S2 -CN isoforms observed in this study are comparable with that of Nilsson et al. (2020), where a S2 -CN 9e15P forms were observed. As also observed by Fang et al. (2016), not all possible isoforms were present in all samples. In the present study, the 11 and 12P isoforms were both the most frequent and the most abundant of the a S2 -CN isoforms. According to Mercier (1981), a total of 18P should in theory be possible for a S2 -CN; however, as they also point out, some sites and use of the motifs have never been observed, and thus the actual maximum is probably closer to 15P, as described by Farrell et al. (2004).
For b-CN, 4 and 5P isoforms were observed. It was shown, as expected, that the 4P isoform was low abundant, compared with the 5P counterpart. According to Mercier (1981), a total of 6P is possible, as at least 6 matches to the Ser/Thr-X-Glu/Asp/SerP motif are in theory possible. However, one of these theoretical matches is to the Thr-X-Asp motif, which has not been confirmed as an actual used site of phosphorylation. Based on the casein motif (Ser/The-X-GluAsp/SerP), and the preference of a nearby b-turn, the most logic sequence therefore seems to be that the cluster of Ser residues at positions 15, 17, 18 and 19 are phosphorylated in conjunction, while the more isolated Ser at position 35 is phosphorylated on its own (Mercier, 1981). Considering that Ser at position 35 is relatively close to a b-turn, this only highlights this as an option for the fifth and final phosphorylation site for b-CN.

k-CN isoforms
As expected, a large number of k-CN isoforms was observed (total of 39, Fig. 5). The majority of observed k-CN isoforms were glycosylated (34 of 39). Of these, most contained NeuAc (30 of 39) as part of their glycosylation, as observed earlier (Holland, Deeth, & Alewood, 2004, 2006Miranda et al., 2020;Nilsson et al., 2020;Poulsen et al., 2016). In Nilsson et al. (2020), the precise number of identified isoforms was not reported, therefore a direct comparison with the present study is not possible. From Nilsson et al. (2020), it is interesting to note, that up to 11 different isoforms of k-CN could be identified in one single milk sample. In the present study, on average the samples contained 5.86 different k-CN isoforms, with a maximum of 11. The sample with 11 identified isoforms represented an AB milk, where three of the isoforms were assigned to variant A, and the remaining eight were assigned to variant B. The genotype representing milk with the highest number of isoforms from Nilsson et al. (2020) was not given. Notably, Miranda et al. (2020) accepted masses matching glycoforms of k-CN eluting later than non-glycosylated k-CN variants. In the present study, where a gradient from low to high hydrophobicity was used (see Section 2.2), such matches were discarded because it was considered unlikely that addition of a glycosylation would increase the overall hydrophobicity, as demonstrated in Fig. 1. In Holland, Deeth, and Alewood (2006), 2D gels were used to separate milk proteins according to their masses and pI values with subsequent MS/MS analysis of spot digests. However, comparing results of Holland et al. (2006) with those in the present study, surprisingly few glycosylated k-CN isoforms without NeuAc were found here.
From Fig. 1, it is evident that some k-CN masses appeared at multiple RTs. Similar findings were also observed by Jensen et al. (2015) in a study of k-CN macropeptide. As suggested by Jensen et al. (2015), this could be due to different attachment sites on the protein backbone for the PTMs in question. In the work of Holland et al. (2006), a total of 17 different isoforms was described for k-CN B alone, some with up to six attached glycans, and it is thus possible, that the heterogeneity of k-CN, especially for k-CN B, could be much greater than indicated in the present study, where a maximum of three attached glycans was allowed. In addition, Holland et al. (2006) proposed, that the fourth glycosylated amino acid could be at position 145, being a Thr residue, whereas no suggestions for the fifth and sixth glycosylation was made. As LC-ESI/MS was used here, it would be of interest to study differences between identical molecules by MS/MS to decipher PTM positions, as suggested by both Holland et al. (2004) and Jensen et al. (2015).
The average GD (defined as total glycosylated k-CN relative to total k-CN) was calculated here, and found to be 10.85% (result not shown), which is lower than most GD values reported in other studies (Bijl et al., 2020;Poulsen et al., 2016). However, these studies were using UV or gels and not LC-ESI/MS to determine the GD, and therefore the reported difference may relate to methodological differences. The value obtained here complies well with that of Nilsson et al. (2020), who also used LC-ESI/MS as well as a to our study comparable database for matching theoretical and observed masses. A calculation based on their reported results gives a GD of 9.90%. However, in future studies, it should be considered to allow for more potential glycosylations per k-CN molecule, as suggested by Holland et al. (2006), even though some of these expectedly may turn out to be of very low abundance or even undetectable.
From the k-CN data it was further observed that the k-CN isoforms differed in relative abundances between genotypes. As also observed by Poulsen et al. (2016), in the present study, different total relative abundances of k-CN isoforms present in k-CN AA, k-CN AB and k-CN BB milks were observed (Fig. 6). In our dataset the total relative abundancy of k-CN BB > k-CN AB > k-CN AA, which is the same order as reported earlier in milk from Danish Holstein by Poulsen et al. (2016). Even though GDs from UV-based studies and MS based studies are not directly comparable, the broad pattern of GD reported here reflected the expected order of k-CN BB > k-CN AB > k-CN AA (Poulsen et al., 2016).

Whey proteins
For the whey proteins, the number of isoforms found was relatively low; though, notably, a 1P form of a-LA was found. The existence of such a 1P isoform of a-LA was reported recently (Miranda et al., 2020); though, to the best of our knowledge, the site of phosphorylation has not been reported. Considering the Ser/Thr-X-Glu/Asp/SerP phosphorylation motif suggested by Mercier (1981), two positions seem to be possible, at positions 47 or 75. According to Farrell et al. (2004) none of these positions are altered between known genotypes. Given that residue 47 matches the Ser in the Ser-X-Glu motif, whereas the residue at position 75 matches the Ser in the Ser-X-Asp motif, the residue at 47 would match the most frequently used motif.
For b-LG, it was noted, that lactosylation was more frequent on b-LG variant A compared with b-LG variant B (25.80% versus 3.33% of samples, see Fig. 8). Without MS/MS data, it can only be speculated what drives this. One explanation may be related to different chemical properties of Asp and Gly, which are present at position 64 of b-LG variant A and B, respectively (Farrell et al., 2004). In contrast to Gly, Asp is negatively charged, which is expected to affect the structural properties of the protein. Since this position is not far from known lactosylation sites at residues 60 and 69 (Le et al., 2017), it could be speculated that these structural differences between the A and B variants could affect at least one of these positions, thus making variant A more accessible for lactosylation. The presence of lactosylations in fresh milk, which was used in the present study, is somewhat surprising, as this is normally a result of processing (Le et al., 2017), but shows that this can be observed even in cold-stored milk, as also reported by Miranda et al. (2020).

Effect of stage of lactation on PTM patterns
The physiological process of lactation has been reported to potentially affect PTM decoration of the milk proteins. Phosphorylation degree and/or phosphorylation isoforms of the a S -CNs have been reported to be potentially affected by lactation stage (Fang et al., 2017;Poulsen et al., 2016). In earlier studies, both GD of k-CN, as well as total content of NeuAc in its glycans, were found to increase with advance of lactation (Bonfatti et al., 2014;Maciel et al., 2017;Poulsen et al., 2016;Robitaille, Ng-Kwai-Hang, & Monardes, 1991). In the present study, both a S -CNs and k-CN, including their many isoforms, were significantly affected by lactation stages. This indicates, that the level or presence of kinases and glycosyltransferases in the endoplasmatic reticulum (ER) and Golgi apparatus (GA) can be affected by lactation. Here, the a S1 -CN 10P was found only in ML, at quite low frequency (2.22%). Considering the motif from Mercier (1981) (Ser/Thr-X-Glu/Asp/ SerP), 9 matches can be found in the a S1 -CN amino acid sequence corresponding to the Ser/Thr-X-Glu/SerP motif, the 10th site belonging to a Thr-X-Asp motif at residue 49, which has never been observed before. Considering the rarity of a S1 -CN 10P and that it was observed only in the ML group, this could indicate that the composition of kinases in the ER and/or GA could be affected by stage of lactation. For b-CN, 4P was found in ML only, together these findings could indicate that the composition of kinases of the ER and GA are indeed affected by lactation.

Conclusion
In the present study, it was shown that the use of deconvoluted TIC data from an LC-ESI/MS system, combined with a database of theoretical masses, was a powerful tool for identifying and quantifying isoforms of bovine milk. By this approach, it was possible to identify and quantify PTM differences relative to lactation stage. Some of these changes was found in the phosphorylation stages of a S -CNs, ranging from 7 to 10P for a S1 -CN, and 10 to 15P for a S2 -CN. In addition, some low abundant splice products for a S1 -CN were identified and quantified, alongside an otherwise normal a S1 -CN profile. For k-CN, 39 different isoforms were identified and quantified with different P and G contents; the most common glycosylated isoform being k-CN 1P 2G (e, e). Some of the k-CN isoforms were observed to cluster around different RT. For b-CN, the rare 4P PTM form was observed, but only in ML. Taken together, the presence of a S1 -CN 10P and b-CN 4P only in milk from cows in mid lactation could indicate that the composition of kinases in the ER and/or GA changes during lactation. The differences in RT of some of the glycosylated isoforms of k-CN could further indicate that the glycans are attached to different locations. This may reflect that the glycosyltransferases found in the ER and/or GA could be affected by lactation as well. These findings would require further studies to enlighten possible mechanisms behind these observations. For this purpose, an MS/MS approach could be an approach to identify the locations of unknown phosphorylations, glycosylations and to investigating the presence of different kinases in bovine milk.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.