Molecular Comparisons of Full Length Metapneumovirus (MPV) Genomes, Including Newly Determined French AMPV-C and –D Isolates, Further Support Possible Subclassification within the MPV Genus

Four avian metapneumovirus (AMPV) subgroups (A–D) have been reported previously based on genetic and antigenic differences. However, until now full length sequences of the only known isolates of European subgroup C and subgroup D viruses (duck and turkey origin, respectively) have been unavailable. These full length sequences were determined and compared with other full length AMPV and human metapneumoviruses (HMPV) sequences reported previously, using phylogenetics, comparisons of nucleic and amino acid sequences and study of codon usage bias. Results confirmed that subgroup C viruses were more closely related to HMPV than they were to the other AMPV subgroups in the study. This was consistent with previous findings using partial genome sequences. Closer relationships between AMPV-A, B and D were also evident throughout the majority of results. Three metapneumovirus “clusters” HMPV, AMPV-C and AMPV-A, B and D were further supported by codon bias and phylogenetics. The data presented here together with those of previous studies describing antigenic relationships also between AMPV-A, B and D and between AMPV-C and HMPV may call for a subclassification of metapneumoviruses similar to that used for avian paramyxoviruses, grouping AMPV-A, B and D as type I metapneumoviruses and AMPV-C and HMPV as type II.


Introduction
The genus Metapneumovirus (MPV), in the family Paramyxoviridae, subfamily Pneumovirinae, includes globally important viruses in avian and human health. Avian MPV (AMPV) [1][2][3][4][5] causes respiratory and genital disorders in poultry, with a severe economic impact on the industry [6]. Human MPV (HMPV) is responsible for bronchiolitis in infants [7,8] and severe infections in the elderly or immunocompromised adults [7,9,10,11]. AMPV and HMPV are now classified into the genus Metapneumovirus [12], to acknowledge the difference in genome order [13] and the absence of the non structural protein genes NS1 and NS2 as compared with members of the Pneumovirus genus [14].
MPV have non segmented, single stranded, negative sense RNA genomes between 13.1 and 14.2 kb which are known to encode 9 proteins. MPV genomes are organized in the order 3′-leader-N-P-M-F-M2-SH-G-L-trailer-5′. Genetic and antigenic studies have revealed four AMPV subgroups (A to D) and two HMPV subgroups (A and B) with a high similarity between HMPV subgroup A (strain 001) and AMPV-C [15][16][17][18][19]. Genetic sublineages have been defined within HMPV subgroups and AMPV-C, the latter forming two genetic lineages in Muscovy ducks in France [20] and turkeys and wild birds in the USA [21][22][23][24][25]. It is not fully understood why AMPV-C pathogenic for turkeys emerged in the USA, whereas such viruses have not been recognized in the EU or Asia, with the exception of the AMPV-C strain recently isolated in chickens in China [26].
Determining full length sequences of viral genomes is an essential step towards studying the possible molecular basis for host tropism or pathogenicity, first by allowing the development of reverse genetics systems for the studied strains, and second by allowing genome wide sequence comparisons highlighting relevant regions to study using reverse genetics. Full length genome sequences of subgroup A and B viruses are available [27][28][29][30]. Full sequences are available for AMPV-C from both turkey [16,31] and wild goose [21,32] in the US, from pheasants in Korea [33], and most recently from Muscovy duck in China (Acc Nu KC915036 and KF364615). Partial sequence (M gene) is available for a Chinese chicken isolate of AMPV-C (acc Nu JX422020).
In combination with sequences described previously [20,24,34,35], this study completes the sequences of European AMPV-C (French duck isolate, Fr-AMPV-C) and D (French turkey isolate, Fr-AMPV-D). Comparisons were made to full length sequences of European AMPV-A, B and US-C and of all HMPV-A and B sublineages.

Acquisition of Full length Fr-AMPV-C and D sequences
Fr-AMPV-C Muscovy duck/France/1999/99178 and Fr-AMPV-D Turkey/France/1985/85035 (the latter previously identified as Fr/85/1 in refs [35,36]) were propagated in Vero cells as described previously [20,36]. The 99178 and 85035 viruses were shown experimentally to be pathogenic for SPF Muscovy ducklings or turkeys, respectively [37]. Fr-AMPV-C and Fr-AMPV-D virus stocks had titers of log10 4.20 and log10 5.00 TCID50/ml, respectively [38]. Viral RNA was extracted using the QIAamp Viral RNA mini kit (Qiagen, France) according to the manufacturer's instructions. Primers were designed from previously published partial sequences (Table S1) and from the full genome sequences of US AMPV-C and AMPV-A and B. Additional primers were defined from the newly determined sequences (sequences of all primers are available on request). The sequences of the leader and trailer were determined as previously reported [34], based on 3′ tailing of the genome and its positive replication intermediate. cDNA copies of the viral RNA were prepared using SuperScript II (Invitrogen, France) according to the manufacturer's recommendations. dsDNA was amplified from cDNA in overlapping segments using Expand High Fidelity enzyme (EHF, Roche, France) according to the manufacturer's recommendations. PCR products were purified using an Ultraclean Gelspin kit (Mobio) and sequenced using the Big Dye Terminator v3.1 cycle sequencing kit as recommended by the manufacturer. Each genome region was amplified three times and PCR products were sequenced in both directions.

Genetic comparisons
Full length Fr-AMPV-C and Fr-AMPV-D sequences were assembled using Vector NTI v11 software, then aligned using MEGA 5.2 [39] against available full genome sequences of MPV downloaded from GenBank (four HMPV and 13 AMPV genomes, see Acc No in Table 1). Open reading frames (ORFs) were predicted and then compared with those reported previously using MEGA 5.2. The program "getorf" from EMBOSS (emboss.sourceforge.net) was used to detect potential ORFs, which were defined as regions of at least 150 nucleotides between two STOP codons. The amino acid (aa) sequences of these ORFs were compiled in a database file. ORFs identified from Fr-AMPV-C and Fr-AMPV-D were compared to all other MPV ORFs by local BLAST [40] and were submitted to global BLAST search online.
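The stop-to-stop ORF criterion described above (at least 150 nucleotides between two STOP codons) can be illustrated with a minimal Python sketch. This is not the EMBOSS getorf implementation: the function name `stop_to_stop_orfs` is hypothetical, only the three forward frames of the supplied sequence are scanned, and regions touching a sequence end are ignored.

```python
STOPS = {"TAA", "TAG", "TGA"}

def stop_to_stop_orfs(seq, min_len=150):
    """Return (frame, start, end) tuples (0-based, end exclusive) for
    forward-strand regions that lie between two in-frame STOP codons
    and span at least min_len nucleotides. Hypothetical helper."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        prev_stop_end = None  # index just past the previous in-frame STOP
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                if prev_stop_end is not None and i - prev_stop_end >= min_len:
                    orfs.append((frame, prev_stop_end, i))
                prev_stop_end = i + 3
    return orfs

# Toy example: 50 alanine codons (150 nt) flanked by two STOP codons.
toy = "TAA" + "GCT" * 50 + "TAG"
print(stop_to_stop_orfs(toy))
```

For a negative-sense genome, the same scan would be applied to the antigenome (positive-sense) sequence.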

Codon usage
The extent of codon bias was evaluated among the studied MPV. To measure the general non-uniformity of the synonymous codon usage, the effective number of codons (Nc) [41] was calculated based on the longest MPV gene (L). Nc values range from 20, when only one of the possible synonymous codons is used for each amino acid, to 61, when all synonymous codons are used equally. The closer the Nc value is to 20, the stronger the bias in codon usage and the less random the codon usage is. It is generally accepted that genes have a significant codon bias when the Nc value is less than or equal to 35 [42].
In the Nc value calculation formula, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, F2 corresponds to the probability that two randomly chosen codons for an amino acid with two synonymous codons are identical. F3 is the probability that two randomly chosen codons for an amino acid with three synonymous codons are identical, and so on for F4 and F6. The Nc value was determined using CodonW 1.4.4 (http://codonw.souceforge.net) and was correlated to the percentage of G+C at the third position (GC3), as this has been shown previously to be a major factor influencing the synonymous codon usage pattern in the HMPV genome [43].
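As an illustration of the formula above, the following Python sketch computes Nc for a single coding sequence. It is not the CodonW implementation. The per-amino-acid estimator F = (n Σp² − 1)/(n − 1) is taken from Wright's original description as recalled here rather than from the text above, so it should be read as an assumption. The function name is hypothetical, and edge cases (missing degeneracy classes, very short genes) are handled more crudely than dedicated software would.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons ordered by base T, C, A, G at each position.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AAS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

def effective_number_of_codons(cds):
    """Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6 for one in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    f_by_class = defaultdict(list)
    for codons in families.values():
        k = len(codons)                    # degeneracy class (1, 2, 3, 4 or 6)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue                       # Met/Trp, or too few observations
        homozygosity = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * homozygosity - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    nc = 2.0
    for k, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        values = f_by_class[k]
        if not values:
            raise ValueError(f"no usable amino acid in degeneracy class {k}")
        nc += n_amino_acids / (sum(values) / len(values))
    return min(nc, 61.0)                   # Nc cannot exceed 61 sense codons
```

A sequence that uses a single codon for every amino acid gives F = 1 in every class and therefore Nc = 2 + 9 + 1 + 5 + 3 = 20, the lower bound quoted in the text.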

Phylogenetics
All available AMPV full-length genome sequences and one representative of each of the four HMPV sublineages were aligned using Clustal W. Alignments were also checked manually for a good correspondence of the common coding regions. Phylogenetic analysis was performed using MEGA 5.2 with the Neighbor-Joining method (1000 bootstrap replicates) and the Kimura-2-parameter substitution model.
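For readers unfamiliar with the Kimura 2-parameter model named above, the pairwise distance it assigns to two aligned nucleotide sequences can be sketched as follows. This is a minimal illustration, not MEGA's code: the function name is hypothetical, and columns containing gaps or ambiguous bases are simply skipped for the pair being compared, which is an assumption about gap handling.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    where P and Q are the proportions of transitions and transversions."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    compared = transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a not in "ACGT" or b not in "ACGT":
            continue                        # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1                # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A matrix of such distances over all sequence pairs is the input that a Neighbor-Joining algorithm would then use to build the tree.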

Sequence overview
The full length consensus sequences for Fr-AMPV-C and Fr-AMPV-D were 14152 bp and 13415 bp long, respectively. Table S1 presents the previously released sequences for these two viruses. The present report provides newly determined sequences equal to 73 and 78% of the total genome sequence for these viruses, respectively. The full length genomes were consistent in the order (3′-leader-N-P-M-F-M2-SH-G-L-trailer-5′) and in the size of known ORFs for MPV genomes (Table 1). Both sequences have been submitted to EMBL (Accession numbers HG934338 and HG934339 respectively). The Fr-AMPV-C and Fr-AMPV-D genomes, like several other AMPVs, were found not to conform to the "rule of six" [44], a feature that separates Pneumovirinae from Paramyxovirinae [45]. In general, genome lengths were conserved amongst AMPV subgroups A, B and D and amongst HMPV sublineages; however, clear differences could be seen in the genome lengths of AMPV subgroup C viruses, mostly resulting from the different lengths of their G genes (Table 1).
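The "rule of six" statement can be checked directly from the two genome lengths given above, since a genome conforms only if its length is an exact multiple of six nucleotides. The short Python sketch below does that arithmetic; the function name is hypothetical.

```python
def conforms_to_rule_of_six(genome_length_nt):
    """True if the genome length is an exact multiple of six nucleotides."""
    return genome_length_nt % 6 == 0

# Genome lengths reported in this section.
for name, length in (("Fr-AMPV-C", 14152), ("Fr-AMPV-D", 13415)):
    print(name, length, "remainder", length % 6,
          "conforms" if conforms_to_rule_of_six(length) else "does not conform")
```

Both lengths leave a non-zero remainder (4 and 5, respectively), in line with the observation that neither genome follows the rule.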

Phylogenetics
Three significant clusters were observed, one grouping all HMPVs, a second grouping AMPV-Cs and a third grouping the AMPV-A, B and D subgroups (Fig. 1).
Within the AMPV-C cluster, viruses isolated from Muscovy ducks (SO1, GDY and 99178) formed a separate sub-lineage from the others. This separation is potentially related to species rather than geographic origin, as the Asian SO1, GDY, PL-1 and PL-2 isolates were split into different clusters (SO1 and GDY with the European 99178 isolate, and PL-1 and PL-2 with the US isolates, Fig. 1). These geographical relationships could, however, be blurred if AMPV were shed by migratory birds in the overlap between the East Asian-Australasian flyway and both the East Atlantic and Pacific Americas flyways in the Northern hemisphere [46].

Nucleoprotein (N) ORF
The first ORF in the AMPV genome encodes the N protein, which is a component of the polymerase complex and important for the formation of the nucleocapsid helical structure [45,47]. The N ORF of Fr-AMPV-C was more closely related to that of other AMPV-Cs and HMPVs in terms of length and aa conservation than it was to AMPV-A, B or D. The length of the Fr-AMPV-C N ORF (394aa) was identical to previously reported AMPV-C N ORFs regardless of host species and to those of all HMPV sublineages (Table 1). High aa identity was observed with all other AMPV-C sequences (99%) and with all HMPV sublineages (89-90%); however, the identities with AMPV-A, B and D were lower (70, 71 and 73%, respectively). Two aa positions (44 and 137) were found to be specific to Fr-AMPV-C N compared to all other AMPV-Cs; notably, these were the same amino acids as found in HMPV subgroups at these positions (Fig. 2). Neither of these amino acids was within the three conserved regions identified previously among pneumoviruses (Barr et al 1991) and represented as boxes A-C in Fig. 2, the two latter (B and C) being merged in MPV to form a larger conserved domain B/C (aa 241-327, Fig. 2). Four separate regions (see grey shaded boxes in Fig. 2) were also highly conserved in all MPVs.
[Table 1 footnotes: a, for all coding regions, in addition to the ORF length, the deduced protein length is indicated in brackets; b, number of nucleotides between the N and P ORFs, encompassing the 3′NCR of the N gene, the N-P intergenic region and the P 5′NCR; c, due to the overlap between M2.1 and M2.2, the sum of nucleotides presented is greater than the genome length; d, indicates that the full genome is longer than shown because the genome extremities were not confirmed.]
Subgroups A, B and D viruses were more closely related in terms of length and aa identity than they were to AMPV-Cs or HMPVs. Indeed, the length of the Fr-AMPV-D N protein (391aa) was identical to that of both AMPV-A and B and three amino acids shorter than AMPV-C and HMPV N proteins. Amino acid identities were high with A and B (89-90%) but lower with subgroup C and HMPV N proteins (71-74%). Localization of aa differences can be seen in Fig. 2, which also supports a relationship between Fr-AMPV-D and subgroups A and B.
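The percentage amino acid identities quoted throughout these comparisons are pairwise values derived from aligned sequences. A minimal Python sketch of such a calculation is given below. The function name is hypothetical, and the denominator convention (all columns where at least one sequence has a residue, with gap-versus-residue columns counted as mismatches) is an assumption; alignment software may use a different convention and give slightly different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues between two aligned sequences.
    Columns that are gaps in both sequences are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)

# Toy example with made-up peptides, not real N protein sequences.
print(percent_identity("MSLQGIHLSD", "MSLQGIQLSD"))
```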

Phosphoprotein (P) ORF
The second main ORF in the AMPV genome encodes the P protein, which is also part of the polymerase complex. Consistent with N protein comparisons, the P ORFs of AMPV-Cs and HMPVs were more closely related than they were to AMPV-A, B and D. In the same respect, subgroups A, B and D also demonstrated closer relationships.
The length of the Fr-AMPV-C P ORF (294aa) was identical to other previously reported subgroup C P ORFs regardless of host species, and to those of all HMPV subgroups (Table 1). The P ORFs of AMPV-A, B and D were 16-17aa shorter (Table 1). The full length Fr-AMPV-C P sequence demonstrated a high aa conservation of 96-97% with all AMPV-C sequences, 67-68% with HMPV subgroups and 56%, 54% and 53% with AMPV-A, B and D respectively. Sequence conservation in the carboxy terminal half of the P protein (aa160-294) was notably higher for all the studied MPVs than it was in the amino terminal half (aa11-159). The carboxy terminal half has been reported to support most of the interactions with the N protein and polymerase complex, as reviewed by Easton et al., 2004 [45]. The high conservation of the P interaction domain between AMPV-C and HMPV is consistent with the finding that a recombinant chimeric HMPV with the P gene derived from AMPV-C was able to replicate in Vero cells [48,49].
In common with all subgroup C and HMPV P protein sequences analyzed previously [50], Fr-AMPV-C P lacked cysteine residues and maintained high conservation within the region (aa185-240) proposed to play a role in maintaining the structural integrity of the nucleocapsid complex [51]. More recently, this region in the HMPV P sequence has been shown to contain a short molecular recognition element (aa198-211) and a small domain (aa171-193) responsible for P tetramerization [52] (Fig. 3). In the latter domain, subgroup C viruses were fully conserved and only differed at two amino acid positions from all HMPV sequences. Sequences of subgroups A, B and D in this domain were fully conserved but they differed at three and five amino acid positions from HMPVs and AMPV-Cs respectively (Fig. 3). Interestingly, the first 14aa of the molecular recognition element were 100% conserved in all MPVs with the exception of strain SO1, which contained just one aa difference (Fig. 3, grey shaded box).
The length of the Fr-AMPV-D P protein (278aa) was identical to subgroup A and just one aa shorter than the P protein of subgroup B. Amino acid identities were 71-72% conserved with subgroups A and B. In contrast to subgroup C and HMPV P proteins, subgroups A, B and D contained cysteine residues (Fig. 3). Cysteine 56 was conserved in subgroups A, B and D and cysteine 64 was conserved between subgroups A and B. Subgroups A, B and D all differed extensively at the extreme C terminus of the P protein (Fig. 3).

Matrix protein (M) ORF
The third main ORF in the AMPV genome encodes the M protein, which orchestrates the assembly of viral components at the plasma membrane through interactions with the viral glycoproteins and nucleocapsid [53,54]. The length of the M ORF (254aa) was identical in all MPVs and its aa sequence was extremely conserved amongst all AMPV-Cs (99%) and between AMPV-A, B and D (90 to 94%). High aa conservation was also seen between AMPV-C and HMPVs (87-88%), but to a lesser extent between AMPV-C and AMPV-A, B and D (78-79%). The hexapeptide (aa14-19) with no known function but conserved across all pneumoviruses [55] was also highly conserved in all MPVs, with the exception of Fr-AMPV-C, which contained one conservative aa change (V→I) at position 18. Three cysteine residues (aa110, 147 and 239) were also conserved in all MPVs.

Fusion protein (F) ORF
The fourth main ORF in the AMPV genome encodes the highly antigenic, type I membrane fusion protein F. In paramyxoviruses, F is synthesized as an inactive single precursor F0, which is directed to the endoplasmic reticulum by its N-terminal signal peptide. F0 is then cleaved at an arginine-rich cleavage site, mostly by host endoproteases such as furins, into functional F1-F2 subunits held together by disulfide bonds. The F1 subunit remains inserted into the virus membrane by its carboxy-terminal transmembrane domain [56]. The F2 subunit of both HMPV and human and bovine RSV (HRSV, BRSV) has been reported to determine cellular host range [57,58].
The length of the Fr-AMPV-C F ORF (537aa) was identical to previously reported AMPV-C F ORFs, one aa shorter than the F of AMPV-A, B and D, and two aa shorter than HMPV F (Table 1). The aa sequence of Fr-AMPV-C was extremely conserved (98-99%) with all other AMPV-Cs, highly conserved (81-82%) with all HMPV sublineages and slightly less conserved (71-73%) with AMPV-A, B and D. The cleavage site was located in the same position (aa 99-102) in all MPVs (grey box Clv in Fig. 4). The cleavage sequence (RKAR) conserved in all AMPV-Cs was not consistent with the typical furin cleavage site (R-X-R/K-R) found in AMPV-A, B and D (RRRR, RKKR and RQKR respectively); however, a less typical recognition site (R-X-X-R) has also been shown to be functional [59].
The sequence of the signal peptide (aa 1-18, Fig. 4) at the N terminal end of F2 was extremely subgroup specific in the avian viruses, a rather surprising finding considering its function, with at best 39% identity between subgroups D and C; however, a higher identity (56-61%) was seen between subgroup C and HMPV. Two cysteine residues (aa 28 and 60) (Fig. 4) remained conserved in all MPVs, including Fr-AMPV-C and D (with the exception of position 28 in the AY579780 APV/CO sequence), which further supports their already suggested possible structural role [50].
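The distinction drawn above between the typical furin motif (R-X-R/K-R) and the less typical R-X-X-R site can be expressed as two regular expressions. The Python sketch below classifies the four cleavage sequences quoted in the text; the function name and the returned labels are hypothetical, and the code says nothing about whether a site is actually cleaved.

```python
import re

TYPICAL_FURIN = re.compile(r"R.[RK]R")   # R-X-R/K-R
MINIMAL_FURIN = re.compile(r"R..R")      # R-X-X-R

def classify_cleavage_site(site):
    """Label a four-residue cleavage sequence by the motif it matches."""
    site = site.upper()
    if TYPICAL_FURIN.fullmatch(site):
        return "typical"
    if MINIMAL_FURIN.fullmatch(site):
        return "minimal"
    return "none"

# Cleavage sequences quoted in the text for AMPV-C, A, B and D.
for subgroup, site in (("C", "RKAR"), ("A", "RRRR"), ("B", "RKKR"), ("D", "RQKR")):
    print(subgroup, site, classify_cleavage_site(site))
```

Only the subgroup C sequence RKAR falls into the "minimal" class, because its third residue is alanine rather than arginine or lysine.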
In the F1 subunit, the fusion related domain (103-125) [60] was 100% conserved in all MPVs, with the exception of one aa change in AMPV-B. Other interesting conserved features in all MPV sequences included i) the position of the 12 extracellular cysteine residues (Fig. 4), a finding that is consistent with their possible involvement in protein secondary structure through the formation of disulphide bonds (Van den Hoogen et al., 2002), and ii) a proposed N-linked glycosylation site (aa353-355, Fig. 4) [50]. Other features appeared subgroup or strain specific. For example, all AMPV-C F1 sequences contained a glycine residue (G) at amino acid 294 (Fig. 4), a position previously reported in HMPV to be influential in low pH-triggered fusion and syncytial phenotype [61,62], and in AMPV-A to contribute to the increased protective capacity of a genetically modified virus [63]. An integrin binding domain RGD (aa329-331; grey box Ibd in Fig. 4) has been identified in the F1 subdomain of the HMPV F protein, and changes to either of its first two residues have been shown to be detrimental for fusion activity [64]. No such typical RGD domain exists in the AMPV F protein: in contrast, all subgroup C sequences contained the motif RSD at aa329-331 and subgroups A, B and D contained the motif RDD at the same positions. The subgroup specific modifications in this biologically significant domain also support the closer relationship between subgroups A, B and D.
The F1 cytoplasmic tail also exhibited inter subgroup variation (Fig. 4). Indeed, intra subgroup identities in this part of F1 were extremely high (subgroup C sequences 96-100%, subgroup A 100%, and HMPV 88-100%), whereas an extremely low conservation between subgroups was observed, with at best 56% between AMPV-B and D and as low as 0-4% between AMPV-A and HMPV. In spite of this low conservation, a TTG motif was conserved in the F1 cytoplasmic tails of AMPV-A, B and D (Fig. 4). The cytoplasmic tails of several paramyxovirus fusion proteins have been shown to be important in virus assembly [65].
Finally, several regions in the F1 subunits of pneumoviruses and MPVs are important in the production of neutralizing antibodies [66][67][68][69]. Brown et al 2009 [66] demonstrated that two regions (211-310 and 336-479) of the AMPV-A F protein were recognized by neutralizing antibodies to both subgroup A and B but not subgroup C virus. These regions appeared highly conserved (mean 95%) between AMPV-A, B and D but much less so with AMPV-C and HMPV (71-84%). Such identities are consistent with the lack of cross neutralization of AMPV-A, B and D with subgroup C viruses, and further suggest that neutralizing epitopes within regions 211-310 or 336-479 of AMPV-A and B are also likely to exist in subgroup D. These genetic data thus correlate with the previously reported antigenic cross-reactivity between AMPV subgroups [66,70].

The M2 protein ORF
The M2 gene contains two overlapping ORFs (M2.1 and M2.2). M2.1 is involved in virus synthesis and enhances the processivity of the viral polymerase whilst M2.2 has been suggested to alter the balance between transcription and replication [45]. M2.2 has also been shown to be important for adaptation to Vero cells [71]. Fr-AMPV-C M2.1 was identical in length (184aa) to all other subgroup C sequences, but two, four and three aa shorter than AMPV-A and B, AMPV-D and HMPV sequences, respectively. Fr-AMPV-C amino acid identities were again highly conserved with all other subgroup C sequences (98%) and with all HMPV sublineages (84-85%), which is consistent with the finding that the polymerase complex proteins (M2.1, N, P and L) of either virus are biologically active in heterologous rescues [49,72]. Similarly, high identities (87-90%) were also observed between AMPV-A, B and D; however, identities between subgroup C and AMPV-A, B and D (71-74%) were moderately lower. The three cysteine residues found in all pneumoviruses [50] within the first 30 aa of M2.1 remained conserved in both Fr-AMPV-C and D. M2.1 is intracellular and conservation of cysteines has been shown in RSV to be important for the formation of structural metal binding motifs [73,74].
In M2.2, a similar level of conservation was seen between subgroup C sequences (93-99%) but conservation was notably lower with the HMPV sublineages (54-58%) and with AMPV-A, B and D (20-24%) which was consistent with previous literature [50]. The highest inter subgroup identity was seen between AMPV-A, B and D (64-72%).
Three cysteine residues were conserved across all MPVs (aa7, 16 and 56) and a further two conserved between AMPV-A, B and D (aa22 and 59). Cysteines 7, 16 and 22 fell within a region (aa0-25) identified in HMPV as critical to promote viral gene transcription [75].

Small hydrophobic protein ORF (SH)
SH is a small type II membrane glycoprotein localized in the endoplasmic reticulum, Golgi and cell surface [76]. SH has been shown to be non-essential for virus attachment, infectivity or virion assembly [29,77,78]. However, an SH deletion in AMPV-A contributed to an altered syncytial phenotype and a reduced immunogenicity [79].
SH length (175aa) was conserved across all AMPVs with the exception of subgroup B (180aa). This was in contrast to the varying lengths seen in HMPV SH (177-183aa). A range of 83-100% aa identity was seen between all subgroup C SH sequences. SH conservation between AMPV-A, B and D was considerably lower (42-49% aa identity), and even lower between HMPVs and all AMPVs (14-31%). In the SH transmembrane domain, subgroup C sequences demonstrated a closer relationship with HMPVs (39-50%) than they did with subgroups A, B or D (19-30%). Subgroups A, B and D were more closely related in their transmembrane domains (70-86%).
Further relationships between AMPV-C/HMPV or AMPV-A/ B/D were evident in the conservation of cysteine residues. AMPV-A, B and D had fourteen (3 in the intracellular and 11 in the extra cellular domain) and AMPV-Cs and HMPVs had nine in the extracellular domain. Seven cysteines were conserved across all MPVs in the extracellular domain.
These features make the SH protein the second most variable protein (after the G gene) in the MPV genome, with respect to inter-subgroup aa identity. Interestingly inter-subgroup differences did not prevent the restoration of a typical phenotype when SH B was introduced into a SH-deleted AMPV-A genome background. A similar result could not be achieved using SH C [79].

Glycoprotein ORF (G)
G is a heavily glycosylated type II membrane protein, involved in, but not essential for virus attachment [29,80,81]. Most recently it is emerging as an inhibitor of the cellular host immune response to viral infection [82][83][84].
We have previously reported genetic analyses of the large G ORF in Fr-AMPV-C [24] and D [24,35]. Both studies showed that G exhibited the most extensive divergence between subgroups in terms of length and sequence identity. Differences in the length of the G protein ectodomain amongst AMPV-C isolates have also been reported [18,85,86,87]. In the present study, the lengths of both Chinese AMPV-C G sequences were identical to that of Fr-AMPV-C (585aa), whilst both Korean AMPV-C G sequences were shorter (264aa) and more closely resembled AMPV/CO (Table 1). Intra subgroup C identities including the two Chinese and two Korean AMPV-C G sequences were within the range reported previously (75-83%) [24]. The two pairs of Asian AMPV-C sequences were highly conserved (intra pair identity = 97 and 99.6%, respectively) and the four viruses exhibited the conserved intracellular and transmembrane domains and the ten extracellular cysteine residues previously reported to be conserved in all AMPV-Cs [24]. Remarkably, 19 out of 22 aa differences between the two Chinese sequences were confined to a short domain (aa269-299) immediately at the N terminal end of the previously identified, variable part of the G ectodomain.

The polymerase protein ORF (L)
The final ORF of metapneumovirus genomes encodes the large RNA-dependent RNA polymerase protein L. It is a major part of the polymerase complex responsible for most of the enzymatic processes involved in transcription and replication [88]. It is also responsible for viral messenger RNA capping, polyadenylation, methylation and phosphorylation processes [89].
The length of Fr-AMPV-C L (2005aa) was identical to all other AMPV-C and HMPV sequences, one aa shorter than that of AMPV-A and B and two shorter than that of Fr-AMPV-D. Extremely high aa conservation was observed amongst subgroup-C viruses (98-100%), and amongst HMPV sublineages (94-99%). Closer relationships were observed between subgroup-C viruses and HMPVs (80-81%) than between these viruses and subgroups A, B, D (63-64%). Subgroup D again demonstrated a closer relationship with subgroups A and B (84-86%).
Six functional domains (I-VI) have been identified in the L proteins of non segmented negative strand viruses [88], with domain III including four highly conserved core polymerase motifs (A-D) [50]. The newly identified sequences were consistent with these findings in motifs A, B and C; however, some variation was seen in motif D (Fig. 5), and motifs A and C appeared to be larger in MPVs (Fig. 5). Two additional regions were observed where all MPVs were completely conserved (Fig. 5). The QGDNQ pentapeptide found in motif C within domain III was replaced in all MPVs by NGDNQ. AMPV-A, B and D shared four or five amino acids in motif D that were not represented in the subgroup C viruses or HMPVs (Fig. 5). Further conservation was observed between all MPVs in the ATP-binding motif (aa 1677-1721) identified previously [50] and in five previously unidentified regions scattered through the L ORF where all MPVs were 100% conserved over 15 or more aa (aa15-29, aa549-573, aa656-670, aa1250-1265 and aa1297-1319). Finally, two regions of subgroup specific sequences were observed towards the N terminal end of the L protein (302-320 and 431-446).
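Regions that are 100% conserved over 15 or more aa, such as those listed above, can be located by scanning a multiple alignment column by column. The following Python sketch shows one way to do this under simple assumptions: the function name is hypothetical, gap columns break a run, and the reported positions are 1-based alignment columns rather than residue numbers in any particular L protein.

```python
def conserved_runs(aligned_seqs, min_len=15, gap="-"):
    """Return (start, end) pairs, 1-based and inclusive, for runs of
    alignment columns that are identical in every sequence and span
    at least min_len columns."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")
    runs = []
    start = None
    for i in range(length + 1):          # one extra step closes a final run
        conserved = (
            i < length
            and aligned_seqs[0][i] != gap
            and all(s[i] == aligned_seqs[0][i] for s in aligned_seqs)
        )
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i))
            start = None
    return runs

# Toy alignment (made-up sequences) with a shorter threshold for display.
print(conserved_runs(["AAAAAXBBB", "AAAAAYBBB"], min_len=4))
```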

Codon usage in the L gene
Three groups were revealed by the codon bias analysis: i) HMPVs (Nc = 41.9 to 43.2), ii) AMPV-Cs (47.1 and 47.5) and iii) AMPV-A, B and D (51.4 to 52.7) (Fig. 6). Data points were close to the curve (expected value of Nc if the bias was solely due to the G+C content at the third position), demonstrating that the biases were mostly due to the GC content. Interestingly, AMPV-C and HMPV demonstrated different codon bias profiles, although many of their proteins shared high aa similarity, a feature that most probably reflects their adaptation to a specific host. Another striking aspect of the codon bias study was that AMPV-A, B and D had very similar codon biases, although the genetic distances between these viruses were large (Fig. 1). It is not known whether the bias picture would change if all protein sequences in the full length genome were used; however, this has been performed for HMPV and resulted in a very similar bias [43].
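The reference curve mentioned above gives the Nc value expected when codon bias is driven only by GC3 content. The Python sketch below assumes the commonly used expression from Wright's Nc paper, Nc = 2 + s + 29/(s² + (1 − s)²), where s is the GC3 fraction; that expression is not stated in this text and is therefore an assumption. The function names are hypothetical, and the simple GC3 calculation counts every third codon position, whereas dedicated software may restrict it to synonymous sites.

```python
def gc3_fraction(cds):
    """Fraction of G or C at the third position of each complete codon."""
    cds = cds.upper().replace("U", "T")
    thirds = [cds[i + 2] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)

def expected_nc(gc3):
    """Expected Nc if codon bias were due to GC3 composition alone."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2 + gc3 + 29 / (gc3 ** 2 + (1 - gc3) ** 2)

# An observed Nc lying close to expected_nc(gc3) suggests that the bias
# is largely explained by base composition at the third codon position.
print(expected_nc(0.5))
```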

Non coding regions, intergenic regions and leader and trailer sequences
The numbers of nt between two consecutive ORFs (thus a sequence encompassing the 3′NCR of the previous gene, the intergenic region and the 5′NCR of the subsequent gene) in Fr-AMPV-C were consistent with other subgroup Cs (Table 1). The smallest (21 nt) was between M2.2/SH and the largest (187 nt) between SH/G. The only notable differences in lengths occurred between G/L of AMPV-C strains 2a/97, PL-1 and PL-2 (Table 1), where these three strains exhibited 28 nt, 16 nt fewer than all other AMPV-Cs. The numbers of nt between consecutive ORFs in subgroups A, B and D were more similar to each other than they were to those of subgroup Cs or HMPVs (Table 1). Although the typical AMPV gene start signal GGGACAAGT and gene stop signal AGTTA(Xn)Poly A [13,90,91,92] were mostly conserved amongst AMPV subgroups, some differences were observed (Table 2).
We have previously described the 3′ and 5′ sequence extremities of Fr-AMPV-C [34]. Here, complete leader and trailer sequences showed varying levels of conservation amongst all MPVs (67-97.5%). The highest level of conservation was seen between the leader and trailer sequences of subgroup C viruses and HMPVs (79-85%). This was consistent with the heterologous rescue of AMPV-C and HMPV minigenomes using different polymerase complexes [49,72]. Remarkably, subgroup D had a leader sequence of 62 nt, which was 7 nt longer than any MPV leader sequence reported to date.

Conclusion
This study provides the full length genome sequences for two new AMPV strains, including the first full length sequence for AMPV subgroup D. Results supported previous reports that AMPV-C viruses are indeed more closely related to HMPVs than they are to other AMPV subgroups, and further demonstrate that AMPV-D is more closely related to the AMPV-A and B subgroups. Ideally, this study might be extended by sequencing more AMPV-D isolates. Unfortunately, only two such isolates are currently available worldwide, both isolated in France, on the same date and within close proximity; consequently, efforts to obtain new AMPV-D isolates should be continued. The three MPV "clusters" HMPV, AMPV-C and AMPV-ABD were also further supported based on phylogenetics, sequence comparisons and codon bias studies.
These data combined with those of previous reports indicating antigenic relationships between subgroups A, B and D [20,70] and between subgroup C and HMPV [93] may call for a sub classification of MPVs comparable to that implemented for avian paramyxoviruses, where viruses are first grouped into serotypes (type number) then separated into genotypes [94,95]. Transposing a similar approach into the MPV genus would result in grouping AMPV-A, B and D as type I MPVs and AMPV-Cs and HMPVs as type II.