Detailed Methods

Cell culture and transfections HeLa cells were cultured in RPMI 1640 medium supplemented with 100 U/ml penicillin, 100 µl/ml streptomycin and 10% (v/v) newborn calf serum U.S.A.) and incubated at 37°C under a humidified atmosphere of 5% CO 2. Transfections were performed using Lipofectamine 2000 transfection reagent (Invitrogen Corporation) according to the manufacturer's instructions. Near full-length proviral cDNA were PCR-amplified from the individuals' peripheral blood mononuclear cells (PBMCs) at each time point as previously described [1]. First-exon tat sequences were amplified from the cDNA via a nested PCR, firstly using the external primers 5'-AGGAGTGGAAGCCATAATAA-3' and 5'-ATTGCCACTGTCTTCTGCTGT-3', and then the internal primers 5'-ACAAGTTTGTACAAAAAAGCAGGCTATGGAGCCAGTAGAT-3' and 5'-ACCACTTTGTACAAGAAAGCTGGGTATGCTACTACTAATG-3'. Amplicons were cloned into vector pDONR201 (via attB sites, underlined in the internal primer sequences) and then into the expression vector pcDNA-DEST40 using Gateway cloning technology (Invitrogen Corporation) in a manner that places a stop codon immediately after the first-exon tat sequence. Up to five clones per time point were isolated, totalling 89 clones. Clones were sequenced using dye terminators version 3.1, translated and aligned against HIV-1 clone SF2 [2] Tat using the European Molecular Biology Open Source Software Suite programs TRANSEQ and SHOWALIGN [3]. Unique clones from each individual were then chosen for further study. The sequences of these unique clones are available from the GenBank database with accession numbers EU184659 – EU184684. The one-exon tat sequence from SF2 was similarly cloned into pcDNA-DEST40 and used as a control for the transactivation study. The study was performed using a luciferase assay system,

convergence was reached early in the run. The majority-rule consensus tree was generated from the combined post-burn-in samples (Supplemental Fig. S1).
This analysis is intended to provide meaningful branch lengths calibrated by both time and character change. We recovered > 90% of the topological relationships in the source tree (Turner and Sertich, 2010), with large polytomies resolved. For careful consideration of interspecies relationships, we refer the reader to Turner and Sertich (2010), Pol et al. (2014), and Turner (2015).

Morphological Data Collection
Due to variable preservation quality, four techniques were used to measure head width (HW). (1) as trans-quadratic width, for specimens with complete post-orbital skulls; (2) twice the distance between the midsagittal plane (as defined by an anteroposterior line passing through the midpoint of the occipital condyle) and the lateral margin of the quadrate, for distorted or incomplete cranial specimens; (3) trans-articular width for specimens lacking complete skulls but possessing lower jaws; and (4) or as twice the distance between the midsagittal plane (as defined by the anteroposterior midline passing through a fully articulated mandibular symphysis) and the lateral margins of the articular, for distorted or incomplete lower-jaw specimens.
Retroarticular processes (RAPs) were measured as the anteroposterior length of the articular bone from the midpoint of the quadrate articular joint (i.e., center of rotation) to the posterior-most margin of the articular bone, perpendicular to an imaginary line passing through the center of rotation of the jaw joints. Values for Metriorhynchus casamequelai were substituted with values for Metriorhynchus brachyrhynchus (Supplemental Table S3) in the analysis of trait history as they are congeners. RAP length was measured directly on fossil skulls and was measured following manual palpation on living specimens, which were restrained to ensure measurement accuracy.

Preliminary Evolutionary Model Fitting
Prior to the evolutionary analysis described in the main text, we first fit a series of standard models of trait evolution and evaluated the location and probability of evolutionary regime shifts on the pruned phylogeny. Evolutionary models of continuous trait evolution were fitted using the 'geiger,' 'phytools,' and 'evomap' R packages (Harmon et al., 2008;Revell, 2012;Smaers, 2016). Lambda (Pagel, 1999), delta (time dependent model; Pagel, 1999), kappa (punctuational model; Pagel, 1999), early burst (Harmon et al., 2010; similar to the accelerating-decelerating model of Blomberg et al., 2003), Brownian motion (Felsenstein, 1973), Ornstein-Uhlenbeck (random walk with central tendency; Butler and King, 2004; with variance-covariance matrix transformation of Slater, 2014), and independent evolution (adaptive peak of Smaers and Vinicius, 2009; implemented in evomap) were fit to the pruned phylogeny and values of natural-logged retroarticular process lengths (as ln mm). The phylogeny was independently scaled to each model and the posterior distribution of traits was calculated using a Bayesian Markov Chain Monte Carlo (MCMC) ancestral character estimation (2,000,000 generations). The relative goodness-of-fit of each model was compared using log likelihood and the small-sample-corrected Akaike information criterion (AICc). The best fit model is one of independent evolution (Supplemental Table S1), which was, therefore, employed in the analysis described in the main text. The worst fit model was Ornstein-Uhlenbeck (Supplemental Table S1). This may be due to the fact that our pruned phylogeny is both non-ultrametric and contains a sample size fewer than 200 taxa. These issues are described in detail by Slater (2014) and Cooper and colleagues (2016). SUPPLEMENTAL TABLE S1. Support for each model fit based on log-likelihood and AICc values. The best fit model is one of independent evolution (the adaptive peak model of Smaers and Vinicius, 2009

Evolutionary Model Support
Although the independent evolution model is the best fit and was used to calculate evolutionary rates and ancestral character estimations, a series of exploratory analyses was conducted to (1) estimate and (2) test hypotheses regarding the strength, directionality and probability of regime change within the tree. To identify shifts without a priori hypotheses, the SURFACE model of Ingram and Mahler (2013) was implemented in the R package 'surface' (Ingram and Mahler, 2013). SURFACE uses stepwise algorithms to locate regime shifts on a tree with the performance of each model tested using sequential, iterative changes in AICc values (a model initially conceived of by Hansen, 1997;Ingram and Mahler, 2013). This model involves forward and backward passes, with the forward pass establishing the number of regime shifts and the backward pass identifying instances of convergence in trait evolution (Ingram and Mahler, 2013).
Model output includes estimations of the parameters K (number of regime shifts), α (evolutionary rate), σ 2 (magnitude of uncorrelated diffusion), and n optima of θ (optimum trait values). Our SURFACE models accounted for both HW and RAP (in ln mm), and identified four regimes in RAP evolutionary rates: (1) early Suchia + Notosuchia (excluding hypercarnivorous Notosuchia), (2) hypercarnivorous Notosuchia, (3) Neosuchia, and (4) (Uyeda and Harmon, 2014). Whereas SURFACE treats α and σ 2 as constant throughout the tree, the multi-peak OU model allows these values to change. We ran two models for retroarticular process evolution in 'bayou'. In the first, all prior parameters were unconstrained ('heuristic priors' below). In the second, we set our priors to be roughly equivalent to the results of our SURFACE models ('four-shift' below). We identified four branches on which a priori regime shifts were expected, introduced three values for θ (2.033, 3.515, and 4.649 ln mm), and established initial evolutionary rates and the magnitude of uncorrelated diffusion (α = 0.038, σ 2 = 0.028). Both of our 'bayou' models ran for 2,000,000 generations, allowed for a maximum of one regime shift per branch, and all prior distributions were assumed to be normal (with the exception of the distribution of shifts, for which the prior distribution was assumed as conditional Poisson).
The four-shift 'bayou' model returned a bimodal set of trait optima (Supplemental Table   S4 Figure S2). It is possible to implicate either of these locations as the onset of ontogenetic inertia (and, thus, adult overperformance); however, modern data can be most directly extended to shifts within Crocodylia. SUPPLEMENTAL