Forecasting US Army Enlistment Contract Production in Complex Geographical Marketing Areas

Purpose–The purpose of this paper is to demonstrate an improvedmethod for forecasting theUSArmy recruiting. Design/methodology/approach – Time series methods, regression modeling, principle components and marketing research are included in this paper. Findings – This paper found the unique ability of multiple statistical methods applied to a forecasting context to consider the effects of inputs that are controlled to some degree by a decision maker. Research limitations/implications – This work will successfully inform the US Army recruiting leadership on how this improvedmethodologywill improve their recruitment process. Practical implications – Improved USArmy analytical technique for forecasting recruiting goals.. Originality/value – This work culls data from open sources, using a zip-code-based classification method to develop more comprehensive forecasting methods with which US Army recruiting leaders can better establish recruiting goals.


Introduction
Since the formal elimination of the draft by Congress in 1973, the USA Army has maintained an All-Volunteer Force (AVF) (Waddell, 2005, pp. 413-442). At the lowest echelon of the Army recruiting system, US Army Recruiters are tasked to help fill the ranks of the AVF by actively pursuing qualified future soldiers with the ultimate goal of generating required enlistments. At higher echelons of recruiting leadership, however, a fundamental concern arises in how best to distribute recruiting goals to subordinate recruiting echelons throughout the country.
Implied in the latter concern is an assumption that a quantitative relationship between numerous recruiting factorsboth within and outside of the recruiters' controland enlistment production in each geographical recruiting area exits, can be satisfactorily established and can be exploited for forecasting purposes. If so, recruiting leadership can then maximize potential recruiting contracts by altering recruiting goals subject to a set of organizational constraints. Thus, the two primary steps of setting recruiting goals consist of: defining an appropriate quantitative relationship that is robust to extrapolation into the future; and maximizing the outputs based on organizational constraints and the quantitative model parameters found in the first step. We use the acronym US Army Recruiting Command (USAREC) to refer to the corporate, goal-setting body of Army recruiting leadership.
This article focuses on the first of recruiting leadership's tasksthe development of a quantitative relationship between recruiting market factors and enlistment productionas it is fundamental to successful execution of the second task, optimizing production. We are not the first to do this; USAREC has used and currently uses such quantitative models in the past. We do however offer an improved methodology for achieving a useful quantitative model. We first establish the extent to which we can accurately forecast the relationship between enlistment supply and demand factors and enlistment contract production. We assume as supply factors those which are outside of the recruiters' control; local unemployment rates are an example. Enlistment demand factors, by contrast, are those over which the Army and recruiters have some control; numbers of recruiters on-hand and recruiting goals are examples. In terms of the response, we focus specifically on Regular Army (RA) enlistment contracts in 38 recruiting regions that span the USA, territories excluded. We subdivide the enlistment contracts into three mutually exclusive categories of interest to USAREC: (1) high-aptitude high school graduates (GA); (2) high-aptitude high school seniors (SA); and (3) all others (OTH) (Flesichmann and Nelson, 2014).
These are not the standard responses used by USAREC in their forecasting model, but we address in this paper why changing to these make sense. We analyze data from both recruiting organizations and open sources for the period Fiscal Year (FY) 2010-2014. We take advantage of open source data at the county level and map this county-level data to each ZIP code-based recruiting market boundary. To complete this mapping, we introduce a way to compress data from over 3,000 counties and 42,000 ZIP Codes into 38 markets. We then apply principal components analysis (PCA) and mixed stepwise regression to the re-mapped data to develop adequate, parsimonious models for each recruiting market and contract type. The application of PCA represents a significant contribution to the level of statistical rigor in our model development methodology over previous efforts.
Quantitative prediction models are useful when they yield accurate predictions. We use hold-out samples for model validation. We use only the first 75 per cent of the data to estimate ordinary least squares models; the latter 25 per cent of the data are used in forecast validation. To obtain a better appraisal of model stability during validation, we create additional realism by using simple linear trend forecasts of market supply variables. At the conclusion of this step, we achieve our penultimate objective by rendering quantitative market-and contract-specific comparisons of model performance within the context of a forecasting scenario. As a brief concluding excursion, we compare our multivariate forecasting approach to several univariate time series models.
In summary, we introduce an improved methodology for model development and assessment. To this end, we introduce an improved way to generate appropriate response data, and we demonstrate the use of principle components analysis to reduce model dimensionality and demonstrate an improved level of statistical rigor in our model development methodology. We also present an empirical study comparing our model to other common models.

Background
There is a vast literature on military recruitment, enlistment and retention. We focus on macro-and micro-economic studies. Macroeconomic studies are highly aggregated geographically, involving only a handful of national regions. Microeconomic studies are geographically disaggregated, typically at the ZIP code level. Due to space limitations, we here provide general findings based on a detailed discussion found in McDonald (2016).
Our review of pertinent literature on Armed Forces recruiting spans a 26-year period from 1985 through 2011. In total, macroeconomic studies of enlistment supply are helpful in describing the "big picture" of recruiting models. There seems to be some general agreement between these studies in the significance of a few select factors: unemployment, qualified military available population, veteran population and recruiter strength are of particular importance in this regard (Asch et al., 2009;Warner et al., 2001;Dertouzos andGarber, 2006, 2008;Dertouzos, 1985). However, these works have limited capacity to predict contract production for geographical areas corresponding to recruiting market boundaries. We therefore acknowledge the added value of microeconomic models in both their geographic specificity and use of validation data despite their reduced specificity in the response and difficulties in comparing fit adequacy (Gibson et al., 2011(Gibson et al., , 2009).
There are useful conclusions derived from the prior literature. First, the literature is dominated by econometric methodology. The econometric studies share a common objective of describing socioeconomic effects on recruiting over time. In nearly every study, this involved some form of regression. We use a similar quantitative modeling methodology albeit with useful extensions.
Second, there is some broad agreement that several factors are correlated with recruit production. Of these, geography appears to be statistically significant (Dertouzos andGarber, 2006, 2008). Unfortunately, we cannot conclude how non-geographic factor effects change with respect to geographic location; each study uses different geographic boundary definitions, none of which correspond to actual USAREC definitions. In any case, geography is statistically significant based on the total body of empirical results. We do note with some caution that statistics gathered at the microeconomic level may have greater measurement Complex geographical marketing areas error (Murray and McDonald, 1999). Thus, our methodology seeks an appropriate balance between required geographic specificity with respect to recruiting markets and data measurement errors inherent in higher resolution. We maintain a variable set broadly consistent with previous literature to provide a general basis for comparison. Interestingly, relatively little of the previous research specifies predictive models designed to produce forecasts into future time periods. Recruitinglike any private-sector marketing effort-requires decision-making (i.e. an irrevocable allocation of resources) in the face of uncertainty (Howard and Abbas, 2015, pp. 1-19). While the studies we reviewed provided some indication of how variables respond to time, most did not explicitly describe model performance in an out-of-sample time period or provide any kind of probabilistic statement regarding future behavior. This observation is a primary motivator for our methodology, specifically with regard to our model validation efforts.

Data description and cleaning
We collected data from USAREC and open sources. Our goals were to use variables mentioned in previous literature and provide a representative sample of pertinent supply and demand factors. In all, we collected data on 26 separate metrics for the period FY10 to FY14. See Table I for the definition of these first 26 variables. While data from recruiting leadership was available at the market level, open source data were collected almost universally at the county level. This presented us with a fundamental difficulty because recruiting market boundariesdefined by a set of ZIP codesare incompatible with political boundaries.
Because the market level data provided the greatest level of resolution, we devised a method of weighting county-level data to express it in terms of recruiting market boundaries. Let Z i ( Z be the subset of (m = 1, 2, 32846) ZIP Code Tabulation Areas (ZCTAs) within each unit i boundary. Let C be the set of (n = 1, 2, 3141) counties in the USA. Let C 0 i be the set of counties which intersect with a ZCTA in a unit's area of responsibility C 0 i fC \ Z i g . We then define a weighted statistic z ' , for unit i (scripts j and t are omitted because this definition applies to all variables and times) as: where I y m(n) : the proportion of county n population residing in ZCTA m, from the 2010 Census and Z n : the available discrete statistic for county n, where |z n | ≥ 1 (USA Census Bureau, 2015). We implemented equation (1) only for fractional data, applying [equation (1)] separately to the numerator and the denominator prior to dividing. We explored weighting a raw value such as population but found that aggregating to market levels produced a total value greater than the original. This is likely due to some double-counting in our formulation of z i 0 . However, we assume that similar over-estimation errors applied to the numerator and denominator of a single rate are likely to cancel each other out. The reasonability of our resulting weighted values further increased our confidence in this method.
We also took care to ensure our sample size of time series data were adequate. A common rule of thumb is to have at least 50 observations for model estimation (Montgomery et al., 2015, p. 39). Recruiting data in the model estimation set provided a suitable N = 45 observation. However, much of the county-level data proved available only at annual, semi-annual or quarterly intervals. We therefore expanded the county-level data to that of its recruiting counterpart by applying stochastic mean value imputation to the county data. This involved creating random realizations of monthly county-level data points along a trend-line between the observed annual data (Montgomery et al., 2015, pp. 18-19).

Modeling methodology
A common approach in the past research builds a linear regression or time series model involving numerous potential variables, seemingly without regard to multicollinearity. We used a more rigorous modeling process.
Variables 17-19 in Table I are used as dependent variables with all other variables as potential predictor or independent variables. Principle component analysis provided the basis for the response choice. Principle component analysis is also used to reduce the dimensionality of the predictor set of variables (and thus reducing multicollinearity when using the full set). The reduced set is modeled based on stepwise regression techniques using R 2 adj to judge model fit and studentized residuals to conduct residual analysis when assessing model adequacy. Model adequacy involved residual plots to assess constancy of variance and reasonable normality, whereas measures such as Cook's distance and Hatmatrix elements were used to examine potential outliers or influential points. Details of these analyses are found in McDonald (2016).

Data splitting
The primary purpose of our models is to predict future data. Model validation examines how well the estimated model performs in the presence of future data. Data splitting is used to conduct model validation. Observations t = 1, 2, T define the estimation set used in the model building processes, whereas observations t = T þ 1, T þ 2, T þ t define the validation set. In our data set, we let T = 45 and t = 15 as 15 to 20 observations are recommended to gain an adequate assessment of prediction performance (Montgomery et al., 2012, p. 375) and recruiting leadership begins setting missions a few months prior to the next full recruiting year. By adding three months to the validation set, we effectively re-create the decision situation from the leadership's point of view, predicting contract production over an extended planning horizon using only the data realized by the forecast origin, T.

Forecast metrics
The usual metrics of model fit such as R 2 Adj and R 2 Pred do not apply as forecast accuracy metrics. The following two metrics are used for assessing forecasting accuracy: where y t is the actual response at lead time T þ 1, T þ 2, T þ t andŷ t is the predicted value of the same (Montgomery et al., 2015, pp. 64-74). Because the mean absolute deviation is scale-dependent, we also use the mean absolute per cent error. We assume independent model and variable forecast errors for a one period-ahead forecast, providing a 100 per cent (1a) prediction interval as: where z a=2 is the (1a) per cent critical value,ŝ 2 a is the variance estimator regressing e t-1 on e t ,b 2 is the square of the coefficient for x obtained from the model andŝ 2 x is the NID(0, s 2 ) estimate of error variance obtained from the first-order autoregressive model of x (Montgomery et al., 2015, pp. 202-204). We generalize the results found in equation (3) for multiple independent regressors to obtain: where we denote the set of forecast regressors with x : To implement equation (4) in a predictive role, simple trend models are used to extrapolate independent factors into the future prediction periods.

Response and variable determination
The first phase of the work examined the current USAREC modeling approach. The current approach involves a response constructed using some of the data in Table I. Using the variable index indicated in Table I the output response used is: where: The current USAREC model is relatively parsimonious and produces a good fit (R 2 Adj ¼ 0:93). The model including x 29 provides a somewhat better fit. However, it contains a rather concerning issue in its formulation: the predictor x 29 is almost identical to the response, y 1 . To improve upon the USAREC approach, we therefore reconsidered how to approach the entire modeling process.
The first step was to reconsider the range of responses available. There are seven potential responses as summarized in Table II. Note the first response comes from the current USAREC approach and the second response is the variable x 29 removed from the current model. The other five are responses are found in Table I.   Complex  geographical marketing areas A principle components analysis was conducted on these responses and revealed loadings on two quantities, one being y 4 . Because responses y 1 , y 2 and y 7 are all ratio values, all involving the x 19 variable, the logical choice was to use responses y 3 , y 4 and y 5 . These provide meaningful responses for the model development and are easy to understand. We must note that the independent formulation of SA achieve, y 4 , as a response variable constitutes a remarkable departure from the current USAREC model, which lumps SA and GA Achieved (y 3 and y 4 ) together in the same response. Further discussion is devoted to why this is a sensible departure. Nevertheless, subsequent model development focused on each of these three responses, GA, SA and OTH production (i.e. contracts achieved), respectively.
Multiple regression methods using the three responses and the remaining 23 variables from Table I resulted in serious problems with multicollinearity. A principle components analysis was again used this time to reduce the dimensionality of the independent variable set. As Figure 1 indicates, five components account for approximately 79 per cent of the variance. The initial set of loadings are provided in Table III. Using the loadings and factors, along with context knowledge of the problem and a few iterations of principle component analysis, we finalized the set of five component variables as x 4 along with the derived variables:

Responses
How defined using Table I variables x 18 y 4 (SA Achieved) x 19 y 5 (OTH Achieved) x 20 y 6 (GA þ SA Achieved) x 18 þ x 19 y 7 (Contract Share) x 21 Note: Labels for each of the variables used found in Table I JDAL 1,1 x 30 ¼ x 25 =x 24 (9) x 31 ¼ x 15 þ x 17 (10) x 32 ¼ x 10 =x 23 (11) x 33 ¼ x 12 x 10 x 5 1 À x 7 ð Þ In general terms, x 4 captures unemployment rates, x 30 the ratio of appointments (a surrogate for recruiter effort), x 31 the combined GA and OTH mission, x 32 the SA contracts per recruiter and x 33 a measure of the young adults available. This reduction from 23 potential variables to just 5 variables is an important modeling consideration.
To conclude this portion, we provide the final correlation matrix for the five variables selected. Of the ten off-diagonal elements in Table IV, seven are less than |0.2|. We note Complex geographical marketing areas entries of 0.37 and 0.41, but these are still both less than 0.5 and are deemed not overly troublesome. Thus, we are confident that this reduced set of five variables are adequate for final model development.

Mixed stepwise selection
An initial examination via stepwise regression ruled out quadratic models or linear models with interaction terms. Thus, a pure first-order model is used. Further, based on Box-Cox analyses, all responses were transformed via the square root transformation to improve compliance with the constant variance assumption of the residuals. Despite the transformation, autocorrelation remained present in the residuals. Adding a first-order autoregressive term to the model alleviated this concern. Finally, we needed to model each of the battalions within USAREC. This was done using indicator variables for each market, while simultaneously allowing for the predictor effects to change between markets (i.e. we modeled categorical-continuous interactions). The final models, while seemingly complex, facilitated model adequacy analyses. Figure 2 provides the pertinent results of the model adequacy analysis. The models for GA, SA and OTH contracts are ordered top to bottom. On the left, we see no reason to doubt reasonable normalcy of the residuals and on the right side we see no reason to doubt the constancy of variance. Residual analysis for outliers and influential points revealed nothing of concern. Table V provides the summary of each model fit. While value of p, the number of terms in the model, is high, the vast majority of these are the indicator variables used to derive the individual battalion models.
Overall, we achieve substantial improvement in terms of model fit -600, 200 and 131 per cent for SA, OTH and GA contract types, respectivelyas measured by the estimation data R 2 Adj over the previous effort in (Dertouzos and Garber, 2008). Our improvement comes with some cost of increased p although this is a result of our use of more highly specific geographical market areas[1]. We refrain from direct comparisons to other studies, particularly that of Gibson et al. (2011) due to incongruities in the types of responses and units of observation used. Final models for each contract type are provided in Tables VI to VIII.

Validation forecasts
The real test of any quantitative model is how well it performs out-of-sample. For this study, the most recent 15 time periods of data were held out for validation purposes. Each model was forecast out for these 15 periods at a consolidated level (all contract types and battalions combined) and at a detail level (by contract type). Two prediction interval bands are provided, 80 and 95 per cent as analysts vary in how much risk to assume with respect to the certainty of the input independent variables driving the forecast. Figure 3 is a comprehensive look across all contract types. Within the model data, the prediction values (dark line) track nicely with the actual data (gray line). Tracking is less accurate in the validation data (as one should suspect) but overall not too bad. Figure 4 breaks this data out in the echelon format by contract type. Figure 5 provides a summary of the validation metrics defined for our effort; again, overall these results are very reasonable. We note with particular satisfaction the prediction intervals obtained using linear trend forecasts of the predictors themselves (i.e. "Unknown X"). Only during the very farthest regions of the forecast horizon do we see the actual data e xceed our prediction intervals for both 80 and 90 per cent probability. We have included the R 2 of each contract type at the aggregate level to compare the estimation models and previous literature for which only Dertouzos and Garber (2008) provide a reliable basis for a comparison; our validation R 2 achieves 530, 170 and 119 per cent relative improvements over the models estimated by Dertouzos and Garber (2008) for SA, OTH and GA contract types, respectively. This is a remarkable feat, especially considering the use of forecast inputs for predictor variables in our models.

Comparative analysis
The causal models developed are based on using multivariate statistical methods to obtain appropriate responses and parsimonious models; these goals were achieved quite well. However, any modeling effort involving many independent variables must answer the question of whether a simpler model would suffice. To this end each model was compared to a naive forecast, an appropriately fit seasonal autoregressive integrated moving average model and a seasonal smoothing method (e.g. the Holt-Winters or Brown's method). Figure 6 plots the  For each of the GA and OTH models, it is quite clear the multivariate models are the preferred approach, realizing of course that these models do already contain a single autoregressive term. For the more seasonal SA response model, the results were not so conclusive as the seasonal time series modeling approaches are comparable options. Overall,   Complex geographical marketing areas the collective result is that our effort to develop multivariate models is indeed rewarded with improved performance.

Conclusion
We have shown, through the use of multiple linear regression aided by increased geographic data specificity through the use of our ZCTA method, mixed stepwise selection methods and PCA, that improvements over previous efforts to model US Army enlistment contract production are possible. Moreover, we have shown that forecasts produced by the multiple linear regression modelswhich themselves require simple linear forecasts of the predictorsare robust for a relevant forecast horizon of up to 15 months. Indeed, the fit of the forecasts alone constitute remarkable improvements over previous models which did not use validation data and are worth the development effort when compared to simple time series models. In closing, we must note the unique ability of the multiple regression model in forecasting to consider the effect of inputs which are controlled to a degree by the decision-maker. While the regression model coefficients and standard errors are indeed based only on past data, our models indicated rather high statistical significance of future "controllable" inputs such as recruiting goals and these inputs should be likewise considered in a futuristic sense when the firm is producing forecasts. Only a causal model such as the one afforded by lagged multiple regression affords such an opportunity for exploration. We hope to have successfully informed subsequent discussions on behalf of such a method.