Developing a Solution to the TRADOC Analysis Center’s Big Data Problem: A Big Data Opportunity

Abstract: As data production, collection, and analytic techniques grow, emerging issues surrounding data management and storage challenge businesses and organizations around the globe. The US Army Training and Doctrine Command's Analysis Center (TRAC) is no exception. For example, among TRAC's many tasks is the evaluation of new materiel solutions for the Army, which typically necessitates the use of computer simulation models such as Combat XXI. These models are computationally expensive and generate large volumes of data.


Introduction and Background
The mission of the US Army Training and Doctrine Command's Analysis Center (TRAC) is to "produce relevant and credible operations analysis to inform decisions" (TRADOC Analysis Center [TRAC], 2017a), and one of its primary tasks is to conduct Analyses of Alternatives (AoAs), which are the "analytical comparison of the operational effectiveness, cost, and risks of proposed materiel solutions to gaps in operational capability" (Carlucci & Zoller, 2016, p. 1). Through its AoAs, TRAC provides the evidence to support multi-billion-dollar decisions about which systems the Army should acquire, and operational scenarios, simulation, and statistical analysis are among its primary analytical tools. Data is its currency.
Data, or more accurately big data, has drawn significant attention from almost every field in the past decade. A relative term, big data refers to an amount of data that cannot be captured, stored, managed, processed, and analyzed by typical database software or hardware (Arthur, 2013, p. 1). To put this in context, in the early 1980s the Commodore 64, the "best-selling single computer model of all time," was released to the public (Griggs, 2011). As per its name, it had 64 KB of internal memory (RAM); today, mobile phones hold an average of 64 GB. In roughly four decades, the data capacity of personal computing devices has increased nearly a million-fold, an increase driven largely by the exponential proliferation of data. Quite simply, data is getting big.
TRAC encounters big data through its use of Combat XXI, a stochastic, high-resolution simulation that represents land and amphibious warfare from the individual soldier to the brigade combat team level (Combat XXI). According to simulation experts at TRAC, an average Combat XXI iteration generates up to 300 MB of data, which quickly stresses TRAC's current ability to archive results, since 30 or more iterations are often necessary to achieve statistically significant results (TRAC, 2017b). Moreover, when running Combat XXI simulations, there are vast quantities of information and data that TRAC is currently unable to store and analyze. Given the ability of data analytics to reveal hidden insights, TRAC believes key findings about previously unexplored options, methods, and equipment may be lost.
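To give a sense of scale, the back-of-the-envelope estimate below sketches how quickly a design of experiments outgrows local storage. The 300 MB-per-iteration figure and 30-replication rule of thumb come from the text above; the factor and level counts are hypothetical, chosen only for illustration.

```python
MB_PER_ITERATION = 300   # average Combat XXI output per iteration (TRAC, 2017b)
REPLICATIONS = 30        # replications often needed for statistical significance

# Hypothetical full-factorial design: 5 factors at 3 levels each.
factors, levels = 5, 3
design_points = levels ** factors                    # 3^5 = 243 parameter combinations
total_mb = design_points * REPLICATIONS * MB_PER_ITERATION
total_tb = total_mb / 1_000_000                      # decimal terabytes

print(f"{design_points} design points -> {total_mb:,} MB (~{total_tb:.1f} TB)")
```

Even this modest hypothetical design produces on the order of two terabytes, well beyond what a single analyst's workstation comfortably archives.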
With this in mind, TRAC is currently working on a big data initiative. Specifically, within five years TRAC wants to be capable of systematically searching all the data produced by its simulations, allowing it to repurpose the results of past experiments for current or future work. As such, TRAC is examining alternate ways to store, manage, and analyze data. Among the many exciting possibilities of this enhanced capability is the ability to perform large-scale design of experiments. To this end, a use case was developed that reimagines Combat XXI as an exploratory tool, one that will enable TRAC to gauge which simulation parameter settings yield operationally relevant and statistically significant results, thereby prescribing what a system's requirements should be.

Methodology: The Systems Decision Process
To help TRAC realize its big data opportunity, several methods common in systems engineering were applied, notably the Systems Decision Process (SDP). As seen in Figure 1, the SDP contains four phases, each with three key tasks and a result that allows the user to transition to the next phase. The process is cyclic and can be repeated over several iterations to produce refined solutions that better meet stakeholder needs.

Problem Definition
The SDP's first phase, denoted in red, is problem definition, which includes the key tasks of research and stakeholder analysis, functional requirements analysis, and value modeling. This initial phase is critical because it frames the problem and ensures the right problem is being solved; accordingly, the end result of problem definition is an approved problem statement. With this in mind, the study team initiated the problem definition phase with stakeholder interviews. These interviews were conducted over several months and included teleconferences, video-teleconferences, and a site visit to TRAC Headquarters in Fort Leavenworth, Kansas. Many of the key insights gained through these interviews are included in Section 1, and they led directly into the subsequent functional requirements analysis. For example, the primary stakeholder (and study sponsor) COL David Tarvin, TRAC's Deputy Director, wants a system that can analyze previously gathered data in order to gain insights on current and future problems (personal communication, October 4, 2017). To enable this, the system must efficiently store large amounts of data, as well as ensure the security of classified data, due to the sensitive nature of TRAC's work.
The next step was to begin value modeling by creating a qualitative value model, which has several components that together paint the overall picture of the system's requirements. First is the fundamental objective, the highest-level objective that must be satisfied for the system to be successful (Parnell, Driscoll, & Henderson, 2011, p. 326). Second and third are the functions and sub-functions, which capture what the system must do to accomplish the fundamental objective. Fourth are the objectives, which are always stated as maximizing, minimizing, or optimizing some component or factor of the system. Last are the value measures, which are scales describing how objectives are measured; each states a factor, its units, and whether more is better (MIB) or less is better (LIB) (Parnell, Driscoll, & Henderson, 2011, p. 327). The fundamental objective and top-level functions of TRAC's qualitative value model are given in Figure 2.
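The hierarchy described above (fundamental objective, functions, objectives, value measures) can be sketched as a simple data structure. This is a minimal illustration only; the function names below are hypothetical placeholders, and only the two value measures discussed later in the paper are shown.

```python
# Minimal sketch of a qualitative value model as nested data.
# Function names are illustrative, not TRAC's actual model.
qualitative_value_model = {
    "fundamental_objective": "Maximize TRAC's ability to store, manage, "
                             "and analyze simulation data",
    "functions": {
        "Store Data": [
            # Each objective pairs a direction with a value measure.
            {"objective": "Maximize amount of data stored per iteration",
             "measure": "Amount of Data", "units": "MB/iteration",
             "direction": "MIB"},   # more is better
        ],
        "Analyze Data": [
            {"objective": "Minimize probability of a failed iteration",
             "measure": "Probability of a Failed Iteration",
             "units": "probability", "direction": "LIB"},  # less is better
        ],
    },
}

all_measures = [obj["measure"]
                for objectives in qualitative_value_model["functions"].values()
                for obj in objectives]
```

Structuring the model this way makes the later transition to a quantitative model mechanical: each leaf measure simply gains a value function and a weight.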

Figure 2. Summarized Qualitative Value Model
The last step in the problem definition phase is transitioning the qualitative value model into a quantitative value model, which shifts the analysis toward a more objective, technical approach. First, each value measure received an associated value function, which mathematically transforms measured performance into stakeholder value. Specifically, bounds for each value function were defined by specifying three points: the input that would produce no value (minimum), the input that would produce 100% value (maximum), and the input that would produce 50% value (midpoint). For example, Figure 3a shows the value function for the Amount of Data value measure, which is defined as the amount of data saved and stored per iteration within the design of experiments. The function is linear and corresponds to a MIB objective. It ranges from 100 MB (the largest amount of data TRAC has encountered to date) to 200 MB, with a midpoint at 150 MB (50% value). Whereas the Amount of Data value function is rather simple, Figure 3b depicts a slightly more complicated value function for Probability of a Failed Iteration. This value function assesses the value associated with the chance that any one iteration of the design of experiments fails for any reason (e.g., a workstation crashes due to insufficient memory, the software cannot process the input, etc.). This value function is less intuitive because it is nonlinear, meaning that the returns in value are not directly proportional to the changes in the value measure. It models a LIB objective, with a maximum value at 0, a midpoint at 0.05, and a minimum value at 1. The function decreases steeply between 0 and 0.2 because as the probability of a failed iteration increases, failures across workstations compound, drastically increasing the number of runs necessary to complete TRAC's simulation experiments and creating exponentially less value. Following their construction, these value functions, along with the others, were sent to the study sponsor for his adjustments, approval, and ranking.
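The two value functions just described can be sketched in code. The anchor points (minimum, midpoint, maximum) come directly from the text; the piecewise-linear interpolation used for the nonlinear measure is an assumption standing in for whatever exact curve appears in Figure 3b.

```python
def value_amount_of_data(mb):
    """MIB linear value function: 100 MB -> 0, 150 MB -> 50, 200 MB -> 100."""
    lo, hi = 100.0, 200.0
    mb = min(max(mb, lo), hi)            # clamp to the defined range
    return 100.0 * (mb - lo) / (hi - lo)

def value_prob_failed_iteration(p):
    """LIB nonlinear value function through the stated anchors:
    0 -> 100, 0.05 -> 50, 1 -> 0 (piecewise-linear stand-in for the
    steeply decreasing curve of Figure 3b)."""
    anchors = [(0.0, 100.0), (0.05, 50.0), (1.0, 0.0)]
    p = min(max(p, 0.0), 1.0)
    for (x0, v0), (x1, v1) in zip(anchors, anchors[1:]):
        if p <= x1:
            return v0 + (v1 - v0) * (p - x0) / (x1 - x0)
    return 0.0
```

Note how the second function loses half its value over the first 5% of its domain, capturing the steep early penalty the text describes for failed iterations.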
Armed with the quantitative value model, as well as the stakeholder and requirements analysis, the refined problem statement became clear, namely: develop a system that leverages data science and data analytics, in conjunction with TRAC's Combat XXI simulation capability, to derive defensible requirements for future materiel solutions. To this end, a use case was developed to inspire potential solutions. Specifically, the study team envisioned a system in which Combat XXI is constantly simulating new materiel systems across varied scenarios, thereby generating the data necessary for TRAC to perform prescriptive analysis of the requirements and capabilities future warfighting systems would need to increase effectiveness on the battlefield. The ultimate goal of this use case is to place TRAC at the forefront of the requirements generation process by focusing on analysis that justifies the need for new equipment.

Solution Design
In the SDP, the second phase is solution design, which consists of three main areas of interest: idea and alternative generation, cost analysis, and alternative improvement. In order to expand the design space, this phase leverages divergent thinking by encouraging the generation of broad, creative ideas for solving the decision maker's problem. Ultimately, this phase produced multiple alternatives for TRAC's future data analytics architecture. To generate alternatives that broadly cover (or span) the design space, the study team used Zwicky's Morphological Box (Parnell, Driscoll, & Henderson, 2011, pp. 361-363). As seen in Figure 4, the columns of the box contain options for key components of the data analytics architecture, and alternatives are built by selecting a single option in each column. For example, the first alternative (Full Monty) selects the presumably most valuable option from each column. It will likely be the most expensive alternative, and it helps the decision maker understand the highest-performing alternative available. At the other extreme is the second alternative (Bare Bones), which selects the least valuable option from each column and will most likely be the cheapest alternative. The third alternative (Middle of the Road) seeks moderate value in each column, allowing the decision maker to understand what TRAC can have with modest improvements across all columns. The next two alternatives (Server Heavy and Simulation/Analysis Heavy) emphasize value in their respective columns and accept lower performance elsewhere. The final alternative (Off Property) considers remote options TRAC can utilize. It is a unique alternative that gives the decision maker a different perspective on the problem, potentially leading to new full or partial solutions for TRAC.
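The mechanics of a morphological box are easy to sketch. The column names and options below are hypothetical stand-ins for the actual components in Figure 4; the point is only to show how alternatives are assembled (one option per column) and how large the full design space is.

```python
from itertools import product

# Hypothetical morphological box; options in each column are listed
# from least to most valuable (an assumption for this sketch).
box = {
    "Storage":       ["Local disk", "On-site server", "Cloud"],
    "Processor":     ["Standard workstation", "High-end workstation", "Compute cluster"],
    "Analysis tool": ["Spreadsheet", "Statistical package", "Big data platform"],
}

# An alternative selects exactly one option per column.
full_monty = {col: options[-1] for col, options in box.items()}   # most valuable
bare_bones = {col: options[0]  for col, options in box.items()}   # least valuable

# The full design space is the Cartesian product of the columns.
design_space = list(product(*box.values()))
print(len(design_space))   # 3 columns of 3 options each -> 27 candidates
```

Even this toy box yields 27 candidate architectures, which is why the study team hand-picked a small set of alternatives that span the space rather than evaluating every combination.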

Focusing on Decision Maker Value with the Weighted Component Influence Model
Although Zwicky's Morphological Box is an effective tool for generating alternatives that span the design space, it does not explicitly consider how component options map to the decision maker's value functions and their relative importance. To bridge this gap, the study team devised a new method, the Weighted Component Influence Model (WCIM). This method takes into account the stakeholder's interests and highlights the design components that should be maximized to yield the most value. It starts by having the stakeholder rank the value functions in order of importance. In this case, the study sponsor binned the value measures into three groups (very important, important, and less important), which he then rank ordered within each group. The study team used this input to assign a global importance rank to each value measure, where the most important value measure received the highest rank (17) and the least important received the lowest (1). Next, these ranks were normalized, yielding normalized importance values.
Design components were then assessed for the extent to which they impacted the value measures, receiving ratings of little, medium, or high impact. As an example, the study team concluded that the total amount of storage has a high impact on the value measures Number of Iterations and Amount of Data. Specifically, the amount of data that is produced and collected per iteration is constrained by the total amount of storage available. Furthermore, the total number of iterations performed depends on the total amount of storage that can be filled, as more iterations generate more data, which requires more space. Similarly, the workstation's processor has a high impact on the Number of Internal Factors and Total Wall-Clock Time per Iteration. After all, if a more powerful processor is used, an iteration can be completed in less time, thereby decreasing the Total Wall-Clock Time per Iteration. Consequently, the cumulative time to execute a given number of iterations will decrease, affording more time to investigate a larger Number of Internal Factors. The remaining design components were assessed in a similar manner, and following these assessments, the impact ratings were assigned the arbitrary values of 0.5, 1, and 1.5 for little, medium, and high impact, respectively. This allowed the study team to multiply the impact values of the design components by the normalized importance values of the value measures and sum the results across each component.
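The WCIM calculation just described can be sketched compactly. This example uses only the four value measures and two components discussed above (the real model has 17 measures and all the components of Figure 4), and it assumes "normalized" means rank divided by the sum of ranks; the ranks themselves are hypothetical.

```python
# Impact values from the text: little, medium, high.
IMPACT = {"little": 0.5, "medium": 1.0, "high": 1.5}

# Hypothetical global importance ranks (higher = more important).
ranks = {"Number of Iterations": 4, "Amount of Data": 3,
         "Number of Internal Factors": 2, "Total Wall-Clock Time per Iteration": 1}
total_rank = sum(ranks.values())
weight = {m: r / total_rank for m, r in ranks.items()}  # assumed normalization

# Assessed impact of each component on each value measure (per the text,
# storage strongly drives iterations/data; the processor drives factors/time).
impacts = {
    "Total storage": {"Number of Iterations": "high", "Amount of Data": "high",
                      "Number of Internal Factors": "little",
                      "Total Wall-Clock Time per Iteration": "little"},
    "Processor":     {"Number of Iterations": "medium", "Amount of Data": "little",
                      "Number of Internal Factors": "high",
                      "Total Wall-Clock Time per Iteration": "high"},
}

# WCIM score: sum over measures of (impact value x normalized importance).
score = {c: sum(IMPACT[lvl] * weight[m] for m, lvl in row.items())
         for c, row in impacts.items()}
ranking = sorted(score, key=score.get, reverse=True)
```

With these illustrative inputs, total storage outscores the processor because it strongly influences the two most important measures, which is exactly the kind of prioritization Figure 5 summarizes.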
The summarized output of the WCIM methodology is a list of values that explains the importance and priority of each design component to the decision maker (see Figure 5). These components are subsequently emphasized in the alternative named WCIM Informed, which is denoted by the white arrows in Figure 4. The end state of the solution design phase is a list of candidate solutions that will be screened, scored, and compared in more detail within the next phase of the SDP, namely decision making.

Decision Making
The decision making phase of the SDP consists of three main activities: value scoring and costing, sensitivity and risk analysis, and improvement and tradeoff analysis. The alternatives are screened and scored using data collected from stakeholders, research, and modeling, allowing their performance to be quantified and ranked to determine which alternative is the most beneficial. The alternatives' values are then plotted against their respective costs to show which alternatives are optimal for the decision maker. Once complete, sensitivity analysis examines how changing the weights of certain value measures impacts the overall scores, illuminating the role of uncertainty and allowing risk-informed trades to be made. Finally, the study team will take these results and look for ways to improve the solution, ultimately recommending the alternative that best satisfies TRAC's needs.
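Value scoring in the SDP is typically done with an additive value model: each alternative's total value is the weighted sum of its value-function scores. The sketch below assumes that form; the weights and per-measure scores are hypothetical placeholders, not the study's actual numbers.

```python
# Additive value model: total value = sum_i w_i * v_i(x_i),
# with swing weights summing to 1. All numbers are illustrative.
weights = {"Amount of Data": 0.5,
           "Probability of a Failed Iteration": 0.3,
           "Number of Iterations": 0.2}

def total_value(scores):
    """scores: value-function outputs on a 0-100 scale, one per measure."""
    return sum(weights[m] * v for m, v in scores.items())

alternatives = {
    "Full Monty":         {"Amount of Data": 100, "Probability of a Failed Iteration": 90,
                           "Number of Iterations": 100},
    "Middle of the Road": {"Amount of Data": 60, "Probability of a Failed Iteration": 70,
                           "Number of Iterations": 50},
    "Bare Bones":         {"Amount of Data": 10, "Probability of a Failed Iteration": 40,
                           "Number of Iterations": 20},
}

ranking = sorted(alternatives, key=lambda a: total_value(alternatives[a]), reverse=True)
```

Plotting each alternative's total value against its cost then reveals the efficient frontier the decision maker trades along, and re-running the ranking under perturbed weights is the core of the sensitivity analysis mentioned above.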

Conclusion and Future Work
Through the SDP, the study team has developed alternatives that will allow TRAC to capitalize on its big data opportunity. In particular, by designing a holistic big data architecture that accounts for data storage and simulation hardware, as well as analysis software, TRAC can realize its use case to leverage Combat XXI as an exploratory tool to inform defensible system requirements. In the near future, the study team will work with TRAC to find the option that best meets its needs, wants, and budget; the goal is to complete the SDP's decision making phase by the end of April 2018.
After TRAC makes a decision, the next step is to execute the fourth and final phase of the SDP: solution implementation. This phase addresses the planning, execution, and controlling aspects of the decision, and it ensures TRAC's expectations are realized. Once the architecture has been fully implemented, a potential area for future research is how TRAC's approach can be leveraged in other analytical organizations across the Army.

Figures
Figure 3a (left) and Figure 3b (right): Value Functions for Amount of Data and Probability of a Failed Iteration

Figure 5: Design Component Influence and Prioritization Based on WCIM Methodology