A Quantitative Analysis for Qualitative Research

This article summarizes the development of a general data analytic system that is specifically designed for qualitative research paradigms. The purpose of the article is to illustrate how Association Rule Analysis (ARA) can be used to analyze qualitative research data. The article presents several examples of how the ARA would be applied across a wide spectrum of qualitative research.

endeavors along with examples of how this analysis would apply in a variety of different qualitative studies. We end with a discussion of specific issues that are topics for future research and development.
The sources of data for qualitative research are equally diverse. They include interviews (Driscoll, 2011), studies of online behavior (netnography; Kozinets, 2019; Rollman, Krug, & Parente, 2000), literature reviews (Bramer, Rethlefsen, Kleijnen, & Franco, 2017), case-oriented understanding (Swanborn, 2010; Suler, 2000), forecasting (Parente & Finley, 2020), and mixed methods research (Cresswell & Plano-Clark, 2018). What sets qualitative research apart from quantitative research is that it usually collects and analyzes nonnumerical data such as words, text, interviews, and written records. The goal is to comprehend people's opinions, attitudes, emotions, perceptions, and reactions to experience. For example, a researcher may interview survivors of COVID-19 concerning their recovery challenges, evaluate feelings of elation immediately after childbirth, or study the daily life experience of a homeless person.
Perhaps the most common type of qualitative study involves the consensus opinion of human evaluators who independently scrutinize the verbal descriptions of individuals who experienced an event and then provide their assessment of the themes that emerge. This type of study can be used to generate plausible hypotheses, to describe a process of perceptual change that occurred over time, or to explore plausible explanations of a phenomenon of interest. The result of a qualitative study is a description of the person's or group's state of mind in a particular situation and a reasonable explanation of the events that shaped the perception of that experience.

Advantages and Disadvantages
Qualitative methods are uniquely suited for capturing aspects of a phenomenon to which conventional numerical analyses may be insensitive. Qualitative studies do not usually require large numbers of participants. There are a number of excellent textbooks that provide guidance for conducting qualitative studies (Caswell & Poth, 2018; VanManen, 2014; Russman, 2008). There are also a variety of avenues for publication of findings (e.g., The American Journal of Qualitative Research, https://www.ajqr.org; Qualitative Research Journal, https://journals.sagepub.com/home/qrj; The Qualitative Report, https://tqr.nova.edu/journals). Perhaps the biggest advantage of qualitative research is that it allows participants to describe their experience firsthand, that is, in their own words, without the limitation or constraint of psychometric procedures that transform firsthand experience into second-hand numerical ratings.
Qualitative methodology is not without its potential problems. For example, the consensus of evaluators is often used to objectify results; however, objectivity defined in this way is nothing more than collective subjectivity, which may still be prone to bias. There are relatively few well-controlled research designs that filter out alternative explanations of the data. Perhaps the biggest problem is that, because the unit of analysis is usually words or text, there is no comprehensive analytic model that supports the range of research applications listed above.

Proposed Analysis
This paper illustrates how data science can be used to extract significant consistencies from words or themes that define a person's unique phenomenological experience. The analysis provides a method for significance testing of results that can be applied to most qualitative studies. The core feature of this system is the "Association Rule" (Webb, 2003) that identifies relationships among words in text. Association Rule Analysis (ARA) is a pattern recognition procedure. The goal of the analysis is to generate rules for predicting one set of events from another (Han, Pei, & Kamber, 2011). It is specifically designed for analyzing associative relationships in text. Our goal here is to illustrate how the ARA can be used for generating and developing hypotheses within the broader domains of qualitative research. As a prelude to this discussion, we begin with a basic description of ARA.

Association Rules
The concept of ARA can be explained via a shopping cart analogy (Agrawal & Srikant, 1994; Webb, 2003). Each shopper's purchases are scanned by the retailer and added to a larger database of words (e.g., lemons, steak, cheese) that describe the collection of customers' purchases. The ARA extracts rules from this database that determine which items will likely be purchased with other items. For example, if a customer buys steak then they also buy cheese. Association rules are often seen in internet advertising in statements such as "customers who bought this item also purchased . . .". Although the ARA has been used primarily for marketing research, there have been some recent applications outside of the marketing domain. For example, Parente & Finley (2018) and Finley & Parente (2019) used association rules with brain injury survivors to measure the relationship between organization and memory of unrelated words.
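The shopping cart analogy can be sketched in a few lines of Python. This is a minimal illustration only: the baskets, the thresholds, and the function names are hypothetical, and the commercial packages cited in this article may derive rules differently. The sketch counts how often items co-occur and keeps the single-item rules whose support (the proportion of baskets containing both items) and confidence (the proportion of baskets with the antecedent that also contain the consequent) meet assumed minimum values.

```python
from itertools import permutations

# Hypothetical shopping baskets; the item names are illustrative only.
baskets = [
    {"steak", "cheese", "lemons"},
    {"steak", "cheese"},
    {"lemons", "bread"},
    {"steak", "cheese", "bread"},
    {"cheese", "bread"},
    {"steak", "lemons"},
]

def support(itemset, transactions):
    """Proportion of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

def single_item_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for rules
    that meet both (assumed) thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in permutations(items, 2):
        s = support({a, c}, transactions)
        conf = confidence({a}, {c}, transactions)
        if s >= min_support and conf >= min_confidence:
            rules.append((a, c, s, conf))
    return rules

for a, c, s, conf in single_item_rules(baskets):
    print(f"{a} -> {c}: support={s:.2f}, confidence={conf:.2f}")
```

With these illustrative baskets, the only rules that survive the thresholds link steak and cheese, each with a support of 0.50 and a confidence of 0.75.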
The mechanics of ARA are based on the assumption that the occurrence of any event affects the co-occurrence of other events. The ARA computations are based on dependent probabilities, which can be quite complex and usually require software to expedite the computations. For example, the ARA computations are included in larger data mining software packages (e.g., SAS, https://www.sas.com/en_us/software/stat.html; SPSS Modeler, https://www.ibm.com/products/spss-statistics; BigML.com) or are available as standalone software (e.g., KHcoder, https://khcoder.net/en/). Without going into the computational minutiae, it is sufficient to say that the output from ARA is a set of probabilities that expresses the co-occurrence of the participant's verbal descriptions within the inquiry. Each of these probabilities (called "rules") can be tested for significance either with conventional methods (e.g., p < .05) or by replication with a holdout sample.
Conventional ARA associates "antecedent" and "consequent" events. These terms are similar to "independent" and "dependent" variables or "predictors" and "outcomes" in conventional statistics. These associations are the rules discussed above (Balcázar & Dogbey, 2013; Parente & Finley, 2018) that define the co-occurrence between the antecedent and consequent events. Generally, the more significant rules the ARA identifies, the more related are the words in the text. However, there are other measures that can be used to evaluate the utility of any particular rule. For example, the software also produces a measure called the Lift, which is an index of the predictive value of the rule relative to using no rule at all. The ARA also provides significance testing to identify which rules best describe the associations that exist in the word set. Finally, if the data set is large enough, the researcher can use half of the data as a "training" sample to generate the rules and use the other half as a "holdout" sample that can be used later to replicate the rules that were generated in the training sample. These concepts are best described by example.
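To make the Lift and the significance test concrete, the following Python sketch computes both from four counts: the number of participants who produced the antecedent, the consequent, both, and the total sample size. The counts in the usage lines are hypothetical, and the one-sided Fisher exact test is only one conventional way to test a rule; it is an assumption here, not necessarily the test implemented by any of the packages named above.

```python
from math import comb

def lift(n_both, n_antecedent, n_consequent, n_total):
    """Lift = P(A and C) / (P(A) * P(C)).
    A value of 1 means the rule predicts no better than chance."""
    p_both = n_both / n_total
    return p_both / ((n_antecedent / n_total) * (n_consequent / n_total))

def fisher_one_sided_p(n_both, n_antecedent, n_consequent, n_total):
    """Probability of observing at least n_both co-occurrences if the
    antecedent and consequent were independent (hypergeometric upper tail)."""
    upper = min(n_antecedent, n_consequent)
    tail = sum(
        comb(n_consequent, k) * comb(n_total - n_consequent, n_antecedent - k)
        for k in range(n_both, upper + 1)
    )
    return tail / comb(n_total, n_antecedent)

# Hypothetical counts: 40 participants, 12 produced the antecedent word,
# 10 produced the consequent word, and 8 produced both.
print(round(lift(8, 12, 10, 40), 2))
print(fisher_one_sided_p(8, 12, 10, 40) < 0.05)
```

A Lift above 1 indicates that the antecedent raises the likelihood of the consequent relative to using no rule at all, and the p value indicates whether a co-occurrence count this large would be unusual under independence.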
Because space does not permit an example of how the ARA would be applied in all of the areas of qualitative research listed above, we have selected five diverse areas for which we have re-analyzed data from our published studies using the ARA model.
Example 1. Grounded Theory. This type of qualitative research involves collecting data with the goal of supporting, developing, or testing a theory (Mitchell, 2014). For example, Magalis (2020) explored qualitative differences in the experience of "quantitative anxiety". To do so, he assessed the words that college students chose to describe their fear of math and statistics. He noted that all of the existing investigations of these anxieties used quantitative self-report scales. He reasoned that having students generate their own words to describe their perceptions of math and statistics would provide a clearer perspective of the experience. Specifically, he questioned whether the students would generate similar words to describe their experience with math versus statistics.
He gave participants either a math problem (Quadratic Equation solution) or a statistics problem (Standard Deviation computation) to solve and then asked them to generate as many words as possible that described their experience while solving the problem. He then applied the ARA (Webb, 2007) to derive significant association rules that described relationships among the words that participants independently chose to describe their math and statistics computational experience.
The ARA identified two rules that described the statistics participants' word choices. These rules indicated that the words "Anxious" and "Confused" were significant descriptors of the statistics experience.
Table 1. Rules that describe students' experience of statistics computation.
Specifically, a significant number of participants (p < .05) in the statistics problem group independently chose the words Anxious and Confused to describe their feelings when doing the statistics computation. The same analysis applied to the words generated after the math calculation did not reveal any significant word choice consistencies. The Lift values of 2.2 for each word suggest that both words were equally predictive of the students' statistics experience. These results generated two hypotheses which could be tested with future research. First, the fact that the significant word choices were only apparent for the statistics problem suggests that statistics anxiety and math anxiety are not the same emotional experience. Second, confusion is an underlying component of statistics anxiety but not math anxiety.
Example 2. Conversation Analysis. This type of qualitative research studies social interaction based on verbal and non-verbal conduct (Sidnell & Stivers, 2012). For example, Silver & Parente (1994) explored the characteristics that determined a person's perceived attractiveness after a simple conversation that occurred during a first encounter. They had college students who were strangers meet and carry on a conversation for approximately 10 minutes before generating words that described their first impressions of the other person. Each person's data set yielded a number of words that described their perception of the other conversant. Seventy participants were randomly assigned to a training group and the remaining 70 were used as a holdout sample. The ARA identified two rules that were significant in the training sample and that replicated in the holdout sample.
Table 2. Rules that describe the perception of attractiveness during a simple conversation.
These results generated two hypotheses. First, words that describe physical characteristics did not predict attractiveness in this college student sample, which suggests that cognitive factors take precedence over physical characteristics when predicting attractiveness. Second, an attractive person is one who makes another think and who commands a persistent memory.
Example 3. Mixed Methods. Mixed methods research uses both qualitative and quantitative analyses (Cresswell & Plano-Clark, 2018). The researcher may use one method to answer some research questions whereas the other method is used to bolster, clarify, or further explain the results. Parente & Finley (2018) used association rules to assess the relationship between organization and memory (Tulving, 1966). These authors computed the ARA on participants' free recall of a word list. The qualitative component of the study involved the derivation of association rules from the participants' free recall of 12 words across 12 study and test trials. The quantitative portion of the study involved computing numerical measures of Short Term Storage, Long Term Storage, Long Term Recall, and Consistent Long Term Recall (Buschke, 1973). The hypothesis being tested in this study was that each participant would develop a unique "subjective organization" (Tulving, 1966) of the words and that the number of ARA rules would serve as an index of the strength of that mentation. It was therefore reasonable to suggest that if organization is related to memory, then the number of rules derived from the ARA would correlate significantly with the various types of memory derived from the Buschke (1973) scoring procedure.
The results indicated that the number of rules generated by the ARA was significantly correlated with the SRT measures. The authors discussed how the ARA could be used as a diagnostic tool for assessing a person's ability to organize information in memory.
Example 4. Thematic Analysis. Thematic analysis is perhaps the most common qualitative research method (Guest, McQueen & Namey, 2012). It involves extracting the consensus of observers regarding themes that are apparent in text. Parente (2020) described a procedure for extracting themes from written text obtained from the public domain. This procedure analyzed emotional themes extracted from written self-descriptions with no constraints imposed on the writing process. The themes could then be analyzed with the ARA to assess which of them governed the participant's self-perceptions.
Data obtained from a public domain website (www.match.com) included brief paragraph self-descriptions from a cohort of women in their 60s. Parente (2020) selected 20 posts from each of four groups of women who were either divorced, separated, widowed, or never married. He then extracted themes from each person's self-description that reflected their emotional content.
The IBM Watson Tone Analyzer software (https://www.ibm.com/cloud/watson-tone-analyzer) was used to evaluate the emotional themes that were apparent in the paragraphs. The software analyzes the content of the text for several emotional themes including Joy, Sadness, Anger, Confidence, Fear, and Analytic. Although human scorers are usually used to extract thematic content, the Tone Analyzer software was used here because it is based on specific rules and the results were therefore objective and replicable. The goal of the study was to assess whether the participant's life situation (never married, separated, widowed, divorced) produced different emotional themes that were apparent in their self-descriptions. The ARA yielded four rules that replicated in the holdout sample.
Table 3. Rules that differentiated women of different marital status.
The association rules indicate that women who were never married described themselves as analytical. Widowed women were sad and separated women were angry. Divorced women were joyful. None of the women's self-descriptions expressed a theme of confidence or fear.
The results generated several hypotheses. First, these data came from women who were over 60 years old and different results might occur with younger populations. Second, men would likely show different emotional themes. Finally, other marital status groups (e.g., living together, happily married, LGBQ partners) might also evidence very different emotional themes.
Example 5. Case Oriented Understanding. Case oriented qualitative research involves an in-depth and detailed study of individual cases (Swanborn, 2010). For example, Parente et al. (1981) developed a computer-assisted method of counseling that involved collecting personal data from therapy clients over several weeks. Personal data included measures of behaviors that were problematic for the client (called targets) along with others that the client felt were related to the targets. Each client generated a set of measures that were unique to their lifestyle and experience. Computing measures of association among the different variables and interpreting these correlative relationships to the clients was sufficient to change their physical or mental state, specifically, to reduce anxiety, to lower blood pressure, and to reduce incidents of stuttering.
Parente & Herman (2010) describe a similar method of "behavioral charting" that involves self-ratings of target behaviors along with other behaviors that the client felt were related to the targets. The therapy began with a discussion of which everyday behaviors the person felt were germane to his or her life situation. For example, the client in this example thought that their thinking skills were related to the number of cups of coffee they drank each day, their perceived levels of depression and anxiety, the severity of their headaches, memory functioning, attention span, and their overall energy level. The therapist and client created a data sheet that allowed the client to document the frequency with which the behaviors occurred. These data were analyzed using the ARA, which yielded the following association rules.
Table 4. Rules that associate individual case self-ratings.
1. Headaches impair thinking and memory.
2. Depression impairs memory.
3. Depression worsens with anxiety.
4. Energy level and memory facilitate thinking.
These rules are then used to direct the therapy process. Each rule generated a testable hypothesis that could be evaluated via simple changes in the person's activities of daily living. For example, any activity that increases energy, such as increased exercise, may also improve thinking. Medication or any activity that reduces or eliminates headaches may improve thinking and memory. Activities that reduce anxiety (e.g., yoga) may lessen depression, which may also improve memory. This process may therefore be especially helpful to clinicians when planning treatment interventions.

Discussion
This research is a demonstration of how ARA can be applied in a variety of different qualitative research contexts. There are several advantages to using ARA with qualitative data. First, the analysis is designed specifically for use with words and text, which are the typical objects of scrutiny in qualitative studies. Second, the analysis provides for significance testing of results, which may be especially useful in those studies that involve developing hypotheses (e.g., Framework Analysis, Grounded Theory). The ARA also provides for holdout sampling as an additional index of significance through replication. Third, the ARA can be adapted to most, if not all, of the qualitative research paradigms listed above.
The choice of ARA as an analytic tool requires some degree of caution. For example, there are not many generally available, user-friendly, and comprehensive software packages for doing ARA computations. The analysis may produce complex rules with multiple antecedents and consequents that are difficult to explain. There are few instructional guidelines for interpreting the rules. Most journal editors will probably not be familiar with the concept of association rules. This paper is, perhaps, one of the first published research studies to use ARA to analyze psychological data (Parente & Finley, 2018; Finley & Parente, 2019). Consequently, our limited experience with ARA has produced as many questions as answers. Some of the more important areas of future research interest are presented below.

Measures of Association
Perhaps the most straightforward index of association is the number of significant rules the ARA identifies. Generally, the more significant rules, the stronger the overall association among the words in the collective. In addition, the lift measure can function as an index of the strength of the rules. It is also easy to interpret. As the value of lift increases, so does the predictive value of the rule. Although there are other measures (e.g., Confidence, Strength, Leverage, Support, and Coverage) that are potentially useful for interpreting qualitative research results (Balcázar & Dogbey, 2013), there is a dearth of research that would inform the use of one measure versus another.
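For reference, the measures named above are commonly defined as follows for a rule with antecedent A and consequent C, where P denotes a proportion of the records in the data set. These are the textbook probability-based definitions, stated here as an assumption; individual software packages may report counts rather than proportions or use different names, and Strength is omitted because its definition varies across sources.

```latex
\begin{align*}
\mathrm{coverage}(A \rightarrow C)   &= P(A) \\
\mathrm{support}(A \rightarrow C)    &= P(A \cap C) \\
\mathrm{confidence}(A \rightarrow C) &= \frac{P(A \cap C)}{P(A)} = P(C \mid A) \\
\mathrm{lift}(A \rightarrow C)       &= \frac{P(A \cap C)}{P(A)\,P(C)} \\
\mathrm{leverage}(A \rightarrow C)   &= P(A \cap C) - P(A)\,P(C)
\end{align*}
```

Under these definitions, a lift of 1 and a leverage of 0 both correspond to an antecedent and consequent that are statistically independent.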

Verification
Whenever possible, ARA analyses should include both a training and holdout sample. Clearly, if a rule is significant in the training sample but fails to replicate in a holdout sample then the rule is probably unreliable. Generally, holdout samples should be used when the purpose of the study is to generate a hypothesis that is likely to generalize to a larger population. However, there are some exceptions to this rule of thumb. For example, holdout samples may not be useful in single case research where the results are not intended to generalize beyond the individual participant who generated the data (Example 5). It may also be impossible to construct a holdout sample in pilot research (Example 1) where the sample size is too small to allow for a meaningful subdivision of the data.
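The training and holdout logic can be sketched as follows in Python. This is a simplified illustration under stated assumptions: each record is the set of words one participant produced, the function names are hypothetical, and a minimum confidence threshold stands in for whatever significance criterion the researcher's software applies.

```python
import random

def rule_confidence(antecedent, consequent, records):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    with_antecedent = [r for r in records if antecedent in r]
    if not with_antecedent:
        return 0.0
    return sum(consequent in r for r in with_antecedent) / len(with_antecedent)

def split_half(records, seed=0):
    """Randomly assign records to a training half and a holdout half."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def replicates(antecedent, consequent, training, holdout, min_confidence=0.7):
    """A rule replicates if it meets the same (assumed) threshold in both halves."""
    return (
        rule_confidence(antecedent, consequent, training) >= min_confidence
        and rule_confidence(antecedent, consequent, holdout) >= min_confidence
    )

# Hypothetical word sets, one per participant.
records = [{"a", "b"}] * 6 + [{"a"}] * 2 + [{"c"}] * 4
training, holdout = split_half(records, seed=1)
print(replicates("a", "b", training, holdout))
```

Fixing the random seed makes the split reproducible, which allows a reader to verify which rules were derived from the training half before the holdout half was examined.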

Interpretation
How does one explain an association rule? Basically, association rules identify a significant co-occurrence of words in the data set. However, beyond that description, the interpretation of the rule depends upon alternative meanings of the words and the context of the research. For example, in Example 2, some participants used the word "hot" to describe the person with whom they were conversing. However, if the experiment was conducted on an especially hot day, it is also possible that this word was meant to imply that the person was overly warm. It is therefore necessary for the researcher to interpret rules in light of any currently trending word meanings and the situational context. Association rules can also be complex, which creates additional interpretive difficulties. For example, in Example 5, a rule such as Depression & Anxiety -> Memory, if significant, would suggest that the interaction of the two antecedents had a greater impact on the consequent than either alone. This interpretation becomes even more problematic with rules that contain both multiple antecedents and multiple consequents. Complex rules are also less likely to replicate in a holdout sample.
The issues and the limitations discussed above may take quite a while to explore. In the meantime, we assert that the ARA model is ready for use with qualitative research data.