Problematic internet use (PIU): Associations with the impulsive-compulsive spectrum. An application of machine learning in psychiatry

Problematic internet use is common, functionally impairing, and in need of further study. Its relationship with obsessive-compulsive and impulsive disorders is unclear. Our objective was to evaluate whether problematic internet use can be predicted from recognised forms of impulsive and compulsive traits and symptomatology. We recruited volunteers aged 18 and older using media advertisements at two sites (Chicago USA, and Stellenbosch, South Africa) to complete an extensive online survey. State-of-the-art out-of-sample evaluation of machine learning predictive models was used, which included Logistic Regression, Random Forests and Naïve Bayes. Problematic internet use was identified using the Internet Addiction Test (IAT). 2006 complete cases were analysed, of whom 181 (9.0%) had moderate/severe problematic internet use. Using Logistic Regression and Naïve Bayes we produced a classification prediction with a receiver operating characteristic area under the curve (ROC-AUC) of 0.83 (SD 0.03) whereas using a Random Forests algorithm the prediction ROC-AUC was 0.84 (SD 0.03) [all three models superior to baseline models p < 0.0001]. The models showed robust transfer between the study sites in all validation sets [p < 0.0001]. Prediction of problematic internet use was possible using specific measures of impulsivity and compulsivity in a population of volunteers. Moreover, this study offers proof-of-concept in support of using machine learning in psychiatry to demonstrate replicability of results across geographically and culturally distinct settings.


Introduction
The Internet has become an integral part of modern life, and has given rise to a wide range of problematic behaviors associated with its use (Cao et al., 2007). Some of those behaviors, like excessive online gaming, online buying and gambling, frequent email checking, prolific use of social media, and viewing pornography have been reported to cause significant impairment of everyday functioning of some individuals, to the extent that mental health professional help is sought or national health authorities are concerned (Choi, 2007; American Academy of Pediatrics, 2015).
Epidemiological data have been gathered over the last two decades on problematic internet use (PIU) but the findings are mixed. Ko and colleagues (Ko et al., 2012) reported a prevalence of internet addiction that ranged from 1% to 36.7%. This huge variability in prevalence rates across studies could reflect differences in the assessment tools and different operational definitions of PIU behaviors. Other factors that might have contributed to this disparity of prevalence between studies are social, cultural, and demographic differences and inconsistencies of internet access. Could PIU represent a disorder in one country, but not a valid or relevant concept in another? In fact, internet activities are so widespread in 21st century youth, that there is anecdotal evidence that they have become an inescapable social norm (Wallace, 2014).
On an individual level, there have been strong suggestions that these PIU behaviors are linked with relationship difficulties, failure to thrive academically, and financial problems (Chang and Man Law, 2008; Király et al., 2015). Young internet users in particular have been reported to use online gaming compulsively, to the exclusion of other interests, and to experience significant impairment and distress as a result. Additionally, there has been anecdotal evidence of serious physical harm and death by cardiovascular collapse, the majority reported from East Asian countries, but also one case in the UK, in individuals who have engaged in 'marathon' internet sessions (more than 24 h of continuous activity) of mass multiplayer online gaming (Tam and Walter, 2013; Király et al., 2015).
The most recent literature suggests that some of these PIU behaviors are strongly linked with well identifiable mental health problems (Carli et al., 2013; Ho et al., 2014). A meta-analysis of eight studies comprising a total of 1641 patients with internet addiction and 11 210 controls found high correlations with mental disorders, including disorders of addiction e.g. alcohol use disorder (OR = 3.05) (Ko et al., 2008a; Yen et al., 2009a), affective disorders e.g. depression (OR = 2.77) (Ha et al., 2006; Ko et al., 2008b), anxiety disorders (OR = 2.70) e.g. generalized anxiety disorder (GAD), social anxiety disorder (SAD), obsessive-compulsive disorder (OCD), and attention-deficit hyperactivity disorder (ADHD, OR = 2.85) (Yoo et al., 2004; Yen et al., 2007, 2009). The precise mapping of PIU onto other forms of psychopathology and other dimensions of behavior, like impulsivity and compulsivity, however, is relatively unexplored. The associations derived from these studies rest on the assumption of a linear model, which does not necessarily hold, and they have not been validated in terms of whether they really allow prediction of the presence of PIU. Further research is required as to how to fit the observed behavioral phenotypes of problematic internet use into a reliable and valid taxonomical system.
Machine learning (ML) is a subfield of computer science that involves the construction of algorithms that can learn and make predictions on data (Hastie et al., 2008). The main overall difference between traditional statistical models and machine learning techniques is that the latter enable prediction, usually on very few assumptions about the data (Breiman, 2001;Bishop, 2006). Traditional statistical models also enable prediction but usually based on specific assumptions about the data. In our study, we hypothesized that specific measures of impulsivity and compulsivity (self-rated ADHD symptoms (Kessler et al., 2005), along with questionnaire-based measures from the Barratt Impulsiveness Questionnaire (Patton et al., 1995), and the Padua Obsessive-Compulsive Inventory (Burns et al., 1996)) would allow construction of ML algorithms for the prediction of PIU in a population of volunteers. Furthermore, we hypothesized that the performance of the prediction models only including a baseline set of demographic and clinical variables would be enhanced significantly if impulsivity and compulsivity variables were added as predictor variables. If true, such results would be indicative of internet addiction having potentially clinically relevant relationships with these other types of symptomatology. Further reasons for using ML in this paper are described in the supplement (eMethods 1).

Setting and measures
The current study was conducted from January 2014 to February 2015. Individuals aged 18 years and above were recruited at two sites: Chicago (USA) and Stellenbosch (South Africa) (mean age 30.1 [18–88]; 1316 males [65.6%]; 1447 Caucasian [72.1%]) using internet advertisements. The advertisements asked individuals to take part in an online survey about internet use. Participants completed the survey anonymously using Survey Monkey software. The survey was sent through Craigslist so only participants from the specific locales were targeted. The study was approved by the institutional review boards at each research site. Participants received no compensation for taking part but were enrolled in a random lottery whereby five prizes were available with each prize valued between $50 and $200 in USA and three prizes between ZAR250 and ZAR750 in South Africa.
The IAT comprises 20 questions examining facets of PIU. Scores on the IAT range from 20 to 100 with 20–49 reflecting mild Internet use, 50–79 moderate Internet use, and 80–100 reflecting severe Internet use. The MINI is a brief structured interview for the major Axis I psychiatric disorders in the DSM-IV and ICD-10. For the purposes of the study, the MINI was adapted for self-administration and only included the OCD, SAD, and GAD modules. The latter was done to limit the length of the survey and ensure high completeness. The PI consists of 39 items assessing common obsessional and compulsive behavior. The ASRS-v1.1 is a self-report screening scale of adult ADHD. The BIS-11 is a self-report questionnaire used to determine levels of impulsiveness.
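The IAT bands described above can be written down as a small lookup. This is a minimal sketch in Python rather than the R code used in the study; the function names `iat_band` and `is_piu` are illustrative and do not come from the original analysis.

```python
def iat_band(total_score: int) -> str:
    """Map a total IAT score (20-100) to the severity band given in the text."""
    if not 20 <= total_score <= 100:
        raise ValueError("IAT total score must lie between 20 and 100")
    if total_score <= 49:
        return "mild"
    if total_score <= 79:
        return "moderate"
    return "severe"


def is_piu(total_score: int) -> bool:
    """Binary outcome used by the classifiers: moderate or severe (score of 50 or above)."""
    return iat_band(total_score) in ("moderate", "severe")


# Hypothetical scores, for illustration only
for score in (35, 50, 92):
    print(score, iat_band(score), is_piu(score))
```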
Only data of participants who completed the entirety of the online survey were included in the analyses. The original sample included 2566 individuals. 63 individuals were excluded for lacking IAT scores. Eighteen individuals were excluded for reporting a transgender gender. A further 474 individuals were excluded for missing important predictor variables e.g. ASRS, PI or BIS questionnaire scores. Five individuals were excluded for reporting age less than 18 years old. The final full set included 2006 individuals with complete scores in all variables. This final full set included 1316 individuals from the Stellenbosch site and 690 individuals from the Chicago site. All continuous predictors (i.e. age) were standardized to increase the interpretability of the model coefficients. The models classified individuals between nonproblematic internet use (IAT score <50) and PIU (IAT score 50 and above). The same cut-off was used in the traditional statistics as well. All analyses were undertaken in R Studio version 3.1.2; ML was done using the caret package (Kuhn, 2015) (classification and regression training version "caret_6.0-47"). More details about the analysis process can be found in the supplement (eMethods 2).
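The sample flow and the standardization step can be checked with a few lines of arithmetic. The sketch below is in Python, not the authors' R code. It assumes that "standardized" means a z-score using the sample standard deviation (n − 1 denominator), and the ages are hypothetical.

```python
from statistics import mean, stdev

# Exclusion counts as reported in the text
ORIGINAL_SAMPLE = 2566
EXCLUSIONS = {
    "missing IAT score": 63,
    "transgender gender reported": 18,
    "missing predictor variables": 474,
    "age under 18": 5,
}
final_sample = ORIGINAL_SAMPLE - sum(EXCLUSIONS.values())
print(final_sample)  # 2006, matching the reported complete cases


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation (assumed definition)."""
    centre = mean(values)
    spread = stdev(values)
    return [(v - centre) / spread for v in values]


# Hypothetical ages, for illustration only
ages = [20, 30, 40]
print(standardize(ages))
```

After this transformation a model coefficient for age refers to a change of one standard deviation rather than one year.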

Validation set-ups
In terms of validation set-ups, five different validation set-ups were chosen: (A) training and testing in the full data set, (B) training and testing in the Stellenbosch set, (C) training and testing in the Chicago set, (D) training in the Stellenbosch set and testing in the Chicago set, (E) training in the Chicago set and testing in the Stellenbosch set. The different site samples were used together as one sample in the full data set analysis (validation set-up A) and as separate sets during the within study site (validation set-ups B-C) and between study site analyses (validation set-ups D-E).
The process of training and testing the models was the same for all models. All analyses used cross-validation (Stone, 1974) with 50 replications and results were averaged. At each replication, the sample was partitioned in a training and a testing sub-sample which were complementary; in validation set-ups A, B and C this was done by randomly splitting the data set into a training (75%) and a testing (25%) partition. In validation set-ups D and E, training and testing sets were appointed by the way the set-up was defined.
To avoid having identical training sets in each replication, only a random 90% of the available respective sample (Stellenbosch sample for validation set-up D and Chicago sample for validation set-up E) was used in each replication to train the model. Testing was done in the respective other sample (Chicago sample for validation set-up D and Stellenbosch sample for validation set-up E). A set seed was placed to allow replicability of results. The set seed was randomly selected by the researchers and was the same in all set ups and models. Every set was partitioned randomly into complementary training and testing sets using the caret package.
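The two partitioning schemes can be sketched as follows. This is an illustrative Python version, not the caret code used in the study: it uses a plain unstratified shuffle, the seed value is arbitrary, and the function names are hypothetical. Site sizes are those reported in the text.

```python
import random

N_STELLENBOSCH = 1316  # site sizes as reported in the text
N_CHICAGO = 690


def random_split(indices, train_fraction, rng):
    """Shuffle a copy of the indices and cut it into complementary train/test parts."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def within_sample_replications(n, n_reps=50, train_fraction=0.75, seed=1):
    """Set-ups A-C: repeated random 75%/25% splits of a single sample."""
    rng = random.Random(seed)
    return [random_split(range(n), train_fraction, rng) for _ in range(n_reps)]


def between_site_replications(n_train_site, n_test_site, n_reps=50,
                              subsample=0.90, seed=1):
    """Set-ups D-E: train on a random 90% of one site, test on all of the other."""
    rng = random.Random(seed)
    test_indices = list(range(n_test_site))
    replications = []
    for _ in range(n_reps):
        train_indices, _left_out = random_split(range(n_train_site), subsample, rng)
        replications.append((train_indices, test_indices))
    return replications


setup_a = within_sample_replications(N_STELLENBOSCH + N_CHICAGO)
setup_d = between_site_replications(N_STELLENBOSCH, N_CHICAGO)
print(len(setup_a), len(setup_a[0][0]), len(setup_a[0][1]))
print(len(setup_d), len(setup_d[0][0]), len(setup_d[0][1]))
```

Fixing the seed makes every replication reproducible, while the 90% subsample ensures that the between-site training sets differ from one replication to the next.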

Error metrics
Receiver-operating characteristic area under the curve (ROC-AUC) and Precision-Recall area under the curve (PR-AUC) were used to examine the performance of the different models. This was considered the most suitable approach for a classification problem with unbalanced groups (Chawla, 2005). ROC-AUC is a useful and widely used metric in medical sciences; however, it lacks the ability to weight omission and commission errors and summarizes test performance in areas of the ROC space that are not always relevant for clinical practice (Lobo et al., 2008). Precision-recall (PR) curves are not widely used in medical sciences to assess a model's performance and do not take the true negative rate into account. However, PR curves complement ROC curves well in solving classification problems, especially with highly skewed data sets (Davis and Goadrich, 2006). More metrics are reported in the online supplement, including accuracy, sensitivity, specificity, positive predictive value, negative predictive value, kappa and F-measure. Mean, standard deviation, and standard error of the mean were calculated for these metrics. Another output metric that was examined was variable importance (VI), which gives an indication of whether a variable is useful for an algorithm to make decisions. VI results were averaged and reported in descending order.
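As a rough illustration of the two headline metrics, the Python sketch below computes ROC-AUC through its pairwise-ranking interpretation and approximates PR-AUC by average precision. The paper does not state how its PR-AUC was integrated, so average precision is an assumption here, and the labels and scores are made up.

```python
from math import sqrt
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def summarize(values):
    """Mean, standard deviation and standard error across replications."""
    sd = stdev(values)
    return mean(values), sd, sd / sqrt(len(values))


# Hypothetical labels (1 = PIU) and predicted probabilities
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores), average_precision(labels, scores))
```

In the study design, one such pair of values would be produced per replication and then passed to something like `summarize` to obtain the reported mean and SD.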

Prediction methods
Three ML algorithms were used: Logistic Regression (LR), Random Forests (RF) (Breiman, 2001), and Naïve Bayes (NB) (Duda and Hart, 1973). A Random Forest is a combination of many binary decision trees. When the model receives new data, each decision tree produces a separate response and the overall output is determined by a majority vote. We used the default value of 500 trees. The number of variables considered at each node was a tuning parameter that was optimised by a tuning function. The Naïve Bayes classifier applies Bayes' rule to select the class that maximises the posterior probability of the class labels given the data. Probability distributions were based on kernel density estimates using the training data. No Laplace correction was applied.
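To make the Naïve Bayes description concrete, the following Python sketch fits one Gaussian kernel density per class and per predictor, then applies Bayes' rule under the naive independence assumption. It is not the implementation used in the study: the fixed bandwidth, the function names and the toy data are all assumptions made for illustration.

```python
from math import exp, log, pi, sqrt


def gaussian_kde(sample, bandwidth):
    """Return a density function built from Gaussian kernels centred on the sample."""
    scale = 1.0 / (len(sample) * bandwidth * sqrt(2 * pi))

    def density(x):
        return scale * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def fit_naive_bayes(rows, labels, bandwidth=1.0):
    """Store the class prior and one kernel density per predictor for each class."""
    model = {}
    for cls in sorted(set(labels)):
        members = [row for row, y in zip(rows, labels) if y == cls]
        prior = len(members) / len(rows)
        densities = [gaussian_kde([row[j] for row in members], bandwidth)
                     for j in range(len(rows[0]))]
        model[cls] = (prior, densities)
    return model


def posterior(model, row):
    """Posterior class probabilities via Bayes' rule, computed in log space."""
    log_joint = {
        cls: log(prior) + sum(log(d(x) + 1e-300) for d, x in zip(densities, row))
        for cls, (prior, densities) in model.items()
    }
    top = max(log_joint.values())
    unnormalized = {cls: exp(v - top) for cls, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {cls: u / total for cls, u in unnormalized.items()}


def predict(model, row):
    """Select the class with the highest posterior probability."""
    post = posterior(model, row)
    return max(post, key=post.get)


# Toy data: one standardized predictor, 0 = non-PIU, 1 = PIU (hypothetical)
rows = [[0.0], [0.2], [-0.1], [3.0], [3.2], [2.9]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_naive_bayes(rows, labels)
print(predict(model, [0.1]), predict(model, [3.1]))
```

Working in log space avoids numerical underflow when many predictors are multiplied together; the small constant added inside the logarithm only guards against a density of exactly zero.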
Model construction and predictions were made using five different sets of variables: (a) a 'baseline set' of demographic variables, including age, sex, race, education plus social anxiety disorder and generalized anxiety disorder diagnoses, (b) a set that included all baseline variables plus impulsivity and compulsivity variables, (c) a set that included all baseline variables plus impulsivity variables only, (d) a set that included all baseline variables plus compulsivity variables only and (e) a set of demographic, impulsivity and compulsivity variables with randomized scores to establish the 'chance' baseline. An in-sample logistic regression was also fit to ascertain associations using a traditional approach.
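The five predictor sets can be laid out explicitly. The Python sketch below is illustrative only: the column names are hypothetical stand-ins for the study variables, and set (e) is built by permuting each predictor column independently across participants, which is one plausible reading of "randomized scores" and not a confirmed detail of the original analysis.

```python
import random

# Hypothetical column names standing in for the study variables
BASELINE = ["age", "sex", "race", "education", "sad", "gad"]
IMPULSIVITY = ["asrs_total", "bis_motor", "bis_attentional", "bis_nonplanning"]
COMPULSIVITY = ["ocd", "padua_checking", "padua_impulses"]

VARIABLE_SETS = {
    "a": BASELINE,
    "b": BASELINE + IMPULSIVITY + COMPULSIVITY,
    "c": BASELINE + IMPULSIVITY,
    "d": BASELINE + COMPULSIVITY,
}
# Set (e) uses the same columns as (b), but with scrambled values
VARIABLE_SETS["e"] = VARIABLE_SETS["b"]


def permute_columns(records, columns, seed=1):
    """Shuffle each listed column across participants, breaking any link to the outcome."""
    rng = random.Random(seed)
    scrambled = [dict(record) for record in records]
    for column in columns:
        values = [record[column] for record in records]
        rng.shuffle(values)
        for record, value in zip(scrambled, values):
            record[column] = value
    return scrambled


# Tiny hypothetical data set with two of the columns
records = [{"age": a, "asrs_total": s, "piu": p}
           for a, s, p in [(19, 10, 0), (25, 30, 1), (31, 12, 0), (44, 28, 1)]]
chance_records = permute_columns(records, ["age", "asrs_total"])
print(chance_records)
```

Because the marginal distribution of every predictor is preserved while its relationship to the outcome is destroyed, any performance above chance on such data would point to a flaw in the evaluation pipeline.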

Results
Complete data were available for 2006 subjects and all of those were included in the analyses. Demographic and clinical characteristics in the full sample are presented in Table 1. Demographic and clinical characteristics stratified by study site are presented in the supplement eTable 1 and eTable 2. Models that included impulsivity and compulsivity variables produced significantly higher ROC-AUC and PR-AUC than their respective baseline models in all five validation set-ups. A summary of those results is presented in Fig. 1. Further head-to-head comparisons between models are presented in the supplement eTables 3–7. All model comparisons were performed using the Wilcoxon signed rank test. No models were tried, failed, and left unreported in the manuscript.
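The paired comparison of two models across replications can be sketched with a Wilcoxon signed rank test. The Python version below uses the normal approximation without continuity correction, which is an assumption; the R routine used in the study may compute an exact p-value instead. The AUC values in the example are made up.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, two-sided p) for paired samples, using a normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over a run of equal absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        run_length = j - i + 1
        tie_term += run_length ** 3 - run_length
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    expected = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - expected) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, p_value


# Hypothetical per-replication ROC-AUCs: full model versus baseline model
full_model = [0.83, 0.84, 0.82, 0.85, 0.86]
baseline = [0.73, 0.70, 0.75, 0.72, 0.71]
print(wilcoxon_signed_rank(full_model, baseline))
```

With 50 paired replications, as in the study design, the normal approximation is usually considered adequate; with only a handful of pairs, as in this toy example, an exact test would be preferable.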

Full data set results
In more detail, in the whole data set using the Logistic Regression algorithm we produced a classification prediction that could distinguish PIU from non-PIU with an ROC-AUC of 0.83 (SD 0.03) compared to baseline ROC-AUC 0.73 (SD 0.03) and PR-AUC 0.26 (SD 0.04) compared to baseline PR-AUC 0.10 (SD 0.02). Random Forests had an ROC-AUC of 0.84 (SD 0.03) compared to baseline ROC-AUC 0.69 (SD 0.03) and PR-AUC 0.20 (SD 0.03) compared to baseline PR-AUC 0.10 (SD 0.05). Naïve Bayes had an ROC-AUC of 0.83 (SD 0.03) compared to baseline ROC-AUC 0.74 (SD 0.04) and PR-AUC 0.25 (SD 0.05) compared to baseline PR-AUC 0.01 (SD 0.00). Variable importance rank averages from LR and RF are shown in Table 2. A graphic representation of the ROC and PR curves of those models is shown in Fig. 2. More metrics are presented in the supplement eTable 8 and eTable 9.

Within and between study sites results
We found that models including impulsivity and compulsivity variables outperformed their respective baseline models, both when exclusively trained and validated on one study site [validation set-ups B and C], and when models were trained on data from one study site and validated on independent data from the other study site and vice versa [validation set-ups D and E].

Results of within and between study sites analyses [validation set-ups B-E], including all metrics, ROC-AUC and PR-AUC scores and VI matrices, are presented fully in the supplement eTables 10–17 and graphically presented in eFigures 2–5.

Chance-level results with randomized variable scores
All 'chance level' predictions conveyed ROC-AUCs close to 0.50 and PR-AUCs close to 0.0.

Intermediate models comparisons
We introduced impulsivity and compulsivity sets of variables in a step-wise fashion to establish that both dimensions were important and able to improve predictions [eTable 3]. We

Between algorithms comparison
Overall, all three algorithms performed similarly in the full data set. LR and RF performed similarly in terms of ROC-AUC but LR outperformed RF in PR-AUC [eTable 5]. NB outperformed LR in terms of ROC-AUC in validation set-ups C and D only but performed variably in terms of PR-AUC [eTable 6]. NB outperformed RF in both between-sites cross-validation set-ups (D and E) but performed variably in terms of PR-AUC [eTable 7].

Brief summary
This two-site original investigation showed that problematic internet use (PIU) can be predicted from a number of impulsivity and compulsivity variables, as well as baseline demographic and other clinical characteristics. Furthermore, the performance of the prediction models was significantly increased when sets of variables of impulsivity and compulsivity were added to the baseline variables of the prediction models. The inclusion of impulsivity and compulsivity together additively improved performance compared to each dimension used alone. Wilcoxon signed rank tests on ROC-AUC and PR-AUC scores to ascertain model comparisons established that all machine learning methods used (LR, NB and RF) performed similarly and were able to produce the above results in all validation set-ups. Moreover, the out-of-sample cross-validation between two study sites indicated that the predictive models were universal and robust, in that they permitted predictions across two geographically and culturally distinct settings. To our knowledge, this approach has not been utilized before in psychiatry, for any mental disorder. Our approach using 'out-of-sample' prediction means that we were able to estimate how well the models will perform in future, that is, it quantifies the predictive value of the statistical model. In contrast, this is not the case with traditional statistical methods, as commonly used in psychiatry to date, where significances decay in replication studies.

PIU and impulsivity
Previous studies have identified significant associations between PIU and high rates of impulsive disorders and symptomatology (Ko et al., 2009; Carli et al., 2013; Ho et al., 2014). Our study identified similar associations, replicating previous results, but also ascertained that indicators of impulsivity, like ADHD and BIS-11 sub-scores (i.e. motor impulsivity, attentional impulsivity, nonplanning impulsivity), are useful to make out-of-sample predictions of PIU. This adds to the validity of those associations and highlights the fact that impulsivity as a dimension, and not only as a categorical variable, is important for PIU. Total ASRS score and motor impulsivity in particular appear to be more important.

PIU and compulsivity
The importance of compulsivity has been identified much less in PIU (Bernardi and Pallanti, 2009; Pallanti, 2010), although specific types of problematic online behaviours have been identified to have compulsive components (King and Barak, 1999; Greenfield, 1999; Wetterneck et al., 2012; Weinstein et al., 2015a, 2015b). Our results showed that compulsivity variables are useful to make out-of-sample predictions of PIU, suggesting that compulsivity as a dimensional variable plays an important role in those behaviors and merits further investigation. Among PI variables, checking compulsions and obsessive impulses to harm self or others appeared to be more important.

PIU and demographic characteristics
Older age was linearly associated with higher rates of PIU in our sample, but stratification by study site showed that this association stemmed from the Stellenbosch sample only. Limited research has examined how adult populations with mental health problems behave online. In adult and late adult populations there is a considerable incidence and projected lifetime risk of psychiatric disorders commonly associated with PIU (Faraone et al., 2006; Cunningham-Williams et al., 2005; Kessler et al., 2007a; Kessler et al., 2007b), therefore it is important to explore how PIU and those disorders interact. Arguably, the relationship between age and PIU might be non-linear if assessed across the whole age span. Caucasian race was associated with lower rates of PIU at both study sites; this is a result that merits further investigation. Exploring how a similar analysis would hold in a setting with a majority of non-Caucasian populations is an idea worth considering; socio-cultural factors common to both study sites used may be confounding this observed relationship. In contrast to other PIU studies, we did not find any gender differences relating to PIU. However, our sample did not include adolescents. When problematic internet behaviors in adolescents were assessed in Korean youth, those were more prevalent in males (Ha and Hwang, 2014); nevertheless, similar structural brain changes have been identified in females with PIU (Altbäcker et al., 2015). In a recent study, about half of the individual differences in compulsive online behaviors were accounted for by genetic factors to an equal degree in both genders. It was furthermore noted that boys spend more time gaming while girls spend more time on social network sites and chatting (Vink et al., 2015). While it is plausible that gender differences are masked by selection of the study sample, ours and previous results imply that if a wider range of problematic online behaviors are assessed (and not only internet gaming), gender effects might weaken or disappear (Király et al., 2014). If gender differences in the presentation of PIU may be more pronounced in adolescents or young adults, those might stem from a neurobiological susceptibility of young males towards problematic online gaming or PIU in general.
Abbreviations: ADHD, Attention Deficit Hyperactivity Disorder; ASRS, Adult ADHD Self-Report Scale (ASRS-v1.1); BIS, Barratt Impulsiveness Scale 11; GAD, Generalized Anxiety disorder; OCD, Obsessive-Compulsive disorder; PADUA, Padua Inventory-Revised; VI, Variable importance.

Limitations
There are limitations to our study deriving from using the MINI; this is validated for delivery by a trained person in a face-to-face interview, whereas in our study it was delivered via an online tool. Given the strong links that are reported from previous studies between PIU and psychiatric diagnoses, it is likely that more accurate or a wider variety of diagnostic data would improve the predictive accuracy of the models using diagnoses as predictors. Due to using Craigslist, we cannot exclude the possibility of a small number of non-local people having accessed the survey. However, participants were required to provide an address to enter the prize draw, thereby reducing the likelihood of non-local participants contributing to the survey. Only 1% of our sample fell in the severe group (IAT ≥ 80), so we were unable to accurately assess classification metrics for predicting the severe group alone. A further limitation is that this study did not explore a wide variety of ML algorithms. For the purposes of this study we focused only on three ML methods that all confirmed our hypotheses and demonstrated the proof-of-concept.

Classification controversy of problematic internet use
There is still a debate as to how to fit the observed behavioral phenotypes of problematic internet use into a reliable and valid taxonomical system. Despite an accumulation of empirical data and analyses on internet addiction behaviors, clear theoretical conclusions are currently lacking. Since the introduction of the term "Internet Addiction disorder" in the mid-nineties, many attempts have been made to revisit the proposed diagnostic criteria, refine the assessment tools (Koh, 2007; Lortie and Guitton, 2013) and formalize the concept in the new classification systems (Block, 2008). Internet gaming has been shown to excessively boost the brain reward systems, while deficits of the dopaminergic system have been identified in internet gaming addiction. Recent imaging data show that the reward, addiction, craving and emotion circuits in the brain are increasingly activated during gaming activities. Therefore, categorizing problematic internet use as an addiction disorder seems to hold the strongest biological footing and has dominated the literature in the field so far (Kuss and Griffiths, 2012). At the same time, there is a wide range of internet activities that have been observed to have compulsive elements and share commonalities with impulse control disorders; this has raised the question whether problematic internet use should better be classified as an impulse control disorder or within the impulsive-compulsive or obsessive-compulsive spectrum. Modern psychiatric classification systems are undergoing scrutiny and well-deserved critique for their epistemological failings, lack of biological grounding and weak validity (Aragona, 2009).
When exploring new concepts like PIU, there is a need for different approaches in psychiatry, that would provide stronger links between behavioral phenotypes observed and brain biology (Cuthbert and Insel, 2013;Cuthbert, 2014), approaches that would allow dimensional constructs to enrich the descriptive frameworks and strengthen the validity and generalizability of the results produced (Hyman, 2010;Nesse and Stein, 2012).

Is PIU a meaningful diagnostic entity?
Although this study does not explore whether PIU shares elements with addictions, it adds to the clinical description of problematic internet behaviors, thus contributing to achieving a valid classification. Furthermore, it strengthens the argument that PIU, if it is to be regarded as a disorder in its own right, should likely be categorized within the impulsive-compulsive spectrum. Such categorization might open several new areas of investigation. PIU could be considered as a newly identified area of symptomatology for the disorders of that spectrum, e.g. impulsive online buying in the context of ADHD or compulsive use of social media in the context of OCD, which would respond to well-established treatments for these disorders, or it might be worth considering as a separate, commonly co-morbid disorder requiring PIU-specific treatments. In terms of prevalence rates, individuals suffering from disorders of the impulsive-compulsive spectrum might be at more risk of developing PIU or more severe forms of it. Treating psychiatric co-morbidities as early as possible has been suggested to prevent the development of pathological use of the internet (Ko et al., 2009). In terms of prevention, early identification of PIU may facilitate the diagnosis of impulsive-compulsive disorders and other related common health problems, and enable timely management of a wide range of mental health difficulties. Finally, it will be important to develop better assessment tools for PIU and evidence-based management strategies, which are currently lacking (Weinstein and Lejoyeux, 2010). There is only preliminary evidence for pharmacological treatments of PIU, which are mainly conceived and focused on treating a co-morbid disorder, for example treating PIU symptoms by treating co-morbid ADHD with methylphenidate. Psychological treatments including individual or group cognitive behavioral therapy, family-based interventions, and motivational interviewing have been suggested as possible treatments for PIU symptoms (Spada, 2014).
Table 3. Logistic Regression model in the full data set (in-sample), with problematic internet use category (moderate and severely problematic versus controls) as dependent variable.

Broader applications of machine learning for psychiatry in general
In terms of the methodology used, this study demonstrates a proof-of-concept for the use of machine learning approaches with behavioral data in psychiatry, with special consideration to the use of between-study-sites cross-validation. Such approaches enable multi-site studies to explore how robustly the results transfer between distinct settings, which is a vital step in establishing the 'validity' of a given mental disorder.