Computational drug repositioning based on side-effects mined from social media

Drug repositioning methods attempt to identify novel therapeutic indications for marketed drugs. Strategies include the use of side-effects to assign new disease indications, based on the premise that both therapeutic effects and side-effects are measurable physiological changes resulting from drug intervention. Drugs with similar side-effects might share a common mechanism of action linking side-effects with disease treatment, or may serve as a treatment by “rescuing” a disease phenotype on the basis of their side-effects; therefore it may be possible to infer new indications based on the similarity of side-effect profiles. While existing methods leverage side-effect data from clinical studies and drug labels, evidence suggests this information is often incomplete due to under-reporting. Here, we describe a novel computational method that uses side-effect data mined from social media to generate a sparse undirected graphical model using inverse covariance estimation with $\ell_1$-norm regularization. Results show that known indications are well recovered while current trial indications can also be identified, suggesting that sparse graphical models generated using side-effect data mined from social media may be useful for computational drug repositioning.


Drug repositioning is the process of identifying novel therapeutic indications for marketed drugs. Compared to traditional drug development, repositioned drugs have the advantage of decreased development time and costs given that significant pharmacokinetic, toxicology and safety data will have already been accumulated, drastically reducing the risk of attrition during clinical trials. In addition to marketed drugs, it is estimated that drug libraries may contain upwards of 2000 failed drugs that have the potential to be repositioned, with this number increasing at a rate of 150-200 compounds per year [1]. Repositioning of marketed or failed drugs has opened up new sources of revenue for pharmaceutical companies with estimates suggesting the market could generate multi-billion dollar annual sales in coming years [2,3]. While many of the current successes of drug repositioning have come about through serendipitous clinical observations, systematic data-driven approaches are now showing increasing promise given their ability to generate repositioning hypotheses for multiple drugs and diseases simultaneously using a wide range of data sources, while also incorporating prioritisation information to further accelerate development time [4]. Existing computational repositioning strategies generally use similar approaches but attempt to link different concepts. They include …

Existing repositioning methods based on side-effects, such as the work of Campillos et al. [20] and Yang and Agarwal [21], have used data from the SIDER database [36], which contains side-effect data extracted from drug labels, largely collected from clinical trials during the pre-marketing phase of drug development.
Other resources include Meyler's Side Effects of Drugs [37], which is updated annually in the Side Effects of Drugs Annual [38], and the Drugs@FDA database [39], while pharmacovigilance authorities attempt to detect, assess and monitor reported drug side-effects post-market. Despite regular updates to these resources and voluntary reporting systems, there is evidence to suggest that side-effects are substantially under-reported, with some estimates indicating that up to 86% of adverse drug reactions go unreported for reasons that include lack of incentives, indifference, complacency, workload and lack of training among healthcare professionals [40][41][42][43]. Side-effects reported from clinical trials also have limitations due to constraints on scale and time, as well as pharmacogenomic effects [44]. A number of cancer drug studies have also observed that women are often significantly under-represented in clinical trials, making it difficult to study the efficacy, dosing and side-effects of treatments which can work differently in women and men; similar problems of under-representation also affect paediatrics, as many drugs are only ever tested on adults [45].
… between the frequency of side-effects extracted from unlabelled data and the frequency of documented adverse drug reactions [46]. Despite this success, a number of significant natural language processing challenges remain. These include dealing with idiomatic expressions, linguistic variability of expression and creativity, ambiguous terminology, spelling errors, word shortenings, and distinguishing between the symptoms that a drug is treating and the side-effects it causes. Some of the solutions proposed to deal with these issues include the use of specialist lexicons, appropriate use of semantic analysis, and improvements to approximate string matching, modeling of spelling errors, and contextual analysis surrounding the mentions of side-effects [46,47], while maintaining a list of symptoms for which a drug is prescribed can help to eliminate them from the list of side-effects identified [48]. Although much of the work to date has explored the use of online forums where users discuss their experience with pharmaceutical drugs and report side-effects [49], the growing popularity of Twitter [50], which at the time of writing has over 300 million active monthly users, provides a novel resource upon which to perform large-scale mining of reported drug side-effects in near real-time from the 500 million tweets posted daily [51]. While only a small fraction of these daily tweets are related to health issues, the sheer volume of data available presents an opportunity to bridge the gap left by conventional side-effects reporting strategies. Over time, the accumulation of side-effect data from social media may become comparable to or even exceed the volume of traditional resources, and at the very least should be sufficient to augment existing databases.
Additionally, the cost of running such a system continuously is relatively low compared to existing pharmacovigilance monitoring, presenting a compelling economic argument supporting the use of social media for such purposes. Furthermore, the issues related to under-representation described above may be addressed.

Freifeld et al. [52] presented a comparison study between drug side-effects found on Twitter and adverse events reported in the FDA Adverse Event Reporting System (FAERS). Starting with 6.9 million tweets, they used a set of 23 drug names and a list of symptoms to reduce that data to a subset of 60,000 tweets. After manual examination, 4,401 tweets were identified as mentioning a side-effect, with a Spearman rank correlation of 0.75. Nikfarjam et al. [53] introduced a method based on Conditional Random Fields (CRF) to tag mentions of drug side-effects in social media posts from Twitter or the online health community DailyStrength. They used features based on the context of tokens, a lexicon of adverse drug reactions, Part-Of-Speech (POS) tags and a feature indicating whether a token is negated or not. They also used embedding clusters learned with Word2Vec [54]. They reported an F1 score of 82.1% for data from DailyStrength and 72.1% for Twitter data. Sarker and Gonzalez [55] developed classifiers to detect side-effects using training data from multiple sources, including tweets [56], DailyStrength, and a corpus of adverse drug events obtained from medical case reports. They reported an F1 score of 59.7% when training a Support Vector Machine (SVM) with Radial Basis Function (RBF) kernel on all three datasets. Recently, Karimi et al. [57] presented a survey of the field of surveillance for adverse drug events with automatic text and data mining techniques.

In this study, we describe a drug repositioning methodology that uses side-effect data mined from social media to infer novel indications for marketed drugs. We use data from a pharmacovigilance system for mining Twitter for drug side-effects [58]. The system uses a set of cascading filters to eliminate large quantities of irrelevant messages and identify the most relevant data for further processing, before applying an SVM classifier to identify tweets that mention suspected adverse drug reactions. Using this data we apply sparse inverse covariance estimation to construct an undirected graphical model, which offers a way to describe the relationship between all drug pairs [59][60][61]. This is achieved by solving a maximum likelihood problem using $\ell_1$-norm regularization to make the resulting graph as sparse as possible, in order to generate the simplest graphical model which fully explains the data. Results from testing the method on known and proposed trial indication recovery suggest that side-effect data mined from social media in combination with a regularized sparse graphical model can be used for systematic drug repositioning.

PeerJ Comput. Sci. reviewing PDF | (CS-2015:09:6894:2:0:NEW 18 Jan 2016) Manuscript to be reviewed

Mining Twitter for drug side-effects

We used the SoMeDoSEs pharmacovigilance system [58] to extract reports of drug side-effects from Twitter over a 6 month period between January and June 2014. SoMeDoSEs works by first applying topic filters to identify tweets that contain keywords related to drugs, before applying volume filters which remove tweets that are not written in English, are re-tweets or contain a hyperlink to a web page, since these posts are typically commercial offerings. Side-effects were then mapped to an entry in the FDA Adverse Event Reporting System. Tweets that pass these filters are then classified by a linear SVM to distinguish those that mention a drug side-effect from those that do not. The SVM classifier uses a number of natural language features including unigrams and bigrams, part-of-speech tags, sentiment scores, text surface features, and matches to gazetteers related to human body parts, side-effect synonyms, side-effect symptoms, causality indicators, clinical trials, medical professional roles, side-effect triggers and drugs.
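The cascade of topic and volume filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SoMeDoSEs implementation: the keyword list `DRUG_KEYWORDS`, the tweet dictionary layout (`text`, `lang`) and the `RT @` re-tweet convention are all hypothetical choices made for the example.

```python
import re

# Hypothetical keyword list; the vocabulary of the real topic filter is not given in the text.
DRUG_KEYWORDS = {"ibuprofen", "sertraline", "metformin"}
URL_PATTERN = re.compile(r"https?://\S+")


def passes_topic_filter(text):
    """Keep tweets that mention at least one drug keyword."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return bool(tokens & DRUG_KEYWORDS)


def passes_volume_filter(tweet):
    """Drop non-English tweets, re-tweets and tweets containing a hyperlink."""
    text = tweet["text"]
    if tweet.get("lang") != "en":      # assumed language tag on the tweet record
        return False
    if text.startswith("RT @"):        # assumed re-tweet marker
        return False
    if URL_PATTERN.search(text):       # hyperlinks are treated as commercial offerings
        return False
    return True


def cascade(tweets):
    """Apply the topic filter first, then the volume filter."""
    return [t for t in tweets
            if passes_topic_filter(t["text"]) and passes_volume_filter(t)]
```

Only tweets surviving both stages would be handed to the classifier, which keeps the more expensive feature extraction off the bulk of irrelevant messages.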

For each gazetteer, three features were created: a binary feature, which is set to 1 if a tweet contains at least one sequence of tokens matching an entry from the gazetteer, the number of tokens matching entries from the gazetteer, and the fraction of characters in tokens matching entries from the gazetteer. For side-effect synonyms we used the Consumer Health Vocabulary (CHV) [62], which maps phrases to Unified Medical Language System Concept Unique Identifiers (CUIs) and partially addresses the issue of misspellings and informal language used to discuss medical issues in tweets. The matched CUIs were also used as additional features.
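As an illustration of the three gazetteer features, the following sketch computes them for a tokenised tweet. It assumes that a gazetteer is a set of lower-cased token tuples and that the character fraction is taken relative to all token characters in the tweet; both are assumptions made for the example, and the function name `gazetteer_features` is hypothetical.

```python
def gazetteer_features(tokens, gazetteer):
    """Return (binary match flag, number of matched tokens, fraction of matched characters).

    `gazetteer` is assumed to be a set of tuples of lower-cased tokens,
    e.g. {("dry", "mouth"), ("headache",)}.
    """
    lowered = [t.lower() for t in tokens]
    matched = [False] * len(tokens)
    for entry in gazetteer:
        n = len(entry)
        # Slide a window of the entry's length over the tweet.
        for start in range(len(lowered) - n + 1):
            if tuple(lowered[start:start + n]) == entry:
                for i in range(start, start + n):
                    matched[i] = True
    n_matched = sum(matched)
    total_chars = sum(len(t) for t in tokens)
    matched_chars = sum(len(t) for t, m in zip(tokens, matched) if m)
    # Assumption: the denominator is the total number of token characters.
    fraction = matched_chars / total_chars if total_chars else 0.0
    return (1 if n_matched else 0, n_matched, fraction)
```

For the tokens `["my", "dry", "mouth", "hurts"]` and a gazetteer containing `("dry", "mouth")`, two tokens match and 8 of the 15 token characters are covered.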

To develop the system, 10,000 tweets which passed the topic and volume filters were manually annotated as mentioning a side-effect or not, resulting in a Cohen's Kappa of 0.535 for inter-annotator agreement on a sample of 404 tweets annotated by two non-healthcare professionals. Using a split of 8,000 tweets for training, 1,000 for development, and 1,000 for testing, the SVM classifier that used all the features achieved a precision of 55.0%, recall of 66.9%, and F1 score of 60.4% when evaluated using the 1,000 test tweets. This is statistically significantly higher than the results achieved by a linear SVM classifier using only unigrams and bigrams as features (precision of 56.0%, recall of 54.0% and F1 score of 54.9%). One of the sources of false negatives was the use of colloquial and indirect expressions by Twitter users to express that they have experienced a side-effect. We also observed that a number of false positives discuss the efficacy of drugs rather than side-effects.
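The reported F1 scores can be checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below recomputes both values from the rounded percentages given above; because the inputs are rounded, the result may differ from the reported figure in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the text above, expressed as fractions.
full_model = f1_score(0.550, 0.669)      # classifier using all features
ngram_baseline = f1_score(0.560, 0.540)  # unigram and bigram baseline
```

The full model gains recall at a small cost in precision, which is what lifts its F1 score above the n-gram baseline.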

… side-effects from 108,009 tweets, once drugs with only a single side-effect were excluded and drug synonyms had been resolved to a common name using exact string matches to entries in World Drug Index [63], which worked for approximately half of the data set with the remainder matched manually. We were also careful to remove indications that were falsely identified as side-effects using drug indications from Cortellis Clinical Trials Intelligence [64]. We used this data to construct a 2196 row by 620 column matrix of binary variables X, where x ∈ {0, 1}, indicating whether each drug was reported to cause each side-effect in the Twitter data set.
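The construction of the binary matrix X can be illustrated with a minimal sketch, assuming the mined data is available as a list of (drug, side-effect) pairs with drug names already resolved. The function names and the input layout are hypothetical; rows correspond to side-effects and columns to drugs, as in the matrix described above.

```python
def drop_single_effect_drugs(reports):
    """Remove drugs that are associated with only one distinct side-effect."""
    counts = {}
    for drug, _effect in set(reports):
        counts[drug] = counts.get(drug, 0) + 1
    return [(d, e) for d, e in reports if counts[d] > 1]


def build_binary_matrix(reports):
    """Build X with one row per side-effect and one column per drug.

    X[k][i] is 1 if drug i was reported to cause side-effect k, else 0.
    """
    drugs = sorted({d for d, _ in reports})
    effects = sorted({e for _, e in reports})
    column = {d: i for i, d in enumerate(drugs)}
    row = {e: k for k, e in enumerate(effects)}
    X = [[0] * len(drugs) for _ in effects]
    for drug, effect in reports:
        X[row[effect]][column[drug]] = 1
    return X, effects, drugs
```

Repeated reports of the same pair collapse to a single 1, so X records whether a side-effect was reported for a drug rather than how often.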

Calculating the sample covariance matrix

Using this data, we are able to form the sample covariance matrix $S$ for binary variables as follows [65], such that element $S_{i,j}$ gives the covariance of drug $i$ with drug $j$:

$$S_{i,j} = \frac{1}{n}\sum_{k=1}^{n}\left(x_{ki} - \bar{x}_i\right)\left(x_{kj} - \bar{x}_j\right) \tag{1}$$

where $\bar{x}_i = \frac{1}{n}\sum_{k=1}^{n} x_{ki}$ and $x_{ki}$ is the $k$-th observation (side-effect) of variable (drug) $X_i$. It can be shown that the average product of two binary variables is equal to their observed joint probability such that:

$$\frac{1}{n}\sum_{k=1}^{n} x_{ki}\,x_{kj} = P(X_j = 1 \mid X_i = 1)\,P(X_i = 1) \tag{2}$$

where $P(X_j = 1 \mid X_i = 1)$ refers to the conditional probability that variable $X_j$ equals one given that variable $X_i$ equals one. Similarly, the product of the means of two binary variables is equal to the expected probability that both variables are equal to one, under the assumption of statistical independence:

$$\bar{x}_i\,\bar{x}_j = P(X_i = 1)\,P(X_j = 1) \tag{3}$$

Consequently, the covariance of two binary variables is equal to the difference between the observed joint probability and the expected joint probability:

$$S_{i,j} = P(X_j = 1 \mid X_i = 1)\,P(X_i = 1) - P(X_i = 1)\,P(X_j = 1) \tag{4}$$

Our objective is to find the precision or concentration matrix $\theta$ by inverting the sample covariance matrix $S$. Using $\theta$, we can obtain the matrix of partial correlation coefficients $\rho$ for all pairs of variables as follows:

$$\rho_{i,j} = -\frac{\theta_{i,j}}{\sqrt{\theta_{i,i}\,\theta_{j,j}}} \tag{5}$$

The partial correlation between two variables $X$ and $Y$ given a third, $Z$, can be defined as the correlation between the residuals $R_x$ and $R_y$ after performing least-squares regression of $X$ on $Z$ and $Y$ on $Z$, respectively. This value, denoted $\rho_{x,y|z}$, provides a measure of the correlation between two variables when conditioned on the third, with a value of zero implying conditional independence if the input data distribution is multivariate Gaussian. The partial correlation matrix $\rho$, however, gives the correlations between all pairs of variables conditioning on all other variables. Off-diagonal elements in $\rho$ that are significantly different from zero will therefore be indicative of pairs of drugs that show unique covariance between their side-effect profiles when taking into account (i.e. removing) the variance of side-effect profiles amongst all the other drugs.
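The two steps of this section, the covariance of binary variables as observed minus expected joint probability and the conversion of a precision matrix into partial correlations, can be sketched as follows. The sketch assumes a 1/n normalisation of the covariance and takes the precision matrix as given rather than computing the inverse; the function names are hypothetical.

```python
import math


def binary_covariance(X):
    """Covariance of binary columns as observed minus expected joint probability.

    X has one row per observation (side-effect) and one column per variable (drug).
    Assumes a 1/n normalisation.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[i] for row in X) / n for i in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            joint = sum(row[i] * row[j] for row in X) / n   # observed P(X_i=1, X_j=1)
            S[i][j] = joint - means[i] * means[j]           # minus expected under independence
    return S


def partial_correlations(theta):
    """Partial correlation matrix from a precision matrix with positive diagonal."""
    p = len(theta)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
    return rho
```

Two drugs whose side-effects co-occur exactly as often as independence would predict receive a covariance of zero, while a negative off-diagonal entry in the precision matrix yields a positive partial correlation.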

For the sample covariance matrix to be easily invertible, two desirable characteristics are that it should be positive definite, i.e. all eigenvalues should be strictly positive, and well-conditioned, i.e. the ratio of its maximum and minimum singular value should not be too large. This can be particularly problematic when the sample size is small and the number of variables is large (n < p) and estimates of the covariance matrix become singular. To ensure these characteristics, and speed up convergence of the inversion, we condition the sample covariance matrix by shrinking towards an improved covariance estimator $T$, a process which tends to pull the most extreme coefficients towards more central values thereby systematically reducing estimation error [66], using a linear shrinkage approach to combine the estimator and sample matrix in a weighted average:

$$S^{*} = \alpha T + (1 - \alpha)\,S \tag{6}$$

where $\alpha \in [0, 1]$ denotes the analytically determined shrinkage intensity. We apply the approach of Schäfer and Strimmer, which uses a distribution-free, diagonal, unequal variance model which shrinks off-diagonal elements towards zero but leaves diagonal entries intact, i.e. it does not shrink the variances [67]. Shrinkage is actually applied to the correlations rather than the covariances, which has two distinct advantages: the off-diagonal elements determining the shrinkage intensity are all on the same scale, while the partial correlations derived from the resulting covariance estimator are independent of scale.
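A minimal sketch of linear shrinkage towards a diagonal, unequal-variance target is given below. It applies the shrinkage to the correlations and leaves the variances untouched, as described above, but takes the intensity `alpha` as an input: the analytic estimate of the shrinkage intensity from Schäfer and Strimmer is not reproduced here, and the function name is hypothetical.

```python
import math


def shrink_covariance(S, alpha):
    """Shrink off-diagonal correlations towards zero, keeping variances intact.

    `alpha` in [0, 1] is supplied by the caller; estimating it analytically
    is outside the scope of this sketch. Assumes all variances are positive.
    """
    p = len(S)
    sd = [math.sqrt(S[i][i]) for i in range(p)]
    shrunk = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                shrunk[i][j] = S[i][i]                 # variances are not shrunk
            else:
                r = S[i][j] / (sd[i] * sd[j])          # sample correlation
                shrunk[i][j] = (1 - alpha) * r * sd[i] * sd[j]
    return shrunk
```

For example, the singular matrix [[4, 2], [2, 1]] becomes [[4, 1], [1, 1]] with `alpha = 0.5`, which has a positive determinant and can therefore be inverted.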

Graphical lasso for sparse inverse covariance estimation

A useful output from the covariance matrix inversion is a sparse ρ matrix containing many zero elements, since, intuitively, we know that relatively few drug pairs will share a common mechanism of action, so removing any spurious correlations is desirable and results in a more parsimonious relationship model, while the non-zero elements will typically reflect the correct positive correlations in the true inverse covariance matrix more accurately [68]. However, elements of ρ are unlikely to be zero unless many elements of the sample covariance matrix are zero. The graphical lasso [60,61,69] provides a way to induce zero partial correlations in ρ by penalizing the maximum likelihood estimate of the inverse covariance matrix using an $\ell_1$-norm penalty function. The estimate can be found by maximizing the following penalized log-likelihood using the block coordinate descent approach described by Friedman et al. [60]:

$$\hat{\theta} = \underset{\theta}{\arg\max}\;\left\{\log\det\theta - \operatorname{tr}(S\theta) - \lambda\,\lVert\theta\rVert_1\right\} \tag{7}$$

Here, the first two terms give the Gaussian log-likelihood of the data (up to a constant), tr denotes the trace operator and $\lVert\theta\rVert_1$ is the $\ell_1$-norm, the sum of the absolute values of the elements of θ, weighted by the non-negative tuning parameter λ. The specific use of the $\ell_1$-norm penalty has the desirable effect of setting elements in θ to zero, resulting in a sparse matrix, while the parameter λ effectively controls the sparsity of the solution. This contrasts with the use of an $\ell_2$-norm penalty which will shrink elements but will never reduce them to zero. While this graphical lasso formulation is based on the assumption that the input data distribution is multivariate Gaussian, Banerjee et al. showed that the dual optimization solution also applies to binary data, as is the case in our application [61].
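The contrast between the two penalties can be made concrete in one dimension. The sketch below is not the graphical lasso itself; it shows the soft-thresholding operator, which solves the scalar $\ell_1$-penalised least-squares problem and is the basic update inside lasso coordinate descent, alongside the corresponding closed-form solution for a squared $\ell_2$ penalty. The function names are hypothetical.

```python
def soft_threshold(x, lam):
    """Minimiser of 0.5 * (b - x)**2 + lam * abs(b): small inputs become exactly zero."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def ridge_shrink(x, lam):
    """Minimiser of 0.5 * (b - x)**2 + 0.5 * lam * b**2: shrinks but never reaches zero."""
    return x / (1.0 + lam)
```

With `lam = 0.5`, an input of 0.3 is mapped to exactly 0.0 by `soft_threshold` but only to 0.2 by `ridge_shrink`, which is why the $\ell_1$ penalty produces sparse estimates and the $\ell_2$ penalty does not.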

It has been noted that the graphical lasso produces an approximation of θ that is not symmetric, so we symmetrize it using the update described in [70]. The matrix ρ is then calculated according to Equation 5, before repositioning predictions for drug $i$ are determined by ranking all other drugs according to their absolute values in $\rho_i$ and assigning their indications to drug $i$.

To evaluate our method we have attempted to predict repositioning targets for indications that are already known. If, by exploiting hindsight, we can recover these, then our method should provide a viable strategy with which to augment existing approaches that adopt an integrated approach to drug repositioning [19]. Figure 1a shows the performance of the method at identifying co-indicated drugs at a range of λ values, resulting in different sparsity levels in the resulting ρ matrix. We measured the percentage of target drugs for which a co-indicated drug was ranked amongst the top 5, 10, 15, 20 and 25 predictions, respectively. Of the 620 drugs in our data set, 595 had a primary indication listed in Cortellis Clinical Trials Intelligence, with the majority of the remainder being made up of dietary supplements (e.g. methylsulfonylmethane) or plant extracts (e.g. Agaricus brasiliensis extract) which have no approved therapeutic effect. Rather than removing these from the data set, they were left in as they may contribute to the partial correlation between pairs of drugs that do have approved indications.
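The ranking and evaluation procedure can be sketched as follows, assuming the partial correlation matrix is held as a list of lists aligned with a list of drug names, and that indications are given as a dictionary mapping each drug to a set of indication names. These data structures and function names are assumptions for illustration; drugs without a listed indication stay in the ranking but are not used as targets, mirroring the treatment of supplements described above.

```python
def rank_neighbours(rho, drugs, target):
    """Rank all other drugs by absolute partial correlation with the target."""
    i = drugs.index(target)
    scored = [(abs(rho[i][j]), d) for j, d in enumerate(drugs) if j != i]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by name
    return [d for _, d in scored]


def hit_in_top_k(ranked, indications, target, k):
    """True if any of the top-k ranked drugs shares an indication with the target."""
    wanted = indications.get(target, set())
    return any(indications.get(d, set()) & wanted for d in ranked[:k])


def top_k_recovery(rho, drugs, indications, k):
    """Fraction of targets with a co-indicated drug amongst their top-k predictions."""
    targets = [d for d in drugs if indications.get(d)]
    if not targets:
        return 0.0
    hits = sum(hit_in_top_k(rank_neighbours(rho, drugs, t), indications, t, k)
               for t in targets)
    return hits / len(targets)
```

Calling `top_k_recovery` with k set to 5, 10, 15, 20 and 25 for each candidate λ would give one curve per cut-off of the kind reported for Figure 1a.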

Results indicate that the method achieves its best performance with a λ value of $10^{-9}$ where 42.41% (243/595) of targets have a co-indicated drug returned amongst the top 5 ranked predictions (Figure 1a). This value compares favourably with both a strategy in which drug ranking is randomized (13.54%, standard error ±0.65), and another in which drugs are ranked according to the Jaccard index (28.75%). In Ye et al. [27], a related approach is used to construct a repositioning network based on side-effects extracted from the SIDER database, Meyler's Side Effects of Drugs, Side Effects of Drugs Annual, and the Drugs@FDA database [36][37][38][39], also using the Jaccard index as the measure of drug-drug similarity. Here, they report an equivalent value of … While data sets and underlying statistical models clearly differ, these results taken together suggest that the use of side-effect data mined from social media can offer comparable performance to methods using side-effect data extracted from more conventional resources, while the use of a global statistical model such as the graphical lasso does result in improved performance compared to a pairwise similarity coefficient such as the Jaccard index.
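For reference, the pairwise Jaccard baseline can be sketched as below, assuming each drug's side-effect profile is stored as a set of side-effect names in a dictionary. Unlike the partial correlations, the score for a pair depends only on the two profiles involved. The function names and data layout are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_by_jaccard(profiles, target):
    """Rank all other drugs by Jaccard similarity of side-effect profiles to the target."""
    others = [d for d in profiles if d != target]
    return sorted(others, key=lambda d: (-jaccard(profiles[target], profiles[d]), d))
```

The resulting ranking can be scored with the same top-k criterion as the graphical model, so the two approaches are compared on equal terms.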

To further investigate the influence of the provenance of the data, we mapped our data set of drugs to ChEMBL identifiers [71,72] which we then used to query SIDER for side-effects extracted from drug labels. This resulted in a reduced data set of 229 drugs, in part due to the absence of many combination drugs from SIDER (e.g. the antidepressant Symbyax which contains olanzapine and fluoxetine). Using the same protocol described above, best performance of 53.67% (117/229) was achieved with a slightly higher λ value of $10^{-6}$. Best performance on the same data set using side-effects derived from Twitter was 38.43% (88/229), again using a λ value of $10^{-9}$, while the randomized strategy achieved 12.05% (standard error ±1.14), indicating that the use of higher quality side-effect data from SIDER allows the model to achieve better performance than is possible using Twitter data. Perhaps more interestingly, combining the correct predictions between the two data sources reveals that 30 are unique to the Twitter model, 59 are unique to the SIDER model, with 58 shared, supporting the use of side-effect data mined from social media to augment conventional resources.

We also investigated whether our results were biased by the over-representation of particular drug classes within our data set. Using Cortellis Clinical Trials Intelligence, we were able to identify the broad class for 479 of the drugs (77.26%) in our data set. The five largest classes were benzodiazepine receptor agonists (3/14 drugs returned amongst the top 5 ranked predictions), analgesics (6/12), H1-antihistamines (8/11), cyclooxygenase inhibitors (9/11), and anti-cancer (2/11). This indicates that the over-representation of H1-antihistamines and cyclooxygenase inhibitors did result in a bias, and to a lesser extent analgesics, but that the overall effect of these five classes was more subtle (28/59 returned amongst the top 5 ranked predictions, 47.46%).

While the previous section demonstrated our approach can effectively recover known indications, predictions after the fact are, while useful, best supported by more forward-looking evidence. In this section, we use clinical trial data to support our predictions where the ultimate success of our target drug is still unknown. Using Cortellis Clinical Trials Intelligence, we extracted drugs present in our Twitter data set that were currently undergoing clinical trials (ending after 2014) for a novel indication (i.e. for which they were not already indicated), resulting in a subset of 277 drugs currently in trials for 397 indications. Figure 3 shows the percentage of targets for which a co-indicated drug was ranked amongst the top 5, 10, 15, 20 and 25 predictions. Similar to the recovery of known indications, best performance when considering the top 5 ranked predictions was achieved with a λ value of $10^{-9}$, resulting in 16.25% (45/277) of targets having a co-indicated drug, which again compares well to a randomized strategy (5.42%, standard error ±0.32) or a strategy using the Jaccard index (10.07%).
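A randomized baseline of the kind quoted above could be estimated along the following lines. This is a sketch under assumptions: the text does not state how the standard error was obtained, so the example takes it to be the standard error of the mean recovery rate over repeated random rankings, and the function names, the `repeats` parameter and the fixed seed are all hypothetical.

```python
import math
import random
import statistics


def random_top_k_recovery(drugs, indications, k, rng):
    """Top-k recovery rate when the other drugs are ranked in random order."""
    targets = [d for d in drugs if indications.get(d)]
    hits = 0
    for t in targets:
        others = [d for d in drugs if d != t]
        rng.shuffle(others)
        if any(indications.get(d, set()) & indications[t] for d in others[:k]):
            hits += 1
    return hits / len(targets)


def baseline_with_standard_error(drugs, indications, k, repeats, seed=0):
    """Mean recovery over repeated shuffles and its standard error (assumed definition)."""
    rng = random.Random(seed)
    rates = [random_top_k_recovery(drugs, indications, k, rng) for _ in range(repeats)]
    mean = statistics.mean(rates)
    std_err = statistics.stdev(rates) / math.sqrt(repeats)  # requires repeats >= 2
    return mean, std_err
```

Comparing a model's recovery rate against this mean, in units of the standard error, gives a rough sense of how far the model sits above chance.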
Recovery of proposed clinical trial indications is clearly more challenging than known indications, possibly reflecting the fact that a large proportion of drugs will fail during trials and therefore many of the 397 proposed indications analysed here will in time prove false, although the general trend in performance as the sparsity parameter λ is adjusted tends to mirror the recovery of known indications. Despite this, a number of interesting predictions with a diverse range of novel indications are made that are supported by experimental and clinical evidence; a selection of 10 of the 45 drugs where the trial indication was correctly predicted are presented in Table 1. We further investigated three repositioning candidates with interesting pharmacology to understand their predicted results.

Figure 4. Predicted repositioning of oxytocin (red) for the treatment of schizophrenia based on its proximity to the schizophrenia drug chlorpromazine (grey). Drugs in the graph are sized according to their degree (number of edges), while the thickness of a connecting edge is proportional to the partial correlation coefficient between the two drugs. The graph layout is performed by Cytoscape [81] which applies a force-directed approach based on the partial correlation coefficient. Nodes are arranged so that edges are of more or less equal length and there are as few edge crossings as possible. For clarity, only the top ten drugs ranked by partial correlation coefficient are shown.

Oxytocin is a nonapeptide hormone that acts primarily as a neuromodulator in the brain via the specific, high-affinity oxytocin receptor, a class I (Rhodopsin-like) G-protein-coupled receptor (GPCR) [74]. Currently, oxytocin is used for labor induction and the treatment of Prader-Willi syndrome, but there is compelling pre-clinical evidence to suggest that it may play a crucial role in the regulation of brain-mediated processes that are highly relevant to many neuropsychiatric disorders [75]. A number of animal studies have revealed that oxytocin has a positive effect as an antipsychotic [76,77], while human trials have revealed that intranasal oxytocin administered to highly symptomatic schizophrenia patients as an adjunct to their antipsychotic drugs improves positive and negative symptoms significantly more than placebo [78,79]. These therapeutic findings are supported by growing evidence of oxytocin's role in the manifestation of schizophrenia symptoms, such as a recent study linking higher plasma oxytocin levels with increased pro-social behavior in schizophrenia patients and with less severe psychopathology in female patients [80]. The mechanisms underlying oxytocin's therapeutic effects on schizophrenia symptoms are poorly understood, but its ability to regulate mesolimbic dopamine pathways is thought to be responsible [75]. Here, our method predicts schizophrenia as a novel indication for oxytocin based on its proximity to chlorpromazine, which is currently used to treat schizophrenia (Figure 4). Chlorpromazine also modulates the dopamine pathway by acting as an antagonist of the dopamine receptor, another class I GPCR. Interestingly, the subgraph indicates that dopamine also has a high partial correlation coefficient with oxytocin, adding further support to the hypothesis that oxytocin, chlorpromazine and dopamine all act on the same pathway and therefore have similar side-effect profiles. Side-effects shared by oxytocin and chlorpromazine include hallucinations, excessive salivation and anxiety, while shivering, weight gain, abdominal pain, nausea, and constipation are common side-effects also shared by other drugs within the subgraph. Currently, larger scale clinical trials of intranasal oxytocin in schizophrenia are underway. If the early positive results hold up, it may signal the beginning of a new era in the treatment of schizophrenia, a field which has seen little progress in the development of novel efficacious treatments over recent years.

Figure 5. Predicted repositioning of ramelteon (red) for the treatment of bipolar I disorder based on its proximity to ziprasidone (grey). Along with ziprasidone, phenelzine, milnacipran and tranylcypromine are all used to treat mood disorders.

Ramelteon, currently indicated for the treatment of insomnia, is predicted to be useful for the treatment of bipolar depression (Figure 5). Ramelteon is the first in a new class of sleep agents that selectively binds the MT1 and MT2 melatonin receptors in the suprachiasmatic nucleus, with high affinity over the MT3 receptor [82]. It is believed that the activity of ramelteon at MT1 and MT2 receptors contributes to its sleep-promoting properties, since these receptors are thought to play a crucial role in the maintenance of the circadian rhythm underlying the normal sleep-wake cycle upon binding of endogenous melatonin. Abnormalities in circadian rhythms are prominent features of bipolar I disorder, with evidence suggesting that disrupted sleep-wake circadian rhythms are associated with an increased risk of relapse in bipolar disorder [83]. As bipolar patients tend to exhibit shorter and more variable circadian activity, it has been proposed that normalisation of the circadian rhythm pattern may improve sleep and consequently lead to a reduction in mood exacerbations. Melatonin receptor agonists such as ramelteon may have a potential therapeutic effect in depression due to their ability to resynchronize the suprachiasmatic nucleus [84]. In Figure 5, evidence supporting the repositioning of ramelteon comes from ziprasidone, an atypical antipsychotic used to treat bipolar I disorder and schizophrenia [85]. Ziprasidone is the second-ranked drug by partial correlation coefficient; a number of other drugs used to treat mood disorders can also be located in the immediate vicinity including phenelzine, a non-selective and irreversible monoamine oxidase inhibitor (MAOI) used as an antidepressant and anxiolytic, milnacipran, a serotonin-norepinephrine reuptake inhibitor used to treat major depressive disorder, and tranylcypromine, another MAOI used as an antidepressant and anxiolytic agent. The co-location of these drugs in the same region of the graph suggests a degree of overlap in their respective mechanistic pathways, resulting in a high degree of similarity between their side-effect profiles. Nodes in this subgraph also have a relatively large degree indicating a tighter association than for other predictions, with common shared side-effects including dry mouth, sexual dysfunction, migraine, and orthostatic hypotension, while weight gain is shared between ramelteon and ziprasidone.

Figure 6. Predicted repositioning of meloxicam (red) for the treatment of non-Hodgkin lymphoma based on its proximity to rituximab (grey).
… via the mobilisation of autologous peripheral blood stem cells from bone marrow. By inhibiting cyclooxygenase 2, meloxicam is understood to inhibit generation of prostaglandin E2, which is known to stimulate osteoblasts to release osteopontin, a protein which encourages bone resorption by osteoclasts [86,87]. By inhibiting prostaglandin E2 and disrupting the production of osteopontin, meloxicam may encourage the departure of stem cells, which otherwise would be anchored to the bone marrow by osteopontin [88]. In Figure 6, rituximab, a B-cell depleting monoclonal antibody that is currently indicated for treatment of non-Hodgkin lymphoma, is the top ranked drug by partial correlation, which provides evidence for repositioning to this indication. Interestingly, depletion of B-cells by rituximab has recently been demonstrated to result in decreased bone resorption in patients with rheumatoid arthritis, possibly via a direct effect on both osteoblasts and osteoclasts [89,90], suggesting a common mechanism of action between meloxicam and rituximab. Further evidence is provided by the fifth-ranked drug clopidogrel, an antiplatelet agent used to inhibit blood clots in coronary artery disease, peripheral vascular disease, cerebrovascular disease, and to prevent myocardial infarction. Clopidogrel works by irreversibly inhibiting the adenosine diphosphate receptor P2Y12, which is known to increase osteoclast activity [91]. Similarly to the ramelteon subgraph, many drugs in the vicinity of meloxicam are used to treat inflammation including diclofenac, naproxen (both NSAIDs) and betamethasone, resulting in close association between these drugs, with shared side-effects in the subgraph including pain, cramping, flushing and fever, while swelling, indigestion, inflammation and skin rash are shared by meloxicam and rituximab.
While the side-effects shared within the subgraphs of our three examples are commonly associated with a large number of drugs, some of the side-effects shared by the three drug pairs such as hallucinations, excessive salivation and anxiety are somewhat less common. To investigate this relationship for the data set as a whole, we calculated log frequencies for all side-effects and compared these values against the normalized average rank of pairs where the side-effect was shared by both the query and target drug. If we assume that a higher ranking in our model indicates a higher likelihood of drugs sharing a protein target, this relationship demonstrates similar properties to the observations of Campillos et al. [20] in that there is a negative correlation between the rank and frequency of a side-effect. The correlation coefficient has a value of -0.045 which is significantly different from zero at the 0.001 level, although the linear relationship appears to break down where the frequency of the side-effect is lower than about 0.025.

In this study, we have used side-effect data mined from social media to generate a sparse graphical model, with nodes in the resulting graph representing drugs, and edges between them representing the similarity of their side-effect profiles. We demonstrated that known indications can be inferred based on the indications of neighbouring drugs in the network, with 42.41% of targets having their known indication identified amongst the top 5 ranked predictions, while 16.25% of drugs that are currently in a clinical trial have their proposed trial indication correctly identified.
These results indicate that the volume and diversity of drug side-effects reported using social media is sufficient to be of use in side-effect-based drug repositioning, and this influence is likely to increase as the audience of platforms such as Twitter continues to see rapid growth. It may also help to address the problem of side-effect under-reporting. We also demonstrate that global statistical models such as the graphical lasso are well-suited to the analysis of large multivariate systems such as drug-drug networks. They offer significant advantages over conventional pairwise similarity methods by incorporating indirect relationships between all variables, while the use of the lasso penalty allows a sparse, parsimonious model to be generated with fewer spurious connections resulting in a simpler theory of relationships.

While our method shows encouraging results, it is more likely to play a role in drug repositioning as a component in an integrated approach. Whether this is achieved by combining reported side-effects with those mined from resources such as SIDER, or by using predictions as the inputs to a supervised learning algorithm, a consensus approach is likely to achieve higher performance by incorporating a range of different data sources in addition to drug side-effects, while also compensating for the weaknesses of any single method [19]. Limitations of our method largely stem from the underlying Twitter data [58]. Only a small fraction of daily tweets contain reports of drug side-effects, therefore restricting the number of drugs we are able to analyse. However, given that systems such as SoMeDoSEs are capable of continuously monitoring Twitter, the numbers of drugs and reported side-effects should steadily accumulate over time. To address this, in the future it may be possible to extend monitoring of social media to include additional platforms.
For example, Weibo is a Chinese microblogging site akin to Twitter, with over 600 million users as of 2013. Clearly, tools will have to be adapted to deal with multilingual data processing or translation issues, while differences in cultural attitudes to sharing medical information may present further challenges. Extensions to the statistical approach may also result in improved performance. Methods such as the joint graphical lasso allow the generation of a graphical model using data with observations belonging to distinct classes [92]. For example, two covariance matrices generated using data from Twitter and SIDER could be combined in this way, resulting in a single model that best represents both sources. An extension to the graphical lasso also allows the decomposition of the sample covariance graph into smaller connected components via a thresholding approach [93]. This not only leads to large performance gains, but also significantly increases the scalability of the graphical lasso approach.
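The thresholding idea mentioned above can be sketched as follows. The sketch assumes the commonly described screening rule in which two variables are linked whenever the absolute value of their off-diagonal sample covariance exceeds the regularization parameter, so that each resulting connected component can be passed to the graphical lasso as a separate, smaller problem. The function name `covariance_components`, the parameter `lam` and the toy covariance values are hypothetical, and the exact procedure of [93] may differ in detail.

```python
def covariance_components(sample_cov, lam):
    """Split variables into the connected components of the thresholded
    sample covariance graph.

    An edge joins i and j when |S_ij| > lam (assumed screening rule).
    Returns a list of components, each a sorted list of variable indices.
    """
    n = len(sample_cov)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search from an unvisited variable.
        seen[start] = True
        stack = [start]
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j != i and not seen[j] and abs(sample_cov[i][j]) > lam:
                    seen[j] = True
                    stack.append(j)
        components.append(sorted(component))
    return components


# Toy sample covariance matrix (illustrative values only, not study data).
sample_cov = [
    [1.0, 0.6, 0.05, 0.0],
    [0.6, 1.0, 0.02, 0.01],
    [0.05, 0.02, 1.0, 0.4],
    [0.0, 0.01, 0.4, 1.0],
]

blocks = covariance_components(sample_cov, lam=0.1)
print(blocks)
```

With a larger penalty the graph breaks into more, smaller blocks, which is where the scalability gain comes from: the cost of fitting each block depends on its own size rather than on the total number of drugs.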

Another caveat to consider, common to many other repositioning strategies based on side-effect similarity, is that there is no evidence to suggest whether a repositioning candidate will be a better therapeutic than the drug from which the novel indication was inferred. While side-effects can provide useful information for inferring novel indications, they are in general undesirable and need to be balanced against any therapeutic benefits. Our model does not attempt to quantify efficacy or side-effect severity, but it might be possible to modify the natural language processing step during Twitter mining in order to capture comparative mentions of side-effects, since tweets discussing both the