Introduction

One major concern in mental healthcare is the treatment and diagnosis gap. The rate of minimally adequate treatment for major depressive disorder is as low as 23% in high-income countries (Moitra et al., 2022). Many people with depression remain undiagnosed and, therefore, receive neither recognition nor treatment. In addition to a lack of mental health literacy, access barriers impede contact with mental healthcare services. Among these barriers are direct or indirect costs (e.g., for public transport), restricted mobility (e.g., due to long distances to the nearest psychiatrist/psychotherapist), and long waiting times (Köhnen et al., 2019). Digital mental health, and specifically smartphone applications (apps), have been suggested to offer a low-threshold approach to diagnosis and psychotherapy, thus remedying the treatment and diagnosis gap (Fiske et al., 2019; Mayer et al., 2019) and promoting justice in the healthcare system. Such apps “may address these constraints [of mobility, cost, and motivation] and provide better screening for depression” (Al Hanai et al., 2018, p. 1716).

This article provides an ethical analysis of questions of epistemic injustice that arise in the context of one emerging technology: so-called ‘digital phenotyping’ apps. Epistemic injustice describes a “wrong done to someone specifically in their capacity as a knower” (Fricker, 2007, p. 1); the concept captures how people are wronged specifically as epistemic agents, i.e., agents of the production and distribution of knowledge (Dotson, 2014). Digital phenotyping apps are typically based on an artificial multimodal neural network model,Footnote 1 and analyze text and/or audio sequences harvested from social media, video data, data from global positioning systems, or any other interaction with one’s mobile device (Al Hanai et al., 2018; Su et al., 2020). The data is analyzed for voice tone, facial expressions, content of text and speech, and similar characteristics, and translated into clinically relevant ‘digital biomarkers’ (Jain et al., 2015; Torous et al., 2021). The latter are understood as “objective, quantifiable, physiological, and behavioral measures that are collected by means of digital devices that are portable, wearable, implantable, or digestible” (Babrak et al., 2019, p. 93). The data is collected whenever the smartphone is not completely switched off (Torous et al., 2021). All the traces users leave behind when ‘moving’ around in the digital space and interacting with their device result in a ‘digital phenotype’ of the individual user (Spinazze et al., 2019). The digital phenotype can be understood as “one’s digital footprint” (Jain et al., 2015, p. 463), which is expected to reveal different kinds of information about the app user, including indicators of their mental states (Leaning et al., 2024).

This technology is currently being developed for apps that predict the onset (or recurrence) of depressive episodes,Footnote 2 based on a user’s digital phenotype. In this model, the app’s algorithm compares a user’s current digital phenotype to those of users with clinically confirmed depression and issues a warning if a depressive episode is likely. Many mental health self-tracking apps already on the market, such as Mindstrong or Behavidance, use digital phenotyping as one functionality among others, such as collecting data through active user-app interaction, for example, filling in questionnaires or keeping a mood diary (Leaning et al., 2024; Polhemus et al., 2022). The main feature that distinguishes the apps under scrutiny in this article is that they exclusively use digital phenotyping to assess a user’s mental state. Such apps are currently being trialed in clinical studies: A recent systematic review on the use of digital phenotyping in the context of depression identified 24 studies that exclusively used passive data to predict the onset of depressive episodes in users with a formal diagnosis of depression (Leaning et al., 2024). In these studies, the apps’ output based on digital phenotyping is correlated with clinical measures of depressive symptoms, suggesting that researchers are investigating whether digital phenotyping could replace clinical assessments. Digital phenotyping is also being examined in combination with active data gathering to detect the early onset of depression in healthy populations. A prospective study on the app “Warn-D”, for instance, follows a cohort of students in order to ‘warn’ those at risk of depression, based on the app’s analysis (Fried et al., 2023).

Given these current developments in mental health services and e-mental health, it is likely that digital phenotyping – and apps that operate solely on that basis – will be established as an alternative to standard screening tools for depression, both for clinical purposes in routine healthcare encounters and for direct-to-consumer use. While promising in terms of user-friendliness and resource efficiency, the development of mental health apps that operate exclusively on digital phenotyping and gather data passively raises multiple epistemological and ethical questions that are part of an ongoing debate (Baumgartner, 2021; Coghlan & D’Alfonso, 2021; Stanghellini & Leoni, 2020; Tekin, 2020). Ethical considerations concerning e-mental health have generally focused on topics such as autonomy (Schmietow & Marckmann, 2019), responsibility (Martinez-Martin & Kreitmair, 2018), and inequality (Skorburg & Yam, 2021). Another ethical topic that has recently received more attention in the context of machine learning is epistemic injustice (Hull, 2023; Pozzi, 2023a, b; Slack & Barclay, 2023; Symons & Alvarado, 2022).

In this paper, we focus on apps that operate on passive data collection and digital phenotyping to screen for depressive episodes, hereafter referred to as ‘DP-apps.’ We examine to what extent DP-apps generate cases of epistemic injustice by paying particular attention to their impact on epistemic agency in the different contexts in which the apps can be used, including healthcare encounters and direct-to-consumer use. The paper draws on the existing literature on epistemic injustice in mental healthcare, scholarly and clinical discussions on the concept of mental illness, and the features of digital phenotyping and its underlying technological approach. Firstly, we outline the conceptual background and methodology, including three hypothetical scenarios that we use to illustrate and clarify our ethical analysis (2.). Subsequently, we argue that DP-apps have the potential to undermine different preconditions for epistemic agency and can lead to three different forms of epistemic injustice: hermeneutical (3.), testimonial (4.1), and contributory injustice (4.2). Throughout our analysis, we consider the influence of DP-apps on epistemic practices on the individual level, in healthcare settings, and on the structural level. Our overall objective is to make risks of epistemic injustice visible, promote sensitivity among all stakeholders, and develop recommendations for the further development and implementation of this technology.

Conceptual background and methodology

Theories of epistemic injustice focus on harms that affect people as ‘epistemic agents.’ The latter term refers to individuals who use shared epistemic resources to collect, generate, and distribute knowledge, justified belief, or understanding, and to revise existing epistemic resources if necessary (Dotson, 2014, p. 115). ‘Epistemic resources’ refer to language, concepts, theories, and standards of judgment that epistemic agents use to formulate propositions and make sense of social experiences (Pohlhaus, 2012). We consider DP-apps as ‘epistemic tools’ that users utilize for a specific epistemic task, i.e., to learn about their mental well-being by receiving information on their personal current risk of depression.

To be a successful epistemic agent, a person must meet several preconditions, which can broadly be divided into aspects regarding epistemic resources and aspects regarding uptake. Firstly, a person relies on having fitting epistemic resources to make sense of their experiences and share them. Fricker, for instance, argues that the development of the concept of sexual harassment helped many women to understand and name a specific form of gendered violence in the workplace (2007).Footnote 3 Gaps in collectively shared epistemic resources can lead to hermeneutical injustice, i.e., a disadvantage in making sense of or communicating important social experiences due to structural marginalization in shaping epistemic resources. Secondly, a person’s testimony needs to receive adequate social uptake on both the interindividual level, which is especially important for healthcare encounters, and the structural level. In the case of testimonial injustice, a person’s testimony receives a deflated level of credibility due to prejudice against their social identity.Footnote 4 Thirdly, contributory injustice, a concept introduced by Dotson (2012), arises if a speaker’s testimony is unintelligible because they use epistemic resources developed in marginalized communities: if a listener actively ignores these resources, they are unable to understand the speaker’s contribution. Contributory injustice can affect people from marginalized communities whose epistemic resources are ignored by people outside the community due to societal power relations. It is often rooted in these communities’ lack of the social, political, and economic power necessary to spread their epistemic resources beyond their own boundaries.

In the following sections, we analyze how DP-apps can impact each of these three preconditions, leading to hermeneutical (3.), testimonial (4.1), and contributory injustice (4.2). More specifically, we argue that

Hypothesis 1: the use of DP-apps based solely on passive tracking generates or exacerbates hermeneutical injustice for users;

Hypothesis 2: the use of DP-apps may trigger testimonial injustice against users in healthcare encounters; and

Hypothesis 3: the use of DP-apps may increase the risk of contributory injustice for people with experience of depression.

Considering the broad range of mental illnesses, the differences between severe and common mental health conditions, and the current state of technological development, we focus on one specific disorder, namely, a depressive episode. We use the term ‘user’ to designate people who experience mental health symptoms and use mental health services, such as local consultations and digital tools.

Methodologically, we introduce three hypothetical case scenarios, telling the stories of Blake, Kay, and Noa, to clarify and exemplify specific points of our analysis.Footnote 5 While we think that the case scenarios reflect a range of plausible and relevant constellations from the perspective of epistemic injustice, they are not meant to encompass all possible and ethically interesting cases. The starting point for all scenarios is as follows: Blake, Kay, and Noa, who have no prior history of mental health concerns, download a free DP-app that relies exclusively on passive data analysis to track their mental health and risk of depression. They regularly use their smartphones for communication, information, and entertainment, while the DP-app runs in the background. Some weeks after they have downloaded the app, the following happens:

Scenario 1 – Blake is sad.

Blake experiences constant sadness, a loss of motivation and appetite, and problems with sleep that last more than two weeks. Blake stops getting out of bed and responds less to their friends’ messages. Blake begins to wonder whether they might be seriously depressed. Some days after having that thought, the app, on its own, issues a warning that Blake’s symptoms could be the beginning of a depressive episode.

Scenario 2 – Kay has a conflict.

Kay is currently involved in a complex conflict with their partner, while preparing for an important exam. Kay feels sad, anxious, and stressed about the situation. To cope with the situation, Kay decides to focus on their work and to reduce contact with all the people involved in the conflict until they have time for it. They have less contact with friends and family, use their phone less, and sleep less, because they study in the evenings. Kay is satisfied with their coping strategy and feels confident that things will be OK after the exam. The app issues a warning that Kay could have a depressive episode.

Scenario 3 – Noa feels alienated.

Noa has felt sad, hopeless, numb, and alienated from their body for several weeks. Noa is a good student and manages to keep up their social life and studies. They think they might be having a depressive episode, which, though painful, they experience as profound and spiritual, because they take it as an occasion to think about their priorities and choices in life. Feeling hopeless, Noa thinks that no one could possibly guide them through this experience. The app does not issue a warning.

Resources-related aspects – DP-apps and hermeneutical injustice

A first important aspect of successful epistemic agency concerns the availability of fitting epistemic resources, such as concepts, theories, or images, to make sense of a social experience. As Fricker points out, epistemic agents can be harmed if adequate epistemic resources are not collectively available due to social power relations within a society: “hermeneutical injustice occurs […] when a gap in collective interpretive resources puts someone at an unfair disadvantage when it comes to making sense of their social experiences” (2007, p. 1). If structural power relations negatively affect the availability or adequacy of the epistemic resources that people need to make sense of or communicate their experiences, for example, concepts of illness, their experiences are left ‘obscured’ or ‘badly understood.’ This disadvantage is unfair if it originates from what Fricker calls “hermeneutical marginalization,” i.e., when an individual or a group is excluded from participating equally in significant epistemic practices (e.g., education, science, politics, media) and from contributing to collectively shared epistemic resources (2007, p. 152). Fricker gives the example of a woman who suffers from a depressive episode after giving birth to her son (2007, p. 148). Lacking the concept of postpartum depression, which is not part of the collectively shared concepts in her society, she fails to make sense of her experience and instead feels guilty about her alleged personal deficiencies. Postpartum depression is, with a prevalence of around 17%, very common among mothers and has a complex biopsychosocial etiology (Shorey et al., 2018); without access to this concept, the woman in Fricker’s example is disadvantaged in understanding what she is going through.
According to Fricker, her misinterpretation can be traced back to the fact that women and people with mental illness cannot participate equally in the epistemic practices that build and shape concepts of women’s mental health within a patriarchal culture, and are, thereby, hermeneutically marginalized. Cases of hermeneutical injustice are, thus, characterized by three central features: (1) gaps in collectively shared epistemic resources (2) result in significant disadvantages in making sense of or communicating a social experience, (3) whereby the gaps are caused by the hermeneutical marginalization of those who are having this social experience, based on power structures. In the following, we argue that the use of DP-apps, because of their functionality, generates or promotes cases of hermeneutical injustice. We start by (1) describing hermeneutical gaps engendered by a bias towards biostatistical understandings of mental illness built into the functionality of DP-apps, and proceed by (2) discussing these gaps in relationship to the disadvantages they cause in users when it comes to making sense of their experiences. (3) We suggest that these disadvantages derive from the hermeneutical marginalization of people with depression in the apps’ development, and further contribute to their hermeneutical marginalization.

(1) Hermeneutical gaps caused by DP-apps stem from their technological approach, which builds a bias towards a biostatistical understanding of depression into the app’s functionality. The literature describes various forms of bias in training datasets and algorithmic functionality that systematically distort predictions (Alvarado & Morar, 2021; Klugman, 2021; Müller, 2021). In addition to well-documented biases of artificial intelligence (AI) algorithms that lead to erroneous or discriminatory analyses of marginalized groups (Norori et al., 2021), we want to highlight a more profound, conceptual bias, which would arise even if representative and unbiased datasets were available. The technological approach is not theoretically neutral, since the very idea of detecting depressive episodes via passive data gathering and digital phenotyping presupposes a biostatistical account of disease. Proponents of such accounts argue that disease can be defined objectively by referring to states of normal functioning based on statistical normality within the relevant reference class (Boorse, 1977). Accordingly, on that account, mental disorders describe states of (e.g., behavioral, emotional, and cognitive) functioning below a ‘normal’ threshold relative to the reference class, independently of social norms and values. In this vein, passive data gathering collects objectively measurable and statistically processable data on a person’s observable functioning and compares it to the profiles of other users to predict the onset of a depressive episode. Thus, it is built into the app’s underlying functionality that detecting a depressive episode rests on measurable statistical deviance.

Some depressive episodes involve symptoms that correspond to deviances in functioning covered by the app’s measurements, such as Blake’s behavioral changes (staying in bed, having fewer social contacts). In fact, a study by Moshe et al. (2021), which investigated the use of passive data in digital phenotyping to predict changes in depressive and anxiety symptoms, found statistically significant but weak correlations between depression and time in bed, and between depression and the variability of locations visited. However, depressive episodes frequently involve feelings of pain, fatigue, burning, tension, numbness, or heaviness (Fuchs, 2013). These symptoms are better described through phenomenological approaches (e.g., Carel, 2016; Fuchs, 2013; Wardrope & Reuber, 2022). Phenomenological approaches to mental illness hold that illness cannot be understood using biostatistical models because it inherently involves changes in how the person relates to their body and to their social and physical world (Carel, 2007). In keeping with this, Fuchs (2013) argues that depression is more accurately understood as a change in a person’s ‘interaffectivity’ and ‘intercorporeality,’ i.e., the way a person relates to their lived body and their interpersonal environment, than as a change in individual functioning. This is particularly apparent in Noa’s case. Significant dimensions of Noa’s suffering, such as the feeling of alienation and numbness, are not detectable in their digital phenotype, because they do not translate into measurable changes. In addition, social constructivist approaches to depression assume that the definition of mental disorders inherently depends on social practices and value judgments; their proponents stress that mental disorders cannot be defined or identified without reference to social factors (Horwitz, 2012; van Riel, 2016).
Horwitz and Wakefield (2012) argue that disregarding the social context of a change in functioning may lead to the pathologizing of behavioral or emotional responses to stressful social situations. It seems that DP-apps, by neglecting the social context, risk misidentifying as pathological states that are norm-deviant but otherwise unproblematic. Kay’s distress, for instance, is caused by a social conflict and the important exam they have to prepare for. Their decision to concentrate on the exam and postpone resolving the conflict appears to be a useful coping strategy for an external stressor. It is conceivable that Kay will recover emotionally after the exam and the resolution of the conflict. Some would, therefore, argue that diagnosing a depressive episode in Kay would medicalize a social or interpersonal problem, and that this would be unwarranted (Horwitz & Wakefield, 2012).
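The contrast between Kay and Noa can be made concrete with a deliberately simplified sketch of deviance-based detection in Python. It is a hypothetical illustration, not the method of any actual DP-app: the three passive features (`messages_sent`, `calls_made`, `distinct_locations`), the z-score composite, the threshold of 1.5, and all numbers are assumptions invented for this example, and real systems typically rely on machine learning models over many more signals. What the sketch shares with the biostatistical logic described above is that it compares a user’s observable functioning to a reference class and issues a warning when the statistical deviance is large enough.

```python
from statistics import mean, stdev

# Hypothetical passive features (illustrative only). For each of them,
# LOWER values are treated as more "depression-like".
FEATURES = ("messages_sent", "calls_made", "distinct_locations")


def deviance_score(user: dict[str, float], reference: list[dict[str, float]]) -> float:
    """Average of the user's negated z-scores relative to a reference class."""
    z_scores = []
    for feature in FEATURES:
        values = [person[feature] for person in reference]
        mu, sigma = mean(values), stdev(values)  # sample standard deviation
        z_scores.append((user[feature] - mu) / sigma)
    # Negate so that functioning below the reference norm yields a positive score.
    return mean(-z for z in z_scores)


def issue_warning(user: dict[str, float], reference: list[dict[str, float]],
                  threshold: float = 1.5) -> bool:
    """Warn when statistical deviance exceeds an assumed threshold.

    Note what the inputs lack: there is no parameter for social context
    (an exam, a conflict) or for the user's own interpretation of their state.
    """
    return deviance_score(user, reference) > threshold


# Invented values for a small, hypothetical reference class.
reference = [
    {"messages_sent": 40, "calls_made": 4, "distinct_locations": 5},
    {"messages_sent": 30, "calls_made": 3, "distinct_locations": 4},
    {"messages_sent": 50, "calls_made": 5, "distinct_locations": 6},
    {"messages_sent": 35, "calls_made": 4, "distinct_locations": 5},
    {"messages_sent": 45, "calls_made": 4, "distinct_locations": 5},
]

kay = {"messages_sent": 15, "calls_made": 2, "distinct_locations": 2}  # withdraws to study
noa = {"messages_sent": 38, "calls_made": 4, "distinct_locations": 5}  # keeps up social life

print(issue_warning(kay, reference))  # True
print(issue_warning(noa, reference))  # False
```

Under these assumptions, Kay is flagged because their deliberate withdrawal lowers every feature, while Noa, whose alienation and numbness leave these features nearly unchanged, is not. Since the function receives only behavioral numbers, any two users with identical numbers receive identical outputs, whatever the reasons behind them; there is no input through which social context or self-interpretation could enter.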

Consequently, biostatistical accounts of mental disorders have been criticized for failing to acknowledge the importance of subjective meaning-making, value judgments, social norms, and structural contexts in deciding which conditions qualify as mental disorders (Carel, 2007; Kingma, 2007). DP-apps require neither active user input nor value judgments. Yet first-person evaluations and social context are not clearly related to a person’s observable behavior. Uusitalo and colleagues caution against the use of AI for detection and diagnosis in psychiatry because “[l]eaving aside salient features of subjective experience and social factors runs the risk of simplifying the categories to the extent that the phenomenon is misconstrued” (2020, p. 3). DP-apps, therefore, neglect social, structural, and phenomenological aspects of depression by design. They can detect cases of depression that correspond to the biostatistical model of depression and systematically miss other forms of depression. Additionally, DP-apps may wrongly classify non-pathological states as depressive, because they fail to acknowledge the social context and a person’s own interpretation of a situation. Understood as epistemic tools that users employ to make sense of their social experiences, DP-apps thus engender hermeneutical gaps and may produce misinterpretations.

(2) This is epistemically disadvantageous for users, because misinterpretations caused by DP-apps may complicate a user’s understanding of their lived experiences, depending on how much they rely on the app. In addition to the hermeneutical gaps that result from the design of DP-apps, AI-based systems are widely characterized as opaque black boxes, which is a major issue in AI and (big) data ethics (Müller, 2021). Opacity exacerbates users’ disadvantages in understanding, i.e., in contextualizing and evaluating, the app’s output. Opacity implies that a system’s output, even if it accurately predicts a certain outcome, remains unexplained and (perhaps) unexplainable, to the extent that it is generated by machine learning algorithms (Alvarado & Morar, 2021; Burrell, 2016; Klugman, 2021; Theunissen & Browning, 2022; Zarsky, 2016).

Other authors have argued that the opacity of AI systems may reinforce epistemic injustices caused by such systems (El Kassar, 2022; Symons & Alvarado, 2022). El Kassar’s (2022) analysis of AI systems for automatic gender recognition highlights that a system’s misinterpretation of a person’s gender can lead to self-doubt in the affected individual. The black box problem exacerbates this self-doubt by making it difficult for individuals to understand on the basis of which features their gender was misinterpreted. The problem of opacity concerns both the affected users and IT experts, who are often unable to sufficiently understand how a particular machine learning algorithm based on a specific dataset generates a certain output.

Similarly, in machine-learning-based DP-apps, it remains unclear how exactly the app generated a certain output; Blake, Kay, and Noa may all be unable to sufficiently understand which patterns the algorithm used to predict the risk of a depressive episode. Medical and technical aspects that imply presuppositions (or defaults) about ‘normal’ digital interaction (such as the underlying model of mental illness, its statistical conceptualization, and its digital implementation) are difficult to assess from a lay perspective. The app’s opacity makes it harder for users to gain a critically reflected understanding of their mental health issues than it is for someone who has access to qualified human mental healthcare professionals, who can be asked to make their criteria and procedures transparent and who can meet individual needs for information and support. As an epistemic tool, a DP-app can compare an individual’s digital phenotype to a statistical pattern. However, this does not (yet) provide a meaningful explanation or justification for an output with potentially far-reaching consequences for the user affected. The output may also prompt further behavioral or cognitive changes that are disadvantageous for users. In the case of a positive result, Kay, who believes they are mentally healthy, is likely to be confused, unsettled, or skeptical about receiving a warning about depression. To gain a deeper understanding, Kay needs to know more than that there are (clinically relevant) signs of depression in their digital phenotype. To reflect on the result, they also need to know how and why certain aspects of their behavior are deemed pathological.

We have argued so far that misinterpretations in the use of DP-apps are possible because DP-apps, as epistemic tools, involve hermeneutical gaps. Bias in the underlying technological and theoretical approach puts app users at a disadvantage in understanding their mental health issues, and opacity exacerbates this disadvantage. This is especially relevant for people who lack psychiatric expertise or personal experience with mental illness, have difficulty accessing alternative mental healthcare services, suffer from severe symptoms that significantly impair self-perception, or are particularly vulnerable to epistemic harm or social stigma because of marginalized group membership. These preconditions make it more difficult for people to contextualize the app-generated outputs, evaluate them against the background of their own experience and knowledge, and, if necessary, scrutinize them critically.

(3) We will now argue that these hermeneutical gaps stem from the hermeneutical marginalization of the people affected and that the gaps can also reproduce this hermeneutical marginalization. This requires considering the conditions under which these apps are developed, taking social power dynamics into account. The development of DP-apps rests on the idea that depression can be detected by an algorithmic system that relies on a biostatistical understanding of mental disorders. Feeding the model with passively gathered data via digital phenotyping is considered an adequate instrument for recognizing complex mental phenomena, such as depression. This reliance on data collection as the best means of approaching the world has been dubbed ‘data fundamentalism,’ according to which “massive data sets and predictive analytics always reflect objective truth” (Baumgartner, 2021, p. 6). Data fundamentalism is closely related to biomedical approaches within contemporary psychiatry, which, despite advocacy for more social, relational, or phenomenological approaches, remain dominant in psychiatric research.Footnote 6 It has been argued that psychiatric research based on a biostatistical understanding excludes service users and treats them as mere objects of research. They do not participate actively, as epistemic agents, in shaping diagnostic categories and understandings of mental illness, or in developing study interests and designs (Bueter, 2019; Crichton et al., 2017; Kidd & Carel, 2018; Kidd et al., 2022; Lakeman, 2010; Miller Tate, 2019).
This trend is reproduced in the development and implementation of DP-apps: none of the studies included in the systematic review on digital phenotyping to predict the onset of a depressive episode used participatory methods (Leaning et al., 2024), which are regarded as an adequate and efficient means of allowing users to contribute to research as epistemic agents, fostering both social and personal transformation (Thomas et al., 2023). Accordingly, the underlying model and the design of studies on DP-apps indicate users’ hermeneutical marginalization in the development of DP-apps.

Additionally, we assume that the implementation of DP-apps can further reinforce the hermeneutical marginalization of service users. Given that DP-apps are based on a biostatistical understanding of depression that does not rely on users’ subjective explanations, value judgments, and experiences, their implementation may further strengthen the trend of marginalizing users’ experiences of depression: if users’ accounts of depression are irrelevant to identifying depression, their value for psychiatric research may be misjudged and their voices may be overlooked in healthcare encounters. We conclude that there are cases in which people suffer hermeneutical injustice by using DP-apps as epistemic tools because (1) gaps in the epistemic resource (2) put users at a disadvantage when it comes to making sense of their experiences, (3) in the context of users’ hermeneutical marginalization within contemporary psychiatric research.

Uptake-related aspects

Epistemic agency and DP-apps in healthcare encounters

In addition to having fitting epistemic resources, the social uptake of the testimony in question is necessary for sharing knowledge successfully. As Dotson (2011, p. 237) points out, a successful linguistic exchange depends not only on the speaker’s epistemic and communicative skills but also on the hearer’s acknowledgment of the speaker’s speech act as it is meant to be taken. A person’s testimony can only succeed if they are recognized as a knower and their communication is recognized as testimony. The testimony must, therefore, receive appropriate social uptake as credible. Uptake can be impaired on two levels: firstly, on the individual level, if a speaker is not acknowledged as a knower, and secondly, on the structural level, if the epistemic resources of a social group are not acknowledged as such.

Starting with the first, Fricker (2007) notes that people constantly, and often unconsciously, check a speaker’s credibility, i.e., their honesty and competence, in linguistic exchange. Linguistic exchange is located in a context of social power relations that shape credibility assessments. Fricker speaks of “testimonial injustice” (2007, pp. 28 ff.) if prejudice based on a speaker’s social identity unduly influences the listener’s perception, leading to a credibility deficit. In the following, we argue that DP-apps can trigger testimonial injustice in healthcare settings, especially in general practitioners’ (GPs’) offices.

We first consider the baseline situation in which users and healthcare professionals interact. It has been argued that people who show or report psychiatric symptoms are particularly vulnerable to experiencing testimonial injustice from healthcare professionals during healthcare encounters because of negative stereotypes about mental illness (Crichton et al., 2017; Drożdżowicz, 2021; Faissner et al., 2022; Kurs & Grinshpoon, 2018; Scrutton, 2017). Fricker (2007) suggests that in our everyday communicative practices, credibility judgments are often made spontaneously in our automatic processing of information. Pozzi (2023b, p. 538) further argues that “credibility assessments are particularly prone to be distorted by biases and stereotypes connected to a person’s social identity because to form these, we usually rely on so-called markers of trustworthiness.” That is, people rely on markers that identify someone as a reliable knower to assess whether believing them will generate knowledge, and stereotypes often serve as such markers.

Empirical evidence indicates that people are biased against those with depressive symptoms because of stereotypes about mental illness. Negative stereotypes, such as ‘mad,’ ‘crazy,’ ‘incoherent,’ ‘delusional,’ or ‘cognitively disturbed,’ are common among the general population and remain stable over time (Angermeyer et al., 2013; Thornicroft et al., 2007; von Kardoff, 2017).Footnote 7 The same appears to hold for healthcare providers: in a study of implicit biases and depression, internal medicine residents showed a significant association between depression, negative attitudes, and ideas about the uncontrollability of depression (Crapanzano et al., 2018). The literature on stereotyping and implicit bias indicates that these have important effects on clinical practice, including the choice of further diagnostic steps and treatment decisions (Puddifoot, 2019).

Being stereotyped and disregarded in healthcare settings is a common experience for people with mental illness. For example, users with mental health diagnoses often have difficulty making their somatic health complaints or drug-related side effects heard by their physicians (Faissner et al., 2022; Golomb et al., 2007). Mental health diagnoses strongly shape how a person is perceived in healthcare encounters: a qualitative study revealed that in the presence of a mental health diagnosis, medical staff tend to attribute all symptoms to that diagnosis, so that other relevant diagnoses, including bone fractures, are missed (Shefer et al., 2014).Footnote 8 Additionally, it is a special feature of psychiatric interviews to interpret “patient’s statements as manifestations of illness” (Sakakibara, 2023, p. 490). More specifically, not being able to feel one’s own sadness is assumed to be a possible symptom of depression. According to the DSM-5, an influential diagnostic manual in mental healthcare, “in some cases, sadness may be denied at first but may subsequently be elicited by interview” and, furthermore, “in some individuals who complain of feeling ‘blah,’ having no feelings, or feeling anxious, the presence of a depressed mood can be inferred from the person’s facial expression and demeanor” (APA, 2013, p. 163). Thus, the DSM-5 seems to encourage medical staff to question patients’ testimonies, including denials of sadness, and to privilege externally observable signs of depression over a person’s self-report. DP-apps, thus, enter a baseline “credibility economy” (Fricker, 2007) in which service users may be perceived as unreliable givers of knowledge and their testimony is under special scrutiny.Footnote 9

In the following, we examine how DP-apps may influence this epistemic baseline situation. Pozzi (2023b) argues that in medical contexts, AI-based algorithmic risk scores may be treated as markers of trustworthiness to assess the credibility and relevance of a user’s testimony, especially when the symptoms a patient reports are not related to an easily detectable source (i.e., symptoms that cannot easily be traced back to a specific medical test, unlike a case in which a patient reports pain and an X-ray shows a broken bone). Baumgartner suggests: “self-tracking from digital devices is framed as providing trustworthy data in contrast to the individual body’s perception which is marked as untrustworthy or at least not reliable enough to be the sole ground on which diagnosis should be based” (2021, p. 6). The output of DP-apps may be treated as a marker of a user’s credibility because it could be interpreted as objective evidence, comparable to a diagnostic test with a certain reliability. This idea is supported by literature suggesting that medical professionals practicing under the biomedical paradigm are trained to privilege data over service users’ testimonies (Drożdżowicz, 2021; Kidd & Carel, 2018; Slack & Barclay, 2023).

To substantiate this claim, it is helpful to consider further the context in which DP-apps are implemented. Meta-analyses on the diagnosis of depression in primary care suggest that GPs may not be adequately trained to detect and diagnose depression and often fail to do so (Mitchell et al., 2009, 2011). In fact, GPs were more likely to falsely diagnose a person with a moderate or severe depressive episode than to correctly diagnose depression: “for every 100 unselected cases seen in primary care, there are more false positives (n = 15) than either missed (n = 10) or identified cases (n = 10)” (Mitchell et al., 2009, p. 609). GPs who feel uncertain about diagnosing depression may welcome external evidence perceived as objective and may treat the app’s output as a marker of trustworthiness. However, as argued above, DP-apps are not neutral, as they privilege a narrow understanding of depression based on dysfunction and statistical abnormality. DP-apps can thus be considered unreliable markers of trustworthiness. Starting from this baseline situation, we now assess two constellations to analyze how DP-apps may interact with users’ credibility assessments and epistemic agency: (1) the app’s interpretation and the user’s interpretation misalign, and (2) the two interpretations align.
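
Restating the three counts quoted from Mitchell et al. (2009) as a simple 2×2 classification makes the comparison explicit. The derived rates below are our own arithmetic from those figures, assuming that all remaining cases out of 100 are non-depressed people whom GPs correctly do not identify as depressed (TP = identified, FN = missed, FP = false positives, TN = correct non-identifications, PPV = positive predictive value).

```latex
\begin{aligned}
\text{TP} &= 10, \quad \text{FN} = 10, \quad \text{FP} = 15, \quad \text{TN} = 100 - 10 - 10 - 15 = 65\\
\text{prevalence} &= \frac{\text{TP} + \text{FN}}{100} = \frac{20}{100} = 20\%\\
\text{sensitivity} &= \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{10}{20} = 50\%\\
\text{specificity} &= \frac{\text{TN}}{\text{TN} + \text{FP}} = \frac{65}{80} \approx 81\%\\
\text{PPV} &= \frac{\text{TP}}{\text{TP} + \text{FP}} = \frac{10}{25} = 40\%
\end{aligned}
```

On these figures, only about four in ten of the people a GP identifies as depressed would in fact be depressed, which is the sense in which false positives outnumber correctly identified cases.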

(1) In the case of Kay, there is a misalignment between the app’s and Kay’s interpretation of the situation: Kay thinks that they are okay, while the app issues a warning that Kay may have depression. Such cases can be understood as ‘false positives,’ i.e., the identification of a non-pathological state as pathological. If Kay decides to visit a GP, we see a considerable risk that the GP may unduly privilege the app’s output over Kay’s testimony, and that Kay may suffer testimonial injustice. The reason is that implicit biases about depression, uncertainty in diagnosing depression, and a general bias towards objectifiable tests in medicine all raise the epistemic weight of the app’s output at the expense of Kay’s credibility. Additionally, based on the assumption that people with depression might deny sadness, a GP may interpret Kay’s assertion that they are okay as a symptom of depression. Yet, in such a case, the GP would fail to acknowledge that users such as Kay have direct access to their own experiences and are thereby epistemically privileged (Drożdżowicz, 2021). The GP would, thus, treat Kay as a passive ‘source of information’ rather than an active informant, i.e., a subject of knowledge, and thereby wrong them (Sakakibara, 2023).Footnote 10

This claim can be further substantiated by findings from qualitative research: a study on GPs’ treatment recommendations for mental health conditions found that even though three-quarters of the patients initially resisted treatment recommendations, most patients were prescribed medication or referred to talking therapy (Ford et al., 2019). This may indicate a trend of disregarding users’ concerns in primary care. In a context where GPs are negatively biased against users with mental health symptoms because of structural stigma and are not well trained to diagnose depression, DP-apps are likely to be treated as markers of trustworthiness and may thus shift the ‘credibility economy’ to the disadvantage of users. Testimonial injustice is, thus, likely to occur in the case of false positives when a user denies being depressed.

The situation is different in the case of Noa: the app does not issue a warning, even though Noa is suffering and might benefit from mental health support. Noa’s case can be described as a ‘false negative.’ Given our prior argument and the finding that GPs frequently identify depression in non-depressed people, we assume that if Noa visited a GP and said that they felt depressed, the GP would take their complaints seriously. Yet the app may nevertheless influence Noa’s epistemic agency. If Noa expects the app to confirm their assumption that they are depressed and the app does not issue a warning, Noa may question their own intuition and be discouraged from seeking social or medical support, as argued above. This possibility warrants further research on the interaction between apps and users’ epistemic agency.

(2) In cases of alignment between the app and a user, it seems prima facie unproblematic if a GP takes the app’s output into account. The app provides information that users cannot provide themselves, i.e., the analysis and statistical comparison of a user’s digital phenotype. The app’s output seems comparable to external evidence (such as medical test results) that may help a GP form an informed clinical judgment. Arguably, DP-apps may nevertheless support epistemically unjust behavior. We usually accept a person’s account of their physical and mental well-being, respecting their epistemic authority and granting them an epistemic privilege regarding bodily sensations and experiences. However, given how the app functions, this information is irrelevant to detecting depression (see the discussion of hermeneutical injustice); on a meta-level, the app’s functionality therefore implicitly justifies discarding these forms of testimony. Thus, focusing on the app’s analysis in healthcare encounters, even if it offers additional information, instead of on a person’s testimony may strengthen both the trend towards data fundamentalism described above and negative prejudices about the epistemic competence of people who report mental health complaints (Slack & Barclay, 2023).

To sum up, DP-apps may unintentionally promote epistemically unjust behavior by third parties, especially GPs, which increases the risk of testimonial injustice against users. This may have negative consequences for users’ epistemic agency and confidence in their judgments and increase the risk of inappropriate healthcare. As Fricker (2007, pp. 48 ff) explains, systematically experiencing testimonial injustice can lead a person to lose confidence in their belief or its justification, so that the belief no longer fulfills the conditions for knowledge, or to lose confidence in their intellectual abilities. That person might come to view themselves as an unreliable knower. Kay, for instance, could experience reduced epistemic self-confidence and agency if their testimony systematically receives less attention and interest because the healthcare encounter focuses on the information provided by their DP-app. These effects can also occur outside the healthcare setting, in the everyday context of user-app interaction. Receiving a warning while not being depressed could influence Kay’s interpretation and behavior. In the context of health apps and digital healthcare, Wyatt (2018) speaks of ‘apptimism,’ “an uncritical, implicit trust in apps” based on the fact that “most of us carry and use our smartphone all day, so we trust everything it brings us.” While DP-apps may influence a person’s confidence in their judgments on an individual level, they may also influence epistemic practices on a structural level. We consider this in the next section.

DP-apps and social uptake – contributory injustice

In her critical discussion of Fricker’s work, Dotson (2012, pp. 33 ff) introduces another form of epistemic injustice that touches both the individual and the structural level: so-called ‘contributory injustice.’ The focus here lies on the social uptake, on the structural level, of epistemic resources developed by marginalized groups. While many analyses of mental healthcare practices focus on testimonial and hermeneutical injustice (e.g., Crichton et al., 2017; Kurs & Grinshpoon, 2018), contributory injustice has been discussed less so far (for an exception, see Miller Tate, 2019). However, the concept is equally important for understanding and carefully analyzing epistemic injustices in mental healthcare.

Dotson (2012) points out that where there are gaps in collectively shared epistemic resources, people in marginalized social positions often develop epistemic resources to make sense of their social experiences and reality that knowers in dominant social positions do not share. Consequently, different sets of epistemic resources are socially available to make sense of a given social experience, and those developed by the affected people may capture it more appropriately. Dotson’s definition of contributory injustice thus allows one to recognize the epistemic resources and agency of marginalized users. Contributory injustice arises if knowers in dominant social positions ignore, distort, or disregard these epistemic resources due to willful ignorance. In such cases, the dominantly positioned listener’s use of distorted or inadequate epistemic resources prevents the successful social uptake of a person’s testimony, even if the speaker uses fitting concepts, so that the speaker’s epistemic agency is undermined. As Dotson summarizes, contributory injustice arises in situations where “an epistemic agent’s willful hermeneutical ignorance in maintaining and utilizing structurally prejudiced hermeneutical resources thwarts a knower’s ability to contribute to shared epistemic resources within a given epistemic community by compromising her epistemic agency” (2012, p. 32).

The following example illustrates Dotson’s concept. While the biomedical understanding of mental health crises in terms of the DSM-5 is the standard approach that structures most medical interactions and social practices (e.g., coverage of treatment costs and sick leave), it is far from the only approach to making sense of such experiences. Several groups of mental health activists and service users have developed concepts that differ importantly from the medical model (Hoffman, 2019; Radden, 2012; Rashed, 2019). Some of these activist groups refer to themselves as Mad Pride, in reference to other social justice movements such as Gay Pride, and suggest that framing their lives with mental diversity as deficient constitutes an unwarranted epistemic impoverishment. Instead, as the Icarus Project, one of the Mad Pride self-help activism networks, suggests, differences in mental experiences and functioning should be understood as “dangerous gifts” which need “cultivation and care,” as they can be a source both of creativity and inspiration and of hardship and suffering (DuBrul, 2014, p. 266). In their view, the medical model reduces mental disorder to dysfunction and deficiency, which need to be eliminated. This subverts service users’ possibility of living meaningful lives with mental diversity, in which its positive aspects can be acknowledged as a valuable part of their identity, while aspects that cause suffering receive adequate care. Experiences of mental distress can thus be interpreted in various ways by different people, which leads to diverse coping strategies.

The discussion of the Mad Pride approach exemplifies that different epistemic resources, in terms of concepts, theories, and interpretations, are available to refer to mental health crises or mental diversity beyond the biomedical approach. While recent studies have started to involve users in the development process, for instance, by integrating them into app design phases (see the study protocol by Young et al., 2022), to the best of our knowledge the apps do not yet consider alternative approaches to mental health. This can have important effects not only on epistemic practices within healthcare encounters but also on a structural level. We can consider mental health apps as epistemic tools which, by their presence and use, influence and shape the epistemic resources that are collectively available in society. They have the potential to support specific epistemic attitudes in the general public and among mental health professionals, for instance, attitudes that favor biomedical interpretations of depressive symptoms and neglect others, such as those of the Mad Pride movement or phenomenological approaches (for an overall discussion of this imbalance in mental health, see Huda, 2021; Kiesler, 1999).

In one of our case scenarios, Noa experiences their period of sadness as a painful spiritual crisis that was important to their life and, therefore, not something they would have wanted to miss. However, their report risks not receiving adequate uptake. If a listener relies on an account of mental illness according to which it is a pathology that is always a ‘bad thing’ to have, then Noa’s account of attributing meaning to their crisis remains unintelligible. DP-apps may thus help shape an epistemic landscape in which it becomes more difficult for people with depressive symptoms to share their experience if it does not align with the dominant mental disorder approach. Alternative interpretations and accounts of mental health crises may become inexpressible. The apps’ use may thereby contribute to undermining the epistemic agency of users with depressive symptoms, reinforce the marginalization of the epistemic resources of non-dominant social groups, and perpetuate contributory injustice.

Conclusions

Epistemic injustice is currently an under-researched issue in digital mental health. We have applied theories of epistemic injustice, social epistemology, and the philosophy of psychiatry to analyze questions of epistemic injustice in using passive data gathering and digital phenotyping as a standalone technology to predict the onset of a depressive episode. Focusing on three scenarios, the analysis has revealed how a user’s epistemic agency, a significant part of that person’s humanity, can be harmed by three different forms of epistemic injustice. We have argued that the functional principles of digital technologies and theoretical presumptions about key concepts in psychiatry are interdependent, and that social power relations can be hidden in and exacerbated by digital artefacts. If DP-apps become a standard in mental healthcare, comparable to 24-hour electrocardiograms in cardiology, this could reinforce the trend toward approaches that stress objectifiable data and third-person knowledge over subjective firsthand reports. Such new standards of evidence could amplify the existing risk that mental health service users suffer epistemic injustices.

We suggest that DP-apps have the potential to shape our epistemic landscape and shared epistemic resources and, thereby, lead to the further marginalization of models and approaches to mental health that do not correspond to the biostatistical understanding of mental illness. More precisely, we have argued that

1) the use of DP-apps based solely on passive tracking generates or exacerbates hermeneutical injustice by involving a bias towards a biostatistical model of depression, a bias compounded by the opacity of machine learning algorithms;

2) testimonial injustice can be triggered by the app in a social context in which GPs lack skills and knowledge about depression and hold negative implicit biases about people with depressive symptoms, especially in cases in which patients’ testimony and the app’s output misalign; and

3) contributory injustice may be supported through the use of DP-apps because they support biostatistical accounts of depression and marginalize other approaches, thereby raising the likelihood that a listener of a user’s testimony ignores alternative approaches to depression and the epistemic resources the user employs to communicate their experiences.

DP-apps are still under development. Our work thus provides normative considerations before the technology is widely implemented. Given the mental health diagnosis gap, the lack of skilled healthcare personnel, and the persisting stigmatization of mental illness, such apps could, if improved, help alleviate this strained situation. We therefore encourage stakeholders at all levels to recognize these epistemic risks and to incorporate measures to reduce them into the apps’ further research, development, and deployment.

Russo et al. (2023) recently proposed a novel framework to examine the development of responsible AI. Their approach aims to bridge the divide between epistemic questions pertaining to AI, such as transparency and explainability, and ethical questions, such as how values and other normative considerations, for example, vulnerabilities, are implemented. They propose two practical strategies to improve AI development and assessment: focusing on the process rather than the output, and turning to more inclusive forms of assessment rather than pure expert assessment. Following their suggestion, it seems beneficial to involve users in the various phases of psychiatric and technological research, development, and design. Community-based participatory research seems a promising avenue for future app development (Roberts, 2013). Such projects actively involve the prospective users of a technology as researchers and developers, center their interests and insights, and strive for collaboration on an equal footing between academic researchers and users.

Based on our analysis, we suggest that apps based solely on passive self-tracking should not be used in the general population to predict the onset of depression; if such apps are used, passive self-tracking should be combined with further tools for user-app interaction. It seems important that the app, by design, solicits the user’s interpretation of their situation, including their social context. Furthermore, DP-apps should be developed in line with different approaches to depression, including phenomenological, social, and activist approaches. Such approaches may be implemented through more holistic questions and interaction possibilities, the provision of supplementary information on different approaches to mental health and depression, and tools for community building.

From a structural perspective, exclusionary practices in mental healthcare, research, and development should be further examined and abolished, thereby preventing unjust epistemic power relations from being perpetuated or reinforced by digital artefacts. Our epistemic practices must progress and become more inclusive, appreciative, and emancipatory, going hand in hand with technological developments.