Journal of Medical Internet Research

Background: While still in its infancy, Internet-based diabetes management shows great promise for growth. However, the following questions must be considered: What are the key metrics for the evaluation of a diabetes management site? How should these sites grow in the future, and what services should they offer?
Objectives: To examine the needs of the patient and the health care professional in an Internet-based diabetes management solution and how these needs are translated into services offered.
Methods: An evaluation framework was constructed based on a literature review that identified the requirements for an Internet-based diabetes management solution. The requirements were grouped into 5 categories: Monitoring, Information, Personalization, Communication, and Technology. Two of the market leaders (myDiabetes and LifeMasters) were selected and were evaluated with the framework. The Web sites were evaluated independently by 5 raters using the evaluation framework. All evaluations were performed from November 1, 2001 through December 15, 2001.
Results: The agreement level between raters ranged from 60% to 100%. The multi-rater reliability (kappa) was 0.75 for myDiabetes and 0.65 for LifeMasters, indicating substantial agreement. The results of the evaluations indicate that LifeMasters is a more-complete solution than myDiabetes in all dimensions except Information, where both sites were equivalent. LifeMasters satisfied 32 evaluation criteria while myDiabetes satisfied 24 evaluation criteria, out of a possible 40 in the framework.
Conclusions: The framework is based on the recognition that the management of diabetes via the Internet is based on several integrated dimensions: Monitoring, Information, Personalization, Communication, and Technology. A successful diabetes management system should efficiently integrate all dimensions. The evaluation found that LifeMasters is successful in integrating the health care professional in the management of diabetes and that myDiabetes is quite effective in providing a communication channel for community creation (however, communication with the health care professional is lacking).
(J Med Internet Res 2002;4(1):e1) doi: 10.2196/jmir.4.1.e1

Physicians are increasingly experiencing patients bringing Internet printouts to the consultation, although estimates of the frequency of this occurrence vary from 1-2% [9], to 58% [10], to over 70% [11]. The low prevalence of Internet-savvy patients of only 1-2% in the study by Potts and Wyatt [9], published in this issue of the Journal of Medical Internet Research, is surprising given the findings from consumer surveys on the frequency of accessing online health information that are cited above. Potts and Wyatt used a cross-sectional survey method asking respondents to retrospectively estimate the number of their patients who had used the Internet for health information in the past month. It is possible that recall bias may have led to an underestimate, but it is also likely that not all patients who consult the Internet reveal this to their doctor. The potential impact of the wide availability of online health information on the practitioner-patient relationship has been debated [12,13]. The Internet is a key influence in changing the balance of (knowledge) power between health care professionals and the public, empowering patients to become more involved in health care decision making and contributing to the deprofessionalization of medicine. Empirical research is beginning to investigate this impact [9,14].
Much of the limited evidence as to who the consumers of Internet health information are and what they are looking for comes from United States market-research surveys and Web-usage statistics, both quantifying the numbers of users and types of information accessed. Women are more likely than men to seek health care information on-line, and the highest proportion of usage is in those between 30 and 64 years old [15]. Use of the Internet for health information declines with age [16,17]. Despite the much-discussed "digital divide" between the higher-income, more-educated "have-nets" and the lower-income, less-educated "have-nots," there is no evidence of differences in health-information seeking by income group once they have on-line access [18,19].
A 1999 Harris Poll of 2000 US adults found that mental health issues dominated the most popular on-line health topics, with depression, bipolar disorder, and anxiety problems accounting for 42% of the use of the Web to find health information [20]. Further work to investigate which health topics are most frequently accessed on-line will be valuable. Most on-line health seekers are looking for a specific answer to a specific health question, and start by submitting a topic to a general search engine [21]. Far fewer users go to health portals or direct to a specific health site, and in general, users do not simply browse for health-related information [22]. Most users research specific health issues that are currently affecting a friend, relative, or themselves, frequently in connection with a visit to their doctor [15]. Few use health sites to communicate with health services, purchase pharmaceuticals, or participate in health-related chat-room discussions [15]. However, the majority of US users report a desire for more on-line interaction with their doctors, including e-mail consultations and reminders [23].
The research in this area is notable for a relative lack of qualitative work exploring the reasons behind on-line information seeking and the attitudes and behavior of health users towards the World Wide Web. Sociological work has led to a better understanding of the process of help-seeking behavior, but this work now needs to be updated to take into account the use of the Internet by patients and caregivers.
Users report valuing the convenience, anonymity, and volume of online information [15]. It is likely that individuals will use the Web at different points in the trajectory of illness and health care. The California Healthcare Foundation has attempted to categorize 3 types of user: the well; the newly diagnosed; and the chronically ill and their caregivers [24]. The well group carries out episodic searching for information relating to short-term medical conditions, pregnancy, and prevention issues. The newly diagnosed carry out very intensive searching for specific information, valuing the ease of access and broad range of information. The chronically ill and their caregivers carry out regular searching for information related to new treatments, nutrition advice, and alternative therapies. In addition, the latter 2 groups both value and use on-line communities and chat rooms. Several studies have shown the importance of the World Wide Web in providing social support, particularly to groups with chronic health problems such as diabetes patients [25,26] or individuals with HIV [27].
It is likely that much of what is required from online information will be similar to that required from more-traditional routes: clear, well-presented information, with advice on further sources. However, there may well also be particular advantages that can be gained from the interactivity, personalization, and creative ways of managing knowledge that the Internet provides. For example, preliminary work suggests that the Internet may be an effective means of delivering psychological therapies [28].
In an era of user involvement, consumer empowerment, and the wide dissemination of information on health and health services, it is important that we identify who the consumers of online health information are, what their information needs are, and understand why and how they seek information online. This will enable information to be provided in ways that will have benefits from the worldwide to the individual level, and will inform current debates over the quantity and quality of information provision and issues of privacy and access.

Introduction
Management of patients with chronic conditions is a long-standing challenge for health care organizations. These conditions include diabetes, chronic heart failure (CHF), chronic obstructive pulmonary disease (COPD), asthma, HIV/AIDS, and cancer. Patients are required to adopt lifelong exercise, diet, and drug regimens to maintain optimal health and avoid the complications of the disease. These complications can arise suddenly and be life threatening; therefore, patients with chronic diseases must be monitored constantly [1].
In recent years, Internet-based home telemonitoring systems have become available [2]. These sites leverage the Internet to record, measure, monitor, manage, and deliver health care. These information-technology solutions are creating a link between patient and caregiver that enables patients to supply a steady stream of valuable health information to caregivers. For example, diabetics can report their blood glucose readings, thus creating a history of their glucose control, which caregivers can use to evaluate the impact of a therapy (eg, short acting insulin) or the need for a different one [1]. Conversely, caregivers have the ability to provide their patients with crucial information and feedback on the management of their disease. For example, patients can be notified about screening appointments for the complications of diabetes. Therefore, patients benefit from an improved control and understanding of the disease; the ability to self-monitor from home reduces the burden of the disease. These solutions have resulted in dramatic improvements in disease management as measured by hospitalizations [1] and in an overall reduction in costs [3]. Further, patients report higher levels of satisfaction and better control of their conditions [4].
Diabetes is a chronic disease that affects 30 million people worldwide [5] and is the seventh leading cause of death in the United States [6]. The total annual economic cost of diabetes in 1997 was estimated to be US $98 billion: US $44 billion in direct medical and treatment costs and US $54 billion in indirect costs attributed to disability and mortality [7]. The disease is also a significant intrusion in the life of an individual. In managing diabetes, success is measured by positive change in prognostic indicators and outcomes, and a range of measurement criteria is used for this purpose [8,9,10]. Primarily, diabetes must be managed by the patient because it requires adherence to stringent dietary, physical, and medical regimes [8]. Internet-based diabetes management systems have the potential of reducing the burden of disease both to the patient and to the health care system. A recent study found that a high proportion of patients are willing to use Internet resources in the management of their disease [9]. The driving forces behind the proliferation of technology for disease management are the patients' demands to get real-time help, get real-time information, and keep in contact with their physician [1]. Not surprisingly, several diabetes-specific sites have recently appeared [10], including myDiabetes, Health Hero Network, LifeChart, LifeMasters, and Medifor.
The purpose of this paper is to review the patient's and the health care professional's needs in an Internet-based diabetes management solution and to examine how these needs are addressed in practice. An evaluation framework was constructed by grouping the requirements of an Internet-based diabetes management solution into 5 categories: Monitoring, Information, Personalization, Communication, and Technology. Two of the market leaders (myDiabetes and LifeMasters) were selected and evaluated to illustrate the use of the framework.

Methods
A literature search was conducted on medical databases (Medline, Pre-Medline, EMBASE, Cochrane, and PubMed) and a nonmedical database (Expanded Academic ASAP). The articles were identified using 4 keywords: diabetes, chronic disease, internet, and technology. The searches were based on AND combinations of these keywords.
The exact search methodology differed among databases due to differences in their user interfaces. The methodology for each database is summarized in Table 1.
The abstracts of the articles retrieved by the searches were screened for relevance by the authors. The relevant articles were reviewed in order to compile a comprehensive list of requirements for an Internet-based diabetes management solution. These requirements were identified on the following basis:
• No interdependence between requirements
• Requirements can be assessed as present or not present
• Equal implementation effort required to satisfy the requirements
The implementation effort was quantified by the number of Use Cases as defined by the Unified Modeling Language (UML) [11,12]. The number of Use Cases ranged from 1 to 3 for each requirement. For example, the requirement defined as "User defined parameter-Patient" allows patients to define which health parameters they wish to monitor. This functionality requires 3 Use Cases: Identify User, Retrieve Parameters, and Save Parameters.
The requirements for Internet-based diabetes management were compiled into the criteria of an evaluation framework. The evaluation criteria were grouped into 5 categories: Monitoring, Information, Personalization, Communication, and Technology. The evaluation framework is presented in Table 2 and the evaluation criteria are discussed in detail in the "Evaluation Criteria" section of the "Results" section.
To illustrate the use of the evaluation framework, we have applied it to 2 existing Internet-based diabetes management systems: myDiabetes (www.myDiabetes.com) and LifeMasters (www.lifemasters.com). These 2 sites were selected because they were first movers in the arena of Internet-based diabetes management. MyDiabetes.com was one of the first sites, going live in July 1999, followed shortly by LifeMasters.com in October 1999.
The sites were evaluated from November 1, 2001 through December 15, 2001. The evaluations were performed by 5 independent evaluators who were not aware of each other's ratings. All evaluators are computer literate and are familiar with the use of the Internet. The evaluators included a physician, 3 diabetic patients, and one author [CM]. All the evaluators registered separately with both sites (registration was free). Each evaluator was given a detailed description of the evaluation criteria, as described in the "Results" section, and Table 2, which describes the framework. The evaluators were also given an evaluation form to fill out (effectively Table 3 without results). For each criterion, the evaluators rated the sites as Yes if the criterion was satisfied or No if it was not satisfied. The evaluations were not supervised.

Figure 1 and Figure 2 are screen shots of the entry forms for daily glucose measurements at myDiabetes and LifeMasters, respectively. This basic function of diabetes monitoring requires the user to input his or her blood glucose levels and the time of the readings. The data is stored, effectively creating a log of the glucose control of the patient. LifeMasters records glucose levels based on relative times such as Bedtime and asks for symptoms of high and low blood glucose as well as diabetic complications. myDiabetes records the exact time of the blood glucose measurement but does not screen for any symptoms; this is done in another section of the site.

Table 1. Search methodology by database.
The 4 terms were searched separately. The search results were combined using the AND condition. The search history is described below:
1. diabetes
2. chronic disease
3. internet
4. technology
1 and 3
1 and 4
2 and 3
2 and 4
The terms were searched in combination using the AND condition. The terms were searched in All Fields and for All Years indexed:
diabetes AND internet
diabetes AND technology
chronic disease AND internet
chronic disease AND technology

Statistical Analysis
Cohen's multi-rater kappa [13,14] was used to evaluate the agreement between raters for the evaluation framework as a whole. The multi-rater kappa was calculated with SPSS statistical software using the mkappasc procedure.
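The agreement statistic can be illustrated with a minimal sketch. It assumes the multi-rater kappa takes the commonly used Fleiss form (per-item agreement corrected for chance); whether the mkappasc procedure matches this formula in every detail is an assumption, and the ratings shown are hypothetical, not the study data.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Multi-rater kappa (Fleiss form).

    ratings: one list per evaluation criterion, holding one label
    ("Yes"/"No") per rater. Every criterion must have the same number
    of raters.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    # counts[i][c] = number of raters who gave label c to criterion i
    counts = [Counter(row) for row in ratings]
    # Share of all ratings that fall in each category
    p_cat = {
        c: sum(cnt[c] for cnt in counts) / (n_items * n_raters)
        for c in categories
    }
    # Observed agreement for each criterion
    p_item = [
        (sum(cnt[c] ** 2 for c in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p ** 2 for p in p_cat.values())
    if p_expected == 1.0:
        # Every rating is identical; chance correction is undefined
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical ratings: 4 criteria, 5 raters each
EXAMPLE_RATINGS = [
    ["Yes", "Yes", "Yes", "Yes", "No"],
    ["No", "No", "No", "No", "No"],
    ["Yes", "Yes", "Yes", "Yes", "Yes"],
    ["Yes", "No", "Yes", "No", "Yes"],
]
```

Calling `fleiss_kappa(EXAMPLE_RATINGS)` returns a value between the chance level (0) and perfect agreement (1); applied to the 40 criteria and 5 raters of the framework, the same function would take a 40-row input.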

Evaluation Criteria
In this section, we describe in detail the evaluation criteria presented in Table 2.

Monitoring
Successful patient monitoring is reliant on efficiently extracting the relevant information from a patient without excessive intrusiveness to both patient and health care professional. Several parameters can be monitored; some examples are blood glucose, weight, blood pressure, diet, foot care, smoking, and nutrition [4,15,16]. Health care professionals should be able to designate which parameters they want to monitor and specify the ranges for each patient. The health care professional should be able to indicate which course of action the system should take if the readings are outside the ranges (eg, notification, triage).
Patients should also be able to designate parameters in an effort to improve self-management and goal setting (addressed in the "Personalization" section of "Evaluation Criteria") [17]; these, however, should be in addition to, and clearly differentiated from, the parameters specified by the health care professional. Patient-designated parameters should not be shared with the health care professional unless the patient desires that they be shared.
The degree of intrusiveness is a fundamental consideration when designing a diabetes management system. A major problem with many disease-management programs using information technology is that they try to collect too much data too often [1]. The desire to collect as much data as possible must be balanced with the disruption it may cause in a patient's life [4]. Successful strategies to reduce intrusiveness are based on automatic data gathering such as Glucometers that transmit glucose readings via the Internet and the use of simplified questionnaires for triage and screening. Intrusiveness to the health care provider is also an important consideration. If systems were designed to send alerts each time a patient's blood sugar readings are outside the normal parameters, the result would be many false alarms. Therefore, systems must have processes in place designed to not overwhelm health care professionals. These processes include entry validation, screening with the use of questionnaires, and patient involvement in the decision to launch an alert [1].
Effective patient monitoring is not limited to the collection of health data; it also requires a multidisciplinary approach, proactive outreach, and feedback.
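The monitoring logic described in this section (professional-defined ranges, a designated action for out-of-range readings, and safeguards against false alarms) might be sketched as follows. All names, the plausibility limits, and the example range are hypothetical values chosen for illustration; they are not taken from either evaluated site and are not clinical thresholds.

```python
from dataclasses import dataclass

@dataclass
class ParameterRange:
    """Range and out-of-range action set by the health care professional."""
    low: float
    high: float
    action: str  # for example "notify" or "triage"

# Hypothetical plausibility limits used for entry validation (mg/dL)
PLAUSIBLE_GLUCOSE = (20.0, 600.0)

def evaluate_reading(value, target, symptoms_reported, patient_confirms):
    """Decide what the system does with one blood glucose reading."""
    lowest, highest = PLAUSIBLE_GLUCOSE
    # Entry validation: reject values that are almost certainly typing errors
    if not (lowest <= value <= highest):
        return "reject_entry"
    # In range: store the reading and do not disturb anyone
    if target.low <= value <= target.high:
        return "log_only"
    # Out of range: alert only if the screening questionnaire found
    # symptoms or the patient agrees that an alert should be launched
    if symptoms_reported or patient_confirms:
        return target.action
    return "log_and_flag_for_review"

# Hypothetical range for one patient
EXAMPLE_RANGE = ParameterRange(low=80.0, high=180.0, action="notify")
```

The design choice here is that an out-of-range value alone never triggers an alert, which reflects the concern that unfiltered alerts would overwhelm health care professionals.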

Multidisciplinary Approach
The management of diabetes spans multiple medical specialties as evidenced by the use of multidisciplinary diabetes management teams. For example, an endocrinologist will manage medications and glucose levels, a dietitian will design an appropriate diet, and a psychologist will manage the mental aspect of dealing with diabetes. Internet-based diabetes management programs should be based on multidisciplinary teamwork. This element consistently appears in successful chronic-disease management systems [18]. Patients should have the ability to interact with multiple specialists to manage each facet of their disease and the Internet can provide a communication channel to enhance this interaction. Successful evaluation tools have been created to effectively measure diabetes management outcomes along multiple dimensions (medical, social, psychological, etc.). Some examples of these tools are the Diabetes Quality of Life Measure (DQOL) developed for use in the Diabetes Control and Complications Trial (DCCT) [19] and the SF-36 [20].

Proactive Outreach
Proactive outreach and patient tracking are critical success factors for an Internet-based diabetes management system. Proactive outreach consists of notifications sent to patients to take their medication, visit the health care professional, or simply exercise once a day. The benefit of a proactive approach is well documented in the management of other chronic diseases such as chronic heart failure, where increased compliance and monitoring have resulted in a decrease in the number of hospitalizations for cardiovascular diagnoses and hospital days were reduced from 0.6 to 0.2 (P = .09) per patient per year [21]. Proactive outreach also applies to health care professionals. Reminders to physicians of routine testing for patients can be implemented in an Internet-based diabetes management system. A study determined that the use of a diabetes management system increases the likelihood of physicians ordering lipid-profile testing (19%) and retinal exams for their patients [22].
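A reminder mechanism of the kind described above could be as simple as comparing the date of the last recorded test with a recommended interval. The sketch below is illustrative only: the task names and the 365-day intervals are assumptions, not recommendations from the cited studies.

```python
from datetime import date, timedelta

# Hypothetical screening intervals in days
SCREENING_INTERVALS = {"lipid_profile": 365, "retinal_exam": 365}

def due_reminders(last_done, intervals, today):
    """Return the tasks for which a reminder should be sent.

    last_done maps a task name to the date it was last performed;
    a task that was never performed is always due.
    """
    due = []
    for task, interval_days in intervals.items():
        last = last_done.get(task)
        if last is None or today - last >= timedelta(days=interval_days):
            due.append(task)
    return sorted(due)
```

The same function serves both audiences: run against a patient's record it yields patient notifications, and run across a physician's panel it yields reminders for routine testing.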

Feedback
The role of the patient has become central in the management of chronic disease; therefore, monitoring must integrate the patient [22]. A crucial aspect of patient integration is feedback. Patients must have the ability to review their medical data at any time. On-line graphical tools can allow patients to visualize their medical information in much the same way a physician would. Feedback also provides a valuable motivational tool that improves compliance [1] and system usage, both of which are linked to an improved outcome in diabetes management [23].

Information
The Internet has always served as a source of health information; 70 million of the 110 million American Internet users have searched the Web for health information in the past year. Currently they can choose from 20,000 health care sites with 1,500 more coming on-line each month [24]. A successful Internet-based diabetes management system should be a source of quality information for the patients who use it. The quality of information on the Internet is a source of great debate. The low barriers to publication on the Internet result in the presence of vast amounts of low-quality and inaccurate information. This misinformation or information that is out of date has the potential of misleading and even harming patients. Consequently, independent agencies such as the Health on the Net Foundation [25] were created to certify the content of medical information on the Internet. Information delivery is based on 2 models: pull and push.

Pull Model
The pull model relies on the patient retrieving the information he or she seeks. Two pathways are provided to this end. The patient can retrieve documents by navigating through the Web site or can retrieve information with a search engine.
Navigation requires a clearly-defined information structure. This is effectively implemented with a hierarchical structure that users can follow to retrieve information of increasing level of detail. Navigation should be facilitated by a clear on-screen indication of the user's location in the information hierarchy.
Search engines allow users to search for documents based on keywords. Search engine technology is capable of cataloguing documents based on several criteria. In its simplest form, documents will be catalogued based on their text. Therefore, a search will yield all the documents containing the word that was searched for. However, a successful implementation of a search engine will categorize documents based on several criteria such as topic, author, date, and relevance. Users can then use these criteria to refine their searches.
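The difference between a plain text search and a search that can be refined by criteria can be sketched with a small inverted index. The document fields, sample documents, and function names are hypothetical; ranking by relevance is omitted and results are simply ordered by year.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    doc_id: int
    title: str
    text: str
    topic: str
    author: str
    year: int

def build_index(docs):
    """Catalogue documents by the words in their title and text."""
    index = defaultdict(set)
    for doc in docs:
        words = re.findall(r"[a-z0-9]+", (doc.title + " " + doc.text).lower())
        for word in words:
            index[word].add(doc.doc_id)
    return index

def search(index, docs_by_id, keyword, topic=None, min_year=None):
    """Keyword search, optionally refined by topic and publication year."""
    hits = [docs_by_id[i] for i in index.get(keyword.lower(), set())]
    if topic is not None:
        hits = [d for d in hits if d.topic == topic]
    if min_year is not None:
        hits = [d for d in hits if d.year >= min_year]
    # Newest first; ties broken by document id for a stable order
    return sorted(hits, key=lambda d: (-d.year, d.doc_id))

# Hypothetical sample library
DOCS = [
    Document(1, "Insulin basics", "How short acting insulin works", "medication", "A. Author", 2000),
    Document(2, "Diet and insulin", "Meal planning with insulin therapy", "nutrition", "B. Author", 2001),
    Document(3, "Foot care", "Daily foot checks", "complications", "A. Author", 2001),
]
INDEX = build_index(DOCS)
BY_ID = {d.doc_id: d for d in DOCS}
```

A call such as `search(INDEX, BY_ID, "insulin", topic="nutrition")` narrows the plain keyword result to one category, which is the refinement step the text describes.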

Push Model
The push model involves presenting the information to the patient who has opted to receive it. Relevant information could include new research or newly-released drugs for patients who have specified an interest. Interest can be formally expressed by the patient or can be inferred by the system in an effort to personalize the service (see the "Personalization" section of "Evaluation Criteria").
Information delivery in the push model can be implemented in several ways. Patients can be presented with the relevant information upon logging into the system. Alternatively, technologies such as mobile phones and pagers can be used for delivery. A successful Internet-based management system will implement both models of information delivery.
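As a rough illustration of the push model, the sketch below matches a newly published item against the interests that patients have opted into and returns the delivery channel each patient chose. The subscription structure, channel names, and patient identifiers are assumptions made for the example.

```python
def deliveries(item_topics, subscriptions):
    """Return (patient, channel) pairs for patients who opted in.

    item_topics: set of topics attached to a new piece of information.
    subscriptions: patient id -> {"topics": set, "channel": str}.
    """
    matched = []
    for patient, prefs in subscriptions.items():
        # Push only when the item overlaps the patient's stated interests
        if prefs["topics"] & item_topics:
            matched.append((patient, prefs["channel"]))
    return sorted(matched)

# Hypothetical opt-in preferences
SUBSCRIPTIONS = {
    "patient_a": {"topics": {"new_drugs", "research"}, "channel": "on_login"},
    "patient_b": {"topics": {"nutrition"}, "channel": "pager"},
    "patient_c": {"topics": set(), "channel": "mobile_phone"},
}
```

A patient with no stated interests, such as `patient_c`, never receives pushed items, which keeps delivery strictly opt-in.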

Personalization
The management of any chronic disease must be personalized to the individuals, as they are ultimately responsible for its success. Consequently, an Internet-based diabetes management system must allow patients to tailor the intervention to their specific needs. Patients benefit from a proactive approach to their management (in which they are not told what to do) and gain a valuable insight into the management options that may be available to them [17]. Patient involvement and contribution to disease management has demonstrated improved results and compliance [26].

Self-management Plan
The comprehensive management of diabetes can be based on several models. It is not the purpose of this paper to discuss these management models but rather their successful implementation as Internet-based diabetes management systems. One such model, the multilevel social-ecological model for self-management and support for behavior change, was implemented in a physical-activity intervention study [17]. This model is based on the creation of a personal action plan that is the result of both the patient's and health professional's requirements [27]. The creation of a personal action plan can be expressed as these self-management action steps: assessment and feedback, collaborative goal setting, identification of barriers and supports, individualized problem solving, follow-up support, and construction of a personal action plan. Glasgow and Bull have identified the strengths and limitations of interactive technologies such as the Internet for these self-management action steps [17]. Nonetheless, a successful implementation of an Internet-based diabetes management system should provide the patient with the ability to navigate through each action step towards the creation of a personal action plan or the equivalent (depending on the disease-management model used).

Language and Ethnicity
Piette et al [28] demonstrated that an Automated Telephone Disease Management (ATDM) system produced positive results with an ethnically-diverse diabetic-patient population. Internet-based diabetes systems can reach different ethnicities by offering their services in multiple languages. In some groups where language may be a barrier to medical care, such systems may provide substantial benefits.
Inevitably, this opens the discussion of Internet demographics splitting patients between haves and have-nots. This is particularly relevant for type 2 diabetes, where some minority groups are disproportionately affected and have limited access to the Internet. However, the report from the National Telecommunications and Information Administration indicates a rapid change in Internet demographics that is reflective of the general population of the United States [29].

Communication Between Health Professional and Patient
Most efforts in health care technology focus on assisting the doctor in diagnosing and treating a disease. This approach tends to omit a key component of the health care cycle: the patient. An Internet-based diabetes management system must be a channel of communication between patients and their health care providers. The communication system can follow 3 models: synchronous, asynchronous, and indirect. Synchronous communication allows the patient and health care provider to communicate directly by using teleconferencing or videoconferencing. Traditionally, these services were in the realm of telemedicine [31], where specific technical equipment was installed to allow the communication to happen. However, the advent of multimedia on the Internet now allows for real-time voice-based and image-based communication. Although still in its early stages, synchronous communication can be a valuable part of an Internet-based diabetes management system. Equally, the asynchronous communication model is a crucial part of a management system. Simple solutions such as secure text communication between patient and health care provider can be of great benefit in the management of diabetes. A study at the University of Pittsburgh describes a model of asynchronous communication between doctors and patients that reduced some of the differences in communication in terms of expectations, vocabulary used, and other factors [32]. This study was based on a communication system that allowed patients to familiarize themselves with the relevant domain terms at their own pace. The system also allowed physicians to request more information of patients while providing contextual information. This allowed patients to understand the underlying reasons for the questions.
Lastly, the indirect communication model is based on the concept of representation of the health care professional by technology. Such solutions have been implemented using software agents, a form of artificial intelligence that interacts with its environment and reacts to changes. In this case, the agent can interact with the patient, carry out a basic dialogue, and perform functions such as information search and triage [33]. While still experimental, the use of indirect communication in Internet-based diabetes care shows great potential.

Community Creation
Community creation is based on a many-to-many communication channel compared with the one-to-one communication that occurs between health care professional and patient. Community support is a fundamental aspect of self-management of disease. Diabetes patients benefit from discussing topics that concern management of the disease, anxiety as to what the future holds, and interpersonal and social relationships.
The Internet can enable the creation of communities based on the same synchronous and asynchronous communication models. One study followed a diabetes chat room for 21 months and found that 79% of all respondents rated participation in the chat as having a positive effect on coping with diabetes [34]. Another study established a chat room for adolescents affected by diabetes and moderated by a diabetologist [35]. The results indicated a decrease in HbA1c and an improved capacity for self-management. Anonymity undoubtedly favors a greater freedom of expression of individual problems. Community creation and maintenance should be an integral part of any Internet-based management system. The implementation can take the form of synchronous chat rooms or of newsgroups where users communicate asynchronously by posting their comments. Further, experts can moderate chat rooms.

Technology
The complex network of human and machine relations involved in managing diabetes via an Internet-based system has strong implications for the design of such a service.

Security
One of the main concerns with any medical informatics solution is security and privacy of the data. The success of any Internet-based diabetes management system is reliant on the user's trust that the user's data is secure, private, and confidential. This is possible with the recent availability of strong cryptographic tools used for 2 main purposes: authentication and encryption [23].

Authentication
Identification of users is a crucial step in gaining access to the system. Users are granted access to data based on their security profile. For example, only the treating physician can modify a specific patient's blood glucose ranges. Therefore, authentication is both the identification of a user (usually with a combination of username and password) and the enforcement of the security profile. Naturally, user identification is required for more-advanced functions like personalization as mentioned earlier.
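The two halves of authentication described above, identifying the user and enforcing the security profile, can be sketched separately. The role names, permission sets, and iteration count below are hypothetical; the password check uses salted key derivation from the Python standard library as one reasonable choice, not as the method used by either site.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a salted hash so that plain passwords are never stored."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Identification step: check a submitted password against the stored hash."""
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking information through timing
    return hmac.compare_digest(digest, expected_digest)

# Hypothetical security profiles: role -> permitted actions
PERMISSIONS = {
    "treating_physician": {"view_readings", "modify_glucose_ranges"},
    "patient": {"view_readings", "enter_readings"},
}

def is_allowed(role, action):
    """Enforcement step: only actions in the user's profile are permitted."""
    return action in PERMISSIONS.get(role, set())
```

With these profiles, `is_allowed("patient", "modify_glucose_ranges")` is false, mirroring the example that only the treating physician can change a patient's blood glucose ranges.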

Encryption
All data transmitted between a patient and the system must be secure. Several encryption algorithms exist, with different strengths and speeds. Generally speaking, most Web servers can establish secure communication links using Netscape's Secure Sockets Layer (SSL), which is the de facto Internet standard. Recently, 128-bit encryption has been made available worldwide. Any transmission of patient data should be encrypted at the highest level.
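For readers who want to see what "encrypted at the highest level" looks like in client code, the sketch below configures an encrypted connection context with Python's standard `ssl` module. It uses TLS, the protocol that succeeded SSL, and the minimum version chosen is an assumption for illustration rather than a requirement stated in this paper.

```python
import ssl

def make_client_context():
    """Build a context that verifies the server and refuses old protocols."""
    # create_default_context enables certificate and hostname verification
    context = ssl.create_default_context()
    # Assumed policy for this sketch: reject anything older than TLS 1.2
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A context built this way can be passed to standard-library clients such as `http.client.HTTPSConnection(host, context=make_client_context())` so that patient data is only sent over a verified, encrypted link.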

Usability and User Acceptance
Testing usability and user acceptance is a critical part of any computerized system and should be a continuous process during the life of the system. Typically, evaluation instruments have consisted of on-line questionnaires, on-line commenting (e-mail), telephone interviews, video-based testing, and tracking of system usage [36].
Many physicians believe that the key success factor in managing diabetes is simplicity [1]. Consequently, the implementation of an Internet-based diabetes management system should strive towards simplicity for both patient and health care professional. Internet technologies can be a great supplement but if the implementation is not user-friendly, it can become a real barrier [1]. Although the technology has enormous potential, developers should not lose sight of the real purpose of these systems: to collect small amounts of data rapidly and efficiently. Therefore, an Internet-based diabetes management system will only be successful if implemented with a simple user interface used to collect the minimum amount of data from the patient (thus reducing its intrusiveness).

Reliability and Availability
One of the great advantages of the Internet is that it allows users to access systems anytime and from almost anywhere. This results in a need for systems to always be operational, that is, without downtime. Zero downtime (or close to it) requires fault-tolerant systems. Several technical solutions exist both at the software and hardware level. It is outside the scope of this paper to examine all the solutions; however, it is reasonable to expect an Internet-based diabetes management system to not require downtime for maintenance and to have a fault-tolerant hosting environment.

Open Platform
Open technologies are based on nonproprietary standards; therefore, a system can be built using technologies from multiple vendors. This is particularly useful for future expansions or modifications to accommodate increased scalability and functionality requirements. An Internet-based diabetes management system should be based on an open platform, particularly for data exchange. Open standards for data representation such as the eXtensible Markup Language (XML) are being adopted by multiple industries. Consequently, a system built using XML will be able to interface with multiple systems and devices. The same system could deliver its services via multiple devices (Internet, mobile phone, handheld computer, etc.), effectively making the Internet open platform the standard.
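As an illustration of XML-based data exchange, the sketch below serializes a single blood glucose reading to XML and reads it back, using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`glucoseReading`, `patientId`, `value`, `takenAt`) are invented for this example and do not correspond to any published schema.

```python
import xml.etree.ElementTree as ET


def reading_to_xml(patient_id: str, mg_dl: int, taken_at: str) -> str:
    """Serialize one reading; the element names are hypothetical."""
    root = ET.Element("glucoseReading", attrib={"patientId": patient_id})
    ET.SubElement(root, "value", attrib={"unit": "mg/dL"}).text = str(mg_dl)
    ET.SubElement(root, "takenAt").text = taken_at
    return ET.tostring(root, encoding="unicode")


def xml_to_reading(doc: str) -> dict:
    """Parse the same structure back into a plain dictionary."""
    root = ET.fromstring(doc)
    return {
        "patient_id": root.get("patientId"),
        "mg_dl": int(root.findtext("value")),
        "taken_at": root.findtext("takenAt"),
    }


doc = reading_to_xml("pat01", 112, "2001-11-01T07:30:00")
print(doc)
print(xml_to_reading(doc))
```

Because both sides agree only on the document structure, the sender could equally be a glucose meter, a handheld computer, or a Web form.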

Evaluation of 2 Existing Services
To illustrate the use of the evaluation framework, we have applied it to 2 existing Internet-based diabetes management systems: myDiabetes (www.myDiabetes.com) and LifeMasters (www.lifemasters.com).
To produce an overall evaluation, a criterion was considered satisfactory if the majority of the raters evaluated it positively (Yes rating). The results of the evaluations were numerically converted by assigning a value of 1 to all positive (Yes) ratings and a value of 0 to all negative (No) ratings. The results of all the evaluations are compiled in Table 3. The agreement level is reported for each individual criterion. This was calculated by dividing the number of ratings consistent with the overall rating (the majority) by the number of raters. For example, if a criterion was rated satisfactory or unsatisfactory by 4 out of the 5 raters, the criterion has an agreement level of 80% (4/5).
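The majority rule and the agreement level described above can be written out as a short Python sketch. The function name and the sample ratings are illustrative only; the sketch assumes an odd number of raters, so that a tie cannot occur.

```python
def summarize_criterion(ratings):
    """ratings: one value per rater, 1 for Yes and 0 for No.

    Returns (satisfied, agreement_level), where satisfied follows the
    majority and agreement_level is the share of raters in the majority.
    """
    yes = sum(ratings)
    no = len(ratings) - yes
    satisfied = yes > no
    agreement_level = max(yes, no) / len(ratings)
    return satisfied, agreement_level


# 4 of 5 raters say Yes: satisfied, with 80% agreement.
print(summarize_criterion([1, 1, 1, 1, 0]))
# 4 of 5 raters say No: not satisfied, also with 80% agreement.
print(summarize_criterion([0, 0, 1, 0, 0]))
```

With 5 raters the agreement level can only take the values 60%, 80%, or 100%.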
The Technology criteria registered the lowest agreement (60%-80%). The different levels of technical expertise of the evaluators may explain this difference. The Personalization criteria also showed lower levels of agreement between evaluators. This is due to the different interpretations of the criteria between evaluators. Personalization remains a difficult dimension to quantify and evaluate. The quality-of-information agreement levels were also low (60%-80%). Both sites displayed the HONcode logo and stated that they subscribed to the HONcode principles. However, neither site was HON registered, although, as of December 14, 2001, LifeMasters was under review.
The multi-rater kappa for myDiabetes was 0.75 and for LifeMasters was 0.65, indicating a substantial level of agreement as defined by Landis and Koch [37]. There was a notable difference between the kappa of myDiabetes and the kappa of LifeMasters. Further testing is required to clarify the reasons for the difference.

Graphical Representation
We believe that a graphical representation of the evaluation results is particularly useful for comparing 2 systems and for determining in which direction the systems should expand their services. To this purpose, a radar graph with the 5 axes representing the 5 dimensions of Monitoring, Information, Personalization, Communication, and Technology is a useful representation. The value of each axis is normalized by conversion to a percentage of the maximum score. The evaluation of myDiabetes.com and LifeMasters.com is represented in Figure 3.
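The normalization step for the radar graph can be sketched in Python as follows. The raw scores and per-dimension maxima used here are hypothetical placeholders chosen for the example; they are not the values from the evaluation.

```python
DIMENSIONS = ["Monitoring", "Information", "Personalization",
              "Communication", "Technology"]


def normalize(scores, max_scores):
    """Convert raw counts of satisfied criteria to a percentage of the maximum."""
    return {d: 100.0 * scores[d] / max_scores[d] for d in DIMENSIONS}


# Hypothetical maxima and scores, for illustration only.
max_scores = dict(zip(DIMENSIONS, [10, 5, 5, 10, 10]))
site_a = dict(zip(DIMENSIONS, [8, 5, 3, 6, 7]))

print(normalize(site_a, max_scores))
```

Expressing each axis as a percentage keeps dimensions with different numbers of criteria comparable on the same graph.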
The results of the evaluation indicate that LifeMasters is a more-complete solution than myDiabetes in all dimensions except Information, where both sites were equivalent. This is primarily due to LifeMasters' inclusion of the health care professional in the disease-management cycle. On the other hand, myDiabetes is uniquely interfaced with the patient and is quite good in providing a communication channel for community creation; however, communication with the health care professional is lacking, hence the lower score than LifeMasters.

Discussion
The Internet will undoubtedly change the way we deliver health care services. Chronic disease management, which accounts for 60% of the U.S. medical care costs [38], is a desirable target for the efficiencies of the Internet. Chronic-disease management on the Internet is estimated to have a market potential of US $700 billion [24]. Already we are seeing several Internet-based chronic-disease-management sites arising; however, there is little evidence as to how these solutions are answering the needs of the consumer (the patient).
Consumer health informatics research greatly contributes to the health care sector by attempting to systematize and codify consumers' needs, values, and preferences and by trying to build and evaluate information systems that interact directly with consumers and patients [39]. In this paper, we have attempted to catalogue the critical success factors for an Internet-based diabetes management system based on the available literature and the authors' experience. The result is a first step towards a comprehensive evaluation framework. The framework is based on the recognition that the management of diabetes via the Internet is based on several integrated dimensions, namely, Monitoring, Information, Personalization, Communication, and Technology. A successful diabetes management system should efficiently integrate all dimensions. Therefore, the framework provides a model for evaluation and, more importantly, for strategic growth planning for existing sites. For example, a site that is deficient in the communication dimension may enhance its offerings by adding a synchronous chat room.
This paper reports an initial evaluation of 2 sites. The results indicate a high level of inter-rater agreement as measured by Cohen's multi-rater kappa. However, this is based on a small sample of evaluations (5). Future research should focus on validation of the framework by consistency between larger samples of raters and on correlation with the success of the multiple sites available today. Key metrics for success include the number of enrolled patients; length of time managed; clinical, economic, and quality-of-life outcomes; and patient-satisfaction measures [24].

Conflicts of Interest
None declared.

Introduction
The importance of the Internet for contemporary public health has been acknowledged for some time. People have used the Internet for many years to access health-related information.
Pallen points out that, although health professionals originally assumed that health-related Internet sites would be something used by themselves for research, consultation with colleagues, continuing education, and library work, this concept has been extended and modified [1]. Now the importance of the Internet as a source for health information for the layperson is increasingly acknowledged [2,3].
The Graphics, Visualization & Usability Center at Georgia Institute of Technology estimated that 27% of female Internet users and 15% of male Internet users use the Internet to get medical information on a regular basis [4]. These figures have now mushroomed to 63% of women on-line and 46% of men on-line [5]. The growth rate in lay use of Internet health sites is rapid: a Harris Interactive study estimated that, from April 1999 to September 1999, the number of Internet users in America accessing health information increased from 60 million to 70 million [6]. Given this large-and-growing audience, the quality of medical information on the Internet has become an increasingly-important concern, as expressed in Eysenbach and Diepgen and the associated commentaries [7]. This is particularly true given that approximately half of the Internet users surveyed in the Fox et al [5] study said that they had acted upon information gleaned from the Internet to change their health behavior, including, if they were ill, changing aspects of their treatment and care. Such information may be a matter of life and death [8]. There have been warnings that a lot of the information on the Internet is either harmful or misleading [9].
Studies that have evaluated the information on the Internet have often found it to be incomplete and sometimes dangerous [2,7,10,11]. The concerns of lay users of the Internet reflect the concerns of medical professionals: 86% of Internet users are concerned about the reliability of the health information available on the Internet [5]. Despite these concerns, however, 52% of people who regularly use health sites on the Internet consider the information on those sites to be credible, particularly people with low levels of formal education [5]. In addition, most Internet users gain access to health sites by Internet search rather than recommendation by a professional [5]. It is therefore important to have a solid empirical basis for selecting the criteria for rating medical sites on the Internet, whether it is lay users or medical professionals doing the rating.
Leaving aside the question of whether a reliance on medical opinion will "dismiss the input of non medical readers" [12], we would argue that a greater problem is that some of the studies using medical raters suffer from an overreliance on one medical opinion. For example, no statistics are given about the agreement between medical raters and Sandvik [11] explicitly acknowledges this weakness of his study: "A stronger design would be to include judgements from several experts to allow assessment of judge's reliability." The present study attempts to overcome this weakness by asking more than one medical expert to categorize the information given on a well-used newsgroup dealing with a chronic illness. The illness has a relatively-high prevalence and is one seen regularly in both primary care and more-specialized medical services. It is an illness for which misleading information would be harmful and potentially fatal. The categories used were designed by our experts and reflected the current importance of evidence-based medicine.

Participants
The 5 medical experts who participated were all doctors experienced in the treatment of the chronic illness chosen. They worked together in the same specialist unit and all had at least 5 years experience in treatment of the chosen illness.

Materials
The material to be categorized came from a newsgroup used mainly by nonprofessional medical sufferers of the illness. Overall, 61 threads (series of connected messages) were selected from a week's postings because they contained medically-related information, each to be examined by at least one medical expert; however, a random sample of 18 threads was assessed by all 5 experts. These 18 threads form the basis of this report.
Each thread consisted of a start message, usually in the form of a question, and a number of responses. Both the start message and the responses were rated using a coding scheme devised by the medical experts. For start messages, there was a 6-part scheme: A = excellent; B = less good but with some details; C = poor with little detail; D = vague; E = misleading or irrelevant; F = incomprehensible. The responses were also coded according to a 6-part scheme: A = evidence based, excellent; B = accepted wisdom; C = personal opinion; D = misleading, irrelevant; E = false; F = possibly dangerous.

Statistical Analysis
There are 3 main ways (kappa, gamma, and Kendall's W) to analyze the agreement of judges rating the threads from the Internet. Perhaps the most familiar to medical researchers and practitioners is Cohen's kappa. We present the version of kappa described in Siegel & Castellan [13] in which a single kappa statistic reflects the agreements across all 5 judges; this statistic is equivalent to the average of all kappa statistics calculated pair-wise. However, this statistic assumes the data is nominal in measurement. The data we have is ordinal (ie, the scale from A to F has a fixed order) and so Cohen's kappa, although familiar and often used, is inappropriate for this data. We include it only because it is so often used for this type of data in other studies.
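For readers unfamiliar with a multi-rater kappa, the following Python sketch computes a Fleiss-style kappa for nominal categories. It is an assumption that this form matches the statistic used here; the Siegel & Castellan version may differ in detail, and the sample ratings are invented.

```python
from collections import Counter


def multirater_kappa(ratings, categories):
    """ratings: one list per item, holding one category label per rater.

    Every item must be rated by the same number of raters. The result is
    undefined (division by zero) if all ratings fall in a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    per_item = []
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Proportion of rater pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_item.append(agreeing_pairs / (n_raters * (n_raters - 1)))
    p_observed = sum(per_item) / n_items
    # Agreement expected by chance from the overall category proportions.
    p_expected = sum((totals[c] / (n_items * n_raters)) ** 2 for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)


# Invented example: 3 threads, 5 raters, categories A to F.
example = [list("AAAAB"), list("CCCDD"), list("BBBBB")]
print(round(multirater_kappa(example, "ABCDEF"), 3))
```

Note that the categories are treated as unordered labels, which is exactly why this statistic loses information when the scale is ordinal.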
There is a choice of the most appropriate statistic to analyze such data. One could use a weighted-kappa procedure, but this statistic is controversial because the values of the weights for each level are arbitrary [14]. The gamma statistic [13] is related to the weighted kappa statistic and so is presented instead for comparison with the unweighted kappa values. This statistic has been computed for all pair-wise combinations of experts, and the Bonferroni adjustment for multiple comparisons has been applied to the significance levels. Perhaps a more-powerful statistic is Kendall's W, which is similar to the unweighted kappa value in that one statistic represents the overall agreement between the 5 experts. Kendall's W is linearly related to the average rank correlations between ratings assigned by the judges to the threads [13], so it ranges from 0 to 1; hence, it is relatively easy to interpret and can be converted to a χ² statistic to test for significance. It also provides us with a relatively-powerful measure of average agreement among our experts, unlike the average of pair-wise rank correlations.
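A minimal Python sketch of Kendall's W and its conversion to a chi-square statistic is given below. It uses the textbook form W = 12S / (m^2 (n^3 - n)) for m judges ranking n items, with chi-square = m(n - 1)W on n - 1 degrees of freedom. The sketch assumes complete rankings without ties; ratings on a 6-point scale will normally contain ties and would need a tie correction that is omitted here. The example rankings are invented.

```python
def kendalls_w(ranks):
    """ranks: one list per judge, each a ranking (1..n) of the same n items."""
    m = len(ranks)
    n = len(ranks[0])
    rank_sums = [sum(judge[i] for judge in ranks) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    # S: squared deviations of the item rank sums from their mean.
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def w_to_chi_square(w, m, n):
    """Return (chi-square value, degrees of freedom) for m judges and n items."""
    return m * (n - 1) * w, n - 1


agree = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]   # identical rankings
oppose = [[1, 2, 3], [3, 2, 1]]             # exactly reversed rankings
print(kendalls_w(agree), kendalls_w(oppose))
```

Identical rankings give W = 1 and exactly reversed rankings from 2 judges give W = 0, which illustrates why W cannot express disagreement as a negative value.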

Start Messages
For the start messages, the kappa statistic was 0.024; this value was not significant (z = 0.45, P > .05). It is generally accepted in medical circles that a kappa of over 0.75 represents excellent agreement and between 0.4 and 0.74 represents fair-to-good reliability [15]. However, distribution and base rate can affect the kappa statistic [16]. In this case, there is poor agreement between the experts using kappa as a measurement of agreement. However, some power is lost treating ordinal data as nominal, although a similar result occurs if the gamma statistic is used. Only 1 of the 10 pair-wise gamma statistics was significant, and this was negative (Table 1), showing significant disagreement between those 2 experts (gamma = -0.659, P < .01)! The other gamma statistics were all positive and ranged from 0 to 0.475. There is no agreement between raters using this measure. The value of Kendall's W for the ratings of start messages, however, tells a different story. It reflects a modest, but highly-significant, amount of agreement between judges (W = 0.266, χ²(4) = 19.2, P < .001). We suspect that this statistic is due mainly to the single strongly-negative relationship between the ratings of 2 experts. If the agreements of the other experts were weak and randomly distributed, then a single value would dominate the W statistic and so produce a significant result. As W cannot be negative (more than 2 judges cannot all disagree with each other), the result will be a statistic that is misleading. It is therefore important that researchers consider both overall and pair-wise statistics when assessing inter-rater reliability.

Replies
Overall, the results for the agreement of rating of responses to these start messages were somewhat better. The kappa statistic for these ratings was 0.243 and was significant (z = 5.49, P < .001). Individual agreement between raters, as assessed by the gamma statistic, ranged from a low of 0.311 to a high of 0.730 (Table 2). The majority of gamma values were significant; however, 3 failed to reach significance (maximum nonsignificant value was 0.431). There is general agreement, but it is not as high as one might hope. The W statistic, however, was extremely low and only just significant (W = 0.037, χ²(4) = 10.4, P < .05). The overall pattern of agreement is not clear, even though individual pairs of experts appear to agree with each other. This strongly suggests that there are a number of different pairings within our expert panel that contradict each other.
A more-imaginative approach to the problem of assessing reliability and validity for ratings of this type was suggested by an anonymous reviewer. The first suggestion was to treat the data as interval level rather than ordered categorical, which would allow greater flexibility in analysis. Furthermore, this approach is relatively common in the social sciences and more particularly in psychometric research. The second suggestion was that a simple and effective way of presenting the data would be to give the Spearman rank order correlation for raters. We present these for the ratings of the replies in Table 3. The third suggestion was that we treat the data like psychometric test data and take each rating as similar to an item on a test instrument. We can then calculate Cronbach's alpha and use this as a measure of reliability. Further, we can then use the Spearman-Brown prophecy formula to predict how the reliability of the ratings would increase if we had different numbers of raters.
This formula is used in psychometric research to estimate the increases in reliability expected if the number of items is increased. In this case, the Cronbach's alpha for the 5 doctors' ratings of the replies was 0.78. This reliability, however, would be increased to 0.876 by doubling the number of raters to 10 and to 0.914 if we increase the number of raters to 15. If we only have 2 raters, the reliability is reduced to a very-worrying 0.59.
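The projections above follow from the standard Spearman-Brown prophecy formula, predicted reliability = k r / (1 + (k - 1) r), where r is the current reliability and k is the ratio of the new number of raters to the current number. A short Python sketch (the function name is illustrative) reproduces the figures:

```python
def spearman_brown(reliability, factor):
    """Predicted reliability when the number of raters is multiplied by factor."""
    return factor * reliability / (1 + (factor - 1) * reliability)


alpha_5_raters = 0.78
for raters in (2, 10, 15):
    projected = spearman_brown(alpha_5_raters, raters / 5)
    print(raters, round(projected, 3))
```

The gain from each additional rater shrinks as the panel grows: moving from 5 to 10 raters adds about 0.10, while moving from 10 to 15 adds less than 0.04.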

Increasing Reliability
For medical evidence of this type, we would want to have information that is as reliable as possible; 5 doctors as in our example may be too few. The reliability can be increased by increasing the number of items to be rated as well as by increasing the number of raters. The Spearman-Brown formula is limited to estimating differences in one dimension, in this case the number of raters. Brown [17] has suggested the use of generalizability theory that can provide answers in more than one dimension; that is, what would happen to reliability if we increase the number of raters and the number of items rated?

Discussion
Overall, the results suggest that there is a fair degree of disagreement between medical experts when they are asked to rate medically-related postings from the Internet. In this case, the experts were using a system that was devised by them, so any possibility of this result being forced on them by a poor or deliberately-misleading category system is negated. We acknowledge that the start-message coding is less important as it deals with questions rather than answers, includes a small sample, and its coding seems by its nature to be less precise, which may explain the very-low levels of agreement. The rating of responses, however, seems to us to use sensible and relatively-transparent categories. The agreement between response ratings is still relatively poor, and certainly not consistent across all the experts.
One particularly interesting finding was the divergence of the different statistics used to measure agreement in the same ratings. It seems that the choice of a statistic to measure the agreement of judges in this sort of research could be problematic. Consideration of the power of a statistic and the use of pair-wise versus overall statistics are the two main issues. In particular, we have shown that it is possible to achieve a reasonably-high level of agreement with an overall test when individual pair-wise statistics show no agreement or significant disagreement (as was the case for start messages). We have also shown that overall statistics can conflict with pair-wise statistics when there are subgroups within the raters who agree with each other, but disagree with the other subgroups. This was the case with the replies: the overall level of agreement was very low, but individual pair-wise statistics showed high agreement between pairs of raters. The selection of a homogeneous group of experts (such as ours) did not seem to eliminate this problem.
The anonymous reviewer's suggestion for adopting psychometric techniques to look at the reliability of the raters is interesting, and we believe could be a valuable procedure for the future. Both factor analysis and latent structure analysis [18] could also be usefully employed with this sort of data but would require larger samples than we have here.
These results call into question the numerous studies that have claimed to show that the information on the Internet is of poor quality, and suggest that future studies should employ more than one rater. That one expert fails to agree with the Internet is perhaps less important than that several experts disagree with each other. It is possible that training or other resources might increase agreement between experts, and future research could consider this. Any measure producing a greater agreement between raters of Internet sites could have great benefits to medical and nonmedical users of the Internet alike.

Introduction
It is well recognized that cognitive behavior therapy (CBT) is an effective treatment for depression when delivered face-to-face, via self-help books (bibliotherapy), and through computer administration [1,2,3]. CBT programs have also been shown to be effective in preventing depression [4,5,6]. However, the public health impact of these treatments and programs has been limited by cost and the lack of trained practitioners and programs.
MoodGYM is a free Internet-based CBT intervention designed to treat and prevent depression in young people with access to the Internet (for screenshots see PowerPoint Multimedia Appendix). Where face-to-face treatment or prevention using CBT is unavailable, the Internet provides an excellent way of disseminating preventive CBT programs. The information is widely accessible, can be updated, is available 24 hours a day, and is self-paced. The interactive and multimedia possibilities afforded by standard Web browsers offer the potential to engage the target population in ways that are not possible using conventional delivery methods. The Internet is able to support software applications that can be tailored to individual needs, and such customized interventions are recognized as important ingredients in successful prevention work [7].
To date, mental health Web sites have been used to provide information [8], to survey mental health [9], to assist in the delivery of anxiety treatment [10], and to provide support [11]. However, they have not been widely used to deliver specific mental health prevention interventions to all Internet users.
We describe the usage of the MoodGYM site and the characteristics and outcomes of the first visitors and registrants to the site over almost a 6-month period. In this paper, we report on 3 aspects: 1. site usage information, including the number of users who register on the site, the number of sessions recorded, the dates and times when modules were completed, and average time on the site; 2. characteristics of registrants including gender, age, and scores on the Goldberg Anxiety and Depression Scales [12]; 3. change in anxiety and depression scores experienced by registrants as they progress through the site (because the assessments are repeated, we were able to examine whether psychological distress decreases as a function of module use).

Participants
Data from all visitors were recorded in the almost-6-month period between the release of the site on April 1, 2001 and the download of data on September 27, 2001. Visitors were individuals who accessed at least one page of the site. Registrants were individuals who entered details about themselves on the site, gave permission for their data to be used in research, and were allocated an individual database record. Registration was required before participants were able to access the site modules. There were 2909 registrants. Of these, 1503 completed one or more online assessments. Also, 71 university students enrolled in an Abnormal Psychology course who visited the site for educational training were included and examined separately. The students gave permission for their server data to be used for research purposes although they were not explicitly aware that their data would be compared directly with data of general public users.

Site Description
The site consists of a set of 5 cognitive behavioral training modules, a personal workbook (containing 29 exercises and assessments) that records and updates each user's responses, an interactive game, and a feedback evaluation form. Module 1 introduces the site "characters" (who model patterns of dysfunctional thinking) and demonstrates the way in which mood is influenced by thinking, using animated diagrams and interactive exercises. Module 2 describes types of dysfunctional thinking, the methods to overcome them, and provides self-assessment of "warpy" (dysfunctional) thoughts. Module 3 provides behavioral methods to overcome dysfunctional thinking, and includes sections on assertiveness and self-esteem training. Module 4 assesses life-event stress, pleasant events, and activities, and provides 3 downloadable relaxation tapes. Module 5 covers simple problem solving and typical responses to relationship breakup. Workbook exercises are integrated seamlessly into each of the modules.
Each module was designed to take from 30 minutes to 45 minutes to complete, but users can opt to skip sections. Module 1 has approximately 30 "pages" but many of these contain browser-supported interactive features (creating additional pages) and supplementary pop-up windows. Module 3 has over 60 pages, but users are directed to specific sections depending on their scores on earlier tests and thus may not access all pages.
Online assessments include the Goldberg Depression and Anxiety Scales [12]. Each of the Goldberg scales comprises 9 items. These scales are ideal for use on the Internet because they are brief, well accepted, of satisfactory reliability and validity, have been previously used in epidemiological survey research using a handheld computer interface [13], and their use on our site does not breach copyright. The scales are administered prior to each module.
Although users were encouraged to proceed through the assessments and modules in order, they were free to move about within the site at will. Thus, some registrants started with later modules and did not necessarily work through them in order. Data from each registrant was recorded in an SQL (Structured Query Language) database on a stand-alone server.

Web-data Retrieval
Server Web statistics were processed using LiveStats [14] and a computer program written specifically for the current analysis.

Site Usage Statistics
A total of 17646 sessions were recorded from April 1, 2001, through September 27, 2001. Sessions provide an indication of the number of visitors to the site. Since visitors can access the site more than once, the number of sessions is a good but imperfect indicator of the number of visitors. Across the 181 days, the site recorded 817284 hits and 297046 page views. A hit is an initial request to a computer to deliver a file and is a rough indicator of the amount of Web traffic on a site. On average, each session lasted 9.47 minutes. However, many visitors spent less than 1 minute on the site, during which time they viewed only 1 or 2 pages. Table 1 shows the breakdown of sessions as a function of the number of pages viewed and the length of time on the site. Approximately 20% of sessions lasted 16 minutes or more, indicating that individuals were interacting with the material for extended periods. Session statistics include return visits so these summary data are likely to underestimate individual exposure time. Web analysis suggests that individuals spend between 0.6 and 6.7 minutes per site on average [15]. The number of sessions each day across the 181 days varied from 34 to 359. For those sessions where the visitor's location could be identified, the most common geographical location of the visitor was the US (34.9%) followed by Australia (33.2%), Asia (6.9%) and Europe (1.3%). Apart from some limited media publicity in Australia in May and July, there was no direct marketing of the site.
The mean and median ages of users who supplied age data were 35.5 (SD = 13.0, range = 10 to 80) and 34, respectively. To enable gender-specific information to be returned to the user, gender was a required field. Sixty percent of users were female.

Anxiety and Depression Scores at Module 1
Of the 2909 people who registered, 1503 completed at least 1, and 465 at least 2 of the depression assessments. Some registrants chose to start with later assessments, so only 1145 people completed the assessment associated with Module 1. A total of 1049 completed at least 1 and 223 at least 2 of the anxiety assessments although only 717 completed the anxiety scale for Module 1.
Scores for the Goldberg Depression Scale and Goldberg Anxiety Scale at Module 1 as a function of gender are shown in Table 2. Also shown are the scores achieved by a representative population sample of 2354 young adults aged 20-24 from the Canberra region [13]. This sample completed the scales anonymously on hand-held computers, but in the presence of interviewers, as part of a large survey of health and well-being.
Analyses of variance indicate that both depression and anxiety scores are significantly higher for females than for males for the Web-based sample (P < .0001 for depression; P = .006 for anxiety), that there is no significant difference between the population sample and the sample of university students (P = .897 for depression; P = .600 for anxiety), but that the Web-based sample has significantly higher scores than either the population sample or the university students (P < .0001 for both anxiety and depression; pairwise comparisons in STATA-7 software [16] using the Bonferroni correction, in which the critical value (α) is divided by the number of comparisons). These findings suggest that visitors to the site have much higher levels of anxiety and depression than are present in the Canberra community. It is unlikely that the higher scores in the registrants result from the use of Web-based questionnaire methods rather than computer administration, particularly given that the university students' scores did not differ from the scores of the representative sample.

First Analysis
Our first analysis assumed that users progressed through the modules in order, but that not all modules were necessarily completed. The analysis included all individuals who had completed at least 2 modules. To predict the depression and anxiety scores, we fitted regression models for repeated-measure data, with random effects for individuals, to the data using the xtreg procedure in STATA-7 software [16]. The xtreg procedure estimates linear regression in panel data where there are complex error structures. It is useful where data are correlated, as in repeated-measures designs. Predictors were gender and module. We made separate analyses for the Web-based population and the university students, because of complex significant interaction terms.
For the Web-based population, both depression and anxiety scores decreased significantly as individuals progressed through the modules. Depression scores decreased significantly with module (Beta = -0.67; 95% CI = -0.80 to -0.55; P< .0001), indicating that depression scores fall on average nearly 3 points (2.7; 95% CI = 2.2 to 4.2) if all 5 modules are completed. Females had significantly higher depression scores than males (Beta = 0.62; 95% CI = 0.13 to 1.11; P= .014). Anxiety scores decreased significantly with module (Beta = -0.82; 95% CI = -1.06 to -0.58; P< .0001), indicating a decrease on average of more than 3 points (3.3; 95% CI = 2.3 to 4.2) over the 5 modules. There was no evidence of nonlinearity and there were no significant differences in anxiety scores for males and females (Beta = 0.53; 95% CI = -0.19 to 1.25; P= .150). Scores for the group of 71 university students who completed the modules as part of their abnormal psychology course were lower than for the Web-based sample and there was no significant change across the modules. Figure 1 plots the actual trajectories (paths) of those individuals who completed assessments in depression or anxiety for at least 2 of the modules. Figure 1 also shows, in heavier lines, the predicted trajectories for females (upper line) and males (lower line) based on the statistical modeling described above.
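One way to read the "points over 5 modules" figures is as the per-module slope multiplied by the 4 steps between Module 1 and Module 5. The sketch below works through that arithmetic. It is an assumption about how the totals relate to the slopes, not the authors' stated method, and the function name is hypothetical. Under this assumption the point estimates (2.7 and 3.3) and the anxiety interval are reproduced after rounding; the printed depression interval differs at its upper bound, so that figure may have been derived differently.

```python
def change_over_modules(beta, ci_low, ci_high, n_modules=5):
    """Convert a per-module slope and its CI into a total decrease.

    Assumes a linear trend, so the total change from the first to the last
    module is the slope times (n_modules - 1) steps. Returns positive numbers
    for a decrease: (point estimate, lower bound, upper bound).
    """
    steps = n_modules - 1
    point = -beta * steps
    lower, upper = sorted((-ci_low * steps, -ci_high * steps))
    return point, lower, upper


if __name__ == "__main__":
    # Slopes and confidence limits as reported in the text.
    for label, args in (("depression", (-0.67, -0.80, -0.55)),
                        ("anxiety", (-0.82, -1.06, -0.58))):
        point, lower, upper = change_over_modules(*args)
        print(f"{label}: {point:.1f} ({lower:.1f} to {upper:.1f})")
```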

Second Analysis
Our second analysis was for individuals with adequate data on the dates and times when modules were completed. The change in scores between the first occasion of measurement and the last occasion, independently of which modules were completed, was compared using repeated-measures ANOVA (analysis of variance) in SPSS-10 [17]. Independent variables were gender and time between first and last assessment. Time between assessments was recoded into 3 categories: completed on the same day (n = 869 for depression, n = 644 for anxiety), last assessment completed within one week of the first (n = 31 for depression, n = 18 for anxiety), and last assessment completed at least one week after the first (n = 78 for depression, n = 47 for anxiety). Analyses were made separately for the Web-based sample and the university students. For the depression scores of the Web-based sample, there was a significant decrease over time (P< .0005), which was more marked for those who spent longer than a day between assessments (P< .0005). Similarly, anxiety scores decreased significantly (P< .0005), and to a larger extent for those who had spent more than one day between assessments (P< .0005). Estimated marginal means are shown in Table 3. Due to the small numbers in the sample of university students (not shown), time spent between assessments was dichotomized into 0 days versus 1 or more days (the latter combining the categories of within one week and at least one week). For this sample, there was no significant change in depression (P= .852) or anxiety (P= .752) scores for those who spent more than a day between assessments compared to those who spent less than a day.
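The recoding of time between assessments into 3 categories could look roughly like the following. This is a minimal sketch: it assumes that "same day" means the same calendar date and that exactly 7 days counts as "at least one week", neither of which the text specifies. The function name and category labels are hypothetical.

```python
from datetime import datetime


def time_category(first: datetime, last: datetime) -> str:
    """Recode the gap between first and last assessment into 3 categories."""
    if last < first:
        raise ValueError("last assessment precedes first assessment")
    # Assumption: gaps are counted in whole calendar days.
    days = (last.date() - first.date()).days
    if days == 0:
        return "same day"
    if days < 7:
        return "within one week"
    return "one week or more"


if __name__ == "__main__":
    # Illustrative timestamps only; these are not study data.
    start = datetime(2001, 11, 1, 9, 0)
    print(time_category(start, datetime(2001, 11, 1, 17, 30)))
    print(time_category(start, datetime(2001, 11, 5, 9, 0)))
    print(time_category(start, datetime(2001, 11, 8, 9, 0)))
```

For the smaller university sample, the last two labels would simply be merged into a single "1 or more days" category.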

Discussion
Visitors who register on the MoodGYM Web site have high levels of anxiety and depression symptoms relative to population samples. For community registrants who choose to go through the training program, there is evidence that anxiety and depression symptoms resolve with progress across the modules. However, university students who start the intervention with low symptom levels show no change over the period. To evaluate the plausibility of the intervention and its "dose" effect, we examined change in scores between the first occasion of measurement and the last, independently of which modules were completed. Three periods were observed: less than one day between completing two assessments, last assessment within one week, and last assessment completed at least one week after the first. The findings from these analyses suggest that greater change in symptoms is associated with longer exposure to the site, as indexed by longer periods between completed assessments. However, given the small change that occurred over an interval of less than one day, the data are consistent with recent reports of the effectiveness of one-session cognitive behavior therapy interventions [18,19].
MoodGYM registrants' scores decline on average 3 points over the 5 modules if all modules are completed. More specifically, Table 3 illustrates that users have average starting scores of between 4.42 and 6.33, and average post-intervention scores of between 3.08 and 5.24. The significance of these changes can be determined by examining both the distribution of anxiety and depression scores in appropriate population samples [13] and the highest scores of individuals who are likely to be clinical cases. Given that the prevalence of clinical depression is about 7% in Australia [20], those scoring at a level to reach the top 10% range might be regarded as meeting or nearly meeting clinical criteria. For young people (aged 20-24 years), a drop from a score of 6 to 3 indicates a shift from a percentile rank of 79.4 to one of 38.1. For a person aged 40-44, the same drop corresponds to a shift from a percentile rank of 90.2 to one of 63.8. These data suggest substantial shifts down from high (but not clinical) levels for the younger users, and shifts from clinical levels in older adults.
Due to the limitations of the present design, we cannot conclude that the training program was responsible for the changes in mental health symptoms. Randomized controlled trials are necessary to evaluate MoodGYM and other psychological interventions on the Internet relative to both waitlist control conditions and standard treatments. Because such methodology was not employed, it is difficult to know whether the changes were due to depressive symptoms resolving over time [21]. Regression to the mean may also explain the findings. Selection (or self-selection) on the basis of high symptoms at a particular time will result in reversion to more normal levels on a second testing. Moreover, individuals with fewer mental health problems may be differentially inclined to fill in questionnaires in later modules in the site. Nevertheless, the findings from the study demonstrate the feasibility and highlight the potential public health implications of Internet use in mental health. From a public health perspective, the use of the Web in treatment, prevention, and promotion is likely to increase enormously given its potential for providing services for those who do not seek or cannot obtain help from health professionals for reasons of cost, lack of accessibility, or the perceived stigma associated with seeking professional help.
The use of community-collected Web data raises interesting methodological, epidemiological, and statistical issues. It is difficult to identify the population to which samples refer when there is no clear sampling frame or method of sampling and where there is no direct subject contact. Appropriate methods to deal with the vast amount of incomplete and missing data are needed. If we can assume data are missing at random (MAR) [22], if not missing completely at random (MCAR), we need to collect data to describe the incomplete and missing data that can be incorporated in appropriate methods of analysis (eg, Full Information Maximum Likelihood Methods) [22]. Finally, the suitability of intention-to-treat analyses in the context of large-scale community Web interventions (where adherence to the training program may be neither desirable nor achievable) requires careful consideration.
To date, mental health Web sites have been found to be useful for screening the public for depression using the Center for Epidemiologic Studies Depression (CES-D) scale [9]. There is some evidence that Web sites may be a useful adjunct to treatment in clinical settings [10,23]. However, to our knowledge there has been no previously published evidence concerning the impact of a Web-based therapy intervention on the mental health of community users.
MoodGYM illustrates the means by which the Internet might be harnessed to prevent depression, and early results from the site point to the public health potential of mental health Web sites. At the time of writing, MoodGYM was ranked 15th of about 1790 sites in Google's "Mood" subcategory, indicating that it is popular and linked to other "high quality sites" [24]. It may be of practical interest to general practitioners in all countries since it provides a free service that might, like cognitive behavioral bibliotherapy, be used as an adjunct to standard consultation.

Introduction
While predictions have been made [1], little is known about how patient use of the Internet currently affects frontline clinicians. High quality information on the Internet is assumed to be vital for patients. Poor quality information presents obvious risks, including self-mistreatment and misdiagnosis (which can lead in turn to mistreatment or unnecessary worry in the patient), but the misunderstanding or misinterpretation of high quality information is also a potential problem. Even high quality information used well can challenge clinicians, leading to increased patient demand for their time and services [2]. A common, disheartening scenario is that of the patient entering the doctor's consulting room laden with Internet printouts.
However, increased information can improve patients' understanding of their condition, self-care, and state of mind [3], or even educate the doctor [2,4]. The right information can avoid unnecessary consultations, yet ensure prompt help-seeking when needed, which is the rationale behind NHS Direct Online [5] in the UK. The Internet can also act as a medium for social support [6]. It is important to recognize that patients may not want the same kind of information as clinicians. For example, patients may wish to read others' autopathographies [7], narratives about another's experience of illness. Such texts may fare badly under the usual evidence-based criteria, but may provide the personal experience and reassurance desired.
The Internet is not only about exchanging information: it can also provide access to services, such as buying drugs and other health products. It remains unclear how harmful or beneficial such services may be [8]. The activity is currently largely unregulated [9] and the American Medical Association has warned of the dangers of online prescribing [10], which has become a popular route for obtaining sildenafil and, since the events of late 2001 in the US, the anthrax antibiotic, ciprofloxacin [11-14]. To explore the range of benefits and problems that Internet use by patients produces for themselves and for health services, we conducted a survey through an Internet service provider exclusively for UK doctors. Although not a representative sample, as early adopters, such users are likely to be more familiar with the Internet themselves and, thus, more aware of their patients' Internet use. While this cannot be a definitive survey, it explores the range of benefits and problems seen with patient Internet use in order to guide future research.
We did not ask patients about their experiences, but only their doctors. By surveying doctors, we could concentrate on Internet use that has a palpable effect on the patient's health and on the health care system. However, we need to bear in mind that some patient Internet use will be obscure to the clinician. Moreover, respondents' views of patients' experiences will be filtered through their own perceptions. We suspect that doctors' responses to questions about benefits or harms from their patients accessing Internet health information will vary according to their personal attitudes to the Internet and their general willingness to share information with patients. We therefore included questions to explore these suspicions, implemented as questions on the trustworthiness of Internet financial advice and views on patient leaflets.

Methods
An anonymous questionnaire was presented via Medix [15], a free Internet service provider and Web portal available exclusively to practitioners registered with the UK General Medical Council (GMC). At the time of the survey, Medix had about 9100 members, approximately 4% of GMC registrants. Medix is a commercial venture and carries out regular for-profit and not-for-profit survey research among members. Financial incentives are offered for responding to questionnaires in general, but not for responding to specific questionnaires. Awards are given to Medix members using an algorithm that takes into account their having done questionnaires during a particular time period.
Two versions of the questionnaire were presented to any Medix member registered as practicing full- or part-time (based on information given at first registration). One version of the questionnaire (Appendix 1) asked about possible benefits of the Internet, the other (Appendix 2) about possible harms, with participants randomly assigned to one version by proprietary software. This was done to avoid framing effects (questions about negative effects biasing answers to later questions about positive effects, or vice versa) and to keep the questionnaire short. Background questions were included on both versions, as was an identical overall question about patients' experiences of the Internet. Some questions have not been analyzed in this paper. Respondents were not required to complete any fields on the questionnaires beyond their GMC number and password. Each version was presented to 400 doctors between September 27 and October 3, 2001, inclusive.
When Medix members log on to visit the Web site [15], they must give their GMC number and self-assigned password. Proprietary software checks this information and a list of available questionnaires. If the demographics of the member are suitable for an available questionnaire and the member has not already done or refused the questionnaire (either questionnaire in this case), the questionnaire is offered. The member can defer doing the questionnaire, refuse to do it, or do it. If the questionnaire is refused, the member is never asked about that questionnaire again. Responses are stored on a central database and proprietary software ensures, based on the GMC number, that multiple responses are not possible. All responses, rejections, and deferrals are date stamped and time stamped by the server on receipt.
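The offer rules described above can be expressed as a small eligibility check. The actual Medix software is proprietary and not described in detail, so this is only a sketch of the stated rules: the class, field, and function names are hypothetical, and "suitable demographics" is reduced here to the practicing flag mentioned earlier in the Methods.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two questionnaire versions.
SURVEY_VERSIONS = frozenset({"benefits", "harms"})


@dataclass
class Member:
    gmc_number: str
    practicing: bool
    completed: set = field(default_factory=set)
    refused: set = field(default_factory=set)


def should_offer(member: Member, versions=SURVEY_VERSIONS) -> bool:
    """Offer the survey only to practicing members who have neither completed
    nor refused either version. A deferral leaves the member eligible."""
    if not member.practicing:
        return False
    already_handled = member.completed | member.refused
    return already_handled.isdisjoint(versions)


def record_action(member: Member, version: str, action: str) -> None:
    """Record a member's choice: 'complete', 'refuse', or 'defer'."""
    if action == "complete":
        member.completed.add(version)
    elif action == "refuse":
        member.refused.add(version)
    elif action != "defer":
        raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    doctor = Member(gmc_number="0000000", practicing=True)  # placeholder ID
    record_action(doctor, "benefits", "defer")
    print(should_offer(doctor))   # still eligible after a deferral
    record_action(doctor, "benefits", "refuse")
    print(should_offer(doctor))   # never asked again after a refusal
```

Keying the completed and refused sets to the member's GMC number is what would make multiple responses from the same doctor impossible in such a design.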
Data were analyzed in SPSS for Windows 10.0.0 (SPSS Inc.). Confidence intervals for medians were calculated in Stata 5.0 (Stata Corporation) by bootstrapping. This involved calculating 999 simulated (bootstrap) samples from the empirical distribution function (see [16]).
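A bootstrap confidence interval for a median can be sketched as follows. This uses the simple percentile method with 999 resamples; the text does not say which bootstrap interval Stata produced, so the percentile method, the seed, and the function name are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_median_ci(data, n_boot=999, level=0.95, seed=0):
    """Percentile bootstrap CI for the median.

    Draws n_boot resamples (with replacement, same size as the data) from the
    empirical distribution and takes the appropriate ordered medians.
    """
    if not data:
        raise ValueError("data must not be empty")
    rng = random.Random(seed)
    n = len(data)
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    # With 999 resamples and a 95% level these are the 25th and 975th values.
    lower_rank = round((1 - level) / 2 * (n_boot + 1))
    upper_rank = round((1 + level) / 2 * (n_boot + 1))
    lower_idx = min(max(lower_rank - 1, 0), n_boot - 1)
    upper_idx = min(max(upper_rank - 1, 0), n_boot - 1)
    return medians[lower_idx], medians[upper_idx]


if __name__ == "__main__":
    # Illustrative data only; these are not survey responses.
    sample = [1, 2, 2, 3, 5, 5, 5, 8, 10, 10, 15, 20, 20, 25, 50]
    print(statistics.median(sample), bootstrap_median_ci(sample))
```

Using 999 rather than 1000 resamples makes the 2.5% and 97.5% points fall on whole-numbered ranks, which is a common reason for that choice.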

Quantitative Results
The questionnaire was answered by 748 doctors (374 for each version), a 94% response rate. Fifteen doctors said they did not see patients and are excluded from further analysis (10 doctors from the positively framed questionnaire and 5 from the other questionnaire). On the key question of "Overall, how would you describe your patients' experiences with Internet health material?", a Mann-Whitney test showed no significant difference between respondents answering the positively and negatively framed versions of the questionnaire (U = 63815, P= .7, n = 719). Thus, responses to identical questions on both versions were combined. Gender and year of qualification of respondents were checked and found to be similar to the general Medix membership. Compared to all GMC registrants, Medix has a lower proportion of female members (who make up 30% of GMC registrants, where gender is known) and a higher proportion of members who qualified between 1970 and 1999. Medix members match (UK resident) GMC registrants on proportions split by the first letter of their postcode.
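The Mann-Whitney comparison of the two questionnaire versions can be illustrated with a small standard-library implementation. This sketch uses the normal approximation with a correction for ties and no continuity correction; SPSS may apply different conventions, so results need not match the reported values exactly. The function name and the example ratings are hypothetical.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Return (U for x, z, two-sided P) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one value")
    # U counts pairs where the x value exceeds the y value; ties count one half.
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u, 0.0, 1.0  # all values tied: no evidence of a difference
    z = (u - n1 * n2 / 2.0) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, z, p


if __name__ == "__main__":
    # Hypothetical 5-point ratings for the two versions; not the survey data.
    positive_version = [4, 4, 3, 5, 2, 4, 3, 3]
    negative_version = [3, 4, 3, 4, 2, 5, 3, 2]
    print(mann_whitney_u(positive_version, negative_version))
```

A rank-based test of this kind suits ordinal ratings with many tied values, which is why the tie correction in the variance matters here.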
Asked on the same scale about the general quality of financial information on the Internet, many more (272) responded "don't know". For those who made a judgement, 36% rated financial information as unreliable versus 32% rating health information as unreliable. On a Wilcoxon test, respondents were significantly more trusting of health information than of financial information (z = 2.97, P= .003, n = 431). We also asked for respondents' judgement of the value of patient-information leaflets, such as those from Cancer BACUP [17]. Only 32 answered "not sure". Of those who made a judgement, 90% rated them as very useful or sometimes useful rather than neutral, sometimes harmful, or often harmful. The rating of Internet health-information quality was significantly correlated with both that for Internet financial-information quality (r_s = 0.16, P< .001) and the value of health-information leaflets (r_s = 0.11, P= .004). The ratings of Internet financial information and health-information leaflets were not significantly correlated (r_s = 0.02, P= .6).
Asked whether patients had experienced problems or benefits from using the Internet, many doctors answered "not sure". However, among those who responded, there were many more reports of patients experiencing benefits than problems (Table 2). When prompted with specific examples, more respondents selected actual problems and benefits than on the earlier question (Table 3). Of the respondents, 184 (50%) did not report any problems for their patients and 108 (29%) reported 2 or more problems; 97 (27%) did not report any benefits for their patients and 186 (51%) reported 2 or more benefits. The problems and benefits were matched to allow comparison. Overall, benefits outweigh problems, although different aspects emerge on each list. The Internet was seen as being valuable for informing, advising, and providing support for patients about their condition. However, becoming misinformed about one's condition was also the most-selected problem. [...] was not especially large (r_s = 0.30, P< .001, n = 718). These data did not significantly vary by region (Kruskal-Wallis chi-squared(10) = 7.7, P= .7) or by specialty (Kruskal-Wallis chi-squared(7) = 13.0, P= .07).
Asked about problems for themselves and for the health service (Table 5), 47 (13%) did not report any problems for themselves and the health service and 181 (49%) reported 2 or more problems. 74 (20%) did not report any benefits and 113 (21%) reported 2 or more benefits.

Qualitative Results
Respondents could give free-text responses under the Other heading for the questions on specific problems or benefits. They were also able to comment on the questionnaire as a whole. Certain themes emerged. Respondents recognized the value of the Internet in providing information, which could lead to more productive consultations. However, these consultations also tended to be longer, a luxury not always available. Problems were often not with the information per se, but with the ability of the patient (and the clinician) to sift through and evaluate the information.
Particular problems raised were patients' desire for new, generally-unavailable treatments: a cult of the new, engendered by our technophile society? Many other problems focused on alternative therapies. Respondents commented about how patients can put too much faith in the Internet and that this can undermine faith in the doctor, although it could also back up the doctor and improve confidence, a result seen in other research [18].
The Internet has no geographical boundaries, but it does have linguistic ones and US sites dominate the English-speaking Internet. UK patients, unused to the nature of the US health care system, may be especially vulnerable to the direct advertising of health care services. Concern was expressed in our survey that, unlike US patients, UK patients may be less likely to bear in mind commercial biases in information presented. Other problems concerning the unsuitability of advice written from within the US context were also reported.
Two particular diseases were mentioned often in connection with problems: multiple sclerosis and chronic fatigue syndrome. It is not surprising that chronic, debilitating diseases with limited treatment options, often affecting a young population, should be highlighted. The Internet's value when dealing with rare diseases was also highlighted. The ability of the Internet to bring together, from all around the world, patients with rare diseases and experts on rare diseases is significant [4]. In terms of serious health problems from using the Internet, 3 actual deaths were described: an accidental overdose of Viagra ordered over the Internet, and 2 delayed presentations of cancers after the patients had tried remedies found on the Internet. A fourth comment was ambiguous about whether a fatality occurred from a purposeful overdose carried out using instructions found on the Internet, a concern raised previously [19,20].

Discussion
Overall, our survey paints a fairly rosy picture of patient Internet use, although it is notable that respondents are only aware of a surprisingly small proportion of their patients using the Internet for health material. Many more benefits than problems for patients were reported. Information, advice, and social support were frequently identified benefits of the Internet for the patient, although becoming misinformed was also the most commonly reported problem for patients. Reports of problems and benefits for the doctor and the health service were more mixed. Confirming past research [2], over half our doctors reported longer consultations as a problem for the health service, while nearly half named unnecessary investigations. Improved coping and self-care were identified as the main benefits to the health service.
Debate rages about the frequency of adverse effects from Internet use [19,21,22]. Five of our respondents reported cases of serious injury, with comments describing 3 or possibly 4 deaths resulting from Internet use. With no time frame placed on the question, this represents the experience over many years of several hundred doctors, so we feel it represents a quite low rate of severe events.
A survey of primary care staff in Glasgow [2] found that those under 40 were more likely to refer to the Internet for drug information. In this study, we found that more recently qualified doctors considered health information on the Internet more reliable. It is not surprising that a younger generation of clinical staff is more comfortable using the Internet. Many respondents pointed out that their clientele were socially deprived and without net access. We must not overlook that the Internet may also exacerbate existing socioeconomic inequalities of health and that it may be less relevant to some groups [23].
Clearly, both benefits and problems exist with patients' use of the Internet. It is reassuring that these doctors see more benefits for patients, but that is not a reason to be complacent about the problems. Poor-quality information matters less if patients can effectively judge it so. High-quality information is less useful if patients are overwhelmed with its volume. The relationship between the quality of information on the Internet and patient experiences is not straightforward. There is plenty of scope for more detailed research in this area.
Many respondents felt unable to answer some of the questions. Of 732 respondents, 82 said they were unsure how many of their patients had been accessing Internet health information, while 89 said they did not know what the quality of health information on the Internet is like. While current research may help with the latter, with the former we note that patient Internet use can be obscure.