An annotated dataset of tongue images supporting geriatric disease diagnosis

Hospitalized geriatric patients are a highly heterogeneous group, often with multiple diseases and conditions. Physicians, and geriatricians especially, seek non-invasive testing tools to support timely, accurate diagnosis. Chinese tongue diagnosis, based mainly on the color and texture of the tongue, offers a unique solution. To develop a non-invasive, machine-learning-based assessment tool that supports timely, accurate diagnosis in the elderly, we created an annotated dataset of 688 tongue images collected from hospitalized geriatric patients in a tertiary hospital in Shanghai, China. Images were captured via a light-field camera using the CIELAB color space (to simulate human visual perception) and were then manually labeled by a panel of subject matter experts after reviewing patients' clinical information documented in the hospital's information system. We expect that the dataset can assist in implementing a systematic means of conducting Chinese tongue diagnosis, predicting geriatric syndromes using tongue appearance, and even developing an mHealth application to provide individualized health suggestions for the elderly.


© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license.

Specifications Table
[...] and translated into English. The treatment plan corresponding to the admission diagnosis was reviewed and annotated by the remaining two physicians. A total of 12 items must be merged into an annotated document, including various indices related to tongue diagnosis, physical or mental factors, clinicians' observations, and more. To reduce the annotation workload, we used a previously designed algorithm to generate templates automatically. Under the K-means paradigm, the algorithm (1) embedded each annotated document into a vector representation for the first 100 patients, (2) partitioned those vectors into several (e.g., K = 10) clusters, and (3) designated each cluster representative as a prototype template.

Value of the Data
• The data is extensible, comparable, and compatible. Data collection processes are standardized to meet the requirements and expectations of not only patients but also various researchers. Specifically, patients desire non-invasive, simple, and effective diagnostic tools. Clinicians sometimes want to collect data that does not exist in any pre-existing table of the database. Data analysts are interested in grouping data into categories that might not exactly fit the data. The dataset pursues at least three purposes. First, it covers almost all possible indicators of tongue diagnosis in Chinese and Western medicine and also includes face consultation content. Second, it adopts the epidemiological method of investigation by (1) limiting the target population to Asia's elderly population aged 65 and over, and (2) scheduling the collection time as the first day of hospitalization. Third, the data can be easily linked to data from different systems, such as CT (computerized tomography) scans, MRI (magnetic resonance imaging), and clinical laboratory indicators, relying on more than 20 years of previous HIS (hospital information system) experience.
• The data is labeled by clinicians with rich clinical experience. A total of 16 physicians in the department of geriatrics participated in manually labeling the data with the admission diagnosis. Each patient's diagnosis is determined by a panel of subject matter experts. The data will be updated if the patient is readmitted to the hospital. The dataset meets the requirements for use as a training set and is suitable for artificial intelligence and machine learning. Some preliminary results may help correct false medical information or misleading claims concerning tongue and face consultation on the Internet and social media.
• This continuously growing dataset (up to 688 patients) is new and original, and the data has not been published elsewhere.

Purpose of collection
Geriatric syndromes may be complicated and heterogeneous [1], and geriatric patients with multiple diagnoses are prone to treatment complications.
Tongue diagnosis has served as a unique, non-invasive approach to monitoring health for millennia [2]. It holds that the tongue's color and texture are outer manifestations of the status of the internal organs [3] and provide insights into patient status in conditions such as inflammation, infection, and endocrine disorders. Recently, tongue diagnosis has seen gradual acceptance in modern Western medicine, with the term "geographic tongue" [4] used to describe tongue discolorations or cracks accompanying illness. One case study of multiple systemic disorders published in the New England Journal of Medicine describes a patient's "smooth, shiny tongue" [5]. Applying machine learning to tongue images might provide useful diagnostic tools. Existing tongue appearance data is inadequate in both quality and quantity; therefore, we manually created an annotated tongue diagnosis dataset to support future work.

Sample collection Data acquisition and image annotation process
We ensured that the data covers as many indicators as possible in both Chinese and Western medicine, including face consultation content. Images were captured with a patented light-field camera (CN201520303463.5), called the intelligent mirror, which uses the International Commission on Illumination (CIE) L*a*b* color space (CIELAB) [6]. Patients' health conditions may pose challenges during data collection; for example, some patients with cerebral infarction may have difficulty sticking out their tongue. Table 3 indicates other items that were manually evaluated via expert judgment, including tongue shape (i.e., thick index), size (i.e., macroglossia index), the presence of tooth marks or a fissured tongue, and the degree of smoothness or shininess. We additionally documented the condition of each patient at the time of photography in a text file located in the same directory as the face and tongue images, including the patient's sitting height and the distance between the patient and the intelligent mirror. Table 4 lists all the indicators used in the manually annotated documents.
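For readers unfamiliar with CIELAB, the following is a minimal sketch of how a single pixel could be mapped into L*a*b* coordinates. It assumes the input is an 8-bit sRGB value and a D65 reference white; the source does not state how the intelligent mirror encodes or converts its images, so the function name `srgb_to_lab` and these assumptions are illustrative only.

```python
# Illustrative sketch only: assumes 8-bit sRGB input and a D65 white point.
# The camera's actual color pipeline is not described in the source.

XN, YN, ZN = 0.95047, 1.0, 1.08883  # D65 reference white (assumption)


def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELAB (L*, a*, b*)."""

    def linearize(channel):
        # Undo the sRGB transfer curve to obtain linear light in [0, 1].
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # Cube root above the threshold, linear segment below it.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / XN), f(y / YN), f(z / ZN)
    lightness = 116 * fy - 16      # L*: 0 (black) to 100 (white)
    a_star = 500 * (fx - fy)       # a*: green (-) to red (+)
    b_star = 200 * (fy - fz)       # b*: blue (-) to yellow (+)
    return lightness, a_star, b_star


if __name__ == "__main__":
    # A hypothetical pinkish pixel, chosen only for illustration.
    print(srgb_to_lab(214, 120, 130))
```

Separating lightness (L*) from the two chromatic axes (a*, b*) is what makes the space convenient for describing tongue color independently of overall brightness.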

Experimental design, materials and methods
Manual annotation is a massive workload for physicians. A total of 12 items need to be merged into an annotated document, including various indexes related to tongue diagnosis, physical or mental factors, clinicians' observations, and so forth. To help mitigate the workload associated with manual annotation, we used a previously designed algorithm to generate templates automatically [7]. Under the K-means paradigm, our algorithm (1) embedded each annotated document into a 64-bit vector representation for the first 100 patients, (2) partitioned those vectors into several (e.g., K = 10) clusters via the Hamming distance, and then (3) designated each cluster representative as a prototype template, which is the real annotated document closest to the centroid. For the remaining 588 patients, we used one of the specified prototype templates (manually evaluated as the best) to assist with the annotation.

Table 3 (fragment; earlier rows not recovered)
[...] (see Table 4) | clinical observations
Opinion for laboratory indicators | free text | 20 indicators (see Table 4) | clinical observations
Additional information | free text | the patient's sitting height and the distance between the patient and intelligent mirror | other factors

Table 4 Indicators used in the manually annotated dataset.

To our knowledge, this is the largest ongoing study to date to create a dataset of continuously collected, labeled tongue images. We envision that applying machine learning (such as deep neural networks) to tongue images might provide a useful diagnostic tool for geriatricians. This labeled dataset may serve as training data in multiple scenarios in the future, such as describing Chinese tongue diagnosis systematically, predicting geriatric syndromes using tongue appearance, and even developing an mHealth application to provide individualized health suggestions based on tongue appearance.
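The template-generation procedure described in this section (64-bit codes, clustering under the Hamming distance, and selecting the real document nearest each centroid) can be sketched roughly as follows. This is not the authors' implementation from [7]: the embedding step is omitted, the bitwise-majority centroid update is one plausible binary analogue of the K-means mean, and the names `hamming`, `majority_centroid`, and `cluster_prototypes` are hypothetical.

```python
# Illustrative sketch only; not the algorithm published in [7].
# Documents are assumed to be already embedded as 64-bit integer codes.
import random

BITS = 64


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def majority_centroid(codes):
    """Bitwise majority vote (assumed stand-in for the K-means mean)."""
    centroid = 0
    for bit in range(BITS):
        ones = sum((code >> bit) & 1 for code in codes)
        if 2 * ones > len(codes):
            centroid |= 1 << bit
    return centroid


def cluster_prototypes(codes, k, iterations=20, seed=0):
    """Return [(prototype_index, member_indices)] for each non-empty cluster."""
    if k > len(codes):
        raise ValueError("k must not exceed the number of documents")
    rng = random.Random(seed)
    centroids = rng.sample(codes, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, code in enumerate(codes):
            nearest = min(range(k), key=lambda j: hamming(code, centroids[j]))
            clusters[nearest].append(idx)
        # Update step: recompute each centroid from its members.
        updated = [
            majority_centroid([codes[i] for i in members]) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Prototype step: the real document closest to each centroid.
    result = []
    for j, members in enumerate(clusters):
        if members:
            proto = min(members, key=lambda i: hamming(codes[i], centroids[j]))
            result.append((proto, members))
    return result


if __name__ == "__main__":
    rng = random.Random(42)
    demo_codes = [rng.getrandbits(BITS) for _ in range(100)]  # synthetic codes
    for proto, members in cluster_prototypes(demo_codes, k=10):
        print(f"prototype document {proto} represents {len(members)} documents")
```

Choosing an actual annotated document as the prototype, rather than the synthetic centroid itself, keeps every template a valid, human-readable annotation that a physician can review.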

Limitations
Our work was based on a single institution, Yueyang Integrated Traditional Chinese Medicine and Western Medicine Hospital, and on patients of a single ethnicity (Chinese); the data therefore might not generalize to other racial and ethnic groups.
The two free-text annotated documents may be misleading because of translation from their original language and because of algorithmic bias. These documents were initially written in Chinese, then translated into English by a medical student and approved by at least one of the experts.

Ethics statement
We registered the trial at ClinicalTrials.gov. This project was approved by Yueyang Hospital's IRB and does not involve any "minority groups or other sensitive or disempowered populations." All participants signed the consent form and agreed to share their data, including face and tongue images.