Injury prevention for older adults: A dataset of safety concern narratives from online reviews of mobility-related products

Older adults are among the fastest-growing demographic groups in the United States, increasing by over a third this past decade. Consequently, the older adult consumer product market has quickly become a multi-billion-dollar industry in which millions of products are sold every year. However, the rapidly growing market raises the potential for an increasing number of product safety concerns and consumer product-related injuries among older adults. Recent manufacturer and consumer injury prevention efforts have begun to turn towards online reviews, as these provide valuable information from which actionable, timely intelligence can be derived and used to detect safety concerns and prevent injury. The presented dataset contains 1966 curated online product reviews from consumers, equally distributed between safety concerns and non-concerns, pertaining to product categories typically intended for older adults. Identified safety concerns were manually sub-coded across thirteen dimensions designed to capture relevant aspects of the consumer's experience with the purchased product, facilitate the safety concern identification and sub-classification process, and serve as a gold-standard, balanced dataset for text classifier learning.



Keywords:
Safety concerns; Health informatics; Older adults; Injury prevention

Data was obtained by extracting publicly available Amazon.com reviews through the use of an automated script. The software artifact was designed to identify and extract product reviews pertaining to the 'assistive-aid' category, which encompasses all relevant mobility assistive devices.

Data format: Raw, Analyzed

Description of data collection: The script extracted a total of 633,141 reviews, distributed across 3260 unique products (Amazon Standard Identification Numbers [ASINs]), from assistive-aid product categories often used by older adults (itemized in

Value of the Data
• A dataset with this volume of narratives does not exist in the older adult injury-prevention community. Researchers, manufacturers, and regulators alike may utilize the sub-coded narratives to identify safety concerns, prevent older adult mobility-related injuries, and extract valuable context that helps better understand product-risk vs. personal-risk among older adults.
• The dataset was intentionally balanced by product category (equal number of safety concerns and non-concerns for each product type) to facilitate machine learning classifier training and testing.
• Trained classifiers can be used to automate the review labeling (sub-coding) process and quickly determine individual product risk levels across large datasets of online consumer reviews of products for older adults.
• Regulating agency product-risk early warning systems may benefit more from a dataset focused on injury prevention, such as the one presented, than from conventional, historic hospital-incident-reporting datasets.

Data Description
Although formal medical narratives describing consumer-product-related hospitalizations of older adults are retrospectively captured in the United States' National Electronic Injury Surveillance System (NEISS), informal consumer narratives highlighting prospective safety concerns for these products have not been systematically collected. The dataset we describe here aims to fill this void. The dataset contains 1966 sub-coded Amazon.com mobility-related product reviews, distributed across eight mobility-related product categories (see Table 1). Each product category was intentionally stratified and balanced (equal number of safety concerns and non-concerns) to facilitate its use in training and testing machine learning classifiers. Within the file, safety concerns and non-concerns are separated across two worksheets, 'Concerns Dataset' and 'Non-Concerns Dataset', respectively. Identified concerns were manually sub-coded and classified according to the categorical and binary dimensions presented in the following section.
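The per-category balance described above can be checked with a few lines of code. The following is a minimal sketch, assuming each review has already been reduced to a (product category, safety-concern flag) pair; the function names and the toy category labels are hypothetical and are not taken from the dataset file.

```python
def balance_by_category(rows):
    """Count (non-concerns, concerns) per product category.

    rows: iterable of (category, is_concern) pairs, with is_concern in {0, 1}.
    The pair layout is an assumption made for this illustration.
    """
    counts = {}
    for category, is_concern in rows:
        pair = counts.setdefault(category, [0, 0])
        pair[is_concern] += 1  # index 0: non-concerns, index 1: concerns
    return {category: tuple(pair) for category, pair in counts.items()}


def is_balanced(rows):
    """True when every category has as many concerns as non-concerns."""
    return all(non == con for non, con in balance_by_category(rows).values())


# Toy rows with hypothetical category labels.
toy_rows = [("cane", 1), ("cane", 0), ("walker", 1), ("walker", 0)]
toy_is_balanced = is_balanced(toy_rows)
```

Applied to the union of the two worksheets, a check of this kind should hold for each of the eight product categories if the stratification is as described.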

Hazard narrative sub-coding categories
Hazard narratives are described by a total of 43 features. The first eleven features correspond to general review information, such as title and date posted, while the remaining thirty-two contain relevant sub-coded information regarding product performance, consumer injuries, design guidance, and author type. These sub-coded features are split across a number of categorical (5) and binary (7) dimensions, which are designed to capture relevant aspects of the customer's narrative with the purchased product (Fig. 2).
Categorical dimensions (described below) contain information concerning body part affected, injury type, severity, product performance, and design guidance:
• Injury timing: Injury severity (none, potential, minor, major), as per definitions in

Binary features document consumer product-related falls, guidance type, author type, and specific product defects, namely, pain while using (DSPI), part breakage, and poor surface handling:
• Safety concern: Coded as 1 when the Injury Timing and Severity dimension was Potential Injury, Minor Injury, or Major Injury; coded as 0 otherwise. That is, coded as 0 only when the Injury Timing and Severity dimension was No Injury.
• Fall: Coded as 1 when the consumer described a fall by themselves or by someone using the product; coded as 0 otherwise. Example: "My 91-year-old father has had this item for less than 9 months and he fell when getting out of bed".
• Part Breakage: Coded as 1 if the review clearly states a piece broke off or fell off the product; coded as 0 otherwise. Examples: "The scooter scooted fine for 6 weeks. Then the front wheel assembly snapped off. The metal literally cracked and separated.", "Wheel fell off after 2 months. Had to buy replacement bolts at ACE hardware to repair."
• DSPI: Coded as 1 when the user experiences consistent Discomfort, Soreness, Pain, or Irritation while using the product; coded as 0 otherwise. Examples: "The lettering on the cushion hurt my knee to the point it made my knee bleed" or "Had to take it off my chair because it scrapped my knuckles every time I would push up out of my chair."
• Poor surface handling: Coded as 1 if product handling performance is unsatisfactory on different surfaces, the product keeps getting caught, or the product is a hazard on different surface types; coded as 0 otherwise. Examples: "if you go over a rumble strip or an incline be prepared to fall", "quite effectively and painfully delivers any shock from rough surfaces right to the injured ankle! And their small narrow size makes them easily 'chock.' A lamp cord or piece of gravel can send you tumbling if you are not careful!"
• Design guidance type: Coded as 1 if the customer provided explicit design guidance; coded as 0 if the design guidance had to be inferred. Example of explicit design guidance: "If the walker would have attached easily I think we might have avoided this disaster."
• Author type: Stored as two separate binary columns, capturing whether the review author explicitly mentions they are an older adult (Example: "tripping hazard for me (80+ years old)") or they are a caregiver to an older adult (Examples: "My mother should have had her walker …", "My husband fell because …").

Table 2 shows five example safety-concern reviews from the final dataset, with the major coded attributes alongside, for illustration. Additionally, detailed dimension summary statistics can be found within the "Safety Concern Counts" sheet, available in the labeled (sub-coded) file, and in Figs. 1-3 and Tables 3 and 4.

Safety concern binary dimension: distribution and aggregate counts (total reviews).
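The rule for the Safety concern flag stated above can be expressed directly in code. This is a minimal sketch, assuming the severity labels are stored as the strings used in the text; the function name and the choice to reject unknown labels are illustrative assumptions, not part of the dataset documentation.

```python
# Severity labels as named in the text; the exact strings stored in the
# dataset file are an assumption.
NO_INJURY = "No Injury"
CONCERN_LEVELS = {"Potential Injury", "Minor Injury", "Major Injury"}


def safety_concern_flag(injury_timing_and_severity):
    """Return 1 for a potential, minor, or major injury; 0 for no injury."""
    if injury_timing_and_severity in CONCERN_LEVELS:
        return 1
    if injury_timing_and_severity == NO_INJURY:
        return 0
    # Unknown labels are rejected here rather than silently coded as 0
    # (a design choice made for this sketch).
    raise ValueError(f"unexpected severity label: {injury_timing_and_severity!r}")


example_flags = [safety_concern_flag(s) for s in ("No Injury", "Minor Injury")]
```

Rejecting unrecognized labels makes spelling variants in the severity column visible instead of letting them be counted as non-concerns.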

Experimental Design, Materials and Methods
Data was obtained by extracting Amazon.com reviews through the use of an automated script. The script extracted a total of 633,141 reviews, distributed across 3260 unique products (Amazon Standard Identification Numbers [ASINs]), from assistive-aid product categories often used by older adults (itemized in Table A.1 in Appendix A). In the Initial Exploratory Phase of data coding (described in detail under Technical Validation: Initial Exploratory Phase), 50,000 randomly selected reviews were labeled in order to detect and flag potential safety concern narratives. A total of 3100 flagged narratives were shortlisted from this large-scale labeling effort.

Product category examination
Prior to beginning safety concern sub-coding, one of the investigators, a full professor in Human Development and internationally recognized expert in gerontology at an R1 university in the United States, examined the product categories of the shortlisted reviews. This examination identified nineteen major product categories, seen in panel (a) of Table 5. Categories determined to be unrelated to mobility assistance, such as shower aids (seen in panel (b) of Table 5), were removed from the dataset, leaving 1045 potential safety concern narratives for mobility-related products for older adults. While most categories in panel (a) of Table 5 contain products predominantly used by older adults (e.g., rollators, walkers, canes, wheelchairs, and ramps), some categories (e.g., kneewalkers, scooters, and crutches) comprise products that are also frequently used by younger adults. The latter categories were retained because older adults occasionally use these products and because the safety-related consumer narratives can be regarded as helpful regardless of the age of the review's author. Retaining solely mobility-related products allowed for the creation of finely tailored, category-specific sub-coding dimensions, facilitating the construction of a high-quality dataset.

Data coding (confirmatory phase)
In the Confirmatory Phase, two graduate assistants at an R1 public land-grant university in the United States individually labeled the 1045 shortlisted potential safety concerns according to Injury Type (described in the previous section), confirming that 983 reviews were indeed safety concerns. The graduate assistants then further sub-coded features across a number of categorical and binary dimensions (described in the previous section). Categorical dimensions contain information concerning injury timing, body part affected, injury type, injury severity, design guidance, and general product performance attributes that relate to the safety concern (e.g., slippage and breakage). N-ary dimensions were reduced to multiple binary dimensions. For example, Body Part Affected is reduced into binary columns for body groups such as head, arms, legs, and so forth. Binary features also capture consumer-product-related falls, guidance type, author type, and specific product defects, such as pain while using, part breakage, and poor surface handling. Coding of the Injury Type and Body Part Affected dimensions by the two graduate assistants in the Confirmatory Phase was performed according to the Consumer Product Safety Commission (CPSC) National Electronic Injury Surveillance System (NEISS) coding manual [2].
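The reduction of an n-ary dimension to multiple binary columns can be illustrated as follows. This is a minimal sketch, assuming a review's affected body groups are available as a list of strings; the group list is an illustrative subset taken from the example in the text, and the column naming scheme is hypothetical.

```python
# Illustrative subset of body groups mentioned in the text; the dataset's
# actual grouping and column names are not reproduced here.
BODY_GROUPS = ["head", "arms", "legs"]


def to_binary_columns(affected_groups, groups=BODY_GROUPS):
    """Map the body groups named in one review to 0/1 indicator columns."""
    affected = {g.strip().lower() for g in affected_groups}
    return {f"body_part_{g}": int(g in affected) for g in groups}


# A review mentioning injuries to the legs and the head.
example_row = to_binary_columns(["Legs", "head"])
```

Because each group gets its own column, a single review can record several affected body parts at once, which a single categorical column could not.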

Technical validation: initial exploratory phase (50,000 reviews)
In the Initial Exploratory Phase, safety-concern discovery volume was prioritized. Coding was completed by two hundred and twenty-nine (229) graduate (Masters-level) students at an R1 public land-grant university in the United States. We assessed inter-rater reliability via quadratic weighted Cohen's κ (1960) [1], as our labeling structure is best represented as an ordinal problem with varying degrees of disagreement between coders. For example, coding a review as "minor injury" versus "major injury" means the two coders were close and not totally discordant, whereas coding a review as "major injury" versus "no injury occurred" is a more significant disagreement. Hence, traditional (unweighted) Cohen's κ scores may not accurately reflect our inter-rater reliability, as they fail to factor in varying degrees of disagreement between coders.
Quadratic weighted scoring, popular because of its practical interpretations, assigns nonlinearly distributed weights to 'observed' and 'by chance' probabilities, allowing for the representation of varying levels of agreement between raters and therefore providing a more accurate reading of our inter-rater reliability [3].
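To make the weighting concrete, the following is a minimal sketch of the standard quadratic weighted κ computation, not the authors' actual script. It assumes the four severity levels are encoded as the ordinal integers 0 (no injury) to 3 (major injury); that encoding and the function name are assumptions made for illustration.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, num_levels):
    """Quadratic weighted Cohen's kappa for two raters.

    ratings_a, ratings_b: equal-length sequences of integer levels in
    range(num_levels). The disagreement weight for cell (i, j) is
    (i - j)^2 / (num_levels - 1)^2, so distant disagreements cost more.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = [[0] * num_levels for _ in range(num_levels)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(num_levels))
                  for j in range(num_levels)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under chance agreement (independent raters).
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one level")
    return 1.0 - weighted_observed / weighted_expected


# Assumed encoding: 0 = none, 1 = potential, 2 = minor, 3 = major.
coder_a = [0, 1, 2, 3, 0, 3]
near_miss = [0, 1, 2, 3, 1, 3]  # one disagreement between adjacent levels
far_miss = [0, 1, 2, 3, 3, 3]   # one disagreement between extreme levels
kappa_near = quadratic_weighted_kappa(coder_a, near_miss, 4)
kappa_far = quadratic_weighted_kappa(coder_a, far_miss, 4)
```

In the toy data both alternative coders disagree with `coder_a` on exactly one review, yet the adjacent-level disagreement yields a higher κ than the extreme one, which is the behavior motivating the weighted score.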
Kappa scores were calculated by identifying double-coded reviews (2260) and filtering out unreliable coders who marked an unusually high proportion of reviews as safety concerns ("rogue coders"). This step generally accounts for most of the disagreements (see Fig. 4). Rogue coders were filtered out using the following criteria:
• More than 20% of their total labels are safety concerns.
• Tagged more than four reviews with 'Major Injury'.
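The two filtering criteria can be sketched as below. The text does not state whether a coder must meet one or both criteria to be removed; this sketch assumes that meeting either one is sufficient. The input layout, label strings, and function name are likewise hypothetical.

```python
from collections import defaultdict


def find_rogue_coders(labels, max_concern_share=0.20, max_major=4):
    """Return the set of coder ids flagged as unreliable.

    labels: iterable of (coder_id, severity) pairs, where any severity other
    than "No Injury" counts as a safety concern (assumed label strings).
    A coder is flagged if EITHER criterion is met (an assumption).
    """
    totals = defaultdict(int)
    concerns = defaultdict(int)
    majors = defaultdict(int)
    for coder, severity in labels:
        totals[coder] += 1
        if severity != "No Injury":
            concerns[coder] += 1
        if severity == "Major Injury":
            majors[coder] += 1
    return {
        coder
        for coder in totals
        if concerns[coder] / totals[coder] > max_concern_share
        or majors[coder] > max_major
    }


# Toy labels: coder "a" is within both limits, "b" exceeds the 20% share,
# "c" exceeds the 'Major Injury' count while staying under the share limit.
toy_labels = (
    [("a", "Minor Injury")] + [("a", "No Injury")] * 9
    + [("b", "Potential Injury")] * 3 + [("b", "No Injury")] * 7
    + [("c", "Major Injury")] * 5 + [("c", "No Injury")] * 95
)
rogue = find_rogue_coders(toy_labels)
```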
Filtering efforts resulted in the retention of 1258 double-tagged reviews, from which the weighted Cohen's κ score was calculated. The quadratic weighted Kappa score for Injury Timing was κ = 0.59 (1258 cases; 133 disagreements; 1125 agreements; 89% agreement). Per Landis and Koch (1977) [4], this κ score represents moderate inter-rater agreement, indicating satisfactory inter-rater reliability for the exploratory safety concern discovery phase. In the case of disagreement among coders, we allowed the most conservative decision to prevail; that is, an item was regarded as a safety concern if any coder regarded it as a safety concern. Fig. 4 provides a visual representation of the effect unreliable coders have on Kappa scores for the Initial Exploratory Phase coding. (For readability, the figure is truncated to reflect only the 15 most unreliable coders.) The removal of two highly unreliable coders (x-axis) almost doubled quadratic Kappa scores (left y-axis), while still retaining just under 2000 double-tagged reviews (right y-axis), justifying rogue coder filtering efforts.

Fig. 4. Kappa score and remaining reviews by unreliable raters removed.
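The conservative tie-breaking rule and the reported agreement percentage can be reproduced with a short sketch. The function names are hypothetical; the counts are the ones reported in the text.

```python
def resolve_conservatively(coder_flags):
    """Return 1 if any coder flagged the review as a safety concern, else 0."""
    return int(any(coder_flags))


def percent_agreement(agreements, total):
    """Raw agreement between coders, as a percentage of double-coded reviews."""
    return 100.0 * agreements / total


# Counts reported for the filtered double-tagged reviews.
total_cases = 1258
disagreements = 133
agreements = total_cases - disagreements
agreement_pct = percent_agreement(agreements, total_cases)
```

With these counts, `agreements` equals the 1125 reported in the text and `agreement_pct` rounds to the reported 89%.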

Technical validation: confirmatory phase (1045 safety concerns)
In the Confirmatory Phase, safety-concern labeling reliability was prioritized. The list of safety concerns shortlisted in the Initial Exploratory Phase was re-coded by two graduate assistants (Coder A and Coder B), and sub-coded along additional dimensions. Coders A and B first coded a random set of 72 shortlisted potential safety concerns (from the Initial Exploratory Phase). For the initial 72 items, the Cohen's Kappa scores were κ = 0.68, 0.37, and 0.79 for the body part affected, injury type, and injury timing dimensions, respectively. The coders resolved discrepancies in their coding by discussion. A second random set of 72 items was coded to ensure satisfactory Cohen's Kappa scores prior to final dataset labeling. The Cohen's Kappa scores were κ = 0.84, 1.00, and 0.61 for the body part affected, injury type, and injury timing dimensions, respectively. Once again, the coders resolved discrepancies by discussion and then proceeded to code the complete dataset in parallel. The Cohen's Kappa scores for the final dataset were κ = 0.95, 0.91, and 0.89 for the body part affected, injury type, and injury timing dimensions, respectively. Hence, there was further improvement in inter-rater agreement. Table 6 shows the inter-rater agreement between graduate Coder A and graduate Coder B on the full set of 1045 items in the Confirmatory Phase.
The coders then discussed the remaining disagreements to arrive at the final labels for each record.
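For reference, unweighted Cohen's κ for two coders can be computed as in the following minimal sketch. It illustrates the standard formula, κ = (p_o − p_e) / (1 − p_e), and is not the authors' script; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two coders over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given identical labels.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both coders use one label")
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example: the coders agree on three of four items.
toy_kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In the toy example the observed agreement is 0.75 and the chance agreement is 0.5, giving κ = 0.5.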

Ethics Statements
No conflict of interest exists in this submission. The authors declare that the work described in this paper is original and not under consideration for publication elsewhere. Its publication is approved by all the authors listed.

Appendix B
Detailed sub-coding protocols followed by the graduate assistants throughout the Confirmatory Phase labeling process. Code numbers, names, and definitions for the "Body Part Affected" (Table B.2) and "Injury Type" (Table B.3) dimensions are from the CPSC NEISS Coding Manual [2], and are replicated here for convenience.

Injury Timing and Severity: Definition
• No Injury: There is no indication of an actual or potential injury as a result of product usage. Example: "A real lifesaver after Achilles Tendon surgery. So happy with the product. It is sturdy and had no problems with corners or tipping."
• Potential Future Injury: Injury could possibly occur; thus, reviewer is cautious about using the product. Example: "I bought the bed cane for my mother who has Parkinson's. We thought it was awesome until she nearly hung herself on it. She slid off the bed and got her head caught under the hooked side. Luckily, I heard her before she suffocated."
• Minor Injury: Someone was actually hurt by the product, but it was a minor incident and no doctor's visit or hospital visit was required. Example: "My 91-year-old father has had this item for less than 9 months and he fell when getting out of bed, after 2 pieces of it came off."
• Major Injury: Someone was actually seriously hurt by the product, and required a doctor or hospital visit, or died. Example: "The chair was assembled exactly as required in the instructions but was very uneven. My disabled husband used it and fell off requiring fire rescue and a three day hospital stay."

Other
84 - 25-50% of body: 25-50% of the body is identified as injured.
- Not recorded: The review does not identify an affected body part.

Product or product parts breaking/bending, poor material finishing (rough surfaces, sharp edges), pieces coming loose or falling off.
General design issues such as product or product parts bumping into user, poor maneuverability (turning radius), inadequate height or width adjustments, missing pieces.
Example: "the most annoying is my good right foot keeps catching on the back right wheel, cutting my right ankle bone"
Example: "Back wheel fell off. When I went to attach the wheel that is when I noticed the plastic groves were shredded."
Example: "adjustable pin in the front that prevents the range of motion in the front axle. It frequently works itself loose allowing the pin to drop and letting the axel turn completely sideways."

- Unstable: Product or product parts (directly relating to product stability, such as walker wheels) are unsteady, wobbly, tip over, or uneven. Example: "This thing is a death trap. It does not come with instructions; the wheels are wobbly and uneven" Example: "Wobbly and unsafe"

Table B.5 Codes and definitions for "design guidance" dimension.
Design Guidance (Improvement Type): Definition
1 - Improve Design: Suggestions for the improvement of product design, e.g., wider base, leg length, weight.
2 - Improve Construction Quality: Suggestions for manufacturer, product assembly process improvement, e.g., wobbly/loose pieces, brake issues.
3 - Improve Material Quality: Suggestions for the use of better manufacturing materials, relates to product part breakage and overall product quality.
4 - Improve Comfort: Suggestions for the improvement of product comfort, e.g., increase product padding, surface softness.
5 - Add Non-Slip Surface: Suggestions for the addition of a non-slip surface, e.g., water-proof rubber tips, rubber wheels.
6 - Remove Sharp/Protruding Piece: Suggestions for the removal of protruding pieces, such as screws, or sharp edges from the product.
7 - Improve Surface Handling: Suggestions for the improvement of product performance on different surfaces.
8 - Other: Suggestions for other, customer-specified product improvements.
9 - Implied Design Change Not Clear: Suggestions for product design change could not be determined from the provided information.