Text mining datasets of β-hydroxybutyrate (BHB) supplement products’ consumer online reviews

The current dataset was obtained by text mining consumer online reviews of β-hydroxybutyrate (BHB) supplement products. The review text for 71 BHB products was extracted with the aid of the Web Scraper Chrome extension. A lexicon-based sentiment analysis approach was then developed to classify the sentiment, or polarity, of the reviews. Both word-level and sentence-level sentiment analyses were conducted to score the analyzed text snippets. For the word-level analysis, word clouds of selected BHB products' reviews were generated for direct observation, and the statistics of high-frequency sentiment words were listed for comparison. For the sentence-level analysis, two factors, flavor and packaging, were taken into consideration to map the products' polarity distributions. In addition, a text complexity analysis provides basic statistics of the analyzed BHB customer review data.


© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license.
(http://creativecommons.org/licenses/by/4.0/)

Specifications Table

Subject: Medicine and Dentistry
Specific subject area: Health Informatics
Type of data: Table, Figure
How data were acquired: A lexicon-based sentiment analysis approach was developed to classify the sentiment or polarity of β-hydroxybutyrate (BHB) products' consumer reviews obtained from Amazon.com.
Data format: Raw, Analyzed
Parameters for data collection: The text data of β-hydroxybutyrate (BHB) products' consumer reviews were extracted with the aid of the Web Scraper Chrome extension. Positive, negative, neutral, and compound scores were assigned to the analyzed text snippets through sentiment analysis. The basic text statistics, including the number of characters, words, sentences, and unique words in those reviews, were obtained through text complexity analysis.

Description of data collection: The entire text data of β-hydroxybutyrate (BHB) products' consumer reviews on Amazon.com were collected within a 2-month period in 2019. Word-level and sentence-level sentiment analyses were then conducted on the collected text data. Through this analysis, positive, negative, neutral, and compound scores were assigned to the analyzed text. Among them, the compound score gives the overall rating within the range from −100% to +100%. Two factors, flavor and packaging, were considered to map the BHB product distributions. In addition, text complexity analysis was used to provide the text statistics of the analyzed BHB product reviews, including the number of words, sentences, and characters in the reviews.
Data source location: Amazon.com (an online data source)
Data accessibility: Data are available in the supplementary file attached to this article.

Value of the Data
• The dataset of text-mined consumer online reviews of β-hydroxybutyrate (BHB) supplements supports a new line of marketing research on dietary supplements. It helps researchers, product developers, and marketers in the field of nutrition develop new healthcare products that appeal to customers.
• Researchers, product developers, and marketing professionals in fields such as functional food, dietary supplements, and nutrition can benefit directly or indirectly from these data.
• The processed consumer feedback data cover the impact of flavor and packaging on consumer acceptance of novel dietary supplements.
• The sentiment analysis used here offers an innovative approach to analyzing customer feedback on fast-moving consumer goods (FMCG) products.

β-hydroxybutyrate (BHB) is the conjugate base of the organic compound β-hydroxybutyric acid. Previous studies demonstrated that BHB is involved in stress reduction [1], neural protection [2], seizure alleviation [3], weight loss [4], and body metabolism during starvation [5]. In this investigation, we conducted a lexicon-based sentiment analysis of BHB supplement products' customer reviews obtained from Amazon.com. The statistics of the entire data pool are summarized in Table 1. For the word-level sentiment analysis, word clouds are displayed for direct observation: brand A's BHB powder products with different flavors (Fig. 1), berry-flavored BHB powder products under different brands (Fig. 2), and BHB capsule products under different brands (Fig. 3). In addition, the composition of high-frequency sentiment words for branded BHB powder and capsule products is shown in Fig. 4. For the sentence-level sentiment analysis, two factors, flavor and packaging, were taken into account to map the BHB products' polarity distributions (Fig. 5A and Fig. 6A, respectively). The positive, negative, neutral, and compound scores assigned to the consumer reviews of a selection of BHB products are displayed in Table 2. The average compound scores by flavor and by package were calculated and are shown in Fig. 5B and Fig. 6B, respectively. Lastly, the text complexity analysis of a selection of BHB products is displayed in Table 3.

Fig. 3. Word clouds of β-hydroxybutyrate (BHB) capsule products under different brands.

Online review scrape
The text data of β-hydroxybutyrate (BHB) products' customer reviews were collected from Amazon.com with the aid of Web Scraper, a Chrome extension. After collection, text data sometimes require pre-processing or cleaning before text mining to minimize noise or bias [6]. In the reviews studied here, most users express their comments briefly and directly, and there is little of the noise and uninformative content, such as HTML tags, scripts, and advertisements, that is found in other online texts [6]. We therefore cleaned the text data simply by removing special characters and reorganizing the content for further analysis, while preserving the original review content as much as possible.
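The light cleaning step described above might look like the following minimal sketch. The function name `clean_review`, the set of characters kept, and the sample review are illustrative assumptions; the exact cleaning rules used for the dataset are not specified in this article.

```python
import re
import unicodedata


def clean_review(text: str) -> str:
    """Remove special characters and tidy whitespace, leaving the wording intact."""
    # Normalize Unicode forms and map the curly apostrophe to a plain one.
    text = unicodedata.normalize("NFKC", text).replace("\u2019", "'")
    # Assumption: keep letters, digits, whitespace, and basic punctuation only;
    # every other character is replaced by a space.
    text = re.sub(r"[^A-Za-z0-9\s.,!?'%-]", " ", text)
    # Collapse the runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical review text, not taken from the dataset.
raw = "Great   taste!!  ★★★★★ Mixes well\n\nwith water :)"
print(clean_review(raw))
```

Because only non-text symbols and extra whitespace are dropped, the wording of each review is preserved for the later sentiment steps.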

Word-level sentiment analysis
An external lexicon served as the resource for judging text sentiment, or polarity [7]. The words in the online reviews of each product were obtained by NLTK tokenization before sentiment classification [8] and were then classified as positive or negative for further analysis. In addition, word clouds were generated with a wordcloud function applied to the NLTK word-tokenized text [8].
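As an illustration of the word-level step, the sketch below tokenizes reviews and counts the words that appear in a sentiment lexicon. It is not the pipeline used for the dataset: the regex tokenizer stands in for NLTK tokenization so that the example runs without external data, and the tiny `POSITIVE` and `NEGATIVE` sets are hypothetical placeholders for the external lexicon [7].

```python
import re
from collections import Counter

# Hypothetical toy lexicon; the study used an external lexicon [7].
POSITIVE = {"great", "love", "good", "delicious"}
NEGATIVE = {"bad", "awful", "terrible", "bitter"}


def tokenize(text):
    # Simple regex tokenizer, an assumed stand-in for NLTK word tokenization.
    return re.findall(r"[a-z']+", text.lower())


def sentiment_word_counts(reviews):
    """Count lexicon words across all reviews, split by polarity."""
    pos, neg = Counter(), Counter()
    for review in reviews:
        for word in tokenize(review):
            if word in POSITIVE:
                pos[word] += 1
            elif word in NEGATIVE:
                neg[word] += 1
    return pos, neg


# Hypothetical reviews, not taken from the dataset.
reviews = [
    "Great flavor, love it.",
    "Bitter aftertaste, bad mix.",
    "Good energy and great taste.",
]
pos, neg = sentiment_word_counts(reviews)
print(pos.most_common(3))
print(neg.most_common(3))
```

Frequency tables of this kind are also what a word cloud visualizes: the more often a word occurs, the larger it is drawn.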

Sentence-level sentiment analysis
VADER sentiment analysis was performed on the sentence-tokenized text of BHB products' reviews to obtain positive, negative, neutral, and compound scores [9]. This approach quantifies how positive or negative a snippet under analysis is. Specifically, each sentence-level snippet is assigned four scores: positive, negative, neutral, and compound. Among the four, the compound score is the sum of the lexicon ratings in the snippet, normalized to the range from −100% (most extreme negative) to +100% (most extreme positive). The higher the compound score, the more positive the overall sentiment.
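The normalization behind the compound score can be sketched as follows. To our understanding, VADER maps the summed word ratings x to x / sqrt(x² + α) with α = 15 by default; the ±0.05 cut-off in `label` is a commonly used convention and an assumption here, not a threshold stated in this article, and the word ratings in the example are hypothetical.

```python
import math


def normalize(score_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded sum of lexicon ratings into the range [-1, 1]."""
    norm = score_sum / math.sqrt(score_sum * score_sum + alpha)
    return max(-1.0, min(1.0, norm))


def to_percent(compound: float) -> float:
    """Express a compound score on the -100% to +100% scale used in this article."""
    return round(100.0 * compound, 1)


def label(compound: float, threshold: float = 0.05) -> str:
    # Assumed cut-off of +/-0.05 for illustration only.
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical word ratings for one sentence-level snippet.
ratings = [1.9, 3.1]
compound = normalize(sum(ratings))
print(to_percent(compound), label(compound))
```

The square-root form keeps the score bounded however many sentiment words a snippet contains, which is what makes compound scores comparable across short and long reviews.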

Text complexity analysis
Text complexity analysis gives a statistical summary of the collected text data: the number of online reviews for each product and the number of characters, words, sentences, and unique words in those reviews. It offers one more dimension from which to view the text data, judge the text features, and support predictions about the product market.

Fig. 6. Sentence-level sentiment analysis of BHB products' online reviews with package focus. (A) Bubble chart of BHB product package versus dosage form; bubble size indicates the compound score, ranging from −100% to +100%. (B) Bar chart of BHB product average compound score versus package; the line Average Compound = 20% is the differentiating line.

Table 1 shows the statistics of the BHB product review data collected on Amazon.com. The reviews were collected within a 2-month period in 2019. The entire text data set includes 30,877 reviews, 105,703 sentences, and 1,574,171 words. These reviews reflect customers' feedback and comments on 71 products under 26 brands.
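The statistics named in the text complexity analysis can be computed along the lines of the sketch below. The regex-based word and sentence splitting is a simplifying assumption (a proper tokenizer would handle abbreviations and similar cases better), and the sample reviews are hypothetical. The last lines derive per-review averages from the corpus totals reported for Table 1.

```python
import re


def text_stats(reviews):
    """Basic text statistics for the reviews of one product."""
    text = " ".join(reviews)
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    # Naive sentence split on ., ! and ?, an assumed stand-in for a sentence tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "reviews": len(reviews),
        "characters": sum(len(r) for r in reviews),
        "words": len(words),
        "sentences": len(sentences),
        "unique_words": len(set(words)),
    }


# Hypothetical reviews, not taken from the dataset.
sample = ["Great taste. Mixes well!", "Too sweet for me."]
stats = text_stats(sample)
print(stats)

# Corpus totals as reported for Table 1.
corpus = {"reviews": 30877, "sentences": 105703, "words": 1574171}
sentences_per_review = corpus["sentences"] / corpus["reviews"]
words_per_review = corpus["words"] / corpus["reviews"]
print(round(sentences_per_review, 2), round(words_per_review, 2))
```

On the reported totals, this works out to roughly three to four sentences and about fifty words per review, consistent with the brief, direct style of the reviews.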

Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.