Lifecycle forecast for consumer technology products with limited sales data

Early lifecycle demand forecast is critical to consumer technology products with a fast innovation speed, as firms which compete on these products focus on timely responding to market changes through new product development and efficient product diffusion, rather than sustaining product sales. The challenge for obtaining an accurate long-range forecast is that sales volumes at the early lifecycle stages are small, which limits the forecast accuracy. We propose a two-step lifecycle forecast approach for consumer technology products with limited sales data. First, we segment products based on market and clustering. Second, we apply the Bass model to aggregated products in a group using the average periodic sales of all products in the group and then use the forecast for related new products. We validate our approach using a dataset collected from Philips Netherlands, which contains consumer healthcare products sold in US and China over an 8-year timespan. The results suggest that for forecasting the lifecycle of a new product, models based on aggregated products generally perform better than models based on an individual product. It highlights the value of data aggregation in product lifecycle forecasts. Clustering is also useful for improving the forecast accuracy: when aggregation is done using sufficient product sales data, the aggregated model based on products with which the new product has the most sales pattern similarities could provide a more accurate forecast than other aggregated models. Based on our results, we provide a practical guideline to firms for obtaining an accurate early product lifecycle forecast.


Introduction
In today's business world, firms often need to repeatedly introduce new products or services that satisfy previously unmet demand to maintain high profitability. This is especially true for firms which focus on products with a fast innovation speed, i.e., the speed at which product technology is being updated in the market (these products are also known as innovative products in Fisher (1997)). Examples of such products are fast fashion, health care services, and consumer technology products, including electronic devices used for personal entertainment, communications, and recreation. As a result of this fast innovation speed, these products are only available in the market for a short amount of time: around 1.5-5 years (Gaimon and Singhal, 1992). A report from McKinsey shows that the lifecycle of fast fashion clothing can be as short as 2 weeks (Achim Berg et al., 2018). For innovative products, the chance of consumer repurchasing is slim due to constant new product launches in the market. Therefore, rather than sustaining product sales, firms with innovative products aim to respond promptly to market changes through new product development and efficient product diffusion (Larina, 2017).
To respond quickly, e.g., deciding whether to increase marketing efforts at a given time, firms need to characterize the (remaining) sales trajectories of products. For short-lifecycle products, such forecasts are more valuable when obtained at the earlier lifecycle stages, given the four lifecycle phases, i.e., the introduction phase, the growth phase, the maturity phase, and the decline phase (Saaksvuori and Immonen, 2008). Especially for consumer technology products, whose sales concentrate in the short growth phase and quickly fall off afterwards (An et al., 2021), if the lifecycle forecast is only conducted at the late stages, firms will have little time to react and implement strategies.
The challenge lies in obtaining an accurate lifecycle forecast at the early lifecycle stages with limited sales data. Short lifecycles generally imply small volumes of sales. Compared to some other innovative products with short lifecycles, such as fast fashion clothing, consumer technology products have an even smaller sales volume due to their relatively expensive prices and durable product attributes. In addition, with sales concentrating in the short growth phase, the sales volume of consumer technology products is extremely small at the introduction phase.
We propose a lifecycle forecast approach for consumer technology products with limited sales data and irregular lifecycle patterns. Our approach consists of two steps. First, we segment products based on market and clustering. Second, we apply the diffusion model to each group of products using the average periodic sales of all products in the group and then use the forecast for related new products. We apply our method to a dataset collected from Philips Netherlands, which contains consumer technology products such as electronic toothbrushes, sold in two major markets, the US and China, over an 8-year timespan. By applying the Bass model to each product, we identify the innovation effect, the imitation effect and the market potential of the product. Furthermore, we apply the Bass model to forecast the lifecycles of new products. We generate various Bass models using aggregated products in a market or cluster. By comparing the fit of these models to the independent testing products, we find that models based on aggregated products generally provide a better forecast accuracy than models based on an individual product. When the aggregation is done using sufficient product sales data, the aggregated model based on products with which the new product has the most sales pattern similarities could provide a more accurate forecast than other aggregated models.
We contribute to the literature and practice in three ways. First, we extend the seminal product lifecycle forecast method to consumer technology products with limited sales data. Current lifecycle forecast methods focus on products which have a mid-range lifecycle and thus sufficient sales data. Second, by comparing the performance of different models based on clusters that differ in the number of products and in their similarity to the new product, we reveal the value of clustering, the value of aggregation, and the value of applying the diffusion model in product lifecycle forecasts. Current methods for forecasting sales patterns of new products mostly focus on clustering with a sufficiently large product sample and use previously observed patterns in a cluster for related new products. Third, we generate practical steps for obtaining an accurate early lifecycle forecast. These steps are not limited to consumer technology products, since the lack of data generally applies to all products at the early lifecycle stages. Our approach does not require firms to collect data on external factors; instead, it uses clustering and aggregates the existing sales data in a cluster to improve the forecast accuracy.
The remainder of the paper is organized as follows. In Section 2, we review the related literature. In Section 3, we introduce our two-step lifecycle forecast method. In Section 4, we validate our method using a dataset collected from Philips Netherlands. In Section 5, we generate a practical guideline to firms for obtaining an accurate early lifecycle forecast.

Literature review
Our work is related to two streams of literature: the literature on sales forecasts and the literature on product lifecycle forecasts. First, there are three main categories of sales forecast methods: causal models, time series forecast methods, and AI-based forecast methods. Causal models express relevant causal relations between external factors and product sales (Varian, 2016). Some examples of such factors are consumer attitudes obtained by using surveys, macroeconomic indicators such as the inflation rate (Pal et al., 2014), as well as product features (Spirtes and Zhang, 2016). Regression models which assume a specific relation between independent variables and dependent variables are the most commonly used causal models (Lu and Wang, 2010). Sales of some products can also be used as an indicator for sales of some other products (Tsoumakas, 2019). Kohli et al. (2021) developed a k-nearest neighbors regression model which first groups stores based on the store type and then uses factors such as the number of competing stores in the neighborhood and promotion activities to predict product sales at each chain drug store in Germany. Causal models usually require at least 2 years of relevant data to achieve a good forecast accuracy (Brillio, 2018). However, for products with short lifecycles, such data requirements are difficult to meet. Furthermore, the current applications of causal models in sales forecasts are limited to point forecasts in the near future, e.g., sales for next month. For short-lifecycle products, an important task is to predict the entire sales trajectory, i.e., a long-range forecast, as it helps firms better respond to market changes.
The second category of sales forecast methods uses patterns in historical time-series sales data to predict future sales. Pongdatu and Putra (2018) used the exponential smoothing method for forecasting sales of clothing for the next season. Arunraj and Ahrens (2015) developed a seasonal autoregressive integrated moving average model for forecasting daily sales of a perishable product. Hiranya Pemathilake et al. (2018) proposed a hybrid model based on ARIMA and a neural network for forecasting monthly sales of Apple products. Similar to causal models, time-series forecast methods require sufficient data to detect repeated patterns (De Gooijer, 2017), and they also mainly focus on near-future point forecasts, which limits their application to other forecast problems.
The third category of sales forecast methods uses machine learning algorithms such as artificial neural networks (Chawla et al., 2019), support-vector machines (Villegas et al., 2018), and random forests (Punia et al., 2020). Jain et al. (2015) proposed a Bayesian model which learns the patterns in previous consumer purchases for forecasting next-month demand. Ma et al. (2016) developed an AI-based method which uses historical sales data and factors such as product prices for forecasting weekly grocery sales. Loureiro et al. (2018) used deep neural networks for forecasting sales of women's handbags for the next season. Compared to the previous two categories of forecast methods, AI-based methods require an even larger amount of data input as their goal is to provide a precise forecast with a smaller time range, e.g., a real-time forecast.
Our work belongs to the stream of literature on product lifecycle forecasts, which predict the diffusion process (thus the lifecycle) of durable products. Different from a near-future sales forecast, which gives a possible future sales value, a diffusion process forecast generates a sequence of sales over a period of time further into the future. There are two main categories of models for forecasting the diffusion process: diffusion models (Qin and Nembhard, 2012; Guo, 2014; Dev et al., 2020) and AI-based diffusion recognition models (Miao et al., 2017; Velasco et al., 2019). As one of the earliest diffusion models, the Bass model predicts sales growth for a durable new product using only sales data (Bass et al., 1994). Seol et al. (2012) proposed a diffusion model for new products which compete with existing products and investigated the impact of product competition on the diffusion process. Ganjeizadeh et al. (2017) adopted the Bass model to generate sales forecasts for a new high-tech product, using data of previous products which share similar features with the new product. Tseng et al. (2012) used the diffusion model to analyze the development of Taiwan's TV market over the next decade. Song et al. (2015) proposed a hybrid Bass-Markov model to predict the diffusion process of wireless broadband service in Korea.
AI-based diffusion recognition models use methods such as machine learning and deep learning to acquire previous diffusion patterns. By using historical sales data and data about promotion events, pricing, weather conditions, and competitors' behaviors, Pavlyshenko (2019) compared different machine learning approaches for capturing lifecycle patterns of drug store products. Xiao and Han (2016) developed an agent-based model built on a hidden influence network for forecasting the diffusion process of a new product. By analyzing consumer reviews and ratings, Aggrawal et al. (2017) developed a data mining-based approach to predicting long-term diffusion patterns of products on interactive e-commerce sites. A diffusion process forecast for a new product can also be obtained by clustering and using the diffusion process of previous products related to the new product. Hu et al. (2019) developed a clustering-based product lifecycle forecast method. They fitted various-shaped lifecycle curves to historical customer sales data, clustered the curves of similar products, and used the representative curve of a cluster as the forecast for a related new product.
The current diffusion process forecasts are mainly for products with sufficient sales data (Massiani and Gohs, 2015). If a product has a short lifecycle and irregular sales patterns, the applicability of the current methods is not guaranteed. In addition, to capture the pattern in a range, learning models often require multivariate data inputs, which limits the model generalization to different industry examples. Our research focuses on short-lifecycle products, and our lifecycle forecast method is built on an existing diffusion model, which requires only time-series sales data. By clustering and aggregating products, we show how the forecast accuracy of the model can be improved.

Methodology
We develop an approach to projecting lifecycle demand for consumer technology products with a short lifecycle. Firms that compete on these products focus on responding promptly to market changes through new product development and efficient product diffusion, rather than sustaining product sales. To respond quickly, e.g., deciding whether to increase marketing efforts at a lifecycle stage, firms need to perform timely and accurate forecasts of the (remaining) sales trajectories of products. Sales of short-lifecycle consumer technology products usually concentrate in the growth phase, after which sales rapidly decline. This highlights the importance of lifecycle forecasts at the early stages. However, the sales volume before the growth phase is extremely small, which makes it difficult for firms to obtain an accurate forecast.
To tackle the data limitation issue, our lifecycle forecast approach consists of two steps. First, we segment products based on market and clustering. Second, for forecasting the (remaining) lifecycle of a new product or a product with limited sales data, we apply the diffusion model to a group of products with which the new product has sales pattern similarities. Then we use the estimated diffusion pattern for the new product.

Step 1. product segmentation
The first step is to segment products based on heterogeneity in sales patterns. If products are sold in multiple markets, they can be first grouped based on market (Hahn et al., 1994). If there are distinct differences in terms of product features, products can also be grouped based on features. If products in the same market or category still show significant sales pattern heterogeneity, clustering methods should be adopted to further segment products. Time series sales data can be directly used in clustering, and similarities between multiple time series are defined by Euclidean distance metrics. There are three types of classic clustering methods: k-means clustering, mean-shift clustering, and hierarchical clustering. All three methods partition n data points into k clusters based on heterogeneity in data. The difference is that the value of k can be endogenously determined in k-means clustering (Chen and Lu, 2017), whereas it is exogenously determined in mean-shift clustering and hierarchical clustering. Mean-shift clustering iteratively assigns data points to clusters by shifting points towards the mode (i.e., the highest density of data points in the region). Hierarchical clustering starts with considering each data point as an individual cluster and iteratively merges similar clusters until one cluster or k clusters are formed. Different from the other two clustering methods, hierarchical clustering strives to build up a hierarchical graph of data, i.e., a set of nested clusters that are arranged as a tree. Among the three clustering methods, k-means clustering guarantees convergence due to the endogenous choice of k, whereas the other two methods are often not computationally tractable for large data sets. Thus, k-means clustering is the most commonly used clustering method.
To apply k-means clustering, the Silhouette method can be used to determine the appropriate value of k (Rousseeuw, 1987). Given k, it calculates the distance between product i in a cluster and neighboring clusters, referred to as the Silhouette width of product i, as follows:
$$w_{(i,t)} = \frac{y_{(i,t)} - x_{(i,t)}}{\max\{x_{(i,t)},\, y_{(i,t)}\}} \qquad (1)$$
where subscript $(i, t)$ refers to sales data of product i in period t. $x_{(i,t)}$ represents the average distance between sales of product i and that of all other products in the same cluster in period t, and $y_{(i,t)}$ represents the minimum average distance between sales of product i and that of all products in any other cluster in period t. The value of k which leads to the largest average Silhouette width $w_{(i)}$ over all products is the optimal value of k. An efficient way to select an appropriate value of k is to select a range of integer k values and compare the corresponding average $w_{(i)}$ (Lletí et al., 2004). This range of k values starts with 1, and the largest possible value of k is proportional to the total number of products in a dataset.
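The Silhouette width described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it follows the standard form (y - x) / max(x, y) from Rousseeuw (1987), and it assumes that distances are Euclidean distances between whole sales series of equal length. The function names and the toy sales figures are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sales series."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def silhouette_widths(series, labels):
    """Silhouette width of every series, given one cluster label per series."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own or len(clusters) < 2:
            # Convention (assumption): singleton clusters and k = 1 get width 0
            widths.append(0.0)
            continue
        # x: mean distance to the other products in the same cluster
        x = sum(euclidean(series[i], series[j]) for j in own) / len(own)
        # y: smallest mean distance to the products of any other cluster
        y = min(
            sum(euclidean(series[i], series[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(x, y)
        widths.append((y - x) / denom if denom > 0 else 0.0)
    return widths

def average_silhouette(series, labels):
    """Average Silhouette width, the quantity compared across values of k."""
    w = silhouette_widths(series, labels)
    return sum(w) / len(w)

# Toy monthly sales for four hypothetical products (two low-volume, two high-volume)
sales = [[1, 5, 9, 4], [2, 6, 10, 5], [20, 30, 25, 10], [22, 28, 26, 12]]
good = average_silhouette(sales, [0, 0, 1, 1])  # groups similar products together
bad = average_silhouette(sales, [0, 1, 0, 1])   # mixes the two volume levels
```

A partition that groups similar series yields an average width close to 1, while a partition that mixes dissimilar series yields a negative average, which is why the value of k with the largest average width is preferred.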
Once the value of k is determined, k-means clustering decides which data points are assigned to each cluster, $c_1, \dots, c_k$, in order to minimize the distances between all points in each cluster and the cluster center as follows:
$$\min_{c_1,\dots,c_k} \; \sum_{n=1}^{k} \sum_{i \in c_n} \sum_{t} \left(i_t - \mu_t^n\right)^2 \qquad (2)$$
where $i_t$ is the sales of product i in period t and $\mu_t^n$ is the mean of data in period t in cluster $c_n$.
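As an illustration of the assignment problem above, the following sketch implements Lloyd's algorithm, the usual heuristic for the k-means objective. It is a simplified example under stated assumptions: all sales series have the same length, initial centers are drawn at random from the data, and the function name, seed, and toy data are hypothetical rather than taken from the study.

```python
import random

def kmeans(series, k, n_iter=100, seed=0):
    """Lloyd's algorithm on equal-length sales series; returns (labels, centers)."""
    rng = random.Random(seed)
    # Assumption: initialise centers with k distinct series drawn at random
    centers = [list(s) for s in rng.sample(series, k)]
    labels = [None] * len(series)
    for _ in range(n_iter):
        # Assignment step: each product joins the cluster with the nearest center
        new_labels = [
            min(range(k), key=lambda n: sum((v - c) ** 2 for v, c in zip(s, centers[n])))
            for s in series
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center becomes the per-period mean of its members
        for n in range(k):
            members = [s for s, lab in zip(series, labels) if lab == n]
            if members:
                centers[n] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy monthly sales for four hypothetical products
sales = [[1, 5, 9, 4], [2, 6, 10, 5], [20, 30, 25, 10], [22, 28, 26, 12]]
labels, centers = kmeans(sales, k=2)
```

In practice this would be run for each candidate k, and the partition with the largest average Silhouette width would be retained.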

Step 2. diffusion model application
The second step of our approach is to select an appropriate lifecycle forecast model and apply it to each group of products obtained from product segmentation. We aggregate products in a group by calculating the average periodic sales of all products in the group, and thus, the resulting forecast is for the representative (average) product of the group. Using the average sales to represent the sales of a group is also a method that is little constrained by data limitations. For forecasting the lifecycle of a new product, we first identify the group which the new product resembles in terms of sales pattern or the group in the same market as the new product. Then we use the resulting forecast for this group as the forecast for the new product.
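The aggregation step can be sketched as follows. This is a minimal example, assuming that series are aligned by periods since launch and that, when products have lifecycles of different lengths, each period is averaged over the products that have an observation in it. That alignment rule, the function name, and the numbers are assumptions for illustration only.

```python
def average_periodic_sales(group):
    """Average sales per period across the products of one group.

    group: list of sales series, each indexed by periods since launch.
    Assumption: a period is averaged over the products observed in it.
    """
    horizon = max(len(s) for s in group)
    averaged = []
    for t in range(horizon):
        observed = [s[t] for s in group if t < len(s)]
        averaged.append(sum(observed) / len(observed))
    return averaged

# Three hypothetical products with lifecycles of different lengths
group = [[10, 30, 20], [20, 50, 40, 10], [30, 40]]
representative = average_periodic_sales(group)
print(representative)  # [20.0, 40.0, 30.0, 10.0]
```

The resulting series plays the role of the representative (average) product to which the forecast model is fitted.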
The forecast method of choice depends on the context of the forecast problem and the availability of data. For a long-range forecast, diffusion models are commonly used, which aim at predicting the diffusion process of durable new products among different groups of consumers. There are different diffusion models based on data input. One stream is characterized by using multivariate data, including sales, advertising, and pricing data, to forecast a new product's diffusion pattern. The strength of this type of diffusion model is that it incorporates more factors which affect product lifecycles, and thus, it is plausible to provide a more accurate forecast. The disadvantage is that it may not be applicable to many industry examples due to data limitations and the complexity of the analysis. Another stream of diffusion models uses only sales data. The strength is then its wide applicability. Among all types of diffusion models, the simplest one is the Bass model, which uses only time-series sales data. It describes the process of how new products are adopted by different classes of consumers based on their level of innovation. Potential adopters are divided into one of two groups: the early adopters, whose decisions are affected by mass media, or the late adopters, whose decisions are affected by word-of-mouth communication with the early adopters. The basic assumption of the Bass model is that the probability of an initial purchase of a consumer is a linear function of the number of previous buyers. The number of purchases at a given time can then be derived as follows:
$$S(T) = pm + (q - p)\,Y(T) - \frac{q}{m}\,[Y(T)]^2 \qquad (3)$$
where $S(T)$ represents sales at time T. $Y(T)$ represents the total sales in the interval $(0, T)$, i.e., $Y(T) = \int_0^T S(t)\,dt$. p and q are the coefficients of innovation and imitation, respectively, and m indicates the total market size. Since $Y(0) = 0$, p is also the probability of an initial purchase at $T = 0$, reflecting the fraction of all consumers who are innovators in the market. $(q/m) \times Y(T)$ reflects the pressures operating on imitators as the number of previous buyers increases.
When applied to discrete time-series sales data, the Bass model in equation (3) is equivalent to the following equation:
$$S(T) = a + b\,Y(T-1) + c\,[Y(T-1)]^2 \qquad (4)$$
where $Y(T-1)$ is the cumulative sales through period $T - 1$, i.e., $Y(T-1) = \sum_{t=1}^{T-1} S(t)$. a, b and c are constants which estimate $pm$, $q - p$, and $-q/m$ in equation (3), respectively. Model coefficients can be estimated using the maximum likelihood estimation (MLE), nonlinear least-squares (NLS) or ordinary least squares (OLS) techniques. Srinivasan and Mason (1986) showed that NLS is slightly better than MLE regarding accuracy. There is no notable difference in performance between OLS and NLS, but it is easier to implement OLS due to its simple formula. After obtaining a, b and c of equation (4), p, q and m can then be derived as follows:
$$p = \frac{a}{m} \qquad (5)$$
$$q = -mc \qquad (6)$$
$$m = \frac{-b - \sqrt{b^2 - 4ac}}{2c} \qquad (7)$$
In equation (7), $\sqrt{b^2 - 4ac}$ represents the total innovation and imitation effect of a product, i.e., $p + q$, which is also referred to as the contagion rate of a new product (Bass et al., 1994). Empirical evidence shows that estimates of $p + q$ usually lie between 0.3 and 0.7 (Lawrence and Lawton, 1981). When $b^2 - 4ac$ is negative, it is not possible to estimate the corresponding value of m. This situation usually happens when a product has just been launched to the market, or its lifecycle has ended swiftly (thus with very limited sales data).
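The OLS route can be illustrated with a short, dependency-free sketch. It regresses S(T) on 1, Y(T-1) and Y(T-1)^2 to obtain a, b and c, and then recovers the coefficients from the relations a = pm, b = q - p and c = -q/m. The helper names are hypothetical, rescaling cumulative sales before solving the normal equations is a numerical convenience added here, and the synthetic data are generated from known coefficients purely to check the arithmetic.

```python
def solve_3x3(A, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_bass_ols(sales):
    """Estimate (p, q, m) from per-period sales via the quadratic OLS regression."""
    cum, y_prev = 0.0, []
    for s in sales:
        y_prev.append(cum)  # cumulative sales through the previous period
        cum += s
    scale = max(y_prev) or 1.0  # rescale the regressor to improve conditioning
    X = [[1.0, y / scale, (y / scale) ** 2] for y in y_prev]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * s for r, s in zip(X, sales)) for i in range(3)]
    a, b_scaled, c_scaled = solve_3x3(XtX, Xty)
    b, c = b_scaled / scale, c_scaled / scale ** 2  # undo the rescaling
    disc = b * b - 4 * a * c
    if disc < 0 or c == 0:
        raise ValueError("b^2 - 4ac < 0 (or c = 0): m cannot be estimated")
    m = (-b - disc ** 0.5) / (2 * c)
    return a / m, -m * c, m

# Synthetic check: sales generated from the discrete recursion with known values
p0, q0, m0 = 0.03, 0.4, 1000.0
sales, Y = [], 0.0
for _ in range(15):
    s = p0 * m0 + (q0 - p0) * Y - (q0 / m0) * Y * Y
    sales.append(s)
    Y += s
p_hat, q_hat, m_hat = fit_bass_ols(sales)
```

On noise-free data the estimates reproduce the generating coefficients; with real sales, and especially with very short histories, the discriminant can turn negative, which is the case the error branch guards against.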
By measuring the innovation effect and the imitation effect of a product, i.e., p and q, and the market potential, i.e., m, the Bass model predicts the speed, timing, and amount of purchases from innovators and imitators for the product. For products with a strong innovation effect, i.e., a high value of p, firms should invest more in advertising at the early lifecycle stages to accelerate sales growth. For products with a strong imitation effect, i.e., a high value of q, firms should pay more attention to potential after-sales problems from the early buyers to reduce the loss of sales from imitators. A product can have both a strong innovation effect and a strong imitation effect. For such products, sales will not only accelerate rapidly but also fall off quickly after reaching the maximum (Lilien et al., 2017). Hence, firms need to take quick actions regarding both advertising at the early lifecycle stages and product termination at the late lifecycle stages. For products with high market potential, i.e., a high value of m, firms should put more effort into planning marketing activities for the entire lifecycle.
Firms should also identify products with a small market potential, represented by a low value of m, or with a poor innovation or imitation effect, represented by negative p or q. Since the market potential of a product is usually fixed (Mahajan et al., 1993), a low value of m suggests that firms should not make large marketing investments in the product, or should consider not launching the product if possible. A poor innovation effect implies that there are barriers in the early adoption process for the product, possibly due to inappropriate pricing or unclear product positioning (Lei and Moon, 2015). Katz and Shapiro (1985) showed an example of such barriers: due to consumers' concern with less familiar brands, early adoption of foreign automobiles was difficult. However, after the manufacturers revised their pricing and marketing strategies in the early lifecycle stages, these products started to gain popularity and sales quickly picked up. To overcome early adoption barriers, marketing efforts should be directed towards innovators. Different from other consumers who can be easily reached by mass media, innovators can only be reached by channels such as professional magazines and dedicated interest communities (Claessens, 2017). A poor imitation effect implies a declining motivation for imitators to purchase the product as the number of early consumers increases. It does not necessarily mean that early consumers are disappointed with their purchases. Instead, a poor imitation effect is often associated with product negative externality, i.e., an effect where benefits from a product decline as more consumers purchase it (Nagler, 2011). For example, the value of a training course on rare skills declines with the number of students who take the course; thus, at the late lifecycle stages, consumers are discouraged from taking the course. When facing a product with a negative imitation effect, firms should monitor sales at each lifecycle stage and possibly terminate the product before sales reach a value that discourages new consumers from purchasing the product.
Based on whether p < q, product lifecycles can be classified into two categories. If p < q, which is normally the case for most innovations, then imitation behavior dominates, and the plot of sales over time will resemble a bell shape (Bass et al., 1994). It indicates that sales will attain their maximum value at about the time that cumulative sales are approximately one-half of m. If p > q, as is the case for blockbuster products, then innovation behavior dominates, and the plot of sales over time will resemble an inverse J shape (Van de Bulte, 2002). It indicates that sales peak at introduction and decline in every subsequent time period.
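The two lifecycle shapes can be reproduced by iterating the discrete form of the Bass model forward from zero cumulative sales, which is also how a fitted set of coefficients is turned into a forecast trajectory. The sketch below is illustrative: the function name is hypothetical, the coefficient values are chosen only to show the two regimes and are not estimates from the Philips dataset, and negative values are clipped at zero as a safeguard.

```python
def bass_forecast(p, q, m, periods):
    """Per-period sales from the discrete Bass recursion, starting at Y = 0."""
    sales, Y = [], 0.0
    for _ in range(periods):
        # S(T) = pm + (q - p) Y(T-1) - (q/m) Y(T-1)^2, clipped at zero
        s = max(p * m + (q - p) * Y - (q / m) * Y * Y, 0.0)
        sales.append(s)
        Y += s
    return sales

# p < q: imitation dominates, sales rise to a peak and then fall (bell shape)
bell = bass_forecast(p=0.01, q=0.4, m=1000.0, periods=36)
# p > q: innovation dominates, sales are highest at launch (inverse J shape)
inverse_j = bass_forecast(p=0.4, q=0.1, m=1000.0, periods=36)

peak_bell = bell.index(max(bell))
peak_j = inverse_j.index(max(inverse_j))
cumulative_before_peak = sum(bell[:peak_bell])
```

In the bell-shaped case the cumulative sales accumulated before the peak period come out at roughly half of m, in line with the statement above, while in the inverse J case the first period is the peak.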
The advantage of the Bass model is that it requires only sales data and that its results are highly interpretable. However, if there are other factors, such as pricing and advertising, which have a direct impact on the product lifecycle, the Bass model should be extended to incorporate these factors. For example, incorporating the marketing effort as an exogenous variable, the Bass model can predict product growth under different marketing schemes.
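One common way to write such an extension is the generalized Bass model of Bass et al. (1994), in which a marketing-effort multiplier scales the adoption rate. The form below is a sketch in the notation of equation (3); the symbol $x(T)$ is introduced here for illustration, and how it is built from pricing and advertising data is left unspecified.

```latex
% Factored form of equation (3):
%   S(T) = [p + (q/m) Y(T)] [m - Y(T)] = pm + (q - p) Y(T) - (q/m) [Y(T)]^2
% Generalized form with a marketing-effort multiplier x(T) > 0:
\[
  S(T) \;=\; \Bigl[\, p + \frac{q}{m}\, Y(T) \Bigr]\,\bigl[\, m - Y(T) \bigr]\; x(T)
\]
% Setting x(T) = 1 for all T recovers equation (3).
```

With $x(T) > 1$ in periods of intensified marketing, adoption is pulled forward in time while the market potential m stays unchanged.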

Application to the Philips dataset
We apply our method to a dataset collected from Philips Netherlands, which contains consumer healthcare products, such as electronic toothbrushes and shavers, sold in two major markets, the United States and China. The dataset consists of monthly sales data over eight years, from January 1, 2011 to June 1, 2018.

Table 1
Market and product lifecycle information in the Philips dataset.

Table 1 shows details of the products in our dataset. The product ID is generated based on the nature of the product. However, if a product is sold in multiple markets, different sales trajectories occur and sales are recorded for each product lifecycle. Philips also considers the same product in different markets as different products in their sales dataset. In our dataset, 16 products are sold in the US market and 14 are sold in the Chinese market; of these, 13 are sold in both markets. Consequently, there are 30 separate product lifecycles (2 × 13 + 3 + 1). They cover the entire product collections in two product categories at Philips during the 8-year timespan. We consider a product to have completed its lifecycle in a market if there are no sales of this product in the market for at least 6 months prior to the right censoring date of the dataset. During the 8-year timespan of the dataset, 10 product lifecycles have been completed, of which 7 are in the US market and 3 are in the Chinese market. The columns "EndPLC in US" and "EndPLC in CN" in Table 1 indicate whether a product has completed its lifecycle in a market.
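The completion rule stated above can be expressed as a small check. This is a sketch under assumptions: the input is a list of monthly sales whose last entry is the last month before the right censoring date, "no sales" is read as exactly zero, and the function and parameter names are hypothetical.

```python
def lifecycle_completed(monthly_sales, min_idle_months=6):
    """True if the product sold at some point and then had no sales in the
    last `min_idle_months` months before the censoring date."""
    if len(monthly_sales) < min_idle_months:
        return False
    tail = monthly_sales[-min_idle_months:]
    ever_sold = any(s > 0 for s in monthly_sales)
    return ever_sold and all(s == 0 for s in tail)

finished = lifecycle_completed([5, 9, 4, 1, 0, 0, 0, 0, 0, 0])  # six idle months
ongoing = lifecycle_completed([5, 9, 4, 1, 0, 0, 0])            # only three idle months
```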
In the dataset, we observe that a few products have an abnormally long tail of sales near the end of their lifecycles. An example of such a long tail is given in Fig. 1: around the 42nd month, the product sales dropped to 0.001 of the cumulative sales; however, this small amount of sales lasted for 24 months until the end of the 65th month. Based on our discussion with Philips, these long tails of sales are not consumers' purchases at the time. In practice, when a product is near the end of its lifecycle, manufacturers will give the last batch(es) of products to distributors and retailers for future after-sales services according to their earlier contracts (ConvergeOne, 2018). Philips registers these deliveries as sales entries. Since these long tails are different from regular sales, we consider the start of the long tail as the end of the product lifecycle, i.e., we remove the tail of sales data from the dataset. After removing the tails, the dataset contains sales data over 2315 months.

Product segmentation
In our industry example, product sales data are grouped based on the market, and according to our discussion with the firm, market information is among the most important indicators for product sales. Table 1 clearly shows that products in the US market have shorter lifecycles than those in the Chinese market, i.e., more products in the US have completed their lifecycles during the 8-year timespan. In addition, we conduct a dependent sample t-test to check if the means of product sales in the two markets differ. The p-value is smaller than the significance level of 0.05, showing that there are significant differences in product sales patterns in the two markets. Therefore, in the first step of our method, we first segment products based on the market and then apply clustering to products in each market to further capture sales pattern heterogeneity among products. Because the majority of the products in the dataset come from one product category, i.e., electronic toothbrushes, product features are not used in product segmentation.
We apply k-means clustering to products in each market using monthly sales data. To determine the appropriate number of clusters, i.e., the value of k, we adopt the Silhouette method, which computes the average silhouette metric for all possible integer k values in a range. We choose the range k ∈ {1, 2, …, 10} for each market since there are 30 products in the dataset, 16 for the US market and 14 for the Chinese market. We then plot the Silhouette curve of different average silhouette width values, each of which is an estimate of the average distance between clusters given the number of clusters, i.e., k (see an example of such a curve, i.e., results for the US market, in Fig. 2). For both the US and Chinese markets, the average silhouette width reaches the maximum when k = 2. We then apply k-means clustering with k = 2 to the products in each market. The clustering results are shown in Table 2. In the US market, 12 products belong to Cluster 1, of which 4 products have finished their lifecycles, and 4 products belong to Cluster 2, of which 3 products have finished their lifecycles. In the Chinese market, 11 products belong to Cluster 3, of which only 1 product has finished its lifecycle, and 3 products belong to Cluster 4, of which 2 products have finished their lifecycles.

Diffusion model application
The second step of our method is to apply the diffusion model. As one of the fundamental diffusion models, the Bass model forecasts the lifecycle and the diffusion process of a product using only sales data. The outputs of the model, such as the sales forecast for each lifecycle stage, are easy to interpret; therefore, the Bass model has been widely used in various industries, including spare parts (Ismail and Abu, 2013), technology products (Lee and Huh, 2017), and services (Seol et al., 2012). However, such models have never been applied to consumer technology products. To our knowledge, diffusion models have also never been used at Philips for new product launches and management. Because a simple model is more readily accepted by the firm in this situation, and because we only have access to sales data, we adopt the Bass model for our industry example.
As an example of how a simple diffusion model can directly generate actionable insights for a firm, we first apply the Bass model to each product lifecycle to estimate the coefficients p, q, and m. Based on the estimated coefficients, we then provide suggestions to the firm. Table 3 shows the resulting model coefficients. In the US market, Products 3090, 8614, and 3101 have the highest values of p, q, and m, respectively. Therefore, in this market, Philips should invest more in advertising for Product 3090 at the early stages and should pay more attention to after-sales problems of Product 8614 and to marketing activities for Product 3101 in each lifecycle stage. In the Chinese market, Product 7171 shows the lowest m value. Although this product has not yet completed its lifecycle, this result suggests that Philips should not put more marketing effort into the product, considering its low market potential. In addition, in the Chinese market, Product 8180 has the highest values for both p and q, suggesting that Philips should pay close attention to the entire sales trajectory of this product. As shown in Table 1, Product 8180 has not yet finished its lifecycle. If a product is in the growth phase, both imitation and innovation effects are present, and thus the estimated p and q values of the Bass model for this product could both be high (Massiani and Gohs, 2015). Among all product lifecycles, Products 3105 and 3101 in the US market and Products 3080 and 3105 in the Chinese market have a negative p value, and thus Philips should focus on developing appropriate pricing strategies for these products or positioning them properly in the market. In the Chinese market, the average p value of all products is 0.005 and the average q value is 0.082, whereas in the US market both averages are higher (average p is 0.015 and average q is 0.1).
The implication is that products in the US market generally have a stronger innovation effect and a stronger imitation effect than products in the Chinese market. A similar finding is reported in Deepa and Gerard (2007), who applied the Bass model to products in developing and developed countries. Table 3 also shows the regression coefficient of determination, R², which represents the goodness-of-fit of the Bass model to the sales data of a product lifecycle. The results show that the Bass model fits well for most of the product lifecycles. In the US market, R² is higher than 0.85 for 11 products. The fit is poor for only two products, Product 7171 and Product 6272. A possible explanation is that these products have a long maturity phase, during which the sales curve looks like a straight line, and at the right-censoring date of the dataset these products are still in the maturity phase. For data that form a straight line, the quadratic regression (e.g., the OLS method we use to estimate the model coefficients) does not provide a good fit. Similarly, in the Chinese market, R² is higher than 0.85 for 11 products. The fit is poor only for Product 3080. We find that this product was quickly removed from shelves after the product launch and was replaced by an upgraded new product. Therefore, the sales data of this product are too limited for the model to achieve a good fit. There are five product lifecycles (i.e., Products 4395, 6272, and 7171 in the US market, and Products 3101 and 3159 in the Chinese market) for which b² − 4ac is negative in equation (7), and thus we cannot estimate the Bass model coefficients. We verify with the firm that four of the five products (Products 4395, 6272, 7171, and 3101) were new to the market at the time of the analysis, and the other product (Product 3159) had a very short lifecycle due to low sales volumes. This result indicates that the application of the Bass model based on an individual product may be limited. To obtain the lifecycle forecast for new products or products with limited sales data, the Bass model can be applied to a group of related products which have sufficient data.
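To illustrate why a negative b² − 4ac prevents estimation, the sketch below follows the standard discrete OLS formulation of Bass (1969): sales S_t are regressed on lagged cumulative sales Y_{t−1} as S_t = a + b·Y_{t−1} + c·Y_{t−1}², with a = pm, b = q − p, and c = −q/m. We assume, but cannot confirm from this section, that this matches the regression behind equation (7). The helper names and the simulated data are hypothetical.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_bass_ols(sales):
    """Return (p, q, m), or None when b^2 - 4ac < 0 and m has no real solution."""
    cum, lagged = 0.0, []
    for s in sales:
        lagged.append(cum)  # cumulative sales before the current period
        cum += s
    X = [[1.0, y, y * y] for y in lagged]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * s for r, s in zip(X, sales)) for i in range(3)]
    a, b, c = solve3(XtX, Xty)  # OLS via the normal equations
    disc = b * b - 4 * a * c
    if disc < 0 or c == 0:
        return None
    m = (-b - math.sqrt(disc)) / (2 * c)  # root equal to m when c < 0
    return a / m, -m * c, m


def simulate_bass(p, q, m, periods):
    """Generate noise-free sales from the discrete Bass recursion (for testing)."""
    sales, cum = [], 0.0
    for _ in range(periods):
        s = (p + q * cum / m) * (m - cum)
        sales.append(s)
        cum += s
    return sales


# Hypothetical coefficients; normalized sales, so m = 1.
estimate = fit_bass_ols(simulate_bass(0.03, 0.4, 1.0, 36))
```

On noise-free simulated data the fit should recover the coefficients up to rounding error. On short or nearly linear sales histories the fitted quadratic may have no real root, which corresponds to the products for which no coefficients could be estimated.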
Table 2
Product clusters for the US and Chinese markets.

Next, we assess the forecast accuracy of the Bass model when applying it to new products. We assign the products which have completed their lifecycles in a market to the training sets and the other products to the testing sets. Since there are more sales data and a more complete sales pattern for products with complete lifecycles, our choice of training products could help improve the forecast accuracy of the Bass model. For each training set, we aggregate products by calculating the average monthly sales of all products in the set. Since there are distinct differences in product sales, e.g., the average monthly sales of some products are thousands of units, whereas those of other products are hundreds of units (the m value in Table 3 indicates that the size of the total market varies significantly among different products), we normalize the sales data by dividing the monthly sales of a product by its cumulative sales during the 8-year timespan. For each aggregated group of products, the monthly sales are the average normalized monthly sales of all products in the group. In total, seven Bass models are estimated, denoted as Mi, i = 1, …, 7: one for each cluster in a market, one for each market, and one for the two markets combined (referred to as the global market). The details of the seven models are shown in Table 4. As we use normalized sales data, the total market potential will always be estimated to be 100%, i.e., m = 1 for all seven models, and thus m is not shown in the table. Among all models, it is not possible to estimate the coefficients for M4, which is based on the training set in Cluster 4 of the Chinese market. This is because there are only two products in this set and one of them has very limited sales data (Table 3 shows that it is not possible to estimate model coefficients for Product 3159). Among all available models, M1 and M3 have the highest values of p and q, respectively, indicating that on average, products in Cluster 1 of the US market have a strong innovation effect and products in Cluster 3 of the Chinese market have a strong imitation effect. The implication for Philips is that if a new product resembles products in Cluster 1 or Cluster 3 in terms of sales patterns, then this product may have a strong innovation effect or a strong imitation effect, and corresponding strategies should be adopted.
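The normalization and averaging of a training set can be sketched as follows. The alignment of products by month since launch and the zero-padding of shorter lifecycles are assumptions of this sketch rather than details given in the text, and the product names and sales figures are hypothetical.

```python
def normalize(sales):
    """Divide each monthly sale by the product's cumulative sales."""
    total = sum(sales)
    if total <= 0:
        raise ValueError("cumulative sales must be positive")
    return [s / total for s in sales]


def aggregate(products):
    """Average the normalized monthly sales of all products in a group.

    Products are assumed to be aligned by month since launch; shorter
    lifecycles are padded with zero sales (an assumption of this sketch).
    """
    horizon = max(len(sales) for sales in products.values())
    rows = [normalize(sales) + [0.0] * (horizon - len(sales))
            for sales in products.values()]
    return [sum(col) / len(rows) for col in zip(*rows)]


# Hypothetical training set: one high-volume and one low-volume product.
training_set = {"A": [100, 300, 400, 200], "B": [10, 20, 10]}
group_sales = aggregate(training_set)
```

Because each normalized series sums to 1, the averaged series also sums to 1, which is consistent with the market potential m = 1 of the aggregated models.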
For each training set, the predicted total length of the product lifecycle (in months) is also listed in Table 4. M2 for products in Cluster 2 of the US market has the longest lifecycle, and M3 for products in Cluster 3 of the Chinese market has the shortest lifecycle. The difference between the two lifecycles is significant, i.e., over a year, and thus Philips should adopt different strategies for new products which resemble products in Cluster 2 and Cluster 3, respectively. For example, for long-lifecycle products, Philips should adopt a flexible marketing strategy which allows for constant changes in marketing campaigns based on the latest consumer feedback. For short-lifecycle products, Philips should focus on increasing decision speed by preparing extensively in advance for different future scenarios. The average lifecycle of products in the Chinese market, indicated by M6, is 12 months longer than that in the US market, indicated by M5. This suggests that new products should be launched more frequently in the US market. This result is in line with our observation in the dataset, i.e., more products in the US market than in the Chinese market have completed their lifecycles during the 8-year timespan (see Table 1).
The lifecycle curves of each training set, associated with each estimated Bass model, are depicted in Fig. 3. For all seven models, the lifecycle curves have a bell shape, with differences in the timing and the duration of the sales peak. In the lifecycle of the products in the Chinese market, i.e., the one associated with M5, the highest sales volume occurs around the mid-point of the lifecycle, whereas the sales peaks of the other product sets occur earlier. Among all sales peaks, that of the products in Cluster 3 of the Chinese market, i.e., the one associated with M3, lasts the shortest amount of time.
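The timing of the sales peak of a bell-shaped Bass curve can be illustrated with the well-known closed-form results of the continuous-time Bass model: the peak occurs at t* = ln(q/p)/(p + q) and the peak sales rate is m(p + q)²/(4q). The sketch below evaluates these for the market-average p and q values quoted earlier for individual products; treating those averages as one representative curve per market is our own simplification and not how the models M1 to M7 are estimated.

```python
import math


def bass_rate(t, p, q, m=1.0):
    """Sales rate of the continuous-time Bass model at time t."""
    e = math.exp(-(p + q) * t)
    return m * (p + q) ** 2 / p * e / (1 + (q / p) * e) ** 2


def bass_peak(p, q, m=1.0):
    """Return (time of peak, sales rate at peak); needs 0 < p < q."""
    if p <= 0 or q <= p:
        raise ValueError("an interior sales peak requires 0 < p < q")
    t_peak = math.log(q / p) / (p + q)
    s_peak = m * (p + q) ** 2 / (4 * q)
    return t_peak, s_peak


# Market-average coefficients from the text; time is in months (monthly data).
t_us, s_us = bass_peak(0.015, 0.1)
t_cn, s_cn = bass_peak(0.005, 0.082)
```

With these inputs the Chinese-market curve peaks later than the US-market curve, since a lower p and q delay the peak. The discrete monthly estimates may deviate somewhat from the continuous-time formulas.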
We apply the estimated Bass models to the products which have not yet completed their lifecycles (i.e., products in the testing sets) to test the fit of each model to an independent dataset. The results for the US and Chinese markets are shown in Tables 5 and 6, respectively. For each testing product, we compare the fit of three possible Bass models: one estimated based on the training set in the same cluster as the testing product, one estimated based on the training set in the same market as the testing product, and one estimated based on the training set in the global market. This way, we investigate the value of clustering and the value of aggregation in product lifecycle forecasts. We should emphasize that the training set in Cluster 3 of the Chinese market contains only one product; thus, M3 is estimated based on one product and differs from all other models, which consider at least three products.
In terms of the level of aggregation, M6 is not equivalent to M7 or M5, as M6 is estimated based on 3 products, whereas M5 is estimated based on 7 products and M7 is estimated based on 10 products. Based on the average value of R² associated with each model, we find that the models based on aggregated products provide a better fit than the model based on an individual product. For the testing products in Cluster 3 of the Chinese market, the model estimated based on the training set in the entire Chinese market, i.e., M6, fits better than the model estimated based on one product, i.e., M3, and the model estimated based on the training set in the global market, M7, also fits better than M3 (see Table 6). The aggregated models with sufficient product sales data generally perform well. For example, for the testing products in Cluster 1 of the US market, the three models, i.e., M1 estimated based on the training set in the same cluster, M5 estimated based on the training set in the entire US market, and M7 estimated based on the training set in the global market, all perform similarly well, i.e., R² > 0.8 for all three (see Table 5). When aggregation is done using sufficient product sales data, the model based on products that are most related to the testing product could provide a better fit. For example, for the testing products in Cluster 2 of the US market, M2 estimated based on the training set in the same cluster shows a much better fit than M5 and M7. When there are different levels of aggregation, it is difficult to determine whether the more aggregated model or the model more related to the testing product is better. For example, for the testing products in Cluster 3, M6, which uses the training set in the same market and is thus more related to the testing products, performs better than M7, which is estimated based on a larger set of products. However, this result does not hold for the testing products in Cluster 4.
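The comparison of candidate models by R² can be sketched as follows. The exact way the fit to a testing product is computed is not detailed in this section, so the usual definition R² = 1 − SS_res/SS_tot is assumed here; the model labels reuse the paper's naming, but all numbers are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination of predicted against actual sales."""
    if len(actual) != len(predicted):
        raise ValueError("series must have equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual sales are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


def best_model(actual, forecasts):
    """Return the name of the model with the highest R^2, plus all scores."""
    scores = {name: r_squared(actual, series) for name, series in forecasts.items()}
    return max(scores, key=scores.get), scores


# Hypothetical normalized sales of a testing product and three model curves.
actual = [0.05, 0.10, 0.16, 0.20, 0.18]
forecasts = {
    "M1": [0.05, 0.11, 0.15, 0.19, 0.18],
    "M5": [0.08, 0.12, 0.15, 0.16, 0.15],
    "M7": [0.10, 0.13, 0.14, 0.14, 0.13],
}
winner, scores = best_model(actual, forecasts)
```

Note that R² computed this way can be negative on out-of-sample data when a model predicts worse than the mean of the actual series.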
In addition to out-of-sample tests of forecasting accuracy, we test the performance of the Bass models by comparing the model forecasts shown in Table 4 with expert estimates. We survey 13 product managers at Philips for their estimates of the total lifecycle length of each testing product. At Philips, product managers have first-hand knowledge about product technology and sales potential, and they are directly involved in deciding when to launch a product and when to discontinue an old one. The differences between the model forecasts and the average expert estimates for the US and Chinese markets are listed in Tables 7 and 8, respectively. Except for four products, Products 3105 and 3101 in the US market and Products 3104 and 3105 in the Chinese market, the model forecasts are close to the expert estimates, with MAPE ranging from 11.36% to 35.94%, which is considered reasonably good accuracy for a long-range forecast (Chien et al., 2010). The product managers expect these four products to have an exceptionally long lifecycle of more than 108 months. However, the forecasts by the seven Bass models are all shorter than 90 months, as is the case for the majority of the products in the dataset.
Similar to the previous comparisons in Tables 5 and 6, we compare the average MAPE measurements of the three possible Bass models for each cluster. Our previous findings based on the fit of different Bass models to the testing products still apply. First, the forecasts by the aggregated models are closer to the expert estimates than the forecasts by the model based on an individual product. In the Chinese market, the forecasts by the model based on an individual product, i.e., M3, are not as close to the expert estimates as those of the other two models. Second, among all aggregated models with sufficient product sales data, the model based on products more related to the testing product provides forecasts closer to the expert estimates. For the testing products in each cluster of the US market, the forecasts by the model based on the other products in the same cluster are closer to the expert estimates than those of the other models, i.e., for Cluster 1 or 2, the MAPE of M1 or M2 is smaller than the respective MAPE of M5 and M7. When aggregated models only consider a small number of products, i.e., the size of the product group used to train the Bass model coefficients is small, the performance of the more aggregated model is not consistently better than that of the more related model. For example, the MAPE of M6 is smaller than that of M7 for the testing products in Cluster 3, but the opposite holds for the testing products in Cluster 4.
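The MAPE comparison between model forecasts and expert estimates can be sketched as below. We assume the average expert estimate serves as the reference value in the denominator, which the text does not state explicitly; the lifecycle lengths are hypothetical.

```python
def mape(reference, forecast):
    """Mean absolute percentage error of forecast against reference, in percent."""
    if len(reference) != len(forecast) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    errors = [abs(r - f) / abs(r) for r, f in zip(reference, forecast)]
    return 100 * sum(errors) / len(errors)


# Hypothetical lifecycle lengths in months for three testing products.
expert_estimates = [60.0, 72.0, 48.0]
model_forecasts = {"cluster": [54.0, 80.0, 50.0], "global": [45.0, 90.0, 60.0]}
mape_by_model = {name: mape(expert_estimates, f) for name, f in model_forecasts.items()}
```

A lower MAPE means the model's predicted lifecycle lengths are closer to what the product managers expect.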

Implications and conclusions
We develop a two-step product lifecycle forecast approach for consumer technology products with limited sales data. The first step is to segment products based on market and clustering. The second step is to apply the diffusion model to aggregated products and use the resulting forecast for related new products. We apply our method to a dataset collected from Philips Netherlands, which contains consumer healthcare products sold in the US and China over an 8-year timespan. First, we group products based on the market and apply k-means clustering to each group to further divide products based on heterogeneity in sales patterns. Second, we adopt the Bass model to forecast the diffusion pattern of each product. To assess the forecast accuracy of the Bass model for new products, we train the model coefficients using the average monthly sales of all products in a market or cluster and then apply the estimated Bass model to an independent dataset. Our results show that for forecasting the lifecycle of a new product, models based on aggregated products generally perform better than models based on an individual product. This highlights the value of data aggregation in product lifecycle forecasts. When aggregation is done using sufficient product sales data, the aggregated model based on products with which the new product has the most sales pattern similarities could provide a more accurate forecast than other aggregated models. Our findings are not limited to consumer technology products. In general, an early lifecycle forecast is valuable for any type of product, and the challenge often lies in gathering sufficient sales data at the early lifecycle stages. Our approach does not require firms to collect data on external factors; instead, it uses clustering and aggregates the existing sales data in a cluster to improve the forecast accuracy. Based on our results, we formulate the following two practical steps for obtaining an accurate product lifecycle forecast at the early lifecycle stages.

Table 5
The fitness of the estimated Bass model to the testing sets in the US market.

Step 1. In order to forecast product lifecycles, firms should record periodic sales and market information for each product during its lifecycle. Firms can group products based on the market and apply clustering to products in each market to further divide products based on heterogeneity in their sales patterns. Since the accuracy of long-range forecasts and of sales pattern forecasts depends mainly on the amount of historical data, the focus in product segmentation should be on gathering sufficient product sales data for each group rather than on performing a precise clustering.
Step 2. To forecast the (remaining) lifecycle of a new product or a product with limited sales data, firms should identify the group of products with which the new product has the most sales pattern similarities or the group in the same market as the new product. Then, firms should apply the Bass model to this group using the average sales of all products in the group. When there are significant differences in demand due to product nature, the sales of each product should be normalized by dividing the periodic sales by the cumulative sales. The estimated diffusion pattern for the group of products can be directly used for the new product.
For products with sufficient sales data at the early stages, the forecast of the remaining lifecycle can be obtained by directly applying the Bass model to each individual product. Future research on product lifecycle forecasts for consumer technology products should incorporate consumer data when applying clustering to segment products, since consumer preference and behavior are important indicators of future sales patterns. Future research should also investigate the value of different clustering methods in product lifecycle forecasts and the performance of different forecasting methods.

Declaration of competing interest
None.

Table 4
Bass model coefficients for each training set.
Fig. 3. Product lifecycle curves for each training set.
X. Li et al.

Table 6
The fitness of the Bass model for the testing sets in the Chinese market.

Table 7
Comparing model forecast with expert estimate for products in the US market.

Table 8
Comparing model forecast with expert estimate for products in the Chinese market.