Full Length Article
Variable selection in international diffusion models
Introduction
Since the 1980s (Heeler & Hustad, 1980), the international diffusion of new products has firmly established itself as a research stream within the international marketing literature. International diffusion studies predominantly seek to explain variation in new product growth patterns across countries using country characteristics, such as economics, culture or demographics (for recent contributions, see Chandrasekaran and Tellis, 2008; Talukdar et al., 2002; Stremersch and Lemmens, 2009; Stremersch and Tellis, 2004; Tellis et al., 2003; Van den Bulte and Stremersch, 2004; van Everdingen et al., 2009).
An important difference among these studies, beyond the products or countries included, is the set of country-level covariates in the model. Specifying the covariates of an international diffusion model is particularly challenging. There is no consensus in the literature about which country characteristics should or should not be included. Marketing scholars justify their choice of explanatory variables by theoretical reasoning. In international diffusion especially, the theory is very rich, and the number of variables that one could consider including is therefore very large. At the same time, the data are often sparse in terms of periodicity and of the number of countries and products. Standard statistical estimation techniques often have difficulty fitting such large models to such sparse data. Scholars may therefore drop one or more of the available variables through subjective choice and iterative testing of smaller models, at the risk of omitting relevant variables.
Scholars who do not restrict their model ex ante often face ill-conditioning of the design matrix, or harmful multicollinearity, as a significant problem (see Chandrasekaran and Tellis, 2008; Tellis et al., 2003). An ill-conditioned design matrix may preclude inference from the full model, so that scholars again resort to dimensionality reduction techniques, such as estimating nested models (Stremersch & Tellis, 2004), bivariate models (Chandrasekaran & Tellis, 2008), composite models (Gatignon, Eliashberg, & Robertson, 1989) or factor models (Helsen et al., 1993; Tellis et al., 2003). Nested models and bivariate models, however, also face the risk of omitted variable bias. Composite and factor models are difficult to interpret and cannot disentangle the effects of distinct country covariates.
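To make the notion of an ill-conditioned design matrix concrete, the following minimal sketch compares the condition number of two small design matrices. The covariates are simulated and purely hypothetical (they are not the country characteristics used in this paper); only the sample size of 55 mirrors the number of countries in the data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n_countries = 55                # mirrors the number of countries in the data

# Two unrelated, hypothetical country covariates.
x1 = rng.standard_normal(n_countries)
x2 = rng.standard_normal(n_countries)

# A hypothetical covariate that is almost a copy of x1 (strong collinearity).
x1_near_copy = x1 + 0.01 * rng.standard_normal(n_countries)

X_independent = np.column_stack([x1, x2])
X_collinear = np.column_stack([x1, x1_near_copy])

# Condition number: ratio of the largest to the smallest singular value.
# Large values signal that least-squares estimates are numerically unstable.
cond_independent = np.linalg.cond(X_independent)
cond_collinear = np.linalg.cond(X_collinear)

print(f"independent covariates: {cond_independent:.1f}")
print(f"collinear covariates:   {cond_collinear:.1f}")
```

The near-duplicate covariate inflates the condition number by orders of magnitude, which is the situation in which inference from the full model becomes unreliable.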
This paper uses the Bayesian Lasso (Hans, 2009; Park and Casella, 2008) and the Bayesian Elastic Net (Hans, 2011; Li and Lin, 2010) to explore which country characteristics matter most in international diffusion. These procedures can cope with sparse data (i.e., many variables and few data points) by specifying an appropriate informative prior, which leads to a specific form of Bayesian regularization (Fahrmeir, Kneib, & Konrath, 2010). By construction of the Lasso and Elastic Net priors, some of the estimated regression coefficients (the posterior modes) will be exactly zero, which identifies a subset of the most important variables. The procedures execute shrinkage and variable selection simultaneously, whereas alternative shrinkage methods (e.g., Ridge regression) do not include variable selection and alternative variable selection methods (e.g., Bayesian model averaging) do not include shrinkage. The advantage of the Lasso and Elastic Net procedures over shrinkage methods without variable selection is that they lead to more stable estimation results and identify a relatively small subset of variables that exhibit the strongest effects (Tibshirani, 1996). Their advantage over variable selection methods without shrinkage is that the latter still lack power in a sparse data setting, because shrinkage is crucial for dealing with correlated covariates, as we show in a simulation study.
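The difference between shrinkage with and without variable selection can be illustrated with a minimal sketch. It assumes a simplified setting that is not the paper's model: an orthonormal design, so that each coefficient can be treated separately, and the per-coefficient objective ½(z − β)² + λ₁|β| + ½λ₂β², where z is the least-squares estimate. The function names and penalty values are hypothetical, and the sketch shows the penalized-likelihood (posterior mode) view rather than the posterior sampler used in the paper.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam and set it to zero when |z| <= lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge(z, lam2):
    """Ridge solution (lam1 = 0): proportional shrinkage, never exactly zero."""
    return z / (1.0 + lam2)

def lasso(z, lam1):
    """Lasso solution (lam2 = 0): soft-thresholding, small effects become zero."""
    return soft_threshold(z, lam1)

def elastic_net(z, lam1, lam2):
    """Elastic Net solution: soft-thresholding followed by ridge-type shrinkage."""
    return soft_threshold(z, lam1) / (1.0 + lam2)

# Hypothetical least-squares estimates for four covariates.
z = np.array([3.0, -0.5, 0.2, -2.0])

beta_ridge = ridge(z, lam2=1.0)
beta_lasso = lasso(z, lam1=1.0)
beta_enet = elastic_net(z, lam1=1.0, lam2=1.0)

print("ridge:      ", beta_ridge)
print("lasso:      ", beta_lasso)
print("elastic net:", beta_enet)
```

In this setting Ridge keeps all four coefficients non-zero, while the Lasso and the Elastic Net set the two small ones to zero and retain the two large ones.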
We estimate a Bayesian version of the Bass diffusion model (Bass, 1969), which was introduced by Lenk and Rao (1990) and subsequently extended by Talukdar et al. (2002). Bayesian analysis is particularly well suited to international diffusion models because of the multilevel structure of the data. The model decomposes the product variance and the country variance, which is important given that the sample of countries is typically not the same for all products and that the product variance is typically larger than the country variance. In addition, regularization to deal with sparse data comes naturally in a Bayesian setting through the use of an informative prior. Scholars in both marketing (Lenk & Orme, 2009) and statistics (Fahrmeir et al., 2010) are paying increasing attention to Bayesian regularization by informative priors.
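For readers less familiar with the three Bass parameters, the sketch below evaluates the standard closed-form cumulative adoption curve of the Bass (1969) model, with market potential m, innovation coefficient p and imitation coefficient q. It is only the deterministic continuous-time curve, not the Bayesian multilevel specification estimated in this paper, and the parameter values are illustrative assumptions rather than estimates.

```python
import numpy as np

def bass_cumulative(t, m, p, q):
    """Cumulative penetration at time t under the closed-form Bass curve.

    m: market potential (long-run penetration ceiling)
    p: innovation coefficient (must be > 0)
    q: imitation coefficient
    """
    t = np.asarray(t, dtype=float)
    decay = np.exp(-(p + q) * t)
    return m * (1.0 - decay) / (1.0 + (q / p) * decay)

# Illustrative (hypothetical) parameter values for one product-country pair.
years = np.arange(0, 31)
penetration = bass_cumulative(years, m=0.8, p=0.03, q=0.4)

for year in (0, 5, 10, 20, 30):
    print(f"year {year:2d}: penetration = {penetration[year]:.3f}")
```

The curve starts at zero, rises in an S-shape when q exceeds p, and levels off at the market potential m, which is why country covariates can affect the ceiling, the early growth and the contagion-driven growth separately.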
We have data on the penetration levels of six high-technology products (CD players, internet, ISDN, mobile phones, personal computers, and video cameras) in a total of 55 countries around the world. These data are also used in van Everdingen et al. (2009) and were graciously made available to us by Yvonne van Everdingen. We complement these data with an extensive set of country characteristics that encompasses those used in previous studies on new product adoption, ranging from socio-economic and cultural to demographic and geographic characteristics.
The results indicate that, even though many country characteristics have been related to new product growth in the past, a small set of variables explains most of the between-country variation in our particular set of countries and products. The first predominant variable is economic wealth, which has a strong positive effect on all three parameters of the Bass diffusion model. The second important variable is education, which positively affects both the market potential (m) and the innovation coefficient (p). Beyond economic wealth and education, income inequality has a negative effect on the market potential (m), economic openness affects the innovation coefficient (p), and mobility affects the imitation coefficient (q). Future applications of variable selection techniques to other samples of international diffusion data may offer a promising path towards generalizable findings.
Section snippets
Prior literature on international diffusion
Table 1 inventories the international diffusion literature using variations of the Bass diffusion model. For every study, we list which country characteristics are studied, whether a dimensionality reduction method is used, and which country characteristics the authors found to influence diffusion. A more general overview of diffusion and new product growth models can be found in Peres, Muller, and Mahajan (2010).
Gatignon et al. (1989) construct three country-level constructs (cosmopolitanism,
Method
In this section, we first review three penalized likelihood methods: Ridge regression, the Lasso and the Elastic Net. The latter two have a variable selection property, which allows us to explore which variables matter most. Next, we draw the analogy with Bayesian regularization through the choice of appropriate priors on the regression coefficients. We then describe the Bass diffusion model and illustrate the properties of the three regularization methods, as compared to the standard regression
Data
We use penetration data of six consumer durables in 55 countries listed in Table 4, gathered from publicly available sources, such as Euromonitor and the International Telecommunications Union. The country characteristics were gathered from publicly available sources such as the Statistical Yearbook of the United Nations, CIA World Factbook, World Development Indicators, U.S. Census Bureau, Euromonitor online, and Hofstede (2001). Country characteristics with multiple data points over the
Variable selection: Bayesian Lasso and Elastic Net
Table 6 presents the variables selected by the Lasso and the Elastic Net, together with the posterior modes, based on a sequence of 10,000 draws after 2000 burn-in draws. The prop-value is the proportion of draws that lie on the opposite side of zero from the posterior mode. Because a variable is dropped from the model when its posterior mode equals zero, a prop-value cannot be calculated in such cases.
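The prop-value defined above can be computed along the following lines. This is a minimal sketch under stated assumptions: the draws are simulated from a hypothetical normal distribution rather than taken from the paper's sampler, the posterior mode is passed in as a given number because the way it is obtained is not shown here, and the function name `prop_value` is introduced only for illustration.

```python
import numpy as np

def prop_value(draws, mode):
    """Proportion of posterior draws on the opposite side of zero from the mode.

    Returns NaN when the mode is exactly zero, because the variable is then
    not selected and no prop-value is defined.
    """
    draws = np.asarray(draws, dtype=float)
    if mode == 0:
        return float("nan")
    return float(np.mean(np.sign(draws) == -np.sign(mode)))

# Hypothetical chain: 12,000 simulated draws, of which the first 2000 are
# discarded as burn-in, leaving 10,000 draws for inference.
rng = np.random.default_rng(1)
chain = rng.normal(loc=0.5, scale=0.3, size=12_000)
kept = chain[2_000:]

assumed_mode = 0.5  # hypothetical posterior mode for this coefficient
print(f"prop-value: {prop_value(kept, assumed_mode):.4f}")
```

A small prop-value means that almost all retained draws share the sign of the posterior mode, so the direction of the effect is well supported.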
For all diffusion metrics, the predominant variable is economic wealth. Both the Lasso and the Elastic Net find
Discussion
Using the Bayesian Lasso and Elastic Net estimation procedures, we have shown that international variation in new product growth in our sample of products and countries is predominantly driven by economic wealth and education. In addition, economic inequality limits a new product's market potential. The innovation coefficient is higher in countries with a higher level of economic openness, and the imitation coefficient is higher in countries whose citizens are more mobile.
The application
References (54)
- et al. Staged estimation of international diffusion models: An application to global cellular telephone adoption. Technological Forecasting and Social Change (1998).
- et al. Globalization: Modeling technology adoption timing across countries. Technological Forecasting and Social Change (2000).
- et al. Modeling the diffusion of scientific publications. Journal of Econometrics (2007).
- et al. Cross-national diffusion research: What do we know and how certain are we? Journal of Product Innovation Management (1998).
- et al. Dynamics in the international market segmentation of new product growth. International Journal of Research in Marketing (2012).
- et al. Innovation diffusion and new product growth models: A critical review and research directions. International Journal of Research in Marketing (2010).
- et al. Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate: II. Radical prostatectomy treated patients. Journal of Urology (1989).
- et al. Understanding and managing international growth of new products. International Journal of Research in Marketing (2004).
- et al. Forecasting cross-population innovation diffusion: A Bayesian approach. International Journal of Research in Marketing (2005).
Bayesian model averaging and exchange rate forecasts. Journal of Econometrics (2008).
Logistic-normal distributions: Some properties and uses. Biometrika.
A spatiotemporal analysis of the global diffusion of ISO9000 and ISO14000 certification. Management Science.
A new product growth model for consumer durables. Management Science.
Regression diagnostics.
A Lasso for hierarchical interactions. Annals of Statistics.
Global takeoff of new products: Culture, wealth, or vanishing differences? Marketing Science.
Sustained spending and persistent response: A new look at long-run marketing profitability. Journal of Marketing Research.
Global diffusion of technological innovations: A coupled-hazard approach. Journal of Marketing Research.
Forecast combination and model averaging using predictive measures. Econometric Reviews.
A convex optimization approach to modeling consumer heterogeneity in conjoint estimation. Marketing Science.
Bayesian regularisation in structured additive regression: A unifying perspective on shrinkage, smoothing and predictor selection. Statistics and Computing.
Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Modeling multinational diffusion patterns: An efficient methodology. Marketing Science.
Large-scale Bayesian logistic regression for text categorization. Technometrics.
Bayesian lasso regression. Biometrika.
Model uncertainty and variable selection in Bayesian lasso regression. Statistics and Computing.
Elastic Net regression modeling with the orthant normal prior. Journal of the American Statistical Association.
Cited by (6)
An examination of the diffusion of prepaid mobile telephony in selected emerging markets and developing economies
2020, Information and Management
Citation Excerpt: Finally, prior research on wireless telephone services found that in emerging markets, customer satisfaction is driven by perceived value (as customers are more sensitive to factors such as price and relative income stability [52]), while customer satisfaction in developed markets is driven by perceived quality. The above discussion influenced by research findings in developed countries (e.g., [53–55]) touches on how some socio-economic factors (e.g., per capita GNP and income) together with marketing factors (e.g., price and distribution) influence the diffusion of prepaid mobile phones. The current study extends this research by analyzing additional socio-economic factors and marketing indices giving rise to the following hypothesis:
International heterogeneity in the associations of new business models and broadband Internet with music revenue and piracy
2019, International Journal of Research in Marketing
Citation Excerpt: Importantly, however, we expect that these effects (i.e., the effect of broadband and new digital business models on revenue and piracy) will vary predictably across countries. Previous research has highlighted the role of economic conditions (e.g., income) on the adoption of innovations (Gelper & Stremersch, 2014). Stremersch and Tellis (2004), e.g., find that differences in the growth of innovations across countries can mostly be explained by economic conditions.
Adoption of sea water air conditioning (SWAC) in the Caribbean: Individual vs regional effects
2019, Journal of Cleaner Production
The direct and indirect effects of economic wealth on time to take-off
2018, International Journal of Research in Marketing
Citation Excerpt: Cultural dimensions (Hofstede 2001) have also been used to capture cross-cultural differences, however Tellis, Yin, and Bell (2009) argue that Hofstede's four measures of cultural dimensions are “weak predictors” of inter-country differences in the take-off of new products. Further, Gelper and Stremersch (2014) find that cultural dimensions are not key moderating factors in international diffusion. In light of these arguments, coupled with the unavailability of Hofstede's dimensions for all of our sample, we use cultural fractionalization as a control covariate.
Efficient simulation and analysis of mid-sized networks
2018, Computers and Industrial Engineering
The Effect of Marketing Breadth and Competitive Spread on Category Growth
2022, Production and Operations Management