Full Length Article
Variable selection in international diffusion models

https://doi.org/10.1016/j.ijresmar.2014.04.001

Highlights

  • International differences in new product growth are driven by socio-economic factors.

  • Culture does not explain international differences in new product growth.

  • Lasso variable selection improves statistical inference with many covariates.

Abstract

Prior research comes to different conclusions as to which country characteristics drive diffusion patterns. One prime difficulty that may partially explain this divergence between studies is the sparseness of the data, in terms of periodicity as well as the number of products and countries, combined with the large number of potentially influential country characteristics. In the face of such sparse data, scholars have used nested models, bivariate models and factor models to explore the role of country covariates. This paper uses Bayesian Lasso and Bayesian Elastic Net variable selection procedures as powerful approaches to identify the most important drivers of differences in Bass diffusion parameters across countries. We find that socio-economic and demographic country covariates (most prominently economic wealth and education) have the strongest effect on all diffusion metrics we study. Our findings are a call for marketing scientists to devote greater attention to country covariate selection in international diffusion models, as well as to variable selection in marketing models at large.

Introduction

Since the 1980s (Heeler & Hustad, 1980), international diffusion of new products has firmly established itself as a research stream within the international marketing literature. International diffusion studies predominantly seek to explain variation in new product growth patterns across countries using country characteristics, such as economics, culture or demographics (for recent contributions, see Chandrasekaran and Tellis, 2008; Talukdar et al., 2002; Stremersch and Lemmens, 2009; Stremersch and Tellis, 2004; Tellis et al., 2003; Van den Bulte and Stremersch, 2004; van Everdingen et al., 2009).

An important difference among these studies, beyond the difference in the products or countries included, is the set of country-level covariates included in the model. Specifying the covariates of an international diffusion model is particularly challenging. There is no consensus in the literature about which country characteristics should or should not be included. Marketing scholars justify their choice of a certain set of explanatory variables by theoretical reasoning. Especially in international diffusion, the theory is very rich and thus the number of variables that one could consider including is very large. At the same time, the data are often sparse, in terms of periodicity and the number of countries and products. Standard statistical estimation techniques often have difficulty fitting such large models to such sparse data. Therefore, scholars may drop one or more of the available variables through subjective choice and iterative testing of smaller models, at the risk of omitting relevant variables.

Scholars who do not restrict their model ex ante often face ill-conditioning of the design matrix (or harmful multicollinearity) as a significant problem (see Chandrasekaran and Tellis, 2008; Tellis et al., 2003). An ill-conditioned design matrix may pre-empt inference from the full model, so that scholars again resort to dimensionality reduction techniques, such as estimating nested models (Stremersch & Tellis, 2004), bivariate models (Chandrasekaran & Tellis, 2008), composite models (Gatignon, Eliashberg, & Robertson, 1989) or factor models (Helsen et al., 1993; Tellis et al., 2003). Nested models and bivariate models, however, also face the risk of omitted variable bias. Composite and factor models are difficult to interpret and are unable to disentangle the effects of distinct country covariates.

This paper uses the Bayesian Lasso (Hans, 2009; Park and Casella, 2008) and the Bayesian Elastic Net (Hans, 2011; Li and Lin, 2010) to explore which country characteristics matter most in international diffusion. These procedures can cope with sparse data (i.e., many variables and few data points) by specifying an appropriate informative prior, which leads to a specific form of Bayesian regularization (Fahrmeir, Kneib, & Konrath, 2010). By construction of the Lasso and Elastic Net priors, some of the estimated regression coefficients will be exactly zero, identifying a subset of most important variables. The procedures execute shrinkage and variable selection simultaneously, whereas alternative shrinkage methods (e.g., Ridge regression) do not include variable selection and alternative variable selection methods (e.g., Bayesian model averaging) do not include shrinkage. The advantage of the Lasso and Elastic Net procedures over shrinkage methods without variable selection is that they lead to more stable estimation results and to the identification of a relatively small subset of variables that exhibit the strongest effects (Tibshirani, 1996). Their advantage over variable selection methods without shrinkage is that the latter still lack power in a sparse data setting, because shrinkage is crucial for dealing with correlated covariates, as we show in a simulation study.
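The contrast between shrinkage with and without selection can be illustrated with a small sketch. It is not the Bayesian sampler used in the paper; it only shows the textbook penalized-regression solutions for a single coefficient under an orthonormal design, where the Lasso objective ½(b − β)² + λ|β| yields soft-thresholding and the Ridge objective ½(b − β)² + (λ/2)β² yields proportional shrinkage. The coefficient values and the penalty weight `lam` are hypothetical.

```python
def ridge_shrink(b, lam):
    """Ridge solution for one coefficient under an orthonormal design.

    Shrinks proportionally toward zero but never reaches it exactly.
    """
    return b / (1.0 + lam)


def lasso_shrink(b, lam):
    """Lasso solution (soft-thresholding) under an orthonormal design.

    Coefficients whose magnitude is at most lam are set exactly to zero.
    """
    magnitude = abs(b) - lam
    if magnitude <= 0.0:
        return 0.0
    return magnitude if b > 0 else -magnitude


# Hypothetical least-squares coefficients and penalty weight (assumptions).
ols = [2.0, -0.3, 0.1, -1.5]
lam = 0.5

ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_shrink(b, lam) for b in ols]

# Indices of the variables the Lasso keeps in the model.
selected = [i for i, b in enumerate(lasso) if b != 0.0]

print("ridge:", ridge)
print("lasso:", lasso)
print("selected:", selected)
```

In this toy setting Ridge keeps all four coefficients nonzero, while the Lasso drops the two small ones, which is the selection property the paragraph above describes.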

We estimate a Bayesian version of the Bass diffusion model (Bass, 1969), which was introduced by Lenk and Rao (1990) and subsequently extended by Talukdar et al. (2002). Bayesian analysis is particularly well suited for international diffusion models because of the multilevel structure of the data. The model decomposes the product and country variance, which is important given that the sample of countries is typically not the same for all products and the product variance is typically larger than the country variance. Also, regularization to deal with sparse data comes naturally in a Bayesian setting via the use of an informative prior. Scholars in both marketing (Lenk & Orme, 2009) and statistics (Fahrmeir et al., 2010) are paying increasing attention to the usefulness of Bayesian regularization by informative priors.
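For readers less familiar with the Bass model, the following minimal sketch evaluates its standard closed-form solution, in which the cumulative adoption fraction is F(t) = (1 − e^{−(p+q)t}) / (1 + (q/p) e^{−(p+q)t}) and penetration is m·F(t). It is the basic deterministic curve only, not the hierarchical Bayesian specification estimated in the paper, and the parameter values are hypothetical.

```python
import math


def bass_cumulative_fraction(t, p, q):
    """Closed-form cumulative adoption fraction F(t) of the Bass model.

    p is the innovation coefficient, q the imitation coefficient.
    """
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_penetration(t, p, q, m):
    """Penetration at time t, where m is the market potential."""
    return m * bass_cumulative_fraction(t, p, q)


# Hypothetical parameter values, chosen for illustration only.
p, q, m = 0.03, 0.38, 0.8

# Penetration curve over 30 periods.
curve = [bass_penetration(t, p, q, m) for t in range(0, 31)]

# Time at which the adoption rate peaks (defined when q > p).
peak_time = math.log(q / p) / (p + q)

print(round(peak_time, 2), [round(x, 3) for x in curve[:5]])
```

The three quantities p, q and m are the diffusion parameters whose cross-country differences the country covariates are meant to explain.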

We have data on the penetration levels of six high-technology products (CD players, internet, ISDN, mobile phones, personal computers, and video cameras) in a total of 55 countries around the world. These data are also used in van Everdingen et al. (2009) and were graciously made available to us by Yvonne van Everdingen. We complement these data with an extensive set of country characteristics that encompasses the country characteristics used in previous studies on new product adoption, ranging from socio-economic and cultural to demographic and geographic characteristics.

The results indicate that, even though many country characteristics have been related to new product growth in the past, a small set of variables explains most of the between-country variation in our particular set of countries and products. The first predominant variable is economic wealth, which has a strong positive effect on all three parameters of the Bass diffusion model. The second important variable is education, which positively affects both the market potential (m) and the innovation coefficient (p). Beyond economic wealth and education, income inequality has a negative effect on the market potential (m), economic openness affects the innovation coefficient (p), and mobility affects the imitation coefficient (q). Future application of variable selection techniques to other samples of international diffusion data may yield a promising path towards generalizable findings.

Section snippets

Prior literature on international diffusion

Table 1 inventories the international diffusion literature using variations of the Bass diffusion model. For every study, we list which country characteristics are studied, whether a dimensionality reduction method is used, and which country characteristics the authors found to influence diffusion. A more general overview of diffusion and new product growth models can be found in Peres, Muller, and Mahajan (2010).

Gatignon et al. (1989) construct three country-level constructs (cosmopolitanism,

Method

In this section, we first review three penalized likelihood methods, Ridge regression, the Lasso and the Elastic Net. The latter two have a variable selection property which allows exploring which variables matter most. Next, we draw the analogy with Bayesian regularization through the choice of appropriate priors on the regression coefficients. We then describe the Bass diffusion model and illustrate the properties of the three regularization methods, as compared to the standard regression

Data

We use penetration data of six consumer durables in 55 countries listed in Table 4, gathered from publicly available sources, such as Euromonitor and the International Telecommunications Union. The country characteristics were gathered from publicly available sources such as the Statistical Yearbook of the United Nations, CIA World Factbook, World Development Indicators, U.S. Census Bureau, Euromonitor online, and Hofstede (2001). Country characteristics with multiple data points over the

Variable selection: Bayesian Lasso and Elastic Net

Table 6 presents the variables selected by the Lasso and the Elastic Net, together with the posterior mode computed from a sequence of 10,000 draws after 2,000 burn-in draws. The prop-value is the proportion of draws that lie on the opposite side of zero from the mode. Because a variable is dropped from the model when its posterior mode is equal to zero, a prop-value cannot be calculated in such cases.
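The prop-value definition above can be sketched in a few lines. The function name `prop_value`, the toy draws, and the choice to return `None` for an unselected variable are assumptions made for illustration; the posterior mode is taken as given, since how it is computed is not described in this excerpt.

```python
def prop_value(draws, mode):
    """Proportion of posterior draws on the opposite side of zero from the mode.

    Returns None when the mode is zero, i.e. when the variable is not
    selected and no prop-value can be calculated.
    """
    if mode == 0:
        return None
    if mode > 0:
        opposite = sum(1 for d in draws if d < 0)
    else:
        opposite = sum(1 for d in draws if d > 0)
    return opposite / len(draws)


# Toy posterior draws for one coefficient (hypothetical values).
draws = [0.8, 1.1, -0.2, 0.5, 0.9, -0.1, 1.3, 0.7, 0.4, 0.6]

print(prop_value(draws, mode=0.7))   # two of ten draws are negative
print(prop_value(draws, mode=0.0))   # unselected variable
```

A small prop-value indicates that nearly all of the posterior mass lies on the same side of zero as the mode.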

For all diffusion metrics, the predominant variable is economic wealth. Both the Lasso and the Elastic Net find

Discussion

Using the Bayesian Lasso and Elastic Net estimation procedures, we have shown that international variation in new product growth in our sample of products and countries is predominantly driven by economic wealth and education. In addition, economic inequality limits a new product's market potential. The innovation coefficient is higher in countries with a higher level of economic openness, and the imitation coefficient is higher in countries whose citizens are more mobile.

The application

References (54)

  • J. Aitchison et al.

    Logistic-normal distributions: Some properties and uses

    Biometrika

    (1980)
  • P. Albuquerque et al.

    A spatiotemporal analysis of the global diffusion of ISO9000 and ISO14000 certification

    Management Science

    (2007)
  • F. Bass

    A new product growth model for consumer durables

    Management Science

    (1969)
  • D. Belsley et al.

    Regression diagnostics

    (1980)
  • J. Bien et al.

    A Lasso for hierarchical interactions

    Annals of Statistics

    (2013)
  • D. Chandrasekaran et al.

    Global takeoff of new products: Culture, wealth, or vanishing differences?

    Marketing Science

    (2008)
  • M.G. Dekimpe et al.

    Sustained spending and persistent response: A new look at long-run marketing profitability

    Journal of Marketing Research

    (1999)
  • M.G. Dekimpe et al.

    Global diffusion of technological innovations: A coupled-hazard approach

    Journal of Marketing Research

    (2000)
  • J. Eklund et al.

    Forecast combination and model averaging using predictive measures

    Econometric Reviews

    (2007)
  • T. Evgeniou et al.

    A convex optimization approach to modeling consumer heterogeneity in conjoint estimation

    Marketing Science

    (2007)
  • L. Fahrmeir et al.

    Bayesian regularisation in structured additive regression: A unifying perspective on shrinkage, smoothing and predictor selection

    Statistics and Computing

    (2010)
  • M.A.T. Figueiredo

    Adaptive sparseness for supervised learning

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2003)
  • H. Gatignon et al.

    Modeling multinational diffusion patterns: An efficient methodology

    Marketing Science

    (1989)
  • A. Genkin et al.

    Large-scale Bayesian logistic regression for text categorization

    Technometrics

    (2007)
  • C. Hans

    Bayesian lasso regression

    Biometrika

    (2009)
  • C. Hans

    Model uncertainty and variable selection in Bayesian lasso regression

    Statistics and Computing

    (2010)
  • C. Hans

    Elastic Net regression modeling with the orthant normal prior

    Journal of the American Statistical Association

    (2011)