Abstract
We are concerned about the design, analysis, reporting, and reviewing of quantitative IS studies that draw on null hypothesis significance testing (NHST). We observe that debates about misinterpretations, abuse, and issues with NHST, while having persisted for about half a century, remain largely absent in IS. We find this an untenable position for a discipline with a proud quantitative tradition. We discuss traditional and emergent threats associated with the application of NHST and examine how they manifest in recent IS scholarship. To encourage the development of new standards for NHST in hypothetico-deductive IS research, we develop a balanced account of possible actions that are implementable in the short or long term and that incentivize or penalize specific practices. To promote an immediate push for change, we also develop two sets of guidelines that IS scholars can adopt right away.
Notes
1. That is, the entire IS scholarly ecosystem of authors, reviewers, editors/publishers, and educators/supervisors.
2. We will also discuss some of the problems inherent to NHST, but our clear focus is on our own fallibilities and how they could be mitigated.
3. Remarkably, in contrast to several other fields, the experience at the AIS Transactions on Replication Research after three years of publishing replication research indicates that a meaningful proportion of replications have produced results that are essentially the same as the original study (Dennis et al., 2018).
4.
5. To illustrate the magnitude of the conversation: in June 2019, The American Statistician published a special issue on null hypothesis significance testing that contains 43 articles on the topic (Wasserstein et al., 2019).
6. An analogous, more detailed example, using the relationship between mammograms and the likelihood of breast cancer, is provided by Gigerenzer et al. (2008).
7. See Lin et al. (2013) for several examples.
8. To illustrate, consider this tweet from June 3, 2019: “Discussion on the #statisticalSignificance has reached ISR. “Null hypothesis significance testing in quantitative IS research: a call to reconsider our practices [submission to a second AIS Senior Scholar Basket of 8 Journal, received Major Revisions]” a new paper by @janrecker” (https://twitter.com/AgloAnivel/status/1135466967354290176)
9. Our query terms were: [Management Information Systems Quarterly OR MIS Quarterly OR MISQ], [European Journal of Information Systems OR EJIS], [Information Systems Journal OR IS Journal OR ISJ], [Information Systems Research OR ISR], [Journal of the Association for Information Systems OR Journal of the AIS OR JAIS], [Journal of Information Technology OR Journal of IT OR JIT], [Journal of Management Information Systems OR Journal of MIS OR JMIS], [Journal of Strategic Information Systems OR Journal of SIS OR JSIS]. We checked for and excluded inaccurate results, such as papers from MISQ Executive, the European Journal of Interdisciplinary Studies (EJIS), etc.
10. We used the definitions of Creswell (2009, p. 148): random sampling means each unit in the population has an equal probability of being selected; systematic sampling means that specific characteristics are used to stratify the sample such that the true proportion of units in the studied population is reflected; and convenience sampling means that a nonprobability sample of available or accessible units is used.
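The conditional-probability confusion behind Note 6 (reading P(hypothesis | data) where only P(data | hypothesis) is available) can be made concrete with a short Bayes' theorem calculation. The numbers below are illustrative assumptions for this sketch, not figures taken from Gigerenzer et al. (2008):

```python
# Bayes' theorem sketch: when a condition is rare, a positive test result
# does not imply the condition is likely. All numbers are illustrative.

def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(condition | positive test), via Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Assumed: 1% prevalence, 90% sensitivity, 9% false-positive rate.
ppv = positive_predictive_value(0.01, 0.90, 0.09)
print(f"P(condition | positive test) = {ppv:.2f}")  # ~0.09, even though P(positive | condition) = 0.90
```

The asymmetry is exactly the one that makes p-values (probabilities of data given a null hypothesis) so easy to misread as probabilities of hypotheses.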
References
Amrhein, V., Greenland, S., & McShane, B. (2019). Scientists rise up against statistical significance. Nature, 567, 305–307.
Bagozzi, R. P. (2011). Measurement and meaning in information systems and organizational research: Methodological and philosophical foundations. MIS Quarterly, 35(2), 261–292.
Baker, M. (2016). Statisticians issue warning over misuse of p values. Nature, 531(7593), 151.
Baroudi, J. J., & Orlikowski, W. J. (1989). The problem of statistical power in MIS research. MIS Quarterly, 13(1), 87–106.
Bedeian, A. G., Taylor, S. G., & Miller, A. N. (2010). Management science on the credibility bubble: Cardinal sins and various misdemeanors. Academy of Management Learning & Education, 9(4), 715–725.
Begg, C., Cho, M., Eastwood, S., Horton, R., Moher, D., Olkin, I., et al. (1996). Improving the quality of reporting of randomized controlled trials: The consort statement. Journal of the American Medical Association, 276(8), 637–639.
Berente, N., Seidel, S., & Safadi, H. (2019). Data-driven computationally-intensive theory development. Information Systems Research, 30(1), 50–64.
Bettis, R. A. (2012). The search for asterisks: Compromised statistical tests and flawed theories. Strategic Management Journal, 33(1), 108–113.
Bettis, R. A., Ethiraj, S., Gambardella, A., Helfat, C., & Mitchell, W. (2016). Creating repeatable cumulative knowledge in strategic management. Strategic Management Journal, 37(2), 257–261.
Branch, M. (2014). Malignant side effects of null-hypothesis significance testing. Theory & Psychology, 24(2), 256–277.
Bruns, S. B., & Ioannidis, J. P. A. (2016). P-curve and p-hacking in observational research. PLoS One, 11(2), e0149144.
Burmeister, O. K. (2016). A post publication review of “A review and comparative analysis of security risks and safety measures of mobile health apps”. Australasian Journal of Information Systems, 20, 1–4.
Burtch, G., Ghose, A., & Wattal, S. (2013). An empirical examination of the antecedents and consequences of contribution patterns in crowd-funded markets. Information Systems Research, 24(3), 499–519.
Burton-Jones, A., & Lee, A. S. (2017). Thinking about measures and measurement in positivist research: A proposal for refocusing on fundamentals. Information Systems Research, 28(3), 451–467.
Burton-Jones, A., Recker, J., Indulska, M., Green, P., & Weber, R. (2017). Assessing representation theory with a framework for pursuing success and failure. MIS Quarterly, 41(4), 1307–1333.
Button, K. S., Bal, L., Clark, A., & Shipley, T. (2016). Preventing the ends from justifying the means: Withholding results to address publication bias in peer-review. BMC Psychology, 4, 59.
Chen, H., Chiang, R., & Storey, V. C. (2012). Business intelligence and analytics: From big data to big impacts. MIS Quarterly, 36(4), 1165–1188.
Christensen, R. (2005). Testing Fisher, Neyman, Pearson, and Bayes. The American Statistician, 59(2), 121–126.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.
Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches (3rd ed.). SAGE.
David, P. A. (2004). Understanding the emergence of “open science” institutions: Functionalist economics in historical context. Industrial and Corporate Change, 13(4), 571–589.
Dennis, A. R., Brown, S. A., Wells, T., & Rai, A. (2018). Information systems replication project. https://aisel.aisnet.org/trr/aimsandscope.html.
Dennis, A. R., & Valacich, J. S. (2015). A replication manifesto. AIS Transactions on Replication Research, 1(1), 1–4.
Dennis, A. R., Valacich, J. S., Fuller, M. A., & Schneider, C. (2006). Research standards for promotion and tenure in information systems. MIS Quarterly, 30(1), 1–12.
Dewan, S., & Ramaprasad, J. (2014). Social media, traditional media, and music sales. MIS Quarterly, 38(1), 101–121.
Dixon, P. (2003). The p-value fallacy and how to avoid it. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 57(3), 189–202.
Edwards, J. R., & Berry, J. W. (2010). The presence of something or the absence of nothing: Increasing theoretical precision in management research. Organizational Research Methods, 13(4), 668–689.
Emerson, G. B., Warme, W. J., Wolf, F. M., Heckman, J. D., Brand, R. A., & Leopold, S. S. (2010). Testing for the presence of positive-outcome bias in peer review: A randomized controlled trial. Archives of Internal Medicine, 170(21), 1934–1939.
Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory & Psychology, 5(1), 75–98.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191.
Field, A. (2013). Discovering statistics using IBM SPSS statistics. SAGE.
Fisher, R. A. (1935a). The design of experiments. Oliver & Boyd.
Fisher, R. A. (1935b). The logic of inductive inference. Journal of the Royal Statistical Society, 98(1), 39–82.
Fisher, R. A. (1955). Statistical methods and scientific induction. Journal of the Royal Statistical Society. Series B (Methodological), 17(1), 69–78.
Freelon, D. (2014). On the interpretation of digital trace data in communication and social computing research. Journal of Broadcasting & Electronic Media, 58(1), 59–75.
Gefen, D., Rigdon, E. E., & Straub, D. W. (2011). An update and extension to SEM guidelines for administrative and social science research. MIS Quarterly, 35(2), iii–xiv.
Gelman, A. (2013). P values and statistical practice. Epidemiology, 24(1), 69–72.
Gelman, A. (2015). Statistics and research integrity. European Science Editing, 41, 13–14.
Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328–331.
George, G., Haas, M. R., & Pentland, A. (2014). From the editors: Big data and management. Academy of Management Journal, 57(2), 321–326.
Gerow, J. E., Grover, V., Roberts, N., & Thatcher, J. B. (2010). The diffusion of second-generation statistical techniques in information systems research from 1990-2008. Journal of Information Technology Theory and Application, 11(4), 5–28.
Gigerenzer, G. (2004). Mindless statistics. Journal of Socio-Economics, 33(5), 587–606.
Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., Schwartz, L. M., & Woloshin, S. (2008). Helping doctors and patients make sense of health statistics. Psychological Science in the Public Interest, 8(2), 53–96.
Godfrey-Smith, P. (2003). Theory and reality: An introduction to the philosophy of science. University of Chicago Press.
Goldfarb, B., & King, A. A. (2016). Scientific apophenia in strategic management research: Significance tests & mistaken inference. Strategic Management Journal, 37(1), 167–176.
Goodhue, D. L., Lewis, W., & Thompson, R. L. (2007). Statistical power in analyzing interaction effects: Questioning the advantage of PLS with product indicators. Information Systems Research, 18(2), 211–227.
Gray, P. H., & Cooper, W. H. (2010). Pursuing failure. Organizational Research Methods, 13(4), 620–643.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, p values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.
Gregor, S. (2006). The nature of theory in information systems. MIS Quarterly, 30(3), 611–642.
Gregor, S., & Klein, G. (2014). Eight obstacles to overcome in the theory testing genre. Journal of the Association for Information Systems, 15(11), i–xix.
Greve, W., Bröder, A., & Erdfelder, E. (2013). Result-blind peer reviews and editorial decisions: A missing pillar of scientific culture. European Psychologist, 18(4), 286–294.
Grover, V., & Lyytinen, K. (2015). New state of play in information systems research: The push to the edges. MIS Quarterly, 39(2), 271–296.
Grover, V., Straub, D. W., & Galluch, P. (2009). Editor’s comments: Turning the corner: The influence of positive thinking on the information systems field. MIS Quarterly, 33(1), iii-viii.
Guide, V. D. R., Jr., & Ketokivi, M. (2015). Notes from the editors: Redefining some methodological criteria for the journal. Journal of Operations Management, 37, v-viii.
Hair, J. F., Sarstedt, M., Ringle, C. M., & Mena, J. A. (2012). An assessment of the use of partial least squares structural equation modeling in marketing research. Journal of the Academy of Marketing Science, 40(3), 414–433.
Haller, H., & Kraus, S. (2002). Misinterpretations of significance: A problem students share with their teachers? Methods of Psychological Research, 7(1), 1–20.
Harrison, J. S., Banks, G. C., Pollack, J. M., O’Boyle, E. H., & Short, J. (2014). Publication bias in strategic management research. Journal of Management, 43(2), 400–425.
Harzing, A.-W. (2010). The publish or perish book: Your guide to effective and responsible citation analysis. Tarma Software Research.
Howison, J., Wiggins, A., & Crowston, K. (2011). Validity issues in the use of social network analysis with digital trace data. Journal of the Association for Information Systems, 12(12), 767–797.
Hubbard, R. (2004). Alphabet soup: Blurring the distinctions between p’s and α’s in psychological research. Theory & Psychology, 14(3), 295–327.
Ioannidis, J. P. A., Fanelli, D., Dunne, D. D., & Goodman, S. N. (2015). Meta-research: Evaluation and improvement of research methods and practices. PLoS Biology, 13(10), e1002264.
Johnson, V. E., Payne, R. D., Wang, T., Asher, A., & Mandal, S. (2017). On the reproducibility of psychological science. Journal of the American Statistical Association, 112(517), 1–10.
Kaplan, A. (1998/1964). The conduct of inquiry: Methodology for behavioral science. Transaction Publishers.
Kerr, N. L. (1998). Harking: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217.
Lang, J. M., Rothman, K. J., & Cann, C. I. (1998). That confounded p-value. Epidemiology, 9(1), 7–8.
Lazer, D., Pentland, A. P., Adamic, L. A., Aral, S., Barabási, A.-L., Brewer, D., et al. (2009). Computational social science. Science, 323(5915), 721–723.
Leahey, E. (2005). Alphas and asterisks: The development of statistical significance testing standards in sociology. Social Forces, 84(1), 1–24.
Lee, A. S., & Baskerville, R. (2003). Generalizing generalizability in information systems research. Information Systems Research, 14(3), 221–243.
Lee, A. S., & Hubona, G. S. (2009). A scientific basis for rigor in information systems research. MIS Quarterly, 33(2), 237–262.
Lee, A. S., Mohajeri, K., & Hubona, G. S. (2017). Three roles for statistical significance and the validity frontier in theory testing. Paper presented at the 50th Hawaii international conference on system sciences.
Lehmann, E. L. (1993). The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88(424), 1242–1249.
Lenzer, J., Hoffman, J. R., Furberg, C. D., & Ioannidis, J. P. A. (2013). Ensuring the integrity of clinical practice guidelines: A tool for protecting patients. British Medical Journal, 347, f5535.
Levy, M., & Germonprez, M. (2017). The potential for citizen science in information systems research. Communications of the Association for Information Systems, 40(2), 22–39.
Lin, M., Lucas, H. C., Jr., & Shmueli, G. (2013). Too big to fail: Large samples and the p-value problem. Information Systems Research, 24(4), 906–917.
Locascio, J. J. (2019). The impact of results blind science publishing on statistical consultation and collaboration. The American Statistician, 73(supp1), 346–351.
Lu, X., Ba, S., Huang, L., & Feng, Y. (2013). Promotional marketing or word-of-mouth? Evidence from online restaurant reviews. Information Systems Research, 24(3), 596–612.
Lukyanenko, R., Parsons, J., Wiersma, Y. F., & Maddah, M. (2019). Expecting the unexpected: Effects of data collection design choices on the quality of crowdsourced user-generated content. MIS Quarterly, 43(2), 623–647.
Lyytinen, K., Baskerville, R., Iivari, J., & Te’eni, D. (2007). Why the old world cannot publish? Overcoming challenges in publishing high-impact IS research. European Journal of Information Systems, 16(4), 317–326.
MacKenzie, S. B., Podsakoff, P. M., & Podsakoff, N. P. (2011). Construct measurement and validation procedures in MIS and behavioral research: Integrating new and existing techniques. MIS Quarterly, 35(2), 293–334.
Madden, L. V., Shah, D. A., & Esker, P. D. (2015). Does the p value have a future in plant pathology? Phytopathology, 105(11), 1400–1407.
Matthews, R. A. J. (2019). Moving towards the post p < 0.05 era via the analysis of credibility. The American Statistician, 73(Sup 1), 202–212.
McNutt, M. (2016). Taking up TOP. Science, 352(6290), 1147.
McShane, B. B., & Gal, D. (2017). Blinding us to the obvious? The effect of statistical training on the evaluation of evidence. Management Science, 62(6), 1707–1718.
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Mertens, W., Pugliese, A., & Recker, J. (2017). Quantitative data analysis: A companion for accounting and information systems research. Springer.
Miller, J. (2009). What is the probability of replicating a statistically significant effect? Psychonomic Bulletin & Review, 16(4), 617–640.
Mithas, S., Tafti, A., & Mitchell, W. (2013). How a firm's competitive environment and digital strategic posture influence digital business strategy. MIS Quarterly, 37(2), 511.
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & The PRISMA Group. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Medicine, 6(7), e1000100.
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., du Sert, N. P., et al. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(0021), 1–9.
Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews, 82(4), 591–605.
NCBI Insights. (2018). PubMed Commons to be discontinued. https://ncbiinsights.ncbi.nlm.nih.gov/2018/02/01/pubmed-commons-to-be-discontinued/.
Nelson, L. D., Simmons, J. P., & Simonsohn, U. (2018). Psychology’s renaissance. Annual Review of Psychology, 69, 511–534.
Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika, 20A(1/2), 175–240.
Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 231, 289–337.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301.
Nielsen, M. (2011). Reinventing discovery: The new era of networked science. Princeton University Press.
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., et al. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425.
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606.
Nuzzo, R. (2014). Statistical errors: P values, the “gold standard” of statistical validity, are not as reliable as many scientists assume. Nature, 506(150), 150–152.
O’Boyle, E. H., Banks, G. C., & Gonzalez-Mulé, E. (2017). The chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management, 43(2), 376–399.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), 943.
Pernet, C. (2016). Null hypothesis significance testing: A guide to commonly misunderstood concepts and recommendations for good practice [version 5; peer review: 2 approved, 2 not approved]. F1000Research, 4(621). https://doi.org/10.12688/f1000research.6963.5.
publons. (2017). 5 steps to writing a winning post-publication peer review. https://publons.com/blog/5-steps-to-writing-a-winning-post-publication-peer-review/.
Reinhart, A. (2015). Statistics done wrong: The woefully complete guide. No Starch Press.
Ringle, C. M., Sarstedt, M., & Straub, D. W. (2012). Editor’s comments: A critical look at the use of PLS-SEM in MIS quarterly. MIS Quarterly, 36(1), iii–xiv.
Rishika, R., Kumar, A., Janakiraman, R., & Bezawada, R. (2013). The effect of customers’ social media participation on customer visit frequency and profitability: An empirical investigation. Information Systems Research, 24(1), 108–127.
Rönkkö, M., & Evermann, J. (2013). A critical examination of common beliefs about partial least squares path modeling. Organizational Research Methods, 16(3), 425–448.
Rönkkö, M., McIntosh, C. N., Antonakis, J., & Edwards, J. R. (2016). Partial least squares path modeling: Time for some serious second thoughts. Journal of Operations Management, 47-48, 9–27.
Saunders, C. (2005). Editor’s comments: Looking for diamond cutters. MIS Quarterly, 29(1), iii–viii.
Saunders, C., Brown, S. A., Bygstad, B., Dennis, A. R., Ferran, C., Galletta, D. F., et al. (2017). Goals, values, and expectations of the AIS family of journals. Journal of the Association for Information Systems, 18(9), 633–647.
Schönbrodt, F. D. (2018). P-checker: One-for-all p-value analyzer. http://shinyapps.org/apps/p-checker/.
Schwab, A., Abrahamson, E., Starbuck, W. H., & Fidler, F. (2011). Perspective: Researchers should make thoughtful assessments instead of null-hypothesis significance tests. Organization Science, 22(4), 1105–1120.
Shaw, J. D., & Ertug, G. (2017). From the editors: The suitability of simulations and meta-analyses for submissions to academy of management journal. Academy of Management Journal, 60(6), 2045–2049.
Siegfried, T. (2014). To make science better, watch out for statistical flaws. ScienceNews Context Blog, February 7, 2014. https://www.sciencenews.org/blog/context/make-science-better-watch-out-statistical-flaws.
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534–547.
Sivo, S. A., Saunders, C., Chang, Q., & Jiang, J. J. (2006). How low should you go? Low response rates and the validity of inference in IS questionnaire research. Journal of the Association for Information Systems, 7(6), 351–414.
Smith, S. M., Fahey, T., & Smucny, J. (2014). Antibiotics for acute bronchitis. Journal of the American Medical Association, 312(24), 2678–2679.
Starbuck, W. H. (2013). Why and where do academics publish? M@n@gement, 16(5), 707–718.
Starbuck, W. H. (2016). 60th anniversary essay: How journals could improve research practices in social science. Administrative Science Quarterly, 61(2), 165–183.
Straub, D. W. (1989). Validating instruments in MIS research. MIS Quarterly, 13(2), 147–169.
Straub, D. W. (2008). Editor’s comments: Type II reviewing errors and the search for exciting papers. MIS Quarterly, 32(2), v–x.
Straub, D. W., Boudreau, M.-C., & Gefen, D. (2004). Validation guidelines for IS positivist research. Communications of the Association for Information Systems, 13(24), 380–427.
Szucs, D., & Ioannidis, J. P. A. (2017). When null hypothesis significance testing is unsuitable for research: A reassessment. Frontiers in Human Neuroscience, 11(390), 1–21.
Tams, S., & Straub, D. W. (2010). The effect of an IS article’s structure on its impact. Communications of the Association for Information Systems, 27(10), 149–172.
The Economist. (2013). Trouble at the lab. The Economist. http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble.
Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1–2.
Tryon, W. W., Patelis, T., Chajewski, M., & Lewis, C. (2017). Theory construction and data analysis. Theory & Psychology, 27(1), 126–134.
Tsang, E. W. K., & Williams, J. N. (2012). Generalization and induction: Misconceptions, clarifications, and a classification of induction. MIS Quarterly, 36(3), 729–748.
Twa, M. D. (2016). Transparency in biomedical research: An argument against tests of statistical significance. Optometry & Vision Science, 93(5), 457–458.
Venkatesh, V., Brown, S. A., & Bala, H. (2013). Bridging the qualitative-quantitative divide: Guidelines for conducting mixed methods research in information systems. MIS Quarterly, 37(1), 21–54.
Vodanovich, S., Sundaram, D., & Myers, M. D. (2010). Research commentary: Digital natives and ubiquitous information systems. Information Systems Research, 21(4), 711–723.
Walsh, E., Rooney, M., Appleby, L., & Wilkinson, G. (2000). Open peer review: A randomised controlled trial. The British Journal of Psychiatry, 176(1), 47–51.
Warren, M. (2018). First analysis of “preregistered” studies shows sharp rise in null findings. Nature News, October 24, 2018, https://www.nature.com/articles/d41586-018-07118.
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133.
Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”. The American Statistician, 73(Sup 1), 1–19.
Xu, H., Zhang, N., & Zhou, L. (2019). Validity concerns in research using organic data. Journal of Management, 46, 1257. https://doi.org/10.1177/0149206319862027
Yong, E. (2012). Nobel laureate challenges psychologists to clean up their act. Nature News, October 3, 2012. https://www.nature.com/news/nobel-laureate-challenges-psychologists-to-clean-up-their-act-1.11535.
Yoo, Y. (2010). Computing in everyday life: A call for research on experiential computing. MIS Quarterly, 34(2), 213–231.
Zeng, X., & Wei, L. (2013). Social ties and user content generation: Evidence from flickr. Information Systems Research, 24(1), 71–87.
Acknowledgments
We are indebted to the senior editor at JAIS, Allen Lee, and two anonymous reviewers for constructive and developmental feedback that helped us improve the original chapter. We thank participants at seminars at Queensland University of Technology and University of Cologne for providing feedback on our work. We also thank Christian Hovestadt for his help in coding papers. All faults remain ours.
Appendices
Appendix A: Literature Review Procedures
Identification of Papers
In keeping with our intention to demonstrate “open science” practices (Locascio, 2019; Nosek et al., 2018; Warren, 2018), we preregistered our research procedures through the Open Science Framework “Registries” (doi:10.17605/OSF.IO/2GKCS).
We proceeded as follows: we identified the 100 top-cited papers (per year) published between 2013 and 2016 in the AIS Senior Scholars’ basket of eight IS journals, using Harzing’s Publish or Perish version 6 (Harzing, 2010). We ran the queries separately on February 7, 2017, and then aggregated the results to identify the 100 most cited papers (based on citations per year) across the basket of eight journals (see Note 9). The raw data (together with the coded data) are available at an open data repository hosted by Queensland University of Technology (doi:10.25912/5cede0024b1e1).
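The ranking step can be sketched as follows. The record layout and field names are hypothetical illustrations, not the actual Publish or Perish export format:

```python
# Hypothetical sketch of ranking papers by citations per year.
# "citations", "year", and "title" are assumed field names for illustration.

def top_cited(records, query_year=2017, n=100):
    """Return the n records with the highest citations per year."""
    return sorted(
        records,
        key=lambda r: r["citations"] / max(query_year - r["year"], 1),
        reverse=True,
    )[:n]

papers = [
    {"title": "A", "year": 2013, "citations": 400},  # 100 citations/year
    {"title": "B", "year": 2016, "citations": 150},  # 150 citations/year
    {"title": "C", "year": 2014, "citations": 90},   # 30 citations/year
]
print([p["title"] for p in top_cited(papers, n=2)])  # ['B', 'A']
```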
From this set we then identified the papers that followed the hypothetico-deductive model. First, we excluded 48 papers that did not involve empirical data: 31 that offered purely theoretical contributions, 11 that were commentaries in the form of forewords, introductions to special issues, or editorials, 5 that were methodological essays, and 1 design science paper. Second, from the remaining 52 papers we identified those that reported the collection and analysis of quantitative data. We found 46 such papers: 39 traditional quantitative research articles, 3 essays on methodological aspects of quantitative research, 2 mixed-methods studies involving quantitative empirical data, and 2 design science papers involving quantitative data. Third, we eliminated the three methodological essays, because their focus was not on developing and testing new theory to explain and predict IS phenomena. This resulted in a final sample of 43 papers, including 2 design science and 2 mixed-methods studies.
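The three screening steps can be expressed as successive filters. The sketch below uses a synthetic corpus whose category labels are our shorthand, constructed only so that the counts mirror those reported above:

```python
# Screening sketch: 100 papers -> 52 empirical -> 46 quantitative -> 43 final.

def screen(papers):
    """Apply the three screening steps described in the text."""
    empirical = [p for p in papers if p["empirical"]]
    quantitative = [p for p in empirical if p["quantitative"]]
    return [p for p in quantitative if p["type"] != "methods-essay"]

# Synthetic corpus mirroring the reported counts (labels are shorthand).
corpus = (
    [{"empirical": False, "quantitative": False, "type": t}
     for t in ["theory"] * 31 + ["editorial"] * 11
            + ["methods-essay"] * 5 + ["design-science"] * 1]
    + [{"empirical": True, "quantitative": False, "type": "qualitative"}] * 6
    + [{"empirical": True, "quantitative": True, "type": "traditional"}] * 39
    + [{"empirical": True, "quantitative": True, "type": "methods-essay"}] * 3
    + [{"empirical": True, "quantitative": True, "type": "mixed-methods"}] * 2
    + [{"empirical": True, "quantitative": True, "type": "design-science"}] * 2
)
print(len(corpus), len(screen(corpus)))  # 100 43
```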
Coding of Papers
We developed a coding scheme in an Excel workbook to code the studies. The workbook is available in our Open Science Framework (OSF) registry. We used the following criteria; where applicable, we refer to the literature that defined the variables we used during coding.
- What is the main method of data collection and analysis (e.g., experiment, meta-analysis, panel, social network analysis, survey, text mining, economic modeling, multiple)?
- Are testable hypotheses or propositions proposed (yes/in graphical form only/no)?
- How precisely are the hypotheses formulated (using the classification of Edwards & Berry, 2010)?
- Is null hypothesis significance testing used (yes/no)?
- Are exact p-values reported (all/some/not at all)?
- Are effect sizes reported and, if so, which ones primarily (e.g., R2, standardized mean difference scores, f2, partial eta2)?
- Are results declared as “statistically significant” (yes/sometimes/not at all)?
- How many hypotheses are reported as supported (%)?
- Are p-values used to argue the absence of an effect (yes/no)?
- Are confidence intervals for test statistics reported (yes/selectively/no)?
- What sampling method is used (i.e., convenience/random/systematic sampling, or entire population)? (See Note 10.)
- Is statistical power discussed and, if so, where and how (e.g., sample size estimation, ex-post power analysis)?
- Are competing theories tested explicitly (Gray & Cooper, 2010)?
- Are corrections made to adjust for multiple hypothesis testing, where applicable (e.g., Bonferroni, alpha-inflation, variance inflation)?
- Are post hoc analyses reported for unexpected results?
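One way to make such a coding scheme machine-checkable is to record the admissible values per criterion and validate each coded paper against them. The field names below are our paraphrases of the criteria, not the actual spreadsheet columns:

```python
# Sketch of the coding scheme as a machine-readable record; field names
# paraphrase the criteria above and are hypothetical.
CODING_SCHEME = {
    "method": ["experiment", "meta-analysis", "panel", "social network analysis",
               "survey", "text mining", "economic modeling", "multiple"],
    "hypotheses_proposed": ["yes", "graphical form only", "no"],
    "uses_nhst": ["yes", "no"],
    "exact_p_values": ["all", "some", "not at all"],
    "declares_significance": ["yes", "sometimes", "not at all"],
    "p_used_to_argue_absence": ["yes", "no"],
    "confidence_intervals": ["yes", "selectively", "no"],
    "sampling": ["convenience", "random", "systematic", "entire population"],
}

def validate(coded_paper):
    """True if every coded field uses an admissible value."""
    return all(value in CODING_SCHEME.get(field, [value])
               for field, value in coded_paper.items())

print(validate({"uses_nhst": "yes", "sampling": "random"}))  # True
print(validate({"exact_p_values": "mostly"}))                # False
```

Encoding admissible values up front is also what makes coding inconsistencies between coders easy to detect automatically.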
We also extracted quotes that, in our interpretation, illuminated the view each paper took on NHST. This was important for demonstrating how deeply NHST practices are embedded in our research routines and language, as reflected in the use of key phrases such as “statistical significance” or “p-value” (Gelman & Stern, 2006).
To be as unbiased as possible, we hired a research assistant to perform the coding. Before he commenced, we explained the coding scheme to him in several meetings. We then conducted a pilot test to evaluate the quality of his coding: he coded five randomly chosen papers from the set, and we met to review the results by comparing our individual understandings of the papers. Where inconsistencies arose, we clarified the coding scheme with him until we were confident that he understood it thoroughly. During the coding, the research assistant flagged particularly problematic or ambiguous elements, and we met to resolve these ambiguities until we reached a shared agreement. The coding process took three months to complete. The results of our coding are openly accessible at doi:10.25912/5cede0024b1e1. Appendix B provides summary statistics about our sample.
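The pilot check compared the research assistant's codings with our own. One common way to quantify such inter-coder agreement is Cohen's kappa, which corrects raw agreement for chance; the codings below are invented for illustration, not our actual pilot data:

```python
# Cohen's kappa for two coders over the same items (nominal codes).

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two lists of nominal codes."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    expected = sum((coder_a.count(label) / n) * (coder_b.count(label) / n)
                   for label in set(coder_a) | set(coder_b))
    return (observed - expected) / (1 - expected)

assistant = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
authors   = ["yes", "yes", "no", "no",  "no", "no", "yes", "yes"]
print(cohens_kappa(assistant, authors))  # 0.5 (observed 0.75, chance 0.5)
```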
Appendix B
Selected Descriptive Statistics from 43 Frequently Cited IS Papers from 2013 to 2016
| Category | Item | Count |
|---|---|---|
| Main method for data collection and analysis | Experiment | 5 |
|  | Meta-analysis | 2 |
|  | Panel | 5 |
|  | Social network analysis | 4 |
|  | Survey | 15 |
|  | Text mining | 5 |
|  | Economic modeling | 1 |
|  | Multiple | 6 |
| Empirical data | Newly collected or analyzed primary data | 40 |
|  | Re-analyzed or secondary data | 3 |
| Hypotheses | Testable hypotheses or propositions proposed | 38 |
|  | No testable hypotheses or propositions proposed | 5 |
|  | Average percentage of hypotheses per study supported by the data | 82% |
| Statement of hypotheses | As relations | 0 |
|  | As upper/lower limits | 0 |
|  | As directions | 13 |
|  | In non-nil form | 0 |
|  | In functional form | 0 |
|  | In contingent form | 2 |
|  | As comparisons | 6 |
|  | In multiple ways | 15 |
|  | Not formulated | 2 |
|  | Not applicable | 5 |
| NHST | Uses NHST techniques or terminology | 42 |
|  | Does not use NHST techniques or terminology | 1 |
| Exact p-values | Reports exact p-values | 3 |
|  | Reports exact p-values selectively | 8 |
|  | Reports indicators for different levels of statistical significance | 28 |
|  | Does not report p-values | 3 |
| Inverse use of p-values | Uses p-values to point at the absence of an effect or accept the null hypothesis | 11 |
|  | Does not use p-values to point at the absence of an effect or accept the null hypothesis | 29 |
|  | Not applicable | 3 |
| “Statistical” significance | Does not explicitly refer to “statistical significance” | 23 |
|  | Consistently refers to “statistical significance” | 3 |
|  | Selectively refers to “statistical significance” | 16 |
|  | Not applicable | 1 |
| Effect sizes | Reports R2 measures | 26 |
|  | Reports mean difference score measures | 2 |
|  | Reports multiple effect size measures | 4 |
|  | Does not report effect size measures | 10 |
|  | Not applicable | 1 |
| Confidence intervals | Reports confidence intervals consistently | 3 |
|  | Reports confidence intervals selectively | 2 |
|  | Reports confidence intervals for bootstrapping results (no p-value available) | 3 |
|  | Does not report confidence intervals | 34 |
|  | Not applicable | 1 |
| Sampling | Convenience | 22 |
|  | Systematic | 6 |
|  | Random | 4 |
|  | Entire population | 8 |
|  | Not applicable | 3 |
| Competing theories | Tested explicitly | 7 |
|  | Not tested | 35 |
|  | Not applicable | 1 |
| A posteriori analyses | Provided | 11 |
|  | Not provided | 31 |
|  | Not applicable | 1 |
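Some headline proportions implied by the table can be computed directly. A minimal sketch (n = 43 coded papers; the exact-p-value rows sum to the 42 papers that use NHST):

```python
# Headline proportions from the Appendix B counts (n = 43 coded papers).
n = 43
proportions = {
    "use NHST techniques or terminology": 42 / n,
    "report only significance indicators, no exact p-values": 28 / 42,
    "report no confidence intervals": 34 / n,
    "use p-values to argue the absence of an effect": 11 / n,
}
for label, share in proportions.items():
    print(f"{label}: {share:.0%}")  # 98%, 67%, 79%, 26%
```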
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Mertens, W., Recker, J. (2023). New Guidelines for Null Hypothesis Significance Testing in Hypothetico-Deductive IS Research. In: Willcocks, L.P., Hassan, N.R., Rivard, S. (eds) Advancing Information Systems Theories, Volume II. Technology, Work and Globalization. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-031-38719-7_13
Print ISBN: 978-3-031-38718-0
Online ISBN: 978-3-031-38719-7