
New Guidelines for Null Hypothesis Significance Testing in Hypothetico-Deductive IS Research

Chapter in: Advancing Information Systems Theories, Volume II

Abstract

We are concerned about the design, analysis, reporting, and reviewing of quantitative IS studies that draw on null hypothesis significance testing (NHST). We observe that debates about misinterpretations, abuse, and issues with NHST, while having persisted for about half a century, remain largely absent in IS. We find this an untenable position for a discipline with a proud quantitative tradition. We discuss traditional and emergent threats associated with the application of NHST and examine how they manifest in recent IS scholarship. To encourage the development of new standards for NHST in hypothetico-deductive IS research, we develop a balanced account of possible actions that are implementable in the short or long term and that incentivize or penalize specific practices. To promote an immediate push for change, we also develop two sets of guidelines that IS scholars can adopt right away.


Notes

  1. That is, the entire IS scholarly ecosystem of authors, reviewers, editors/publishers, and educators/supervisors.

  2. We will also discuss some of the problems inherent to NHST, but our clear focus is on our own fallibilities and how they could be mitigated.

  3. Remarkably, and in contrast to several other fields, the experience at the AIS Transactions on Replication Research after three years of publishing replication research indicates that a meaningful proportion of replications have produced results that are essentially the same as the original studies (Dennis et al., 2018).

  4. This trend is evidenced, for example, in the growing number of IS research articles on these topics in our own journals (e.g., Berente et al., 2019; Howison et al., 2011; Levy & Germonprez, 2017; Lukyanenko et al., 2019).

  5. To illustrate the magnitude of the conversation: in June 2019, The American Statistician published a special issue on null hypothesis significance testing that contains 43 articles on the topic (Wasserstein et al., 2019).

  6. An analogous, more detailed example using the relationship between mammograms and the likelihood of breast cancer is provided by Gigerenzer et al. (2008).

  7. See Lin et al. (2013) for several examples.

  8. To illustrate, consider this tweet from June 3, 2019: “Discussion on the #statisticalSignificance has reached ISR. ‘Null hypothesis significance testing in quantitative IS research: a call to reconsider our practices [submission to a second AIS Senior Scholar Basket of 8 Journal, received Major Revisions]’ a new paper by @janrecker” (https://twitter.com/AgloAnivel/status/1135466967354290176).

  9. Our query terms were: [Management Information Systems Quarterly OR MIS Quarterly OR MISQ], [European Journal of Information Systems OR EJIS], [Information Systems Journal OR IS Journal OR ISJ], [Information Systems Research OR ISR], [Journal of the Association for Information Systems OR Journal of the AIS OR JAIS], [Journal of Information Technology OR Journal of IT OR JIT], [Journal of Management Information Systems OR Journal of MIS OR JMIS], [Journal of Strategic Information Systems OR Journal of SIS OR JSIS]. We checked for and excluded inaccurate results, such as papers from MISQ Executive, the European Journal of Interdisciplinary Studies (EJIS), etc.

  10. We used the definitions by Creswell (2009, p. 148): random sampling means each unit in the population has an equal probability of being selected; systematic sampling means that specific characteristics are used to stratify the sample such that the true proportion of units in the studied population is reflected; and convenience sampling means that a nonprobability sample of available or accessible units is used.

References

  • Amrhein, V., Greenland, S., & McShane, B. (2019). Scientists rise up against statistical significance. Nature, 567, 305–307.
  • Bagozzi, R. P. (2011). Measurement and meaning in information systems and organizational research: Methodological and philosophical foundations. MIS Quarterly, 35(2), 261–292.
  • Baker, M. (2016). Statisticians issue warning over misuse of p values. Nature, 531(7593), 151.
  • Baroudi, J. J., & Orlikowski, W. J. (1989). The problem of statistical power in MIS research. MIS Quarterly, 13(1), 87–106.
  • Bedeian, A. G., Taylor, S. G., & Miller, A. N. (2010). Management science on the credibility bubble: Cardinal sins and various misdemeanors. Academy of Management Learning & Education, 9(4), 715–725.
  • Begg, C., Cho, M., Eastwood, S., Horton, R., Moher, D., Olkin, I., et al. (1996). Improving the quality of reporting of randomized controlled trials: The CONSORT statement. Journal of the American Medical Association, 276(8), 637–639.
  • Berente, N., Seidel, S., & Safadi, H. (2019). Data-driven computationally-intensive theory development. Information Systems Research, 30(1), 50–64.
  • Bettis, R. A. (2012). The search for asterisks: Compromised statistical tests and flawed theories. Strategic Management Journal, 33(1), 108–113.
  • Bettis, R. A., Ethiraj, S., Gambardella, A., Helfat, C., & Mitchell, W. (2016). Creating repeatable cumulative knowledge in strategic management. Strategic Management Journal, 37(2), 257–261.
  • Branch, M. (2014). Malignant side effects of null-hypothesis significance testing. Theory & Psychology, 24(2), 256–277.
  • Bruns, S. B., & Ioannidis, J. P. A. (2016). P-curve and p-hacking in observational research. PLoS One, 11(2), e0149144.
  • Burmeister, O. K. (2016). A post publication review of “A review and comparative analysis of security risks and safety measures of mobile health apps”. Australasian Journal of Information Systems, 20, 1–4.
  • Burtch, G., Ghose, A., & Wattal, S. (2013). An empirical examination of the antecedents and consequences of contribution patterns in crowd-funded markets. Information Systems Research, 24(3), 499–519.
  • Burton-Jones, A., & Lee, A. S. (2017). Thinking about measures and measurement in positivist research: A proposal for refocusing on fundamentals. Information Systems Research, 28(3), 451–467.
  • Burton-Jones, A., Recker, J., Indulska, M., Green, P., & Weber, R. (2017). Assessing representation theory with a framework for pursuing success and failure. MIS Quarterly, 41(4), 1307–1333.
  • Button, K. S., Bal, L., Clark, A., & Shipley, T. (2016). Preventing the ends from justifying the means: Withholding results to address publication bias in peer-review. BMC Psychology, 4, 59.
  • Chen, H., Chiang, R., & Storey, V. C. (2012). Business intelligence and analytics: From big data to big impact. MIS Quarterly, 36(4), 1165–1188.
  • Christensen, R. (2005). Testing Fisher, Neyman, Pearson, and Bayes. The American Statistician, 59(2), 121–126.
  • Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.
  • Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches (3rd ed.). SAGE.
  • David, P. A. (2004). Understanding the emergence of “open science” institutions: Functionalist economics in historical context. Industrial and Corporate Change, 13(4), 571–589.
  • Dennis, A. R., Brown, S. A., Wells, T., & Rai, A. (2018). Information systems replication project. https://aisel.aisnet.org/trr/aimsandscope.html
  • Dennis, A. R., & Valacich, J. S. (2015). A replication manifesto. AIS Transactions on Replication Research, 1(1), 1–4.
  • Dennis, A. R., Valacich, J. S., Fuller, M. A., & Schneider, C. (2006). Research standards for promotion and tenure in information systems. MIS Quarterly, 30(1), 1–12.
  • Dewan, S., & Ramaprasad, J. (2014). Social media, traditional media, and music sales. MIS Quarterly, 38(1), 101–121.
  • Dixon, P. (2003). The p-value fallacy and how to avoid it. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 57(3), 189–202.
  • Edwards, J. R., & Berry, J. W. (2010). The presence of something or the absence of nothing: Increasing theoretical precision in management research. Organizational Research Methods, 13(4), 668–689.
  • Emerson, G. B., Warme, W. J., Wolf, F. M., Heckman, J. D., Brand, R. A., & Leopold, S. S. (2010). Testing for the presence of positive-outcome bias in peer review: A randomized controlled trial. Archives of Internal Medicine, 170(21), 1934–1939.
  • Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory & Psychology, 5(1), 75–98.
  • Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191.
  • Field, A. (2013). Discovering statistics using IBM SPSS statistics. SAGE.
  • Fisher, R. A. (1935a). The design of experiments. Oliver & Boyd.
  • Fisher, R. A. (1935b). The logic of inductive inference. Journal of the Royal Statistical Society, 98(1), 39–82.
  • Fisher, R. A. (1955). Statistical methods and scientific induction. Journal of the Royal Statistical Society, Series B (Methodological), 17(1), 69–78.
  • Freelon, D. (2014). On the interpretation of digital trace data in communication and social computing research. Journal of Broadcasting & Electronic Media, 58(1), 59–75.
  • Gefen, D., Rigdon, E. E., & Straub, D. W. (2011). An update and extension to SEM guidelines for administrative and social science research. MIS Quarterly, 35(2), iii–xiv.
  • Gelman, A. (2013). P values and statistical practice. Epidemiology, 24(1), 69–72.
  • Gelman, A. (2015). Statistics and research integrity. European Science Editing, 41, 13–14.
  • Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328–331.
  • George, G., Haas, M. R., & Pentland, A. (2014). From the editors: Big data and management. Academy of Management Journal, 57(2), 321–326.
  • Gerow, J. E., Grover, V., Roberts, N., & Thatcher, J. B. (2010). The diffusion of second-generation statistical techniques in information systems research from 1990–2008. Journal of Information Technology Theory and Application, 11(4), 5–28.
  • Gigerenzer, G. (2004). Mindless statistics. Journal of Socio-Economics, 33(5), 587–606.
  • Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., Schwartz, L. M., & Woloshin, S. (2008). Helping doctors and patients make sense of health statistics. Psychological Science in the Public Interest, 8(2), 53–96.
  • Godfrey-Smith, P. (2003). Theory and reality: An introduction to the philosophy of science. University of Chicago Press.
  • Goldfarb, B., & King, A. A. (2016). Scientific apophenia in strategic management research: Significance tests & mistaken inference. Strategic Management Journal, 37(1), 167–176.
  • Goodhue, D. L., Lewis, W., & Thompson, R. L. (2007). Statistical power in analyzing interaction effects: Questioning the advantage of PLS with product indicators. Information Systems Research, 18(2), 211–227.
  • Gray, P. H., & Cooper, W. H. (2010). Pursuing failure. Organizational Research Methods, 13(4), 620–643.
  • Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, p values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.
  • Gregor, S. (2006). The nature of theory in information systems. MIS Quarterly, 30(3), 611–642.
  • Gregor, S., & Klein, G. (2014). Eight obstacles to overcome in the theory testing genre. Journal of the Association for Information Systems, 15(11), i–xix.
  • Greve, W., Bröder, A., & Erdfelder, E. (2013). Result-blind peer reviews and editorial decisions: A missing pillar of scientific culture. European Psychologist, 18(4), 286–294.
  • Grover, V., & Lyytinen, K. (2015). New state of play in information systems research: The push to the edges. MIS Quarterly, 39(2), 271–296.
  • Grover, V., Straub, D. W., & Galluch, P. (2009). Editor’s comments: Turning the corner: The influence of positive thinking on the information systems field. MIS Quarterly, 33(1), iii–viii.
  • Guide, V. D. R., Jr., & Ketokivi, M. (2015). Notes from the editors: Redefining some methodological criteria for the journal. Journal of Operations Management, 37, v–viii.
  • Hair, J. F., Sarstedt, M., Ringle, C. M., & Mena, J. A. (2012). An assessment of the use of partial least squares structural equation modeling in marketing research. Journal of the Academy of Marketing Science, 40(3), 414–433.
  • Haller, H., & Kraus, S. (2002). Misinterpretations of significance: A problem students share with their teachers? Methods of Psychological Research, 7(1), 1–20.
  • Harrison, J. S., Banks, G. C., Pollack, J. M., O’Boyle, E. H., & Short, J. (2014). Publication bias in strategic management research. Journal of Management, 43(2), 400–425.
  • Harzing, A.-W. (2010). The publish or perish book: Your guide to effective and responsible citation analysis. Tarma Software Research.
  • Howison, J., Wiggins, A., & Crowston, K. (2011). Validity issues in the use of social network analysis with digital trace data. Journal of the Association for Information Systems, 12(12), 767–797.
  • Hubbard, R. (2004). Alphabet soup: Blurring the distinctions between p’s and α’s in psychological research. Theory & Psychology, 14(3), 295–327.
  • Ioannidis, J. P. A., Fanelli, D., Dunne, D. D., & Goodman, S. N. (2015). Meta-research: Evaluation and improvement of research methods and practices. PLoS Biology, 13(10), e1002264.
  • Johnson, V. E., Payne, R. D., Wang, T., Asher, A., & Mandal, S. (2017). On the reproducibility of psychological science. Journal of the American Statistical Association, 112(517), 1–10.
  • Kaplan, A. (1998/1964). The conduct of inquiry: Methodology for behavioral science. Transaction Publishers.
  • Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217.
  • Lang, J. M., Rothman, K. J., & Cann, C. I. (1998). That confounded p-value. Epidemiology, 9(1), 7–8.
  • Lazer, D., Pentland, A. P., Adamic, L. A., Aral, S., Barabási, A.-L., Brewer, D., et al. (2009). Computational social science. Science, 323(5915), 721–723.
  • Leahey, E. (2005). Alphas and asterisks: The development of statistical significance testing standards in sociology. Social Forces, 84(1), 1–24.
  • Lee, A. S., & Baskerville, R. (2003). Generalizing generalizability in information systems research. Information Systems Research, 14(3), 221–243.
  • Lee, A. S., & Hubona, G. S. (2009). A scientific basis for rigor in information systems research. MIS Quarterly, 33(2), 237–262.
  • Lee, A. S., Mohajeri, K., & Hubona, G. S. (2017). Three roles for statistical significance and the validity frontier in theory testing. Paper presented at the 50th Hawaii International Conference on System Sciences.
  • Lehmann, E. L. (1993). The Fisher, Neyman–Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88(424), 1242–1249.
  • Lenzer, J., Hoffman, J. R., Furberg, C. D., & Ioannidis, J. P. A. (2013). Ensuring the integrity of clinical practice guidelines: A tool for protecting patients. British Medical Journal, 347, f5535.
  • Levy, M., & Germonprez, M. (2017). The potential for citizen science in information systems research. Communications of the Association for Information Systems, 40(2), 22–39.
  • Lin, M., Lucas, H. C., Jr., & Shmueli, G. (2013). Too big to fail: Large samples and the p-value problem. Information Systems Research, 24(4), 906–917.
  • Locascio, J. J. (2019). The impact of results blind science publishing on statistical consultation and collaboration. The American Statistician, 73(sup1), 346–351.
  • Lu, X., Ba, S., Huang, L., & Feng, Y. (2013). Promotional marketing or word-of-mouth? Evidence from online restaurant reviews. Information Systems Research, 24(3), 596–612.
  • Lukyanenko, R., Parsons, J., Wiersma, Y. F., & Maddah, M. (2019). Expecting the unexpected: Effects of data collection design choices on the quality of crowdsourced user-generated content. MIS Quarterly, 43(2), 623–647.
  • Lyytinen, K., Baskerville, R., Iivari, J., & Te’eni, D. (2007). Why the old world cannot publish? Overcoming challenges in publishing high-impact IS research. European Journal of Information Systems, 16(4), 317–326.
  • MacKenzie, S. B., Podsakoff, P. M., & Podsakoff, N. P. (2011). Construct measurement and validation procedures in MIS and behavioral research: Integrating new and existing techniques. MIS Quarterly, 35(2), 293–334.
  • Madden, L. V., Shah, D. A., & Esker, P. D. (2015). Does the p value have a future in plant pathology? Phytopathology, 105(11), 1400–1407.
  • Matthews, R. A. J. (2019). Moving towards the post p < 0.05 era via the analysis of credibility. The American Statistician, 73(sup1), 202–212.
  • McNutt, M. (2016). Taking up TOP. Science, 352(6290), 1147.
  • McShane, B. B., & Gal, D. (2017). Blinding us to the obvious? The effect of statistical training on the evaluation of evidence. Management Science, 62(6), 1707–1718.
  • Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.
  • Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
  • Mertens, W., Pugliese, A., & Recker, J. (2017). Quantitative data analysis: A companion for accounting and information systems research. Springer.
  • Miller, J. (2009). What is the probability of replicating a statistically significant effect? Psychonomic Bulletin & Review, 16(4), 617–640.
  • Mithas, S., Tafti, A., & Mitchell, W. (2013). How a firm’s competitive environment and digital strategic posture influence digital business strategy. MIS Quarterly, 37(2), 511–536.
  • Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & The PRISMA Group. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Medicine, 6(7), e1000100.
  • Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., du Sert, N. P., et al. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(0021), 1–9.
  • Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews, 82(4), 591–605.
  • NCBI Insights. (2018). PubMed Commons to be discontinued. https://ncbiinsights.ncbi.nlm.nih.gov/2018/02/01/pubmed-commons-to-be-discontinued/
  • Nelson, L. D., Simmons, J. P., & Simonsohn, U. (2018). Psychology’s renaissance. Annual Review of Psychology, 69, 511–534.
  • Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika, 20A(1/2), 175–240.
  • Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, Containing Papers of a Mathematical or Physical Character, 231, 289–337.
  • Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301.
  • Nielsen, M. (2011). Reinventing discovery: The new era of networked science. Princeton University Press.
  • Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., et al. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425.
  • Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606.
  • Nuzzo, R. (2014). Statistical errors: P values, the “gold standard” of statistical validity, are not as reliable as many scientists assume. Nature, 506(7487), 150–152.
  • O’Boyle, E. H., Banks, G. C., & Gonzalez-Mulé, E. (2017). The chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management, 43(2), 376–399.
  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), 943.
  • Pernet, C. (2016). Null hypothesis significance testing: A guide to commonly misunderstood concepts and recommendations for good practice [version 5; peer review: 2 approved, 2 not approved]. F1000Research, 4(621). https://doi.org/10.12688/f1000research.6963.5
  • Publons. (2017). 5 steps to writing a winning post-publication peer review. https://publons.com/blog/5-steps-to-writing-a-winning-post-publication-peer-review/
  • Reinhart, A. (2015). Statistics done wrong: The woefully complete guide. No Starch Press.
  • Ringle, C. M., Sarstedt, M., & Straub, D. W. (2012). Editor’s comments: A critical look at the use of PLS-SEM in MIS Quarterly. MIS Quarterly, 36(1), iii–xiv.
  • Rishika, R., Kumar, A., Janakiraman, R., & Bezawada, R. (2013). The effect of customers’ social media participation on customer visit frequency and profitability: An empirical investigation. Information Systems Research, 24(1), 108–127.
  • Rönkkö, M., & Evermann, J. (2013). A critical examination of common beliefs about partial least squares path modeling. Organizational Research Methods, 16(3), 425–448.
  • Rönkkö, M., McIntosh, C. N., Antonakis, J., & Edwards, J. R. (2016). Partial least squares path modeling: Time for some serious second thoughts. Journal of Operations Management, 47–48, 9–27.
  • Saunders, C. (2005). Editor’s comments: Looking for diamond cutters. MIS Quarterly, 29(1), iii–viii.
  • Saunders, C., Brown, S. A., Bygstad, B., Dennis, A. R., Ferran, C., Galletta, D. F., et al. (2017). Goals, values, and expectations of the AIS family of journals. Journal of the Association for Information Systems, 18(9), 633–647.
  • Schönbrodt, F. D. (2018). p-checker: One-for-all p-value analyzer. http://shinyapps.org/apps/p-checker/
  • Schwab, A., Abrahamson, E., Starbuck, W. H., & Fidler, F. (2011). Perspective: Researchers should make thoughtful assessments instead of null-hypothesis significance tests. Organization Science, 22(4), 1105–1120.
  • Shaw, J. D., & Ertug, G. (2017). From the editors: The suitability of simulations and meta-analyses for submissions to Academy of Management Journal. Academy of Management Journal, 60(6), 2045–2049.
  • Siegfried, T. (2014). To make science better, watch out for statistical flaws. ScienceNews Context Blog, February 7, 2014. https://www.sciencenews.org/blog/context/make-science-better-watch-out-statistical-flaws
  • Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534–547.
  • Sivo, S. A., Saunders, C., Chang, Q., & Jiang, J. J. (2006). How low should you go? Low response rates and the validity of inference in IS questionnaire research. Journal of the Association for Information Systems, 7(6), 351–414.
  • Smith, S. M., Fahey, T., & Smucny, J. (2014). Antibiotics for acute bronchitis. Journal of the American Medical Association, 312(24), 2678–2679.
  • Starbuck, W. H. (2013). Why and where do academics publish? M@n@gement, 16(5), 707–718.
  • Starbuck, W. H. (2016). 60th anniversary essay: How journals could improve research practices in social science. Administrative Science Quarterly, 61(2), 165–183.
  • Straub, D. W. (1989). Validating instruments in MIS research. MIS Quarterly, 13(2), 147–169.
  • Straub, D. W. (2008). Editor’s comments: Type II reviewing errors and the search for exciting papers. MIS Quarterly, 32(2), v–x.
  • Straub, D. W., Boudreau, M.-C., & Gefen, D. (2004). Validation guidelines for IS positivist research. Communications of the Association for Information Systems, 13(24), 380–427.
  • Szucs, D., & Ioannidis, J. P. A. (2017). When null hypothesis significance testing is unsuitable for research: A reassessment. Frontiers in Human Neuroscience, 11(390), 1–21.
  • Tams, S., & Straub, D. W. (2010). The effect of an IS article’s structure on its impact. Communications of the Association for Information Systems, 27(10), 149–172.
  • The Economist. (2013). Trouble at the lab. The Economist. http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble
  • Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1–2.
  • Tryon, W. W., Patelis, T., Chajewski, M., & Lewis, C. (2017). Theory construction and data analysis. Theory & Psychology, 27(1), 126–134.
  • Tsang, E. W. K., & Williams, J. N. (2012). Generalization and induction: Misconceptions, clarifications, and a classification of induction. MIS Quarterly, 36(3), 729–748.
  • Twa, M. D. (2016). Transparency in biomedical research: An argument against tests of statistical significance. Optometry & Vision Science, 93(5), 457–458.
  • Venkatesh, V., Brown, S. A., & Bala, H. (2013). Bridging the qualitative–quantitative divide: Guidelines for conducting mixed methods research in information systems. MIS Quarterly, 37(1), 21–54.
  • Vodanovich, S., Sundaram, D., & Myers, M. D. (2010). Research commentary: Digital natives and ubiquitous information systems. Information Systems Research, 21(4), 711–723.
  • Walsh, E., Rooney, M., Appleby, L., & Wilkinson, G. (2000). Open peer review: A randomised controlled trial. The British Journal of Psychiatry, 176(1), 47–51.
  • Warren, M. (2018). First analysis of “preregistered” studies shows sharp rise in null findings. Nature News, October 24, 2018. https://www.nature.com/articles/d41586-018-07118
  • Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133.
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”. The American Statistician, 73(sup1), 1–19.
  • Xu, H., Zhang, N., & Zhou, L. (2019). Validity concerns in research using organic data. Journal of Management, 46, 1257. https://doi.org/10.1177/0149206319862027
  • Yong, E. (2012). Nobel laureate challenges psychologists to clean up their act. Nature News, October 3, 2012. https://www.nature.com/news/nobel-laureate-challenges-psychologists-to-clean-up-their-act-1.11535
  • Yoo, Y. (2010). Computing in everyday life: A call for research on experiential computing. MIS Quarterly, 34(2), 213–231.
  • Zeng, X., & Wei, L. (2013). Social ties and user content generation: Evidence from Flickr. Information Systems Research, 24(1), 71–87.


Acknowledgments

We are indebted to the senior editor at JAIS, Allen Lee, and two anonymous reviewers for constructive and developmental feedback that helped us improve the original chapter. We thank participants at seminars at Queensland University of Technology and University of Cologne for providing feedback on our work. We also thank Christian Hovestadt for his help in coding papers. All faults remain ours.

Author information

Correspondence to Jan Recker.

Appendices

Appendix A: Literature Review Procedures

Identification of Papers

Consistent with our aim to demonstrate “open science” practices (Locascio, 2019; Nosek et al., 2018; Warren, 2018), we preregistered our research procedures using the Open Science Framework “Registries” (doi:10.17605/OSF.IO/2GKCS).

We proceeded as follows: we identified the 100 top-cited papers (per year) between 2013 and 2016 in the AIS Senior Scholars’ basket of eight IS journals using Harzing’s Publish or Perish version 6 (Harzing, 2010). We ran the queries separately on February 7, 2017, and then aggregated the results to identify the 100 most cited papers (based on citations per year) across the basket of eight journals (see footnote 9). The raw data (together with the coded data) are available at an open data repository hosted by Queensland University of Technology (doi:10.25912/5cede0024b1e1).
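For readers who want to retrace this aggregation step, the following Python sketch shows the ranking logic. It is purely illustrative: the file name and column names are assumptions, not the actual Publish or Perish export format.

    import csv

    # Load a citation export (hypothetical file and column names).
    with open("pop_export.csv", newline="", encoding="utf-8") as f:
        papers = list(csv.DictReader(f))

    # Rank papers by citations per year and keep the 100 most cited ones.
    papers.sort(key=lambda p: float(p["CitesPerYear"]), reverse=True)
    top_100 = papers[:100]

    for rank, paper in enumerate(top_100, start=1):
        print(rank, paper["Title"], paper["CitesPerYear"])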

From this set, we identified the papers that followed the hypothetico-deductive model. First, we excluded 48 papers that did not involve empirical data: 31 papers that offered purely theoretical contributions, 11 commentaries in the form of forewords, introductions to special issues, or editorials, 5 methodological essays, and 1 design science paper. Second, from the remaining 52 papers we identified those that reported on the collection and analysis of quantitative data. We found 46 such papers: 39 were traditional quantitative research articles, 3 were essays on methodological aspects of quantitative research, 2 were mixed-methods studies involving quantitative empirical data, and 2 were design science papers involving quantitative data. Third, we eliminated the 3 methodological essays, because their focus was not on developing and testing new theory to explain and predict IS phenomena. This resulted in a final sample of 43 papers, including 2 design science and 2 mixed-methods studies.
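The screening funnel reduces to simple arithmetic; here is a minimal sketch that encodes the counts reported above and checks that they reconcile to the final sample:

    # Screening funnel from Appendix A (all counts taken from the text above).
    total = 100
    excluded_non_empirical = 31 + 11 + 5 + 1    # theoretical, commentaries, essays, design science
    empirical = total - excluded_non_empirical  # papers with empirical data
    quantitative = 46                           # papers collecting/analyzing quantitative data
    final_sample = quantitative - 3             # minus the 3 methodological essays

    assert excluded_non_empirical == 48
    assert empirical == 52
    assert final_sample == 43
    print(f"Final sample: {final_sample} papers")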

Coding of Papers

We developed a coding scheme in an Excel repository to code the studies; the repository is available in our Open Science Framework (OSF) registry. We used the following criteria, and where applicable we refer to the literature that defined the variables we used during coding. An illustrative, machine-readable rendering of the scheme follows the list.

  • What is the main method of data collection and analysis (e.g., experiment, meta-analysis, panel, social network analysis, survey, text mining, economic modeling, multiple)?

  • Are testable hypotheses or propositions proposed (yes/in graphical form only/no)?

  • How precisely are the hypotheses formulated (using the classification of Edwards & Berry, 2010)?

  • Is null hypothesis significance testing used (yes/no)?

  • Are exact p-values reported (all/some/not at all)?

  • Are effect sizes reported and, if so, which ones primarily (e.g., R2, standardized mean difference scores, f2, partial eta2)?

  • Are results declared as “statistically significant” (yes/sometimes/not at all)?

  • How many hypotheses are reported as supported (%)?

  • Are p-values used to argue the absence of an effect (yes/no)?

  • Are confidence intervals for test statistics reported (yes/selectively/no)?

  • What sampling method is used (i.e., convenience/random/systematic sampling, entire population)? (see footnote 10)

  • Is statistical power discussed and if so, where and how (e.g., sample size estimation, ex-post power analysis)?

  • Are competing theories tested explicitly (Gray & Cooper, 2010)?

  • Are corrections made to adjust for multiple hypothesis testing, where applicable (e.g., Bonferroni, alpha-inflation, variance inflation)?

  • Are post hoc analyses reported for unexpected results?
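
As announced above, the scheme can also be rendered as a simple data structure. The rendering below is our own illustrative Python sketch, not the actual Excel repository; the field names and permitted value sets are paraphrased from the criteria above.

    # Illustrative, machine-readable rendering of the coding scheme
    # (field names and permitted values paraphrased, not the original repository).
    CODING_SCHEME = {
        "main_method": ["experiment", "meta-analysis", "panel", "social network analysis",
                        "survey", "text mining", "economic modeling", "multiple"],
        "hypotheses_proposed": ["yes", "in graphical form only", "no"],
        "uses_nhst": ["yes", "no"],
        "exact_p_values_reported": ["all", "some", "not at all"],
        "declared_statistically_significant": ["yes", "sometimes", "not at all"],
        "p_values_used_to_argue_absence": ["yes", "no"],
        "confidence_intervals_reported": ["yes", "selectively", "no"],
        "sampling_method": ["convenience", "random", "systematic", "entire population"],
        "competing_theories_tested": ["yes", "no"],
        "post_hoc_analyses_reported": ["yes", "no", "not applicable"],
    }

    def invalid_fields(coded_paper: dict) -> list:
        """Return the coded fields whose values are missing or outside the permitted sets."""
        return [field for field, allowed in CODING_SCHEME.items()
                if coded_paper.get(field) not in allowed]

    # Example: validate one (fictitious) coded paper.
    example = {"main_method": "survey", "uses_nhst": "yes",
               "exact_p_values_reported": "some"}
    print(invalid_fields(example))  # lists fields still missing or out of range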

We also extracted quotes that, in our interpretation, illuminated the view taken on NHST in each coded paper. This was important for demonstrating how deeply these practices are imbued in our research routines and in the language we use around key NHST phrases such as “statistical significance” or “p-value” (Gelman & Stern, 2006).

To be as unbiased as possible, we hired a research assistant to perform the coding of papers. Before he commenced coding, we explained the coding scheme to him during several meetings. We then conducted a pilot test to evaluate the quality of his coding: the research assistant coded five randomly chosen papers from the set, and we met to review the coding by comparing our individual understandings of the papers. Where inconsistencies arose, we clarified the coding scheme with him until we were confident that he understood it thoroughly. During the coding, the research assistant highlighted particularly problematic or ambiguous coding elements, and we met and resolved these ambiguities to arrive at a shared agreement. The coding process took three months to complete. The results of our coding are openly accessible at doi:10.25912/5cede0024b1e1. Appendix B provides summary statistics about our sample.
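
Several criteria in the scheme (exact p-values, effect sizes, confidence intervals) mirror reporting practices that the guidelines in this chapter encourage. Purely as an illustration of such reporting, and not as anything prescribed by the chapter, here is a minimal Python sketch on simulated data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    a = rng.normal(0.5, 1.0, size=50)  # simulated treatment group
    b = rng.normal(0.0, 1.0, size=50)  # simulated control group

    t, p = stats.ttest_ind(a, b)

    # Effect size: Cohen's d based on the pooled standard deviation.
    pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                        / (len(a) + len(b) - 2))
    d = (a.mean() - b.mean()) / pooled_sd

    # 95% confidence interval for the mean difference.
    diff = a.mean() - b.mean()
    se = pooled_sd * np.sqrt(1 / len(a) + 1 / len(b))
    df = len(a) + len(b) - 2
    t_crit = stats.t.ppf(0.975, df)
    ci = (diff - t_crit * se, diff + t_crit * se)

    # Report the exact p-value, effect size, and interval, not just significance asterisks.
    print(f"t({df}) = {t:.2f}, p = {p:.4f}, d = {d:.2f}, "
          f"95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")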

Appendix B

Selected Descriptive Statistics from 43 Frequently Cited IS Papers from 2013 to 2016

Main method for data collection and analysis
  • Experiment: 5
  • Meta-analysis: 2
  • Panel: 5
  • Social network analysis: 4
  • Survey: 15
  • Text mining: 5
  • Economic modeling: 1
  • Multiple: 6

Empirical data
  • Newly collected or analyzed primary data: 40
  • Re-analyzed or secondary data: 3

Hypotheses
  • Testable hypotheses or propositions proposed: 38
  • No testable hypotheses or propositions proposed: 5
  • Average percentage of hypotheses per study that were supported by the data: 82%

Statement of hypotheses
  • As relations: 0
  • As upper/lower limits: 0
  • As directions: 13
  • In non-nil form: 0
  • In functional form: 0
  • In contingent form: 2
  • As comparisons: 6
  • In multiple ways: 15
  • Not formulated: 2
  • Not applicable: 5

NHST
  • Uses NHST techniques or terminology: 42
  • Does not use NHST techniques or terminology: 1

Exact p-values
  • Reports exact p-values: 3
  • Reports exact p-values selectively: 8
  • Reports indicators for different levels of statistical significance: 28
  • Does not report p-values: 3

Inverse use of p-values
  • Uses p-values to point at the absence of an effect or accept the null hypothesis: 11
  • Does not use p-values to point at the absence of an effect or accept the null hypothesis: 29
  • Not applicable: 3

“Statistical” significance
  • Does not explicitly refer to “statistical significance”: 23
  • Consistently refers to “statistical significance”: 3
  • Selectively refers to “statistical significance”: 16
  • Not applicable: 1

Effect sizes
  • Reports R2 measures: 26
  • Reports mean difference score measures: 2
  • Reports multiple effect size measures: 4
  • Does not report effect size measures: 10
  • Not applicable: 1

Confidence intervals
  • Reports confidence intervals consistently: 3
  • Reports confidence intervals selectively: 2
  • Reports confidence intervals for bootstrapping results (no p-value available): 3
  • Does not report confidence intervals: 34
  • Not applicable: 1

Sampling
  • Convenience: 22
  • Systematic: 6
  • Random: 4
  • Entire population: 8
  • Not applicable: 3

Competing theories
  • Tested explicitly: 7
  • Not tested: 35
  • Not applicable: 1

A posteriori analyses
  • Provided: 11
  • Not provided: 31
  • Not applicable: 1


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this chapter

Mertens, W., Recker, J. (2023). New Guidelines for Null Hypothesis Significance Testing in Hypothetico-Deductive IS Research. In: Willcocks, L.P., Hassan, N.R., Rivard, S. (eds) Advancing Information Systems Theories, Volume II. Technology, Work and Globalization. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-031-38719-7_13
