Abstract
We are concerned about the design, analysis, reporting, and reviewing of quantitative IS studies that draw on null hypothesis significance testing (NHST). We observe that debates about misinterpretations, abuse, and issues with NHST, while having persisted for about half a century, remain largely absent in IS. We find this an untenable position for a discipline with a proud quantitative tradition. We discuss traditional and emergent threats associated with the application of NHST and examine how they manifest in recent IS scholarship. To encourage the development of new standards for NHST in hypothetico-deductive IS research, we develop a balanced account of possible actions that are implementable in the short or long term and that incentivize or penalize specific practices. To promote an immediate push for change, we also develop two sets of guidelines that IS scholars can adopt right away.
Notes
1. That is, the entire IS scholarly ecosystem of authors, reviewers, editors/publishers, and educators/supervisors.
2. We will also discuss some of the problems inherent to NHST, but our clear focus is on our own fallibilities and how they could be mitigated.
3. Remarkably, in contrast to several other fields, the experience at the AIS Transactions on Replication Research after three years of publishing replication research indicates that a meaningful proportion of replications have produced results that are essentially the same as the original study (Dennis et al., 2018).
4.
5. To illustrate the magnitude of the conversation: in June 2019, The American Statistician published a special issue on null hypothesis significance testing that contains 43 articles on the topic (Wasserstein et al., 2019).
6. An analogous, more detailed example, using the relationship between mammograms and the likelihood of breast cancer, is provided by Gigerenzer et al. (2008).
7. See Lin et al. (2013) for several examples.
8. To illustrate, consider this tweet from June 3, 2019: “Discussion on the #statisticalSignificance has reached ISR. “Null hypothesis significance testing in quantitative IS research: a call to reconsider our practices [submission to a second AIS Senior Scholar Basket of 8 Journal, received Major Revisions]” a new paper by @janrecker” (https://twitter.com/AgloAnivel/status/1135466967354290176)
9. Our query terms were: [Management Information Systems Quarterly OR MIS Quarterly OR MISQ], [European Journal of Information Systems OR EJIS], [Information Systems Journal OR IS Journal OR ISJ], [Information Systems Research OR ISR], [Journal of the Association for Information Systems OR Journal of the AIS OR JAIS], [Journal of Information Technology OR Journal of IT OR JIT], [Journal of Management Information Systems OR Journal of MIS OR JMIS], [Journal of Strategic Information Systems OR Journal of SIS OR JSIS]. We checked for and excluded inaccurate results, such as papers from MISQ Executive, the European Journal of Interdisciplinary Studies (EJIS), etc.
10. We used the definitions of Creswell (2009, p. 148): random sampling means each unit in the population has an equal probability of being selected; systematic sampling means that specific characteristics are used to stratify the sample such that the true proportion of units in the studied population is reflected; and convenience sampling means that a nonprobability sample of available or accessible units is used.
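The conditional-probability confusion behind Note 6 (reading P(hypothesis | data) where only P(data | hypothesis) is available) can be made concrete with a short Bayes' theorem calculation. The numbers below are illustrative assumptions for this sketch, not figures taken from Gigerenzer et al. (2008):

```python
# Bayes' theorem sketch: when a condition is rare, a positive test result
# does not imply the condition is likely. All numbers are illustrative.

def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(condition | positive test), via Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Assumed: 1% prevalence, 90% sensitivity, 9% false-positive rate.
ppv = positive_predictive_value(0.01, 0.90, 0.09)
print(f"P(condition | positive test) = {ppv:.2f}")  # ~0.09, even though P(positive | condition) = 0.90
```

The asymmetry is exactly the one that makes p-values (probabilities of data given a null hypothesis) so easy to misread as probabilities of hypotheses.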
References
Amrhein, V., Greenland, S., & McShane, B. (2019). Scientists rise up against statistical significance. Nature, 567, 305–307.
Bagozzi, R. P. (2011). Measurement and meaning in information systems and organizational research: Methodological and philosophical foundations. MIS Quarterly, 35(2), 261–292.
Baker, M. (2016). Statisticians issue warning over misuse of p values. Nature, 531(7593), 151.
Baroudi, J. J., & Orlikowski, W. J. (1989). The problem of statistical power in MIS research. MIS Quarterly, 13(1), 87–106.
Bedeian, A. G., Taylor, S. G., & Miller, A. N. (2010). Management science on the credibility bubble: Cardinal sins and various misdemeanors. Academy of Management Learning & Education, 9(4), 715–725.
Begg, C., Cho, M., Eastwood, S., Horton, R., Moher, D., Olkin, I., et al. (1996). Improving the quality of reporting of randomized controlled trials: The consort statement. Journal of the American Medical Association, 276(8), 637–639.
Berente, N., Seidel, S., & Safadi, H. (2019). Data-driven computationally-intensive theory development. Information Systems Research, 30(1), 50–64.
Bettis, R. A. (2012). The search for asterisks: Compromised statistical tests and flawed theories. Strategic Management Journal, 33(1), 108–113.
Bettis, R. A., Ethiraj, S., Gambardella, A., Helfat, C., & Mitchell, W. (2016). Creating repeatable cumulative knowledge in strategic management. Strategic Management Journal, 37(2), 257–261.
Branch, M. (2014). Malignant side effects of null-hypothesis significance testing. Theory & Psychology, 24(2), 256–277.
Bruns, S. B., & Ioannidis, J. P. A. (2016). P-curve and p-hacking in observational research. PLoS One, 11(2), e0149144.
Burmeister, O. K. (2016). A post publication review of “A review and comparative analysis of security risks and safety measures of mobile health apps”. Australasian Journal of Information Systems, 20, 1–4.
Burtch, G., Ghose, A., & Wattal, S. (2013). An empirical examination of the antecedents and consequences of contribution patterns in crowd-funded markets. Information Systems Research, 24(3), 499–519.
Burton-Jones, A., & Lee, A. S. (2017). Thinking about measures and measurement in positivist research: A proposal for refocusing on fundamentals. Information Systems Research, 28(3), 451–467.
Burton-Jones, A., Recker, J., Indulska, M., Green, P., & Weber, R. (2017). Assessing representation theory with a framework for pursuing success and failure. MIS Quarterly, 41(4), 1307–1333.
Button, K. S., Bal, L., Clark, A., & Shipley, T. (2016). Preventing the ends from justifying the means: Withholding results to address publication bias in peer-review. BMC Psychology, 4, 59.
Chen, H., Chiang, R., & Storey, V. C. (2012). Business intelligence and analytics: From big data to big impacts. MIS Quarterly, 36(4), 1165–1188.
Christensen, R. (2005). Testing Fisher, Neyman, Pearson, and Bayes. The American Statistician, 59(2), 121–126.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.
Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches (3rd ed.). SAGE.
David, P. A. (2004). Understanding the emergence of “open science” institutions: Functionalist economics in historical context. Industrial and Corporate Change, 13(4), 571–589.
Dennis, A. R., Brown, S. A., Wells, T., & Rai, A. (2018). Information systems replication project. https://aisel.aisnet.org/trr/aimsandscope.html.
Dennis, A. R., & Valacich, J. S. (2015). A replication manifesto. AIS Transactions on Replication Research, 1(1), 1–4.
Dennis, A. R., Valacich, J. S., Fuller, M. A., & Schneider, C. (2006). Research standards for promotion and tenure in information systems. MIS Quarterly, 30(1), 1–12.
Dewan, S., & Ramaprasad, J. (2014). Social media, traditional media, and music sales. MIS Quarterly, 38(1), 101–121.
Dixon, P. (2003). The p-value fallacy and how to avoid it. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 57(3), 189–202.
Edwards, J. R., & Berry, J. W. (2010). The presence of something or the absence of nothing: Increasing theoretical precision in management research. Organizational Research Methods, 13(4), 668–689.
Emerson, G. B., Warme, W. J., Wolf, F. M., Heckman, J. D., Brand, R. A., & Leopold, S. S. (2010). Testing for the presence of positive-outcome bias in peer review: A randomized controlled trial. Archives of Internal Medicine, 170(21), 1934–1939.
Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory & Psychology, 5(1), 75–98.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191.
Field, A. (2013). Discovering statistics using IBM SPSS statistics. SAGE.
Fisher, R. A. (1935a). The design of experiments. Oliver & Boyd.
Fisher, R. A. (1935b). The logic of inductive inference. Journal of the Royal Statistical Society, 98(1), 39–82.
Fisher, R. A. (1955). Statistical methods and scientific induction. Journal of the Royal Statistical Society. Series B (Methodological), 17(1), 69–78.
Freelon, D. (2014). On the interpretation of digital trace data in communication and social computing research. Journal of Broadcasting & Electronic Media, 58(1), 59–75.
Gefen, D., Rigdon, E. E., & Straub, D. W. (2011). An update and extension to SEM guidelines for administrative and social science research. MIS Quarterly, 35(2), iii–xiv.
Gelman, A. (2013). P values and statistical practice. Epidemiology, 24(1), 69–72.
Gelman, A. (2015). Statistics and research integrity. European Science Editing, 41, 13–14.
Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328–331.
George, G., Haas, M. R., & Pentland, A. (2014). From the editors: Big data and management. Academy of Management Journal, 57(2), 321–326.
Gerow, J. E., Grover, V., Roberts, N., & Thatcher, J. B. (2010). The diffusion of second-generation statistical techniques in information systems research from 1990-2008. Journal of Information Technology Theory and Application, 11(4), 5–28.
Gigerenzer, G. (2004). Mindless statistics. Journal of Socio-Economics, 33(5), 587–606.
Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., Schwartz, L. M., & Woloshin, S. (2008). Helping doctors and patients make sense of health statistics. Psychological Science in the Public Interest, 8(2), 53–96.
Godfrey-Smith, P. (2003). Theory and reality: An introduction to the philosophy of science. University of Chicago Press.
Goldfarb, B., & King, A. A. (2016). Scientific apophenia in strategic management research: Significance tests & mistaken inference. Strategic Management Journal, 37(1), 167–176.
Goodhue, D. L., Lewis, W., & Thompson, R. L. (2007). Statistical power in analyzing interaction effects: Questioning the advantage of PLS with product indicators. Information Systems Research, 18(2), 211–227.
Gray, P. H., & Cooper, W. H. (2010). Pursuing failure. Organizational Research Methods, 13(4), 620–643.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, p values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.
Gregor, S. (2006). The nature of theory in information systems. MIS Quarterly, 30(3), 611–642.
Gregor, S., & Klein, G. (2014). Eight obstacles to overcome in the theory testing genre. Journal of the Association for Information Systems, 15(11), i–xix.
Greve, W., Bröder, A., & Erdfelder, E. (2013). Result-blind peer reviews and editorial decisions: A missing pillar of scientific culture. European Psychologist, 18(4), 286–294.
Grover, V., & Lyytinen, K. (2015). New state of play in information systems research: The push to the edges. MIS Quarterly, 39(2), 271–296.
Grover, V., Straub, D. W., & Galluch, P. (2009). Editor’s comments: Turning the corner: The influence of positive thinking on the information systems field. MIS Quarterly, 33(1), iii-viii.
Guide, V. D. R., Jr., & Ketokivi, M. (2015). Notes from the editors: Redefining some methodological criteria for the journal. Journal of Operations Management, 37, v-viii.
Hair, J. F., Sarstedt, M., Ringle, C. M., & Mena, J. A. (2012). An assessment of the use of partial least squares structural equation modeling in marketing research. Journal of the Academy of Marketing Science, 40(3), 414–433.
Haller, H., & Kraus, S. (2002). Misinterpretations of significance: A problem students share with their teachers? Methods of Psychological Research, 7(1), 1–20.
Harrison, J. S., Banks, G. C., Pollack, J. M., O’Boyle, E. H., & Short, J. (2014). Publication bias in strategic management research. Journal of Management, 43(2), 400–425.
Harzing, A.-W. (2010). The publish or perish book: Your guide to effective and responsible citation analysis. Tarma Software Research.
Howison, J., Wiggins, A., & Crowston, K. (2011). Validity issues in the use of social network analysis with digital trace data. Journal of the Association for Information Systems, 12(12), 767–797.
Hubbard, R. (2004). Alphabet soup: Blurring the distinctions between p’s and α’s in psychological research. Theory & Psychology, 14(3), 295–327.
Ioannidis, J. P. A., Fanelli, D., Dunne, D. D., & Goodman, S. N. (2015). Meta-research: Evaluation and improvement of research methods and practices. PLoS Biology, 13(10), e1002264.
Johnson, V. E., Payne, R. D., Wang, T., Asher, A., & Mandal, S. (2017). On the reproducibility of psychological science. Journal of the American Statistical Association, 112(517), 1–10.
Kaplan, A. (1998/1964). The conduct of inquiry: Methodology for behavioral science. Transaction Publishers.
Kerr, N. L. (1998). Harking: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217.
Lang, J. M., Rothman, K. J., & Cann, C. I. (1998). That confounded p-value. Epidemiology, 9(1), 7–8.
Lazer, D., Pentland, A. P., Adamic, L. A., Aral, S., Barabási, A.-L., Brewer, D., et al. (2009). Computational social science. Science, 323(5915), 721–723.
Leahey, E. (2005). Alphas and asterisks: The development of statistical significance testing standards in sociology. Social Forces, 84(1), 1–24.
Lee, A. S., & Baskerville, R. (2003). Generalizing generalizability in information systems research. Information Systems Research, 14(3), 221–243.
Lee, A. S., & Hubona, G. S. (2009). A scientific basis for rigor in information systems research. MIS Quarterly, 33(2), 237–262.
Lee, A. S., Mohajeri, K., & Hubona, G. S. (2017). Three roles for statistical significance and the validity frontier in theory testing. Paper presented at the 50th Hawaii international conference on system sciences.
Lehmann, E. L. (1993). The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88(424), 1242–1249.
Lenzer, J., Hoffman, J. R., Furberg, C. D., & Ioannidis, J. P. A. (2013). Ensuring the integrity of clinical practice guidelines: A tool for protecting patients. British Medical Journal, 347, f5535.
Levy, M., & Germonprez, M. (2017). The potential for citizen science in information systems research. Communications of the Association for Information Systems, 40(2), 22–39.
Lin, M., Lucas, H. C., Jr., & Shmueli, G. (2013). Too big to fail: Large samples and the p-value problem. Information Systems Research, 24(4), 906–917.
Locascio, J. J. (2019). The impact of results blind science publishing on statistical consultation and collaboration. The American Statistician, 73(supp1), 346–351.
Lu, X., Ba, S., Huang, L., & Feng, Y. (2013). Promotional marketing or word-of-mouth? Evidence from online restaurant reviews. Information Systems Research, 24(3), 596–612.
Lukyanenko, R., Parsons, J., Wiersma, Y. F., & Maddah, M. (2019). Expecting the unexpected: Effects of data collection design choices on the quality of crowdsourced user-generated content. MIS Quarterly, 43(2), 623–647.
Lyytinen, K., Baskerville, R., Iivari, J., & Te’eni, D. (2007). Why the old world cannot publish? Overcoming challenges in publishing high-impact IS research. European Journal of Information Systems, 16(4), 317–326.
MacKenzie, S. B., Podsakoff, P. M., & Podsakoff, N. P. (2011). Construct measurement and validation procedures in MIS and behavioral research: Integrating new and existing techniques. MIS Quarterly, 35(2), 293–334.
Madden, L. V., Shah, D. A., & Esker, P. D. (2015). Does the p value have a future in plant pathology? Phytopathology, 105(11), 1400–1407.
Matthews, R. A. J. (2019). Moving towards the post p < 0.05 era via the analysis of credibility. The American Statistician, 73(Sup 1), 202–212.
McNutt, M. (2016). Taking up TOP. Science, 352(6290), 1147.
McShane, B. B., & Gal, D. (2017). Blinding us to the obvious? The effect of statistical training on the evaluation of evidence. Management Science, 62(6), 1707–1718.
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Mertens, W., Pugliese, A., & Recker, J. (2017). Quantitative data analysis: A companion for accounting and information systems research. Springer.
Miller, J. (2009). What is the probability of replicating a statistically significant effect? Psychonomic Bulletin & Review, 16(4), 617–640.
Mithas, S., Tafti, A., & Mitchell, W. (2013). How a firm's competitive environment and digital strategic posture influence digital business strategy. MIS Quarterly, 37(2), 511.
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & The PRISMA Group. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Medicine, 6(7), e1000100.
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., du Sert, N. P., et al. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(0021), 1–9.
Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews, 82(4), 591–605.
NCBI Insights. (2018). PubMed Commons to be discontinued. https://ncbiinsights.ncbi.nlm.nih.gov/2018/02/01/pubmed-commons-to-be-discontinued/.
Nelson, L. D., Simmons, J. P., & Simonsohn, U. (2018). Psychology’s renaissance. Annual Review of Psychology, 69, 511–534.
Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika, 20A(1/2), 175–240.
Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 231, 289–337.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301.
Nielsen, M. (2011). Reinventing discovery: The new era of networked science. Princeton University Press.
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., et al. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425.
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606.
Nuzzo, R. (2014). Statistical errors: P values, the “gold standard” of statistical validity, are not as reliable as many scientists assume. Nature, 506(150), 150–152.
O’Boyle, E. H., Banks, G. C., & Gonzalez-Mulé, E. (2017). The chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management, 43(2), 376–399.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), 943.
Pernet, C. (2016). Null hypothesis significance testing: A guide to commonly misunderstood concepts and recommendations for good practice [version 5; peer review: 2 approved, 2 not approved]. F1000Research, 4(621). https://doi.org/10.12688/f1000research.6963.5.
publons. (2017). 5 steps to writing a winning post-publication peer review. https://publons.com/blog/5-steps-to-writing-a-winning-post-publication-peer-review/.
Reinhart, A. (2015). Statistics done wrong: The woefully complete guide. No Starch Press.
Ringle, C. M., Sarstedt, M., & Straub, D. W. (2012). Editor’s comments: A critical look at the use of PLS-SEM in MIS quarterly. MIS Quarterly, 36(1), iii–xiv.
Rishika, R., Kumar, A., Janakiraman, R., & Bezawada, R. (2013). The effect of customers’ social media participation on customer visit frequency and profitability: An empirical investigation. Information Systems Research, 24(1), 108–127.
Rönkkö, M., & Evermann, J. (2013). A critical examination of common beliefs about partial least squares path modeling. Organizational Research Methods, 16(3), 425–448.
Rönkkö, M., McIntosh, C. N., Antonakis, J., & Edwards, J. R. (2016). Partial least squares path modeling: Time for some serious second thoughts. Journal of Operations Management, 47-48, 9–27.
Saunders, C. (2005). Editor’s comments: Looking for diamond cutters. MIS Quarterly, 29(1), iii–viii.
Saunders, C., Brown, S. A., Bygstad, B., Dennis, A. R., Ferran, C., Galletta, D. F., et al. (2017). Goals, values, and expectations of the AIS family of journals. Journal of the Association for Information Systems, 18(9), 633–647.
Schönbrodt, F. D. (2018). P-checker: One-for-all p-value analyzer. http://shinyapps.org/apps/p-checker/.
Schwab, A., Abrahamson, E., Starbuck, W. H., & Fidler, F. (2011). Perspective: Researchers should make thoughtful assessments instead of null-hypothesis significance tests. Organization Science, 22(4), 1105–1120.
Shaw, J. D., & Ertug, G. (2017). From the editors: The suitability of simulations and meta-analyses for submissions to academy of management journal. Academy of Management Journal, 60(6), 2045–2049.
Siegfried, T. (2014). To make science better, watch out for statistical flaws. ScienceNews Context Blog, February 7, 2014. https://www.sciencenews.org/blog/context/make-science-better-watch-out-statistical-flaws.
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534–547.
Sivo, S. A., Saunders, C., Chang, Q., & Jiang, J. J. (2006). How low should you go? Low response rates and the validity of inference in IS questionnaire research. Journal of the Association for Information Systems, 7(6), 351–414.
Smith, S. M., Fahey, T., & Smucny, J. (2014). Antibiotics for acute bronchitis. Journal of the American Medical Association, 312(24), 2678–2679.
Starbuck, W. H. (2013). Why and where do academics publish? M@n@gement, 16(5), 707–718.
Starbuck, W. H. (2016). 60th anniversary essay: How journals could improve research practices in social science. Administrative Science Quarterly, 61(2), 165–183.
Straub, D. W. (1989). Validating instruments in MIS research. MIS Quarterly, 13(2), 147–169.
Straub, D. W. (2008). Editor’s comments: Type II reviewing errors and the search for exciting papers. MIS Quarterly, 32(2), v–x.
Straub, D. W., Boudreau, M.-C., & Gefen, D. (2004). Validation guidelines for IS positivist research. Communications of the Association for Information Systems, 13(24), 380–427.
Szucs, D., & Ioannidis, J. P. A. (2017). When null hypothesis significance testing is unsuitable for research: A reassessment. Frontiers in Human Neuroscience, 11(390), 1–21.
Tams, S., & Straub, D. W. (2010). The effect of an IS article’s structure on its impact. Communications of the Association for Information Systems, 27(10), 149–172.
The Economist. (2013). Trouble at the lab. The Economist. http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble.
Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1–2.
Tryon, W. W., Patelis, T., Chajewski, M., & Lewis, C. (2017). Theory construction and data analysis. Theory & Psychology, 27(1), 126–134.
Tsang, E. W. K., & Williams, J. N. (2012). Generalization and induction: Misconceptions, clarifications, and a classification of induction. MIS Quarterly, 36(3), 729–748.
Twa, M. D. (2016). Transparency in biomedical research: An argument against tests of statistical significance. Optometry & Vision Science, 93(5), 457–458.
Venkatesh, V., Brown, S. A., & Bala, H. (2013). Bridging the qualitative-quantitative divide: Guidelines for conducting mixed methods research in information systems. MIS Quarterly, 37(1), 21–54.
Vodanovich, S., Sundaram, D., & Myers, M. D. (2010). Research commentary: Digital natives and ubiquitous information systems. Information Systems Research, 21(4), 711–723.
Walsh, E., Rooney, M., Appleby, L., & Wilkinson, G. (2000). Open peer review: A randomised controlled trial. The British Journal of Psychiatry, 176(1), 47–51.
Warren, M. (2018). First analysis of “preregistered” studies shows sharp rise in null findings. Nature News, October 24, 2018, https://www.nature.com/articles/d41586-018-07118.
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133.
Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”. The American Statistician, 73(Sup 1), 1–19.
Xu, H., Zhang, N., & Zhou, L. (2019). Validity concerns in research using organic data. Journal of Management, 46, 1257. https://doi.org/10.1177/0149206319862027
Yong, E. (2012). Nobel laureate challenges psychologists to clean up their act. Nature News, October 3, 2012. https://www.nature.com/news/nobel-laureate-challenges-psychologists-to-clean-up-their-act-1.11535.
Yoo, Y. (2010). Computing in everyday life: A call for research on experiential computing. MIS Quarterly, 34(2), 213–231.
Zeng, X., & Wei, L. (2013). Social ties and user content generation: Evidence from flickr. Information Systems Research, 24(1), 71–87.
Acknowledgments
We are indebted to the senior editor at JAIS, Allen Lee, and two anonymous reviewers for constructive and developmental feedback that helped us improve the original chapter. We thank participants at seminars at Queensland University of Technology and University of Cologne for providing feedback on our work. We also thank Christian Hovestadt for his help in coding papers. All faults remain ours.
Appendices
Appendix A: Literature Review Procedures
Identification of Papers
In keeping with our intention to demonstrate “open science” practices (Locascio, 2019; Nosek et al., 2018; Warren, 2018), we preregistered our research procedures through the Open Science Framework “Registries” (doi:10.17605/OSF.IO/2GKCS).
We proceeded as follows: we identified the 100 top-cited papers (per year) published between 2013 and 2016 in the AIS Senior Scholars’ basket of eight IS journals, using Harzing’s Publish or Perish version 6 (Harzing, 2010). We ran the queries separately on February 7, 2017, and then aggregated the results to identify the 100 most cited papers (based on citations per year) across the basket of eight journals (see Note 9). The raw data (together with the coded data) are available at an open data repository hosted by Queensland University of Technology (doi:10.25912/5cede0024b1e1).
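The ranking step can be sketched as follows. The record layout and field names are hypothetical illustrations, not the actual Publish or Perish export format:

```python
# Hypothetical sketch of ranking papers by citations per year.
# "citations", "year", and "title" are assumed field names for illustration.

def top_cited(records, query_year=2017, n=100):
    """Return the n records with the highest citations per year."""
    return sorted(
        records,
        key=lambda r: r["citations"] / max(query_year - r["year"], 1),
        reverse=True,
    )[:n]

papers = [
    {"title": "A", "year": 2013, "citations": 400},  # 100 citations/year
    {"title": "B", "year": 2016, "citations": 150},  # 150 citations/year
    {"title": "C", "year": 2014, "citations": 90},   # 30 citations/year
]
print([p["title"] for p in top_cited(papers, n=2)])  # ['B', 'A']
```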
From this set we then identified the papers that followed the hypothetico-deductive model. First, we excluded 48 papers that did not involve empirical data: 31 that offered purely theoretical contributions, 11 that were commentaries in the form of forewords, introductions to special issues, or editorials, 5 that were methodological essays, and 1 design science paper. Second, from the remaining 52 papers we identified those that reported the collection and analysis of quantitative data. We found 46 such papers: 39 traditional quantitative research articles, 3 essays on methodological aspects of quantitative research, 2 mixed-methods studies involving quantitative empirical data, and 2 design science papers involving quantitative data. Third, we eliminated the three methodological essays, because their focus was not on developing and testing new theory to explain and predict IS phenomena. This resulted in a final sample of 43 papers, including 2 design science and 2 mixed-methods studies.
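The three screening steps can be expressed as successive filters. The sketch below uses a synthetic corpus whose category labels are our shorthand, constructed only so that the counts mirror those reported above:

```python
# Screening sketch: 100 papers -> 52 empirical -> 46 quantitative -> 43 final.

def screen(papers):
    """Apply the three screening steps described in the text."""
    empirical = [p for p in papers if p["empirical"]]
    quantitative = [p for p in empirical if p["quantitative"]]
    return [p for p in quantitative if p["type"] != "methods-essay"]

# Synthetic corpus mirroring the reported counts (labels are shorthand).
corpus = (
    [{"empirical": False, "quantitative": False, "type": t}
     for t in ["theory"] * 31 + ["editorial"] * 11
            + ["methods-essay"] * 5 + ["design-science"] * 1]
    + [{"empirical": True, "quantitative": False, "type": "qualitative"}] * 6
    + [{"empirical": True, "quantitative": True, "type": "traditional"}] * 39
    + [{"empirical": True, "quantitative": True, "type": "methods-essay"}] * 3
    + [{"empirical": True, "quantitative": True, "type": "mixed-methods"}] * 2
    + [{"empirical": True, "quantitative": True, "type": "design-science"}] * 2
)
print(len(corpus), len(screen(corpus)))  # 100 43
```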
Coding of Papers
We developed a coding scheme in an Excel workbook to code the studies. The workbook is available in our Open Science Framework (OSF) registry. We used the following criteria; where applicable, we refer to the literature that defined the variables we used during coding.
- What is the main method of data collection and analysis (e.g., experiment, meta-analysis, panel, social network analysis, survey, text mining, economic modeling, multiple)?
- Are testable hypotheses or propositions proposed (yes/in graphical form only/no)?
- How precisely are the hypotheses formulated (using the classification of Edwards & Berry, 2010)?
- Is null hypothesis significance testing used (yes/no)?
- Are exact p-values reported (all/some/not at all)?
- Are effect sizes reported and, if so, which ones primarily (e.g., R2, standardized mean difference scores, f2, partial eta2)?
- Are results declared as “statistically significant” (yes/sometimes/not at all)?
- How many hypotheses are reported as supported (%)?
- Are p-values used to argue the absence of an effect (yes/no)?
- Are confidence intervals for test statistics reported (yes/selectively/no)?
- What sampling method is used (i.e., convenience/random/systematic sampling, or entire population)? (See Note 10.)
- Is statistical power discussed and, if so, where and how (e.g., sample size estimation, ex-post power analysis)?
- Are competing theories tested explicitly (Gray & Cooper, 2010)?
- Are corrections made to adjust for multiple hypothesis testing, where applicable (e.g., Bonferroni, alpha-inflation, variance inflation)?
- Are post hoc analyses reported for unexpected results?
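One way to make such a coding scheme machine-checkable is to record the admissible values per criterion and validate each coded paper against them. The field names below are our paraphrases of the criteria, not the actual spreadsheet columns:

```python
# Sketch of the coding scheme as a machine-readable record; field names
# paraphrase the criteria above and are hypothetical.
CODING_SCHEME = {
    "method": ["experiment", "meta-analysis", "panel", "social network analysis",
               "survey", "text mining", "economic modeling", "multiple"],
    "hypotheses_proposed": ["yes", "graphical form only", "no"],
    "uses_nhst": ["yes", "no"],
    "exact_p_values": ["all", "some", "not at all"],
    "declares_significance": ["yes", "sometimes", "not at all"],
    "p_used_to_argue_absence": ["yes", "no"],
    "confidence_intervals": ["yes", "selectively", "no"],
    "sampling": ["convenience", "random", "systematic", "entire population"],
}

def validate(coded_paper):
    """True if every coded field uses an admissible value."""
    return all(value in CODING_SCHEME.get(field, [value])
               for field, value in coded_paper.items())

print(validate({"uses_nhst": "yes", "sampling": "random"}))  # True
print(validate({"exact_p_values": "mostly"}))                # False
```

Encoding admissible values up front is also what makes coding inconsistencies between coders easy to detect automatically.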
We also extracted quotes that, in our interpretation, illuminated the view each paper took on NHST. This was important for demonstrating how deeply NHST practices are embedded in our research routines and language, as reflected in the use of key phrases such as “statistical significance” or “p-value” (Gelman & Stern, 2006).
To be as unbiased as possible, we hired a research assistant to perform the coding. Before he commenced, we explained the coding scheme to him in several meetings. We then conducted a pilot test to evaluate the quality of his coding: he coded five randomly chosen papers from the set, and we met to review the results by comparing our individual understandings of the papers. Where inconsistencies arose, we clarified the coding scheme with him until we were confident that he understood it thoroughly. During the coding, the research assistant flagged particularly problematic or ambiguous elements, and we met to resolve these ambiguities until we reached a shared agreement. The coding process took three months to complete. The results of our coding are openly accessible at doi:10.25912/5cede0024b1e1. Appendix B provides summary statistics about our sample.
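The pilot check compared the research assistant's codings with our own. One common way to quantify such inter-coder agreement is Cohen's kappa, which corrects raw agreement for chance; the codings below are invented for illustration, not our actual pilot data:

```python
# Cohen's kappa for two coders over the same items (nominal codes).

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two lists of nominal codes."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    expected = sum((coder_a.count(label) / n) * (coder_b.count(label) / n)
                   for label in set(coder_a) | set(coder_b))
    return (observed - expected) / (1 - expected)

assistant = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
authors   = ["yes", "yes", "no", "no",  "no", "no", "yes", "yes"]
print(cohens_kappa(assistant, authors))  # 0.5 (observed 0.75, chance 0.5)
```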
Appendix B
Selected Descriptive Statistics from 43 Frequently Cited IS Papers from 2013 to 2016
| Category | Item | Count |
|---|---|---|
| Main method for data collection and analysis | Experiment | 5 |
|  | Meta-analysis | 2 |
|  | Panel | 5 |
|  | Social network analysis | 4 |
|  | Survey | 15 |
|  | Text mining | 5 |
|  | Economic modeling | 1 |
|  | Multiple | 6 |
| Empirical data | Newly collected or analyzed primary data | 40 |
|  | Re-analyzed or secondary data | 3 |
| Hypotheses | Testable hypotheses or propositions proposed | 38 |
|  | No testable hypotheses or propositions proposed | 5 |
|  | Average percentage of hypotheses per study supported by the data | 82% |
| Statement of hypotheses | As relations | 0 |
|  | As upper/lower limits | 0 |
|  | As directions | 13 |
|  | In non-nil form | 0 |
|  | In functional form | 0 |
|  | In contingent form | 2 |
|  | As comparisons | 6 |
|  | In multiple ways | 15 |
|  | Not formulated | 2 |
|  | Not applicable | 5 |
| NHST | Uses NHST techniques or terminology | 42 |
|  | Does not use NHST techniques or terminology | 1 |
| Exact p-values | Reports exact p-values | 3 |
|  | Reports exact p-values selectively | 8 |
|  | Reports indicators for different levels of statistical significance | 28 |
|  | Does not report p-values | 3 |
| Inverse use of p-values | Uses p-values to point at the absence of an effect or accept the null hypothesis | 11 |
|  | Does not use p-values to point at the absence of an effect or accept the null hypothesis | 29 |
|  | Not applicable | 3 |
| “Statistical” significance | Does not explicitly refer to “statistical significance” | 23 |
|  | Consistently refers to “statistical significance” | 3 |
|  | Selectively refers to “statistical significance” | 16 |
|  | Not applicable | 1 |
| Effect sizes | Reports R2 measures | 26 |
|  | Reports mean difference score measures | 2 |
|  | Reports multiple effect size measures | 4 |
|  | Does not report effect size measures | 10 |
|  | Not applicable | 1 |
| Confidence intervals | Reports confidence intervals consistently | 3 |
|  | Reports confidence intervals selectively | 2 |
|  | Reports confidence intervals for bootstrapping results (no p-value available) | 3 |
|  | Does not report confidence intervals | 34 |
|  | Not applicable | 1 |
| Sampling | Convenience | 22 |
|  | Systematic | 6 |
|  | Random | 4 |
|  | Entire population | 8 |
|  | Not applicable | 3 |
| Competing theories | Tested explicitly | 7 |
|  | Not tested | 35 |
|  | Not applicable | 1 |
| A posteriori analyses | Provided | 11 |
|  | Not provided | 31 |
|  | Not applicable | 1 |
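Some headline proportions implied by the table can be computed directly. A minimal sketch (n = 43 coded papers; the exact-p-value rows sum to the 42 papers that use NHST):

```python
# Headline proportions from the Appendix B counts (n = 43 coded papers).
n = 43
proportions = {
    "use NHST techniques or terminology": 42 / n,
    "report only significance indicators, no exact p-values": 28 / 42,
    "report no confidence intervals": 34 / n,
    "use p-values to argue the absence of an effect": 11 / n,
}
for label, share in proportions.items():
    print(f"{label}: {share:.0%}")  # 98%, 67%, 79%, 26%
```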
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Mertens, W., Recker, J. (2023). New Guidelines for Null Hypothesis Significance Testing in Hypothetico-Deductive IS Research. In: Willcocks, L.P., Hassan, N.R., Rivard, S. (eds) Advancing Information Systems Theories, Volume II. Technology, Work and Globalization. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-031-38719-7_13
Print ISBN: 978-3-031-38718-0
Online ISBN: 978-3-031-38719-7