DOI: 10.1145/2470654.2481404
research-article

Using behavioral data to identify interviewer fabrication in surveys

Published: 27 April 2013

ABSTRACT

Surveys conducted by human interviewers are one of the principal means of gathering data from all over the world, but the quality of this data can be threatened by interviewer fabrication. In this paper, we investigate a new approach to detecting interviewer fabrication automatically. We instrument electronic data collection software to record logs of low-level behavioral data and show that supervised classification, when applied to features extracted from these logs, can identify interviewer fabrication with an accuracy of up to 96%. We show that even when interviewers know that our approach is being used, have some knowledge of how it works, and are incentivized to avoid detection, it can still achieve an accuracy of 86%. We also demonstrate the robustness of our approach to a moderate amount of label noise and provide practical recommendations, based on empirical evidence, on how much data is needed for our approach to be effective.
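To make the pipeline concrete, the following is a minimal sketch of the kind of system the abstract describes: instrumented data-collection software produces low-level behavioral logs, per-interview features are extracted from those logs, and a supervised classifier is trained to flag likely fabrication. The feature definitions (timing gaps, answer edits, scrolling) and the choice of a random-forest learner are assumptions made for illustration, not the authors' exact configuration.

```python
# Hypothetical sketch of the approach described above: derive per-interview
# features from low-level behavioral logs, then train a supervised classifier
# to flag likely fabrication. The specific features and the random-forest
# learner are illustrative assumptions, not the paper's exact pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def extract_features(log):
    """Convert one interview's event log, a list of (timestamp_seconds,
    event_type, question_id) tuples, into a fixed-length feature vector.
    The chosen features are hypothetical examples of low-level behavior."""
    times = np.array([t for t, _, _ in log], dtype=float)
    gaps = np.diff(times) if len(times) > 1 else np.array([0.0])
    n_events = len(log)
    n_edits = sum(1 for _, event, _ in log if event == "answer_changed")
    n_scrolls = sum(1 for _, event, _ in log if event == "scroll")
    return [
        gaps.mean(),                   # mean time between logged events
        gaps.std(),                    # variability in event timing
        n_edits / max(n_events, 1),    # fraction of events that are answer edits
        n_scrolls / max(n_events, 1),  # fraction of events that are scrolls
        n_events,                      # total number of logged events
    ]


def evaluate(logs, labels, folds=10):
    """Cross-validated accuracy of a random forest on labeled interviews,
    where labels[i] == 1 means interview i is known to be fabricated."""
    X = np.array([extract_features(log) for log in logs])
    y = np.array(labels)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X, y, cv=folds, scoring="accuracy").mean()
```

In a real deployment, labeled examples would come from interviews whose authenticity has already been verified (for example, through re-interviews or audits), and the trained model would then be used to prioritize which new interviews to scrutinize.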


Published in
CHI '13: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
April 2013, 3550 pages
ISBN: 9781450318990
DOI: 10.1145/2470654
Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States




Acceptance Rates
CHI '13 Paper Acceptance Rate: 392 of 1,963 submissions, 20%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%

