research-article

Using behavioral data to identify interviewer fabrication in surveys

Authors:
Benjamin Birnbaum, University of Washington, Seattle, WA, USA
Gaetano Borriello, University of Washington, Seattle, Washington, USA
Abraham D. Flaxman, University of Washington, Seattle, Washington, USA
Brian DeRenzi, University of Washington, Seattle, Washington, USA
Anna R. Karlin, University of Washington, Seattle, Washington, USA

CHI '13: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 2013, Pages 2911–2920
https://doi.org/10.1145/2470654.2481404

Published: 27 April 2013

ABSTRACT

Surveys conducted by human interviewers are one of the principal means of gathering data from all over the world, but the quality of this data can be threatened by interviewer fabrication. In this paper, we investigate a new approach to detecting interviewer fabrication automatically. We instrument electronic data collection software to record logs of low-level behavioral data and show that supervised classification, when applied to features extracted from these logs, can identify interviewer fabrication with an accuracy of up to 96%. We show that even when interviewers know that our approach is being used, have some knowledge of how it works, and are incentivized to avoid detection, it can still achieve an accuracy of 86%. We also demonstrate the robustness of our approach to a moderate amount of label noise and provide practical recommendations, based on empirical evidence, on how much data is needed for our approach to be effective.
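
The pipeline the abstract describes (behavioral logs, then extracted features, then supervised classification) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the log format, the event names (`answer`, `back`, `edit`), the four features, and the nearest-centroid classifier are hypothetical stand-ins and are not taken from the paper, which does not state its features or classifier in this section.

```python
from statistics import mean

# Hypothetical log format: a list of (timestamp_seconds, event_type) pairs
# for one completed form. Event names are illustrative assumptions.

def extract_features(log):
    """Turn one form's event log into a small numeric feature vector."""
    times = [t for t, _ in log]
    answer_times = [t for t, e in log if e == "answer"]
    gaps = [b - a for a, b in zip(answer_times, answer_times[1:])]
    return [
        times[-1] - times[0],                   # total time spent on the form
        mean(gaps) if gaps else 0.0,            # mean delay between answers
        sum(1 for _, e in log if e == "back"),  # backward navigations
        sum(1 for _, e in log if e == "edit"),  # changed answers
    ]

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(column) for column in zip(*vectors)]

def train(labelled_logs):
    """Fit one centroid per label from (log, label) pairs."""
    by_label = {}
    for log, label in labelled_logs:
        by_label.setdefault(label, []).append(extract_features(log))
    return {label: centroid(vectors) for label, vectors in by_label.items()}

def predict(model, log):
    """Return the label whose centroid is closest (squared Euclidean)."""
    x = extract_features(log)

    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return min(model, key=lambda label: squared_distance(model[label]))
```

A model would be fit with `train` on logs labelled `"real"` or `"fabricated"` and applied to new logs with `predict`. The nearest-centroid rule is chosen only to keep the sketch dependency-free; a practical system would likely scale the features and use a stronger classifier.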


Index Terms

Human-centered computing

Published in
CHI '13: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
April 2013
3550 pages
ISBN: 9781450318990
DOI: 10.1145/2470654
General Chair: Wendy E. Mackay (INRIA)
Program Chairs: Stephen Brewster (Glasgow University), Susanne Bødker (University of Aarhus)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
behavioral data
curbstoning
data collection
data quality
hci4d
supervised classification
surveys
user logging
Qualifiers
- research-article
Acceptance Rates
CHI '13 Paper Acceptance Rate: 392 of 1,963 submissions, 20%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%