There are repeated calls in the AI community to prioritize data work — collecting, curating, analysing and otherwise considering the quality of data. But this is not practised as much as advocates would like, often because of a lack of institutional and cultural incentives. One way to encourage data work would be to reframe it as more technically rigorous, and thereby integrate it into more-valued lines of research such as model innovation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Margaret Mitchell for their contribution to the peer review of this work.
About this article
Cite this article
Gero, K.I., Das, P., Dognin, P. et al. The incentive gap in data work in the era of large models. Nat Mach Intell 5, 565–567 (2023). https://doi.org/10.1038/s42256-023-00673-x
DOI: https://doi.org/10.1038/s42256-023-00673-x