
Weak Positive Sampling and Soft Smooth Labeling for Distractor Generation Data Augmentation

  • Conference paper
Advanced Intelligent Computing Technology and Applications (ICIC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14089)


Abstract

Distractor generation is one of the most important and challenging tasks in the automatic generation of multiple-choice questions. Previous studies usually rely on a few ground-truth distractors as training samples, overlooking many other potentially usable distractors, so the strong generation ability of deep learning models may not be fully exploited. We therefore propose a data augmentation framework for distractor generation that first applies a distractor ranking model to a set of distractor candidates and then selects useful candidates as additional training samples. In addition, we propose weak positive sampling and soft smooth labeling mechanisms to ensure sample quality and to use the augmented samples effectively during training. Experimental results on public benchmarks demonstrate the effectiveness of the proposed method.
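The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`select_weak_positives`, `smoothed_targets`), the score threshold, and the smoothing values are illustrative assumptions. The idea shown is that ranked candidates above a score threshold become "weak positive" training samples, and that such samples receive a softer (more heavily smoothed) target distribution than ground-truth distractors.

```python
import math

def select_weak_positives(scored_candidates, threshold, top_k):
    """Keep candidates whose ranking score meets the threshold,
    then take at most top_k of them as weak positive samples."""
    kept = [c for c, s in sorted(scored_candidates, key=lambda x: -x[1])
            if s >= threshold]
    return kept[:top_k]

def smoothed_targets(vocab_size, gold_index, epsilon):
    """Label-smoothed target distribution over the vocabulary:
    the gold token keeps mass 1 - epsilon and the remaining epsilon
    is spread uniformly over the other tokens."""
    off = epsilon / (vocab_size - 1)
    return [1.0 - epsilon if i == gold_index else off
            for i in range(vocab_size)]

def soft_cross_entropy(targets, log_probs):
    """Cross-entropy between a soft target distribution and model log-probs."""
    return -sum(t * lp for t, lp in zip(targets, log_probs))

# Candidates scored by a (hypothetical) distractor ranking model.
candidates = [("option A", 0.9), ("option B", 0.4), ("option C", 0.7)]
weak_positives = select_weak_positives(candidates, threshold=0.5, top_k=2)
# -> ['option A', 'option C']

# Ground-truth distractor tokens get a sharp target (small epsilon);
# weak positive (augmented) tokens get a softer one (larger epsilon).
EPS_GOLD, EPS_WEAK = 0.1, 0.3
gold_target = smoothed_targets(5, gold_index=2, epsilon=EPS_GOLD)
weak_target = smoothed_targets(5, gold_index=2, epsilon=EPS_WEAK)

# Uniform model log-probabilities, just to exercise the loss.
log_probs = [math.log(0.2)] * 5
loss = soft_cross_entropy(weak_target, log_probs)
```

Under this sketch, lowering the gold-token mass for augmented samples lets the model learn from weak positives without treating them as fully trusted labels.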



Acknowledgment

This work was partially supported by the National Natural Science Foundation of China (No. 61977002).

Author information


Corresponding author

Correspondence to Wenge Rong.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, J., Bai, J., Rong, W., Ouyang, Y., Xiong, Z. (2023). Weak Positive Sampling and Soft Smooth Labeling for Distractor Generation Data Augmentation. In: Huang, D.S., Premaratne, P., Jin, B., Qu, B., Jo, K.H., Hussain, A. (eds.) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol. 14089. Springer, Singapore. https://doi.org/10.1007/978-981-99-4752-2_62


  • DOI: https://doi.org/10.1007/978-981-99-4752-2_62

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4751-5

  • Online ISBN: 978-981-99-4752-2

  • eBook Packages: Computer Science (R0)
