A random forest-based analysis of household survey data to infer insights on digital inequality

  • Research
  • Iran Journal of Computer Science

Abstract

This paper examines digital inequalities in Nepal using a publicly available dataset. We build several random forest classification models and apply sensitivity analysis, permutation-based variable importance tests, and partial dependence analysis to characterize digital inequality in access, skills, and use at the household and individual levels. Our analysis reveals important Nepal-specific findings about digital inequality. In addition, it illustrates how non-parametric methods can explicate the complex nonlinear relationships that prevail between demographic variables, and how sensitivity and partial dependence analysis can aid in interpreting so-called ‘black box’ models such as random forests. One of our notable findings is that caste has very little power to explain the adoption of digital technologies. Gender, on the other hand, is still a strong predictor of an individual’s computer skills. Although the analysis in this paper is limited to Nepal, the methodology applies to similar datasets for other countries.
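
To make the interpretation techniques named in the abstract concrete, the following is a minimal Python sketch of permutation-based variable importance and partial dependence for a generic black-box classifier. It is not the paper's implementation (the reference list points to R packages such as randomForest, rfPermute, and pdp). The scoring function `toy_score`, the three features, and the synthetic data are hypothetical stand-ins for a fitted random forest and the survey variables. Random forest implementations typically measure the accuracy drop on out-of-bag samples; the sketch uses the full sample for brevity.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the recorded label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)


def permutation_importance(model, rows, labels, col, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling the values of column `col`."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [row[col] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild every row with the shuffled value in position `col`.
        permuted = [row[:col] + [v] + row[col + 1:]
                    for row, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / repeats


def partial_dependence(score, rows, col, grid):
    """Average model score when column `col` is forced to each grid value."""
    curve = []
    for value in grid:
        forced = [row[:col] + [value] + row[col + 1:] for row in rows]
        curve.append(sum(score(r) for r in forced) / len(forced))
    return curve


# Hypothetical stand-in for a fitted forest. Features (all assumptions):
# [years of education, urban indicator, irrelevant noise].
def toy_score(row):
    edu, urban, _noise = row
    return min(1.0, 0.1 + 0.06 * edu + 0.2 * urban)


def toy_model(row):
    return 1 if toy_score(row) >= 0.5 else 0


data_rng = random.Random(42)
rows = [[data_rng.randint(0, 12), data_rng.randint(0, 1), data_rng.random()]
        for _ in range(200)]
labels = [toy_model(r) for r in rows]  # synthetic labels, not survey data

importances = [permutation_importance(toy_model, rows, labels, c)
               for c in range(3)]
pd_curve = partial_dependence(toy_score, rows, 0, grid=[0, 4, 8, 12])
print(importances)
print(pd_curve)
```

In this toy setup the ignored noise column receives an importance of zero, while the partial dependence curve for education rises across the grid. That is the kind of reading the two techniques are meant to support when the model itself is opaque.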


Notes

  1. Other measures of sensitivity could be used as well. We restrict ourselves to AAD because our objective is only to triangulate the variable importance inferred from the permutation and sensitivity-based analyses.

  2. This can be observed through the Management Information Report series provided on the website of Nepal Telecommunication Authority, the telecom regulator of Nepal.

  3. ‘Sudoorpaschim’ literally translates to ‘far-western’. According to the 2011 census data, the far-western region is the most underdeveloped in terms of ICT access, income, education, and other indicators. This fact has been explored in [7]. Recent reports on human development by the Central Bureau of Statistics Nepal, available on its official website, show that the situation is still similar.

  4. Around 67 percent of the population had access to electricity in 2011, which increased to around 90 percent in 2019. This data is provided online by the World Bank.

  5. A good review of the socioeconomic impacts of the caste system can be found in [15].
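
Note 1 refers to AAD as a sensitivity measure. Assuming this denotes the average absolute deviation from the median, as in the sensitivity-analysis framework of [11], a one-dimensional version can be sketched in Python as follows. One input is varied over a set of levels while the others are held at a baseline, and the spread of the model responses is recorded. The scoring function, the baseline vector, and the level grids below are hypothetical illustrations, not values from the paper.

```python
from statistics import median


def aad_sensitivity(score, baseline, col, levels):
    """Average absolute deviation from the median of the model responses
    obtained by varying input `col` over `levels`, others held at baseline."""
    responses = []
    for value in levels:
        probe = list(baseline)
        probe[col] = value
        responses.append(score(probe))
    centre = median(responses)
    return sum(abs(r - centre) for r in responses) / len(responses)


def relative_importance(score, baseline, level_grid):
    """Normalize the per-input AAD values so that they sum to one."""
    raw = [aad_sensitivity(score, baseline, c, lv)
           for c, lv in enumerate(level_grid)]
    total = sum(raw)
    return [s / total if total else 0.0 for s in raw]


# Hypothetical stand-in for a fitted model's predicted probability.
# Inputs (assumptions): [years of education, urban indicator, noise].
def toy_score(row):
    edu, urban, _noise = row
    return min(1.0, 0.1 + 0.06 * edu + 0.2 * urban)


baseline = [6, 0, 0.5]
level_grid = [[0, 3, 6, 9, 12], [0, 1], [0.0, 0.5, 1.0]]

edu_aad = aad_sensitivity(toy_score, baseline, 0, level_grid[0])
shares = relative_importance(toy_score, baseline, level_grid)
print(edu_aad, shares)
```

An input that the model ignores produces identical responses at every level and therefore a zero share, which makes the resulting ranking directly comparable with a permutation-based one.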

References

  1. DiMaggio, P., Hargittai, E., et al.: From the ‘digital divide’ to ‘digital inequality’: studying internet use as penetration increases. Princeton: Center for Arts and Cultural Policy Studies, Woodrow Wilson Sch. Princeton Univ. 4(1), 4–2 (2001)

  2. Van Dijk, J. A.: A theory of the digital divide. In: The digital divide. Routledge, pp. 49–72 (2013)

  3. Robinson, L., Schulz, J., Blank, G., Ragnedda, M., Ono, H., Hogan, B., Mesch, G. S., Cotten, S. R., Kretchmer, S. B., Hale, T. M., Drabowicz, T., Yan, P., Wellman, B., Harper, M.-G., Quan-Haase, A., Dunn, H. S., Casilli, A. A., Tubaro, P., Carvath, R., Chen, W., Wiest, J. B., Dodel, M., Stern, M. J., Ball, C., Huang, K.-T., Khilnani, A.: Digital inequalities 2.0: legacy inequalities in the information age. First Monday 25(7) (2020)

  4. Robinson, L., Schulz, J., Dunn, H. S., Casilli, A. A., Tubaro, P., Carvath, R., Chen, W., Wiest, J. B., Dodel, M., Stern, M. J., Ball, C., Huang, K.-T., Blank, G., Ragnedda, M., Ono, H., Hogan, B., Mesch, G. S., Cotten, S. R., Kretchmer, S. B., Hale, T. M., Drabowicz, T., Yan, P., Wellman, B., Harper, M.-G., Quan-Haase, A., Khilnani, A.: Digital inequalities 3.0: emergent inequalities in the information age. First Monday 25(7) (2020)

  5. Pandey, S., Raj, Y.: Free float internet policies of Nepal. Studies Nepali Hist. Soc. 21(1), 1–60 (2016)

  6. Regmi, N.: Expectations versus reality: a case of internet in Nepal. Electron. J. Inform. Syst. Dev. Count. 82(1), 1–20 (2017)

  7. Pandey, S., Regmi, N.: Changing connectivities and renewed priorities: status and challenges facing Nepali internet. First Monday (2018)

  8. Pandey, S.B., Regmi, N.: If you build it, will they come? Exploring narratives that shape the internet in Nepal. Sci. Technol. Soc. 25(3), 444–464 (2020)

  9. Chautari, M.: Moving beyond access: the landscape of internet use and digital inequality in Nepal. Martin Chautari research brief 23. Martin Chautari, Tech. Rep. (2012)

  10. CBSN and UNICEF: Nepal multiple indicator cluster survey report 2019 survey findings report. Tech. Rep., Central Bureau of Statistics Nepal (2020)

  11. Cortez, P., Embrechts, M.J.: Using sensitivity analysis and visualization techniques to open black box data mining models. Inform. Sci. 225, 1–17 (2013)

  12. Greenwell, B.M.: pdp: an R package for constructing partial dependence plots. R J. 9(1), 421 (2017)

  13. Hastie, T., Tibshirani, R., Friedman, J.: The elements of statistical learning: data mining, inference, and prediction. Springer, Berlin (2009)

  14. Archer, E.: Package ‘rfPermute’ (2020)

  15. Mosse, D.: Caste and development: contemporary perspectives on a structure of discrimination and advantage. World Dev. 110, 422–436 (2018)

  16. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  17. Strobl, C., Malley, J., Tutz, G.: An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol. Methods 14(4), 323 (2009)

  18. Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., Zeileis, A.: Conditional variable importance for random forests. BMC Bioinform. 9(1), 1–11 (2008)

  19. CBSN: Population monograph of Nepal, volume II. Central Bureau of Statistics Nepal, Tech. Rep. (2014)

  20. Filmer, D., Pritchett, L.H.: Estimating wealth effects without expenditure data-or tears: an application to educational enrollments in states of India. Demography 38(1), 115–132 (2001)

  21. Liaw, A.: Package ‘randomForest’. University of California, Berkeley (2018)

  22. Cortez, P.: Package ‘rminer’. Teaching Report, 59 (2020)

  23. Guha, A., Mukerji, M., et al.: Determinants of digital divide using demand-supply framework. Aust. J. Inform. Syst. 25 (2021)

  24. Thapa, D., Sæbø, Ø.: Exploring the link between ICT and development in the context of developing countries: a literature review. Electron. J. Inform. Syst. Dev. Count. 64(1), 1–15 (2014)

  25. Van Deursen, A. J., Helsper, E. J., Eynon, R.: Measuring digital skills. In: From digital skills to tangible outcomes project report (2014)

Funding

This work has not been funded by any institution.

Author information

Contributions

All the work has been contributed by the corresponding author.

Corresponding author

Correspondence to Nischal Regmi.

Ethics declarations

Conflict of interest

There is no conflict of interest to disclose.

Human Participants

This work does not directly involve any human participants. The data used is published by UNICEF and is publicly available.

Informed Consent

Not applicable, as this work uses public data provided by UNICEF.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Regmi, N. A random forest-based analysis of household survey data to infer insights on digital inequality. Iran J Comput Sci 6, 333–344 (2023). https://doi.org/10.1007/s42044-023-00143-y

  • DOI: https://doi.org/10.1007/s42044-023-00143-y
