The Risk of Disclosure When Reporting Commonly Used Univariate Statistics

  • Conference paper
Privacy in Statistical Databases (PSD 2022)

Abstract

When basic or descriptive summary statistics are reported, the entire sample of observations may be inadvertently disclosed, or members of a sample may be able to work out the responses of others. Three frequently reported sets of univariate summary statistics are considered: the mean and standard deviation; the median and lower and upper quartiles; and the median, minimum, and maximum. The methodology assesses how often the full sample can be reverse engineered from the summary statistics. The R package uwedragon is recommended for assessing this risk for a given data set before reporting the mean and standard deviation. The disclosure risk is shown to be particularly high for small samples on a highly discrete scale, and it is reduced when alternatives to the mean and standard deviation are reported. An example is given to prompt discussion of appropriate reporting of summary statistics, with attention to the box and whiskers plot, which is frequently used to visualise some of these statistics. Six variations of the box and whiskers plot are discussed to illustrate the disclosure issues that may arise. It is concluded that the safest summary to report is the three-number summary of the median and the lower and upper quartiles, which can be displayed graphically as a literal ‘boxplot’ with no whiskers.
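The reverse-engineering idea in the abstract can be illustrated with a brute-force search. The sketch below is not the uwedragon implementation and makes several assumptions that the abstract does not state: responses lie on an integer scale from 1 to `scale_points`, the standard deviation uses the n − 1 denominator, and both statistics are reported to two decimal places. The function names `summary_key`, `matching_samples`, and `unique_solution_count` are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement
from statistics import mean, stdev


def summary_key(sample, decimals=2):
    """Rounded (mean, sample SD) pair, as it might appear in a report."""
    return (round(mean(sample), decimals), round(stdev(sample), decimals))


def matching_samples(n, scale_points, reported_mean, reported_sd, decimals=2):
    """List every sorted sample of size n (n >= 2) on the scale 1..scale_points
    whose rounded mean and SD equal the reported values.

    Assumption: order of observations is ignored, so samples are multisets.
    """
    target = (round(reported_mean, decimals), round(reported_sd, decimals))
    scale = range(1, scale_points + 1)
    return [s for s in combinations_with_replacement(scale, n)
            if summary_key(s, decimals) == target]


def unique_solution_count(n, scale_points, decimals=2):
    """Count the possible samples that are the only sample sharing their
    rounded (mean, SD) pair, i.e. samples that would be fully disclosed."""
    scale = range(1, scale_points + 1)
    counts = Counter(summary_key(s, decimals)
                     for s in combinations_with_replacement(scale, n))
    return sum(1 for c in counts.values() if c == 1)


if __name__ == "__main__":
    # Five responses on a 5-point scale, reported mean 1.80 and SD 1.79.
    print(matching_samples(5, 5, 1.80, 1.79))
    # Same setting, reported mean 3.00 and SD 1.41: several candidates remain.
    print(matching_samples(5, 5, 3.00, 1.41))
```

In the first call only one multiset of responses is consistent with the reported values, so the whole sample is recoverable; in the second call more than one candidate remains. Exhaustive enumeration is feasible only for small samples on short scales, which is also the setting the abstract identifies as highest risk.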

References

  1. Skinner, C.: Statistical disclosure control for survey data. In Handbook of Statistics, vol. 29, pp. 381–396. Elsevier (2009). https://doi.org/10.1016/S0169-7161(08)00015-1

  2. Derrick, B., White, P.: Comparing two samples from an individual Likert question. Int. J. Math. Statist. 18(3) (2017)

  3. Derrick, B.: uwedragon: Data Research, Access, Governance Network: Statistical Disclosure Control. R package (2022). https://cran.r-project.org/web/packages/uwedragon/index.html

  4. Lowthian, P., Ritchie, F.: Ensuring the confidentiality of statistical outputs from the ADRN. ADRN Technical paper (2017). https://uwe-repository.worktribe.com/output/888435

  5. Dinur, I., Nissim, K.: Revealing information while preserving privacy. PODS 2003, 202–210 (2003)

  6. Hozo, S.P., Djulbegovic, B., Hozo, I.: Estimating the mean and variance from the median, range, and the size of a sample. BMC Med. Res. Methodol. 5(1), 1–10 (2005)

  7. Derrick, B., Green, L., Kember, K., Ritchie, F., White, P.: Safety in numbers: Minimum thresholding, Maximum bounds, and Little White Lies: The case of the Mean and Standard Deviation. Scottish Economic Society Conference 2022 (2022). www.ses2022.org/sessions/protecting-confidentiality-social-science-research-outputs

  8. R Core Team: R: A Language and Environment for Statistical Computing (2021). https://www.R-project.org/

  9. Hyndman, R.J., Fan, Y.: Sample quantiles in statistical packages. Am. Stat. 50(4), 361–365 (1996)

  10. Tukey, J.W.: Exploratory Data Analysis. Addison-Wesley (1977). ISBN 9780201076165

Author information

Corresponding author

Correspondence to Ben Derrick.

Appendix

See Tables A1, A2, and A3.

Table A1. Number of unique solutions, data on 5-point scale.
Table A2. Number of unique solutions, data on 9-point scale.
Table A3. Number of unique solutions, data on 11-point scale.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Derrick, B., Green, E., Ritchie, F., White, P. (2022). The Risk of Disclosure When Reporting Commonly Used Univariate Statistics. In: Domingo-Ferrer, J., Laurent, M. (eds) Privacy in Statistical Databases. PSD 2022. Lecture Notes in Computer Science, vol 13463. Springer, Cham. https://doi.org/10.1007/978-3-031-13945-1_9

  • DOI: https://doi.org/10.1007/978-3-031-13945-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13944-4

  • Online ISBN: 978-3-031-13945-1

  • eBook Packages: Computer Science, Computer Science (R0)
