Matrix-Based Method for Inferring Variable Labels Using Outlines of Data in Data Jackets

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10235)

Abstract

A Data Jacket (DJ) is a technique for sharing information about data and for considering the potential value of datasets while keeping the data itself hidden: the data is summarized in natural language. In DJs, variables are described by variable labels (VLs), which are the names or meanings of variables, and the utility of data is estimated through discussion of combinations of VLs. However, DJs do not always contain VLs, because the description rules of DJs cannot force data owners to enter all the information about their data. When VLs are missing, related DJs cannot be connected through string matching of VLs. In this paper, we propose a method for inferring the VLs of DJs whose VLs are unknown, using the text in the outlines of DJs. We focus on the similarity of DJ outlines and create two models for inferring VLs: one based on the similarity of the outlines and one based on the co-occurrence of VLs. Experimental results show that our method works significantly better than a method using only string matching of VLs.
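The two ideas named in the abstract, outline similarity and VL co-occurrence, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the matrices, weighting, or tokenizer, so the bag-of-words cosine similarity, the whitespace tokenizer, the function names (`infer_labels`, `cooccurrence`), and the toy DJs below are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def bow(text):
    """Bag-of-words term counts (assumption: simple whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def infer_labels(query_outline, known_djs, top_k=3):
    """Rank candidate VLs for a DJ without VLs.

    Each known DJ votes for its own VLs, weighted by how similar its
    outline is to the query outline (hypothetical scoring rule).
    """
    q = bow(query_outline)
    scores = Counter()
    for outline, labels in known_djs:
        sim = cosine(q, bow(outline))
        for vl in set(labels):
            scores[vl] += sim
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [vl for vl, score in ranked[:top_k] if score > 0]


def cooccurrence(known_djs):
    """Count how many DJs contain each ordered pair of distinct VLs."""
    cooc = defaultdict(Counter)
    for _, labels in known_djs:
        for a, b in permutations(set(labels), 2):
            cooc[a][b] += 1
    return cooc


# Toy DJs (invented for illustration): (outline text, variable labels).
known_djs = [
    ("daily weather observations in tokyo",
     ["date", "temperature", "precipitation"]),
    ("hourly weather and air quality in osaka",
     ["date", "temperature", "pm2.5"]),
    ("supermarket sales records by store",
     ["date", "store id", "sales amount"]),
]

print(infer_labels("weather observations in kyoto", known_djs))
print(cooccurrence(known_djs)["date"].most_common(2))
```

In this sketch, a DJ whose outline shares no terms with the query contributes nothing, so labels such as "sales amount" are never proposed for a weather-related outline. The co-occurrence counts could then be used to extend a set of inferred labels with labels that frequently appear alongside them.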

Notes

  1. https://data.gov.uk/
  2. https://www.data.gov/
  3. http://www.panda.sys.t.u-tokyo.ac.jp/hayashi/djs/djs4ddi/
  4. http://taku910.github.io/mecab/
  5. http://open-data.pref.shizuoka.jp/

Acknowledgments

This study was partially supported by JST-CREST and JSPS KAKENHI Grant Number JP16J06450. We would also like to thank all the staff members of Kozo Keikaku Engineering Inc. for supporting our research.

Corresponding author

Correspondence to Teruaki Hayashi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hayashi, T., Ohsawa, Y. (2017). Matrix-Based Method for Inferring Variable Labels Using Outlines of Data in Data Jackets. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_54

  • DOI: https://doi.org/10.1007/978-3-319-57529-2_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57528-5

  • Online ISBN: 978-3-319-57529-2

  • eBook Packages: Computer Science, Computer Science (R0)
