Abstract
A Data Jacket (DJ) is a technique for sharing information about data and for considering the potential value of datasets while keeping the data itself hidden, by describing a summary of the data in natural language. In DJs, variables are described by variable labels (VLs), which are the names or meanings of the variables, and the utility of data is estimated by discussing combinations of VLs. However, DJs do not always contain VLs, because the description rules of DJs cannot force data owners to enter all the information about their data. When some DJs lack VLs, related DJs cannot be connected through string matching of VLs. In this paper, we propose a method for inferring VLs in DJs whose VLs are unknown, using the text in the outlines of DJs. We focus on the similarity of the outlines of DJs and create two models for inferring VLs: one based on the similarity of the outlines and one based on the co-occurrence of VLs. The experimental results show that our method works significantly better than a method using only string matching of VLs.
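To make the outline-similarity idea concrete, the following is a minimal sketch and not the model evaluated in this paper. It assumes a plain bag-of-words cosine similarity between outlines, and it scores each candidate VL by summing the similarities of the known DJs that contain it. The function names (`tokenize`, `cosine`, `infer_vls`), the toy DJs, and the scoring rule are all hypothetical choices made for illustration; the co-occurrence model is not covered.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization on lowercased text.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def infer_vls(query_outline, known_djs, top_k=3):
    # known_djs: list of (outline text, list of VLs) for DJs whose VLs are known.
    query_vec = Counter(tokenize(query_outline))
    scores = Counter()
    for outline, vls in known_djs:
        sim = cosine(query_vec, Counter(tokenize(outline)))
        for vl in set(vls):
            # Hypothetical scoring rule: a VL accumulates the similarity
            # of every known DJ in which it appears.
            scores[vl] += sim
    # Rank by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [vl for vl, score in ranked[:top_k] if score > 0]


# Toy DJs invented for this example.
known_djs = [
    ("daily weather observations by city", ["date", "temperature", "city"]),
    ("supermarket sales records by store", ["date", "store", "sales amount"]),
]
inferred = infer_vls("hourly weather observations for each city", known_djs)
print(inferred)
```

In this toy case only the weather DJ shares terms with the query outline, so only its VLs receive a nonzero score. A fuller treatment would weight terms (for example with tf-idf) and use a tokenizer suited to the language of the outlines.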
Acknowledgments
This study was partially supported by JST-CREST and JSPS KAKENHI Grant Number JP16J06450. We would also like to thank all the staff members of Kozo Keikaku Engineering Inc. for supporting our research.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Hayashi, T., Ohsawa, Y. (2017). Matrix-Based Method for Inferring Variable Labels Using Outlines of Data in Data Jackets. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_54
DOI: https://doi.org/10.1007/978-3-319-57529-2_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57528-5
Online ISBN: 978-3-319-57529-2
eBook Packages: Computer Science, Computer Science (R0)