DOI: 10.1145/3080546.3080547
research-article

Using contexts and constraints for improved geotagging of human trafficking webpages

Published: 14 May 2017

ABSTRACT

Extracting geographical tags from webpages is a well-motivated application in many domains. In illicit domains with unusual language models, like human trafficking, extracting geotags with both high precision and recall is a challenging problem. In this paper, we describe a geotag extraction framework in which context, constraints and the openly available Geonames knowledge base work in tandem in an Integer Linear Programming (ILP) model to achieve good performance. In preliminary empirical investigations, the framework improves precision by 28.57% and F-measure by 36.9% on a difficult human trafficking geotagging task compared to a machine learning-based baseline. The method is already being integrated into an existing knowledge base construction system widely used by US law enforcement agencies to combat human trafficking.
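
The abstract does not spell out the ILP formulation, so the following is only a rough sketch of the general idea, not the authors' model. It assumes each place-name mention has a few gazetteer candidates with a context score, and it selects candidates with binary variables to maximize the total score under two illustrative constraints: at most one candidate per mention, and all selected candidates in the same country. The candidate data, the scores, and both constraints are hypothetical. The tiny problem is solved by exhaustive enumeration instead of a real ILP solver so that the example has no dependencies.

```python
from itertools import product

# Hypothetical candidates: (mention, gazetteer entry, country, context score).
# The entries and scores are invented for illustration; they are not from the paper.
CANDIDATES = [
    ("Paris", "Paris, France", "FR", 0.6),
    ("Paris", "Paris, Texas", "US", 0.5),
    ("Dallas", "Dallas, Texas", "US", 0.8),
    ("Austin", "Austin, Texas", "US", 0.7),
]


def feasible(x):
    """Check the two assumed constraints for a 0/1 assignment x."""
    chosen = [c for c, keep in zip(CANDIDATES, x) if keep]
    mentions = [c[0] for c in chosen]
    countries = {c[2] for c in chosen}
    # At most one candidate per mention, and at most one country overall.
    return len(mentions) == len(set(mentions)) and len(countries) <= 1


def solve():
    """Enumerate every 0/1 assignment and keep the best feasible one."""
    best_x, best_score = None, -1.0
    for x in product((0, 1), repeat=len(CANDIDATES)):
        if not feasible(x):
            continue
        score = sum(c[3] for c, keep in zip(CANDIDATES, x) if keep)
        if score > best_score:
            best_x, best_score = x, score
    return [c[1] for c, keep in zip(CANDIDATES, best_x) if keep]


selected = solve()
print(selected)
```

Scored in isolation, "Paris" would resolve to the higher-scoring French candidate; under the assumed country-consistency constraint, the joint optimum prefers the Texas reading because it agrees with the other mentions on the page. Enumeration is exponential in the number of candidates, so anything beyond a toy instance would need an actual ILP solver.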

    • Published in

      ACM Other conferences
      GeoRich '17: Proceedings of the Fourth International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data
      May 2017
      54 pages
      ISBN: 9781450350471
      DOI: 10.1145/3080546

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      GeoRich '17 paper acceptance rate: 8 of 10 submissions, 80%. Overall acceptance rate: 25 of 50 submissions, 50%.
