Characterizing Mammography Reports for Health Analytics

Rojas, Carlos C.; Patton, Robert M.; Beckerman, Barbara G.

doi:10.1007/s10916-011-9685-2

Original Paper
Published: 14 June 2011

Volume 35, pages 1197–1210, (2011)
Journal of Medical Systems

Carlos C. Rojas¹,
Robert M. Patton¹ &
Barbara G. Beckerman¹

Abstract

As massive collections of digital health data become available, the opportunities for large-scale automated analysis increase. In particular, the widespread collection of detailed health information is expected to help realize a vision of evidence-based public health and patient-centric health care. Within such a framework for large-scale health analytics, we describe the transformation of a large data set of mostly unlabeled, free-text mammography data into a searchable and accessible collection usable for analytics. We also describe several methods to characterize and analyze the data, including their temporal aspects, using information retrieval, supervised learning, and classical statistical techniques. We present experimental results that demonstrate the validity and usefulness of the approach: the results are consistent with the known features of the data, provide novel insights about them, and can be used in specific applications. Additionally, based on the process of going from raw data to analysis results, we present the architecture of a generic system for health analytics from clinical notes.

Notes

http://apps.who.int/classifications/apps/icd/icd10online/index.htm?navi.htm+ka00
Breast Imaging Reporting and Data System, developed by the American College of Radiology.
This, of course, does not hold for every document and every human (within a given language) since specialized terminology is not universally accessible. It is, however, a reasonable assumption within a field, e.g., health sciences.

References

TREC-5, http://trec.nist.gov, 1999.
North Carolina Medical Journal. Special Issue on Data and Health Policy, 2008.
Aronow, D. B., Fangfang, F., and Croft, W. B., Ad hoc classification of radiology reports. J. Am. Med. Inform. Assoc., 6(5):393–411, 1999.
Bakalar, R., IBM’s vision for the future in patient-centric global health care: IBM’s vision of how advanced health analytics and automated health information infrastructure will transform anatomic pathology services. Arch. Pathol. Lab. Med., 132(5):766–771, 2008.
Berndt, D. J., and Clifford, J., Using dynamic time warping to find patterns in time series. In: KDD Workshop, pp. 359–370, 1994.
Borg, I., and Groenen, P., Modern Multidimensional Scaling: Theory and Applications. Springer, 1996.
Burnside, B., Strasberg, H., and Rubin, D., Automated indexing of mammography reports using linear least squares fit. In: Proc. of the 14th International Congress and Exhibition on Computer Assisted Radiology and Surgery, pp. 449–454, 2000.
Chapman, W. W., Bridewell, W., Hanbury, P., Cooper, G. F., and Buchanan, B. G., Evaluation of negation phrases in narrative clinical reports. In: Proc AMIA Symp, pp. 105–109, 2001.
Dumais, S., Faceted search. Encyclopedia of Database Systems, pp. 1103–1109, 2009.
Giger, M., Computer-aided diagnosis of breast lesions in medical images. Comput. Sci. Eng. 2(5):39–45, 2000.
Harkema, H., Setzer, A., Gaizauskas, R., and Hepple, M., Mining and modelling temporal clinical data. In: Proceedings of the UK e-Science All Hands Meeting, 2005.
Heffernan, R., Mostashari, F., Das, D., Karpati, A., Kulldorff, M., and Weiss, D., Syndromic surveillance in public health practice, New York City. Emerg. Infect. Dis. 10(5):858–64, 2004.
Howell, C., Stimulus package contains $19 billion for health care technology spending and adoption of electronic health records. Wisconsin Technology Network news, February 19 2009. (Retrieved 29 April 2010, at http://wistechnology.com/articles/5523/).
Jain, N. L., and Friedman, C., Identification of findings suspicious for breast cancer based on natural language processing of mammogram reports. In: Proc AMIA Annu Fall Symp, pp. 829–833, 1997.
Jolliffe, I., Principal Component Analysis. Springer, 2002.
Lohr, S., Tech Companies Push to Digitize Patients’ Records. New York Times, September 10 2009.
Ma, F., Bajger, M., and Bottema, M., Temporal analysis of mammograms based on graph matching. Digital Mammography, pp. 158–165, 2010.
McCallum, A. K., Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow, 1996.
Meystre, S. M., Savova, G. K., Kipper-Schuler, K. C., and Hurdle, J. F., Extracting information from textual documents in the electronic health record: A review of recent research. In: Yearb Med Inform, pp. 128–144, 2008.
Mitchell, T. M., Machine Learning, 1st edn. New York, NY: McGraw-Hill, Inc, 1997.
Nassif, H., Woods, R., Burnside, E., Ayvaci, M., Shavlik, J., and Page, D., Information extraction for clinical data mining: A mammography case study. In: ICDM - DDDM09 Workshop, 2009.
Norén, G., Hopstadius, J., Bate, A., Star, K., and Edwards, I., Temporal pattern discovery in longitudinal electronic patient records. Data Mining and Knowledge Discovery 20:1–27, 2010.
Patton, R. M., Potok, T. E., Beckerman, B. G., and Treadwell, J. N., A genetic algorithm for learning significant phrase patterns in radiology reports. In: GECCO ’09: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference, pp. 2665–2670. New York, NY: ACM, 2009.
Porter, M. F., An algorithm for suffix stripping. Program: Electronic Library and Information Systems 14(3):130–137, 1980.
Reed, J. W., Jiao, Y., Potok, T. E., Klump, B. A., Elmore, M. T., and Hurson, A. R., Tf-icf: A new term weighting scheme for clustering dynamic data streams. In: ICMLA ’06: Proceedings of the 5th International Conference on Machine Learning and Applications, pp. 258–263. Washington, DC: IEEE Computer Society, 2006.
Roelofs, A., Karssemeijer, N., Wedekind, N., Beck, C., van Woudenberg, S., Snoeren, P., Hendriks, J., Rosselli del Turco, M., Bjurstam, N., Junkermann, H., et al., Importance of comparison of current and prior mammograms in breast cancer screening. Radiology 242(1):70, 2007.
Rokach, L., Romano, R., and Maimon, O., Negation recognition in medical narrative reports. Inf. Retr. 11(6):499–538, 2008.
Sakoe, H., and Chiba, S., Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing 26(1):43–49, 1978.
Salton, G., and Buckley, C., Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5):513–523, 1988.
Sebastiani, F., Machine learning in automated text categorization. ACM Comput. Surv. 34:1–47, 2002.
Studnicki, J., Fisher, J. W., and Eichelberger, C. N., NC-CATCH: North Carolina comprehensive assessment for tracking community health. [2], pp. 122–126.
Tang, J., Rangayyan, R., Xu, J., El Naqa, I., and Yang, Y., Computer-aided detection and diagnosis of breast cancer with mammography: Recent advances. IEEE Trans. Inf. Technol. Biomed. 13(2):236–251, 2009.
Timp, S., Varela, C., and Karssemeijer, N., Temporal change analysis for characterization of mass lesions in mammography. IEEE Trans. Med. Imag. 26(7):945–953, 2007.
Yi, B.-K., Jagadish, H. V., and Faloutsos, C., Efficient retrieval of similar time sequences under time warping. In: ICDE ’98: Proceedings of the Fourteenth International Conference on Data Engineering, pp. 201–208. Washington, DC: IEEE Computer Society, 1998.

Acknowledgements

We thank Robert M. Nishikawa, Ph.D., Department of Radiology, University of Chicago, for providing the large dataset of unstructured mammography reports.

Prepared by Oak Ridge National Laboratory, P. O. Box 2008, Oak Ridge, Tennessee, 37831-6285, managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725. Research partially sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, LDRD #5327.

This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy.

Author information

Authors and Affiliations

Oak Ridge National Lab, One Bethel Valley Road, P.O. Box 2008, MS-6085, Oak Ridge, TN, 37831-6085, USA
Carlos C. Rojas, Robert M. Patton & Barbara G. Beckerman

Corresponding author

Correspondence to Carlos C. Rojas.

About this article

Cite this article

Rojas, C.C., Patton, R.M. & Beckerman, B.G. Characterizing Mammography Reports for Health Analytics. J Med Syst 35, 1197–1210 (2011). https://doi.org/10.1007/s10916-011-9685-2

Received: 16 January 2011
Accepted: 13 March 2011
Published: 14 June 2011
Issue Date: October 2011
DOI: https://doi.org/10.1007/s10916-011-9685-2
