A process for predicting manhole events in Manhattan

Rudin, Cynthia; Passonneau, Rebecca J.; Radeva, Axinia; Dutta, Haimonti; Ierome, Steve; Isaac, Delfina

doi:10.1007/s10994-009-5166-y

Published: 28 January 2010

Machine Learning, Volume 80, pages 1–31 (2010)

Abstract

We present a knowledge discovery and data mining process developed as part of the Columbia/Con Edison project on manhole event prediction. This process can assist with real-world prioritization problems that involve raw data in the form of noisy documents requiring significant amounts of pre-processing. The documents are linked to a set of instances to be ranked according to prediction criteria. In the case of manhole event prediction, which is a new application for machine learning, the goal is to rank the electrical grid structures in Manhattan (manholes and service boxes) according to their vulnerability to serious manhole events such as fires, explosions and smoking manholes. Our ranking results are currently being used to help prioritize repair work on the Manhattan electrical grid.

Author information

Authors and Affiliations

MIT Sloan School of Management, E53-323, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Cynthia Rudin
Center for Computational Learning Systems, Columbia University, 475 Riverside Dr., New York, NY, 10115, USA
Rebecca J. Passonneau, Axinia Radeva & Haimonti Dutta
Consolidated Edison Company of New York, 4 Irving Place, New York, NY, 10003, USA
Steve Ierome & Delfina Isaac

Corresponding author

Correspondence to Cynthia Rudin.

Additional information

Editor: Carla Brodley.

This work was done while Cynthia Rudin was at the Center for Computational Learning Systems at Columbia University.

About this article

Cite this article

Rudin, C., Passonneau, R.J., Radeva, A. et al. A process for predicting manhole events in Manhattan. Mach Learn 80, 1–31 (2010). https://doi.org/10.1007/s10994-009-5166-y

Received: 02 October 2008
Revised: 18 November 2009
Accepted: 12 December 2009
Published: 28 January 2010
Issue Date: July 2010
DOI: https://doi.org/10.1007/s10994-009-5166-y
