Multi-class segmentation of free-form online documents with tree conditional random fields

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

We present a new system for segmenting online handwritten documents into multiple types of blocks, such as text paragraphs, tables, graphics, and mathematical expressions. The document is represented hierarchically by aggregating strokes into blocks, and the interactions between levels of the hierarchy are modeled with a tree Conditional Random Field. Features are extracted at each level of the tree, labels are predicted with logistic classifiers, and Belief Propagation is used for optimal inference over the tree structure. Being fully trainable, the system properly handles the difficult segmentation problems that arise in unconstrained online note-taking documents, where no prior knowledge is available about the layout or the expected content. Our experiments show very promising results and suggest that fully automatic segmentation of free-form online notes is within reach.
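To make the inference step concrete, the following is a minimal sketch of sum-product Belief Propagation on a tree, which yields exact node marginals because the graph has no cycles. It is not the authors' implementation: the function name `tree_marginals`, the single pairwise matrix shared by all edges, the toy three-level tree, and the two-label setup are all assumptions made for illustration. In the system described above, the unary scores would come from the per-level logistic classifiers.

```python
import numpy as np


def tree_marginals(parent, unary, pairwise):
    """Exact sum-product marginals on a tree (illustrative sketch).

    parent[i] : index of the parent of node i, -1 for the root (node 0).
                Parents are assumed to have smaller indices than children.
    unary     : (n, K) array of strictly positive per-node label scores.
    pairwise  : (K, K) array of strictly positive scores; pairwise[a, b]
                scores parent label a with child label b. Sharing one
                matrix across all edges is an assumption of this sketch.
    """
    n, k = unary.shape
    up = np.ones((n, k))         # up[i]: message sent from node i to its parent
    belief_up = unary.copy()     # unary score times all child messages

    # Upward pass: children have larger indices, so iterate in reverse.
    for i in range(n - 1, 0, -1):
        msg = pairwise @ belief_up[i]      # sum over the child's labels
        up[i] = msg / msg.sum()            # normalize for numerical stability
        belief_up[parent[i]] *= up[i]

    # Downward pass: down[i] is the message sent from the parent to node i.
    down = np.ones((n, k))
    for i in range(1, n):
        p = parent[i]
        # Parent belief with the contribution of node i itself divided out.
        excl = belief_up[p] * down[p] / up[i]
        msg = pairwise.T @ excl            # sum over the parent's labels
        down[i] = msg / msg.sum()

    marg = belief_up * down
    return marg / marg.sum(axis=1, keepdims=True)


# Toy hierarchy (hypothetical): document root 0, blocks 1 and 2,
# strokes 3 and 4 under block 1, stroke 5 under block 2.
parent = [-1, 0, 0, 1, 1, 2]
# Hypothetical unary scores for two labels (e.g. text vs. non-text).
unary = np.array([[0.5, 0.5],
                  [0.6, 0.4],
                  [0.3, 0.7],
                  [0.9, 0.1],
                  [0.4, 0.6],
                  [0.2, 0.8]])
# Pairwise scores that favor a child sharing its parent's label.
pairwise = np.array([[2.0, 1.0],
                     [1.0, 2.0]])

marginals = tree_marginals(parent, unary, pairwise)
labels = marginals.argmax(axis=1)  # per-node label with the highest marginal
```

Each row of `marginals` sums to one, so a stroke whose own classifier is uncertain (node 4 here) can be pulled toward the label supported by its parent block and its sibling.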

Notes

  1. For the IAM-OnDo dataset, a more elaborate temporal distance could be adopted, since each sampled point has its own timestamp.
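As an illustration of what a timestamp-based temporal distance could look like, the sketch below measures the pause between two strokes from the timestamps of their sampled points. This is only one possible choice, not the distance used in the paper; the `(x, y, t)` point layout and the function name `temporal_gap` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical point layout: (x, y, t) with t a timestamp in seconds.
Point = Tuple[float, float, float]


def temporal_gap(a: List[Point], b: List[Point]) -> float:
    """Pause between two strokes, or 0.0 if they overlap in time."""
    a_start, a_end = a[0][2], a[-1][2]
    b_start, b_end = b[0][2], b[-1][2]
    # Gap between the time intervals [a_start, a_end] and [b_start, b_end].
    return max(0.0, max(a_start, b_start) - min(a_end, b_end))


# Two toy strokes: the second starts after the first one ends.
stroke_a = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.2), (2.0, 1.0, 0.4)]
stroke_b = [(5.0, 0.0, 1.0), (6.0, 0.5, 1.3)]
gap = temporal_gap(stroke_a, stroke_b)
```

The function is symmetric in its arguments, so the order in which the two strokes are passed does not matter.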

Acknowledgments

This work is supported by the Chinese Academy of Sciences under the Fellowships for Young International Scientists program (No. 2012Y1GB0001), and by the National Natural Science Foundation of China under the Research Fund for International Young Scientists (No. 61250110082).

Author information

Corresponding author

Correspondence to Adrien Delaye.

About this article

Cite this article

Delaye, A., Liu, CL. Multi-class segmentation of free-form online documents with tree conditional random fields. IJDAR 17, 313–329 (2014). https://doi.org/10.1007/s10032-014-0221-z

  • DOI: https://doi.org/10.1007/s10032-014-0221-z
