Abstract
Trees representing hierarchical knowledge are prevalent in biology and medicine. Examples include phylogenetic trees and the hierarchical structure of biological tissues and cell lines. The increasing throughput of techniques that generate such trees poses new challenges for the analysis of tree ensembles. A typical task is to determine common patterns of lineage decisions in cellular differentiation trees, and partitioning the dataset is a crucial step for further analysis of the cellular genealogies. In this work, we develop a method to cluster labeled binary tree structures. For every cluster, the method also selects a centroid tree that captures the characteristic mitosis patterns of the group. We evaluate the technique on synthetic data and apply it to experimental trees that represent the lineages of differentiating cells under specific conditions over time. The results for the cell lineage trees are interpreted in detail using expert domain knowledge.
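The abstract does not specify the distance or the optimization used, so the following is only a minimal sketch of the general idea of centroid-based clustering of labeled binary trees, not the authors' method. It assumes a simple top-down distance for unordered binary trees (a stand-in for a proper tree edit distance) and a plain k-medoids alternation in which each centroid is the cluster member with the smallest summed distance to the other members. All names (`size`, `dist`, `k_medoids`) and the toy labels are hypothetical.

```python
from functools import lru_cache

# A tree is either None (empty) or a tuple (label, left, right).
# Nested tuples are hashable, so distances can be memoized.

def size(t):
    """Number of nodes in a tree."""
    return 0 if t is None else 1 + size(t[1]) + size(t[2])

@lru_cache(maxsize=None)
def dist(a, b):
    """Assumed stand-in distance: relabel cost at the root plus the cheaper
    of the two ways to pair the children (daughter order is ignored)."""
    if a is None or b is None:
        return size(a) + size(b)  # insert or delete the whole subtree
    relabel = 0 if a[0] == b[0] else 1
    straight = dist(a[1], b[1]) + dist(a[2], b[2])
    swapped = dist(a[1], b[2]) + dist(a[2], b[1])
    return relabel + min(straight, swapped)

def assign(d, medoids):
    """Index of the nearest medoid for every tree."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]

def k_medoids(trees, medoids, max_iter=100):
    """Alternate assignment and centroid update until the medoids are stable.
    `medoids` holds the indices of the initial centroid trees."""
    d = [[dist(a, b) for b in trees] for a in trees]
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # Keep the old medoid if a cluster happens to be empty.
            new.append(min(members, key=lambda m: sum(d[m][j] for j in members))
                       if members else old)
        if new == medoids:
            break
        medoids = new
    return assign(d, medoids), medoids

def leaf(label):
    return (label, None, None)

# Toy genealogies: three shallow trees and three trees with one more division.
trees = [
    ("P", leaf("A"), leaf("A")),
    ("P", leaf("A"), leaf("B")),
    ("P", leaf("B"), leaf("A")),
    ("P", ("P", leaf("A"), leaf("A")), ("P", leaf("B"), leaf("B"))),
    ("P", ("P", leaf("A"), leaf("A")), ("P", leaf("B"), leaf("A"))),
    ("P", ("P", leaf("B"), leaf("B")), ("P", leaf("A"), leaf("A"))),
]
labels, medoids = k_medoids(trees, medoids=[0, 3])
print(labels, medoids)
```

Restricting each centroid to a member of its own cluster keeps the sketch independent of any vector-space embedding; only pairwise distances between trees are needed.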
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Khakhutskyy, V. et al. (2014). Centroid Clustering of Cellular Lineage Trees. In: Bursa, M., Khuri, S., Renda, M.E. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2014. Lecture Notes in Computer Science, vol 8649. Springer, Cham. https://doi.org/10.1007/978-3-319-10265-8_2
DOI: https://doi.org/10.1007/978-3-319-10265-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10264-1
Online ISBN: 978-3-319-10265-8
eBook Packages: Computer Science (R0)