Abstract
How many different ways can we represent our data in order to convey its meaning to its intended audience? What is the most effective approach? Is one technique better than another? The answer depends on your data, your audience, and the message you are trying to convey. In this chapter, we provide context for the output of our analyses, discussing very traditional representation methodologies, including tables, histograms, graphs, and plots of various types, as well as some of the more creative approaches that have seen increasing mindshare as vehicles of communication across the Internet. We also include an important technique that spans both the algorithmic and representation concepts – trees and rules – since such techniques can be valuable for both explanation and input to other systems and show that not all representations necessarily need to be graphical.
Notes
- 1.
These are also known as cross classification tables.
- 2.
Data of T- and B-cell acute lymphocytic leukemia from the Ritz Laboratory at the DFCI and available from the Bioconductor web site.
- 3.
- 4.
- 5.
We are, of course, making an assumption that the data can be easily split accordingly. This may not be possible given the data in our learning dataset. However, if it is, Gini branching will choose this split.
- 6.
This measure is used in ID3 and C4.5.
- 7.
An interesting example relates to the drug Viagra which was initially clinically tested as a heart drug but which…well, you know how that ended up.
- 8.
We shall return to this form of the rule later.
- 9.
Witten and Frank (2005) used the term coverage synonymously with support.
- 10.
This is also often referred to as a transaction. We have elected to continue with our definition of our dataset being analyzed as comprising a set of instances.
- 11.
The algorithm suffers from a number of inefficiencies and trade-offs that have resulted in many other algorithms being proposed. For more details, see the articles cited or search “Apriori algorithm.”
- 12.
Valentin Zacharias, “Development and Verification of Rule-Based Systems – A Survey of Developers”, http://www.cs.manchester.ac.uk/ruleML/presentations/session1paper1.pdf.
- 13.
- 14.
We can use the R function range(sampleData) to obtain this information.
- 15.
We return to this dataset in Chap. 6.
- 16.
- 17.
- 18.
We’re going to take some artistic license here. “Network” and “graph” are often used interchangeably in the literature.
- 19.
- 20.
Cohen et al., Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd edn.
- 21.
Treeview is now available from Google Code at http://code.google.com/p/treeviewx/ for the Linux/Unix variant and at http://taxonomy.zoology.gla.ac.uk/rod/treeview.html for the Windows and Mac versions.
- 22.
- 23.
- 24.
- 25.
- 26.
References
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Bocca JB, Jarke M, Zaniolo C (eds) Proceedings of the 20th international conference on very large data bases, VLDB. Morgan Kaufmann, San Francisco, pp 487–499
Agrawal R, Imielinski T, Swami AN (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data, Washington DC, pp 207–216
Cohen W, Ravikumar P, Fienberg S (2003) A comparison of string distance metrics for name-matching tasks
Chiaretti S et al (2004) Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood 103:2771–2778
Gentleman RC et al (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5:R80
Gibson R, Smith DR (2003) Genome visualization made fast and simple. Bioinformatics 19:1449–1450
Hagel J, Facchini P (2010) Biochemistry and occurrence of O-demethylation in plant metabolism. Front Physiol 1:14
Hall BG (2001) Phylogenetic trees made easy: a how-to manual for molecular biologists. Sinauer, Sunderland, Mass
Han X (2006) Inferring species phylogenies: a microarray approach. In: Computational intelligence and bioinformatics: international conference on intelligent computing, ICIC 2006, Kunming, China. Springer, Berlin/Heidelberg, pp 485–493
Kerkhoven R et al (2004) Visualization for genomics: the Microbial Genome Viewer. Bioinformatics 20:1812–1814
Krzywinski MI et al (2009) Circos: an information aesthetic for comparative genomics. Genome Res 19:1639–1645
Ledford H (2010) Big science: the cancer genome challenge. Nature 464:972–974
Milne I et al (2010) Flapjack – graphical genotype visualization. Bioinformatics 26:3133–3134
Morris JA et al (2010) Evoker: a visualization tool for genotype intensity data. Bioinformatics 26:1786–1787
Neapolitan RE (2003) Learning Bayesian networks. Prentice Hall, Harlow
Novere NL et al (2009) The systems biology graphical notation. Nat Biotechnol 27:735–741
Rajaram S, Oono Y (2010) NeatMap – non-clustering heat map alternatives in R. BMC Bioinformatics 11:45
Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517
Rual J-F et al (2005) Towards a proteome-scale map of the human protein-protein interaction network. Nature 437:1173–1178
Sato N, Ehira S (2003) GenoMap, a circular genome data viewer. Bioinformatics 19:1583–1584
Stothard P, Wishart DS (2005) Circular genome visualization and exploration using CGView. Bioinformatics 21:537–539
Tufte ER (1990) Envisioning information. Graphics Press, Cheshire
Tufte ER (1997) Visual explanations: images and quantities, evidence and narrative. Graphics Press, Cheshire
Tufte ER (2001) The visual display of quantitative information. Graphics Press, Cheshire
Tufte ER (2003) The cognitive style of PowerPoint. Graphics Press, Cheshire
Tufte ER (2006) Beautiful evidence. Graphics Press, Cheshire
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco
Zacharias V (2008) Development and verification of rule-based systems – a survey of developers. http://www.cs.manchester.ac.uk/ruleML/presentations/session1paper1.pdf
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Sullivan, R. (2012). Representing Data Mining Results. In: Introduction to Data Mining for the Life Sciences. Humana Press. https://doi.org/10.1007/978-1-59745-290-8_4
DOI: https://doi.org/10.1007/978-1-59745-290-8_4
Publisher Name: Humana Press
Print ISBN: 978-1-58829-942-0
Online ISBN: 978-1-59745-290-8
eBook Packages: Biomedical and Life Sciences (R0)