Representing Data Mining Results

Sullivan, Rob

doi:10.1007/978-1-59745-290-8_4

Abstract

How many different ways can we represent our data in order to convey its meaning to its intended audience? What is the most effective approach? Is one technique better than another? The answer depends on your data, your audience, and the message you are trying to convey. In this chapter, we provide context for the output of our analyses, discussing traditional representation methodologies, including tables, histograms, graphs, and plots of various types, as well as some of the more creative approaches that have gained increasing mindshare as vehicles of communication across the Internet. We also include an important technique that spans both algorithmic and representational concepts – trees and rules – since such techniques can be valuable both for explanation and as input to other systems, and they show that not all representations need to be graphical.

Notes

1. These are also known as cross-classification tables.
2. Data of T- and B-cell acute lymphocytic leukemia from the Ritz Laboratory at the DFCI and available from the Bioconductor web site.
3. http://evolution.genetics.washington.edu/phylip.html
4. http://www.geneious.com/
5. We are, of course, making an assumption that the data can be easily split accordingly. This may not be possible given the data in our learning dataset. However, if it is, Gini branching will choose this split.
6. This measure is used in ID3 and C4.5.
7. An interesting example relates to the drug Viagra which was initially clinically tested as a heart drug but which…well, you know how that ended up.
8. We shall return to this form of the rule later.
9. Witten and Frank (2005) used the term coverage synonymously with support.
10. This is also often referred to as a transaction. We have elected to continue with our definition of our dataset being analyzed as comprising a set of instances.
11. The algorithm suffers from a number of inefficiencies and trade-offs that have resulted in many other algorithms being proposed. For more details, see the articles cited or search “Apriori algorithm.”
12. Valentin Zacharias, “Development and Verification of Rule-Based Systems – A Survey of Developers”, http://www.cs.manchester.ac.uk/ruleML/presentations/session1paper1.pdf.
13. http://www.sciencedirect.com/science/journal/09507051
14. We can use the R function range(sampleData) to obtain this information.
15. We return to this dataset in Chap. 6.
16. http://www2.warwick.ac.uk/fac/sci/moac/students/peter_cock/r/heatmap/
17. http://flowingdata.com/category/visualization/network-visualization/
18. We’re going to take some artistic license here. “Network” and “graph” are often used interchangeably in the literature.
19. http://sbml.org/More_Detailed_Summary_of_SBML
20. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd edn. Cohen, et al.
21. Treeview is now available from Google code at http://code.google.com/p/treeviewx/ for the Linux/Unix variant and http://taxonomy.zoology.gla.ac.uk/rod/treeview.html for the Windows and Mac versions.
22. http://www.visual-literacy.org/periodic_table/periodic_table.html
23. http://www.visualcomplexity.com/vc/index.cfm
24. http://flowingdata.com/
25. http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/
26. http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html
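
Note 5 says that Gini branching will choose a split that separates the data cleanly when one exists. As a minimal sketch of why, assuming the usual definition of Gini impurity (one minus the sum of squared class proportions) and a size-weighted average over the child nodes, the following Python compares a clean split with a mixed one. The function names and the six T/B labels are hypothetical illustrations and are not taken from the chapter or its dataset.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(partitions):
    """Size-weighted average Gini impurity over the child nodes of a candidate split."""
    total = sum(len(part) for part in partitions)
    return sum(len(part) / total * gini_impurity(part) for part in partitions)


# Hypothetical class labels (assumed for illustration only).
parent = ["T", "T", "T", "B", "B", "B"]
clean_split = [["T", "T", "T"], ["B", "B", "B"]]  # each child holds one class
mixed_split = [["T", "T", "B"], ["T", "B", "B"]]  # each child still mixes classes

print("parent impurity:", gini_impurity(parent))
print("clean split:    ", weighted_gini(clean_split))
print("mixed split:    ", weighted_gini(mixed_split))
```

Under these assumptions the clean split has the lower weighted impurity, so a learner that minimizes this quantity would prefer it over the mixed split.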

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Bocca JB, Jarke M, Zaniolo C (eds) Proceedings of the 20th international conference on very large data bases, VLDB. Morgan Kaufmann, San Francisco, pp 487–499
Agrawal R, Imielinski T, Swami AN (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data, Washington DC, pp 207–216
Cohen W, Ravikumar P, Fienberg S (2003) A comparison of string distance metrics for name-matching tasks
Chiaretti S et al (2004) Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood 103:2771–2778
Gentleman RC et al (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5:R80
Gibson R, Smith DR (2003) Genome visualization made fast and simple. Bioinformatics 19:1449–1450
Hagel J, Facchini P (2010) Biochemistry and occurrence of O-demethylation in plant metabolism. Front Physiol 1:14
Hall BG (2001) Phylogenetic trees made easy: a how-to manual for molecular biologists. Sinauer, Sunderland, Mass
Han X (2006) Inferring species phylogenies: a microarray approach. In: Computational intelligence and bioinformatics: international conference on intelligent computing, ICIC 2006, Kunming, China. Springer, Berlin/Heidelberg, pp 485–493
Kerkhoven R et al (2004) Visualization for genomics: the Microbial Genome Viewer. Bioinformatics 20:1812–1814
Krzywinski MI et al (2009) Circos: an information aesthetic for comparative genomics. Genome Res 19:1639–1645
Ledford H (2010) Big science: the cancer genome challenge. Nature 464:972–974
Milne I et al (2010) Flapjack – graphical genotype visualization. Bioinformatics 26:3133–3134
Morris JA et al (2010) Evoker: a visualization tool for genotype intensity data. Bioinformatics 26:1786–1787
Neapolitan RE (2003) Learning Bayesian networks. Prentice Hall, Harlow
Novere NL et al (2009) The systems biology graphical notation. Nat Biotechnol 27:735–741
Rajaram S, Oono Y (2010) NeatMap – non-clustering heat map alternatives in R. BMC Bioinformatics 11:45
Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517
Rual J-F et al (2005) Towards a proteome-scale map of the human protein-protein interaction network. Nature 437:1173–1178
Sato N, Ehira S (2003) GenoMap, a circular genome data viewer. Bioinformatics 19:1583–1584
Stothard P, Wishart DS (2005) Circular genome visualization and exploration using CGView. Bioinformatics 21:537–539
Tufte ER (1990) Envisioning information. Graphics Press, Cheshire
Tufte ER (1997) Visual explanations: images and quantities, evidence and narrative. Graphics Press, Cheshire
Tufte ER (2001) The visual display of quantitative information. Graphics Press, Cheshire
Tufte ER (2003) The cognitive style of PowerPoint. Graphics Press, Cheshire
Tufte ER (2006) Beautiful evidence. Graphics Press, Cheshire
Witten IH, Frank E (2005) Emboss European molecular biology open software suite. In: Data mining – practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco
Zacharias V (2008) Development and Verification of Rule-Based Systems – A Survey of Developers, http://www.cs.manchester.ac.uk/ruleML/presentations/session1paper1.pdf

Author information

Authors and Affiliations

Cincinnati, OH, USA
Rob Sullivan

About this chapter

Cite this chapter

Sullivan, R. (2012). Representing Data Mining Results. In: Introduction to Data Mining for the Life Sciences. Humana Press. https://doi.org/10.1007/978-1-59745-290-8_4

DOI: https://doi.org/10.1007/978-1-59745-290-8_4
Published: 12 November 2011
Publisher Name: Humana Press
Print ISBN: 978-1-58829-942-0
Online ISBN: 978-1-59745-290-8
eBook Packages: Biomedical and Life Sciences; Biomedical and Life Sciences (R0)
