Representing Data Mining Results

Introduction to Data Mining for the Life Sciences

Abstract

How many different ways can we represent our data in order to convey its meaning to its intended audience? What is the most effective approach? Is one technique better than another? The answer depends on your data, your audience, and the message you are trying to convey. In this chapter, we provide context for the output of our analyses, discussing traditional representation methods, including tables, histograms, graphs, and plots of various types, as well as some of the more creative approaches that have gained mindshare as vehicles of communication across the Internet. We also cover trees and rules, a technique that spans both algorithm and representation: trees and rules are valuable for explanation and as input to other systems, and they show that not every representation needs to be graphical.

Notes

  1. These are also known as cross classification tables.

  2. Data of T- and B-cell acute lymphocytic leukemia from the Ritz Laboratory at the DFCI and available from the Bioconductor web site.

  3. http://evolution.genetics.washington.edu/phylip.html

  4. http://www.geneious.com/

  5. We are, of course, making an assumption that the data can be easily split accordingly. This may not be possible given the data in our learning dataset. However, if it is, Gini branching will choose this split.

  6. This measure is used in ID3 and C4.5.

  7. An interesting example relates to the drug Viagra which was initially clinically tested as a heart drug but which…well, you know how that ended up.

  8. We shall return to this form of the rule later.

  9. Witten and Frank (2005) used the term coverage synonymously with support.

  10. This is also often referred to as a transaction. We have elected to continue with our definition of our dataset being analyzed as comprising a set of instances.

  11. The algorithm suffers from a number of inefficiencies and trade-offs that have resulted in many other algorithms being proposed. For more details, see the articles cited or search “Apriori algorithm.”

  12. Valentin Zacharias, “Development and Verification of Rule-Based Systems – A Survey of Developers”, http://www.cs.manchester.ac.uk/ruleML/presentations/session1paper1.pdf.

  13. http://www.sciencedirect.com/science/journal/09507051

  14. We can use the R function range(sampleData) to obtain this information.

  15. We return to this dataset in Chap. 6.

  16. http://www2.warwick.ac.uk/fac/sci/moac/students/peter_cock/r/heatmap/

  17. http://flowingdata.com/category/visualization/network-visualization/

  18. We’re going to take some artistic license here. “Network” and “graph” are often used interchangeably in the literature.

  19. http://sbml.org/More_Detailed_Summary_of_SBML

  20. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd edn. Cohen, et al.

  21. Treeview is now available from Google code at http://code.google.com/p/treeviewx/ for the Linux/Unix variant and http://taxonomy.zoology.gla.ac.uk/rod/treeview.html for the Windows and Mac versions.

  22. http://www.visual-literacy.org/periodic_table/periodic_table.html

  23. http://www.visualcomplexity.com/vc/index.cfm

  24. http://flowingdata.com/

  25. http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/

  26. http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html
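
Note 5 says that when the learning data can be split cleanly, Gini branching will choose that split. A minimal sketch of why, assuming the usual definition of Gini impurity (one minus the sum of squared class proportions) and written in Python rather than the R used elsewhere in the chapter. The function names and the toy "T"/"B" labels are hypothetical and are not taken from the chapter's dataset.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(partitions):
    """Size-weighted average Gini impurity over the child nodes produced by a split."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini_impurity(p) for p in partitions)


def best_split(candidate_splits):
    """Return the name of the candidate split with the lowest weighted impurity."""
    return min(candidate_splits, key=lambda name: weighted_gini(candidate_splits[name]))


# Hypothetical toy data: each candidate split maps to the labels in its child nodes.
candidates = {
    "clean": [["T", "T", "T"], ["B", "B", "B"]],  # every child node is pure
    "mixed": [["T", "B", "T"], ["B", "T", "B"]],  # every child node is impure
}

if __name__ == "__main__":
    for name, parts in candidates.items():
        print(name, round(weighted_gini(parts), 4))
    print("chosen:", best_split(candidates))
```

A split whose child nodes are all pure has a weighted impurity of zero, the minimum possible, so a selection rule that minimises weighted Gini impurity picks it whenever such a split exists.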

References

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Bocca JB, Jarke M, Zaniolo C (eds) Proceedings of the 20th international conference on very large data bases, VLDB. Morgan Kaufmann, San Francisco, pp 487–499

  • Agrawal R, Imielinski T, Swami AN (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data, Washington DC, pp 207–216

  • Cohen W, Ravikumar P, Fienberg S (2003) A comparison of string distance metrics for name-matching tasks

  • Chiaretti S et al (2004) Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood 103:2771–2778

  • Gentleman RC et al (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5:R80

  • Gibson R, Smith DR (2003) Genome visualization made fast and simple. Bioinformatics 19:1449–1450

  • Hagel J, Facchini P (2010) Biochemistry and occurrence of O-demethylation in plant metabolism. Front Physiol 1:14

  • Hall BG (2001) Phylogenetic trees made easy: a how-to manual for molecular biologists. Sinauer, Sunderland, Mass

  • Han X (2006) Inferring species phylogenies: a microarray approach. In: Computational intelligence and bioinformatics: international conference on intelligent computing, ICIC 2006, Kunming, China. Springer, Berlin/Heidelberg, pp 485–493

  • Kerkhoven R et al (2004) Visualization for genomics: the Microbial Genome Viewer. Bioinformatics 20:1812–1814

  • Krzywinski MI et al (2009) Circos: an information aesthetic for comparative genomics. Genome Res 19:1639–1645

  • Ledford H (2010) Big science: the cancer genome challenge. Nature 464:972–974

  • Milne I et al (2010) Flapjack – graphical genotype visualization. Bioinformatics 26:3133–3134

  • Morris JA et al (2010) Evoker: a visualization tool for genotype intensity data. Bioinformatics 26:1786–1787

  • Neapolitan RE (2003) Learning Bayesian networks. Prentice Hall, Harlow

  • Novere NL et al (2009) The systems biology graphical notation. Nat Biotechnol 27:735–741

  • Rajaram S, Oono Y (2010) NeatMap – non-clustering heat map alternatives in R. BMC Bioinformatics 11:45

  • Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517

  • Rual J-F et al (2005) Towards a proteome-scale map of the human protein-protein interaction network. Nature 437:1173–1178

  • Sato N, Ehira S (2003) GenoMap, a circular genome data viewer. Bioinformatics 19:1583–1584

  • Stothard P, Wishart DS (2005) Circular genome visualization and exploration using CGView. Bioinformatics 21:537–539

  • Tufte ER (1990) Envisioning information. Graphics Press, Cheshire

  • Tufte ER (1997) Visual explanations: images and quantities, evidence and narrative. Graphics Press, Cheshire

  • Tufte ER (2001) The visual display of quantitative information. Graphics Press, Cheshire

  • Tufte ER (2003) The cognitive style of PowerPoint. Graphics Press, Cheshire

  • Tufte ER (2006) Beautiful evidence. Graphics Press, Cheshire

  • Witten IH, Frank E (2005) Data mining – practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco

  • Zacharias V (2008) Development and Verification of Rule-Based Systems – A Survey of Developers, http://www.cs.manchester.ac.uk/ruleML/presentations/session1paper1.pdf

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Sullivan, R. (2012). Representing Data Mining Results. In: Introduction to Data Mining for the Life Sciences. Humana Press. https://doi.org/10.1007/978-1-59745-290-8_4