Auditory universal accessibility of data tables using naturally derived prosody specification

  • Long Paper
Universal Access in the Information Society

Abstract

Text documents usually embody visually oriented meta-information in the form of complex visual structures, such as tables. The semantics carried by such structures are lost in plain text-to-speech synthesis, which renders them poorly and ambiguously. Although most speech synthesis frameworks allow consistent control of many parameters, such as prosodic cues, through appropriate markup, no prosodic specification exists for speech-enabling visual elements. This paper presents a method for modelling the acoustic specification of simple and complex data tables, derived from the human paradigm. A series of psychoacoustic experiments was set up to obtain speech properties from the prosodic analysis of natural spoken descriptions of data tables. Thirty blind and 30 sighted listeners selected the most prominent natural rendition. The derived prosodic phrase accent and pause break placement vectors were modelled using the ToBI semiotic system to convey semantically important visual information through prosody control. The quality of information provision of speech-synthesized tables using the proposed prosody specification was evaluated by first-time listeners. The results show a significant increase (14 to 20%, depending on the table type) in users' subjective understanding (overall impression, listening effort and acceptance) of the semantic structure of the table data, compared with the traditional linearized speech synthesis of tables. Furthermore, the results show that prosody manipulation can be applied successfully to data tables using generic specification sets for certain table types and browsing techniques, resulting in improved data comprehension.
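To make the idea of markup-driven prosody control concrete, the following minimal sketch renders a simple data table as SSML, raising the pitch on header labels and inserting pauses of different lengths within and between rows. It is an illustration only and not the specification derived in the paper: the function name `table_to_ssml`, the pitch offset and the pause durations are all hypothetical placeholders, and only standard SSML 1.0 elements (`speak`, `prosody`, `break`) are assumed.

```python
from xml.sax.saxutils import escape

# Hypothetical prosody settings chosen for illustration only;
# they are NOT the values derived in the paper.
HEADER_PITCH = "+10%"   # relative pitch raise applied to header labels
CELL_BREAK_MS = 300     # pause after a header label and between cells
ROW_BREAK_MS = 700      # longer pause marking the end of a row


def table_to_ssml(headers, rows):
    """Linearize a simple table row by row into an SSML string.

    Each cell is spoken as "header, pause, value, pause"; the pause
    after the last cell of a row is longer to signal the row boundary.
    """
    parts = [
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    ]
    for row in rows:
        pairs = list(zip(headers, row))
        for i, (header, cell) in enumerate(pairs):
            # Header label with raised pitch, then a short pause.
            parts.append(
                f'<prosody pitch="{HEADER_PITCH}">{escape(str(header))}</prosody>'
            )
            parts.append(f'<break time="{CELL_BREAK_MS}ms"/>')
            # Cell value, escaped so the output stays well-formed XML.
            parts.append(escape(str(cell)))
            last_in_row = i == len(pairs) - 1
            pause = ROW_BREAK_MS if last_in_row else CELL_BREAK_MS
            parts.append(f'<break time="{pause}ms"/>')
    parts.append("</speak>")
    return "".join(parts)


if __name__ == "__main__":
    print(table_to_ssml(["City", "Temperature"], [["Athens", 31], ["Oslo", 18]]))
```

A ToBI-based specification such as the one described above would replace these fixed constants with accent and break placements per table type and browsing technique; the sketch only shows where such values would enter the markup.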

Acknowledgments

The work described in this paper has been partially supported by the European Social Fund and Hellenic National Resources under the HERACLITUS project of the EPEAEK II programme, Greek Ministry of Education. We would like to thank Manolis Platakis, Dimitris Sifakis, and the students of the University of Athens, Department of Informatics and Telecommunications, as well as the members of the Panhellenic Association of the Blind for their participation in the experiments described in this work.

Author information

Corresponding author

Correspondence to Dimitris Spiliotopoulos.

About this article

Cite this article

Spiliotopoulos, D., Xydas, G., Kouroupetroglou, G. et al. Auditory universal accessibility of data tables using naturally derived prosody specification. Univ Access Inf Soc 9, 169–183 (2010). https://doi.org/10.1007/s10209-009-0165-0

  • DOI: https://doi.org/10.1007/s10209-009-0165-0
