Persistent Identifier
|
doi:10.18710/WI9TEH |
Publication Date
|
2024-02-29 |
Title
| Background data for: Latent-variable modeling of ordinal outcomes in language data analysis |
Author
| Krug, Manfred (University of Bamberg) - ORCID: 0000-0002-9508-8468
Vetter, Fabian (University of Bamberg) - ORCID: 0000-0002-3654-5489
Sönning, Lukas (University of Bamberg) - ORCID: 0000-0002-2705-395X |
Point of Contact
|
Sönning, Lukas (University of Bamberg) |
Description
| This dataset contains tabular files with information about the usage preferences of speakers of Maltese English with regard to 63 pairs of lexical expressions. These pairs (e.g. truck-lorry or realization-realisation) are known to differ in usage between BrE and AmE (cf. Algeo 2006). The data were elicited with a questionnaire that asks informants to indicate whether they always use one of the two variants, prefer one over the other, have no preference, or do not use either expression (see Krug and Sell 2013 for methodological details). Usage preferences were therefore measured on a symmetric 5-point ordinal scale. Data were collected between 2008 and 2018, as part of a larger research project on lexical and grammatical variation in settings where English is spoken as a native, second, or foreign language. The current dataset, which we use for our methodological study on ordinal data modeling strategies, consists of a subset of 500 speakers that is roughly balanced on year of birth. (2023-07-19)
Abstract of the related publication: In empirical work, ordinal variables are typically analyzed using means based on numeric scores assigned to categories. While this strategy has met with justified criticism in the methodological literature, it also generates simple and informative data summaries, a standard often not met by statistically more adequate procedures. Motivated by a survey of how ordered variables are dealt with in language research, we draw attention to an un(der)used latent-variable approach to ordinal data modeling, which constitutes an alternative perspective on the most widely used form of ordered regression, the cumulative model. Since the latent-variable approach does not feature in any of the studies in our survey, we believe it is worthwhile to promote its benefits. To this end, we draw on questionnaire-based preference ratings by speakers of Maltese English, who indicated on a 5-point scale which of two synonymous expressions (e.g. package-parcel) they (tend to) use. We demonstrate that a latent-variable formulation of the cumulative model affords nuanced and interpretable data summaries that can be visualized effectively, while at the same time avoiding limitations inherent in mean response models (e.g. distortions induced by floor and ceiling effects). The online supplementary materials include a tutorial for its implementation in R. (2024-01-12) |
Subject
| Arts and Humanities |
Keyword
| Maltese English
lexical variation
ordinal data
latent variable
ordered regression
methodology
statistical analysis
regression
ordinal regression |
Related Publication
| Sönning, Lukas, Manfred Krug, Fabian Vetter, Timo Schmid, Anne Leucht & Paul Messer. Latent-variable modeling of ordinal outcomes in language data analysis. [submitted for review] |
Language
| English |
Producer
| University of Bamberg https://www.uni-bamberg.de/eng-ling/ |
Production Location
| Malta |
Contributor
| Researcher : Hilbert, Michaela
Researcher : Pabel, Sebastian
Data Manager : Scheiner, Katharina
Data Manager : Linne, Anja
Researcher : Schützler, Ole
Researcher : Lucas, Christopher
Researcher : Peterson, Nicholas |
Funding Information
| Bavarian Ministry for Science, Research and the Arts
Spanish Ministry of Education and Science with European Regional Development Fund: HUM2007-60706/FILO
German Humboldt Foundation |
Distributor
| The Tromsø Repository of Language and Linguistics (TROLLing) https://trolling.uit.no/ |
Depositor
| Sönning, Lukas |
Deposit Date
| 2023-07-19 |
Time Period
| Start Date: 2008-01-01 ; End Date: 2018-12-31 |
Date of Collection
| Start Date: 2008-01-01 ; End Date: 2018-12-31 |
Data Type
| questionnaire data; survey data |