How Masterly Are People at Playing with Their Vocabulary? Analysis of the Wordle Game for Latvian

In this paper, we describe the adaptation of a simple word guessing game that occupied the hearts and minds of people around the world. There are versions for all three Baltic countries and even several versions for each. We pay particular attention to the Latvian version and look into how people form their guesses given the hints already uncovered. The paper analyses guess patterns, the characteristics of easy and difficult words, and player behaviour and response.


Introduction
Word guessing games are a phenomenon that represents the use of language in unusual socio-cultural contexts. Depending on the rules of a game, the meaning of a word can be completely irrelevant, whereas the structural elements of a word, such as its length and the composition of its letters, may play an important role. Despite words being used as game attributes without the need to know their meaning and context of use, it can be argued that such games contribute to vocabulary mastery and general language training.
One of the first computer games created in Latvia in the mid-1990s was the word scoring game Lingo. Inspired by a popular TV show, it was created by the language technology company Tilde. The game required guessing a five-letter word, the first letter of which was known to the player. As one of the few games available on almost all computers in Latvia at the time, it became very popular among players of all ages. The game's word corpus contained 999 words (Čudare, 2021). The game had to be installed on a computer and could be played offline without any restrictions or limits.
Almost thirty years later, in October 2021, the online word-puzzle game Wordle was invented by software engineer Josh Wardle. In a relatively short time, it became globally popular, attracting more and more new players and spawning new language versions around the world. The principle of the game is fairly simple and similar to Lingo: a player is given six attempts to guess a five-letter word. After each guess, the letters are coloured in one of three colours (grey, orange, green), giving the player a hint on how to continue guessing the hidden word. Unlike other word games, Wordle allows only one word to be guessed per day, and this word is the same for all players. For this to work, players must be disciplined enough not to reveal the word of the day to others.
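The colouring rule described above can be illustrated with a minimal sketch. The paper does not specify how repeated letters are handled, so the code below assumes the convention commonly used in Wordle clones: exact matches are marked green first, and each remaining letter of the hidden word can justify at most one orange hint. The function name `score_guess` is hypothetical and not taken from any of the game's code bases.

```python
import unicodedata
from collections import Counter


def score_guess(guess: str, answer: str) -> list[str]:
    """Return one colour per letter: 'green', 'orange' or 'grey'."""
    # Normalise so that letters with diacritics (e.g. 'ķ') are single characters.
    guess = unicodedata.normalize("NFC", guess).lower()
    answer = unicodedata.normalize("NFC", answer).lower()
    if len(guess) != len(answer):
        raise ValueError("guess and answer must have the same length")

    colours = ["grey"] * len(guess)

    # Answer letters that were not matched exactly stay available for orange hints.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)

    # First pass: exact position matches.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colours[i] = "green"

    # Second pass (assumed convention): each leftover answer letter
    # can be used for at most one orange hint.
    for i, g in enumerate(guess):
        if colours[i] != "green" and remaining[g] > 0:
            colours[i] = "orange"
            remaining[g] -= 1

    return colours
```

For instance, under this assumed rule a repeated letter in the guess is only coloured as often as it actually occurs in the hidden word.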
However, to let players share results instantly without spoiling the enjoyment of the game for others, Wordle offers to create an abstract figure made up of emoji squares in three colours. It contains a geometric pattern that shows the progress and result of a game without revealing the word behind it. This figure, which players share on social networks, is the most important representational attribute of the game, and it also acts symbolically as an element of communication and interaction within the community.
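A shareable figure of this kind can be sketched as follows. The specific emoji chosen for each colour and the name `share_grid` are assumptions made for illustration; the input is simply one list of colour names per guess, in the order the guesses were made.

```python
# Assumed mapping from hint colours to emoji squares.
EMOJI = {"grey": "⬜", "orange": "🟧", "green": "🟩"}


def share_grid(rows: list[list[str]]) -> str:
    """Render one line of emoji squares per guess, without any letters."""
    return "\n".join("".join(EMOJI[colour] for colour in row) for row in rows)


if __name__ == "__main__":
    # A game solved on the second attempt.
    print(share_grid([
        ["grey", "orange", "green", "grey", "grey"],
        ["green"] * 5,
    ]))
```

Because only colours are rendered, the figure shows how the game progressed while keeping the word of the day hidden.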
While the Wordle developer admitted that the game is most appreciated precisely because of the fun it brings (Victor, 2022), Wordle users and re-designers have managed to add value to the game by showing its potential for promoting learning: it is being used in education for new language acquisition (Brown, 2022; Vincent, 2022), as well as to revitalise endangered languages (Schenck, 2022; CBC-News, 2022).

Game Construction
Shortly after the swift rise in popularity of the original Wordle game, several reconstructions of it started popping up on GitHub. Of these, the most popular became React Wordle, which so far has over 1,700 forks and over 2,200 stars, and has been used as a base to create Wordle versions in 43 different languages (even Latin and Cornish), 32 thematic versions (such as birds, super heroes, airport codes), and even 20 mathematics-, science-, and technology-oriented ones (for example, gene symbols, JavaScript, prime numbers). The base code, which was built using React, TypeScript, and Tailwind, has been designed for easy adaptation to new languages or themes. For example, to use a personal list of daily words and valid guesses, only two files need to be updated, and to adapt the code to a new language, 7 to 9 other files need changes, for which detailed instructions are provided in the GitHub repository.

Adapting Wordle into Latvian and Audience Involvement
The first Latvian version of Wordle was created in mid-January 2022. The game was named 'Vārdulis', deriving its title from the Latvian 'vārds' (word) while keeping a sonic resemblance to the original title. Giving the game a unique, Latvian-specific name was a successful choice, as it was easy to find mentions of Vārdulis on social media from the first day of the game's launch. This, in turn, is essential for communication between players.
Even though Wordle is meant to be played in a single-player manner, an essential part of the game is sharing the result, i.e. the game's auto-generated grid of emoji squares, and discussing the word of the day without revealing it on social media, such as Twitter, or in internal communication tools, such as WhatsApp. The impact of social media on the popularity of the game is significant. Sharing a score is often a conversation starter with other, previously unknown players; it is also a micro-competition to compare who has the better score and the more successful choice of words. The words of the day are discussed and evaluated mostly in terms of their game-specific difficulty.
When developing the Latvian version, the decision was made to also include personal names and various inflections of words instead of only plain singular nominative forms of nouns, incorporating words that have four letters in the nominative case but five when inflected (e.g., flower: nom. 'puķe', gen. 'puķes'), thus making Vārdulis much more of a challenge than its English counterpart. The decision to include such words was made in order to highlight the diversity of the language and to obtain a more abundant set of data for subsequent play analysis. In addition, to let players learn more about the meaning of words, a link to the word entry in the online dictionary and thesaurus developed by the Institute of Mathematics and Informatics of the University of Latvia was included in the window that pops up when the game is finished.
In the public discussions on Twitter, which is the most common public space for Vārdulis players to meet, the most topical issue regarding Vārdulis is the extended dictionary, i.e. the inclusion of inflected forms in the game. The criticism was particularly strong in the first months of the game. Players complained that the game's rules are therefore not fair and that the game should stick to the rules of the original version, that the Latvian version is too complicated, and that there are too many inflected forms in Latvian to win the game in six attempts. Players also joked that the title of the game should rather be "guess the correct conjugation". Over time, the criticism decreased, players accepted the rules, and the vocabulary used by players increased.
Vārdulis, just like the original Wordle, is limited to one game per day. The average number of game sessions per day from January 28 to April 14, 2022 was 935; however, it took around two weeks for popularity to rise from a few hundred plays to around a thousand per day.
By exploring the user statistics of tezaurs.lv in Google Analytics, it can be seen that the daily word is one of the most frequent searches in the database on a given day. On average, 5.7% of players navigate to the thesaurus to explore a particular word.
Exploring which words are most frequently consulted on tezaurs.lv, two tendencies can be observed. First, players look up less known or unusual words: for example, 'adobe' (an air-dried clay brick), which many of the players had never heard in Latvian, was searched for on tezaurs.lv by 61.97% of players. Second, players look up words that were difficult to guess or that a large number of players failed to guess at all: for example, 40.81% of players failed to guess the quite common word 'šuves' (stitches), and on that day 21.14% of players searched for this word on tezaurs.lv.
Overall, it can be concluded that the linking of tezaurs.lv with Vārdulis is successful and serves its purpose well, but it could also be used in a more targeted way by regularly including less known and less used words in the list of daily words, which would give players additional opportunities to master vocabulary. However, as the game is to some extent competitive and players aim to complete it in as few attempts as possible, player frustration and public complaints could be expected.

Word List Generation
There are two word lists necessary to play the game: a list of daily words (main list) and a list of all valid guesses (secondary list). Both lists were constructed semi-automatically. First, we acquired all monolingual Latvian corpora from Opus (Tiedemann, 2012), tokenised the data, kept only tokens consisting of 5 characters, and finally removed any tokens containing a character outside the 33-character Latvian alphabet. To make the game reasonably challenging, we ordered the remaining tokens by frequency of occurrence in the corpora and chose the 1,500 most frequent words for the main list and everything else for the secondary list.
To keep only words in the Latvian language, we cross-referenced the main list with the Lexical Database for Latvian (Spektors et al., 2016) and manually reviewed each word. After this, 1,430 words remained in the main list, while some very frequent foreign words such as "China" or "Apple" were removed. For the secondary list, we selected the next 15,000 words from the frequency-ordered list, also cross-referencing them with the Lexical Database, but without manual verification.
The secondary list, however, was still at times falling short of its objective by failing to recognise perfectly valid Latvian words in specific inflections that were not necessarily among the 16,500 most frequent five-character words in the corpora. To improve the list, we once again turned to the Lexical Database and selected all words with lengths of 3 to 8 characters, automatically inflected them into all possible word forms using an inflection generator (Ņikiforovs, 2011), and filtered the results down to inflected forms spanning exactly 5 characters. While still not fully exhaustive, the secondary list grew to 22,341 words.
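The filtering and ranking steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the regular-expression tokeniser, the lowercasing of tokens, and the names `count_candidates` and `split_lists` are hypothetical, and corpus loading from Opus is left out entirely.

```python
import re
import unicodedata
from collections import Counter

# The 33 letters of the Latvian alphabet.
LATVIAN_ALPHABET = set("aābcčdeēfgģhiījkķlļmnņoprsštuūvzž")


def count_candidates(lines, length=5):
    """Count tokens of the given length that use only Latvian letters."""
    counts = Counter()
    for line in lines:
        # Assumption: a simple regex tokeniser on lowercased, NFC-normalised text.
        text = unicodedata.normalize("NFC", line).lower()
        for token in re.findall(r"\w+", text):
            if len(token) == length and set(token) <= LATVIAN_ALPHABET:
                counts[token] += 1
    return counts


def split_lists(counts, main_size=1500):
    """Most frequent tokens go to the main list, the rest to the secondary list."""
    ranked = [word for word, _ in counts.most_common()]
    return ranked[:main_size], ranked[main_size:]


if __name__ == "__main__":
    sample = ["Saule un siena, saule!", "China siena saule xenon"]
    main, secondary = split_lists(count_candidates(sample), main_size=2)
    print(main, secondary)
```

Note that a purely alphabet-based filter still lets through foreign words written with Latvian letters, which is why some form of dictionary cross-referencing or manual review remains necessary.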

Play Analysis
The design of our version of the game includes logging the array of guesses for each session played until the end (either a correct guess or failure after six attempts). In this section we analyse game data for 77 daily words collected between January 28th and April 14th of 2022. Table 2 shows the top 10 most difficult words to guess, ordered by the number of plays in which the player was unable to guess the word within six guesses, and the top 10 easiest words, for which only very few players were unable to find the correct word while most were successful after the third or fourth guess. Here it is visible that a good deal of the easy words are nouns in the singular nominative form, most of them do not contain diacritics, and they have almost no repetition of characters within the word. On the other hand, most of the difficult words contain at least one or two diacritics and have repeating characters within the word, and none of them are singular nominative nouns.
The total number of tokens used by Vārdulis players in 77 days is 12,705. As can be seen in Figure 1, the vocabulary used by players tends to expand. Table 1 shows the most popular word choices at each stage of the game. All words in columns G3-G6 have been the correct word of the day at some point. From the opening guess column G1 we clearly see that most players start with a singular nominative noun without diacritics and with no repeated characters within the word, in order to uncover as many hints as possible for future guesses. An interesting observation in Table 1 is that the most popular opening word by far is "Saule" (the Sun), followed by "siena" (wall), and "tiesa" (court or truth).
We look in detail at the most challenging word so far in the game and depict the most common guess paths taken by players in Figure 2. The different arrows indicate which of the six attempts the players were at. It is visible here that the vast majority of guesses at the last stages had already uncovered the ending of the correct answer, "AS", and some had other critical characters uncovered, such as "Ī", "C" or "Ņ".
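A per-word outcome table of the kind summarised in Table 2 could be derived from such logs along the following lines. The log format (a pair of the daily word and the list of guesses), the labels, and the function names are assumptions for illustration, not the game's actual logging schema.

```python
from collections import Counter, defaultdict


def outcome_distribution(sessions):
    """Map each daily word to counts of games solved at G1..G6 or failed (X)."""
    table = defaultdict(Counter)
    for answer, guesses in sessions:
        if guesses and guesses[-1] == answer:
            # Solved: the label records which attempt was the correct one.
            table[answer][f"G{len(guesses)}"] += 1
        else:
            # Failed: the hidden word was never guessed.
            table[answer]["X"] += 1
    return table


def hardest_words(sessions, top=10):
    """Order daily words by the number of failed games, hardest first."""
    table = outcome_distribution(sessions)
    return sorted(table, key=lambda word: table[word]["X"], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical sessions: (daily word, guesses made in that session).
    sessions = [
        ("saule", ["siena", "saule"]),
        ("saule", ["saule"]),
        ("šuves", ["saule", "siena", "tiesa", "suņus", "sulas", "sugas"]),
        ("šuves", ["saule", "šuves"]),
    ]
    print(hardest_words(sessions))
```

Ranking by the raw number of failed games mirrors the ordering described above; dividing by the number of sessions per word would give a rate that is comparable across days with different player counts.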
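The two statistics discussed here, the growth of the players' vocabulary over time and the most popular guesses at each turn, could be computed from logged sessions roughly as sketched below. The input formats and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cumulative_vocabulary(guesses_by_day):
    """Number of distinct guessed words seen up to and including each day."""
    seen = set()
    sizes = []
    for day_guesses in guesses_by_day:
        seen.update(day_guesses)
        sizes.append(len(seen))
    return sizes


def top_guesses_per_turn(sessions, top=15):
    """Most frequent guesses at each turn (G1, G2, ...) across all sessions."""
    per_turn = {}
    for guesses in sessions:
        for turn, word in enumerate(guesses, start=1):
            per_turn.setdefault(f"G{turn}", Counter())[word] += 1
    return {
        turn: [word for word, _ in counter.most_common(top)]
        for turn, counter in per_turn.items()
    }


if __name__ == "__main__":
    # Hypothetical data: all guesses made on each of three days.
    days = [["saule", "siena"], ["saule", "tiesa"], ["siena", "puķes", "šuves"]]
    print(cumulative_vocabulary(days))
```

A cumulative count that keeps rising from day to day corresponds to the expanding vocabulary described above.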

Conclusion
In this paper, we provided insight into a brief linguistic exercise that has become a fun pastime for a few minutes each day for many players around the world. We described the creation of a near-complete Latvian version of the game, with further hints on how to make it more or less challenging, and examined the possibility of enriching players' vocabulary by linking the game to an online thesaurus. While providing a glimpse into the public perception of the Latvian version of the game, we also analysed in detail how the Latvian word game has been played over its first two and a half months, looking at players' strategies and at the words that are easier and more difficult to guess. In future work, we plan to automatically analyse each daily word morphologically and attempt to predict its difficulty level, or even its guess distribution, using a machine learning model.

Table 2. Easiest and most difficult words to guess. Row C indicates the number of occurrences of the word in the specific form in the corpus, rows G1-G6 represent guesses, and row X represents failed games after 6 guesses. English translations of the words can be found in Appendix A.

Acknowledgements
This work has received funding from the "European Social Fund via IT Academy programme" and the project "Research on Modern Latvian Language and Development of Language Technology" (No. VPP-LETONIKA-2021/1-0006).

Appendix B. English Translations of Top 15 Guesses
English translations and the accompanying part-of-speech tags (Heine and Narrog, 2011) of the top 15 guesses at each turn (from Table 1) are shown in Table 4.