Morphemic Structure of Lithuanian Words

Abstract The Lithuanian language is a typical flectional language that has a very sophisticated system of grammatical forms and many means of derivation; it is also characterized by uncertain boundaries between morphemes. All this makes the morphemic analysis of the Lithuanian language very complex. The aim of this research is to define and describe morphemic structural models of inflective parts of speech (i.e. nouns, adjectives, numerals, pronouns, and verbs) and regularities of their usage in contemporary Lithuanian.


Introduction
There are 11 parts of speech in the Lithuanian language, namely nouns, adjectives, pronouns, numerals, verbs, adverbs, conjunctions, particles, prepositions, interjections, and onomatopoeia. Conjunctions, particles, prepositions, interjections, and onomatopoeia are fixed and uninflected. They cannot be morphologically split, as synchronically they are formed only by the root, while adverbs have degree (e.g., gerai, geriau, geriausiai 'well, better, best'). Almost all nominal words, i.e. nouns, adjectives, pronouns, and numerals, are declined by gender, number, and case. For semantic reasons, some nominal words do not have all of these categories. The verb is by far the most complex part of speech. Lithuanian verbs can be uninflected (infinitive, adverbial participle), conjugated, and even declined (participles, and partially half participles).
As can be seen in Table 1, Lithuanian grammatical categories are mostly expressed by inflections. Among nominal words, suffixes are only used with adjectives to express degree. The situation with verbs is quite different, as suffixes are used for degree, tense, voice, and mood.
Lithuanian is a flectional language, where grammatical relations between words are expressed by inflections that may have multiple meanings. For example, the inflection -uose in the word ger-uose 'good' indicates the masculine gender, the plural number, and the Locative case. Hence, the majority of words consist of both a root and an inflection. An inflection may also function as a derivational affix, e.g. stal-ius 'carpenter' (derived from stal-as 'table'), vilk-ė 'she-wolf' (derived from vilk-as 'wolf'). For this reason, we do not use the term primary word, as the morphemic structure does not always distinguish between derivative and primary words, e.g. nam-as 'house' and stal-ius 'carpenter' have the same structure (root + inflection); however, the former is primary, while the latter is a derivative word. We use the more neutral term simple word for words that do not have derivational affixes.
Prefixes and suffixes typically play the role of derivational affixes in Lithuanian, and they are instrumental in creating new words. The Lithuanian language has many derivational affixes: almost 600 suffixes and around 60 prefixes. They show the huge derivational potential of the language. In Lithuanian, suffixes also occur in inflection. To this group belong derived verb forms, for example, skait-o and skait-ant-is (a present tense participle form with the inflectional suffix -ant-). Degrees of adjectives are also expressed by means of inflectional suffixes, e.g. ger-as, ger-esn-is, and ger-iaus-ias 'good, better, the best'. It should be noted that suffixes perform a single function: they are either derivational or inflectional, and inflectional suffixes always denote a single grammatical category. In this respect inflections differ from suffixes, as they can be both derivational and inflectional, and the latter may denote more than one grammatical category. Prefixes are rarely used in inflection (most commonly they are used to form reflexive participles, e.g. be-si-suk-ant-is 'turning': the use of the prefix be- moves the reflexive marker -si- towards the front of the word, thus avoiding the wordform *suk-ant-is-is, which would be hard to pronounce).
The expression of reflexivity by means of affixes is not usual, as it is conveyed by a reflexive affix whose position depends on whether the word contains a prefix. In a prefix-less word the reflexive marker stands at the end (e.g. praus-ia-si 'washing himself'), while in a word with a prefix it stands between the prefix and the root (e.g. ne-si-praus-ia 'not washing himself').
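The placement rule for the reflexive marker can be illustrated with a minimal sketch. The function name `place_reflexive` and the list-of-morphemes representation are assumptions made for illustration only; the sketch handles at most one prefix and ignores surface variants of the marker (such as the shortened -s in laik-y-ki-s).

```python
from typing import List, Optional


def place_reflexive(prefix: Optional[str], stem: List[str], marker: str = "si") -> List[str]:
    """Return the morpheme sequence with the reflexive marker positioned.

    Simplified rule from the text: with a prefix, the marker stands between
    the prefix and the root; without one, it stands at the end of the word.
    (Hypothetical helper; multi-prefix words are not modelled here.)
    """
    if prefix:
        return [prefix, marker, *stem]
    return [*stem, marker]


print("-".join(place_reflexive(None, ["praus", "ia"])))  # praus-ia-si
print("-".join(place_reflexive("ne", ["praus", "ia"])))  # ne-si-praus-ia
```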
Moreover, Lithuanian compounds can also be formed by combining two or more roots into a single word, e.g. darb-dav-ys 'employer' or joining them with a connective vowel, e.g. darb-o-tvark-ė 'agenda'. New words may also be derived from derivatives, for example, žem-uog-iav-im-as 'strawberry picking' is formed from the compound noun žem-uog-ė 'strawberry'.
Thus, summing up the functions of Lithuanian affixes, one should consider that inflections of nominal words and declined verbs denote the categories of gender, number, and case, while inflections of verbs denote tense, number, and person. As mentioned previously, inflections are sometimes used as derivational affixes in forming new wordforms. Inflectional suffixes denote degree for nominal words, and tense, mood, and voice for verbs (these categories also remain in nouns that are derived from verbs). Due to the ambiguity of such affixes, only inflectional and derivational suffixes can easily be distinguished. Thus, in section 4, which is about the morphemic structure of different parts of speech, only suffixes are classified as derivational and inflectional and no such classification is used for other affixes.
Having this in mind, one can assume that since the morphemic word structure in Lithuanian is so diverse, it may not show any regularities or have characteristic models. This very assumption was, in fact, the reason for undertaking this research in order to establish certain regularities of morphemic word structure in the Lithuanian language.
The aim of this research is to define and describe morphemic structural models of inflective parts of speech (i.e. nouns, adjectives, numerals, pronouns, and verbs) and regularities of their usage in contemporary Lithuanian. The main works on morphemics are presented in Chapter 2 of this paper. In Chapter 3 we describe the analysed data and present problems that have occurred during the compilation of this data (morpheme boundary detection, boundaries of certain words, and an uncertain line between synchronic and diachronic aspects). The main results of this research are presented in Chapter 4, where morphemic structures of nouns, adjectives, pronouns, numerals, and verbs are presented. In Chapter 5 we compare morphemic structures of different parts of speech.
One should bear in mind that morphemic and derivational analyses do not coincide, e.g. the morphemic analysis of the noun parašas 'signature' splits the wordform into prefix + root + inflection (pa-raš-as), while the derivational analysis splits it into stem + inflection (paraš-as). The derivational analysis (simple and derivative words) is beyond the scope of this paper, and we will only briefly refer to some of its aspects.
Readers should note that this is one of the first works for the Lithuanian language, which presents such an extended analysis of morphemic structural models and their frequencies of usage that is based on empirical data. We compare morphemic structures of different parts of speech, we assess their level of complexity, regularity, and we highlight differences. The research is also useful for showing the complexity of Lithuanian words and multi-functionality of Lithuanian affixes. We also believe that these results will be useful in comparative studies.

Previous Research in Morphemics
The morphemic structure of Lithuanian words was analysed by Kuosienė (1986) and Keinys (2009), however, their research does not employ a quantitative data analysis. Typically, Lithuanian grammars (e.g. DLKG 1996) present only general information about morphemes. The morphemic structure of Lithuanian verbs was recently analysed by Rimkutė et al. (2011).
Other works related to Lithuanian morphemics deal with the phonological structure of morphemes (Kruopienė 2000a, Kruopienė 2000b, Kruopienė 2001, Kruopienė 2002, Kruopienė 2005; Kaukienė 1994, Kaukienė 2002, Karosienė 2004; Akelaitienė 1996, Akelaitienė 2001, Akelaitienė 2008) but, as is the case regarding morphemic structure, none of them are based on a quantitative analysis. In view of this, the present research is important not only for Lithuanian scholars but also for researchers of other languages who analyse morphemic word structures. It should be noted that this research is based on rich electronic data of contemporary Lithuanian; therefore, it is believed that at least the main regularities in morphemic word structure may be observed and reliable conclusions may be drawn concerning them.
Comprehensive information about the morphemic structure of other related languages, i.e. Latvian, Russian, and Czech, was not found. The existing research is limited to presenting what sort of morphemes may form words in the respective languages (see, e.g., Kalnača 2004; Sarkans 1996 and Levāne-Petrova et al. 2000 for Latvian; Polikarpov 2000 for Russian; Sedláček 2004 for Czech).

Data and Methods
The analysed data consist of samples from the Corpus of the Contemporary Lithuanian Language (http://tekstynas.vdu.lt), which covers a wide variety of texts: scientific publications, periodical texts, fiction, documents (laws, decisions, protocols, statutes). Each functional style is represented by samples that contain an equal number of words. Fragments from The Database of Spoken Language (http://donelaitis.vdu.lt/garsynas) have also been used. The size of the whole experimental corpus is 310,000 words. Texts have been semi-automatically morphologically annotated. The morphological tagger has identified the part of speech, gender, number, case, person, tense, etc.; it has also disambiguated morphologically ambiguous words (e.g. the word vidutiniškai 'on the average' may be an adverb or an adjective, while nužudyti 'to kill' may be an infinitive or a participle in the passive voice). However, it has ignored morphological multiword units and has not identified grammatical categories for unrecognized words (such as proper names, shortened word forms, new words). Some mistakes of the morphological tagger have been corrected, and missing information for unrecognized words has been added manually.
Fragments of natural written language (not dictionaries) have been chosen for the analysis of Lithuanian morphemic structure. There are several reasons that justify this choice. First of all, there is no good contemporary dictionary of standard Lithuanian. The content of the existing Modern Lithuanian Dictionary (DLKŽ 2006) does not match its name, as it includes many obsolete, dialectic, rare, and jargon words whose morphemic structure is often difficult to define.
The next argument concerns the very structure of dictionaries: they only present headwords in spite of the fact that the Lithuanian language has a very elaborate inflectional system. For example, the verb skaityti 'read' is presented by three main forms skait-y-ti, skait-o, skait-ė, while verb forms that are derived from them by adding inflectional suffixes, e.g. skait-ant-is (participle), skait-ant (adverbial participle), skait-y-dav-o (past frequentative tense), skait-y-dam-as (half participle), are not included.
In addition, dictionaries often do not include many derivational words that are very important for the morphemic analysis, such as prefix derivatives (te-be-skait-o 'still reading', ne-skait-o 'not reading') or diminutive words (e.g. stal-iuk-as 'a little table', sūn-el-is 'a little son', ger-ut-is 'a good one').
Even though dictionary-based research could be interesting as well, it will not concern us here. Many problems have arisen while trying to establish boundaries of morphemes. In this study, theoretical principles for morpheme boundary detection are formulated following Urbutis (2009, 165-166). A part of a word is considered to be a morpheme if both (or all) parts of a word (potential morphemes) occur in other words (e.g. the root in nam-el-is 'a little house' also occurs in nam-as 'a house' and nam-in-is 'domestic', while the suffix is found in such words as stal-el-is 'a little table' or vaik-el-is 'a little child'). Unfortunately, these assumptions do not solve all problems that may occur while analysing the large volume of empirical data. The problems include the following: an overlap of morphemes, the boundary detection according to synchronic or diachronic view, as well as boundary problems within international words.
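The recurrence criterion described above can be sketched as a simple check. This is only an illustration under stated assumptions: the function `is_supported_split`, the tuple representation of segmented words, and the tiny word list are hypothetical, and the sketch does not address the overlap, diachronic, or borrowing problems just mentioned.

```python
from typing import Iterable, Sequence, Tuple

Segmented = Tuple[str, ...]  # one word as a tuple of already separated morphemes


def is_supported_split(parts: Sequence[str], other_words: Iterable[Segmented]) -> bool:
    """Return True if every proposed part also occurs as a morpheme in some other word."""
    words = list(other_words)
    return all(any(part in word for word in words) for part in parts)


# Toy comparison set built from the examples in the text (not real corpus data).
OTHER_WORDS = [
    ("nam", "as"),
    ("nam", "in", "is"),
    ("stal", "el", "is"),
    ("vaik", "el", "is"),
]

# nam-el-is: both the root nam- and the suffix -el- recur elsewhere.
print(is_supported_split(["nam", "el"], OTHER_WORDS))  # True
# A split such as na-mel- finds no support in the other words.
print(is_supported_split(["na", "mel"], OTHER_WORDS))  # False
```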
Words are decomposed into as many morphemes (the smallest meaningful parts of a word) as it is allowed by the contemporary Lithuanian language. However, in some cases the analysis of certain words had to include the diachronic aspect. For example, adjectives gyvas 'alive', pilnas 'full' are semantically associated with the verbs gy-ti 'to heal' and pil-ti 'to pour', however contemporary speakers do not associate these words any more. Some researchers present these adjectives as suffix derivatives (Pakerys 1994, 356), while others as inflectional derivatives (DLKG 1996, 223). In the cases, when it is not clear whether morphemes can be separated according to the synchronic view, we present the boundaries in parentheses, e.g., gy(-)v-as 'alive'.
Inasmuch as our research is synchronic, we treat pronominal inflections as one morpheme, although historically these forms arose when the pronouns jis, ji 'he, she' were attached to the simple inflection, e.g. gero+jo=gerojo 'good'.
The boundaries of morphemes constituting words in the Lithuanian language are not always clear; various phonological processes take place at a morphemic juncture: either front or back consonants are adjusted to the adjacent consonants of the other morpheme, sometimes morphemes overlap, then sounds are skipped or contracted. The reasons of such variation are phonological (degemination, dissimilation, contraction, when geminates that are not characteristic to the Lithuanian language are eliminated) and phonetic (metathesis and elimination, when sounds are switched or omitted to make pronunciation easier). The classification of these processes varies in Lithuanian linguistics (Kazlauskienė 2012). Some authors (DLKG 1996) classify these processes as contraction, elimination, and degemination. Other linguists do not classify these processes and identify them by the term "sound omission" (Urbutis 2009). The described individual cases have only marginally influenced our analysis, as in our work the biggest attention has been paid to the distribution of morphemes and their number and not to the problems of morpheme boundary detection, which would require a separate study.
It is a complicated matter to detect morpheme boundaries within international words, as a borrowed word is commonly adapted to the Lithuanian language and acquires the grammatical features of its part of speech, e.g. the verb bomb-ard-uo-ti 'to bombard' is borrowed from the French word bombarder; however, the Lithuanian language also has the noun bomb-a 'a bomb'. If international words were broken into smaller parts, this would not contradict the regularities of morphosyntactics, and it would hardly influence the variability of models. However, the further analysis of morphemic structure and allomorphs would become more complicated, as the morphemic inventory would be supplemented with morphemes uncharacteristic of the Lithuanian language: in the case of bomb-ard-uo-ti, we would need to distinguish the foreign suffix -ard-.
We understand that we still have left some open issues when establishing morphemic boundaries in our analysed data. The reason for this is the fact that due to the lack of theoretical discussion on morphemic boundaries and on the difference between morphemic and derivational analyses some of our choices may be debatable. Nevertheless, individual interpretations cannot change the main regularities of Lithuanian morphemic structure.
All the problematic cases mentioned above are treated consistently, and in the future they may be reviewed. Certainly not all difficulties of morphemic analysis have been mentioned, as we would first of all like to focus on regularities of morphemic composition in nominal and verbal words. Therefore, we do not deal with other known problems of morphemic analysis, e.g. whether or not all words can be split into morphemes, whether or not some morphemes may be treated as such from the synchronic point of view, etc.
The initial marking of morphemic boundaries was performed manually using a special symbolic notation. For example, the word rodydavosi 'appeared' was marked as ro9d!y#dav=o+si, where the exclamation mark separates the root rod- from the derivational suffix -y-, the hash separates the inflectional suffix -dav-, the equals sign separates the inflection, the number 9 denotes the acute (falling pitch) accent that falls onto the sound o, and the plus sign separates the reflexive affix.
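As an illustration of how such marked strings could be read by a program, the following minimal sketch splits a marked wordform into typed morphemes. It is not the tool used to compile the data. It assumes that the hash mentioned in the description is written as `#`, that the first segment is the root (the symbols for prefixes are not described here), and that digits only mark accents. The function `parse_markup` and the label `Rfl` for the reflexive affix are hypothetical.

```python
import re
from typing import List, Tuple

# Delimiter -> type of the morpheme that FOLLOWS it (labels follow the model notation).
DELIMS = {"!": "Sd", "#": "Si", "=": "F", "+": "Rfl"}


def parse_markup(marked: str) -> List[Tuple[str, str]]:
    """Split a manually marked wordform into (type, morpheme) pairs."""
    plain = re.sub(r"\d", "", marked)       # drop accent digits such as 9
    tokens = re.split(r"([!#=+])", plain)   # keep the delimiters in the result
    morphemes = [("R", tokens[0])]          # assumption: the word starts with the root
    for delim, form in zip(tokens[1::2], tokens[2::2]):
        morphemes.append((DELIMS[delim], form))
    return morphemes


print(parse_markup("ro9d!y#dav=o+si"))
# [('R', 'rod'), ('Sd', 'y'), ('Si', 'dav'), ('F', 'o'), ('Rfl', 'si')]
```

Concatenating the type labels of the result gives the structural model of the wordform, which is how a model inventory could be counted from the marked data.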
The Database of Lithuanian Morphemics Data has been compiled and it is freely available online. Users can search for wordforms and morphemes in the database, where they can find wordforms with marked morphemic boundaries (the type of morphemes is not given yet). The database is important in that it shows the great morphemic variation in the Lithuanian language.
The data in The Database of Lithuanian Morphemics Data can be useful for improving applications. For instance, the data may improve automatic morphological analysis by providing information about the potential affixes of a given word (currently the Lithuanian morphological analyser expects many non-existing wordforms). The data could also be used for the lemmatisation of multi-word units (see Boizou et al. 2015). Besides, the affix information could be used in language teaching, as learners could be taught how to form new words and which derivative or grammatical senses result from adding a certain affix.

Nominal and Verbal Models of Morphemic Structure
This chapter discusses the most typical examples and the most productive models of morphemic structure as found in Lithuanian nominal words (nouns, adjectives, pronouns, and numerals) and verbal items. The results are based on the 310,000 word corpus that has been described in the previous chapter. Although a corpus of a different size and composition would give slightly different results, we do not think that the difference would be essential. Figure 1 presents the quantitative information regarding structural models as identified for different parts of speech. Parts of speech can be divided into three groups in terms of the diversity of their morphemic structures: verbal items belong to the high diversity group, nouns and adjectives to the medium diversity group, and pronouns and numerals to the low diversity group. In the present research, nouns and adjectives have an equal number of structural models, as do pronouns and numerals.

Nouns
The analysed material comprises 142,086 noun tokens, covering 11,799 different word forms. All the nouns can be generalised as representing 63 structural models.
Nouns may have from one to nine morphemes. Bi-morphemic nouns are the most frequent, while tri-morphemic and quadri-morphemic nouns are somewhat less frequent. Nouns with 2-4 morphemes make up 96% of all analysed nouns (see Figure 2).
Bi-morphemic nouns dominate and they make up almost half of all noun usage cases (46.5%). These are mostly nouns that have a root and an inflection, forming the most frequent noun model RF, e.g. darb-o 'work' (Nmsg), žem-ės 'land' (Nfsg). Endingless Vocative case forms also belong to this group (the RS d model), e.g. tėt-uk 'daddy' (Nmsv), mam-ut 'mummy' (Nfsv). This model is very rare and it is only used in spontaneous speech.
It should be noted that when a noun has two or more derivational suffixes, only the last one (the one that precedes the inflection) is the derivational suffix of the given word, while the others belong to the stem, e.g. the noun darb-uo-toj-as 'employee' is derived from the verb darb-uo-tis 'to work', which is in turn derived from the noun darb-as 'work'.
Hexa-morphemic nouns make up only 0.6% (21 models), and their usage frequencies are low. In fact, many occurrences of these models are unique. The most frequent ones include PdRS d S d F (0.2%). Hepta-morphemic nouns fall into 9 models (0.05% of all nouns). The nouns in them are verbal abstracts derived from suffixal verbs. Only two octa-morphemic nouns (0.02%) have been found in the data. One nona-morphemic word that has 7 roots was also found: fibro-ezo-fago-gastro-duo-deno-skop-ij-a 'fiber esophagogastroduodenoscopy' (Nfsn/Nfsi) (the RRRRRRRS d F model).
Simple nouns RF are the most numerous (approx. 47% of all nouns). However, they are not all simple, e.g. gėr-is is an inflectional derivative that is derived from the adjective ger-as 'good'. A large part of the nouns have derivational suffixes (approx. 42%); the number of derivational suffixes in these nouns ranges from one to four. More than one tenth of nouns (14%) have prefixes (from one to three). These nouns may also contain many derivatives of inflectional morphemes, and thus the number of their models is quite large (39 models). Only 4% of all nouns have two roots. In spite of such a small percentage, their morphemic structure is very diverse (38 models). Note that if a noun has more than two roots, it must be a borrowing.

Adjectives
The analysed material comprises 30,097 adjectives (word tokens); 11,799 of them are different wordforms, which can be generalised as representing 63 structural models.
Adjectives may have from one to seven morphemes. Adjectives of 2-4 morphemes, just as in the case of nouns, make up as much as 95% of all cases. However, tri-morphemic wordforms are the most frequent among adjectives, whereas bi-morphemic ones are the most frequent among nouns (see Figure 3). Uni-morphemic adjectives, like uni-morphemic nouns, are uninflected international words, e.g. makro, mikro, mezo (the R model, 0.02% of all adjectives).
Bi-morphemic adjectives are of two types, one of which comprises adjectives with a root and an inflection (the RF model, the second most frequent model of adjectives), e.g. aišk-u 'clear', sunk-u 'heavy' (both neuter gender).
The overall usage of adjectives is dominated by adjectives that have derivational suffixes (approx. 62%). More than one tenth of adjectives (14%) have prefixes. Compound adjectives are rare. Only 5% of all adjectives have two roots.

Pronouns
There are 39,738 pronouns in the analysed material; they include 734 different word forms. All pronouns fall into 10 structural models. Pronouns may have from one to five morphemes. Most frequent are bi-morphemic pronouns (see Figure 4).

Figure 4. Percentage of Pronouns with a Different Number of Morphemes
The most frequent uni-morphemic pronoun is aš 'I' (P0sn), which belongs to the R model.

The longest, penta-morphemic pronouns belong to 2 structural models: RRRS d F, e.g. ka(-)ž-k-ok-s (0.42% of all analysed pronouns), and RcRS d F, e.g. š-i-t-ok-s 'such' (0.09% of all analysed pronouns).
Two models (RF and RS d F) make the core of pronoun models (more than 90%). The overall usage of pronouns is dominated by primary pronouns (approx. 87%). The diversity of structural models is not possible in this case. 8% of pronouns are suffixal derivatives and belong to the structural model RS d F; only 5% of all pronouns have two roots. Connective morphemes, inflections, and suffixes determine their diverse morphemic structure: RR, RcR, RRF, RcRF, RRRS d F, RcRS d F. Pronouns with prefixes are very rare.
In sum, the most typical morphemic features of Lithuanian pronouns are as follows: they are primary, bi-morphemic, and belong to the RF model.

Numerals
The analysed material contains 5,008 numerals (word tokens) of 513 different word forms. All numerals may be generalised into 10 structural models.
Numerals have from one to five morphemes. As in the case of pronouns, the most frequent are bi-morphemic numerals; however, tri-morphemic ones are also numerous (see Figure 5). There are only two uni-morphemic numerals, du 'two' and dešimt 'ten', but they are rather frequent (8.4% of all numerals).
Two thirds of all numerals are bi-morphemic. Most bi-morphemic numerals are simple and definite, consisting of a root and an inflection, and belong to the RF model (66.4%, e.g. dv-i 'two' (Mcf0n), tr-ys 'three' (Mcm0n)).
The numeral usage is dominated by primary numerals (approx. 74%, two structural models, R and RF). 13% of the numerals are suffixal derivatives and belong to the structural models RS d F, RS i F, and RS d S d F. 13% of all numerals are compounds; they show the structural models RcR, RRF, RcRF, RRS d F, and RcRS d F. Numerals do not have prefixes.
The data analysis has revealed that most typical numerals in the Lithuanian language are primary ones and suffixal derivatives with 1-3 morphemes. Out of 10 morphemic structural models for numerals, the most productive 3 models account for 87% of all wordforms. The three models are the following: RF (66.4%, e.g. vien-as 'one'), RS d F (12%, e.g. tr-eč-ias 'third'), R (8.4%, du 'two'). The generalised formula for numerals is RS d0-1 F 0-1 . They differ from pronouns by the rather high frequency of occurrence of R model.

Verbs
The analysed material comprises 92,468 verbs (word tokens) which include 29,878 different word forms. Their morphemic structures are very diverse: all the verbs can be classified into 116 structural models. However, their frequencies of occurrence are immensely different (from just 1 occurrence up to 18 thousand).
Uni-morphemic verbs are very infrequent. Just one occurrence has been found in the data: it is the shortened verb reik 'must' (main form, indicative mood, present tense, 3 person), which has been used in spontaneous speech (the R model, 0.02% of all verbs).
All models can be classified into three groups according to their frequencies: a) 10 most frequent models that have occurred more than 2 thousand times (they make up 75% of all analysed verbs; see Table 2); b) 13 models of medium frequency that have occurred more than 400 times (10% of all analysed verbs); c) the 90 remaining models are very rare (they make up 5% of all verbs). To summarize, it can be said that only about 20 morphemic models of verbs are most typical of the Lithuanian language, while the rest of the models are peripheral.
About one third of verbs are primary (approx. 33% of all verbs). These are verbs with a root, an inflection or (and) inflectional suffix: R (reik 'need' (main form, indicative mood, present tense, 3 person)), RF (20% of all analysed verbs, e.g. yr-a 'is' (main form, indicative mood, present tense, 3 person), buv-o 'was' (main form, indicative mood, past tense, 3 person), žin-ai 'you know' (main form, indicative mood, present tense, singular, 2 person)), RS i (6%, e.g. bu-s 'will be' (main form, indicative mood, future tense, 3 person), ei-k 'go' (main form, imperative mood, singular, 2 person)), RS i F (8%, e.g. the following forms of 'to be': bū-t-ų (main form, subjunctive mood, 3 person), bū-dav-o (main form, indicative mood, past frequent tense, 3 person), bū-s-i (main form, indicative mood, future tense, singular, 2 person)), RS i S i (e.g. bū-s-iant, a form of 'to be' (adverbial participle, future tense)), and RS i S i F (very rare, e.g. kalb-ė-t-a 'spoken' (participle, passive voice, past tense, feminine, singular, Nominative/Instrumental/neuter gender)).
Two thirds of verbs in the analysed material have derivational affixes. No verbs with all four types of derivational morphemes (prefix, suffix, reflexive particle, and two roots) were found. Verbs with three different derivational morphemes are also very rare (approx. 2% of all verbs). Typically, it is a prefix, a suffix, and a reflexive particle, e.g. su-si-rūp-in-o 'concerned', or a prefix, a suffix, and two roots, e.g. nu-foto-grafuo-ti 'to take a picture'. Verbs with two affixes of different types are relatively frequent (approx. 20% of all verbs). Characteristic combinations are as follows: a prefix and a suffix (e.g. ne-gal-ėj-o 'could not' (main form, indicative mood, past tense, 3 person)), a prefix and a reflexive particle (e.g. at-si-rad-o 'occurred' (main form, indicative mood, past tense, 3 person)), and a suffix and a reflexive particle (e.g. laik-y-ki-s 'hold on' (main form, imperative mood, singular, 2 person)). Most frequent are verbs with prefixes (approx. 28%) and suffixes (approx. 15%). Thus the most frequent types of derivational verbs contain a prefix (28%), a suffix (15%), a prefix and a suffix (12%), a prefix and a reflexive particle (6%), a reflexive particle (2%), a prefix, a suffix and a reflexive particle (2%), and, finally, a suffix and a reflexive particle (1%). Verbs with other combinations of derivational morphemes make up only about 1% of all verbs.
About one third (31%) of all used verbs have derivational suffixes. Verbs with two derivational suffixes are quite rare (only 2% of all verbs), while verbs with three suffixes are very uncommon, e.g. kalb-ėj-o 'spoke' (main form, indicative mood, past tense, 3 person), mok-y-toj-av-au 'I used to be a teacher' (main form, indicative mood, past frequent tense, singular, 1 person).
To summarise, typical verbs in the Lithuanian language are basic verbs consisting of 2-4 morphemes or verbs with one derivational affix. The generalised formula for a typical verb is P 0-1 RS d0-1 S i0-1 F 0-1 .
Figure 7 shows that contemporary Lithuanian is dominated by bi-morphemic words with a root and an inflection (RF). Pronouns, numerals, verbs and nouns have the largest proportion of bi-morphemic words. Adjectives are mostly derivatives with the model root + derivational suffix + inflection (RS d F). Such a model is quite frequent among nouns too. Figure 7 also suggests that suffixes are quite typical of inflected parts of speech. This fact coincides with one of the language universals, namely that "suffixes are more prototypical affixes than prefixes" (Plungian 2003: 90). Prefixal derivation is more characteristic of verbs than of nominal words. It is also evident from Table 6 that while each part of speech is dominated by one or two morphemic models, morphemic models of verbs are distributed more evenly.
The generalised morphemic structure formula of a Lithuanian word is the following: P 0-3 R 1-7 Sd 0-4 Si 0-2 F 0-1. Theoretically, one word may contain three prefixes, a reflexive affix, and two roots with a connective vowel, two suffixes, and an inflection; however, such words do not exist in the language. The maximum realisation of the formula is very rare, but possible, e.g. the noun ne-su-der-in-am-um-as 'incompatibility' (Nmsn) (the PPRS d S i S d F model), the adjective pa-vyzd-ing-iaus-iu 'with the most exemplary' (Amsis) (the PRS d S i F model), the verb ne-į-si-są-mon-in-t-as 'not realised' (participle, passive voice, past tense, masculine, singular, Nominative/feminine, plural, Accusative) (the PPdPRS d S i F model).
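The generalised formula can be read as a set of bounds on how many morphemes of each type a word may contain. The sketch below checks a structural model against those bounds. It is an illustrative reading only: the function `fits_general_formula` and the list-of-tags representation are assumptions, the check counts morphemes without enforcing their order (derivational and inflectional suffixes may alternate, as in PPRS d S i S d F), and connective vowels and reflexive affixes, which the formula does not mention, are treated as outside its scope.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# Bounds taken from the formula P 0-3 R 1-7 Sd 0-4 Si 0-2 F 0-1.
LIMITS: Dict[str, Tuple[int, int]] = {
    "P": (0, 3),
    "R": (1, 7),
    "Sd": (0, 4),
    "Si": (0, 2),
    "F": (0, 1),
}


def fits_general_formula(model: Sequence[str]) -> bool:
    """Return True if the morpheme counts of a model stay within the formula's bounds."""
    counts = Counter(model)
    if any(tag not in LIMITS for tag in counts):
        return False  # tag not covered by the formula (e.g. a connective vowel)
    return all(low <= counts[tag] <= high for tag, (low, high) in LIMITS.items())


print(fits_general_formula(["R", "F"]))                                # True  (nam-as)
print(fits_general_formula(["P", "P", "R", "Sd", "Si", "Sd", "F"]))    # True  (ne-su-der-in-am-um-as)
print(fits_general_formula(["P", "F"]))                                # False (no root)
```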

Overview of Regularities of Morphemic Structures across Different Parts of Speech
There are differences between different parts of speech, for instance, numerals do not have prefixes, while pronouns can only have one prefix ne-. Lithuanian compound words have no more than two or at most three roots. Numerals and pronouns have only one or two derivational suffixes. Nouns do not have inflectional suffixes except those that are derived from verbs (see generalised morphemic structure in Table 4).
As in many other languages, the Lithuanian language is dominated by one-root nouns. Two-root nouns are also quite frequent. The largest number of roots in the analysed data (as many as 7) is found in the international noun fibro-ezo-fago-gastro-duo-deno-skop-ij-a 'fiber esophagogastroduodenoscopy' (Nfsn/i) (the RRRRRRRS d F model). However, this individual case does not mean that multi-root words are typical of Lithuanian. By and large, international nouns have more roots than Lithuanian nouns. It should be noted that sometimes two or more roots do not occur sequentially, as they can be separated by other morphemes (i.e. connective vowels), e.g. ei-l-ė-raš-t-uk-as 'a little poem' (Nmsn) (the RS d cRS d S d F model).
As already mentioned, inflectional suffixes are not typical of nouns. They occur only when a noun is derived from a verb, e.g. pri-klaus-om-yb-ės 'dependence' (Nfsg) (the noun is derived from the passive present tense participle pri-klaus-om-as 'dependent'), or when a suffix occurs in all case forms (except the Nominative singular) of words with consonant stems, e.g. vand-en-į 'water' (Nmsa) (where -en- is treated by different researchers either as a part of the root or as a suffix). The maximum number of inflectional suffixes is two, e.g. mir-št-am-um-as 'mortality' (Nmsn) (the RSiSiSdF model). However, one-suffix words are more frequent.
Prefixal derivation is not typical of adjectives, and prefixed adjectives are therefore not very frequent. Adjectives with two prefixes have been found in the analysed data, e.g. ne-pa-naš-us 'not similar' (Amsn) (the PPRF model). We have not found any adjective with more prefixes, although theoretically more prefixes could be used. Adjectives may have a maximum of three roots. Three-root adjectives may be of international origin, e.g. mikro-bio-log-in-ės 'microbiological' (Afsg/Afpn) (the RRRSdF model), as well as of Lithuanian origin, e.g. vien-uo-lik-meč-io 'eleven years old' (Amsg) (the RcRRF model). The most frequent adjectives have one root, although two-root adjectives also occur.
We have found adjectives with as many as four derivational suffixes, e.g. gy(-)v-en-im-išk-us 'true to life' (Ampa) (the RSdSdSdSdF model). However, the number of derivational suffixes is usually not large. Two-suffix adjectives are the most frequent ones, e.g. žin-om-iaus-iu 'the most famous' (Amsis) (the RSiSiF model; this adjective is of verbal origin, which is shown by the suffix -om-, while the other suffix, -iaus-, is used to form the superlative degree).
Pronouns do not have many different morphemic structural models. They may have a maximum of two roots, e.g. kiek-vien-as 'everyone' (Pmsn/Pfpa) (the RRF model), and one derivational suffix, e.g. t-ok-s 'such' (Pmsn) (the RSdF model). Inflectional suffixes and prefixes are not typical of pronouns, although we have found one exception in the analysed data: ne-mūs-išk-ė 'not our' (Pfsn) (the PRSdF model).
Numerals have a rather simple morphemic structure. They may have a maximum of two roots, e.g. aštuon-io-lik-t-ojo 'the eighteenth' (Momsg) (the RcRSdF model). They may also have a maximum of two derivational suffixes, e.g. tr-ej-et-o 'triplet' (Mc00g) (the RSdSdF model), while inflectional suffixes are not typical of numerals.
Prefixal derivation is typical of verbs; therefore, a large proportion of verbs have one, two, or even three prefixes. In verbs, prefixes do not necessarily occur in an unbroken sequence, as they are often interrupted by a reflexive affix, e.g. ne-į-si-są-mon-in-t-as 'unrealised' (participle, passive voice, past tense, masculine, singular, Nominative/feminine, plural, Accusative) (the PPdPRSdSiF model).
As far as roots are concerned, the maximum number of roots in verbs is two, e.g. bad-mir-iav-o 'starve to death' (main form, indicative mood, past tense, 3rd person) (the RRSdF model).
Verbs may have up to four derivational suffixes, e.g. dė-st-y-toj-au-ti 'to lecture' (the RSdSdSdSdSi model), and up to two inflectional suffixes, e.g. bū-s-im-os 'to be in future' (participle, active voice, future tense, feminine, singular, Genitive/plural, Nominative) (the RSiSiF model).
The largest and the smallest structural models are always rare phenomena in the language. So what is typical of the inflected parts of speech according to this analysis? The generalised model P 0-1 R Sd 0-2 F is the most characteristic of nouns and adjectives. The root and the flexion are obligatory morphemes for nouns and adjectives; if they have affixes, these are typically limited to one prefix and two derivational suffixes. The prefix is not a characteristic affix for numerals (R Sd 0-2 F 0-1) or pronouns (RF), while flection-less numerals are common in natural language. Verbs (P 0-1 R Sd 0-1 Si 0-1 F 0-1) differ from the other inflected parts of speech in that flection-less forms with an inflectional suffix are quite common in the language.
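The typical models named above can be written as patterns and tested against concrete model strings. The Python sketch below is illustrative only: the dictionary TYPICAL and the function is_typical are hypothetical names, and rendering each generalised model as an ordered regular expression (prefix, root, derivational suffixes, inflectional suffix, flexion) is an assumption of this sketch, not a claim made in the analysis.

```python
import re

# Assumed regex renderings of the typical models, one per part of speech.
# Optional elements (range 0-1) become "?", and Sd 0-2 becomes "(Sd){0,2}".
TYPICAL = {
    "noun": r"P?R(Sd){0,2}F",
    "adjective": r"P?R(Sd){0,2}F",
    "numeral": r"R(Sd){0,2}F?",
    "pronoun": r"RF",
    "verb": r"P?R(Sd)?(Si)?F?",
}


def is_typical(pos: str, model: str) -> bool:
    """Return True if the whole model string fits the typical pattern."""
    return re.fullmatch(TYPICAL[pos], model) is not None
```

With these patterns, RSdF counts as a typical noun or adjective model and RSi as a typical flection-less verb form, whereas the rare models PPRSdSiSdF (noun) and PPdPRSdSiF (verb) fall outside the typical range.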
Although our research is synchronic, a couple of notes regarding historical aspects can be made. The grammatical categories inherited from Proto-Indo-European are typically expressed by flexions with multiple meanings, e.g. gender: ger-as – ger-a 'good'; number: ger-as – ger-i; case: ger-as – ger-o, ger-am; person: dirb-u – dirb-i 'work'; while more recent grammatical categories are expressed by inflectional suffixes, e.g. the future and frequentative tenses: dirb-u – dirb-s-iu, dirb-dav-au.

Concluding remarks
Having analysed a 310-thousand-word Lithuanian corpus (where nouns make up 46% of all words, verbs 30%, pronouns 13%, adjectives 10%, and numerals 2%), we can draw some important conclusions about the morphemic structure of the Lithuanian language.
Words of the Lithuanian inflective parts of speech may consist of up to 9 morphemes. Word forms of two, three, and four morphemes make up the largest part of the Lithuanian vocabulary (93% of all analysed words). The data contained 43% bi-morphemic, 33% tri-morphemic, and 17% quadri-morphemic words. In principle, the number of morphemes in a word is not limited. One can say that uni-morphemic words and words that are longer than 4 morphemes belong to the language periphery.
The number of morphemic structural models found for different parts of speech varies greatly: numerals and pronouns have just 10 models, nouns and adjectives 63, and verbs 116. In addition, the frequencies of the individual morphemic structural models differ considerably. The most frequent are simple words and words with one suffix or one prefix: RF (41% of all analysed words), RSdF (17%), and PRF (8%). The following three models are also among the most frequent: RSdSiF (6%), RSiF, and PRSdF (3% each). These six models make up as much as 78% of all inflective parts of speech in the analysed texts.
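The reported shares can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the percentages given in these concluding remarks; the variable names are hypothetical and the figures are rounded as reported.

```python
from itertools import accumulate

# Share of all analysed words (in %) for the six most frequent models.
model_share = {"RF": 41, "RSdF": 17, "PRF": 8, "RSdSiF": 6, "RSiF": 3, "PRSdF": 3}

# Share of all analysed words (in %) by number of morphemes.
length_share = {2: 43, 3: 33, 4: 17}

# Running total over the models, in the order listed above.
cumulative = list(accumulate(model_share.values()))

top_six = sum(model_share.values())       # combined share of the six models
two_to_four = sum(length_share.values())  # share of words with 2-4 morphemes
```

The running total reaches 78% after the sixth model, and words of two to four morphemes add up to 93%, which matches the figures stated in the text.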
It may be concluded that the most typical words in the Lithuanian language are simple words with 2-4 morphemes.
The authors plan to continue the morphemics research, while further focusing on the statistical analysis of the data, e.g. describing regularities of the Lithuanian language according to Menzerath-Altmann's law (cf. Köhler et al. 2005: 349), statistical regularities of morpheme and word length (cf. Meyer 1999), morphemic ambiguity, and frequency regularities of morphemic usage (cf. Krott 1999).

Abbreviations
c: connecting vowel
d: reflexive affix
F: flection
P: prefix
R: root
Sd: derivational suffix (a derivational suffix is used to create new words, e.g. nouns from verbs or adjectives from nouns)
Si: inflectional suffix (an inflectional suffix is used to create new grammatical forms, which are not new words, e.g. degrees of adjectives and adverbs, past tense verbs, or participles)