Mapping the Dimensions of Linguistic Distance: A Study on Quantitative and Qualitative Geolinguistics of Banjar Sundanese Dialect

This study was motivated by methodological and theoretical deficiencies in existing studies on the mapping and classification of Sundanese dialects. It aims to investigate (1) the comprehensive regional classification of the Banjar Sundanese dialect and (2) the linguistic and non-linguistic factors that account for the distances between regions in that classification. The study applied a mixed-methods design. The data were collected through participant observation. The linguistic distances within the Banjar Sundanese dialect were calculated using the Levenshtein algorithm in Gabmap, and multidimensional scaling was used to check the reliability of the clustering results. Based on the calculated linguistic distances, the Banjar Sundanese dialect can be classified into three sub-dialects: the standard Sundanese sub-dialect, the Java-influenced sub-dialect, and the Java-dominated sub-dialect. The study reveals significant differences between the Banjar Sundanese dialect, especially the Purwaharja and Langensari sub-dialects, and standard Sundanese. One cause of these differences is the influence of the Javanese language.


I. INTRODUCTION
Language is one of the most important aspects of human life, especially in communication and interaction (De Stefani & De Marco, 2019; DiStefano et al., 2016; Mondada, 2016; Nasrullah et al., 2019). One purpose of human interaction is to share perceptions, ideas, thoughts, expectations, and so on. Communication and interaction cannot work properly without an adequate medium, and in oral communication between humans that medium is language. Language therefore occupies an important position in human life. Differences that arise in the process of communication frequently lead to conflict (Han & Wu, 2020), and such horizontal conflicts often stem from errors and misperceptions among speakers. Mastering the language used as the medium is therefore one way to minimize horizontal friction.
Every speech area has a dominant first language that its members use as their medium of communication. Local languages frequently undergo complex dynamics as they develop, particularly in the context of globalization, in which English has become an international language (Davitishvili, 2017; Tupas, 2015). These dynamics are even more complex in Indonesia, where local languages confront a language policy that establishes Indonesian as the national and state language. On the one hand, this policy reduces the miscommunication and misperception that language differences can cause in inter-ethnic communication: with Indonesian as the national language, communities of different ethnic groups can communicate freely with each other regardless of their regional languages. On the other hand, the policy has a great impact on the existence of local languages, which are gradually shifting (Rahmi, 2015; Sudaryanto et al., 2019).
Many ethnic groups are spread across Indonesia, from Sabang to Merauke, and each has its own language and culture. The local language of a speech area is used as a means of communication among the members of the ethnic group. These groups are also culturally diverse in ways that go beyond language. The differences in language and culture must therefore be preserved as intellectual property in order to maintain the diversity and unity of the nation and state.
Sundanese is the mother tongue of the Sundanese communities and is used by them as a daily means of communication. The language policy that makes Indonesian the national language affects the use of regional languages, including the use of Sundanese by the Sundanese communities. As a result, Sundanese, like other local languages, is beginning to come under functional pressure.
Given this situation, Sundanese speakers seem reluctant to use their local language, even though they are native speakers of it. Whether this is realized or not, the Sundanese language is changing. If this condition persists, it could lead to language extinction.
Banjar is a part of West Java Province with very complex linguistic and cultural dynamics. One cause of this complexity is the location of Banjar City, which is directly adjacent to Central Java Province. This situation makes Banjar a language enclave in which two or more languages interact, and such interaction in turn produces complex linguistic dynamics. The majority of the population of Banjar is Sundanese. However, many Javanese have settled in parts of West Java, including Banjar City, as a result of New Order era policies such as migration and population distribution in Indonesia. These policies have had a major impact on socio-cultural and linguistic change in Banjar City. Because of the social, cultural, and linguistic interactions between Sundanese and Javanese in several areas of the city, the life of the communities in Banjar reflects a mixing of language, cultural behavior, and religious values. This mixing is adapted to the communities' daily needs and to ease of interaction and communication. The Banjar Sundanese dialect is therefore a fascinating phenomenon to study in a geolinguistic framework, both because its linguistic mapping has not yet been carried out in a thorough and complete manner and because the location of Banjar City, in direct contact with Central Java Province, can lead to linguistic and cultural interaction.
The mapping of Sundanese dialects is not a new field of study. Several studies have contributed to it, both in parts of the East Priangan region (Garut, Tasikmalaya, Ciamis, Banjar, and Pangandaran) and in other Sundanese speech areas (Munawarah & Datang, 2019; Rahmawati & Lestari, 2017; Thamrin & Isnendes, 2019; Widyastuti, 2017). Despite these contributions, the mapping of Sundanese dialects is still a subject of academic discussion among dialectologists. More specifically, to the best of the researcher's knowledge, there is no consensus among Sundanese dialectologists on how the Sundanese dialects should be classified. Several reasons for this disagreement can be identified: the lack of linguistic data that mark the dialectal distribution of Sundanese, waves of language migration, and the long history of language contact between Sundanese and non-Sundanese languages, particularly Javanese. Beyond these reasons, the most important and most often overlooked factor is the methodological and theoretical weakness of several previous studies. Almost all previous studies on the mapping of Sundanese dialects relied on descriptive qualitative mapping and did not address the mapping and distribution of Sundanese dialects with quantitative or mixed paradigms.
This study is motivated by these methodological and theoretical deficiencies in studies of the mapping and classification of Sundanese dialects. Sundanese and Javanese are in linguistic contact, have a complex contact history, especially in Banjar City, and increasingly share mutual innovations. The classification of the Banjar Sundanese dialect therefore demands an approach different from the traditional comparative method. Recent innovative approaches such as Historical Glottometry (Daniels et al., 2019; Elias, 2019; Leddy-Cecere, 2021) combine traditional comparative methods with areal classification to overcome issues related to language classification. This study employs dialectometry, which measures the distance between related languages based on randomly selected aggregate data (Dorta & González Rodríguez, 2019; Dunn, 2018; Saddhono & Hartanto, 2021; Wolk & Szmrecsanyi, 2018). Using this approach, the study aims to examine (1) the comprehensive regional classification of the Banjar Sundanese dialect and (2) the linguistic and non-linguistic factors that account for the distances between regions in that classification.

II. METHOD
This study uses two approaches, one theoretical and one methodological. Theoretically, it takes a dialectological approach that focuses on dialectometry. Methodologically, it uses a mixed-methods design, that is, a research approach that combines and connects qualitative and quantitative research methods (McKim, 2015; Pelto, 2015). This design was chosen because it can describe, explain, and establish relationships among the categories and data that are discovered.

A. Procedures
This study is divided into three stages: data provision, data analysis, and presentation of the results of the data analysis. The research begins with the provision of data using the following techniques: (1) participation, (2) observation, (3) interviews, and (4) the reconstruction and introspection of intuitive data. The first step in the data provision stage is to determine the area in which the research is carried out.

B. Data Collection
For data collection, this study employs participant observation with introspection, in-depth interviews, and document review (Lopez-Dicastillo & Belintxon, 2014; Simonÿ et al., 2018). The study also uses triangulation to examine the stability and validity of the collected data. Data triangulation is an attempt to compare and re-check the credibility of information obtained at different times and with different tools in qualitative methods (Jentoft & Olsen, 2017; Kern, 2016; Renz et al., 2018). Source triangulation is applied by comparing the observation data with the results of interviews with informants at the research site, and by comparing the interview results with the content of documents, including demographic data and secondary sources in the form of historical data from the Culture Service of West Java and Central Java Provinces.
The question list used in this study consisted of 300 basic Swadesh vocabulary items, modified according to the needs and objectives of the study. The list consists of three groups of questions divided into 12 parts, covering vocabulary with the following meanings: kinship system, pronouns, body parts, house parts, tools, condition, parts of nature, plants and fruits, animals, food/drinks, characteristics/conditions, abstract expressions, and verbs, as well as question words, conjunctions, and others. All items are everyday Sundanese vocabulary.
This study was conducted in Banjar City, West Java Province, which consists of four sub-districts: Banjar, Pataruman, Purwaharja, and Langensari. Five respondents aged 20-60 years were selected in each observation area, giving 20 respondents (11 females and 9 males) across the 4 predetermined observation areas. The collected data were analyzed using quantitative and qualitative approaches. The quantitative analysis calculated the dialectometry of the Banjar Sundanese dialect classification and mapped it using Gabmap (Leinonen, 2016; Nerbonne, 2011). The qualitative analysis was carried out to display several linguistic phenomena in the Banjar Sundanese dialect. The linguistic distance between the sub-dialects of the Banjar Sundanese dialect was calculated from the data obtained in the fieldwork. All vocabulary items elicited from the respondents were classified phonetically and lexically, and only 30 items fulfilled the criteria. The linguistic distance was then determined from the phonetic and lexical distances using the Levenshtein algorithm in Gabmap, which gives absolute and relative (normalized) string distances. As the next step, a series of cluster analyses was carried out on the linguistic distance matrix.
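The absolute and normalized Levenshtein distances mentioned above can be illustrated with a minimal Python sketch. It is not Gabmap's implementation: the function names are hypothetical, each character is treated as one segment, and normalization by the length of the longer word is only one common convention, assumed here for illustration. The example pair contrasts /urip/ 'live', one of the Javanese loans reported in this study, with hirup, which is assumed here to be the standard Sundanese counterpart and is not taken from the study's data.

```python
def levenshtein(a, b):
    """Absolute Levenshtein distance between two sequences of segments."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a, b):
    """Relative distance: absolute distance divided by the longer length.

    Assumption: this normalization is illustrative and may differ from
    the one Gabmap applies.
    """
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Illustrative pair; "hirup" is an assumed counterpart, not study data.
absolute = levenshtein("urip", "hirup")             # 3 edits
relative = normalized_levenshtein("urip", "hirup")  # 3 / 5 = 0.6
```

Averaging such relative distances over all compared vocabulary items for each pair of sites would give one entry of a site-by-site distance matrix, which is the kind of input the cluster analysis operates on.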
As mentioned in the previous section, this study employs Gabmap to classify the Banjar Sundanese dialect and to validate the classification. Gabmap is a web-based application for dialect classification and visualization developed by computational linguists at the University of Groningen. It provides several alternative methods for cluster analysis (Ward's Method, Complete Link, Group Average, and Weighted Average). These alternatives were introduced to Gabmap based on the results of Prokić and Nerbonne (2008), who evaluated the stability of clustering methods. Gabmap offers two cluster validation techniques, multidimensional scaling and fuzzy clustering. In this study, multidimensional scaling was used to check the reliability of the clustering results. Gabmap provides multidimensional scaling plots together with the corresponding dialect maps. A multidimensional scaling plot shows the distances between sub-dialects in an n-dimensional space. In other words, the entire distance matrix is taken as input, and each sub-dialect is assigned a position in an n-dimensional space in which the distances approximate the actual linguistic distances. The resulting dimensions can be plotted in a Cartesian coordinate system, where similar data points lie close to each other. The cluster validation section of Gabmap also offers options for refining the classification by excluding discrete clusters and narrowing the analysis to sub-dialects that are not clearly assigned to the same cluster. The results of multidimensional scaling can further be used to examine the amount of variance accounted for by each dimension; the first dimension generally accounts for the largest share of the variance present in the data. In a multidimensional scaling plot, data points with the same value are always displayed together. In addition, Gabmap automatically produces dialect maps using Google Earth, with the linguistic distances as input.
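To make the clustering step concrete, the following Python sketch applies group-average agglomerative clustering, one of the methods listed above, to a small distance matrix. It is a simplified illustration and not Gabmap's code; the function name is hypothetical. Of the distances below, only the value 0.15 for Banjar and Purwaharja is reported in this study, and all other values are placeholders invented for illustration.

```python
from itertools import combinations


def group_average_clustering(labels, dist, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    The distance between two clusters is the mean of all pairwise
    distances between their members (group average linkage).
    """
    if not 1 <= n_clusters <= len(labels):
        raise ValueError("n_clusters must be between 1 and len(labels)")
    clusters = [frozenset([name]) for name in labels]

    def average_distance(c1, c2):
        # dist is keyed by unordered pairs of site names
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


sites = ["Banjar", "Pataruman", "Purwaharja", "Langensari"]
# Placeholder distances: only 0.15 (Banjar, Purwaharja) is from the text.
distances = {
    frozenset(("Banjar", "Pataruman")): 0.02,
    frozenset(("Banjar", "Purwaharja")): 0.15,
    frozenset(("Banjar", "Langensari")): 0.30,
    frozenset(("Pataruman", "Purwaharja")): 0.10,
    frozenset(("Pataruman", "Langensari")): 0.28,
    frozenset(("Purwaharja", "Langensari")): 0.12,
}

three_clusters = group_average_clustering(sites, distances, 3)
two_clusters = group_average_clustering(sites, distances, 2)
```

With these placeholder values, the three-cluster solution groups Banjar with Pataruman and leaves Purwaharja and Langensari separate, and the two-cluster solution joins Purwaharja with Langensari. The actual result depends on the measured distance matrix.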

III. RESULTS AND DISCUSSION
This section presents the linguistic distances calculated for the Banjar Sundanese dialect on the basis of phonetic and lexical classifications. These results are then combined with the results of multidimensional scaling.

A. The Classification of Banjar Sundanese Dialect
As explained in the previous section, the classification of the Banjar Sundanese dialect is complemented by the multidimensional scaling results provided by Gabmap. From the calculated data, Gabmap provides a map for each dimension of the multidimensional scaling. The multidimensional scaling of the classification of the Banjar Sundanese dialect can be seen in Fig. 1 and Fig. 2. Fig. 1 shows a multidimensional scaling plot of the linguistic distance in a two-dimensional space. The first dimension is shown by a solid arrow and the second dimension by a dotted arrow. On the first dimension, Sundanese in the Banjar and Pataruman sub-districts has the lowest linguistic distance value, while Sundanese in Langensari sub-district has the highest. Sundanese in Purwaharja sub-district lies between these two extremes.
On the second dimension (dotted arrow), Sundanese in Pataruman sub-district has the lowest linguistic distance value, while Sundanese in Langensari sub-district has the highest. Sundanese in the Banjar and Purwaharja sub-districts lies between these two extremes.
The multidimensional scaling results in Fig. 1 clearly show that Sundanese in the Banjar and Pataruman sub-districts forms one group, whereas Sundanese in Purwaharja sub-district stands apart and Sundanese in Langensari sub-district is a separate sub-dialect. On this basis, the Banjar Sundanese dialect can be classified into three sub-dialects: the Banjar and Pataruman sub-dialect, the Purwaharja sub-dialect, and the Langensari sub-dialect.
These results are supported by the map of the multidimensional scaling of the Banjar Sundanese dialect. Fig. 2 shows the map of the first dimension of the multidimensional scaling results for the linguistic distance of the Banjar Sundanese dialect.
The light color indicates the area with the highest linguistic distance, namely Langensari. The map also shows that Sundanese in the Banjar and Pataruman sub-districts is grouped into one sub-dialect. To obtain the classification of sub-dialects in the Banjar Sundanese dialect, a linguistic distance calculation was carried out with a dendrogram as output. The dendrogram obtained from this calculation is shown in Fig. 3 and Fig. 4. In the dendrogram (Fig. 3), very similar sub-dialect variants are shown in the same color (for example, the Banjar sub-dialect and the Pataruman sub-dialect), while variants that do not belong to the same sub-dialect are shown in different colors, such as the Purwaharja sub-dialect and the Langensari sub-dialect, even though both are in the same group. The dendrogram yields the same sub-dialect classification as the multidimensional scaling results in the previous section. Fig. 4 confirms the sub-dialect classification of the dendrogram. In Fig. 4, very similar sub-dialects are shown in the same color, namely dark blue (the Banjar sub-dialect and the Pataruman sub-dialect).
Based on the results above, the Banjar Sundanese dialect forms three sub-dialect clusters: the Banjar and Pataruman sub-dialect, the Purwaharja sub-dialect, and the Langensari sub-dialect. The three clusters are determined by the linguistic distances calculated from the entire set of linguistic data, on the basis of the phonological and lexical aspects of those data.
The relative linguistic distances between the sub-dialects of the Banjar Sundanese dialect can be seen in Fig. 5, where they are based on phonological and lexical aspects. The linguistic distance is indicated by colored lines ranging from dark to light. A dark line indicates a small linguistic distance, whereas a light-colored line indicates a large linguistic distance.
The Banjar sub-dialect and the Pataruman sub-dialect are connected by a dark blue line. This indicates that the linguistic distance between the two is very small, so that they can be categorized as the same sub-dialect. The Pataruman sub-dialect and the Purwaharja sub-dialect are connected by a faded blue line, which shows that the two have a relatively small linguistic distance, although there are some dialectal differences. The Banjar sub-dialect and the Purwaharja sub-dialect are connected by a faded line that approaches white, which indicates a large linguistic distance. Finally, the Purwaharja sub-dialect and the Langensari sub-dialect are connected by a light blue line, which indicates that the two are fairly close linguistically but are categorized as two different sub-dialects.
The linguistic distances between the sub-dialects of the Banjar Sundanese dialect are also supported by the results of mapping referent points based on the quadratic distance method. The mapping of referent points can show the relationship between two variables, namely linguistic distance and geographical distance. Geographical distance matters in language classification because, in some cases, it affects the characteristics of each sub-dialect of a language. The overall linguistic and geographical distances between the sub-dialects of the Banjar Sundanese dialect can be seen in Fig. 6, Fig. 7, Fig. 8, and Fig. 9. These figures show that geographical distance has no significant impact on the linguistic distances between sub-dialects. In Fig. 6, Banjar sub-district, which is geographically less than 6 km from Purwaharja sub-district, has a fairly high linguistic distance from it, namely 0.15. By contrast, Banjar and Pataruman sub-districts are 6.5 km apart (further than Banjar and Purwaharja), yet their linguistic distance is very small. Likewise, Fig. 7, Fig. 8, and Fig. 9 show conditions similar to those in Fig. 6: the geographical distances are not directly proportional to the linguistic distances between sub-dialects (see also Fig. 10). Fig. 10 reinforces the point that geographical distance is not directly proportional to the linguistic differences between the sub-dialects of the Banjar Sundanese dialect. This indicates that geographical distance is not a determining factor for the linguistic differences between sub-dialects.
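One simple way to quantify whether geographical and linguistic distances move together is a Pearson correlation over the pairs of sub-districts. The Python sketch below is only an illustration of that idea and is not the procedure used in this study; the function name is hypothetical. Among the numbers in the example, only 6.5 km and 0.15 appear in the text, and every other value is a placeholder invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# One entry per pair of sub-districts, in the same order in both lists.
# Placeholder values: only 6.5 (km) and 0.15 are taken from the text.
geographic_km = [6.5, 5.8, 14.0, 8.0, 12.0, 7.0]
linguistic = [0.02, 0.15, 0.30, 0.10, 0.28, 0.12]

r = pearson_r(geographic_km, linguistic)
```

A coefficient near 1 would suggest that linguistic distance grows with geographical distance, and a coefficient near 0 would suggest little linear relationship. With four sites there are only six pairs, so any such coefficient should be read with caution.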
Based on the classification of sub-dialects and the calculation of the linguistic distances between them, the Banjar Sundanese dialect has three sub-dialect clusters: the Banjar and Pataruman sub-dialect, the Purwaharja sub-dialect, and the Langensari sub-dialect. The three sub-dialects are determined by the differences in linguistic distance between them. The fascinating feature of the language situation in Banjar City is that the sub-dialect classification is influenced only by linguistic distance, while the geographical distance between sub-dialects is not directly proportional to the linguistic differences.
The classification of the Banjar Sundanese dialect indicates that the linguistic differences between sub-dialects are determined mainly by the interaction between Sundanese and Javanese. The Langensari sub-dialect is heavily dominated by Javanese influence, which can be seen in its phonological and lexical aspects. Many Javanese words have been fully absorbed by the communities in Langensari sub-district, such as /gəni?/ 'fire', /ləmah/ 'land', /urip/ 'live', /gəgər/ 'back', and /baluŋ/ 'bone'.
The Banjar Sundanese dialect, especially the Purwaharja and Langensari sub-dialects, differs significantly from standard Sundanese. One cause of these differences, geographically and demographically, is the influence of Javanese, since many Javanese have settled in parts of Banjar City. Historically, there has been a migration of ethnic Javanese to the Banjar City area, and this migration has influenced the cultural and linguistic circumstances there. It is therefore understandable that the Banjar Sundanese dialect differs from standard Sundanese.
The variation in the Banjar Sundanese dialect shows significant differences from standard Sundanese. These variations can be viewed from the phonological and lexical aspects. Phonologically, there are linguistic phenomena in the form of phonological correspondences. Lexically, many Javanese words have been fully absorbed into Sundanese.
The dialectal variations in the phonological aspects of the Banjar Sundanese dialect are the correspondence of Ø ~ = h / #-; the correspondence of w ~ = b / #-; the correspondence of i ~ = ε / #-; the correspondence of Ø ~ = k / #-; and the correspondence of o ~ = u / #-. Several Banjar Sundanese dialect vocabulary items have the correspondence of Ø ~ = h / #-. Some of them are shown in Table I. The correspondence of Ø ~ = h / #- in the Banjar Sundanese dialect occurs in various positions: as the initial, medial, or final phoneme.
1) The correspondence of w ~ = b / #-
In addition to the correspondence of Ø ~ = h / #-, the Banjar Sundanese dialect also shows the correspondence of w ~ = b / #-. Some of the vocabulary items showing this correspondence are given in Table II.

2) The correspondence of i ~ = ε / #-
Another phonological phenomenon in the Banjar Sundanese dialect is the correspondence of i ~ = ε / #-. This correspondence can be seen in Table III.

3) The correspondence of Ø ~ = k / #-
There is also the correspondence of Ø ~ = k / #-. This phonological phenomenon occurs only in the final phoneme of a word. Here are some vocabulary items that show the correspondence of Ø ~ = k / #-.

4) The correspondence of o ~ = u / #-
The last phonological phenomenon in the Banjar Sundanese dialect is the correspondence of o ~ = u / #-. Only one vocabulary item shows this correspondence; it is given in Table V. In addition to these phonological phenomena, the influence of Javanese on the Banjar Sundanese dialect also appears in the lexicon, and such lexical phenomena are relatively frequent.
The full absorption of Javanese lexical items into the Banjar Sundanese dialect occurs in several word classes, namely nouns, verbs, and adjectives. Some of the Banjar Sundanese dialect vocabulary items that have been fully absorbed from Javanese are shown in Table VI.

IV. CONCLUSION
Based on the results of the linguistic distance calculation, the Banjar Sundanese dialect can be classified into three sub-dialects: the standard Sundanese sub-dialect, the Java-influenced sub-dialect, and the Java-dominated sub-dialect.
Geographical distance is not a determining factor for the linguistic differences between sub-dialects. The three sub-dialects are determined by the differences in linguistic distance between them. The Banjar Sundanese dialect, especially the Purwaharja and Langensari sub-dialects, differs significantly from standard Sundanese. One cause of these differences, geographically and demographically, is the influence of Javanese, since many Javanese have settled in parts of Banjar City.