Forensic Linguistic Inquiry into the Validity of F0 as Discriminatory Potential in the System of Forensic Speaker Verification


Introduction
In Indonesia, some court cases involve speech recordings of suspects as legal evidence. In court, an expert is invited to verify whether the speech in the recordings was spoken by the suspect. This task is called Forensic Speaker Verification (FSV). It is one of the application areas of Forensic Linguistics, namely the provision of linguistic evidence [1]. An FSV system includes an analysis of speech recordings to verify the voice of a criminal. In the system, fundamental frequency (F0) is one of the acoustic features extracted from the speech data [2]. These features are then analyzed for their discriminatory potential. It is important to note that the forensic relevance of human speech sounds always raises a question about their validity [3]. This critical question calls for continual review of any available system for forensic speaker verification or identification. The review can then be used to improve the system. In line with that, this paper aims to review the method in the Indonesian FSV system in terms of the extracted acoustic feature F0, which is used as the discriminatory potential.

Method
The data are derived from Indonesian speech sounds in two (2) telephone conversations: Speakers LR (f;21) and MR (m;23) in the first conversation, and Speakers DS (f;22) and RD (m;22) in the second. The data were recorded at the Centre for Studies in Linguistics, Bandar Lampung University. The conversations were designed as a simulation of a corruption case. The speech data are categorized as Unknown (Uk), following the scenario used in the Indonesian FSV system [4]. For the Known (K) category, twenty (20) words spoken by each speaker were recorded to be paired with the same words in the Uk sample. Praat [5] is used for the acoustic analysis of the K and Uk samples. One-way analysis of variance (ANOVA) and the Likelihood Ratio (LR) approach are used to evaluate the findings statistically.
The Indonesian FSV system compares a K sample with a Uk sample. In the K sample, it is already known who the speaker is. Meanwhile, the Uk sample is derived from the speech data of a recorded telephone conversation in which the speaker is not yet known. The main purpose of the comparison is to find out who the speaker in the recorded telephone conversation really is. The evaluation provides evidence as to whether or not the speaker is the suspected person. In presenting the evidence, the data analysis follows four main steps: pairing, tagging, acoustic feature extraction, and statistical analysis. In the analysis, F0, F1, and F2 are observed to find the patterns of habitual pitch range, minimum-maximum pitch, first-second formant, and speaking style for pitch and formant. As an evaluation of the current Indonesian FSV system, there is a claim that it "meets the demand for presenting legal evidence in Indonesian court" [6].
To review the method in the Indonesian FSV system, we scrutinize F0, which is used as the discriminatory potential, in the data.
For each speaker, we paired twenty (20) words in the Uk samples with the same words in the K samples (Tables 1 & 2). Therefore, the K and Uk samples together contain a total of one hundred sixty (160) words. Among the four speakers participating in the telephone conversations in the simulation of a corruption case, RD is treated as the suspect. Each word is then analyzed in terms of its acoustic feature, F0.
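The pairing step and the word count above can be sketched as a small data structure. This is only an illustration: the `TokenPair` class, the `build_pairs` helper, the identifier scheme, and the placeholder word labels are all hypothetical, and the sketch assumes one shared list of twenty words, whereas the study takes its target words from Tables 1 and 2.

```python
from dataclasses import dataclass

# Speaker codes are taken from the text; everything else is a placeholder.
SPEAKERS = ["LR", "MR", "DS", "RD"]
WORDS_PER_SPEAKER = 20


@dataclass(frozen=True)
class TokenPair:
    """One target word spoken by one speaker in both samples (hypothetical layout)."""
    speaker: str
    word: str
    known_id: str    # token in the Known (K) sample
    unknown_id: str  # token in the Unknown (Uk) sample


def build_pairs(speakers, words):
    """Pair every K token with the Uk token of the same word and speaker."""
    return [
        TokenPair(s, w, f"K/{s}/{w}", f"Uk/{s}/{w}")
        for s in speakers
        for w in words
    ]


# Placeholder labels standing in for the real target words.
words = [f"word_{i:02d}" for i in range(1, WORDS_PER_SPEAKER + 1)]
pairs = build_pairs(SPEAKERS, words)

# Each pair contributes one K token and one Uk token.
total_tokens = 2 * len(pairs)
print(len(pairs), total_tokens)
```

Four speakers with twenty word pairs each give eighty pairs, and counting both members of each pair gives the one hundred sixty words reported in the text.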
The following figures exemplify the acoustic features of the word rancangan 'design' spoken by Speaker RD in the K (Figure 1) and Uk samples (Figure 2). The analysis uses the default pitch setting: 75-500 Hz. F0 contours, as the physical correlates of the speaker's pitch, are represented by blue lines in the second window of the Praat display. The red contours, in the same window, represent the speaker's formant frequencies. Since the speech sounds are spoken by the same speaker, we presume that the pitch values of RD's speech in the K and Uk samples will match. However, in the pitch analysis of the mean and standard deviation (SD) of the 20 words spoken by RD in the K and Uk samples, only a few values match (Figure 3).
In the pitch analysis of minimum and maximum values, the maximum values in the Uk samples also do not match their K counterparts (Figure 4). Meanwhile, the minimum values in the K and Uk samples match only at several points. In addition, one-way analysis of variance (ANOVA) shows that the pitch of each word spoken by RD in the K and Uk samples is significantly different (p<0.05). RD's F0s are significantly different in the K and Uk samples (Figure 5).
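As an illustration of the one-way ANOVA comparison, the following minimal sketch computes the F statistic for two groups of F0 values. The numbers are invented for illustration and are not the study's measurements, the function name `one_way_anova_f` is hypothetical, and the text does not state exactly which F0 values entered the test. The critical value 5.32 is the standard tabulated value for F(1, 8) at the 0.05 level; a statistics package would normally supply the exact p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical F0 values in Hz, NOT taken from the study.
f0_known = [110.0, 115.0, 120.0, 118.0, 112.0]
f0_unknown = [140.0, 150.0, 145.0, 155.0, 148.0]

f_stat, df_b, df_w = one_way_anova_f(f0_known, f0_unknown)

# Tabulated critical value of F(1, 8) at alpha = 0.05.
F_CRIT_1_8 = 5.32
significant = f_stat > F_CRIT_1_8
print(f"F({df_b}, {df_w}) = {f_stat:.1f}, significant at 0.05: {significant}")
```

A significant result says only that the two sets of F0 values differ on average. It does not by itself say why they differ, which is the point the discussion of telephone transmission and pitch settings takes up.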

Journal of Forensic Sciences & Criminal Investigation
Further, for the evidence evaluation using the Likelihood Ratio (LR) approach [2], we analyze the probability of the samples. The result indicates that the pitch in the data can be categorized as 'very strong evidence against' the hypothesis that the K and Uk samples are derived from the same speaker (LR<0.0001).
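The idea behind a likelihood ratio can be sketched with a deliberately simple model. The text does not specify how its LR was computed, so everything here is an assumption: the same-speaker hypothesis is modelled as a normal distribution fitted to the suspect's K sample, the different-speaker hypothesis as a normal distribution over a reference population, and all numbers are hypothetical. Only the 0.0001 threshold comes from the text.

```python
from statistics import NormalDist


def likelihood_ratio(evidence_hz, same_speaker, other_speakers):
    """LR = p(evidence | same speaker) / p(evidence | different speaker)."""
    return same_speaker.pdf(evidence_hz) / other_speakers.pdf(evidence_hz)


# Hypothetical models, NOT estimated from the study's data.
suspect_model = NormalDist(mu=115.0, sigma=5.0)      # suspect's K sample
population_model = NormalDist(mu=130.0, sigma=25.0)  # reference population

# Hypothetical mean F0 (Hz) measured in the Uk sample.
uk_mean_f0 = 148.0
lr = likelihood_ratio(uk_mean_f0, suspect_model, population_model)

# Threshold quoted in the text for 'very strong evidence against'.
VERY_STRONG_AGAINST = 0.0001
print(f"LR = {lr:.3g}, very strong evidence against: {lr < VERY_STRONG_AGAINST}")
```

An LR above 1 supports the same-speaker hypothesis and an LR below 1 supports the different-speaker hypothesis. In this toy setup a Uk value far from the suspect's K distribution drives the ratio towards zero, even if the shift was caused by the recording channel rather than by a different speaker.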
From the results of the one-way ANOVA and the LR approach, it can be inferred that F0 cannot be used as a discriminatory potential in the experimental data. The ANOVA indicates that the pitch in the K and Uk samples is significantly different, and the LR also indicates that the sounds are derived from different speakers. On the contrary, they are from the same speaker, i.e. RD.
We highlight three main problems that may arise when fundamental frequency (F0) is used as the discriminatory potential, based on the experimental data following the Indonesian FSV system. The first problem concerns the default pitch range setting for analysing connected speech [7]. The F0 reading with the default setting may not show the actual value of the speaker's F0 (Figure 6). The second problem concerns telephone transmission [8]. The transmission could have effects [9], especially on vowel quality [10], that may result in discrepancies in the values of the speaker's F0. The third problem concerns the lack of theoretical background for the Indonesian FSV system, which uses F0 as one of its discriminatory potentials.
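The sensitivity of an F0 reading to the pitch range setting can be illustrated with a toy estimator. The sketch below is a simplified autocorrelation method applied to a synthetic signal. It is not Praat's implementation, and the function names, the 16 kHz sampling rate, the 120 Hz test tone, and the alternative 150-500 Hz range are all assumptions made for illustration.

```python
import math


def synth_voice(f0_hz, sample_rate, duration_s):
    """Synthetic 'voiced' signal: a fundamental plus a weaker second harmonic."""
    n = int(sample_rate * duration_s)
    return [
        math.sin(2 * math.pi * f0_hz * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * 2 * f0_hz * i / sample_rate)
        for i in range(n)
    ]


def estimate_f0(signal, sample_rate, floor_hz, ceiling_hz, window):
    """Pick the autocorrelation peak among lags allowed by the pitch range."""
    min_lag = int(sample_rate / ceiling_hz)  # shortest period considered
    max_lag = int(sample_rate / floor_hz)    # longest period considered
    if len(signal) < window + max_lag:
        raise ValueError("signal too short for this window and pitch floor")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Same number of products for every lag, so scores are comparable.
        score = sum(signal[i] * signal[i + lag] for i in range(window))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


SAMPLE_RATE = 16000
TRUE_F0 = 120.0  # hypothetical speaker F0 in Hz
signal = synth_voice(TRUE_F0, SAMPLE_RATE, 0.1)

# Range quoted in the text as the default setting (75-500 Hz).
f0_default = estimate_f0(signal, SAMPLE_RATE, 75.0, 500.0, window=800)
# A range whose floor lies above the true F0 cannot return the true value.
f0_narrow = estimate_f0(signal, SAMPLE_RATE, 150.0, 500.0, window=800)
print(f"default range: {f0_default:.1f} Hz, narrowed range: {f0_narrow:.1f} Hz")
```

In this toy case the default range recovers the synthetic F0, while the narrowed range reports a value well away from it. Real connected speech is far less regular than this signal, so the sketch shows only that the reading depends on the chosen range, not which range is appropriate for a given speaker.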

Conclusion
F0, as the physical correlate of a speaker's pitch, is analyzed to review the method in the Indonesian FSV system. In the experimental data, although the speech data are derived from the same speaker (RD), only a few values in the pitch analysis of mean and SD match between the K and Uk samples. Maximum and minimum pitch values show the same result. Furthermore, using one-way ANOVA and the LR approach, the study shows that F0 fails to provide evidence that the samples are derived from the same speaker. Therefore, it is suggested that further studies look at other strategies if F0 is still to be used in the Indonesian FSV system, e.g. using pitch alignment features [11], adjusting advanced pitch settings and framing sentences by using the intonation system [7], and considering the effect of pitch span on intonational plateau [12]. Highlighting some functional aspects of the conversational structure in spontaneous dialogue [13] is also necessary when obtaining the required K and Uk samples. Moreover, insights on phonological variation for discriminatory aspects in forensic speaker verification [14] and other related aspects in forensic phonetics [15,16] and forensic linguistics [17] are suggested to the system as theoretical background for providing linguistic evidence in legal settings.
The experimental study on the Indonesian FSV system leads us to propose a scenario for forensic speaker verification (Figure 7). In the system, K and Uk samples are paired for the same words. For tagging, syllables are derived from the paired words. From pairing to the end of tagging, a control is conducted to scrutinize the effects of telephone transmission. The process then moves on to acoustic feature extraction. From acoustic feature extraction to the end of statistical analysis, a filter is implemented to obtain high-quality performance. The filter concerns which acoustic features will be analyzed, what the theoretical background for the analysis is, and how reliability and validity can be achieved. Finally, the result is ready to be presented as legal evidence.

Figure 1: Acoustic features of RD's word rancangan 'design' in K sample.

Figure 2: Acoustic features of RD's word rancangan 'design' in Uk sample.

Figure 3: Mean pitch and its standard deviation (SD) of RD's speech in K and Uk samples.

Figure 4: Maximum and minimum pitch of RD's speech in K and Uk samples.

Figure 5: F0s of RD's speech rancangan in both K and Uk samples.

Figure 6: F0 reading in default pitch setting and the actual value of the speaker's F0.

Figure 7: Steps in forensic speaker verification system.

Table 1: Target words for K and Uk samples in telephone conversation 1.

Table 2: Target words for K and Uk samples in telephone conversation 2.