Forensic Investigation of MP3 Audio Recordings

Special aspects of the technical investigation of MP3 recordings are addressed. The following features of the formation and examination of MP3 phonograms are explained: traces of MP3 coding in the time and spectral domains, special aspects of MP3 file structure analysis, methods for detecting re-coding of MP3 recordings, and methods for group identification of MP3 recorders and MP3 codecs. MP3 coding leaves certain traces of its use. Because of the psychoacoustic model, inaudible spectral components are deleted from the signal spectrum. Traces of psychoacoustic codec use are also clearly visible on a dynamic spectrogram as rectangular areas of zero spectral amplitude. The methods discussed in this paper enable the investigating expert to detect the exact position of an MP3 frame in the signal from its properties, even without any information from the file header. By investigating the periodicity of the extracted frame positions, this method reveals the coding itself, multiple coding and also audio editing. The MP3 file format specifies the structure of the frame header, which provides a powerful instrument for detecting the periodicity of any peculiarity of MP3 frames. A tool based on this approach reveals disorder in the MP3 frame sequence caused by editing in the "digital" domain, such as manual deletion of audio data with a HEX editor.


Introduction
One of the key requirements for an audio recording used as case evidence is its authenticity. Authenticity analysis of speech recordings is a typical task of forensic audio examination. Even a negative result of a recording's integrity check provides a basis for further investigation.
Nowadays the majority of speech recordings are made with digital audio recorders, and the volume of recorded media keeps growing, just as in all other aspects of human life. Audio file formats differ from device to device and support different coding and compression algorithms.
One of the most widespread audio file formats is MP3. The popularity of this format historically rests on the psychoacoustic approach: the codec need not preserve audio components that, in a particular sound environment, cannot be heard by the human ear at all (frequency and time masking effects), so these components can be skipped during compression. Thus, ceteris paribus, the listener cannot hear the difference between the original and the compressed data.
The family of audio codecs (audio coding algorithms) that is now called MP3 was developed at the end of the 20th century by the Moving Picture Experts Group (MPEG). This family includes the codec standards MPEG-1 and MPEG-2, with three layers (I, II and III), and the MPEG-2.5 extension. The most widespread is Layer III, i.e. the MP3 format with the MP3 file extension. Because the source code of the coding engine is open, manufacturers of digital recorders continue to develop a number of modifications, some even under new names.
This article is dedicated to revealing the traces of MP3 coding that can help in the authenticity analysis of evidential audio recorded (or received by an investigator) in MP3 file format, in other words, the traces that can testify to the integrity and authenticity of the audio. The article is based on real case studies and develops the methods reported in [1].
Nowadays there are many scientific articles on forensic audio authentication. There are review articles such as [2][3][4], as well as papers dedicated to particular methods of analysis. For example, there is a set of articles dedicated to ENF analysis, such as [5][6][7][8][9], and work focused on background noise, such as [10]. Acoustic environment analysis [11] and microphone identification [12] are also well studied. Papers such as [13,14] are dedicated to the investigation of MP3 and focus on MDCT analysis.
In this article we assume that the recording device used to create the audio is also submitted for examination, although in practice this is often not the case.
This article captures those traces and aspects of MP3 recording authenticity analysis that were detected and extracted during several years of expert practice, and aims to serve as a source for further practical work.

Research
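During the research, traces of MP3 coding were divided into seven groups, according to their influence on different aspects of audio recording and coding:
1. Text content of MP3 file headers.
2. Traces of MP3 coding in the time domain.
3. Traces of MP3 coding in the spectral domain.
4. MPEG frame header analysis.
5. Frame allocation map of MP3 files.
6. Stereo modes of MP3 recording.
7. Frame offsets check method.
The following sections describe each of these seven types of traces.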

Text content of MP3 file headers
The information in the MP3 file header can and should be used for authenticity analysis, because it contains metadata filled in by the device's coder, among other features. Whether the file header structure and the content of the header fields match between the evidential recording and the test recordings is an important fact that should be stated in the expert's report.
The list of header fields and their possible contents are well known [2]. The most interesting ones are the following.
In some audio recordings created by Sony voice recorders, the "TENC" block can contain "SONY IC RECORDER MP3…"; in Panasonic recordings it can contain "PanaICR", the device's model and ID, and so on.
These should be checked and compared to those in test recordings.
Besides, information about the LAME codec (name and version) can be located inside the data field of MP3 files.
Revealing such circumstances helps to prove that the header structure and content of the evidential audio file match those of the test files recorded with the device used to create the evidence. Differences in these features indicate that the recording was re-coded and/or edited.
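The header comparison described above can be partly automated. The following is a minimal Python sketch, assuming an ID3v2.3 or ID3v2.4 tag at the very start of the file; it ignores unsynchronisation, extended headers and per-frame flags, and the function names are hypothetical. The LAME lookup is only a byte-pattern heuristic, not a parser of the LAME/Xing tag.

```python
import struct
from typing import Dict, Optional

# ID3v2 text encodings (first byte of a text frame body).
TEXT_ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def syncsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte ID3v2 'syncsafe' integer (7 significant bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def read_id3v2_text_frames(data: bytes) -> Dict[str, str]:
    """Return {frame_id: text} for text frames (IDs starting with 'T'), e.g. TENC."""
    if len(data) < 10 or data[:3] != b"ID3":
        return {}
    major_version = data[3]
    end = min(10 + syncsafe_to_int(data[6:10]), len(data))
    pos, frames = 10, {}
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":  # start of tag padding
            break
        raw_size = data[pos + 4:pos + 8]
        # Frame sizes are plain big-endian in v2.3 but syncsafe in v2.4.
        if major_version == 4:
            size = syncsafe_to_int(raw_size)
        else:
            size = struct.unpack(">I", raw_size)[0]
        body = data[pos + 10:pos + 10 + size]
        if frame_id.startswith(b"T") and body:
            encoding = TEXT_ENCODINGS.get(body[0], "latin-1")
            text = body[1:].decode(encoding, errors="replace").rstrip("\x00")
            frames[frame_id.decode("latin-1")] = text
        pos += 10 + size
    return frames


def find_lame_string(data: bytes, max_len: int = 20) -> Optional[str]:
    """Return the printable ASCII run starting at the first b'LAME', if any."""
    idx = data.find(b"LAME")
    if idx < 0:
        return None
    run = bytearray()
    for byte in data[idx:idx + max_len]:
        if not 32 <= byte < 127:
            break
        run.append(byte)
    return run.decode("ascii")
```

Comparing an evidential file with test files from the same device then reduces to comparing the returned dictionaries and strings, field by field.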

Traces of MP3 coding in the time domain (detecting traces of repeated or "double" MP3 coding)
When an original MP3 recording has been converted and edited (with any sound editing software), it has to be converted back to the original file format, which leads to "repeated" or "double" conversion.
Previous research in this area showed good results only when the final bit rate was greater than the initial one. If the expert has access to the recording device, practice shows that double conversion can be detected in the time domain.
The use of some MP3 codecs leads to the appearance of a short fragment (usually shorter than 1 second) that does not reflect real audio events. The amplitude of this fragment (the Starting Coder fragment, SC-fragment) is quite low (sometimes it even contains consecutive zero samples), so its border is clearly visible owing to the rise in amplitude and the appearance of the actual audio environment in the signal.
The individual features of the SC-fragment (duration, dynamic amplitude characteristics) differ between codecs and even between different modes of the same codec. Every subsequent coding adds a new SC-fragment to the signal. Thus, having a test recording from the investigated device, the expert can compare the SC-fragment features in the test recording and in the evidential recording.
The following experiment reveals traces of double coding: a Microsoft PCM file was converted to MP3, decoded back to WAV and converted to MP3 again. The result of this experiment with the LAME 3.98.4 codec is shown in Figure 1.
The duration of the SC-fragment for the LAME 3.98.4 codec (11025 Hz sampling rate) is 1105 samples, i.e. about 0.1 s. The right border of the SC-fragment is clearly visible and can easily be detected by careful visual waveform analysis (Figure 2). The SC-fragments of some digital recorders supporting the MP3 file format are highly individual (Figure 3). This fact can be used to identify the recorder and, by comparison with test recordings, to prove the authenticity of the recording in question.
Thus, the SC-fragment can be used for authenticity analysis: the duration and dynamic amplitude shape of the SC-fragment in the test and evidential recordings must match to prove the integrity and authenticity of the recorded audio information. A doubled SC-fragment duration corresponds to double coding, which must have a plausible explanation.
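The text relies on careful visual waveform analysis; a candidate SC-fragment border for that inspection can be proposed automatically. The Python sketch below is a heuristic of our own, not the authors' procedure: it assumes the first `ref_len` samples lie inside the SC-fragment, and the thresholds `rise_ratio` and `floor` are arbitrary illustrative values.

```python
from typing import Optional, Sequence


def sc_fragment_end(samples: Sequence[float], ref_len: int = 256,
                    rise_ratio: float = 10.0, floor: float = 1e-4) -> Optional[int]:
    """Estimate the index of the first sample after the low-level SC-fragment.

    Heuristic: the peak level of the first `ref_len` samples is taken as the
    SC-fragment level (at least `floor`, to cope with digital silence); the
    border is the first sample whose magnitude exceeds `rise_ratio` times it.
    `ref_len` must be shorter than the expected SC-fragment.
    """
    if len(samples) <= ref_len:
        return None
    reference = max(max(abs(x) for x in samples[:ref_len]), floor)
    threshold = rise_ratio * reference
    for index, value in enumerate(samples):
        if abs(value) > threshold:
            return index
    return None
```

Dividing the result by the sampling rate gives the duration (1105 samples at 11025 Hz is about 0.1 s). A value roughly twice that measured on test recordings from the same device would point to double coding, but the border must still be confirmed visually.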

Traces of MP3 coding in spectral domain
Average spectra of investigated recordings are traditionally used to reveal traces of resampling or re-coding of the signal. In addition, the average spectrum carries information about the frequency response of the recording channel and can be used for device identification. When these characteristics are analysed in MP3 recordings, the traces of the codec in the spectral domain must be taken into account, because they produce features similar to those mentioned above.
Figure 4 shows the average spectrum of a recording after it was coded with MPEG-1 Layer II. The spectrum has two roll-offs (at 8 000 Hz and 16 000 Hz) that are typical of resampling traces. In this case, however, these characteristics originate from the codec's frequency response limitations. The dynamic spectrogram (Figure 5) clearly resolves different fragments of the signal that were coded with different frequency response limitations, although the recording is continuous. These fragments contribute differently to the average spectrum.
Average spectra of different fragments of the same recording are shown in Figure 6. The blue curve represents a fragment where the speech signal is rather quiet (so the codec retains spectral components up to 15 000 Hz). The red curve corresponds to a fragment where the speech is comparatively loud (so "there is no reason" for the codec to keep the upper components). Thus, a codec with the same settings can exhibit different frequency range limitations depending on the energy of different signal components.
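To compare fragments as in Figure 6, the upper border of the frequency range can be estimated per fragment. The Python sketch below is an illustration under our own assumptions: a Hann-windowed average magnitude spectrum computed with a naive DFT (to avoid dependencies; an FFT library would be used in practice), and a 60 dB range below the spectral peak as the arbitrary criterion for "still present".

```python
import cmath
import math
from typing import List, Sequence


def average_spectrum_db(samples: Sequence[float], frame: int = 256) -> List[float]:
    """Average magnitude spectrum (dB) over non-overlapping Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    n_bins = frame // 2 + 1
    n_frames = len(samples) // frame
    if n_frames == 0:
        raise ValueError("signal shorter than one analysis frame")
    twiddle = [[cmath.exp(-2j * math.pi * k * n / frame) for n in range(frame)]
               for k in range(n_bins)]
    totals = [0.0] * n_bins
    for f in range(n_frames):
        seg = [samples[f * frame + n] * window[n] for n in range(frame)]
        for k in range(n_bins):
            totals[k] += abs(sum(s * t for s, t in zip(seg, twiddle[k])))
    return [20 * math.log10(total / n_frames + 1e-12) for total in totals]


def upper_band_limit_hz(samples: Sequence[float], sample_rate: int,
                        frame: int = 256, range_db: float = 60.0) -> float:
    """Highest frequency whose average level lies within `range_db` of the peak."""
    spectrum = average_spectrum_db(samples, frame)
    threshold = max(spectrum) - range_db
    top_bin = max(k for k, level in enumerate(spectrum) if level >= threshold)
    return top_bin * sample_rate / frame
```

Applied separately to a quiet and a loud fragment of the same recording, the two estimates would differ if the codec behaves as described above; the resolution is limited to one bin (`sample_rate / frame`).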
The particular characteristics of the codec implemented in the recording device should be established during device examination. Test recordings must vary not only in technical settings (bit rate, sampling rate, "quality") but also in audio environment, to represent the codec's behavior in different acoustic conditions. The audio environment of the test recordings (noise, room geometry, etc.) must match the situation of the investigated recording to represent the particularities of the codec's impact. Only after the codec's influence in the spectral domain has been established can the investigation proceed to "traditional" types of analysis and interpretation.

MPEG frame header analysis
Each MPEG frame contains a 4-byte header. The table below lists the MPEG frame header fields with descriptions and comments. Three frame header fields do not influence the coding itself but store information about the file, and can therefore provide additional information for the comparative analysis of evidential and test recordings: Private_Bit, Original_Bit and Copyright_Bit.
The values of these bits are set by the codec depending on the recording settings and should be used for file comparison. There are 8 possible combinations of these three bits, and the combination should match for the evidential and test recordings. It should be taken into consideration, of course, that these fields can be changed with some software or fixed during re-coding.
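Reading these bits requires splitting the 4-byte header into its fields. The Python sketch below follows the bit layout of the MPEG audio frame header, using the 11-bit sync word and 2-bit version ID under which MPEG-2.5 is also recognised; the function names and the returned dictionary are our own illustrative choices.

```python
from typing import Dict

VERSIONS = {0: "MPEG-2.5", 2: "MPEG-2", 3: "MPEG-1"}  # 1 is reserved
LAYERS = {1: "Layer III", 2: "Layer II", 3: "Layer I"}  # 0 is reserved
MODES = {0: "Stereo", 1: "Joint stereo", 2: "Dual channel", 3: "Single channel"}


def parse_frame_header(header: bytes) -> Dict[str, object]:
    """Split a 4-byte MPEG audio frame header into its bit fields."""
    if len(header) != 4:
        raise ValueError("an MPEG audio frame header is exactly 4 bytes")
    h = int.from_bytes(header, "big")
    if (h >> 21) & 0x7FF != 0x7FF:
        raise ValueError("no frame sync")
    return {
        "version": VERSIONS.get((h >> 19) & 0x3, "reserved"),
        "layer": LAYERS.get((h >> 17) & 0x3, "reserved"),
        "protection_bit": (h >> 16) & 0x1,
        "bit_rate_index": (h >> 12) & 0xF,
        "sampling_frequency_index": (h >> 10) & 0x3,
        "padding_bit": (h >> 9) & 0x1,
        "private_bit": (h >> 8) & 0x1,
        "mode": MODES[(h >> 6) & 0x3],
        "mode_extension": (h >> 4) & 0x3,
        "copyright_bit": (h >> 3) & 0x1,
        "original_bit": (h >> 2) & 0x1,
        "emphasis": h & 0x3,
    }


def file_info_bits(fields: Dict[str, object]) -> tuple:
    """(private, original, copyright): one of the 2**3 = 8 possible combinations."""
    return (fields["private_bit"], fields["original_bit"], fields["copyright_bit"])
```

For example, the common header bytes `FF FB 90 64` decode as MPEG-1 Layer III, joint stereo, with the combination (private, original, copyright) = (0, 1, 0); the same triple should then be found in the test recordings.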

Frame allocation map of MP3 files
In MP3 files with sampling rates of 11025, 22050 and 44100 Hz, frames with and without Padding_Bit are queued in different sequences (with a period of 49 frames). The sequences can be clearly seen on a frame allocation map. Figure 7 shows frame allocation maps of MP3 files with a 44100 Hz sampling rate and different bit rates (smaller frames are marked dark grey). The files were created with the Mp3Pro codec.
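The periodicity of 49 frames follows from the frame size. For a 64 kbps bit rate and a 44100 Hz sampling rate, the mean frame size in bytes is
$$ L_{44100} = \frac{1152 \cdot 64\,000}{8 \cdot 44\,100} = \frac{10\,240}{49} \approx 208.98, $$
where 1152 is the frame shift in samples for MPEG-1 Layer III, 64 000 is the bit rate (b/s), 44100 is the sampling rate (Hz), and the factor 8 converts bits to bytes. Because the denominator is 49, the fractional part of the frame size repeats every 49 frames, which together contain exactly 10 240 bytes.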
The possibility of varying the frame size by 1 byte (the padding slot) was introduced to keep the bit rate of the audio stream constant. Thus, in a sequence of 49 frames, 48 frames are 209 bytes long and 1 frame is 208 bytes long.
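For 64 000 b/s at 22050 Hz (MPEG-2) and 11025 Hz (MPEG-2.5), where a Layer III frame holds 576 samples, the frame sizes are
$$ L_{22050} = \frac{576 \cdot 64\,000}{8 \cdot 22\,050} = \frac{10\,240}{49} \approx 208.98, \qquad L_{11025} = \frac{576 \cdot 64\,000}{8 \cdot 11\,025} = \frac{20\,480}{49} \approx 417.96. $$
The frame sizes for 22050 and 44100 Hz coincide because an MPEG-1 frame contains twice as many samples as an MPEG-2 frame (1152 versus 576), while its sampling rate is also twice as high.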
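The arithmetic above can be checked with exact fractions. In the Python sketch below (function names are illustrative), the denominator of the reduced mean frame size is the period of the padding pattern, and the remainder is the number of padded frames per period.

```python
from fractions import Fraction

# Samples per Layer III frame: two granules in MPEG-1, one in MPEG-2 / MPEG-2.5.
SAMPLES_PER_FRAME = {"MPEG-1": 1152, "MPEG-2": 576, "MPEG-2.5": 576}


def mean_frame_size(version: str, bit_rate: int, sample_rate: int) -> Fraction:
    """Exact mean Layer III frame size in bytes: samples * bit_rate / (8 * sample_rate)."""
    return Fraction(SAMPLES_PER_FRAME[version] * bit_rate, 8 * sample_rate)


def padding_cycle(version: str, bit_rate: int, sample_rate: int):
    """Return (period, padded_frames, base_size) of the padding pattern.

    Over `period` frames, `padded_frames` frames are base_size + 1 bytes long
    and the remaining frames are base_size bytes long.
    """
    size = mean_frame_size(version, bit_rate, sample_rate)
    base = size.numerator // size.denominator
    return size.denominator, size.numerator % size.denominator, base
```

For instance, `padding_cycle("MPEG-1", 64000, 44100)` yields `(49, 48, 208)`, matching the 48 frames of 209 bytes and 1 frame of 208 bytes stated above, while at 48000 Hz the period is 1 and no padding occurs.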
The frame size and the analysis window size need not coincide. In MPEG-1 Layer III, a frame contains information about two coding windows ("granules" in the standard's terminology); in MPEG-2 and MPEG-2.5 Layer III, a frame contains information about only one window.
Thus, for a 44100 Hz sampling rate and a 64 kbps bit rate, a 49-frame sequence should contain 48 frames with the additional byte and 1 frame without it (Figure 8). It is important to mention that a frame allocation map can represent any other feature that differs between frames. So the frame allocation map itself is a powerful tool for investigating any frame-based codec (see, e.g., section 2.6).
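A frame allocation map can be built by walking the file frame by frame. The Python sketch below is a simplified illustration, assuming Layer III data that starts at the first audio frame (ID3 tag already stripped) and constant or variable bit rate without free format; it records the padding bit and the mode extension of each frame, which are the two per-frame features discussed in this paper. Names are hypothetical.

```python
from typing import List, Tuple

# Layer III bit rates in kbps, indexed by bit_rate_index (0 = free format, 15 = invalid).
BITRATES_KBPS = {
    3: [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],  # MPEG-1
    2: [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],      # MPEG-2
    0: [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],      # MPEG-2.5
}
# Sampling rates in Hz, keyed by the 2-bit version ID, indexed by sampling_frequency.
SAMPLE_RATES = {3: [44100, 48000, 32000], 2: [22050, 24000, 16000], 0: [11025, 12000, 8000]}


def frame_allocation_map(data: bytes) -> List[Tuple[int, int, int]]:
    """Walk consecutive Layer III frames; return (offset, padding_bit, mode_extension)."""
    frames = []
    pos = 0
    while pos + 4 <= len(data):
        h = int.from_bytes(data[pos:pos + 4], "big")
        version, layer = (h >> 19) & 0x3, (h >> 17) & 0x3
        br_index, sr_index = (h >> 12) & 0xF, (h >> 10) & 0x3
        if ((h >> 21) & 0x7FF != 0x7FF or version not in SAMPLE_RATES
                or layer != 1 or not 0 < br_index < 15 or sr_index == 3):
            pos += 1  # not a valid Layer III header here: resynchronise
            continue
        padding = (h >> 9) & 0x1
        slots = 144 if version == 3 else 72  # samples per frame / 8
        bit_rate = BITRATES_KBPS[version][br_index] * 1000
        length = slots * bit_rate // SAMPLE_RATES[version][sr_index] + padding
        frames.append((pos, padding, (h >> 4) & 0x3))
        pos += length
    return frames
```

Plotting the padding column (or the mode extension column) as light and dark cells, one row per period, gives a map of the kind shown in the figures. The byte-by-byte resynchronisation is naive and can lock onto false sync words in damaged data.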
The frame sequence can be described by a formula: if N denotes a frame without Padding_Bit and P a frame with it, the sequences for different coders can be expressed as 48P + N for PanaICR, 24P + N + 24P for Mp3Pro, and N + 48P for LAME 3.98. The PN-formulas of the same codec must match.
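A PN-formula is simply a run-length encoding of one period of padding flags. The Python sketch below derives it from a list of padding bits (such as the padding column of a frame allocation map). It also includes a hypothetical padding rule, padding whenever the accumulated fractional byte reaches one; this rule is our assumption, not the documented behavior of any particular encoder, although for 64 kbps and 44100 Hz it happens to produce the N + 48P pattern listed above for LAME 3.98.

```python
from itertools import groupby
from typing import List


def pn_formula(padding_bits: List[int]) -> str:
    """Run-length encode one period of padding flags (1 = P, 0 = N) as a PN-formula."""
    terms = []
    for bit, run in groupby(padding_bits):
        count = len(list(run))
        letter = "P" if bit else "N"
        terms.append(letter if count == 1 else f"{count}{letter}")
    return " + ".join(terms)


def accumulator_padding(n_frames: int, bit_rate: int = 64000, sample_rate: int = 44100,
                        samples_per_frame: int = 1152) -> List[int]:
    """Hypothetical encoder rule: pad whenever the accumulated fractional byte reaches 1.

    Works in integer units of 1 / (8 * sample_rate) byte to avoid rounding errors.
    """
    total = samples_per_frame * bit_rate   # exact frame size * 8 * sample_rate
    denominator = 8 * sample_rate
    remainder = total % denominator        # fractional part of the frame size
    accumulator, bits = 0, []
    for _ in range(n_frames):
        accumulator += remainder
        if accumulator >= denominator:
            accumulator -= denominator
            bits.append(1)
        else:
            bits.append(0)
    return bits
```

Encoders that start the accumulator at a different phase would yield rotated patterns such as 48P + N or 24P + N + 24P, which is why the formula can discriminate between codecs.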
Thus, the PN-formula and the frame allocation map can be used to identify the codecs used to create MP3 recordings. During authenticity analysis of an MP3 recording, it is important to compare the PN-formulas and frame allocation maps of the evidential and test recordings. These features must correspond to the circumstances of the case: the PN-formula and frame allocation map of LAME 3.98 (used mostly for coding on a PC) differ from those implemented in digital recorders.

Stereo modes of MP3 recording
There are four main stereo modes in MP3 codecs:
• Mono: a one-channel signal;
• Dual channel: the two channels are coded independently with the same bit rate;
• Stereo: the signals of the two channels are coded with different bit rates, but the sum of the bit rates is constant;
• J-Stereo: the signals of the two channels are coded jointly, with two extensions:
  o MS Stereo (mid/side);
  o Intensity stereo.
In MS Stereo, the stereo signal is represented by a sum ("mid") signal, proportional to (L + R), and a difference ("side") signal, proportional to (L − R). The bit rate for the sum signal is greater than for the difference signal, so the same overall bit rate provides better coding quality for fragments in which the left and right channels are in phase.
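The mid/side transform itself is a simple rotation of the channel pair. The Python sketch below assumes the 1/√2 normalisation; other scalings differ only by a constant factor. It shows why in-phase, identical channels leave a zero side signal, which is what makes MS coding efficient for such fragments.

```python
import math
from typing import List, Sequence, Tuple

INV_SQRT2 = 1 / math.sqrt(2)


def ms_encode(left: Sequence[float], right: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Mid (sum) and side (difference) signals, normalised by 1/sqrt(2)."""
    mid = [(l + r) * INV_SQRT2 for l, r in zip(left, right)]
    side = [(l - r) * INV_SQRT2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid: Sequence[float], side: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Inverse rotation: recover left and right from mid and side."""
    left = [(m + s) * INV_SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) * INV_SQRT2 for m, s in zip(mid, side)]
    return left, right
```

Because the transform is orthogonal, decoding the encoded pair reproduces the original channels up to floating-point rounding.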
In Intensity stereo mode, only the sum signal and the intensity differences between the channels in frequency bands are coded, so the volume of processed data decreases. Intensity stereo mode is usually used at low bit rates.
In J-Stereo, each frame can have its own mode extension, depending on the parameters of the signal. Figure 9 shows a frame allocation map in which dark regions correspond to frames coded with the MS Stereo extension and light regions to frames coded without it.
The stereo coding mode can be significant for authenticity analysis. A mismatch between the stereo coding mode listed in the frame header and the one reflected on the frame allocation map can indicate at least general editing or re-coding. The stereo coding mode can be clarified by subtracting the channels:
• if the left and right channels are equal, the difference between them is zero;
• the difference of Dual channel or Stereo coded signals does not have any peculiarities;
• the difference of a J-Stereo coded signal contains artifacts, as shown in Figure 10.
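The channel subtraction can be prepared as follows. This Python sketch (names and the returned fields are our own) only produces the difference signal and a summary level; it does not recognise the J-Stereo artifacts of Figure 10 automatically, which remains a task for the expert's inspection of the difference waveform and spectrogram.

```python
import math
from typing import Dict, Sequence


def rms(values: Sequence[float]) -> float:
    """Root-mean-square level of a sequence (0.0 for an empty one)."""
    return math.sqrt(sum(v * v for v in values) / len(values)) if values else 0.0


def channel_difference_report(left: Sequence[float], right: Sequence[float],
                              eps: float = 1e-12) -> Dict[str, object]:
    """Subtract the channels and summarise the difference signal for visual follow-up."""
    difference = [l - r for l, r in zip(left, right)]
    diff_rms = rms(difference)
    signal_rms = max(rms(left), rms(right), eps)
    return {
        "difference": difference,  # inspect as waveform / spectrogram
        "identical_channels": diff_rms <= eps,
        "difference_to_signal_db": 20 * math.log10(max(diff_rms, eps) / signal_rms),
    }
```

An `identical_channels` result of `True` corresponds to the first case listed above; otherwise the `difference` signal is what should be examined for codec artifacts.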
During the investigation of an MP3 recording, attention should be paid to whether the stereo coding mode listed in the file header corresponds to its features reflected in the signal and in the frame allocation maps. For example:
• "Stereo" or "Dual channel" is listed, but the channel difference shows traces of J-Stereo.
Such cases indicate either re-coding or a recorder feature, which should be established during device analysis.
Thus, the investigation of stereo MP3 recordings should include analysis of the stereo recording mode and its properties, which can be traced in the signals themselves or in their frame allocation map.

Frame Offsets Check Method
The psychoacoustic codec approach is based on the features of human hearing: the codec deletes spectral components that cannot be heard due to frequency masking. In its workflow, the coder calculates the signal spectrum on a frame of 1152 samples (frame shift = 1152 / 2 = 576 samples); spectral components that the human hearing system cannot perceive are deleted (set to zero), and the remaining spectral data are stored in a memory block. These stored frames are used to reconstruct the decoded signal (during playback and waveform visualization).
If a test frame of 1152 samples is shifted along the signal one sample at a time and the number of zeros in its MDCT spectrum is counted at each position, the result is a graph of the number of MDCT zeros versus time.
Sharp peaks in the graph reveal the positions of MP3 frames; the flat average level corresponds to test frame positions between frames. The behavior of the graph reflects the periodicity of the frame positions and reveals editing points.
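The zero-count graph can be illustrated with a plain sine-windowed MDCT (window 2M, hop M; M = 576 gives the 1152-sample frame of the text). This Python sketch is a simplification under stated assumptions: it does not model the hybrid polyphase/MDCT filterbank or the window switching of a real MP3 codec, the `eps` tolerance for "zero" is arbitrary, and the O(M²)-per-position loop is for illustration only (a practical tool would use an FFT-based MDCT). The `synthesize` helper, which builds a test signal from sparse coefficient blocks, is likewise hypothetical.

```python
import math
from typing import List, Sequence


def mlt_basis(M: int) -> List[List[float]]:
    """Orthonormal sine-windowed MDCT basis: M functions of length 2M, hop M."""
    window = [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]
    scale = math.sqrt(2.0 / M)
    return [[scale * window[n] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
             for n in range(2 * M)] for k in range(M)]


def zero_count_curve(signal: Sequence[float], M: int = 576, eps: float = 1e-9) -> List[int]:
    """Number of (near-)zero MDCT coefficients for a 2M window moved one sample at a time."""
    basis = mlt_basis(M)
    counts = []
    for start in range(len(signal) - 2 * M + 1):
        segment = signal[start:start + 2 * M]
        coeffs = (sum(b * x for b, x in zip(row, segment)) for row in basis)
        counts.append(sum(1 for c in coeffs if abs(c) < eps))
    return counts


def synthesize(blocks: Sequence[Sequence[float]], M: int) -> List[float]:
    """Overlap-add synthesis from MDCT coefficient blocks placed every M samples."""
    basis = mlt_basis(M)
    out = [0.0] * ((len(blocks) + 1) * M)
    for q, coeffs in enumerate(blocks):
        for k, c in enumerate(coeffs):
            for n in range(2 * M):
                out[q * M + n] += c * basis[k][n]
    return out
```

With a small M (for example 8) and half of each block's coefficients set to zero, as if removed by a psychoacoustic model, the count reaches its maximum only at offsets 0, M, 2M, ..., reproducing in miniature the sharp, regularly spaced peaks described above.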
Traces of frame coding of the signal (MP3): the distance of 576 samples between peaks corresponds to the 50 % frame shift during coding.
Traces of editing: the audio was created from fragments of different MP3 files. The described procedure makes it possible to calculate the periodic shift between the positions of two consecutive frames (Figure 11). Using this method, the expert can solve various tasks:
• detection of MP3 codec traces;
• detection of the original sampling rate;
• detection of double coding (with the same or another codec).
Most importantly, the method detects editing traces introduced after the signal was decoded [115,126].
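Once frame positions have been extracted from the zero-count graph, editing points show up as breaks in their regular spacing. The Python sketch below assumes that peaks are picked by a simple threshold and that a gap which is not a whole multiple of the 576-sample hop marks a splice; both the threshold and the `tolerance` parameter are illustrative assumptions.

```python
from typing import List, Sequence


def frame_positions(zero_counts: Sequence[int], min_count: int) -> List[int]:
    """Offsets whose MDCT zero count reaches `min_count` (candidate frame positions)."""
    return [i for i, c in enumerate(zero_counts) if c >= min_count]


def find_editing_points(positions: Sequence[int], hop: int = 576,
                        tolerance: int = 0) -> List[int]:
    """Positions whose spacing from the previous frame is not a multiple of `hop`.

    A gap of several hops (a frame missed by the detector) is accepted; a gap
    that breaks the hop grid suggests a splice of differently aligned material.
    """
    breaks = []
    for previous, current in zip(positions, positions[1:]):
        offset = (current - previous) % hop
        if min(offset, hop - offset) > tolerance:
            breaks.append(current)
    return breaks
```

For example, frame positions 0, 576, 1152, 1700, 2276 would flag 1700 as a candidate editing point, because the gap of 548 samples does not fit the 576-sample grid.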

Conclusions
The suggested investigation methods enhance the technical analysis of MP3 recordings and enable more efficient use of contemporary methods and means of forensic audio examination.
During the investigation, the expert can determine the characteristics and properties of MP3 files and whether they correspond to those of test files created by the device used to record the audio evidence.
To ensure the completeness of the examination, the following features, properties and/or characteristics should be carefully analysed:
• text content of the MP3 file header fields;
• presence and duration of the Starting Coder fragment;
• average spectrum and changes of the upper border of the frequency range depending on the audio environment;
• frame sequence, PN-formula and frame allocation map;
• stereo coding mode.
The research reflected in this paper broadens the methodological framework and the technical and scientific base used for forensic audio authenticity analysis, and corresponds to the current state of the art and experience in the field of audio forensics.

Table. MPEG frame header structure
No. | Bits | Field | Description
4 | 1 | protection_bit | Indicates whether redundancy has been added to the audio bitstream to facilitate error detection and concealment. Equals '1' if no redundancy has been added, '0' if redundancy has been added.
5 | 4 | bit_rate_index | Indicates the bit rate. The all-zero value indicates the 'free format' condition, in which a fixed bit rate that does not need to be in the list can be used. Fixed means that a frame contains either N or N+1 slots, depending on the value of the padding bit. The bit_rate_index is an index into a table, which is different for the different layers.
6 | 2 | sampling_frequency | Indicates the sampling frequency according to a code table given in the standard.
7 | 1 | padding_bit | If this bit equals '1', the frame contains an additional slot to adjust the mean bit rate to the sampling frequency; otherwise this bit is '0'. Padding is only necessary with a sampling frequency of 44.1 kHz. For MPEG Layer III this bit is used for sampling frequencies of 11025, 22050 and 44100 Hz (it can be '0' or '1'); for all other sampling frequencies it is '0'.
8 | 1 | private_bit | Bit for private use; not used for coding.