The CirCor DigiScope Dataset: From Murmur Detection to Murmur Classification

Cardiac auscultation is one of the most cost-effective techniques used to detect and identify many heart conditions. Computer-assisted decision systems based on auscultation can support physicians in their decisions. Unfortunately, the application of such systems in clinical trials is still minimal, since most of them only aim to detect the presence of extra or abnormal waves in the phonocardiogram signal, i.e., only a binary ground truth variable (normal vs. abnormal) is provided. This is mainly due to the lack of large publicly available datasets in which a more detailed description of such abnormal waves (e.g., cardiac murmurs) exists. To pave the way to more effective research on healthcare recommendation systems based on auscultation, our team has prepared the currently largest pediatric heart sound dataset. A total of 5282 recordings have been collected from the four main auscultation locations of 1568 patients; in the process, 215780 heart sounds have been manually annotated. Furthermore, and for the first time, each cardiac murmur has been manually annotated by an expert annotator according to its timing, shape, pitch, grading, and quality. In addition, the auscultation locations where the murmur is present were identified, as well as the auscultation location where the murmur is detected most intensely. Such a detailed description for a relatively large number of heart sounds may pave the way for new machine learning algorithms with real-world application to the detection and analysis of murmur waves for diagnostic purposes.


I. INTRODUCTION
Cardiovascular disease (CVD) is an umbrella term used to define a heterogeneous group of disorders of the heart and vessels, such as coronary artery disease (CAD), valvular heart disease (VHD), or congenital heart disease (CHD) [1].
As a whole, CVD is the major cause of death worldwide, accounting for 31% of all deaths globally [2]. In addition to mortality, CVD severely increases morbidity and causes lifelong disabilities, decreasing the quality of life and potentially increasing the frequency of hospital admissions, ultimately increasing the economic burden of CVD on healthcare systems and populations [3].
The majority of the populations in under-developed and developing countries do not have access to an integrated primary healthcare system. As a result, diagnosis and treatment of CVD can be delayed, potentially leading to early deaths. Furthermore, according to [4], CVD contributes to impoverishment due to health-related expenses and high out-of-pocket expenditure. This imposes an additional burden on the economies of low-to-middle-income countries. Although CVD incidences in the United States and Europe are high [5], [6], over 75% of CVD-related deaths are estimated to occur in low- and middle-income countries, suggesting a strong link with socio-economic status [7]. In contrast to developed countries, where CAD is more frequent, CHD and VHD (mainly of rheumatic etiology) are more prevalent in developing countries, primarily due to the lack of prenatal screening programs and access to healthcare. Over 68 million cases of rheumatic heart disease (RHD) are reported every year, resulting in 1.4 million deaths. RHD is also the single largest cause of hospital admissions for children and young adults, with observations showing a high prevalence of the disease in young adults [8].
Despite the development of advanced cardiac monitoring and imaging schemes, cardiac auscultation remains an important and cost-effective first-line screening tool. Cardiac auscultation provides insights into the mechanical activity of the heart. These activities generate sounds that are recorded and saved in the phonocardiogram (PCG) signal. Furthermore, cardiac auscultation is a key screening exam [9] and, when used properly, it can speed up treatment/referral, thus improving the patient's outcome and quality of life [10]. However, auscultation is a difficult skill to master, requiring long and hard training accompanied by continuous clinical experience. Moreover, the emergence of new imaging techniques and the significant decrease of valve diseases in developed countries have reduced the clinical application of cardiac auscultation [11]. As a result, auscultation has recently been neglected, and a great reduction of this clinical skill has been observed over the last decades [10]. In underprivileged countries, the scenario is similar, with a lack of trained professionals with these skills and of the resources to train them. This motivates the use of computer-aided decision systems based on auscultation to accelerate the timely diagnosis and referral of patients to specialized treatment centers [12].
A fundamental step in developing computer-aided decision systems for CVD screening from cardiac auscultation consists in collecting large annotated datasets of heart sounds that can properly represent and characterize murmurs and anomalies in patients with different CVDs. The need for such datasets is especially crucial when considering the design of modern data-hungry machine learning techniques.
Currently available annotated public PCG datasets have limited scope such that they either provide limited information regarding a general evaluation of the heart sound (normal vs. abnormal) [13], or the presence/absence of abnormal sounds (e.g., murmurs, clicks, extra sounds) [14]. Few attempts have been devoted to providing large public datasets with richer characterizations of heart sounds (see e.g., [15]). However, in these cases, heart sounds are only classified based on the underlying cardiac conditions without providing a full characterization of the specific sound signatures encountered in each recording.
The hereby presented dataset aims to address these limitations, thus providing the necessary means to develop novel computer-assisted decision systems, which are able to offer a rich characterization of heart sound anomalies. This is accomplished by collecting a large set of sounds and characterizing them using the same scales and parameters commonly used in clinical practice. The presented dataset was collected during two independent cardiac screening campaigns in the Pernambuco state, Brazil. These campaigns were organized to screen a large pediatric population in Northeast Brazil and to provide the basis for the further development of a telemedicine network [16], [17].
The gathered dataset includes the characterization of each murmur in the recorded heart sounds from different perspectives, including timing, pitch, grading, shape, and quality. Moreover, multi-auscultation location recordings are provided for each patient, along with the corresponding annotations, which are helpful for inferring the specific source of each murmur [18].
The remainder of this paper is organized as follows. In Section II, a background on cardiac auscultation is provided. In Section III, a brief review of publicly available heart sound datasets is provided. In Sections IV and V, methods and the data annotation process are presented. Section VI is devoted to a discussion on the obtained results. Concluding remarks and future perspectives are drawn in Section VII.

A. Cardiac Auscultation
Normal heart sounds are primarily generated by the vibrations of the cardiac valves as they open and close during each cardiac cycle and by the turbulence of the blood flowing into the arteries. The anatomical position of the heart valves relative to the chest wall defines the optimal auscultation positions; as such, for auscultation purposes, a stethoscope should be placed at the positions described in [19] and illustrated in Figure 1. Blood flowing through these structures creates audible sounds, which are more significant when the flow is more turbulent [1]. The first heart sound (S1) is produced by vibrations of the mitral and tricuspid valves as they close at the beginning of systole. S1 is audible on the chest wall and is formed by the mitral and tricuspid components [21]. Although the mitral component of S1 is louder and occurs earlier, under physiological resting conditions both components occur closely enough to make it hard to distinguish between them [19]. An illustration of an S1 sound is provided in Figure 2. The second heart sound (S2) is produced by the closure of the aortic and pulmonary valves at the beginning of diastole. S2 is also formed by two components, with the aortic component being louder and occurring earlier than the pulmonary component (since the pressure in the aorta is higher than in the pulmonary artery). Unlike S1, under normal conditions the closure sounds of the aortic and pulmonary valves can be discernible, due to an increase in venous return during inspiration, which slightly delays the pressure increase in the pulmonary artery and consequently the pulmonary valve closure [19], [22]. An illustration of an S2 sound is provided in Figure 2.
Fig. 2. An example of a normalized heart sound recording (PCG signal). The positions of the fundamental heart sounds S1 and S2, as well as the systolic (Sys) and diastolic (Dias) periods, are displayed and identified.
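The relation between the fundamental heart sounds and the systolic and diastolic periods can be illustrated with a minimal Python sketch. It assumes a simplified convention in which systole is measured from an S1 onset to the following S2 onset, and diastole from that S2 onset to the next S1 onset. The function name `cycle_intervals` and the onset times are hypothetical and are not taken from the dataset.

```python
from typing import List, Tuple


def cycle_intervals(s1_onsets: List[float],
                    s2_onsets: List[float]) -> List[Tuple[float, float]]:
    """Return (systole, diastole) durations in seconds for each complete cycle.

    Simplified convention (an assumption of this sketch): systole runs from an
    S1 onset to the following S2 onset, diastole from that S2 onset to the
    next S1 onset.
    """
    intervals = []
    for i in range(len(s1_onsets) - 1):
        s1, next_s1 = s1_onsets[i], s1_onsets[i + 1]
        # Keep only the S2 onsets that fall inside the current cycle.
        s2_in_cycle = [t for t in s2_onsets if s1 < t < next_s1]
        if not s2_in_cycle:
            continue  # incomplete cycle: no S2 was annotated, so skip it
        s2 = s2_in_cycle[0]
        intervals.append((s2 - s1, next_s1 - s2))
    return intervals


# Hypothetical onset times in seconds (illustrative values only).
s1_times = [0.00, 0.80, 1.60]
s2_times = [0.30, 1.10]
for systole, diastole in cycle_intervals(s1_times, s2_times):
    print(f"systole = {systole:.2f} s, diastole = {diastole:.2f} s")
```

With these illustrative onsets, diastole is longer than systole, which is the usual situation at resting heart rates.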

B. Clinical Applications
Apart from S1 and S2, additional heart sounds result from turbulent blood flow and rapid movements (closures, openings, and resonance) of cardiac structures, and are often associated with murmurs, "clicks", and "snaps" [23]. Most cardiac murmurs are closely associated with specific diseases such as septal defects, failure of the ductus arteriosus closure in newborns, or defective cardiac valves [24], [25]. These extra sounds may be associated with physiological or pathological conditions, depending on their timing in the cardiac cycle, intensity, shape, pitch, auscultation location, radiation, rhythm, and response to physical exam manoeuvres [19]. Auscultation is crucial in identifying and distinguishing these features. The ability to accurately describe a murmur can determine whether or not to refer a subject to a cardiologist by identifying specific patterns, isolated or in association with cardiopathies [23]. The following paragraphs describe the murmurs associated with common valve diseases and related conditions.
1) Aortic stenosis: Aortic stenosis is a narrowing of the aortic valve, often as a result of valve calcification. Aging, chronic RHD, or a congenital bicuspid aortic valve are some of the risk factors. During auscultation, aortic stenosis generates a harsh crescendo-decrescendo systolic murmur, heard best at the right upper sternal border (RUSB), with the murmur radiating to the carotid arteries [23].
2) Aortic regurgitation: Aortic regurgitation results from an inefficient aortic valve closure, allowing the blood to flow back into the left ventricle in a turbulent manner, usually due to aortic root dilation, a bicuspid aortic valve, or calcified leaflets. During auscultation, aortic regurgitation generates a decrescendo-blowing diastolic murmur, heard best at the left lower sternal border (LLSB) [23].
3) Mitral stenosis: Mitral stenosis is a narrowing of the mitral valve, making the passage of blood from the left atrium to the left ventricle more difficult. It is often due to chronic RHD and infective endocarditis. During auscultation, mitral stenosis generates a diastolic murmur, best heard at the apex [23].
4) Mitral regurgitation: Mitral regurgitation allows the blood to flow back to the left atrium. This is mainly due to defective closure of the leaflets or rupture of the chordae tendineae in patients with infective endocarditis, chronic RHD, degenerative valve disease, or myocardial infarction. In auscultation, mitral regurgitation results in a systolic murmur, best heard at the apex, with radiation to the left axilla [23].
5) Mitral valve prolapse: Mitral valve prolapse is characterized by a bulge of the leaflets into the left atrium, preventing the valve from closing evenly. It can be explained by idiopathic myxomatous valve degeneration, RHD, endocarditis, and Ebstein's anomaly. In auscultation, mitral valve prolapse causes an early systolic click heard best at the apex, which is often followed by a late systolic murmur [23].
6) Pulmonary stenosis: Pulmonary stenosis is a narrowing of the pulmonary valve. It is often present in patients with Tetralogy of Fallot, although it may also be present in patients with chronic RHD, congenital rubella syndrome, or Noonan syndrome. Its auscultation is described as a crescendo-decrescendo systolic ejection murmur, heard loudest at the left upper sternal border (LUSB) [23].
7) Tricuspid stenosis: Tricuspid stenosis is a narrowing of the tricuspid valve. It is often due to intravenous drugs used in infective endocarditis and in carcinoid syndrome treatments. In auscultation, tricuspid stenosis results in a diastolic murmur, best heard at the LLSB [23].
8) Tricuspid regurgitation: Tricuspid regurgitation is usually caused by vegetative growth under the leaflets, causing them to degenerate and rendering them incompetent, which allows blood to flow back to the right atrium. It is often explained by infective endocarditis and carcinoid syndrome. In auscultation, tricuspid regurgitation is described as a systolic murmur best heard at the LLSB [23].
9) Septal defects: Septal defects are congenital in nature and are defined as "ruptures" or "discontinuities" in the interatrial septum (atrial septal defects) or in the interventricular septum (ventricular septal defects), allowing the blood to flow freely between the chambers and mix. The auscultation of an atrial septal defect usually presents as a loud and wide S1 and a fixed split S2 heart sound, loudest at the LUSB. Ventricular septal defects often generate a holosystolic murmur, best heard at the apex. Smaller defects are louder and have a harsher quality, while large ones are quieter but more symptomatic. Therefore, atrial and ventricular septal defects in children can produce progressively louder murmurs as they close [23].
10) Hypertrophic obstructive cardiomyopathy: Hypertrophic obstructive cardiomyopathy is an inherited myocardial disease in which the myocardium undergoes hypertrophic changes. During auscultation, hypertrophic obstructive cardiomyopathy presents a systolic ejection murmur, heard best between the apex and the left sternal border (LSB) and becoming louder with Valsalva or abrupt standing maneuvers [23].
11) Patent ductus arteriosus: In patent ductus arteriosus, a channel fails to close after birth, establishing a shunt that allows oxygenated blood from the aorta to flow back to the lungs via the pulmonary artery. Its auscultation is described as a continuous "machine-like" murmur, loudest at the LUSB [23].

III. AVAILABLE HEART SOUND DATASETS
In this section, we briefly present heart sound datasets that are open to the scientific community. We consider datasets that satisfy the following criteria: a) full availability of data for scrutiny; b) online accessibility; c) relevance to the current study; d) inclusion of relevant information regarding the study population (e.g., size, age, and/or gender); and e) inclusion of relevant information regarding the audio recordings (e.g., number, duration, or collection spots). With these criteria, we selected six datasets that will form the basis of our analysis. A summary of the selected datasets is presented in Table I. Furthermore, some features of the proposed dataset are also presented in the last row of Table I, for comparison. From this table, it is observed that the existing heart sound datasets have significantly different sampling frequencies. While the informative spectrum of the PCG is below 1 kHz (with its power spectrum mainly concentrated below 500 Hz), the diversity in the sampling frequency of existing datasets is mainly due to the audio devices and computer-based sound cards used for PCG digitization. Since the heart sounds are in the audio frequency range, many research teams have preferred to use high-fidelity audio sound cards, which sample audio signals at standard rates such as 4 kHz, 8 kHz, 11025 Hz, 22050 Hz, 44100 Hz, 48 kHz, etc. None of these rates are specific to the heart sound; for historical reasons, they are audio rates supported by standard commercialized audio devices and software [26, Sec. 4.5]. However, from the signal processing and machine learning perspective, oversampling far above the Nyquist rate (twice the maximum frequency of the desired signal) does not provide additional information regarding the signal and would only increase the number of samples and the processing load. Based on this fact, with an appropriate anti-aliasing analog filter that avoids "spectral folding" effects [27], a sampling frequency of 2 kHz to 4 kHz is fully adequate for all human- and machine-based heart sound diagnosis.
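The sampling-rate argument can be made concrete with a short Python sketch. It takes the 1 kHz bound on the informative PCG spectrum from the text and compares several of the listed rates against the corresponding Nyquist rate. The 20-second recording length and the helper names `nyquist_rate` and `sample_count` are assumptions introduced only for illustration.

```python
def nyquist_rate(f_max_hz: float) -> float:
    """Minimum sampling rate (Hz) for content band-limited to f_max_hz."""
    return 2.0 * f_max_hz


def sample_count(duration_s: float, fs_hz: float) -> int:
    """Number of samples produced by a recording of the given duration."""
    return int(round(duration_s * fs_hz))


F_MAX_HZ = 1000.0   # informative PCG content lies below 1 kHz (from the text)
DURATION_S = 20.0   # hypothetical recording length, for illustration only

for fs in (2000, 4000, 11025, 44100):
    # A factor of 1.0 means sampling exactly at the Nyquist rate.
    oversampling = fs / nyquist_rate(F_MAX_HZ)
    n = sample_count(DURATION_S, fs)
    print(f"fs = {fs:>5} Hz: {n:>6} samples, {oversampling:.2f}x Nyquist")
```

Under these assumptions, a 44100 Hz recording holds roughly eleven times as many samples as a 4 kHz recording of the same duration, without adding information below 1 kHz.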

A. Pascal Challenge Database [14]
This database comprises two datasets. Dataset A was collected from a population of unreported size, through a smartphone application. Dataset B was gathered using a digital stethoscope system deployed in the Maternal and Fetal Cardiology Unit at Real Hospital Português (RHP) in Recife, Brazil. The database contains a total of 656 heart sound recordings from an unknown number of patients. The duration of the PCG signals ranges from 1 to 30 seconds, and the signals were collected at a sampling rate of 4000 Hz. In Dataset A, the sounds were obtained from the apex point of volunteer subjects. In Dataset B, sounds were collected from four cardiac auscultation locations on healthy and unhealthy children. To our knowledge, no additional data regarding the auscultation location is provided.
The sounds were recorded in clinical and non-clinical environments, and divided into normal, murmur, extra heart sound, and artifact classes in Dataset A. In Dataset B, the sounds were divided into normal, murmur, and extra systole classes. Furthermore, in Dataset B, the positions of fundamental heart sounds were manually annotated by clinicians.
B. PhysioNet/CinC Challenge 2016 Database [28]
This database merges nine independent databases: the Massachusetts Institute of Technology heart sounds database (MITHSDB), the Aalborg University heart sounds database (AADHSDB), the Aristotle University of Thessaloniki heart sounds database (AUTHHSDB), the Khajeh Nasir Toosi University of Technology heart sounds database (TUTHSDB), the University of Haute Alsace heart sounds database (UHAHSDB), the Dalian University of Technology heart sounds database (DLUTHSDB), the Shiraz University adult heart sounds database (SUAHSDB), the Skejby Sygehus Hospital heart sounds database (SSHHSDB), and the Shiraz University fetal heart sounds database (SUFHSDB).
As part of the PhysioNet/CinC 2016 Challenge, the data has been divided into training and testing sets, and it contains a total of 2435 heart sound records from 1297 patients. The duration of the PCG signals ranges from 8 to 312.5 seconds. Since the data was collected using different devices with different sampling rates, each PCG signal has been downsampled to 2000 Hz [13]. The sounds were obtained from four different auscultation locations (aortic, pulmonary, tricuspid, and mitral) on both healthy (normal) and pathological (abnormal) subjects, with a variety of diseases including heart valve diseases and coronary artery diseases. The training and test sets are unbalanced, with the number of normal records being greater than the number of abnormal records. The sounds were recorded in clinical and non-clinical environments, and divided into normal, abnormal, and unsure classes. With the exception of the unsure class, annotations of the positions of fundamental heart sounds were provided for all records. Such annotations were obtained by applying an automatic segmentation algorithm [33] and were manually reviewed and corrected afterwards.

C. HSCT-11 (2016) [29]
Spadaccini et al. published an open access database on heart sounds. The dataset was designed to measure the performance of biometric systems based on auscultation. This dataset is composed of 206 patients and a total of 412 heart sounds, collected from four auscultation locations (mitral, pulmonary, aortic, tricuspid). The data was recorded using the ThinkLabs Rhythm digital electronic stethoscope, with a sampling rate of 11025 Hz and a resolution of 16 bits per sample. No information is provided regarding the health condition of each subject.

The electro-phono-cardiogram (EPHNOGRAM) project [32], [35] focused on the development of low-cost and low-power devices for recording sample-wise simultaneous electrocardiogram (ECG) and PCG data. The database has 69 records acquired from 24 healthy young adults aged between 23 and 29 (average: 25.4±1.9) in 30-minute stress-test sessions during resting, walking, running, and biking conditions (using indoor fitness center equipment). The synchronous ECG and PCG channels have been sampled at 8 kHz with a resolution of 12 bits (with 10.5 effective number of bits). In some records, the environmental audio noises have also been recorded through additional auxiliary audio channels, which can be used to enhance the main PCG channel signal quality through signal processing. This data is useful for simultaneous multi-modal analysis of the ECG and PCG, as it provides interesting insights about the inter-relationship between the mechanical and electrical mechanisms of the heart, under rest and physical activity. No manual segmentation has been made, but MATLAB code is provided that automatically and accurately detects all the R-peaks from the ECG. Such peaks can be used to locate the S1 and S2 components of the PCG.

IV. METHODS
The presented dataset was collected as part of two mass screening campaigns, referred to as "Caravana do Coração" (Caravan of the Heart) campaigns, conducted in the state of Paraíba, Brazil between July and August 2014 (CC2014) and June and July 2015 (CC2015), see Figures 3 and 4. The data collection was approved by the 5192-Complexo Hospitalar HUOC/PROCAPE institutional review board, under Real Hospital Portugues de Beneficiencia em Pernambuco request. The CC2014 and CC2015 screening campaigns were promoted by a non-governmental organization (NGO) from Pernambuco, "Círculo do Coração" (Heart Circle -CirCor) and funded by the Health Secretary of the neighbouring State, Paraíba. These caravans occurred during a seven-year partnership program, which established the Pediatric Cardiology Network, an integral line of care -from screening to surgery and follow-up -for under-served children with heart diseases in Paraíba. The collected dataset is provided online on PhysioNet [18]. Note that, in order to manage the organization of public data challenges on murmur grading and classification, only 70% of the gathered dataset was publicly released in the PhysioNet repository (randomly selected through stratified random sampling). The remaining 30%, which is currently held as a private repository for test purposes (as unseen data), will be released on the same PhysioNet repository after the corresponding data challenge. The procedure of data gathering and the properties of the dataset are detailed in the following sections.
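The 70%/30% stratified random split mentioned above can be sketched in Python as follows. This is only a minimal illustration and not the procedure actually used for the release: the stratification variable (here a murmur label), the helper name `stratified_split`, the random seed, and the synthetic cohort are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def stratified_split(labels: Dict[str, Hashable],
                     public_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split patient identifiers into public and hidden sets, stratum by stratum.

    `labels` maps a patient identifier to the stratum it belongs to. The
    choice of stratum is an assumption of this sketch.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for patient_id, label in labels.items():
        strata[label].append(patient_id)

    public, hidden = [], []
    for label in sorted(strata, key=str):
        ids = sorted(strata[label])   # fixed order before shuffling, for reproducibility
        rng.shuffle(ids)
        cut = round(public_fraction * len(ids))
        public.extend(ids[:cut])      # released part of this stratum
        hidden.extend(ids[cut:])      # held-out part of this stratum
    return public, hidden


# Synthetic cohort (hypothetical): 100 patients, 20 of them with a murmur.
labels = {f"P{i:03d}": ("murmur" if i < 20 else "no_murmur") for i in range(100)}
public_ids, hidden_ids = stratified_split(labels)
print(len(public_ids), len(hidden_ids))
```

Splitting within each stratum keeps the proportion of each label approximately equal in the released and held-out parts, which is the purpose of stratified sampling.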

A. The Caravana do Coração Campaigns
The studied population included participants who volunteered for screening within the study period. Patients younger than 21 years of age with a parental signed consent form (where appropriate) were included. A total of 2,061 participants attended the 2014 and 2015 campaigns, with 493 participants being excluded for not meeting the eligibility criteria. Furthermore, 116 patients attended both screening campaigns. In our database, these patients are identified and highlighted.
All participants completed a socio-demographic questionnaire and subsequently underwent clinical examination (anamnesis and physical examination), nursing assessment (physiological measurements), and cardiac investigation (chest radiography, electrocardiogram, and echocardiogram). Data quality assessment was performed: all entries were screened for incorrectly entered or measured values, inconsistent data, and the presence of outliers, and were omitted in case of any inconsistency. The resulting entries were then compiled in a spreadsheet to reflect the socio-demographic and clinical variables used in our dataset. Subsequently, an electronic auscultation was performed and audio samples from four typical auscultation spots were collected; all samples were collected by the same operator for the duration of the screening, in a real clinical setting. Two independent cardiac physiologists individually assessed the resulting PCG audio files for signal quality. As a result, 119 participants did not meet the required signal quality standards. In other words, the recordings of these patients do not allow a safe and trustworthy murmur characterization and description, see Table V.
The mean age (± standard deviation) of the participants is 73.4 ± 0.1 months, ranging from 0.1 to 356.1 months, as summarized in Table III. The majority of the participants attended the study with no formal indication (444; 27.0%), while (305; 18.5%) pre-

The mean weight and height (± standard deviation) of the sample are 24 ± 15 kg and 111 ± 29 cm, respectively, with a mean BMI of 18 ± 7. The mean heart rate is 102 ± 20 bpm, ranging between 47 bpm and 193 bpm. A Mann-Whitney U hypothesis test on the heart rate distributions was performed, and a p-value of 0.04 was observed; see Figure 8 for more details. The mean oxygen saturation (measured in the arm) is 95% ± 5%. With regard to tympanic temperature, the average temperature is 36.7 ± 0.4 °C, ranging from 34.7 °C to 39.2 °C. Finally, the average systolic and diastolic blood pressures are 91 ± 17 mmHg and 55 ± 11 mmHg, respectively. The maximum systolic and diastolic blood pressures are, respectively, 170 mmHg and 101 mmHg.

C. Heart Sounds and Annotations
The collected dataset includes a total of 215780 heart sounds: 103853 heart sounds (51945 S1 and 51908 S2 waves) from CC2014 and 111927 heart sounds (56449 S1 and 55478 S2 waves) from CC2015.
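As a quick consistency check, the heart sound counts quoted above, together with the participant counts of Section IV-A, can be tallied with a few lines of Python. The variable names are illustrative; all figures are taken directly from the text.

```python
# Annotated heart sounds per campaign, as quoted in the text.
heart_sounds = {
    "CC2014": {"S1": 51945, "S2": 51908},
    "CC2015": {"S1": 56449, "S2": 55478},
}

per_campaign = {name: sum(counts.values()) for name, counts in heart_sounds.items()}
total_heart_sounds = sum(per_campaign.values())

# Participants who attended, minus those excluded for eligibility reasons.
attended, excluded = 2061, 493
included_patients = attended - excluded

print(per_campaign)          # {'CC2014': 103853, 'CC2015': 111927}
print(total_heart_sounds)    # 215780
print(included_patients)     # 1568
```

The totals agree with the 215780 annotated heart sounds and the 1568 patients reported for the dataset.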
In the CC2014 screening campaign, 540 recordings were collected from the Aortic point, 497 from the Pulmonary point, 603 from the Mitral point, 461 from the Tricuspid point, and 5 from an unreported point. Between 1 and 10 recordings were collected per patient, with an average of 3.2 recordings per patient. In the CC2015 screening campaign, 817 recordings were collected from the Aortic point, 793 from the Pulmonary point, 812 from the Mitral point, 754 from the Tricuspid point, and one extra recording from an unreported point. Overall, between 1 and 4 recordings exist per patient, with an average of 3.5 recordings per patient.
The heart sound signals were collected using a Littmann 3200 stethoscope embedded with the DigiScope Collector [34] technology; an illustration is presented in Figure 5. A diagram of the graphical user interface is presented in Figure 6. The signal was sampled at 4 kHz with a 16-bit resolution. Furthermore, the heart sound signals are normalized within the [−1, 1] range. The PCG files from the CC2014 and CC2015 campaigns had an average duration of 28.7 seconds and 19.0 seconds, respectively. A Mann-Whitney U hypothesis test on the signal duration distributions was performed, and a p-value of 0.001 was observed. For more details refer to Figure 7. Murmurs were present in 305 patients within the collected dataset. Out of these, 294 patients had only a systolic murmur, 1 patient had only a diastolic murmur, and 9 patients had both systolic and diastolic murmurs, as summarized in Table V.
Fig. 6. Screenshots of the DigiScope Collector's Graphical User Interface (GUI), during the different data acquisition stages. In the left panel, the patient creation layout; in the middle panel, clinical data from the patient is inserted; in the right panel, acquisition of a heart sound signal, from the Pulmonary spot. For more details see [37].
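For readers unfamiliar with the Mann-Whitney U test used to compare the two campaigns, the following Python sketch computes the U statistics from two samples using average ranks for ties. It does not compute the p-value, and it is not the analysis code used for this paper. The function names and the duration values are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y) computed from the rank sum of the pooled samples."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2.0
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical recording durations in seconds (not taken from the dataset).
durations_a = [28.0, 30.5, 27.1, 29.9]
durations_b = [19.2, 18.5, 20.1, 28.0]
print(mann_whitney_u(durations_a, durations_b))
```

A U statistic close to 0 or to the product of the two sample sizes indicates that one sample tends to have larger values than the other; the p-value is then derived from the distribution of U under the null hypothesis.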

V. DATA LABELLING
The acquired audio samples were automatically segmented using the three algorithms proposed in [33], [30], and [38]. These algorithms detect and identify the fundamental heart sounds (S1 and S2 sounds) and their corresponding boundaries. As a result, a set of annotation recommendations for each heart sound signal was provided to the two cardiac physiologists. The two cardiac physiologists independently inspected the resulting algorithms' output on mutually exclusive data. In other words, they did not over-read each other's annotations. Nevertheless, experts were free to use or neglect the generated recommendations. The expert first inspected the automatic segmentation annotations and re-annotated the misdetections. For each heart sound signal, the expert started by randomly selecting one of the corresponding annotation recommendations, and one of two actions followed:
• In case of agreement with the algorithm's segmentation output, the audio file and its corresponding annotation file were included in the dataset.
• In case of disagreement, the expert randomly opened another recommended annotation and the procedure was repeated. If the expert disagreed with all recommended annotations, then he/she proceeded to manually segment at least five heartbeat cycles. The new audio file and its corresponding annotation file were then saved.
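The review procedure above can be summarized as a short Python sketch. It assumes that recommendations are visited in random order without repetition, and it models the expert's judgement and the manual fallback as caller-supplied functions. The names `select_annotation`, `expert_agrees`, and `manual_segmentation` are hypothetical and are not part of any released tool.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

A = TypeVar("A")


def select_annotation(recommendations: Sequence[A],
                      expert_agrees: Callable[[A], bool],
                      manual_segmentation: Callable[[], A],
                      rng: Optional[random.Random] = None) -> A:
    """Return the first randomly visited recommendation the expert accepts.

    If the expert rejects every recommendation, fall back to a manual
    segmentation (at least five heartbeat cycles, according to the text).
    """
    rng = rng or random.Random()
    candidates = list(recommendations)
    # Shuffling once and iterating is equivalent to repeatedly drawing a
    # recommendation at random without replacement (assumption of this sketch).
    rng.shuffle(candidates)
    for candidate in candidates:
        if expert_agrees(candidate):
            return candidate
    return manual_segmentation()


# Toy usage: three algorithm outputs, the expert only accepts the second one.
outputs = ["algorithm_1", "algorithm_2", "algorithm_3"]
chosen = select_annotation(outputs,
                           expert_agrees=lambda a: a == "algorithm_2",
                           manual_segmentation=lambda: "manual")
print(chosen)
```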
Labels were retained for the sections of the data that the cardiac physiologists indicated were high-quality representative segments. The remainder of the signal may include both low- and high-quality data. In this way, the user is free to use (or not) the suggested time window, where the quality of the signal was manually inspected and the automated labels were validated.
This methodology was applied to both the 2014 and 2015 screening campaigns, resulting in two independent folders (CC2014, CC2015), as present on the project repository on PhysioNet [18].
The audio and the corresponding segmentation annotation file names are in ABCDE_XY.wav format, where ABCDE is a numeric patient identifier and XY is one of the following codes, corresponding to the auscultation location where the PCG was collected on the body surface: PV corresponds to the Pulmonary point; TV corresponds to the Tricuspid point; AV corresponds to the Aortic point; MV corresponds to the Mitral point; and finally PhC corresponds to any other auscultation location. If more than one recording exists per auscultation location, an integer index precedes the auscultation location code in the file name, i.e., ABCDE_nXY.wav, where n is an integer. Furthermore, each audio file has its own corresponding segmentation annotation file. The segmentation annotation file is composed of three distinct columns: the first column corresponds to the time instant where the wave was detected for the first time; the second column corresponds to the time instant where the wave was detected for the last time; the third column corresponds to an identifier that uniquely identifies the detected wave.

All the collected heart sound records were also screened for the presence of murmurs at each auscultation location. Each murmur was classified (at the auscultation point where it is most audible, according to the details in Section II-B) according to its timing (early-, mid-, and late-systolic/diastolic) [39], shape (crescendo, decrescendo, diamond, plateau), pitch (high, medium, low), quality (blowing, harsh, musical) [39], and grade (according to Levine's scale [40]). Since not all patients have auscultation sounds recorded from all four main auscultation locations, the strategy adopted to provide grading annotations is described in Table VI. Accordingly, the grade annotations can diverge from the original definition of murmur grading when applied to cases for which not all the auscultation locations are available. In such cases, murmurs were classified by default as grade I/VI.
Moreover, the cases classified as grade III/VI actually include murmurs that could potentially be of grade III/VI or higher, since discrimination among grades III/VI, IV/VI, V/VI, and VI/VI is associated with palpable murmurs, also known as thrills [40], which can only be assessed via physical in-person examination. A cardiopulmonologist manually classified and characterized murmur events blindly and independently of other clinical notes.
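The file naming convention and the three-column segmentation format described in this section can be parsed with a short Python sketch. The following are assumptions of the sketch: the columns are separated by whitespace, the wave identifier is kept as text, and the example file names and rows are hypothetical. The helper names are illustrative and do not belong to any official tool.

```python
import re
from typing import List, NamedTuple, Optional

# Auscultation location codes as listed in the text.
LOCATIONS = {"PV": "Pulmonary", "TV": "Tricuspid", "AV": "Aortic",
             "MV": "Mitral", "PhC": "Other"}

# ABCDE_XY.wav or ABCDE_nXY.wav, where n is an optional integer index.
NAME_RE = re.compile(r"^(?P<patient>\d+)_(?P<index>\d+)?(?P<code>PV|TV|AV|MV|PhC)\.wav$")


class Recording(NamedTuple):
    patient_id: str
    location: str
    index: Optional[int]


class Wave(NamedTuple):
    start_s: float
    end_s: float
    label: str


def parse_name(file_name: str) -> Recording:
    """Split a recording file name into patient, location, and optional index."""
    match = NAME_RE.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    index = match.group("index")
    return Recording(match.group("patient"),
                     LOCATIONS[match.group("code")],
                     int(index) if index else None)


def parse_segmentation(text: str) -> List[Wave]:
    """Read rows of (first instant, last instant, wave identifier)."""
    waves = []
    for line in text.splitlines():
        fields = line.split()          # assumes whitespace-separated columns
        if len(fields) != 3:
            continue                   # skip blank or malformed rows
        waves.append(Wave(float(fields[0]), float(fields[1]), fields[2]))
    return waves


# Hypothetical file names and annotation rows, for illustration only.
print(parse_name("12345_AV.wav"))
print(parse_name("12345_2MV.wav"))
print(parse_segmentation("0.10 0.22 1\n0.40 0.50 2\n"))
```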
The sounds were recorded in an ambulatory environment. Different noise sources have been observed in our dataset, from stethoscope rubbing noise to crying or laughing in the background. Thus, the automatic analysis of the CirCor DigiScope dataset is indeed a hard task. On the other hand, the proposed dataset is a representative sample of the real-world environments in which computer-assisted decision systems must operate.

VI. DISCUSSION
Given the large number of participants from the CC2014 and CC2015 screening campaigns, the CirCor DigiScope dataset is a representative sample of the rural and urban populations of Paraíba, in Northeast Brazil. The dataset focuses on a pediatric population: 63% of the patients were children and 20% were infants. Furthermore, due to the challenges associated with the care of individuals with complex CHD, a few young adults who voluntarily asked to participate in the screening campaigns were exceptionally also examined.
Overall, the screened population demonstrates a generally good clinical condition, with most clinical and physiological parameters within the normal range for age. Despite this, a variety of congenital and acquired diseases were found, including not only simple cardiopathies but also severe and complex cardiopathies that required specialist referral for advanced treatment. In contrast with CC2014, in the CC2015 campaign most participants were referred for follow-up (61.6%), while 32.5% were discharged. Seven patients (0.8%) had a severe pathology requiring surgery or intervention. Further details are provided in Table IV.
The high number of pathologies encountered within the screened population also translated into a significant number of patients presenting murmurs in their auscultations. Most of the murmurs were observed in the systolic period (96.8%), and only a few cases were reported in the diastolic period (3.2%); see Table V. This is because the pressure gradient across the aortic and pulmonary valves during systolic ejection is very high compared with the pressure gradient across the mitral and tricuspid valves during ventricular relaxation [1]. As a result, stenosis of the aortic and pulmonary valves is more common than stenosis of the mitral and tricuspid valves [41], [42], [43]. This is consistent with the observation that most complex congenital cardiopathies manifest in the systolic period [44]. Furthermore, innocent murmurs, which are very common in infant populations, are mostly observed at the beginning of the systolic period [44]. Finally, from a technical perspective, the auscultation of diastolic murmurs is difficult, since these murmurs are faint and hard to detect [45]. For example, early- and mid-diastolic murmurs were detected in the recordings of the patients with the identifiers 49824 and 66421, respectively.
Regarding the murmur timing analysis, the most common murmur is holosystolic, i.e., a murmur that lasts over the full systolic period; see Table V. This kind of murmur is commonly observed in ventricular septal defect pathologies [46]. Holosystolic murmur waves were detected, for example, in the recordings of the patient with the identifier 49628. Early-systolic murmurs are also common in children. These murmurs start immediately at the beginning of each systolic period and disappear shortly before the mid-systolic period. Such murmurs are usually innocent [47]. Early-systolic murmur waves were observed, for example, in the recordings of the patient with the identifier 49691. Note that in the provided dataset, the murmur waves can be easily extracted by using the segmentation data (S1 and S2 wave locations) provided for each heart beat.
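As an illustration of how the segmentation data can be used to isolate the systolic interval in which most murmurs occur, the following Python sketch reads the three-column annotation format described in Section V and returns, for each heart beat, the interval between the end of S1 and the onset of the following S2. Several details are assumptions made for illustration only: the columns are taken to be whitespace-separated, the times are taken to be in seconds, and the wave identifiers `"S1"` and `"S2"` are hypothetical placeholders that must be replaced by the identifiers actually used in the annotation files. The function names are likewise hypothetical.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[float, float, str]   # (onset, offset, wave identifier)
Interval = Tuple[float, float]       # (start, end)


def parse_segmentation(lines: Iterable[str]) -> List[Segment]:
    """Read three-column annotation lines: onset, offset, identifier."""
    segments = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        segments.append((float(fields[0]), float(fields[1]), fields[2]))
    return segments


def systolic_intervals(
    segments: Iterable[Segment],
    s1_label: str = "S1",   # assumed identifier, adjust to the files
    s2_label: str = "S2",   # assumed identifier, adjust to the files
) -> List[Interval]:
    """Return (end of S1, onset of next S2) for every annotated beat."""
    intervals = []
    s1_end = None
    for onset, offset, label in sorted(segments):
        if label == s1_label:
            s1_end = offset              # remember the most recent S1
        elif label == s2_label and s1_end is not None:
            intervals.append((s1_end, onset))
            s1_end = None                # wait for the next S1
    return intervals
```

Multiplying each interval boundary by the sampling rate of the recording gives the sample range to slice from the audio signal. Diastolic intervals can be obtained in the same way by pairing the end of each S2 with the onset of the following S1.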
The majority of the detected murmurs have a plateau shape, i.e., their intensities are approximately constant over time. Murmurs with such shapes can be associated with ventricular septal defect, a very common pathology in our database [46]. Murmur waves with a plateau shape were detected, for example, in the recordings of the patient with the identifier 50159. Note that diamond-shaped murmurs (18.5% in our dataset) are usually more prevalent within a pediatric population than plateau-shaped murmurs (59.8% in our dataset) [48]. The large number of plateau murmurs observed in the dataset might be explained by the fact that digital filters can attenuate or modify the murmur shape. Murmur waves with a diamond shape were observed, for example, in the recordings of the patient with the identifier 50724.
Some innocent murmurs are "musical" in quality (1.3% of the murmurs in our dataset, an extremely rare event) [49]. A "harsh" murmur is produced by high-velocity blood flow from a higher- to a lower-pressure region [50]. The term "harsh" is appropriate for describing the murmur in patients with significant semilunar valve stenosis or a ventricular septal defect (52.5%). Murmur waves with a harsh quality were detected, for example, in the recordings of the patient with the identifier 49628. A "blowing" murmur is a sound caused by turbulent (rough) blood flow through the heart valves. At the apex, it might be related to mitral valve regurgitation (46.2%) [50]. Murmur waves with a blowing quality were observed, for example, in the recordings of the patient with the identifier 49754.
The variable "grade" refers to intensity, i.e., the loudness of the murmur. The location of the lesion and its distance to the stethoscope affect the listener's perception. Usually, louder murmurs (grade ≥ 3) are more likely to represent cardiac defects than softer ones [51]. Regarding systolic murmurs, the Levine scale is a numeric score used to characterize the intensity or loudness of a murmur [40]. In the presented dataset, 27% of the systolic murmurs and 10% of the diastolic murmurs were grade III or greater; see Table V. Murmur waves of grade III or greater were detected, for example, in the patient with the identifier 50308.
Pitch is another important variable. The higher the pitch, the higher the pressure gradient between the heart chambers. For example, aortic stenosis has a higher-pitched murmur than mitral stenosis. Furthermore, and in agreement with past observations, murmurs generated by a ventricular septal defect usually have a low pitch [46]. We should note, however, that pitch analysis from auscultation is neither trivial nor easy to describe, since filtering effects on a listener's perception are not completely characterizable. The perceived pitch also varies across different stethoscopes. This is technically due to the difference between the transfer functions of different stethoscopes (and the preprocessing filters in the digital stethoscope front-end). This issue can be partially mitigated in digital auscultation by designing digital equalizers that compensate for the gain losses and amplitude/phase distortions of the analog front-end. Equalizers can also be applied to make the auscultations sound more similar to their analog counterparts, which are more familiar to expert annotators. The theory and practice of sound equalization have been extensively studied in the context of audio signal processing [52], [53]. Low-, medium-, and high-pitch murmur waves were detected, for example, in the recordings of the patients with the identifiers 49693, 49825, and 49712, respectively.
The location where the systolic, diastolic, or systolic-diastolic murmurs were detected with the highest intensity is also an important feature to analyze. A damaged valve usually generates a murmur that is loudest in its corresponding auscultation area. This is mainly due to the proximity between the auscultation point and the damaged valve. For example, a murmur caused by aortic stenosis is often best heard at the upper sternal border, usually on the right side [23]. In this dataset, most of the murmurs are detected with the highest intensity at the pulmonary point; see Table V. Note that in infants and children (up to three years old), the pulmonary and tricuspid spots partially overlap and, in most cases, a single auscultation location is used instead [54]. Consequently, heart sounds collected from these locations in children can be very similar to each other. Therefore, it is not surprising that the pulmonary and tricuspid points are the best auscultation locations to detect cardiac murmurs in our dataset. In other cases a different location dominates: for example, in the patient with the identifier 50052, murmur waves were detected more intensively at the aortic spot.
The observed murmurs radiate and spread homogeneously through the thorax (cf. Table V) and are thus easily detected (almost equally) at many auscultation locations.
As a final remark, the CirCor DigiScope dataset is by far the largest publicly available heart sound dataset (5282 recordings), including recordings collected from multiple auscultation locations on the body. Moreover, the majority of the heart sounds (215,780 in total) were manually segmented and their quality was assessed by two independent cardiac physiologists. Furthermore, the study resulted in a very detailed murmur characterization and classification database (including timing, pitch, grading, shape, quality, auscultation location, etc.), which can be used in future research.

VII. CONCLUSIONS
The CirCor DigiScope dataset represents a unique cohort from a pediatric population, and a pregnant population, with significant congenital and acquired cardiac diseases. Age is also homogeneously distributed in our dataset, thus potentially paving the way to the design of robust decision support systems for different target populations, from neonates to adults.
Given the rich annotations and characterizations provided with the CirCor DigiScope dataset, the current work can be leveraged in various ways, including the design of multichannel systems for multi-site PCG analysis, the detection and classification of heart murmurs, and the automatic generation of murmur reports. For future work, we intend to carry out a comparative study of the segmentation and classification of heart sound signals in the CirCor DigiScope dataset. In future screening campaigns, a broader population from different age groups shall be screened for CHD. Furthermore, new data collection protocols and technologies are going to be proposed and developed, aiming to collect data from the four auscultation spots in a synchronous or asynchronous manner.

VIII. ACKNOWLEDGMENT
This work is a result of the Project DigiScope2 (POCI-01-0145-FEDER-029200-PTDC/CCI-COM/29200/2017) funded by Fundo Europeu de Desenvolvimento Regional (FEDER), through Programa Operacional Competitividade e Internacionalização (POCI), and by national funds, through Fundação para a Ciência e Tecnologia (FCT). This work is also financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020. F. Renna also acknowledges national funds through FCT - Fundação para a Ciência e a Tecnologia, I.P., under the Scientific Employment Stimulus - Individual Call - CEECIND/01970/2017. A. Elola receives financial support from the Spanish Ministerio de Ciencia, Innovación y Universidades through grant RTI2018-101475-BI00, jointly with the Fondo Europeo de Desarrollo Regional (FEDER), and from the Basque Government through grant IT1229-19. G. D. Clifford is partly funded by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under grant #R01EB030362.
The authors would like to acknowledge the support and efforts of the 2014 and 2015 Caravana do Coração screening campaign members.