Evaluator effect on the ultrasound measurement of subcutaneous fat deposition and loin eye area from weaning to slaughter lambs

Background and Aim: Ultrasound is a non-invasive technique that enables the evaluation of animals and their classification by body condition. Although it is not difficult to obtain an image, the analysis of this image can influence the quality of the results. This study aimed to evaluate the repeatability and reproducibility of trained technicians in interpreting images obtained using different ultrasound frequencies. Materials and Methods: Ninety-six lambs were used, ranging in weight from 15 to 40 kg. The images were captured using a 13 cm linear probe at a frequency of 3.5 megahertz (MHz) with an acoustic coupling aid (“standoff”), or using a multifrequency transducer (6 and 8 MHz) in B mode, with a linear probe at a frequency of 8.0 MHz. All measurements were performed by the same technician, on the left side, between the 12th and 13th ribs. Five different evaluators, at two different times, used Image J software to measure the loin eye area (LEA; only for images obtained with 3.5 MHz), Longissimus thoracis et lumborum depth (DLM), subcutaneous fat thickness (SFT), subcutaneous fat thickness plus skin (SFST), and skin thickness (ST). Results: For LEA, DLM, SFT, SFST, and ST, variation was observed (p<0.01) between evaluators; however, there was no difference (p>0.05) between the two times of evaluation. Image measurements obtained with a frequency of 8.0 MHz had better repeatability and reproducibility indices. Accordingly, the identity test demonstrated that measurements performed on images obtained using 3.5 or 8.0 MHz were not equivalent. Conclusion: Ultrasound image measurements obtained using an 8.0 MHz frequency were more accurate and precise. It is important to use only one evaluator or to provide simultaneous training for all evaluators.


Introduction
Ultrasound is a non-invasive technique that enables the evaluation of animals and their classification by body condition into those for slaughter and those for reproduction. Livestock production systems have started to assess subcutaneous fat thickness (SFT) using ultrasound imaging to predict the carcass tissue composition of animals in vivo and to indicate slaughter time [1]. Ultrasound also helps in breeding stock selection and can indicate precocity, weight gain potential, feed efficiency, and income from contemporary animal cuts. Sheep from different genetic groups can be classified as early, intermediate, or late, depending on the SFT deposited as the animal matures [2]. Furthermore, it can measure energy reserves at different reproductive stages; ultrasound measurements allow the producer to make appropriate management decisions and to condition animals according to their physiological stage.
According to McManus et al. [3], the SFT measured between the 12th and 13th ribs has a high and positive correlation with carcass fat. The loin eye area (LEA) indicates the amount of marketable meat, and the Longissimus thoracis et lumborum depth (DLM) can predict the amount of muscle in the carcass [4,5]. Lambs that had a higher LEA were more efficient and showed better performance in confinement, resulting in heavier carcasses [6]. The skin thickness (ST) could be used in equations to estimate warm and cold carcass weights [7]. These variables can be measured using ultrasound. Therefore, precise measurement of these characteristics is crucial for the accuracy of production estimates, as well as for decision-making regarding the choice between reproduction and slaughter. An ultrasound image can be obtained at different frequencies ranging from 3.5 to 10 megahertz (MHz), which allows for greater accuracy and precision when examining the target anatomical region. Frequencies higher than 5 MHz generate high-resolution images but have lower penetration, and therefore do not allow LEA visualization [8]. Frequencies lower than 5 MHz allow a deeper view but generate lower-quality images [9]. Although it is not difficult to obtain an image, the analysis of this image also seems to influence the quality of the results [10].
For cattle, it has been reported that the structures that divide the tissues often have variable dimensions with different acoustic impedances, which can result in differences between operators when interpreting the images [11]. According to Silva [8], knowledge of anatomy; prior involvement in carcass work (especially dissection); and familiarity with the equipment, image acquisition, and interpretation are some of the factors that pose potential problems related to the operator. Thus, this study aimed to evaluate the influence of image capture frequency (MHz) on the repeatability and reproducibility of trained technicians (image evaluators) in the interpretation of lamb ultrasound images.

Ethical approval
Experimental protocols were approved by the Committee of Ethics in Animal Experimentation (CEUA; protocol no. 018/2013) of the Federal University of Grande Dourados (UFGD), Dourados, Mato Grosso do Sul, Brazil.

Study period and location
The experiment was carried out in September 2013 at the Animal Science sector of the Faculty of Agricultural Sciences of the Federal University of Grande Dourados (FCA/UFGD), located in the municipality of Dourados, Mato Grosso do Sul, Brazil (22°11′55″S, 54°56′7″W and 452 m altitude).

Animals and Images capture
We used 96 uncastrated male lambs of the Pantaneira breed, with weights varying from 15 to 40 kg. As treatments, 3.5 or 8.0 MHz frequencies were used to collect ultrasound images. Images were captured using two types of ultrasound equipment: one from Aloka (SSD-500v Aloka Co., Ltd, Mitakashi, Tokyo, Japan), with a 13 cm linear probe at a frequency of 3.5 MHz and the support of an acoustic coupling "standoff," and another from Pie Medical (410477 Falco 100 rev A, California Prop 65 Warning, US) with a multifrequency transducer (6 and 8 MHz), using B mode, with a linear probe at a frequency of 8.0 MHz. To perform the measurements, lambs were manually immobilized, the wool was separated in the measuring areas with the aid of a comb, and mucilage was applied for the best coupling of the transducer to the skin [12]. All measurements were performed by the same technician, on the left side, between the 12th and 13th ribs, 4 cm from the median line of the spine. Images generated by ultrasound were digitally stored for further analysis using a video capture card [13].

Image evaluation
Images were analyzed by five different evaluators at two different times. The five evaluators were trained to use Image J software (National Institute of Mental Health, Bethesda, Maryland, USA; http://rsb.info.nih.gov/nih-image/). All evaluators had experience in evaluating ultrasound images from other experiments, but no simultaneous training was performed with the evaluators before starting these evaluations.
Image J software was used by each evaluator to evaluate the ultrasound images. For all images, a scale adjustment of 30 pixels/cm was performed. Measurements of LEA (only for images obtained with 3.5 MHz), DLM, SFT, SFT plus ST (SFST), and ST were performed. LEA was determined by tracing the contour of the muscle area on the images, DLM was obtained by measuring the muscle thickness between the fat layer and the muscle end, SFT was obtained by measuring the adipose tissue between the Longissimus thoracis et lumborum muscle and the skin, and SFST was obtained by measuring SFT plus ST (Figure-1).
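The scale adjustment and the contour-based area measurement can be illustrated with a minimal sketch. It assumes that linear measurements are read in pixels and that the LEA contour is stored as a list of (x, y) vertices; the function names are hypothetical, and the shoelace formula shown here is a standard polygon-area method, not necessarily the computation that Image J performs internally.

```python
PIXELS_PER_CM = 30.0  # scale adjustment stated in the text


def pixels_to_cm(length_px):
    """Convert a linear measurement (e.g., DLM, SFT, or ST) from pixels to cm."""
    return length_px / PIXELS_PER_CM


def contour_area_cm2(points_px):
    """Area enclosed by a traced contour, using the shoelace formula.

    points_px: list of (x, y) vertices in pixels, in tracing order.
    """
    n = len(points_px)
    if n < 3:
        raise ValueError("a contour needs at least three vertices")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points_px[i]
        x2, y2 = points_px[(i + 1) % n]  # wrap around to close the contour
        twice_area += x1 * y2 - x2 * y1
    area_px2 = abs(twice_area) / 2.0
    # An area scales with the square of the linear scale factor.
    return area_px2 / PIXELS_PER_CM ** 2


def sfst_cm(sft_px, st_px):
    """SFST as the sum of subcutaneous fat thickness and skin thickness, in cm."""
    return pixels_to_cm(sft_px) + pixels_to_cm(st_px)
```

For instance, a hypothetical rectangular contour of 90 by 60 pixels corresponds to 3 cm by 2 cm at this scale, so `contour_area_cm2` returns an area of 6 cm².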

Statistical analysis
Data were evaluated using the Minitab program 17.x. The repeatability and reproducibility of the measurements taken by the five evaluators were determined, and the measurement system was classified as acceptable (<1% variation in the process); ponderable, in which case use is conditioned to the applicability of the ultrasound evaluation (between 1 and 9% variation in the process); or unacceptable (more than 9% variation in the process) [14]. To compare the results between the frequencies (3.5 or 8.0 MHz) used to collect the images, Pearson's correlation was calculated, in addition to the identity test proposed by Leite and Oliveira [15] using the Mann-Whitney and Wilcoxon statistical tests.
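The classification thresholds can be written as a small decision rule. This is only a sketch of the criteria stated above, not the Minitab procedure; the function name is hypothetical, and assigning the exact boundary values of 1% and 9% to the ponderable class is an assumption, since the text does not specify how they are handled.

```python
def classify_measurement_system(percent_variation):
    """Classify a repeatability or reproducibility index (% variation in the process).

    Thresholds follow the criteria in the text; treating exactly 1% and
    exactly 9% as "ponderable" is an assumption.
    """
    if percent_variation < 0:
        raise ValueError("percent variation cannot be negative")
    if percent_variation < 1.0:
        return "acceptable"
    if percent_variation <= 9.0:
        return "ponderable"
    return "unacceptable"
```

With this rule, a hypothetical index of 0.4% is classified as acceptable, 5% as ponderable, and 12% as unacceptable.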

Results
For LEA, DLM, SFT, SFST, and ST obtained with the 3.5 MHz frequency, variation (p<0.01) was observed among evaluators, with no difference (p>0.05) between assessments performed at different times by the same evaluator (Table-1). When the 8.0 MHz frequency was used for image collection, measurements of DLM, SFT, SFST, and ST showed a difference (p<0.01) between evaluators, with no difference (p≥0.05) between assessments performed at different times by the same evaluator (Table-2), similar to the results obtained from images collected using the 3.5 MHz frequency.
For the LEA measurement, repeatability was unacceptable (above 9% variation) and reproducibility was ponderable (between 1 and 9% variation) depending on the application. When repeatability or reproducibility is considered ponderable, the evaluation can be used depending on its application. In situations such as scientific research, its use would not be recommended; however, in field situations, such as lot division, diet adjustment, and determining the beginning of the breeding season, the evaluation does not require high accuracy, so the tool could be used. For DLM obtained from 3.5 MHz images, repeatability was ponderable depending on the application, and reproducibility was considered acceptable (<1% variation). For SFT and SFST measurements obtained from 3.5 MHz images, repeatability indices were considered acceptable and reproducibility indices were ponderable, indicating that the measurement system is acceptable depending on the application. For the ST evaluation, both repeatability and reproducibility were considered acceptable.
When the repeatability and reproducibility test was applied to the measurements taken from images obtained with 8.0 MHz, higher precision and accuracy of the assessments were observed. For DLM, repeatability and reproducibility were ponderable depending on the application. For SFT, SFST, and ST measurements obtained from images using 8.0 MHz, repeatability and reproducibility indices were considered acceptable.
Correlations between DLM, SFT, SFST, and ST measurements performed on images obtained with 3.5 or 8.0 MHz frequency were low but significant, ranging from 0.11 to 0.49 (Table-3). When DLM, SFT, SFST, and ST data were plotted to generate a linear equation (Figure-2), a low coefficient of determination was observed. Dispersion between maximum and minimum values for LEA, DLM, SFT, SFST, and ST is dependent on the variation in the physiological state of the evaluated animals, since animals varied in weight from 15 to 40 kg. Since our aim was to evaluate a representative population, we decided to use animals that presented dispersion in measured parameters. The relationship between data obtained using different frequencies (3.5 and 8.0 MHz) does not fit first, second, or third-degree regression equations (Figure-2).
Available at www.veterinaryworld.org/Vol.14/January-2021/33.pdf
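The relationship between a low correlation and a low coefficient of determination can be made concrete with a short sketch. It assumes paired measurements of the same variable on the same animals at the two frequencies; the function names are hypothetical, and this is a plain least-squares fit rather than the Minitab output reported in the tables.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return the means and centered sums of squares and cross-products."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson's correlation coefficient between paired measurements."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares line y = intercept + slope * x, with its R^2.

    For a first-degree fit with an intercept, R^2 equals r squared.
    """
    mean_x, mean_y, sxx, _, sxy = _centered_sums(x, y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = pearson_r(x, y) ** 2
    return slope, intercept, r_squared
```

Because R² equals r² for a first-degree fit, correlations in the reported range of 0.11 to 0.49 correspond to coefficients of determination of roughly 0.01 to 0.24, which is consistent with the low values described for Figure-2.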

Discussion
The results for LEA, DLM, SFT, SFST, and ST indicate that each evaluator was consistent across repeated assessments; however, measurements were not consistent between evaluators, regardless of which variable was measured. According to Mercadante et al. [11], the implementation of systems for carcass evaluation by ultrasonography depends on the availability of a high number of trained technicians, both to collect images in the field and to measure images in the laboratory. The ability to interpret ultrasonographic images depends on the operator's experience [8]. All evaluators in this study had previous experience in ultrasound image evaluation. However, the training they received was not simultaneous, which may explain why the evaluations of the same technician were consistent (when repeated at different times) while the results varied between evaluators.
Differences in measurements between evaluators might be due to anatomical points that are difficult to visualize. For example, in ultrasound images of the Longissimus thoracis et lumborum muscle, the lateral and inferior borders often have poor resolution [9]. These difficult-to-visualize points interfere mainly with LEA and DLM measurements, which had the least satisfactory results regarding repeatability and reproducibility.
A low-frequency probe has a low resolution of surface tissue layers, for example, for subcutaneous fat measurement, whereas a high-frequency probe has a higher resolution at the surface and a lower penetration capacity [8]. Therefore, it is not possible to evaluate LEA from images obtained using an 8.0 MHz frequency. Furthermore, the quality of images collected in small ruminants can be affected by the narrow space between ribs, by the small muscle area [16], and by the presence of wool, which needs to be removed at the image collection site [17].
Repeatability and reproducibility tests evaluate not only the difference between assessments but also assessment precision and accuracy. The repeatability and reproducibility results indicate that when assessment accuracy and precision are fundamental, for example, in scientific studies, the use of the 8.0 MHz frequency is the most appropriate.
When the identity test was performed, the significance of the Mann-Whitney and Wilcoxon tests, which compare correlation coefficients and mean errors, was used to determine the similarity or identity between methods. This evaluation demonstrated a significant effect for all comparisons between the 3.5 and 8.0 MHz frequencies. Following the criteria established by Leite and Oliveira [15], we can conclude that measurements performed on images obtained using 3.5 or 8.0 MHz are not equivalent.
In the present study, both the linear correlation and the identity test showed that the different frequencies (3.5 and 8.0 MHz) produced different results, and it is not possible to use equations to predict equivalence between measurements performed on images obtained with 3.5 or 8.0 MHz frequencies. Considering the observed variations, it seems fundamental to scan the images for further evaluation in a quiet environment and with appropriate software, since the greatest evaluation accuracy can be obtained using specific software depending on the image resolution [8]. This helps to explain some results in which ultrasound measurements were unsatisfactory compared to subsequent assessments obtained directly on the carcass [18].

Conclusion
Ultrasound can be an asset for producers since it allows them to predict carcass and meat characteristics, facilitating their management. However, it is important to highlight that ultrasonographic image measurements obtained with a frequency of 8.0 MHz are more accurate and precise than those from images obtained using 3.5 MHz. The depth of the Longissimus thoracis et lumborum muscle, fat thickness, and fat thickness plus skin were more accurate and precise when 8.0 MHz was used. The identity test indicated no equivalence between measurements obtained with 3.5 and 8.0 MHz frequencies. Technical standardization requires either a single evaluator or a group of trained technicians who are continuously assessed, with sufficient training to reduce dispersion between evaluations and ensure reliable results.