Exploring the MobileNet V1 and MobileNet V2 models on the NVIDIA Jetson Nano microcomputer

This article studies the accuracy of the MobileNet V1 and MobileNet V2 models in recognizing pedestrians at different times of the year and at varying distances and positions. Based on the test results, the strengths and weaknesses of the models were identified, as well as their capabilities and dependencies. The models were compared directly with each other. All experiments were performed on an NVIDIA Jetson Nano microcomputer.


Introduction
Today it is difficult to imagine a world in which the roads are safe. Tragedies happen every day due to human carelessness, negligence and plain stupidity. Therefore, the creation of self-driving cars is one of the most important and promising areas of development, and corporations are investing billions in it. The prospect of protecting citizens, giving them the opportunity to rest and work on the road, getting rid of traffic jams and safely delivering goods in emergencies is very tempting. That is why research in this field is so important.
In recent decades, computer vision researchers have made tremendous efforts to enable computers to understand the world from video data. The most important task of computer vision is the automatic detection of objects. One of the most in-demand and, at the same time, most difficult tasks is the recognition of people. Accurate detection and localization of pedestrians could significantly improve road safety. It could also help take the next step in fields such as video surveillance, robotics and automation. This topic is of interest to both researchers and industry. In video surveillance, pedestrian detection provides basic information for people counting, event recognition and crowd monitoring. For intelligent driver assistance systems, pedestrian detection is an important part of the system for recognizing objects in the surrounding space [1][2][3].
The main purpose of this work is to compare the MobileNet V1 and MobileNet V2 models running on the NVIDIA Jetson Nano microcomputer, in order to determine the accuracy of pedestrian recognition under various conditions, and to draw conclusions from the results obtained.

NVIDIA Jetson Nano Developer Kit
The algorithm for pedestrian recognition was implemented and run on the NVIDIA Jetson Nano microcomputer [4]. This development kit provides the computing power and functionality to leverage modern artificial intelligence (AI) in a low-cost, low-power, easy-to-use platform.
This kit is supported by the comprehensive NVIDIA® JetPack™ SDK, which includes L4T (Linux for Tegra, a Linux distribution based on Ubuntu Desktop with NVIDIA drivers), AI and computer vision APIs and libraries, development tools, documentation, and code samples. The complete kit consists of a Jetson

Accuracy of pedestrian detection based on figure resolution and season
This section investigates how distance and time of year affect recognition accuracy. The resolution of the figure steadily decreases with increasing distance. The study was carried out by filming pedestrians at known distances from the camera. The quantities of interest are resolution, accuracy, and distance. In a stationary position at a distance of 5 m, as shown in Figure 1, the human figure measures 104 × 353 px in frontal and 73 × 345 px in profile shooting in summer. In winter photos, the resolution of the figure is 98 × 325 px for frontal shooting and 94 × 323 px for profile shooting. As the data show, the MobileNet V1 model recognizes a pedestrian well at a distance of 5 m. Recognition accuracy in summer is close to 100% for both frontal and profile shooting. On winter shots, the accuracy is slightly lower than on summer shots: 96.9% for frontal shots and 88.4% for profile shots.
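The relation between distance and figure resolution can be illustrated with a minimal sketch. It assumes an ideal pinhole camera, the same pedestrian and fixed camera settings, so that the figure height in pixels is inversely proportional to the distance. The function name scaled_height_px is hypothetical, and the values it prints are estimates derived from the 353 px frontal summer measurement at 5 m, not measurements from the study.

```python
def scaled_height_px(ref_height_px: float, ref_distance_m: float, distance_m: float) -> float:
    """Estimate figure height in pixels at distance_m under an ideal pinhole-camera model."""
    if ref_distance_m <= 0 or distance_m <= 0:
        raise ValueError("distances must be positive")
    # In a pinhole model the apparent size is inversely proportional to distance.
    return ref_height_px * ref_distance_m / distance_m


# Reference value from the text: 353 px at 5 m (frontal, summer).
# The distances below are chosen for illustration only.
for d in (5, 10, 15, 20, 25, 30):
    print(f"{d:>2} m: ~{scaled_height_px(353, 5, d):.0f} px")
```

Under these assumptions the figure would be roughly 71 px tall at 25 m and 59 px at 30 m, which gives a sense of how little image information remains at the distances where detection starts to fail.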
Similar studies on the same videos were carried out for the MobileNet V2 model (Figure 2). The MobileNet V2 model performed similarly, also doing well in pedestrian recognition in summer. Its recognition accuracy in winter amounted to 95.1% and 90%, which is close to that of MobileNet V1 (96.9% and 88.4%). It can be concluded that recognition of pedestrians in winter is more difficult for both models. Most likely this is due to the pedestrian's appearance: a down jacket and a hat reduce the recognition accuracy.
After this, the distance was gradually increased. Figure 3 shows how the accuracy decreases significantly with increasing distance. On summer shots, already at a distance of 25 m, it is equal to 72.2%, and at 30 m MobileNet V1 did not detect a pedestrian. On winter frames, the recognition accuracy at 25 m is 89.5%. This is higher than in the summer frames, and the pedestrian was still recognized, albeit poorly, at a distance of 30 m. The same studies were carried out for the MobileNet V2 model (Figure 4). The MobileNet V1 model does a better job of recognizing pedestrians in winter: it showed much better accuracy and was able to recognize a person at a distance of 30 m.
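What "did not detect a pedestrian" means in practice can be sketched as follows. This is an illustration under assumptions, not the code used in the study: it assumes that the reported accuracy is the detector's confidence score for the "person" class, that a detection counts only at or above a threshold of 0.5, and that detections arrive as (label, score) pairs. The function best_person_score and the data format are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical detection format: (class label, confidence in [0, 1]).
Detection = Tuple[str, float]


def best_person_score(detections: Sequence[Detection], threshold: float = 0.5) -> Optional[float]:
    """Return the highest 'person' confidence at or above the threshold, or None if there is none."""
    scores = [score for label, score in detections
              if label == "person" and score >= threshold]
    return max(scores) if scores else None


# Illustrative inputs only, not data from the study.
frame_near = [("person", 0.72), ("car", 0.90)]
frame_far = [("person", 0.31)]
print(best_person_score(frame_near))  # the pedestrian counts as detected
print(best_person_score(frame_far))   # None: below the threshold, treated as not detected
```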

Accuracy of detecting pedestrians crossing the road depending on the season
According to statistics, a large number of road accidents involving pedestrians occur when they cross the road in the wrong place. For this reason, the algorithm was tested for the ability to recognize a running person. Figure 6 shows a running pedestrian at distances of 5 m and 10 m in summer and winter, detected by the MobileNet V1 model. Similar studies were carried out for the MobileNet V2 model (Figure 8). The result of this testing was uninterrupted recognition on summer frames with very high accuracy at a distance of 5 m. The frame-by-frame accuracy for a person in motion at a distance of 5 m is shown in Figure 9. However, as can be seen from the figure, the recognition accuracy in winter frames is significantly lower than in summer ones. For winter shots at a distance of 5 m, the average accuracy is 88.3% versus 94.9% for summer ones. With an increase in the distance to 10 m, the average accuracy drops sharply in frames of both seasons: winter shots have an average accuracy of 66.6%, and summer ones 60.3%. The increase in distance significantly affected both the accuracy and its stability, as the overall shape of the graph shows. The recognition accuracy decreased noticeably, and in several places it even fell below the threshold value of 50%. It can be concluded that distance plays a role in recognizing a person crossing the road.
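The frame-by-frame figures discussed above can be summarized with a short sketch. It assumes that each frame yields one accuracy value in percent and uses the 50% threshold mentioned in the text; the function names summarize and seasonal_gap and the example per-frame values are hypothetical, while the seasonal averages passed to seasonal_gap are the ones reported above.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize(scores: Sequence[float], threshold: float = 50.0) -> Dict[str, float]:
    """Return the average per-frame score (%) and the share of frames below the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    below = sum(1 for s in scores if s < threshold)
    return {"mean": mean(scores), "share_below": below / len(scores)}


def seasonal_gap(summer_avg: float, winter_avg: float) -> float:
    """Summer minus winter average accuracy, in percentage points (rounded to 0.1)."""
    return round(summer_avg - winter_avg, 1)


# Hypothetical per-frame scores, for illustration only (not measured data).
example = [80.0, 40.0, 60.0, 100.0]
print(summarize(example))

# Averages reported in the text above.
print(seasonal_gap(94.9, 88.3))  # 5 m
print(seasonal_gap(60.3, 66.6))  # 10 m
```

With the reported averages, summer exceeds winter by 6.6 percentage points at 5 m, while at 10 m the sign flips and winter exceeds summer by 6.3 percentage points.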

Conclusion
The study of the implemented algorithms leads to the following conclusions. The resolution of the figure image significantly affects the recognition accuracy. With an increase in the distance to 30 m, MobileNet V1 was no longer able to recognize a pedestrian in summer. The MobileNet V2 model showed a similar result. The older model showed slightly better accuracy at several distances. In winter, at a distance of 30 m, only MobileNet V1 recognized a pedestrian, although with a poor level of accuracy. Undoubtedly, the clothes a person is wearing and the background have a strong impact on recognition accuracy. Winter clothing is unfamiliar to the models and reduces accuracy. With an increase in the distance, its influence will decrease, and the effect of the background will increase.
Both models do a fairly good job of recognizing a running pedestrian in summer at a distance of 5 m. However, on winter footage the recognition becomes less effective. The difference is especially large for MobileNet V1, where the average accuracy in summer and winter differs by almost 14 percentage points. For MobileNet V2 the difference is less noticeable: the average accuracy of winter frames is lower by 6.6 percentage points. With an increase in the distance to 10 m, the average accuracy of MobileNet V2 on winter frames turned out to be even higher than on summer ones, by 6.3 percentage points, while the average recognition accuracy of MobileNet V1 for winter frames at 10 m was a modest 57.8%. MobileNet V2 does a much better job of recognizing pedestrians running across the road in winter, and in summer frames the accuracy of the two models is approximately equal.
Summing up, we can say that the MobileNet V1 model is still relevant for practical use. It is practically in no way inferior to its newer version, especially in performance. The model is very promising for further improvements.