Mobile Terminal Video Image Fuzzy Feature Extraction Simulation Based on SURF Virtual Reality Technology

Extracting the fuzzy features of mobile video images can effectively improve the quality of low-illumination images. Traditional methods construct fuzzy feature indexes of mobile terminal video images and divide the detailed information of the video images, but they ignore the bidirectional matching of feature points, which leads to low extraction accuracy. Therefore, this paper proposes a method for extracting fuzzy features of mobile terminal video images based on SURF-based virtual reality technology. First, grayscale extraction is performed on the input mobile terminal video image, and the closed areas in the image are detected as the radiation-invariant areas of the terminal video image. Second, the Hessian matrix is used to detect the feature points of the image, and non-maximum suppression and interpolation are used to find and locate the extreme points. Then, the main direction of each feature point is determined, and the SURF description operator is used for matching to obtain the initial matching point pairs. Finally, the one-way matching result of the fuzzy features of the video image is matched in both directions, the nearest-distance ratio is used to match the feature points, and the full constraint condition is used to filter out wrong matching point pairs, thereby completing the mobile terminal video image fuzzy feature extraction. The experimental results show that the proposed algorithm performs well in feature extraction and matching, stability, and speed. The misrecognition rate of the proposed algorithm is 0.101 and the running time is only 0.41 s, which meets the real-time requirements.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Zhihan Lv.

Virtual reality technology can combine processed information with images from real life, giving people the illusion of being in a simulated environment [1], [2]. As a synthesis of various technologies, virtual reality is widely used in medical science, entertainment, military aerospace, interior design, and other fields [3], [4]. Computer video image processing technology lays a solid foundation for the complex analysis, decision-making, and management required by intelligent mobile devices that apply virtual reality technology [5]. Because mobile devices are designed with a strong video image processing system at the development stage and have an embedded operating structure, touch display, GPS positioning, and video recording functions, they provide the basic conditions for extending virtual reality systems to the mobile terminal [6], [7]. When current video image extraction methods are used to extract video images of mobile terminals, pixels with low probability may be excessively merged because of insufficient lighting, which easily reduces the number of gray levels of the video image and loses detailed information. How to effectively improve the quality of low-illumination images has therefore become a major problem to be solved urgently in mobile device video image processing research and development, and it has received widespread attention.
References [8]-[10] proposed the use of the wavelet transform to achieve fast image matching, which offers good time-frequency localization, multi-resolution analysis, and suppression of high-frequency noise, but the image feature extraction is strongly affected by the selected wavelet basis. Tang et al. [11] proposed a Harris corner detection feature matching algorithm, which extracts corners directly from the gray image and has high stability and sensitivity in L-shaped corner detection. However, its calculation speed is slow, and it suffers from corner information loss, position shift, and clustering. On this basis, Seada et al. [12], [13] improved the Harris matching algorithm, but the improved algorithm cannot adapt to changes in image scale. Wang et al. [14] proposed a SIFT algorithm. The method finds extremum points in the scale space and extracts features that are invariant to position, scale, and rotation. However, because the SIFT algorithm does not consider the geometric constraint information of the space, its mismatch rate is high and obvious mismatches are prone to appear. Wang et al. [15] proposed a SURF extraction algorithm. The method is an improvement on the SIFT feature. Through the combination of Harris feature and integral image, the running speed of the program is greatly accelerated. It maintains invariance under both image scale and affine transformation and has better robustness, but it also has the problem of a high false match rate. He et al. [16], [17] combined SURF with other algorithms to improve it further and achieved good results. Li et al. [18] proposed a fuzzy feature extraction method for mobile terminal video images based on video image matching. Hashemzadeh et al. [19] proposed a method for extracting fuzzy features of video images of mobile terminals based on video image denoising.
This method first divides the detailed information of the video image into smooth areas, edges, corners, and isolated noise points by constructing a fuzzy feature index and a gradient variation index of the mobile terminal video image. The fuzzy feature value of the diffusion tensor is then set according to the result of this division. This method has a good denoising effect on video images, but its extraction process takes too long. Al-wswasi et al. [20] proposed a method for extracting fuzzy features of mobile terminal video images based on fuzzy feature blocks of the video image. This method constructs a fractional differential mask operator for the mobile terminal video image, defines the fractional order according to the result of the fuzzy feature blocks, and forms a fractional-order matrix. On this basis, the fractional-order matrix is substituted into the mask operator to complete the extraction of fuzzy features of the video image of the mobile terminal. This method only extracts the details of the terminal video image to a greater extent, and its extraction effect is not ideal when the mobile terminal video image contains holes.
This paper proposes a method for mobile terminal video image fuzzy feature extraction based on SURF-based virtual reality technology. First, grayscale extraction is performed on the input mobile terminal video image, and the closed areas in the image are detected as the radiation-invariant areas of the terminal video image. Second, the Hessian matrix and the SURF description operator are used for matching to obtain the initial matching point pairs. Finally, the one-way matching result of the fuzzy features of the video image is matched in both directions, which completes the fuzzy feature extraction of the mobile video image.

II. THE PRINCIPLE OF VIDEO IMAGE EXTRACTION FOR MOBILE TERMINALS

A. VIDEO IMAGE FEATURE EXTRACTION
Image feature recognition refers to the use of computers to recognize iconic attribute information in an image. These attributes include natural features and artificial features [21]. Natural features are direct image features such as the color, brightness, and texture of the image, and artificial features are image features obtained indirectly through calculation, such as the histogram and frequency spectrum of the image. Feature extraction analyzes and processes the iconic attribute information of the image and extracts, as image features, the relatively stable information that is independent of random factors such as illumination, angle, and field of view [22].
In the process of extracting fuzzy features of mobile terminal video images under virtual reality technology, a detection algorithm is used to detect the local fuzzy features of the images. After the fuzzy feature points are obtained, the fuzzy features are described. Non-maximum suppression is used to eliminate the non-fuzzy feature points of the mobile terminal video image. Finally, the fuzzy feature vectors are analyzed for classification and matching. The corresponding parameters are selected for the fuzzy feature extraction processing so that the extraction algorithm matches adaptively. The specific process is as follows:
1) Assume that the gray value of the pixel to be detected differs markedly from the gray values of the candidate points around it. Taking a point of the two-dimensional mobile terminal video image as the center of a circle, the following formula describes the fuzzy feature point of the mobile terminal video image:

$$\begin{cases} \text{darker } (d): & I_{gray} \le I_{center} - t \\ \text{similar } (s): & I_{center} - t < I_{gray} < I_{center} + t \\ \text{brighter } (b): & I_{center} + t \le I_{gray} \end{cases} \quad (1)$$

In the formula, $t$ represents the threshold, $I_{gray}$ is the gray level at any point on the circle, and $I_{center}$ is the gray level at the center of the circle. When the gray value $I_{gray}$ of a pixel on the surrounding circle is no greater than $I_{center} - t$, the pixel is considered darker and is labeled $d$. The similar label $s$ and the brighter label $b$ are derived on the same basis.
2) Obtain candidate fuzzy feature points by comparing the gray values of the pixels on a circle of fixed radius in the mobile terminal video image. If $I_{center}$ corresponds to a fuzzy feature point, the result is true; otherwise it is false. In this way, it is determined whether a pixel of the mobile terminal video image is a fuzzy feature point.
3) Finally, train the fuzzy feature points with a decision tree and calculate the corner response function $F_V$. Non-maximum suppression is then used to eliminate the non-fuzzy feature points of the mobile terminal video image. Figure 1 shows the framework for feature extraction of mobile terminal video images.
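The three-way comparison in formula (1) and the candidate test in step 2) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 16-pixel ring, and the requirement of 9 contiguous darker or brighter pixels (`min_run`) are borrowed from the common FAST-9 detector, and the brighter case is taken to be I_gray >= I_center + t.

```python
def classify_pixel(i_gray, i_center, t):
    """Label one ring pixel relative to the center pixel, following formula (1)."""
    if i_gray <= i_center - t:
        return "d"  # darker
    if i_gray >= i_center + t:
        return "b"  # brighter (assumed form of the third case)
    return "s"      # similar


def is_candidate(ring, i_center, t, min_run=9):
    """Return True if min_run contiguous ring pixels are all darker or all brighter.

    ring holds the gray values on the circle, in circular order.
    min_run=9 (out of 16) is an assumption taken from FAST-9.
    """
    if min_run > len(ring):
        return False
    labels = [classify_pixel(g, i_center, t) for g in ring]
    doubled = labels + labels  # lets a run wrap around the end of the circle
    for target in ("d", "b"):
        run = 0
        for label in doubled:
            run = run + 1 if label == target else 0
            if run >= min_run:
                return True
    return False
```

A flat neighborhood such as `is_candidate([100] * 16, 100, 20)` yields no candidate, while a ring with a long bright arc does.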

B. SIFT DESCRIPTION
SIFT features are based on the local appearance of the object of interest and are independent of the size and rotation of the image [23], [24]. They also have a high tolerance for light, noise, and small changes in viewing angle. Because of these characteristics, they are highly distinctive and relatively easy to capture. In a large feature database, objects are easily identified and rarely misidentified. The detection rate of partially occluded targets using SIFT feature descriptions is also high, and as few as three SIFT target features are sufficient to calculate location and orientation. With fast hardware and a small feature database, recognition can run close to real time. SIFT features carry a large amount of information, which makes them suitable for fast and accurate matching in massive databases. Figure 2 is a diagram of the SIFT feature extraction process. SIFT feature detection mainly includes the following four steps [25], [26]: (1) extremum detection in scale space; (2) positioning of key points; (3) direction determination; (4) description of key points.

C. SURF DESCRIPTION
Harris feature point detection is sensitive to rotation, scale, and illumination [27]. The SIFT feature extraction method has the advantages of good robustness and of invariance to rotation, scale, and illumination. However, its disadvantage is that it is slow and not suitable for real-time processing of moving target images. Compared with SIFT feature extraction, the SURF feature point detection method maintains the accuracy and robustness of SIFT feature point extraction while improving the speed, and it is widely used in feature extraction. Therefore, this paper adopts the SURF feature point extraction method [28]. The Hessian matrix is as follows:

$$H(f(x, y)) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\ \dfrac{\partial^2 f}{\partial x \partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$

Here, the elements of the Hessian matrix are the second-order partial derivatives of the image function $f$ with respect to $x$ and $y$. Writing $D_{xx}$, $D_{yy}$, and $D_{xy}$ for these derivatives, a weight of 0.9 is applied to the mixed term when calculating the determinant of the Hessian matrix, namely:

$$\det(H) = D_{xx} D_{yy} - (0.9\, D_{xy})^2$$

The weight of 0.9 is chosen empirically, and it maintains the gradient characteristics of the image well.
The magnitude of the determinant reflects the significance of the feature in the image, and a threshold can be set to filter out points whose determinant is below the threshold. Before this, Gaussian filtering at multiple scales, corresponding to different degrees of blur, should be applied to the input image to build an image pyramid. Similar to the SIFT algorithm, SURF also detects feature points through non-maximum suppression. Next, the feature points are described to facilitate subsequent registration and other processing. For each detected feature point, first take the current pixel as the center, establish an N×N window in its neighborhood, and determine the main direction of the feature point. Second, the window is divided into multiple sub-windows, from which the descriptor of the SURF feature is obtained. The schematic diagram of this process is shown in Figure 3.
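The weighted determinant and the threshold filtering described above can be illustrated with a small pure-Python sketch. It is not the SURF implementation itself: the second derivatives are approximated here with simple central differences on a single scale rather than with Gaussian or box filters on an image pyramid, and the function names and the `threshold` parameter are assumptions made for illustration.

```python
def second_derivatives(img, r, c):
    """Central-difference estimates of the second derivatives at pixel (r, c)."""
    dxx = img[r][c + 1] - 2 * img[r][c] + img[r][c - 1]
    dyy = img[r + 1][c] - 2 * img[r][c] + img[r - 1][c]
    dxy = (img[r + 1][c + 1] - img[r + 1][c - 1]
           - img[r - 1][c + 1] + img[r - 1][c - 1]) / 4.0
    return dxx, dyy, dxy


def hessian_response(dxx, dyy, dxy, weight=0.9):
    """Determinant of the Hessian with the empirical 0.9 weight on the mixed term."""
    return dxx * dyy - (weight * dxy) ** 2


def strong_points(img, threshold):
    """Keep interior pixels whose response exceeds the threshold."""
    points = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            response = hessian_response(*second_derivatives(img, r, c))
            if response > threshold:
                points.append((r, c, response))
    return points
```

On a 5 × 5 image that is zero everywhere except for a single bright pixel in the middle, only the center survives a threshold of 10, which matches the intuition that the determinant peaks on blob-like structures.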

III. A SURF-BASED VIDEO IMAGE FUZZY FEATURE EXTRACTION ALGORITHM
This paper proposes a method for extracting fuzzy features of mobile terminal video images based on SURF-based virtual reality technology. First, grayscale extraction is performed on the input mobile terminal video image, and the closed areas in the image are detected as the radiation-invariant areas of the terminal video image. Second, the Hessian matrix and the SURF description operator are used for matching to obtain the initial matching point pairs. Finally, the one-way matching result of the fuzzy features of the video image is matched in both directions, thereby completing the mobile terminal video image fuzzy feature extraction. Figure 4 shows the flow chart of the algorithm.

A. RADIATION-INVARIANT CLOSED AREA EXTRACTION OF MOBILE TERMINAL IMAGES
First, the gray histogram equalization method is applied to the input mobile terminal video image. The image is converted to grayscale from its R, G, and B components, which are denoted by the variables $I_{FR}$, $I_{FG}$, and $I_{FB}$. The probability of appearance of a pixel with gray level $i$ in the video image of the mobile terminal can be expressed as:

$$pro(i) = \frac{n_i}{Num}, \quad i = 0, 1, \ldots, N - 1$$

In the formula, $n_i$ is the number of pixels with gray level $i$, the variable $N$ represents the number of all gray levels in the mobile terminal's grayscale video image, and the variable $Num$ represents the number of all pixels in the mobile terminal's video image.
Assuming that the variable $A$ represents the cumulative probability distribution function corresponding to the variable $pro$, $A$ can be defined as:

$$A(i) = \sum_{k=0}^{i} pro(k)$$

For the mobile terminal grayscale video image IG, the gray range after grayscale processing is the interval $I_{G\min} \sim I_{G\max}$, and the output video image Ig is obtained by histogram equalization of IG according to the cumulative probability distribution function. After equalization, the gray distribution of the mobile terminal video image is relatively uniform, and the histogram of the image also shows a relatively regular trend. After this extraction process, most of the interference areas in the original terminal video image are removed, and the effect of noise is reduced to a certain extent. Applying different thresholds $t_i$ to the mobile terminal video image Ig yields multiple binarized mobile terminal video images $Ib_i$, each obtained by comparing the gray value Ig(x, y) of every pixel with $t_i$. The variable $t_i$ determines the number of regions in $Ib_i$. Different gray levels can be selected to obtain different mobile terminal video image blocks Ig(x, y). The number of closed regions in the corresponding terminal video images also differs, and the radiation-invariant region can be extracted from $Ib_i$.

VOLUME 8, 2020
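The equalization and binarization steps can be sketched as follows. Several details are assumptions made for illustration because the corresponding formulas are not recoverable from the text: the output is mapped as I_Gmin + (I_Gmax - I_Gmin) · A(i), a pixel is set to 1 when its gray value is at least the threshold, and the function names are hypothetical.

```python
def equalize(gray, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    flat = [p for row in gray for p in row]
    num = len(flat)                      # Num: total number of pixels
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    pro = [h / num for h in hist]        # pro(i): probability of gray level i
    cdf, acc = [], 0.0
    for p in pro:                        # A(i): cumulative distribution
        acc += p
        cdf.append(acc)
    g_min, g_max = min(flat), max(flat)
    # Assumed mapping: stretch the cumulative distribution over [g_min, g_max].
    return [[round(g_min + (g_max - g_min) * cdf[p]) for p in row] for row in gray]


def binarize(gray, t):
    """Assumed thresholding rule: 1 where the gray value is at least t, else 0."""
    return [[1 if p >= t else 0 for p in row] for row in gray]
```

Calling `binarize` with several thresholds on the equalized image gives the family of binary images from which closed regions are later extracted.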
Remove the regions connected to the frame of the mobile terminal video image in $Ib_i$, as well as the regions whose area is smaller than $\delta$, and mark the remaining mobile terminal video image as $Ib'_i$. The number of regions in $Ib'_i$ is obviously less than $b_i$, the number of regions in $Ib_i$; let $Ib'_i$ contain $m$ regions. By calculating the area $area_j$ of the $j$-th region, the total area of the regions, $area$, is expressed as:

$$area = \sum_{j=1}^{m} area_j$$

In the formula, $area_m$ represents the area of the $m$-th region. Let $Weight_j$ be the area weight of the $j$-th region of the video image of the mobile terminal; then $Weight_j$ can be defined as:

$$Weight_j = \frac{area_j}{area}$$

The average area $avg\_area$ of the $m$ regions can be calculated according to the area weight of each region of the video image of the mobile terminal:

$$avg\_area = \sum_{j=1}^{m} Weight_j \cdot area_j = \sum_{j=1}^{m} \frac{area_j^2}{area} \quad (12)$$

In the formula, the variable $Weight_j$ is the area weight of the $j$-th region of the video image of the mobile terminal. The value of $\delta$ is then calculated for the remaining part, and noise or closed regions with an area less than $\delta$ are eliminated.
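As a small worked example of formula (12), the sketch below computes the area weights and the weighted average area for a list of region areas and then drops small regions. The function names are hypothetical, and using the average area itself as the value of δ in the usage line is purely an illustrative assumption, since the text does not spell out how δ is derived.

```python
def area_weights(areas):
    """Weight_j = area_j / area, where area is the total area of all regions."""
    total = sum(areas)
    return [a / total for a in areas]


def average_area(areas):
    """avg_area = sum_j Weight_j * area_j = sum_j area_j^2 / area, as in formula (12)."""
    total = sum(areas)
    return sum(a * a for a in areas) / total


def remove_small_regions(areas, delta):
    """Eliminate noise and closed regions whose area is less than delta."""
    return [a for a in areas if a >= delta]


areas = [10, 30, 60]
delta = average_area(areas)          # illustrative choice of delta (assumption)
kept = remove_small_regions(areas, delta)
```

For the areas 10, 30, and 60 the total is 100, so the weighted average is (100 + 900 + 3600) / 100 = 46, and only the region of area 60 is kept.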

B. SURF FEATURE POINT DETECTION
The Hessian detection operator is first used to detect the fuzzy feature points in the radiation-invariant closed regions of the mobile terminal video image and to generate the SURF fuzzy feature description vectors. Second, the initial matching point pairs of the video image of the mobile terminal are obtained through the fast approximate nearest neighbor search algorithm. Then, two-way matching is performed on the resulting one-way matching result of the fuzzy features of the mobile terminal video image. Finally, the mean distance error between fuzzy feature points is computed and used to eliminate mismatched points.
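The idea of turning a one-way match into a two-way match can be sketched as a mutual nearest-neighbor check. This is a simplified illustration under stated assumptions: an exhaustive search with Euclidean distance stands in for the fast approximate nearest neighbor search named above, descriptors are plain tuples of numbers, and the function names are hypothetical.

```python
import math


def nearest(desc, candidates):
    """Index of the candidate descriptor closest to desc (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda j: math.dist(desc, candidates[j]))


def two_way_matches(desc_a, desc_b):
    """Keep the pair (i, j) only if j is the best match of i AND i is the best match of j."""
    a_to_b = [nearest(d, desc_b) for d in desc_a]   # one-way result, image A to B
    b_to_a = [nearest(d, desc_a) for d in desc_b]   # one-way result, image B to A
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```

With `desc_a = [(0, 0), (10, 10), (0.4, 0)]` and `desc_b = [(0.1, 0), (10, 9)]`, the third descriptor of A also points at the first descriptor of B in the one-way result, but it is discarded because that descriptor of B prefers the first descriptor of A.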
Feature extraction and matching determine whether the matched features reflect the true position of a point in space. The extraction of feature points depends on the characteristics of the acquired image and on the matching method of the feature points. Commonly used features include corner features, line features, local area features, and invariant features [29]-[31]. Considering the influence of complex background environments and lighting factors, this paper adopts the SURF algorithm, which extracts feature points well when the illumination and the image change to a certain extent. Its scale invariance is better than that of the Harris algorithm, and its time complexity is lower than that of SIFT.
The maximum value of the matrix determinant is used to describe the location of the block fuzzy feature structure of the mobile terminal video image. The Hessian matrix is composed of the partial derivatives of a function $f(x, y)$ and can be expressed as:

$$H(f(x, y)) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\ \dfrac{\partial^2 f}{\partial x \partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$

The Hessian matrix discriminant is given by:

$$\det(H) = \frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x \partial y}\right)^2$$

All fuzzy feature points of mobile terminal video images are classified by the sign of the discriminant: whether its value is positive or negative determines whether the point is an extremum point. A second-order standard Gaussian function is selected as the mobile terminal video image filter, and the second-order partial derivatives of the image are calculated by convolution with specific kernels. The three matrix elements at scale $Y$ are thus calculated, and the matrix is given by:

$$Hessian(X, Y) = \begin{bmatrix} \partial_{xx}(X, Y) & \partial_{xy}(X, Y) \\ \partial_{xy}(X, Y) & \partial_{yy}(X, Y) \end{bmatrix}$$

In the formula, the variable $X'$ represents the Gaussian function, and $\partial_{xx}(X, Y)$ represents the convolution of the second derivative of the Gaussian, $\frac{\partial^2}{\partial x^2} X'$, with the mobile terminal video image IF at point $X$. The elements $\partial_{xy}(X, Y)$ and $\partial_{yy}(X, Y)$ are defined analogously from the Gaussian-filtered convolution of the terminal video image in the mixed and $y$ directions.
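The approximate value of the matrix discriminant can be calculated with the following formula:

$$\det(Hessian_{approx}) = D_{xx} D_{yy} - (\omega D_{xy})^2$$

Here $D_{xx}$, $D_{yy}$, and $D_{xy}$ denote the filter-template approximations of $\partial_{xx}$, $\partial_{yy}$, and $\partial_{xy}$, and the weight coefficient $\omega$ is defined as 0.9. According to the Taylor expansion of the response around a candidate point, we can obtain:

$$Hessian(X) = Hessian + \frac{\partial Hessian^{T}}{\partial X} X + \frac{1}{2} X^{T} \frac{\partial^2 Hessian}{\partial X^2} X$$

When the derivative of this expansion with respect to $X$ is zero, we get:

$$\hat{X} = -\left(\frac{\partial^2 Hessian}{\partial X^2}\right)^{-1} \frac{\partial Hessian}{\partial X}$$

The derivatives can be approximated by the differences between neighboring pixels of the video image of the mobile terminal. To cover all scales, there is overlap between groups, as shown in Figure 5(a), which shows the division of each group corresponding to different scale spaces. Figure 5(b) corresponds to the 9 × 9 filter template in the first group of the scale space, and the scale increases from the bottom to the top of the inverted pyramid.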
Each candidate feature point given by the Hessian matrix is compared with its 26 neighbors in the 3 × 3 × 3 neighborhood of scale space centered on it. It is a local extreme point only if it is greater than or less than all of these values. Interpolation in scale space and image space then gives the final feature point position and scale value. For the feature to have better rotation invariance, each feature point must be given a principal direction. Then, centering on the feature point, the coordinate axes are rotated to the main direction, a square area of 20s × 20s is selected (s is the scale of the feature point), and it is divided into 16 square sub-windows, each with a side length of 5s. The Haar wavelet responses are then computed with a Haar wavelet template of size 2s and weighted with a Gaussian. Finally, in each sub-window, the Haar wavelet responses in the x and y directions are summed to form a four-dimensional vector, and the vectors are normalized to form the 16 × 4 = 64-dimensional SURF description operator. Table 1 compares the positions of the SURF feature points in the original image obtained after rotating the image by 0°, 90°, 180°, and 270°. The target feature points extracted after rotation and those extracted without rotation almost overlap visually, and their numbers are the same. It can be seen that the SURF description operator not only has scale and rotation invariance, but also yields the same feature vector when there is no noise interference.
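The 26-neighbor extremum test and the 16 × 4 layout of the descriptor can be illustrated as follows. This is a minimal sketch under assumptions, not a full SURF implementation: the response stack and the 20 × 20 grids of Haar responses are assumed to be precomputed, the four components per sub-window are taken to be the sums of dx, dy, |dx|, and |dy| as in the standard SURF formulation, Gaussian weighting is omitted, and the function names are hypothetical.

```python
import math


def is_local_extremum(stack, s, r, c):
    """Compare stack[s][r][c] with its 26 neighbors in the 3 x 3 x 3 neighborhood."""
    center = stack[s][r][c]
    neighbors = [stack[s + ds][r + dr][c + dc]
                 for ds in (-1, 0, 1) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (ds, dr, dc) != (0, 0, 0)]
    return all(center > n for n in neighbors) or all(center < n for n in neighbors)


def surf_descriptor(dx, dy):
    """Build a 64-dimensional vector from 20 x 20 grids of Haar responses.

    The window is split into 4 x 4 sub-windows of 5 x 5 samples; each sub-window
    contributes (sum dx, sum dy, sum |dx|, sum |dy|) (assumed composition).
    """
    vec = []
    for br in range(0, 20, 5):
        for bc in range(0, 20, 5):
            cells = [(dx[r][c], dy[r][c])
                     for r in range(br, br + 5) for c in range(bc, bc + 5)]
            vec += [sum(x for x, _ in cells), sum(y for _, y in cells),
                    sum(abs(x) for x, _ in cells), sum(abs(y) for _, y in cells)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec
```

Normalizing the vector to unit length is what makes the descriptor insensitive to uniform contrast changes.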

C. MATCHING OF FUZZY FEATURE POINTS IN VIDEO IMAGES OF MOBILE TERMINALS
The accuracy of feature point extraction directly affects the matching result. In addition, in image matching, factors such as distortion and feature inconsistency may cause mismatches during the matching process. To reduce mismatches, an appropriate measurement method needs to be selected. This paper uses a similarity measurement method. Let n1 and n2 be the numbers of feature points of images A and B, respectively, and let $A_j$ and $B_j$ be arbitrary feature points of images A and B, respectively; the distance similarity $Similar(A_j, B_j)$ between $A_j$ and $B_j$ is then given by equation (20), in which the variable m is the dimension of the feature vector. The smaller $Similar(A_j, B_j)$ is, the higher the similarity of the feature points, but relying on this measure alone is still likely to cause mismatches.
For this reason, this paper uses the ratio of the nearest distance to the second-nearest distance to find the most distinctive feature points within the set threshold. The method is defined as follows: let Similar1 be the nearest distance (dimensionless), Similar2 the second-nearest distance, and η the set threshold (0 < η ≤ 1); a match must satisfy the condition shown in equation (21).
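The nearest to second-nearest distance ratio can be sketched as below. Because equations (20) and (21) are not reproduced in the text, two assumptions are made and should be read as such: the distance similarity is taken to be the Euclidean distance over the m descriptor dimensions, and the acceptance condition is taken to be Similar1 / Similar2 ≤ η. The function name is hypothetical, and the default η = 0.9 follows the ratio chosen in the experiments.

```python
import math


def ratio_matches(desc_a, desc_b, eta=0.9):
    """Accept (i, j) when nearest / second-nearest distance is at most eta."""
    matches = []
    for i, a in enumerate(desc_a):
        # Sort candidates in image B by their distance to descriptor a.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) < 2:
            continue                      # the ratio needs two candidates
        (similar1, j1), (similar2, _) = dists[0], dists[1]
        if similar2 > 0 and similar1 / similar2 <= eta:
            matches.append((i, j1))
    return matches
```

A descriptor whose two best candidates are almost equally close is rejected as ambiguous, which is the purpose of the ratio: `ratio_matches([(0, 0)], [(1, 0), (1.05, 0)])` returns an empty list, while `ratio_matches([(0, 0)], [(1, 0), (10, 0)])` returns the pair (0, 0).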
In this paper, the local epipolar constraint, the left and right coordinate constraint, and the uniqueness constraint are integrated to improve the correct matching rate.
In practical applications, the x-axes of a parallel binocular camera are not exactly collinear, and there is a slight deviation. Therefore, the epipolar constraint cannot be used directly when matching the left and right target points, and a local epipolar constraint is proposed instead. That is, the search area for matching the left and right target points is limited along the longitudinal y axis, as shown in equation (22).
Here, $y_{left}$ and $y_{right}$ are the ordinates of the left and right matching points, respectively. If the threshold is too small, the constraint degenerates into the epipolar constraint, which causes a large number of correct matching point pairs to be filtered out; if the threshold is too large, incorrect matching point pairs cannot be filtered out and the time consumption increases.
The left and right coordinate constraint states that the x coordinate of the target point in the left image is greater than the corresponding x coordinate in the right image [32], and the difference can be expressed as equation (23).
Here, the variable L is the z coordinate of the target point in the camera coordinate system, Z is the length of the baseline, V is the focal length, s is the physical size of a pixel, $x_{left}$ is the x coordinate of the target point in the left image, and $x_{right}$ is the corresponding x coordinate of the target point in the right image. The physical quantities on the right side of equation (23) are all positive. Therefore, formula (24) holds, which is the coordinate constraint of the left and right target points:

$$x_{left} > x_{right} \quad (24)$$

The principle of the uniqueness constraint is that if a feature point in the left image has a matching point in the right image, that matching point is unique. Combining formulas (22) and (24) with the uniqueness constraint greatly reduces the search area for target point matching.
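The combined use of the three constraints can be sketched as a simple filter over candidate matches. The sketch rests on assumptions that are not confirmed by the text: equation (22) is taken to be |y_left - y_right| ≤ threshold, formula (24) is taken to compare the x coordinates of the left and right points, uniqueness is enforced greedily in the order the matches arrive, and the function name and the (x, y) tuple layout are hypothetical. The default threshold of 20 follows the value selected in the experiments.

```python
def full_constraint_filter(matches, left_pts, right_pts, threshold=20):
    """Filter (i, j) index pairs using the local, coordinate, and uniqueness constraints.

    left_pts and right_pts are lists of (x, y) pixel coordinates.
    """
    kept, used_left, used_right = [], set(), set()
    for i, j in matches:
        x_left, y_left = left_pts[i]
        x_right, y_right = right_pts[j]
        if abs(y_left - y_right) > threshold:
            continue                      # violates the local constraint (assumed form of (22))
        if x_left <= x_right:
            continue                      # violates the left and right coordinate constraint (24)
        if i in used_left or j in used_right:
            continue                      # violates uniqueness (greedy, first match wins)
        used_left.add(i)
        used_right.add(j)
        kept.append((i, j))
    return kept
```

In practice the candidate matches would usually be sorted by descriptor distance first, so that the greedy uniqueness step keeps the best pair rather than the first one encountered.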

D. IMAGE FEATURE RECOGNITION BASED ON HMM MODEL
The hidden Markov model (HMM) is selected for feature classification, and the more obvious boundary features in the image are selected as image features. For the image boundary, a method based on the Euclidean distance from the image centroid to the boundary is proposed, which divides the boundary from 0 to 180 degrees into 90-dimensional features. In this process, the centroid coordinates need to be calculated. The HMM is a set of states connected by transitions. Each transition step is associated with a transition probability and an output probability. The output probability is the probability of producing a particular output value, out of the many possible values, in a specific state. The transition probability is the probability of moving from one state to another. The HMM is set up as follows: the set of N states is $\{R_1, R_2, \ldots, R_N\}$, and the state at time u is expressed by the random variable $H_u$. The set of M intuitive (observable) values is $\{r_1, r_2, \ldots, r_M\}$, the intuitive sequence at time u is expressed by the random variable $J_u$, and the intuitive sequence corresponds to the output values of the HMM's observable variables.
The N×N state transition probability distribution matrix is $P = \{O_{ij}\}$, and the probability expression is:

$$O_{ij} = \Pr(H_{u+1} = R_j \mid H_u = R_i), \quad 1 \le i, j \le N$$

The intuitive (observation) probability distribution matrix is $S = \{O_j(k)\}$, where the variable $O_j(k)$ represents the probability of outputting $r_k$ in the state $R_j$ at time u:

$$O_j(k) = \Pr(J_u = r_k \mid H_u = R_j), \quad 1 \le j \le N, \ 1 \le k \le M$$

The original state distribution is $M = \{M_i\}$, where the variable $M_i$ is the probability that the original state is $R_i$:

$$M_i = \Pr(H_1 = R_i), \quad 1 \le i \le N$$

In summary, the hidden Markov model can be expressed by the parameter set $(P, S, M)$. The purpose of image feature recognition through the HMM is to make the recognition more accurate while also reducing the time consumed by redundant steps and improving the computational efficiency.
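One common way to use such a model for classification is to score an observation sequence under each class model with the forward algorithm and pick the class with the highest likelihood. The text does not state which scoring procedure is used, so the sketch below is an assumption for illustration; the function names, the dictionary of per-class models, and the encoding of observations as integer indices are hypothetical.

```python
def forward_probability(P, S, M, observations):
    """Likelihood of an observation sequence under the model (P, S, M).

    P[i][j]: transition probability from state i to state j
    S[j][k]: probability of observing symbol k in state j
    M[i]:    probability that the initial state is i
    """
    n = len(M)
    alpha = [M[i] * S[i][observations[0]] for i in range(n)]
    for k in observations[1:]:
        alpha = [sum(alpha[i] * P[i][j] for i in range(n)) * S[j][k]
                 for j in range(n)]
    return sum(alpha)


def classify(models, observations):
    """Return the name of the model that assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_probability(*models[name], observations))
```

For long sequences the products underflow, so a practical version would rescale alpha at each step or work with log probabilities.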

IV. EXPERIMENTAL VERIFICATION

A. EXPERIMENTAL DATA DESCRIPTION
The experiment was carried out in the Eclipse software on a virtual machine configured with an ARM CPU, 343M, and the Android 4.4 system.
The experimental data consist of 500 pictures, collected from the Internet and taken by actual mobile terminals, which serve as the experimental reference sample data and cover multi-view and multi-color changes of the mobile device terminal. In addition, 300 pictures formed by transformations such as rotation, brightness change, and the addition of Gaussian white noise, together with 50 pictures taken by an actual mobile terminal, are used as the test set. The test pictures are divided into two video image sets, A and B, according to the video image size. The simulation parameters are shown in Table 2.

B. THRESHOLD SELECTION FOR FEATURE POINT MATCHING
When the ratio increases, the correct matching rate decreases faster, as shown by the dotted line in Figure 6. Therefore, it is necessary to further filter out incorrect matching points. Figure 6 compares the correct matching rate under the three individual constraints and under the full constraints for different ratio values. In this experiment, the correct matching rates obtained with the local constraint, the left and right coordinate constraint, and the uniqueness constraint decrease in that order. In contrast, the correct matching rate under the full constraints changes little over the range of ratio values and still reaches 90%. On the other hand, a smaller ratio reduces the number of detected feature points, so the ratio should be as large as possible to ensure a sufficient number of feature points. Therefore, ratio = 0.9 is more appropriate in this experiment.
Figure 7 compares the filtering of wrong matching points and of correct matching points for images under different ratio values. The filtering curves for different ratios in Figure 7(a) are nearly horizontal when the local constraint range is 5-30, indicating that the local constraint range has little effect on the filtering of mismatches in this interval. Figure 7(b) shows how the number of correct matching points that are filtered out changes with the local constraint range for different ratios. The curves show a downward trend, indicating that the number of correct matching points filtered out gradually decreases and is close to 0 at 20. In summary, a local constraint threshold of 20 is more appropriate in this experiment.

C. ANALYSIS OF THE RESULTS OF DIFFERENT ALGORITHMS FOR IMAGE FUZZY FEATURE EXTRACTION
Fuzzy feature extraction experiments are carried out on mobile terminal video images using the method presented in this paper, the method in literature [19], and the method in literature [20]. The mobile terminal video image sets A and B are used to extract the fuzzy features, and the specific results are shown in Table 3. Table 3 shows that, as the size of the mobile terminal video images gradually increases, the time required by the proposed method to extract fuzzy features is much shorter than that of the methods in literature [19] and literature [20], which fully satisfies the real-time requirements of mobile terminals under virtual reality technology.
To verify the extraction effects of the different methods on the fuzzy features of mobile terminal video images, Figure 8 shows their extraction results. Figure 8(a) shows the mobile video image, which contains many fuzzy features. Figure 8(b) shows the result of the fuzzy feature extraction method in literature [19], and Figure 8(c) shows the result of the extraction method in this paper. With the proposed method, the contrast of the local texture details of the mobile terminal video image increases, which is clearly better than the method in the literature. Figure 9 shows feature extraction and matching experiments on two test images using different methods. Figures 9(a) and 9(b) are the test images with features extracted by the SIFT algorithm, and Figure 9(c) is the result of matching the features extracted by SIFT. Figures 9(d) and 9(e) are the results obtained by the SURF feature extraction algorithm used in this paper, and Figure 9(f) is the result of matching the features extracted by the SURF algorithm. Visually, all of these figures extract the feature information of the two images to varying degrees, but the SURF algorithm in this paper extracts more points and expresses richer information. As Figures 9(c) and 9(f) show, after the feature information is quickly matched, the matching speed and success rate are also satisfactory.

D. ANALYSIS OF THE RESULTS OF DIFFERENT ALGORITHMS FOR FEATURE MATCHING POINTS
Four indicators were collected for the SIFT algorithm and the SURF algorithm: the number of feature points, the number of matching points, the matching success rate, and the running time. These indicators are used to evaluate the quality of the two algorithms. The number of feature points reflects the feature extraction capability of the algorithm: the more feature points, the more detailed the image information. The number of matching points and the matching success rate reflect the quality of image matching, and the running time reflects the efficiency of the algorithm. Table 4 shows that the method proposed in this paper yields more feature points and matching points, a higher matching success rate, and a shorter running time.
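As an illustration of how two of these indicators could be computed, the sketch below assumes that the matching success rate is the fraction of reported matches that are correct; the text does not state the exact definition, so this is an assumption, and the helper names are hypothetical.

```python
import time


def matching_success_rate(num_matches, num_correct):
    """Fraction of reported matches that are correct (0.0 when there are none)."""
    if num_matches == 0:
        return 0.0
    return num_correct / num_matches


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Wrapping the whole extraction-and-matching call in `timed` gives a running time comparable across algorithms on the same machine.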
Two sets of comparison experiments on the feature repetition rate were carried out using the histogram affine method, the Harris method, the SIFT method, the methods in literature [19] and literature [20], and the proposed method. The comparison results are shown in Figure 10, which reports the feature repetition rate for the group A and group B images. Combining the two sets of results, for images with different pixel counts, the higher the pixel count, the greater the number of features and the higher the feature repetition rate. Features from low-pixel images are therefore less robust in recognition. The histogram affine method achieves a higher feature repetition rate on the group A images but a lower one on the group B images, indicating that this method is not suitable for images with higher pixel counts. For group A, the Harris method has a poor recognition effect. For group B, although its highest feature repetition rate lies between 0.75% and 0.80%, its recognition effect fluctuates: the broken line in the figure changes direction too frequently, indicating that the recognition effect of this method is unstable. The proposed method, by contrast, performs well in both sets of experiments. The figure shows that its feature repetition rate varies little and changes steadily, indicating that the proposed method is more robust and that its recognition effect is good and feasible.
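The text does not define the feature repetition rate. One common reading, assumed here, is the share of feature points detected in a reference image that are detected again in a second image at the corresponding location. The sketch below follows that assumption; the function name, the `transform` argument, and the tolerance of 1.5 pixels are hypothetical.

```python
import math


def repetition_rate(points_ref, points_test, transform=lambda p: p, tol=1.5):
    """Share of reference keypoints re-detected in the test image.

    transform maps a reference (x, y) into test-image coordinates; the
    identity default assumes the two images are already aligned.
    """
    if not points_ref:
        return 0.0
    repeated = 0
    for p in points_ref:
        mapped = transform(p)
        # A point counts as repeated if any test keypoint lies within tol pixels.
        if any(math.dist(mapped, q) <= tol for q in points_test):
            repeated += 1
    return repeated / len(points_ref)
```

Under this definition, a detector whose rate stays steady across image groups would correspond to the stable curve described for the proposed method.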

E. ANALYSIS OF RECOGNITION RESULTS AFTER DIFFERENT ALGORITHM FEATURE EXTRACTION
A further experiment was performed on the captured experimental images, comparing the histogram affine method, the Harris method, the SIFT method, and the methods in literature [19] and literature [20] with the proposed method to verify the credibility of the proposed method. The collected images are preprocessed to improve identification efficiency: an image is selected, converted to a grayscale image, and binarized to obtain an image to be recognized with a size of 32 × 32. The histogram affine method, the Harris method, and the proposed method are then used to perform image feature recognition experiments. The experimental comparison results are shown in Table 5 and Figure 11.
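The preprocessing steps described above can be sketched as follows. This is an illustrative version under stated assumptions, not the authors' code: the image is a nested list of (r, g, b) tuples, the grayscale conversion uses the standard luminance weights, and the fixed threshold of 128 and nearest-neighbour resampling are assumptions, since the text specifies neither.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to rows of luminance values (0-255)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def binarize(gray, threshold=128):
    """Map each pixel to 1 if it reaches the threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]


def resize_nearest(img, size=32):
    """Nearest-neighbour resample of a 2-D list to size x size."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def preprocess(rgb, size=32, threshold=128):
    """Grayscale, binarize, then resample, following the order in the text."""
    return resize_nearest(binarize(to_gray(rgb), threshold), size)
```

Reducing every input to the same small binary grid keeps the later recognition step cheap, which matches the stated goal of improving identification efficiency.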
The experimental results show that the number of recognitions of the histogram affine method is 7, which is relatively high among the three methods, and its misrecognition rate is relatively low, so it has a good recognition effect. However, this method takes the longest time of the three, indicating that its operating speed is low. Although the Harris method has a shorter operation time than the histogram affine method, it performs worst on both the number of recognitions and the misrecognition rate, so its recognition accuracy is lower. The method proposed in this paper has the highest number of recognitions and the lowest misrecognition rate, indicating high recognition accuracy, and it takes only 0.41 s, which meets the requirements for computing speed. The analysis of the experimental results shows that the proposed method has a better recognition effect and a shorter running time.

V. CONCLUSION
Mobile terminals support voice feature recognition, image feature recognition, and other operations, and most mobile devices are operated directly through gesture commands. As the most mainstream technology on mobile terminals, image feature recognition mainly uses the high-speed processing capabilities of computers to help people process large amounts of information and recognize various targets.
In response to the practical requirements for image feature recognition on mobile devices and the problems of existing image feature recognition methods, a method for extracting video image features of mobile terminals based on SURF virtual reality technology is proposed. Through in-depth research on the reality technology of mobile terminals, sample video images with obvious differences in color, light, angle, and other properties were selected, covering a variety of mobile devices, and video image fuzzy feature extraction tests were performed on two smartphones. First, grayscale extraction is performed on the input mobile terminal video image, and the closed area in the image is detected as the radiation invariant area of the terminal video image. Second, the Hessian matrix is used to detect feature points, and the SURF description operator is used for matching to obtain the initial matching point pairs. Finally, the one-way fuzzy feature matching result of the video image is matched in both directions. The results show that the proposed method has higher computational efficiency and a better recognition effect, meets the actual requirements of image feature recognition, and has practical application advantages.