Directional Coherence-Based Scrolling-Text Detection for Frame Rate Up-Conversion

This article proposes a new scrolling-text detection method that uses directional coherence for frame rate up-conversion (FRUC). Most previous methods use either the gradient information or the motion vector (MV) distribution of the frame for scrolling-text detection. However, edges can also be generated by non-text components, and the number of MVs available to identify the scrolling-text decreases in each row of the frame. Thus, these methods incorrectly detect non-text regions as scrolling-text and cannot accurately detect the start or end of text scrolling at the frame boundary. The proposed method overcomes these problems using coherence values of edge directions for each pixel and scrolling-text-aware refinement processes. The key idea is to use the directional coherence of edge directions together with a refinement based on texture pattern analysis to improve the accuracy of scrolling-text detection. For the refinement processes, the proposed method extracts texture patterns as bit codes and computes the diversity of the texture patterns around the detected text edges. In addition, it extracts the representative value of the MV for the detected region to correct the regions falsely detected as scrolling-text. With these refinement processes, the proposed method can also accurately detect the start or end of text scrolling at the frame boundary. In the experimental results, the proposed method increased the average F1 score by up to 0.504 (a 131.25% improvement) compared with previous methods. The average computation time per pixel of the proposed method also decreased by up to $18.571~\mu \text{s}$ (an 80.80% reduction) compared with previous methods.


I. INTRODUCTION
Frame rate up-conversion (FRUC) is a technique that increases the frame rate of videos by inserting interpolated frames between two consecutive frames [1]-[4]. Interpolated frames are generated using motion vectors (MVs) between two consecutive frames. FRUC has been used for film-to-video conversion to increase the frame rate of films [5]. It is also used in liquid crystal displays (LCDs) [6] to reduce motion blur and in TV standard conversion between different frame rates [7]. Because the frame rate (fps) of the input video varies, FRUC is an essential technique for display systems with a predetermined fps.
FRUC consists of two primary steps [1]-[4]: motion estimation (ME) and motion-compensated interpolation (MCI). ME calculates the MVs of an object between two consecutive frames and corrects the outliers in the set of MVs. MCI produces a new interpolated frame between two original frames using the calculated MVs. ME is the most important of the FRUC steps because FRUC performance is highly dependent on the accuracy of the MVs calculated by ME. FRUC can generate interpolated frames with artifacts in regions where the MV is not accurately measured. Many cases exist in which conventional FRUC cannot generate accurate MVs. In particular, conventional FRUC often fails to extract accurate MVs in scrolling-text regions. Artifacts in scrolling-text regions are easier to recognize than artifacts in other regions (Fig. 1) because scrolling-text provides vital information to viewers and is more easily recognized by the human eye than other regions. Therefore, a scrolling-text detection process is essential for FRUC to correct the MVs in the scrolling-text regions and improve the quality of interpolated frames.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhen Ren.
Generally, a scrolling-text detection method consists of the following two steps. In the first step, it generates a text map. Most previous methods use the edge magnitude of pixels and motion information in consecutive frames to quantify the scrolling-text position in the video content. Existing edge detection methods [8]-[13] are widely used to extract the edge magnitude of pixels. In the second step, it refines the text map to determine the final scrolling-text regions.
In this article, we propose a directional coherence-based scrolling-text detection method for FRUC. It generates a text map using the directional coherence values calculated for each pixel in the frame. We exploit directional coherence values to estimate the degree of coherency of gradient orientations. Our method generates an initial scrolling-text map by calculating the difference between the current and previous text maps. Then, the proposed method refines the initial scrolling-text map by analyzing the text edge density and the Local Directional Texture Pattern (LDTP) [14] of the detected initial scrolling-text map. The combination of edge direction and texture patterns is used to decide whether candidate regions are scrolling-text regions. Furthermore, we use an MV distribution-based refinement method to correct the regions falsely detected as scrolling-text regions. The three contributions of this article are as follows:
1) The proposed method uses a directional coherence concept to detect the text edges. The use of directional coherence can distinguish the text edges from highly-textured regions or uniformly-textured (flat) regions.
2) We use a scrolling-text-aware refinement process, which is based on texture pattern analysis, to improve the accuracy of scrolling-text detection. Based on the observation that the luminance variation around text pixels is large, the proposed method can detect the scrolling-text regions using the diversity of the texture patterns.
3) We verified the performance of the proposed method by comparing the interpolated frames generated by the conventional FRUC algorithm [15] with each scrolling-text detection method.
The remainder of this article is organized as follows. Section II briefly reviews previous methods for text detection. Section III describes the proposed scrolling-text detection method. Section IV compares the scrolling-text detection accuracy of previous methods with that of the proposed method and presents a quality evaluation of the interpolated frames generated by conventional FRUC. Finally, Section V concludes the paper.

II. RELATED WORK
Text found in natural scenes or video sequences can be categorized into two types: static-text and scrolling-text. Various static-text detection methods for natural scenes or video sequences have been widely investigated [16]-[20].
To identify text regions, a Delaunay triangulation-based text detection method [16] that uses symmetrical features was proposed. It detects the text edges using the fact that text contains many parallel edges. Then, it detects the candidate text regions by forming a Delaunay triangulation over the corners of the edge map. The Ring Radius Transform (RRT)-based method [17] detects multi-oriented text in natural scenes. The Histogram Oriented Moment (HOM)-based method [18] extracts connected components and identifies text components using a Support Vector Machine (SVM) classifier. Then, a Recurrent Neural Network (RNN)-based classifier is used for text recognition. A fractals-based text detection method was also proposed [19]. It uses fractal properties in the gradient domain and separates text components from non-text components. The Fourier-Laplacian transform-based text detection method [20] extracts candidate text regions and then verifies the final text regions using Hidden Markov Model (HMM)-based classification. The above methods mainly address static-text detection in natural scenes or video sequences. Unlike these methods, this article focuses on developing a scrolling-text detection method with emphasis on the FRUC application.
Scrolling-text detection is the process of detecting scrolling-text in a video to reduce the artifacts that scrolling-text frequently causes in FRUC. Several scrolling-text detection methods have been proposed [21]-[25]. The edge information-based scrolling-text detection method [21] detects scrolling-text located in the horizontal or vertical regions of the frame boundary by projecting the edge information obtained from a Sobel edge detector [12] onto the horizontal or vertical axis. This method exploits the concept that scrolling-text regions have a high edge density. It detects scrolling-text accurately if the scrolling-text spans the entire horizontal or vertical direction on a simple background of the given frame. However, this method incorrectly detects highly-textured regions as scrolling-text because the edges generated by non-text components can be included in the projection results. Furthermore, it cannot accurately detect scrolling-text that begins to scroll from the frame boundary or ends scrolling at the frame boundary because the edge density described by the projection for these types of text is low.
The MV distribution-based scrolling-text detection method [22] extracts the MVs in each row of the frame and finds the dominant MV direction for each row to detect scrolling-text. This method is straightforward because it only extracts the MV direction in each row of the frame boundary and can be applied easily to FRUC. However, because the number of MVs available to determine the scrolling-text regions decreases in each row of the frame boundary, it is difficult to extract the MV direction of scrolling-text that appears or disappears at the start and end points in the row region of the frame. Therefore, this method cannot detect scrolling-text that appears or disappears at the start or end points of the frame boundary.
VOLUME 8, 2020
An adaptive temporal differential-based detection method [23] has been proposed to improve the accuracy of scrolling-text detection. It calculates the difference between the edge maps of the previous and current frames obtained from a Roberts edge detector [13]. Then, the densities of the edge difference map at the four frame boundary regions (top, bottom, right, and left) are calculated to detect the scrolling-text. If the edge density of a frame boundary region is high, the method determines the detected edges in that region to be scrolling-text. This method provided high performance when the background of scrolling-text spanning the entire horizontal direction was simple. However, it requires ten previous frames to calculate the edge difference map of the current frame. Moreover, because the edge densities of the four frame boundary regions are not high when scrolling-text appears or disappears at the start or end point of the frame boundary, this method cannot detect these types of scrolling-text.
The Histogram Oriented Moments descriptor-based method [24] for scrolling-text detection identifies the direction of the edges and detects scrolling-text. The central concept of this method is that the number of dominant orientations that point towards the centroid of the connected components is larger than the number of dominant orientations that point away from it. Furthermore, it uses the Combined Local-Global optical flow method [26] to extract the MV information of candidate scrolling-text regions in two consecutive frames. It detects the final scrolling-text regions by comparing the motion direction of the candidate regions with the typical direction in which scrolling-text flows (horizontal or vertical). This method improved the performance of scrolling-text detection compared with previous methods [21]-[23]. However, it is insufficient for detecting text edges because some text edges do not satisfy the hypothesis used in this method [24]. Therefore, it does not provide robust performance for various types of scrolling-text.
Most recently, a text edge detector method [25] based on the concept of a region-adaptive threshold has been proposed. The central concept is that text edges are more likely to exist in a region with a higher luminance variation. Therefore, this method increases the probability of determining the given pixel as a text edge pixel by setting the threshold value for text edge detection inversely proportional to the luminance variation for a given region. This method detects the text edges in the current frame and the previous frame using a region-adaptive text edge detector. Then, it calculates the difference between the text edge maps of the previous and the current frames and detects the final scrolling-text area using edge density, edge orientation, and MV distribution analysis of the detected text regions. This method successfully detects the scrolling-text that appears or disappears at the start and end points of the frame boundary and the entire horizontal scrolling-text compared with previous methods [21]- [24]. However, the region-adaptive text edge detector can falsely detect highly-textured regions, which have sharp edges, as scrolling-text. Consequently, the accuracy of this method can be further improved.

III. PROPOSED METHOD
The FRUC architecture that uses the proposed scrolling-text detection method consists of four steps (Fig. 2(a)):
1) RGB-to-YCbCr conversion: FRUC converts the RGB color space of the input frames to the YCbCr color space.
2) ME and MV correction for the scrolling-text region: The FRUC method [15] extracts the Y image (luminance) and calculates the MVs of the current Y image for each block using the previous and current Y images. Then, FRUC applies the proposed scrolling-text detection method and performs MV correction on the detected scrolling-text regions, replacing the initial MVs of a detected region with the MVs of the scrolling-text if they differ.
3) MCI: The MCI generates the interpolated frame using the final MVs in the YCbCr color space.
4) YCbCr-to-RGB conversion: The YCbCr color space of the interpolated frame is converted to the RGB color space.
The proposed scrolling-text detection method consists of three steps (Fig. 2(b)):
1) Generation of the text edge map
2) Generation of the initial scrolling-text map
3) Refinement of the scrolling-text map
The detailed operations of the proposed method are described in this section.

A. GENERATION OF THE TEXT EDGE MAP
The purpose of scrolling-text detection is to detect the position of text moving horizontally or vertically in the boundary area of the given frame. The spatial variation in luminance is larger in text regions than in other regions. Furthermore, a text region generally has a dominant edge orientation. Based on these observations, text regions can be extracted by analyzing the spatial variation in luminance and the dominant orientation component of the region. However, a method that uses a first-order gradient, which considers the relationship between only two adjacent pixels, often detects highly-textured regions as text.
Therefore, we adopt directional coherence to extract regions with a dominant edge orientation and a significant spatial change in luminance (Fig. 3). We focus on directional coherence rather than directly using gradient information, which is unreliable for highly-textured regions. The patterns of direction generated from the center and surrounding regions can provide a suitable approximation of the underlying image structure, which coincides with text edges. The proposed method extracts the text edges using a structure tensor, which efficiently summarizes the dominant orientation and the energy along this direction based on the local gradient field, defined as follows:
$$T_s^k(i) = \sum_{j \in B_i} \begin{bmatrix} \left(I_x^k(j)\right)^2 & I_x^k(j)\, I_y^k(j) \\ I_x^k(j)\, I_y^k(j) & \left(I_y^k(j)\right)^2 \end{bmatrix} \tag{1}$$
where $T_s^k(i)$ denotes the structure tensor matrix of pixel $i$ at the $k$-th frame, and $I_x^k$ and $I_y^k$ denote the gradients in the horizontal and vertical directions at the $k$-th frame, respectively. $B_i$ denotes the neighbor region centered at the $i$-th pixel position. The block size of $B_i$ was set to 5 × 5 pixels experimentally.
The effectiveness of the structure tensor defined in (1) for our task stems from the fact that the relative discrepancy between the two eigenvalues ($\lambda_1 \geq \lambda_2 \geq 0$) of $T_s^k(i)$ indicates how intensively the gradients in the local region are distributed along the dominant direction (the degree to which those directions are consistent). Gradients at text edges are strongly distributed along the dominant direction compared with those in uniform or highly-textured regions (Fig. 3). Therefore, the proposed method defines the directional coherence $\xi$ at each pixel position from these eigenvalues in (2). The value $\xi$ represents the degree of coherence of the edge directions: the larger the $\xi$ value, the higher the directional coherence. If the directional coherence for the $(i, j)$-th pixel is larger than a threshold value $T_1$, we determine the $(i, j)$-th pixel to be a text edge, defined as follows:
$$TM_{i,j} = \begin{cases} 1, & \text{if } \xi_{i,j} > T_1 \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
where $TM_{i,j}$ denotes the text edge for the $(i, j)$-th pixel, and $T_1$ denotes the predefined threshold value, set to 1500 experimentally to maximize the $F_1$ score, which is an evaluation metric used to measure the accuracy of scrolling-text detection. The $T_1$ value in (3) plays an important role in generating a robust text edge map. The proposed method uses the concept of directional coherence to generate a more desirable text edge map while suppressing textured regions in the given frame (Fig. 4).
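The text-edge step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the structure tensor is accumulated as a plain box sum over a 5 × 5 window, and the eigenvalue gap $\lambda_1 - \lambda_2$ is used as a stand-in coherence measure because the exact form of (2) is not reproduced here. The function names `box_sum`, `directional_coherence`, and `text_edge_map` are hypothetical.

```python
import numpy as np

def box_sum(a, radius):
    """Sum of `a` over a (2*radius+1)^2 window around each pixel (edge-padded)."""
    h, w = a.shape
    padded = np.pad(a, radius, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def directional_coherence(luma, radius=2):
    """Eigenvalue gap (lambda1 - lambda2) of the structure tensor per pixel.

    ASSUMPTION: the gap stands in for the coherence measure of Eq. (2),
    whose exact form is not given in the text.
    """
    gy, gx = np.gradient(luma.astype(float))  # vertical, horizontal gradients
    jxx = box_sum(gx * gx, radius)
    jyy = box_sum(gy * gy, radius)
    jxy = box_sum(gx * gy, radius)
    # Closed-form eigenvalues of a symmetric 2x2 matrix:
    # lambda = trace/2 +/- sqrt(((jxx - jyy)/2)^2 + jxy^2)
    root = np.sqrt(((jxx - jyy) / 2.0) ** 2 + jxy ** 2)
    return 2.0 * root  # lambda1 - lambda2

def text_edge_map(luma, threshold, radius=2):
    """Binary map: True where the coherence exceeds the threshold."""
    return directional_coherence(luma, radius) > threshold
```

Because the coherence measure here is an assumption, the threshold of 1500 reported in the text would not necessarily transfer to this sketch and would need to be re-tuned.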

B. GENERATION OF THE INITIAL SCROLLING-TEXT MAP
The proposed method generates an initial scrolling-text map using the previous and current text maps obtained from the previous step. Because scrolling-text generally exists within the frame boundary, only the text map results in the frame boundary regions are considered when detecting the scrolling-text, as in previous papers [21]-[25]. A temporal difference exists between the scrolling-text regions of consecutive frames. The proposed method captures this difference by calculating the text map difference between the previous and current frames. In this way, the proposed method removes the static edge regions and preserves only the regions with moving or scrolling-text pixels. The resulting text edge difference map for the current and previous frames highlights the gaps between two different text edges in the scrolling-text regions (the gaps correspond to the zero points between two text edges in Fig. 5(a)).
The proposed method generates the connected components of the text edges in the text edge difference map by performing a hole-filling process [25], which fills the gap between two different edges in the map. In this process, a hole is the gap between two different edges in the same row of the text edge difference map. If the hole is smaller than a predetermined threshold ($T_2$), we fill the corresponding gap with edges (Fig. 5(b)). The threshold value $T_2$ was set to 32 pixels, as in [25]. The purpose of the hole-filling process is to connect the gaps between two different sets of text so that the proposed method can detect consecutive scrolling-text as one block, which suits its use in FRUC.
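The temporal difference and row-wise hole filling described above can be sketched as follows. This is an illustrative sketch, assuming that the text map difference is a per-pixel XOR of two binary maps (the text does not state the exact operator) and that a hole is filled when its width is strictly smaller than the threshold. The function names are hypothetical.

```python
import numpy as np

def text_edge_difference(prev_map, curr_map):
    """Pixels whose text-edge label changed between frames (XOR, an assumption)."""
    return np.logical_xor(prev_map.astype(bool), curr_map.astype(bool))

def fill_row_holes(diff_map, max_gap=32):
    """Fill horizontal gaps narrower than `max_gap` between edge pixels of a row."""
    filled = diff_map.astype(bool).copy()
    for r in range(filled.shape[0]):
        cols = np.flatnonzero(diff_map[r])
        for left, right in zip(cols[:-1], cols[1:]):
            gap = right - left - 1  # number of empty pixels between two edges
            if 0 < gap < max_gap:
                filled[r, left + 1:right] = True
    return filled
```

Static edges cancel in the difference, so only pixels that changed between the two frames reach the hole-filling step.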

C. REFINEMENT OF THE SCROLLING-TEXT MAP
After generating the initial scrolling-text map, we need to remove the falsely-detected regions using the proposed refinement process. The refinement process consists of the analyses of the connected text edge size, text edge density and LDTP [14], MV distribution, and text map projection.
The first refinement relies on the observation that connected edges containing scrolling-text are larger than those of other regions. The proposed method therefore removes a candidate scrolling-text region if its length is shorter than a predefined length ($T_3$). The threshold value $T_3$ was set to 64 empirically by observing the minimum size of a scrolling-text region [25].
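A minimal sketch of this size-based refinement is shown below. It assumes that the length of a region is measured as the length of a horizontal run of candidate pixels, which the text does not state explicitly. `remove_short_runs` is a hypothetical name.

```python
import numpy as np

def remove_short_runs(mask, min_len=64):
    """Keep only horizontal runs of True pixels that are at least `min_len` long."""
    out = np.zeros(mask.shape, dtype=bool)
    for r in range(mask.shape[0]):
        # Pad with False so every run has a rising and a falling transition.
        row = np.concatenate(([False], mask[r].astype(bool), [False]))
        change = np.flatnonzero(row[1:] != row[:-1])
        for start, stop in zip(change[0::2], change[1::2]):
            if stop - start >= min_len:
                out[r, start:stop] = True
    return out
```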
For the second refinement, the scrolling-text region generally has a higher edge density than other regions. Furthermore, it has both various principal directions and a large luminance difference. Based on these observations, we estimate the text edge density and texture patterns. We use LDTP analysis [14] to efficiently describe the texture patterns of the text edges and the luminance variation in the candidate scrolling-text regions. We compute LDTP by calculating the principal directional numbers of the neighborhood using the Kirsch compass masks [27] in eight different directions:
$$P_{dir}^1 = \arg\max_{i} \, C_i \tag{4}$$
where $P_{dir}^1$ is the principal directional number, and $C_i$ is the absolute value of the convolution of the image $I$ with the $i$-th Kirsch compass mask, $M_i$, defined as follows:
$$C_i = \left| I * M_i \right|$$
where "$*$" denotes the convolution of the image with the Kirsch compass mask (filter), and "$|\cdot|$" denotes the absolute value operation. As in [14], the absolute values of the eight Kirsch mask responses are computed accordingly. The second directional number $P_{dir}^2$ is computed in the same way by extracting the second maximum response in (4).
For each principal direction, LDTP calculates the luminance difference between the pixels along the principal direction as follows:
$$D_{i,j}^n = p_{i,j}^n - q_{i,j}^n$$
where $D_{i,j}^n$ is the difference for the pixel $(i, j)$ in the $n$-th principal direction, and $p_{i,j}^n$ and $q_{i,j}^n$ are the luminance values at the principal direction and opposite principal direction positions, respectively, among the eight neighborhood pixels of pixel $(i, j)$. The method for calculating these local differences is equivalent to the thresholding used in the Local Binary Pattern (LBP) [28].
In contrast to the binary coding of LBP, LDTP codes the difference using three levels (negative, equal, and positive). If $D_{i,j}^n$ is larger than a predefined threshold value $\varepsilon$, LDTP encodes a 2, whereas if $D_{i,j}^n$ is smaller than $-\varepsilon$, it encodes a 1. If $D_{i,j}^n$ is between $-\varepsilon$ and $\varepsilon$, LDTP encodes a 0. By representing these three levels, LDTP can produce a more distinctive code for the neighborhood. The threshold value $\varepsilon$ was set to 15, as in [14].
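The LDTP components described above can be sketched for a single 3 × 3 luminance patch as follows. This is an illustrative sketch rather than the reference implementation of [14]. The direction indexing (0 = east, increasing counter-clockwise), the use of correlation at the patch center instead of a flipped convolution, and the tie-breaking by lowest index are all assumptions, and the names `RING`, `kirsch_masks`, `ternary_level`, and `ldtp_components` are hypothetical.

```python
import numpy as np

# Border positions of a 3x3 patch, starting east and going counter-clockwise.
RING = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]

def kirsch_masks():
    """Eight Kirsch compass masks; mask i has its three 5s centred on RING[i]."""
    masks = []
    for i in range(8):
        m = np.full((3, 3), -3.0)
        m[1, 1] = 0.0
        for k in (i - 1, i, i + 1):
            m[RING[k % 8]] = 5.0
        masks.append(m)
    return masks

def ternary_level(diff, eps=15):
    """Three-level code: 2 if diff > eps, 1 if diff < -eps, otherwise 0."""
    if diff > eps:
        return 2
    if diff < -eps:
        return 1
    return 0

def ldtp_components(patch, eps=15):
    """Return (principal direction, level 1, level 2) for a 3x3 luminance patch."""
    patch = np.asarray(patch, dtype=float)
    responses = np.array([abs(np.sum(patch * m)) for m in kirsch_masks()])
    order = np.argsort(-responses, kind="stable")  # strongest response first
    levels = []
    for d in order[:2]:
        p = patch[RING[d]]            # neighbour in direction d
        q = patch[RING[(d + 4) % 8]]  # neighbour in the opposite direction
        levels.append(ternary_level(p - q, eps))
    return int(order[0]), levels[0], levels[1]
```

With eight principal directions and two three-level differences, this representation yields the 8 × 3 × 3 = 72 possible patterns discussed in the text.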
Next, we compute the number of different LDTPs to consider both the principal direction and the luminance variation around the text edge pixel. Because LDTP considers Kirsch compass masks [27] in eight different directions and codes the luminance difference for two principal directions using three levels, the total number of potentially different LDTPs is 8 × 3 × 3 = 72. The proposed method enlarges the code length by also considering the luminance difference for the third principal direction, which gives a more diverse LDTP representation. The length of the code used in the proposed method is 8 × 3 × 3 × 3 = 216. Then, we combine the text edge density and the diversity degree of the LDTPs to define the candidate region characteristic used in the second refinement (7), where $RC_i$, $TED_i$, and $LDTP_i$ denote the candidate region characteristic, the text edge density, and the degree of different LDTPs for the $i$-th candidate region, respectively. The proposed method removes the $i$-th candidate region if its $RC_i$ is smaller than the predefined threshold value $T_4$. The threshold value $T_4$ was set to 0.08 based on various experimental results.
For the third refinement, the MVs of a scrolling-text region have the same direction as the typical motion of scrolling-text. Based on this observation, we can eliminate false detections. The proposed method removes candidate scrolling-text regions using the results of the MV distribution analysis. The MVs of each candidate region are accumulated in an MV histogram. Then, the proposed method detects the peak value of the MV histogram to represent the MV direction in the candidate region. The proposed method removes a candidate region if the motion direction of the peak value in its MV histogram is distinctly different from the typical direction of scrolling-text (horizontal or vertical).
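The MV histogram check described above can be sketched as follows. This is a minimal sketch: MVs are assumed to be integer `(dx, dy)` tuples, and the `tolerance` parameter that decides when a direction counts as horizontal or vertical is a hypothetical choice, since the text does not quantify "distinctly different". Both function names are hypothetical.

```python
from collections import Counter

def dominant_mv(mvs):
    """Most frequent (dx, dy) motion vector in a candidate region."""
    return Counter(mvs).most_common(1)[0][0]

def moves_like_scrolling_text(mvs, tolerance=1):
    """True if the histogram peak is (nearly) purely horizontal or vertical."""
    dx, dy = dominant_mv(mvs)
    horizontal = abs(dx) > tolerance and abs(dy) <= tolerance
    vertical = abs(dy) > tolerance and abs(dx) <= tolerance
    return horizontal or vertical
```

A region whose dominant MV is diagonal or close to zero is rejected by this check, while a region dominated by leftward motion such as `(-4, 0)` is kept.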
For the fourth refinement, we use a projection of the candidate regions obtained from the third refinement. Because most scrolling-text is placed horizontally in the video, vertically elongated candidate regions can be eliminated. The horizontal projection accumulates all the candidate region pixels in each row to form a histogram of the number of detected pixels. If the number of candidate region pixels in a pixel row is small, the row is removed to refine the detected scrolling-text region. Based on this refinement process, the proposed method can further improve the accuracy of scrolling-text detection.
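The projection-based pruning described above can be sketched as follows. The `min_pixels` threshold is a hypothetical parameter, since the text only says that rows with a small pixel count are removed.

```python
import numpy as np

def prune_sparse_rows(mask, min_pixels):
    """Zero out rows whose horizontal projection is below `min_pixels`."""
    mask = mask.astype(bool)
    projection = mask.sum(axis=1)  # number of candidate pixels per row
    keep = projection >= min_pixels
    return mask & keep[:, None]
```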
Finally, it is reasonable to consider that a scrolling-text area is generally rectangular. Consequently, the proposed method converts each remaining candidate area into the smallest rectangle that contains the whole candidate region by linking its four corner points (Fig. 6).
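The rectangle step can be sketched for a single candidate region as shown below. This sketch assumes the mask contains exactly one candidate region, so separating multiple regions (for example by connected-component labeling) is outside its scope. `bounding_rectangle` is a hypothetical name.

```python
import numpy as np

def bounding_rectangle(mask):
    """Smallest axis-aligned rectangle (top, left, bottom, right), inclusive."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None  # no candidate pixels
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())
```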

IV. EXPERIMENTAL RESULTS
We conducted experiments to evaluate the performance of the proposed and the previous scrolling-text detection methods.
First, we visually evaluated the quality of the interpolated frames generated by FRUC using the proposed and the previous scrolling-text detection methods. We focus on developing a scrolling-text detection method to be used with FRUC. A block size of 8 × 8 pixels (standard for FRUC applications) and a search range (the range within which the search is performed around the current block) of 16 pixels are most widely used in FRUC applications. Therefore, we set the block size and the search range to these values.
FIGURE 7. Interpolated frames generated using two consecutive frames from the JVC sequence by: (a) TSTD [21], (b) GSTD [22], (c) HSTD [23], (d) KSTD [24], (e) LSTD [25], and (f) the proposed scrolling-text detection method.
Second, we assessed the accuracy of the proposed and the previous scrolling-text detection methods using Precision ($P$), Recall ($R$), and $F_1$ score ($F_1$) [32]-[35]. These evaluation metrics are defined as follows:
$$P = \frac{Num(D \cap GT)}{Num(D)}, \qquad R = \frac{Num(D \cap GT)}{Num(GT)}, \qquad F_1 = \frac{2 \times P \times R}{P + R}$$
where $D$ and $GT$ denote a scrolling-text region detected by each method and the ground truth rectangle region of scrolling-text, respectively. The symbol $\cap$ represents the intersection of two groups, and $Num(\cdot)$ denotes the number of pixels in a group. We evaluated the performance of scrolling-text detection at the rectangle level to ensure its usefulness with FRUC techniques. $F_1$ is an evaluation metric that considers both $P$ and $R$. The range of $F_1$ is from 0 to 1, where 1 is the best score.
Third, we measured the computation times of the proposed and previous scrolling-text detection methods. For comparison with the proposed method, we used five previous methods: Tsai's scrolling-text detection (TSTD) [21], Gim's scrolling-text detection (GSTD) [22], Hsia's scrolling-text detection (HSTD) [23], Khare's scrolling-text detection (KSTD) [24], and Lee's scrolling-text detection (LSTD) [25]. LSTD is the most recent state-of-the-art scrolling-text detection method.
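The pixel-level metrics described above can be computed from two binary masks as in the following sketch. It uses the standard definitions of precision, recall, and $F_1$ over the detected region and the ground truth. Returning 0.0 for empty masks is an assumption made here to avoid division by zero, and `detection_scores` is a hypothetical name.

```python
import numpy as np

def detection_scores(detected, ground_truth):
    """Pixel-level precision, recall, and F1 for two binary masks."""
    d = np.asarray(detected, dtype=bool)
    gt = np.asarray(ground_truth, dtype=bool)
    overlap = np.count_nonzero(d & gt)                   # Num(D ∩ GT)
    precision = overlap / d.sum() if d.any() else 0.0    # overlap / Num(D)
    recall = overlap / gt.sum() if gt.any() else 0.0     # overlap / Num(GT)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1
```

For example, a 20-pixel detection that covers half of a 20-pixel ground-truth rectangle gives $P = R = F_1 = 0.5$.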
For all previous methods, we optimized the various parameters and set them to values guided by the corresponding papers [21]-[25]. The parameters used in the proposed method were also optimized based on various experiments. For the test sequences, we used video sequences [25] containing various types of scrolling-text: scrolling-text beginning or ending at the frame boundary, scrolling-text that occupies the entire row region of the frame boundary, blurry scrolling-text, and scrolling-text that is well distinguished from the background.
In the first experiment, we visually compared the quality of the interpolated frames generated by conventional FRUC [15] using the previous and proposed methods (Figs. 7-8). TSTD [21], GSTD [22], HSTD [23], and KSTD [24] could not detect the scrolling-text starting at the frame boundary. Because these methods cannot correct incorrectly-estimated MVs in the scrolling-text regions, they generate severe artifacts in the scrolling-text region of the interpolated frames during FRUC. For LSTD [25], the accuracy of the scrolling-text detection was higher than those of TSTD [21], GSTD [22], HSTD [23], and KSTD [24]. However, when this method was applied to FRUC, it still generated artifacts in the scrolling-text regions (Fig. 7(e), Fig. 8(e)). In contrast, the proposed method was able to detect various types of scrolling-text, including text that occupies the entire row region of the frame boundary (Fig. 7) and text that begins to scroll from the frame boundary (Fig. 8). Thus, FRUC with the proposed method can generate high-quality interpolated frames in the scrolling-text regions (Fig. 7(f), Fig. 8(f)).
FIGURE 8. Interpolated frames generated using two consecutive frames from the Secret Garden 2 sequence by: (a) TSTD [21], (b) GSTD [22], (c) HSTD [23], (d) KSTD [24], (e) LSTD [25], and (f) the proposed scrolling-text detection method.

TABLE 1. Comparison of the scrolling-text detection accuracy of the proposed and previous methods using precision, recall, and F1 score.
In the second experiment, we evaluated the scrolling-text detection accuracy of the previous methods and the proposed method using $P$, $R$, and $F_1$ [32]-[35]. We counted the number of pixels in the detected scrolling-text regions and the number of pixels in the ground truths for each video sequence and calculated $F_1$. TSTD [21], GSTD [22], and HSTD [23] use edge or MV information of the image for scrolling-text detection but have difficulty detecting scrolling-text that appears or disappears at the start or end point of the frame boundary. Therefore, these methods [21]-[23] had a lower $F_1$ than the other methods [24], [25]. KSTD [24] and LSTD [25] were more accurate at detecting scrolling-text and achieved a higher $F_1$ than TSTD [21], GSTD [22], and HSTD [23].
The proposed method improved the accuracy of scrolling-text detection even further compared with previous methods [24], [25] (Table 1). The $F_1$ of the proposed method was 0.504 (a 131.25% improvement), 0.229 (a 34.78% improvement), 0.210 (a 21.40% improvement), 0.147 (a 19.84% improvement), and 0.032 (a 3.74% improvement) higher than those of TSTD [21], GSTD [22], HSTD [23], KSTD [24], and LSTD [25], respectively. The improvement was calculated by dividing the increment of $F_1$ ($F_1$ of the proposed method minus $F_1$ of the previous method) by the $F_1$ of the previous method. This improvement could be attributed to the use of directional coherence values for each pixel in the given image to distinguish text regions from highly-textured regions. Moreover, the proposed method used four types of refinement to eliminate non-scrolling-text regions from the initial scrolling-text map. With these refinement processes, the proposed method could accurately detect the start or end of text scrolling at the frame boundary.
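The relative improvement described above can be written compactly as shown below. The superscripts "prop" and "prev" are notation introduced here for illustration. The worked line back-solves the scores implied by the reported TSTD figures, under the assumption that the increment and the percentage refer to the same averaged scores. These implied values are not quoted from Table 1.

```latex
\[
\text{Improvement}\,(\%) =
  \frac{F_1^{\text{prop}} - F_1^{\text{prev}}}{F_1^{\text{prev}}} \times 100
\]
% Worked example with the reported TSTD figures (increment 0.504, 131.25%):
\[
F_1^{\text{prev}} = \frac{0.504}{1.3125} = 0.384,
\qquad
F_1^{\text{prop}} = 0.384 + 0.504 = 0.888
\]
```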
In the third experiment, we compared the computation time per pixel ($C_T$ [µs]) of the previous and proposed methods. The proposed method reduced the average $C_T$ by 18.571 µs (an 80.80% reduction) and 4.873 µs (a 52.48% reduction) compared with KSTD [24] and LSTD [25], respectively (Table 2). TSTD [21], GSTD [22], and HSTD [23] are relatively simple methods, so their average $C_T$ is short, but their $F_1$ scores are low compared with that of the proposed method.

V. CONCLUSION
In this article, we proposed a new scrolling-text detection method for FRUC that uses directional coherence with a scrolling-text-aware refinement technique. The central concept of the proposed method is to use the directional coherence of each pixel to distinguish the scrolling-text pixels from textured regions and to use refinement processes to detect text appearing and disappearing at the start and end points of the frame boundary.
The proposed method calculates coherence values of edge directions for each pixel to represent the text map. It generates an initial scrolling-text map by calculating the difference between the current and previous text maps. Then, the proposed method refines the initial scrolling-text map by analyzing the texture patterns and MV distribution for the detected text regions. Furthermore, it uses the text map projection to enhance the accuracy of the scrolling-text detection.
The benefits of the proposed method were evaluated in terms of scrolling-text detection accuracy and processing time on the various video sequences.
The experimental results demonstrated that the average $F_1$ of the proposed method was up to 0.504 (a 131.25% improvement) higher than those of previous methods. The average $C_T$ of the proposed method was up to 18.571 µs lower than those of the previous methods (an 80.80% reduction). Furthermore, FRUC using the proposed scrolling-text detection method could generate the highest-quality interpolated frames for the scrolling-text regions compared with previous methods.