Pattern Recognition

Volume 43, Issue 5, May 2010, Pages 1907-1916

Shape detection from line drawings with local neighborhood structure

https://doi.org/10.1016/j.patcog.2009.11.022

Abstract

An object detection method for line drawings is presented. The method adopts the local neighborhood structure as the elementary descriptor, which is formed by grouping several nearest neighbor lines/curves around one reference. With this representation, both the appearance and the geometric structure of the line drawing are well described. The detection algorithm is a hypothesis-test scheme. For each local structure of the model, the top k most similar local structures in the drawing are first obtained, and the transformation parameters, such as the object center, scale and rotation factors, are estimated for each of the k candidates. By treating each estimation result as a point in the parameter space, a dense region forms around the ground truth provided that a model instance exists in the drawing. The mean shift method is used to detect the dense regions, and the significant modes are accepted as occurrences of object instances.

Introduction

Shape representation and matching have received considerable attention in many areas over the past few decades, and a large number of approaches have been reported in the literature. Some of the earliest attempts in this field are based on features extracted from the whole image, such as Zernike moments [1], the edge direction histogram [2], the curvature scale space (CSS) [3], and the angular radial transform (ART). While global features have the advantages of being compact and efficient to match, the local information and spatial configuration of image objects are lost. Furthermore, these descriptors usually fail in the case of occlusion and cluttered backgrounds. Hence, a more desirable shape descriptor should capture the structure of the shapes and allow portions of one shape to be compared with another.

Alajlan used a structured representation called the curvature tree (CT) to model both the shape and the topology of image objects [4]. The hierarchy of the CT reflects the inclusion relationships between the image objects, and the similarity between two multi-object images is measured by the maximum similarity subtree isomorphism between their CTs. The shock graph, first proposed by Siddiqi et al., has been recognized as a richer descriptor of shape than the boundary itself [5], [6]. This descriptor encodes information about the interior of the shape by pairing shape boundary segments; moreover, it captures the hierarchical relationships between parts of a shape, making it less sensitive to occlusion, articulation and other visual transformations. However, the comparison of two shock graphs is usually time consuming due to the complexity of the shock grammar and the NP-hard nature of graph matching. Bai [7] argued that visually similar skeleton graphs may have completely different topological structures, which poses a great challenge to shock graph and skeleton-based descriptors. To solve this problem, the author proposed to omit the topological graph structure during comparison and to focus only on the skeleton endpoints.

Psychological studies show that the shape of an object may be well approximated by line segments, and that humans recognize line drawings as quickly and almost as accurately as gray-level pictures [8], [9]. Moreover, a line/curve based shape representation has many advantages: it is compact and perceptually meaningful; it allows one to use middle level cues such as adjacency or parallelism; and it is consistent in detection [10]. These advantages have motivated many line/curve based approaches in computer vision. Where lines and curves are the primary objects of study, line/curve representations are widely used and have achieved considerable success, especially in the area of technical drawing management [11], [12], [13], [14]. Line/curve representations have also been used in other problems, such as object detection in cluttered environments [15], [16], natural scene classification [17], and so on.

Matching line/curve representations is a challenging problem. A line segment itself is non-distinctive: a line in the model can be matched to any line of the image if affine invariance is allowed. The discriminative power of a line/curve representation lies in the structural information between line/curve segments. This naturally leads to graph matching approaches. In these methods, both the model and the image are represented by attributed graphs, where graph nodes correspond to the primitives, such as extracted lines and curves, while the spatial relationships between the primitives are described by graph edges. Thus, the object detection problem is converted into one of matching two attributed graphs, which can be solved by an optimization process [18], [19]. Although theoretically reasonable, graph matching methods are difficult to apply in real applications due to their high computational complexity.

On the one hand, many attempts have been made to improve the matching speed. Huet and Hancock [20] proposed to integrate the geometric attributes of line/curve pairs into a histogram, and to measure the similarity of two histograms by the Bhattacharyya distance. Similar to [20], a histogram was also adopted in [14] to aggregate the local structures of the image, and histogram intersection operations were employed for indexing. Unlike other methods, the local structure representation was constructed therein under the guidance of Gestalt psychology laws. While efficient for matching, much structural information is lost in the histogram based representations. To solve this problem, a hypothesis-test scheme was used in [13] to realize graphics recognition. The attributed graph of the model was first reduced to a spanning tree, which determined a fixed traversal path; this tree was then used to direct the examination process that finds all required components of the model class.

On the other hand, the matching burden is considerably alleviated by progress in integer quadratic programming. Berg et al. [17] employed the Geometric Blur descriptor of individual edges as the primitive, and realized object detection by building correspondences of primitives between image pairs. The correspondence problem was solved by a linear approximation to integer quadratic programming. Ren [10] relied on relative geometric cues of line segments to detect articulated objects from video data. Pairwise cues again lead to a quadratic problem, which was solved using a spectral approximation followed by a greedy discretization. Generally speaking, the primitives in the image and the model do not correspond one-to-one; hence, shape detection is in fact a set-to-set contour matching problem, which potentially requires an exponential search. To simplify this task, Zhu et al. [22] utilized control points to encode the shape descriptor algebraically as a linear function of contour selection variables, which was solved by linear programming techniques.

Recently, some researchers have proposed to construct the local structures of shape objects from a learned codebook of contour segments, and to realize detection under the spatial constraints of the local structures. Inspired by the idea of bag-of-features, Ferrari et al. [15] adopted contour segments to detect and localize objects in cluttered images. Under the criterion of the contour segment network, groups of k adjacent segments (kAS) are extracted from the images, and a codebook is built by clustering. With an SVM classifier trained on the bag of kAS features, object instances in a test image are localized using a simple sliding-window verification mechanism. Shotton et al. [16] and Opelt et al. [21] proposed to learn class-discriminative contour fragments for object detection. The idea is to explicitly construct a codebook of fragments that occur frequently in positive training images of a class but seldom in negative ones. Both works employ boosting to generate a strong detector over a set of weak detectors built on contour fragments.

Fig. 1 illustrates a flowchart of the proposed approach, which consists of four main modules: pre-processing, local neighborhood structure construction, local neighborhood structure matching, and mean shift detection. The pre-processing module deals with the extraction of lines and smooth curves from the raw image data. These lines and curves, with the advantages of line/curve representation, are then adopted as primitives to construct local neighborhood structures by grouping several nearest neighbor primitives around one reference. In this paper, all the primitives except the very short ones are used as references so that no information is lost. The local neighborhood structure, also called a local patch, is described by two kinds of attributes: the appearance of the primitives, and the spatial relationship between the neighbor primitives and the reference. The local patch representation is equivalent to a nearest neighbor graph structure, i.e., the combination of a graph node with its directly connected neighbor nodes forms a local patch. With this representation, both the appearance and the structural information of the primitives are encoded, which largely improves the discriminative power.
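As a concrete illustration, the following Python sketch builds such patches under simplifying assumptions that are not part of the paper: primitives are taken to be straight segments given by endpoint pairs, the line/curve distance is approximated by the closest pair of endpoints, and the per-neighbor attributes (relative length, relative distance, relative angle) are simplified stand-ins for the attributes defined in the full text; all function and parameter names are illustrative.

```python
import math

def seg_length(s):
    return math.dist(s[0], s[1])

def seg_angle(s):
    return math.atan2(s[1][1] - s[0][1], s[1][0] - s[0][0])

def seg_distance(a, b):
    # Coarse stand-in for the true line/curve distance: closest endpoints.
    return min(math.dist(p, q) for p in a for q in b)

def build_local_patches(primitives, num_neighbors=4, min_length=5.0):
    """Group each sufficiently long primitive (the reference) with its
    nearest neighbor primitives and record simple relative attributes."""
    patches = []
    for i, ref in enumerate(primitives):
        if seg_length(ref) < min_length:   # very short primitives are not used as references
            continue
        ranked = sorted((j for j in range(len(primitives)) if j != i),
                        key=lambda j: seg_distance(ref, primitives[j]))
        neighbors = []
        for j in ranked[:num_neighbors]:
            nb = primitives[j]
            neighbors.append((
                seg_length(nb) / seg_length(ref),          # relative length
                seg_distance(ref, nb) / seg_length(ref),   # relative distance
                seg_angle(nb) - seg_angle(ref),            # relative angle
            ))
        patches.append({"reference": i, "neighbors": neighbors})
    return patches
```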

Object detection is realized by local neighborhood structure matching followed by a mode detection scheme. For each local patch of the model, we find the top k nearest patches in the drawing image and estimate the transformation parameters for each of the k candidate patches. If a candidate patch actually corresponds to the model patch, the estimation result will be close to the ground truth; otherwise, the values will be irregularly distributed. By treating each estimation result as a point in the parameter space, a dense area forms around the ground truth provided that a model object exists in the drawing; otherwise, no dense area can be found. The denser the points are, the more likely they lie near the correct answer. These points are finally accumulated in a circular search window using the mean shift technique [23]. After a verification process, the modes that are above a threshold are taken as the occurrences of object instances.
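The mode detection step might be sketched as follows. This is a simplified, flat-kernel stand-in for the mean shift procedure of [23], not the paper's implementation; it clusters only the estimated object centers, whereas the full parameter space also includes scale and rotation, and the bandwidth and support threshold are illustrative.

```python
import numpy as np

def mean_shift_modes(points, bandwidth=10.0, min_support=5, n_iter=30):
    """Shift each estimated center to the mean of the points inside a circular
    window around it, then keep well-separated modes with enough support."""
    pts = np.asarray(points, dtype=float)
    shifted = pts.copy()
    for _ in range(n_iter):
        for k, p in enumerate(shifted):
            window = pts[np.linalg.norm(pts - p, axis=1) < bandwidth]
            if len(window):
                shifted[k] = window.mean(axis=0)
    modes = []
    for p in shifted:
        support = int(np.sum(np.linalg.norm(pts - p, axis=1) < bandwidth))
        if support >= min_support and all(np.linalg.norm(p - m) >= bandwidth for m in modes):
            modes.append(p)
    return modes
```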

The rest of this paper is organized as follows. Section 2 introduces the concept of the local neighborhood structure and its attributes. Section 3 presents how to compute the distance between two local neighborhood structures and how to estimate transformation parameters from them. Based on the estimated parameters, the occurrences of the model in the input image are detected in Section 4. In Section 5, experiments are presented to verify our algorithm.

Section snippets

Local neighborhood structure

Lines and curves are the primary components of line drawings; thus, it is natural to adopt them as the primitives. Because a single primitive is non-distinctive, a local neighborhood structure, also called a local patch, is designed to enhance the discriminative power.

The local neighborhood structure is constructed using the nearest neighbor criterion. Given a primitive as the reference, we find its neighbor primitives whose minimum distances to the reference are

Matching local neighborhood structures

Let $P^M=\{S^M,\,(T_i, RL_i, RD_i, RMD_i, RA_i)^M,\ i=1,\ldots,u\}$ and $P^G=\{S^G,\,(T_j, RL_j, RD_j, RMD_j, RA_j)^G,\ j=1,\ldots,v\}$ be local neighborhood structures of the model and of the drawing image, respectively. For simplicity, $N_i^M$ and $N_j^G$ are used to denote the neighbor primitives in $P^M$ and $P^G$, respectively. In this section we introduce how to compute the distance between $P^M$ and $P^G$, and how to estimate the transformation parameters from $P^M$ to $P^G$.
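As an illustration only, a greedy patch distance and a similarity-transform estimate might look like the sketch below. The paper's attribute set $(T, RL, RD, RMD, RA)$ is approximated by the simplified relative attributes from the earlier sketch, and the greedy one-to-one assignment, the equal attribute weighting and the center prediction are assumptions rather than the authors' formulation.

```python
import math

def patch_distance(patch_m, patch_g):
    """Greedily pair each model neighbor with its closest unused drawing
    neighbor by absolute attribute difference and sum the residuals."""
    used, total = set(), 0.0
    for attr_m in patch_m["neighbors"]:
        best_j, best_d = None, float("inf")
        for j, attr_g in enumerate(patch_g["neighbors"]):
            if j not in used:
                d = sum(abs(a - b) for a, b in zip(attr_m, attr_g))
                if d < best_d:
                    best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            total += best_d
    # Penalize patches with different numbers of neighbors.
    return total + abs(len(patch_m["neighbors"]) - len(patch_g["neighbors"]))

def estimate_transform(ref_m, ref_g, model_center):
    """From the two matched reference segments (endpoint pairs), estimate the
    scale and rotation, and predict where the model center falls in the drawing."""
    length = lambda s: math.dist(s[0], s[1])
    angle = lambda s: math.atan2(s[1][1] - s[0][1], s[1][0] - s[0][0])
    midpoint = lambda s: ((s[0][0] + s[1][0]) / 2, (s[0][1] + s[1][1]) / 2)
    scale = length(ref_g) / max(length(ref_m), 1e-9)
    rotation = angle(ref_g) - angle(ref_m)
    mid_m, mid_g = midpoint(ref_m), midpoint(ref_g)
    dx, dy = model_center[0] - mid_m[0], model_center[1] - mid_m[1]
    cx = mid_g[0] + scale * (dx * math.cos(rotation) - dy * math.sin(rotation))
    cy = mid_g[1] + scale * (dx * math.sin(rotation) + dy * math.cos(rotation))
    return scale, rotation, (cx, cy)
```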

Object detection

Detecting the model in the drawing image is implemented in three steps: (1) build local neighborhood structures and extract their attributes; (2) for each local neighborhood structure of the model, find the top k nearest structures in the drawing and, at the same time, estimate the transformation parameters, including the object centroid, scale and rotation angle; and (3) with the estimation results, apply the mean shift mode detection technique to obtain the occurrences of the model in the
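Under the same simplifying assumptions as the previous sketches, the three steps can be tied together as in the hypothetical pipeline below; taking the model center as the centroid of the model primitives' endpoints, and voting with centers only, are illustrative choices rather than the paper's exact procedure.

```python
def detect_model(model_primitives, drawing_primitives, k=5):
    """End-to-end sketch reusing build_local_patches, patch_distance,
    estimate_transform and mean_shift_modes from the earlier sketches."""
    model_patches = build_local_patches(model_primitives)
    drawing_patches = build_local_patches(drawing_primitives)
    # Illustrative model center: centroid of all model segment endpoints.
    pts = [p for seg in model_primitives for p in seg]
    model_center = (sum(x for x, _ in pts) / len(pts),
                    sum(y for _, y in pts) / len(pts))
    votes = []
    for pm in model_patches:
        ranked = sorted(drawing_patches, key=lambda pg: patch_distance(pm, pg))[:k]
        ref_m = model_primitives[pm["reference"]]
        for pg in ranked:
            _, _, center = estimate_transform(
                ref_m, drawing_primitives[pg["reference"]], model_center)
            votes.append(center)  # scale and rotation votes omitted in this 2-D sketch
    return mean_shift_modes(votes)
```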

Experiments

In this section, we present some experimental evaluation of the proposed shape detection method. There are two aspects to this study. We commence with a sensitivity study using synthetic data. The aim here is to evaluate how this method performs under controlled structural corruption. The second part of the study evaluates the method on real-life engineering drawings and the ETH-80 database.

Conclusion

In this paper, we present an object detection method that searches for shapes in line drawings. The method is in fact a hierarchical matching scheme, in which the local neighborhood structure acts as the middle level representation. By grouping elementary lines/curves into local neighborhood structures, both the appearance and the geometric structure of the line drawing are well described. For each local neighborhood structure of the model, its k-nearest neighbors in the drawing image are obtained

About the Author—Rujie Liu received his B.S., M.S., and Ph.D. degrees in electronic engineering from Beijing Jiaotong University in 1995, 1998, and 2001, respectively. Since then, he has worked as a researcher at Fujitsu Research and Development Center Co. Ltd., Beijing, China. His research interests are in the areas of content-based image retrieval, pattern recognition, and image processing.

References (27)

  • F. Attneave, Some information aspects of visual perception, Psychological Review (1954)
  • X. Ren, Learning and matching line aspects for articulated objects, in: Proceedings of the IEEE Conference CVPR, 2007, ...
  • J. Llados et al., Symbol recognition by error-tolerant subgraph matching between region adjacency graphs, IEEE Transactions on PAMI (2001)