A generalisable framework for saliency-based line segment detection

Here we present a novel, information-theoretic salient line segment detector. Existing line detectors typically only use the image gradient to search for potential lines. Consequently, many lines are found, particularly in repetitive scenes. In contrast, our approach detects lines that define regions of significant divergence between pixel intensity or colour statistics. This results in a novel detector that naturally avoids the repetitive parts of a scene while detecting the strong, discriminative lines present. We furthermore use our approach as a saliency filter on existing line detectors to more efficiently detect salient line segments. The approach is highly generalisable, depending only on image statistics rather than image gradient; and this is demonstrated by an extension to depth imagery. Our work is evaluated against a number of other line detectors and a quantitative evaluation demonstrates a significant improvement over existing line detectors for a range of image transformations.


Introduction
Line segments are an important low-level feature, particularly where man-made structures are present. In many situations they may be used in a similar manner to points, e.g. pose estimation [5], stereo matching [9], or structure from motion [8]. This may often be helped by using the duality between lines and points, resulting in similar registration approaches for the two types of feature [26]. Further, there are tasks especially suited to lines, e.g. vanishing point estimation for camera calibration [10], image resizing [17], or structural graph matching [19].
Existing line detection methods either first use a derivative-based edge detector and detect lines from the edges (e.g. [4], or via the Hough Transform [6]), or they directly group pixels in the image into line segments based on the magnitude and direction of their derivative [49,14]. However, these all act locally on the image, detecting a large number of lines, particularly in repetitive scenes. This limitation is illustrated in Fig. 1: state-of-the-art line detectors detect all lines regardless of their significance, whereas, ideally, the non-repetitive lines denoting the geometrically significant edges would be preferentially detected.
To address this, we propose to detect only the salient line segments, an area that, to the best of the authors' knowledge, has not been addressed in the literature. Instead, saliency detection commonly refers to the computation of a saliency map (e.g. [31]), with some work addressing salient edge detection [28] and salient point detection [32]. In detecting only the salient line segments, we propose an approach that is fundamentally different from existing methods for line segment detection in that it is not derivative-based: instead, it seeks informational contrast between regions and thereby favours non-repetitive edges. The information is expressed in terms of distributions of pixel intensities taken from rectangles of a variable width, meaning our approach operates over a larger scale than other detectors and so naturally avoids repetitive parts of a scene.
We measure the contrast between the two distributions on either side of the line using the information-theoretic Jensen-Shannon Divergence (JSD). This measure has been used elsewhere for edge detection [39], unlabelled point-set registration [50], and DNA segmentation [25]. It has many interpretations: for example, it may be expressed in terms of other information-theoretic quantities such as the Kullback-Leibler Divergence and Mutual Information, it has further interpretations in both statistical physics and mathematical statistics [25], and it is the square of a metric.
Our measure of line saliency may further be used as a saliency filter on existing line detectors. This allows it to cull the non-salient line segments computed by other detectors and localise the position of salient lines under our saliency measure. It furthermore increases the speed of salient line detection by orders of magnitude over the naive approach of determining the saliency measure of every possible line segment on the image.
This distribution-based approach to line detection we propose is highly generalisable, being applicable to any situation where informational contrast can be found. As such, we implement an extension for line detection in depth images, whereby lines that jointly delineate changes in surface orientation or texture are detected. These are reprojected, allowing for 3D salient line detection and hence potential multi-modal applications.
The contributions of this paper are as follows: firstly, a distribution-based salient line segment detector is formulated and implemented: the first known method for salient line segment detection. Secondly, the notion of saliency-based filtering is applied to existing line detectors for efficient salient line detection. Thirdly, an extension to depth imagery is implemented, allowing for the detection of salient lines in 3D structures. An evaluation shows that, when considering that we detect only a small number of lines, our approaches significantly outperform the others in terms of repeatability and homography estimation. It demonstrates that they are representative of the underlying aspects of the scene, with potential use for problems that benefit from fewer, but more reliable, features e.g. [20].
The structure of the paper is as follows: in Section 2, we review related work in line detection, edge detection, and line detection in depth imagery. In Section 3 the methodology is described for both salient line detection and saliency filtering, with the extension to depth imagery (and subsequently 3D by reprojection) described in Section 4. In Section 5 a range of qualitative and quantitative results are given, and in Section 6 our conclusions and ideas for future work are presented.

Related work
Since we are unaware of any research into salient line detection (or any line detection method that does not act locally on the derivative of the image) we firstly review line segment detection, before reviewing relevant edge detection methods. Finally, line detection in other modalities (depth images, 3D data) is reviewed.

Line detection
Most early methods of line detection relied upon the Hough Transform (HT) [6] to determine a set of lines from a set of edges (typically extracted from the image by the Canny edge detector [15]). The HT exhaustively searches the space of all possible infinitely long lines, determining how many edge pixels are consistent with each line; lines with a suitably large number of edge pixels lying on them are returned as the output of the algorithm. In its naive form it has many drawbacks: for example, it depends only on the magnitude of the gradient and not its orientation, and it leaves open the problem of how to accurately determine the endpoints of the lines. However, there are many variants of the Hough Transform [30] that seek to solve some of these problems.
Regardless of the approach to line detection, early methods particularly suffered from the problem of setting meaningful thresholds. This was addressed by the Progressive Probabilistic Hough Transform (PPHT) [41] by Matas et al., where it is achieved in a probabilistic manner: the threshold is expressed in terms of the probability of the line occurring by chance. The idea was extended by Desolneux et al. [21], who exhaustively search every line segment on the image and define an a contrario model to control the number of false detections. The latter part is a straightforward extension: if there are N possible line segments on an image and p is the probability of that line segment occurring by chance, then accepting the line if p < ε/N guarantees, on average, ε false detections per image.
However, Grompone von Gioi et al. [48] note that this model, in its current form, is too simple. Given an array of line segments, the model tends to interpret it as one long line, leading to unsatisfactory results. This is not a fault of the a contrario model, but rather of the fact that it is applied to each line individually. If instead it is applied to groups of lines at a time, it will segment a line into its components in the correct manner, known as 'multi-segment analysis'. However, this adds another layer of complexity, becoming O(N⁵) for an N × N image.
Grompone von Gioi et al. subsequently implemented a linear-time Line Segment Detector (LSD) [49]. It is based on both the a contrario model and an earlier line detection algorithm by Burns et al. [14]. It is a spatially based approach, starting from small line segments and growing them. Furthermore, each segment has its own line support region, constructed by grouping nearby pixels that have a similar gradient, thus detecting lines of variable width. The a contrario model has also been implemented in the EDLines detector by Akinlar and Topal [4]. The approach performs similarly to LSD but up to ten times faster due to its very fast edge detection algorithm that simultaneously detects edges and groups them into connected chains of pixels. Less processing time is required for subsequent line detection, resulting in a real-time line segment detector.
All line detection methods reviewed above are unable to detect lines based on their significance or surroundings. Consequently, they tend to return a large number of lines that do not capture the general structure of the scene.

Edge detection in images
Similarly to approaches to line detection, many approaches to edge detection act locally on the image. One of the earliest algorithms, the Canny edge detector [15], convolves the image with a Gaussian filter before computing the magnitude of the gradient at each pixel. Variants have been proposed, in particular for the convolution stage; notably, Liu and Feng [38] use an anisotropic Gaussian filter that only operates perpendicularly to an edge. It is combined with a multi-pixel search to detect longer edges than other approaches, culminating in the detection of short edge-line segments. Their results indicate superior performance compared to existing edge detectors in the presence of different levels of Gaussian noise. However, both approaches are fundamentally derivative-based, acting locally on the image regardless of the structure of the scene. In contrast, there exist some non-local edge detection methods. For example, Holtzman-Gazit et al. [28] determine salient edges by combining an edge-preserving filter with a regional saliency measure in a multi-scale manner. It complements existing approaches to salient feature detection, e.g. [32,46], where salient point detection is formulated in an information-theoretic manner. Other approaches to edge detection are more similar to ours in that they are distribution-based; in fact, the JSD has already been used for edge detection [39] via a sliding-window approach. For each pixel and each orientation, the JSD is evaluated between a distribution from one side of the pixel and one from the other. However, the method is not scale-invariant, the sliding window being a fixed size throughout the algorithm. Further parameters are not discussed, e.g. how to determine a probability distribution from a sliding window, and the algorithm is only tested on one image.
An advantage of distribution-based edge detection is its natural extension to colour images, as Ruzon and Tomasi [45] do. They use the Earth Mover's Distance (EMD) between distributions which formulates distance as a transportation problem; it represents the minimum amount of work to 'transport' one distribution into the other. Their method obtains good qualitative results but is very time consuming (about 10 min per image) and furthermore is not scale-invariant. The distribution-based approaches to edge detection reviewed here have proved promising, however they have never been used for the detection of salient straight lines.
Edge detection is a very important low-level operation with numerous applications. For example, the HT [6] may be used on an edge map to detect line segments; however, this has its own limitations, e.g. it does not take into account the orientation of the edges detected. Edges may be locally chained into line segments directly [23]; however, the approach detects many false positives since a meaningful threshold has not been set. Computing a polygonal approximation to a contour (i.e. a connected set of edge pixels) will determine a representative set of lines for the curve. A range of algorithms have been proposed [3,11,43]; for example, Parvez [43] relaxes the condition that vertices of the approximating polygon must lie on the contour, and Bhowmick and Bhattacharya [11] relax the definition of a digitally straight line: both allow for a polygonal approximation formed by fewer, more meaningful line segments. Bhowmick and Bhattacharya [11] furthermore propose a very fast algorithm relying only on primitive integer computations; efficient algorithms have also been proposed by Aguilera-Aguilera et al. [3], who use a concavity tree to more quickly determine the vertices of the polygonal approximation. However, polygonal approximation algorithms inherently approximate curves by a set of lines, in contrast to our approach to detecting straight lines, which avoids curved segments completely.

Line detection in other modalities
In the 3D or depth imagery domain, there has been much research on edge detection (e.g. [44]) with relatively little on straight line detection.
Some research focuses on detecting lines in textureless 3D data. For example, Stamos and Allen [47] detect planes in 3D and determine lines as the intersection of these planes. Lin et al. [37] convert a 3D model into a set of shaded images using a non-photorealistic rendering technique that provides a strong perception of edges. 2D lines are detected on the shaded images using the LSD algorithm [49] and reprojected to 3D, where a line support region is constructed. However, both approaches detect lines based purely upon the geometry of the 3D scene. With respect to textured 3D data, Chen and Wang [18] detect lines in 3D point clouds that have been reconstructed by Structure-from-Motion (SfM), by detecting and reprojecting lines in the images used to generate the point cloud. It is difficult to apply this approach to general 3D point clouds without manually creating a set of camera locations from which to detect 2D lines. Buch et al. [13] extend the RGB-D edge detection method presented in [44] by approximating each edge by a line segment. However, the length of the line segment is a parameter of the algorithm, meaning all detected line segments have the same length (in the image space).
The key novelty of our contribution lies in the proposed distribution-based approach to line detection, resulting in a method that naturally avoids the repetitive parts of a scene and returns only the salient line segments present. This generalisable approach allows for natural extensions into other modalities, which is demonstrated via a depth extension. We are not aware of any other methods that explicitly detect straight lines in depth images in such a way that jointly delineates changes in surface orientation or texture. Additionally, saliency filtering is proposed to cull non-salient line segments obtained from other approaches, thereby achieving significantly faster processing times.

Methodology
The methodology can be broadly split into two stages. The first stage searches all possible lines on the image, calculating the saliency value (S_val) of each line and accepting it according to a certain set of conditions. Doing so requires the estimation of the Jensen-Shannon Divergence (JSD) between two sets of data, as outlined in Section 3.1. Section 3.2 details the first stage: how the JSD relates to the saliency value of a line and how to compute a putative set of lines from this. In the second stage the most representative set of lines is determined from this putative set, as outlined in Section 3.3. This is achieved using affinity propagation [24], a fast clustering algorithm that works particularly well for larger numbers of clusters. Hence the resulting algorithm returns a representative set of lines for the scene. A flowchart of the algorithm, along with intermediate results from each section, is shown in Fig. 2.
The above algorithm has a high complexity (for an N × N image it is O(N⁵)) because it performs an exhaustive search over all lines at all scales in an image. Therefore, in Section 3.4, we propose an alternative approach: a saliency filter on top of existing line detectors that filters out non-salient lines. This allows it to detect only the salient lines within seconds: orders of magnitude faster than the above approach.

Estimating the Jensen-Shannon divergence
Let P and Q be discrete probability distributions, taking one of K values, i.e. P = {p_1, …, p_K}, with p_i ≥ 0 ∀i and Σ_{i=1}^{K} p_i = 1. The entropy of a probability distribution is defined as

H(P) = −Σ_{i=1}^{K} p_i ln p_i.

The JSD between two probability distributions P and Q is subsequently defined as

JSD(P, Q) = H((P + Q)/2) − (H(P) + H(Q))/2.

It is closely related to other information-theoretic quantities such as the Mutual Information (MI) or the Kullback-Leibler Divergence (KLD) [25] and shares similar properties and interpretations.
Indeed, for discrete random variables M and Z the MI is defined as

MI(M, Z) = Σ_{m,z} Pr(m, z) ln [ Pr(m, z) / (Pr(m) Pr(z)) ],

where Pr(m, z) denotes the joint probability of the event (M = m, Z = z) and Pr(m), Pr(z) are the marginal probabilities of the events (M = m) and (Z = z) respectively. Then, if M is the random variable associated with the distribution (P + Q)/2 and Z is a binary indicator variable denoting which of P or Q a sample of M was generated from, one sees that, by a little algebraic manipulation [25], JSD(P, Q) = MI(M, Z). The MI has an information-theoretic interpretation: it represents the average number of extra nats (bits taken to base e) needed to encode samples from the joint distribution Pr(M, Z) with a code that uses only the marginal distributions. Subsequently, the relationship between the JSD and MI shows that JSD(P, Q) is bounded between 0 (when P and Q are the same) and ln 2 (when P and Q are completely different). Grosse et al. [25] give further statistical interpretations of JSD(P, Q) and Endres and Schindelin [22] show it is the square of a metric.
In reality, one is never able to know the distributions P and Q exactly; instead they must be estimated from samples of data. Assume there are N samples of data from both P and Q, with counts represented by n = {n_1, …, n_K} and m = {m_1, …, m_K} respectively, hence Σ_{i=1}^{K} n_i = Σ_{i=1}^{K} m_i = N. Then JSD(P, Q) may be estimated by calculating the observed JSD: JSD_obs(n, m) := JSD(n/N, m/N). However, this is only an estimate, and it suffers from two important limitations.
Firstly, there is a systematic bias in this naive approach (see [25]), with JSD_obs expected to be higher than the JSD of the true, underlying distribution of pixel intensities (JSD_true). This is particularly evident when P and Q are uniform distributions: JSD_true is zero, but measurements n and m will most likely cause JSD_obs to be non-zero. The bias is particularly large when N is small, and tends to zero as the sample size becomes arbitrarily large. Furthermore, when N is small, there is a high probability that a given value of JSD_obs could have occurred by chance, and this needs to be reflected in any estimate of JSD_true.
Both of these problems may be solved by computing a Bayesian estimate of JSD_true. This directly avoids the problem of systematic bias, and naturally accounts for smaller sample sizes N by assuming and integrating over a symmetric Dirichlet prior. The Dirichlet prior defines a prior probability for the distribution P (respectively Q) by a parameter α as

Pr(P) ∝ Π_{i=1}^{K} p_i^{α−1}.

It is a general prior since it is parameterised by α: α = 1 corresponds to a uniform prior, α = 0.5 to Jeffreys' prior, etc. Informally, the magnitude of α corresponds to the size of the prior, with larger values of α representing a large prior belief that P (resp. Q) is evenly distributed. With this prior, the Bayesian estimate for JSD_true is defined as follows:

JSD_est(n, m) := E[JSD(P, Q) | n, m] = ∫ JSD(P, Q) Pr(P | n) Pr(Q | m) dP dQ,

where the integral is taken over the space of all probability distributions. We employ the results of Hutter [29], who calculates a Bayesian estimate of the MI between two random variables M and Z given a finite set of samples, and then modify his solution for the JSD. The result for the MI is as follows: firstly, denote s′_{m,z} as the number of samples taking the joint value (m, z) and let s_{m,z} = s′_{m,z} + α.

Then Hutter computes the Bayesian estimate as

E[MI(M, Z) | s] = ψ(s_{++} + 1) + (1/s_{++}) Σ_{m,z} s_{m,z} [ψ(s_{m,z} + 1) − ψ(s_{m,+} + 1) − ψ(s_{+,z} + 1)],

where s_{m,+} = Σ_z s_{m,z}, s_{+,z} = Σ_m s_{m,z}, s_{++} = Σ_{m,z} s_{m,z}, and ψ is the digamma function defined as ψ(x) = Γ′(x)/Γ(x). As previously stated, JSD(P, Q) may be reformulated in terms of MI(M, Z), where M represents the mixture distribution (P + Q)/2 and Z is a binary indicator variable denoting which of P or Q a sample of M was generated from. Using these substitutions, Hutter's result may be rewritten to compute a Bayesian estimate of JSD(P, Q) given a finite set of samples (n, m) as follows:

JSD_est(n, m) = [z(n) + z(m) − z(n + m + α)] / (2(N + Kα)) − ψ(N + Kα + 1) + ψ(2(N + Kα) + 1),   (7)

where we define z(x) := Σ_{i=1}^{K} (x_i + α) ψ(x_i + α + 1) and α is a K-vector of all α's. Note that Eq. (7) applies directly to the data n and m and hence may be computed with similar efficiency to the naive JSD_obs computation. Hence, small-sample problems and systematic bias may be avoided as easily as directly computing the observed JSD.

Computing a putative set of lines
The saliency of a line segment can be related to the previous section in the following way. Suppose a line segment L has start and end points at (x_1, y_1) and (x_2, y_2) with length ‖L‖. Let T be a rectangle adjoining L lying on one side of L and B a rectangle on the other side. We define the scale, s, of the line to be the width of each rectangle, where s is allowed to vary, taking any value up to ‖L‖ (see the left of Fig. 3 for an illustration). Subsequently, n represents a histogram of pixel intensities from T and, respectively, m from B. Denote the estimated JSD between the two regions as J(L, s) := JSD_est(n, m), calculated according to the previous section. The left of Fig. 4 shows two typical lines, their rectangles and their histograms. Particularly evident in this image is how similar the distributions on either side of a line in a repetitive structure (e.g. brickwork) are, and hence how these lines are naturally avoided by our approach.
A significant problem with simply using the estimated JSD of regions taken from either side of the line as a saliency measure is its poor localisation. For example, let L be a line whose J(L, s) value is particularly high. Then any line L′ ⊂ L also has a high estimated JSD value, making it very difficult to determine the endpoints of L.
We observe, however, that beyond the endpoints of a line JSD_est should be very low (since, by definition, there is no line there). Thus, let L_L and L_R denote line segments taken from beyond the left and right endpoints of L, with adjoining rectangles T_L, B_L, T_R and B_R, respectively (see the right of Fig. 3). Motivated by the desire to keep J(L, s) high and J(L_L, s) and J(L_R, s) low, we use the following measure of line saliency:

Sal(L, s) = J(L, s) − β [J(L_L, s) + J(L_R, s)],   (8)

where β is empirically determined to be 0.25. The right of Fig. 4 shows an example of a localised line segment, where the distributions on either side of the line beyond its endpoints are similar, but are very dissimilar on either side of the segment itself. A formulation of line saliency has now been defined, allowing for the detection of salient lines under this saliency measure. In a similar manner to [21], our measure of line saliency is evaluated on all lines of the image by first evaluating it on all horizontal lines of the image with pixel-level precision (i.e. the start and end points of the line are integer valued with the same y-coordinates). The process is repeated on r evenly spaced rotations of the image. However, computing all lines on an image whose saliency value is higher than a given threshold does not, on its own, give meaningful results. Fig. 2(b) demonstrates this: over 6 million segments are returned across multiple scales, far too many to be of practical value. Three further principles are employed in order to decide whether to accept a given line segment: Local Maxima: The line has to be more salient than its immediate neighbours. The neighbours are in five dimensions corresponding to the scale s and the coordinates of the two endpoints of line L.
Maximally Salient: Maximal saliency is defined as in [21]: a line segment is maximally salient if it does not contain a strictly more salient segment and it is not contained in a more salient segment.
Scale Selection: Many feature detectors (e.g. [40,32]) search for features that have a high response value at a particular scale. However, we observe that, for our case, lines that are salient across a range of scales are more desirable. Consider a line segment along a jagged edge (e.g. Fig. 3). It may have a high saliency value Sal(L, s), particularly so if s is large; however, for small s the line is not salient due to its jagged nature. Kadir and Brady [32] note this is an appropriate approach to scale selection for edges since there is no associated scale in their tangential direction. Hence, we introduce a lower threshold J_min and only accept lines for which J(L, t) > J_min at every scale t below the selected scale. Thus, the algorithm proceeds by finding all lines (L, s) on an image with Sal(L, s) > S_thresh that satisfy the three criteria given above.
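The acceptance logic above can be summarised as follows. This is an illustrative sketch with hypothetical names; generating the 5-D neighbourhood, enforcing maximal saliency across nested segments, and evaluating J(L, t) are assumed to be handled elsewhere:

```python
def accept_line(sal, neighbour_sals, j_small_scales, s_thresh, j_min):
    """Decide whether a candidate (L, s) is kept.

    sal            -- Sal(L, s) of the candidate
    neighbour_sals -- Sal values of the 5-D neighbours (endpoints and scale)
    j_small_scales -- J(L, t) for scales t below the selected scale s
    """
    if sal <= s_thresh:
        return False                                 # saliency threshold
    if any(sal <= v for v in neighbour_sals):
        return False                                 # not a local maximum
    return all(j > j_min for j in j_small_scales)    # salient across scales
```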

Determining a representative set of lines
From the algorithm in the previous section many overlapping line segments remain (see Fig. 2(c)). We wish to cluster them and determine the most representative set of lines. For this, we employ the affinity propagation algorithm [24] for two main reasons. Firstly, it does not require the number of clusters to be specified beforehand. Secondly, it has been shown to be particularly effective in situations where many clusters (>50) are required; classical approaches such as k-means or Expectation-Maximisation (EM) clustering require an unfeasibly large number of restarts to obtain similar results [24].
For a given set 𝓛 of N lines, affinity propagation finds a subset 𝓡 ⊂ 𝓛 that is representative of 𝓛. Each line L ∈ 𝓛 is mapped to its representative line by f, i.e. f(L) ∈ 𝓡. Let d be a given distance measure between two line segments. Then the objective of affinity propagation is to find the mapping f that minimises the following:

Σ_{L ∈ 𝓛} d(f(L), L),   (9)

from which 𝓡 may be immediately deduced. However, Eq. (9) may trivially be solved by setting f equal to the identity map and 𝓡 = 𝓛, since then each of the summands is equal to zero. This is solved by setting the self-distance

d(L, L) := c for all L ∈ 𝓛,   (10)

where c is a parameter of the algorithm. Eq. (9) is efficiently approximated by the max-sum algorithm (see [24] for more details). It remains to define a distance measure d between two line segments L_i and L_j. d needs to address the subsetting issue correctly: if L_i is close to a subset of L_j, d should be small to reflect the large likelihood of L_i occurring if L_j does. Conversely, if L_j is close to a subset of L_i, d should be large. We use a variant of the parameter-free distance measure presented in [33]: denote the endpoints of L_i as x_1 and x_2 and denote the closest points on the line segment L_j to these points as y_1 and y_2 respectively. Then define the distance from L_j to L_i as

d(L_j, L_i) = ∫_0^1 ‖(1 − t)(x_1 − y_1) + t(x_2 − y_2)‖² dt.   (11)

Eq. (11) is thus the integral of the squared distances between corresponding points on the two lines defined by (x_1, x_2) and (y_1, y_2).
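Under our reading of Eq. (11) as the integral of squared distances between corresponding points, the distance admits a simple closed form. The sketch below assumes x_1, x_2 are the endpoints of L_i and y_1, y_2 the closest points on L_j (function name hypothetical):

```python
import numpy as np

def segment_distance(x1, x2, y1, y2):
    """Integral over t in [0, 1] of the squared distance between the
    corresponding points (1-t)*x1 + t*x2 and (1-t)*y1 + t*y2.
    Expanding the integrand gives (|u|^2 + u.v + |v|^2) / 3,
    where u = x1 - y1 and v = x2 - y2."""
    u = np.subtract(x1, y1).astype(float)
    v = np.subtract(x2, y2).astype(float)
    return (u @ u + u @ v + v @ v) / 3.0
```

Coincident segments give a distance of zero, which is why the self-distance must be overridden with the preference parameter c before running affinity propagation.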

Saliency filtering
The algorithm outlined in the previous sections exhaustively searches across all lines in an image and all widths of rectangle: for an N × N image it has O(N⁵) complexity. Thus, an alternative approach is proposed, which consists of taking existing line segments determined by a fast line segment detector (e.g. [41,49]) and returning only those that are salient under our definition of line saliency. We furthermore localise the position of detected line segments under our formulation of saliency. Our saliency filtering algorithm can be summarised as follows:

Inputs: set 𝓛 of line segments; parameters S_thresh, J_min, s_min.
Outputs: set 𝓛′ of salient line segments.
For each line segment L_i ∈ 𝓛:
1. Determine the scale s ∈ {1, …, ‖L_i‖} that maximises Sal(L_i, s).
2. If Sal(L_i, s) > S_thresh and J(L_i, t) > J_min for all t ∈ [s_min, s), add L_i to 𝓛′ and continue. Otherwise, go to the next line segment in 𝓛.
3. Perform hill-climbing on L_i to localise its position and width.
The hill-climbing method ensures all detected lines are local maxima; five parameters of the line are altered to test for an increase in Sal(L_i, s). There are two for each endpoint of the line, which are altered parallel and perpendicular to the direction of the line segment; the other parameter is s. Each parameter is altered separately, and the process proceeds iteratively until a more salient position can no longer be found.
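The coordinate-wise search can be sketched generically as follows. Here the five parameters are treated as an abstract tuple, and `score` is a hypothetical callable returning the saliency of a parameter setting; the mapping to parallel/perpendicular endpoint offsets is assumed to be handled by the caller:

```python
def hill_climb(params, score, step=1):
    """Greedy coordinate-wise ascent over the five line parameters
    (two offsets per endpoint plus the scale s). Each parameter is
    perturbed by +/- step; any strict improvement is accepted, and the
    loop repeats until no single-parameter change improves the score."""
    params = list(params)
    best = score(params)
    improved = True
    while improved:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                cand = params.copy()
                cand[i] += delta
                val = score(cand)
                if val > best:
                    params, best, improved = cand, val, True
    return tuple(params), best
```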

Depth imagery extension
The line detection algorithm described in the preceding section is not derivative-based; instead, it seeks informational contrast between regions on either side of a line segment. The result is a highly generalisable approach that may be applied to any situation where informational contrast can be found. Hence, it may potentially find applications in other modalities (e.g. colour or infra-red imagery). Here, we implement an algorithm for line detection in textured depth images, seeking lines that jointly delineate changes in surface orientation or texture in the same natural framework. Alternatively, if there is no texture associated with the depth imagery (as is the case for many 3D scanners), the proposed approach may detect lines that simply delineate changes in surface orientation.
For our implementation, we detect lines in a 3D structure that has been generated by multiple 'Light Detection And Ranging' (LiDAR) scans. In our case, these are coloured depth scanners that obtain the depth by measuring the time delay of a signal as it is transmitted and reflected off a 3D structure. The left of Fig. 5 shows an example of a 3D structure obtained by a LiDAR scanner. It is clear that multiple LiDAR scanners are required to recover the structure of the scene since only points that are visible to the scanner are obtained.
It is initially tempting to detect lines directly from the LiDAR data itself. Since this is a spherical scanner, the data is implicitly stored in a spherical image, similar to the rendering in the middle of Fig. 5. However, lines are not straight in spherical images, causing significant implementation issues for our approach. Instead, the data is reprojected into a cubic image (right of Fig. 5) for each LiDAR scanner, with the centre of each cube at the same location as the scanner. There is still some distortion of the lines at the edges of the cube; to be robust to this, the cubic projection is modified slightly so that each face has a field of view of 105°, providing some overlap between faces.
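A minimal sketch of the cube-map assignment is the dominant-axis test below (names hypothetical). A standard cube map gives each face a 90° field of view; the widened field of view described above simply lets neighbouring faces' acceptance cones overlap:

```python
import numpy as np

def cube_face(d):
    """Assign a viewing direction d to one of six cube-face indices
    (+x, -x, +y, -y, +z, -z) according to its dominant axis. The cube is
    centred on the scanner, so d is the ray from the scanner to a point."""
    d = np.asarray(d, dtype=float)
    axis = int(np.argmax(np.abs(d)))        # dominant axis: 0=x, 1=y, 2=z
    return 2 * axis + (0 if d[axis] >= 0 else 1)
```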
The implementation for LiDAR scans proceeds as follows: for each LiDAR scanner, and for each face of its cubic projection, lines are detected based on both the projected texture and the surface orientation. Any line that goes off the edge of a face is extended onto its neighbouring face. Subsequently, these lines are reprojected back to the 3D structure, using the approach proposed by Buch et al. [13]. Finally, 3D line segments are combined from multiple LiDAR scans using a similar affinity propagation approach to that outlined in Section 3.3.

Line detection in textured depth imagery
For each face of the cube the algorithm proceeds in the same way as in Sections 3.1-3.3 except for the representation of the distributions m and n. Since our aim is to detect lines that jointly delineate changes in texture or surface orientation (or just surface orientation, if there is no texture data available), they need to represent both the direction of the normals and optionally the intensity of the projected depth image. The normals are estimated from the depth data by a least squares plane-fitting approach from a small neighbourhood about each point.
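A common way to implement the least-squares plane fit is via the eigendecomposition (equivalently, SVD) of the neighbourhood's scatter matrix; the paper does not specify the exact solver, so the following is a sketch of one standard choice:

```python
import numpy as np

def estimate_normal(neighbourhood):
    """Least-squares plane normal of a small neighbourhood of 3-D points:
    the direction of least variance, i.e. the right singular vector of the
    centred point matrix with the smallest singular value. The sign of the
    returned unit normal is ambiguous."""
    pts = np.asarray(neighbourhood, dtype=float)
    centred = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centred)
    return vt[-1]  # rows of vt are ordered by descending singular value
```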
In constructing m and n, b_i and b_n bins are used to represent the intensity and the direction of the normals respectively, with an extra bin for when there is no data present, resulting in (b_i b_n + 1)-dimensional histograms. The b_i intensity bins are the same as in the 2D implementation, while the normals are binned uniformly across the surface of the sphere. The latter is a challenging problem for general b_n, so it is restricted to determining which vertex of a given Platonic solid the normal is closest to. We use the regular icosahedron (b_n = 12); however, b_n = 8 and b_n = 20 also gave good initial results. In the case where there is no intensity data present, lines are detected based purely on the direction of the normals, resulting in a (b_n + 1)-dimensional histogram.
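The joint histogram construction can be sketched as below. This is a hedged illustration: the `joint_histogram` helper, its intensity range, and its bin layout are assumptions; only the icosahedral normal binning (b_n = 12), the b_i intensity bins, and the extra "no data" bin come from the text.

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2
# The 12 vertices of a regular icosahedron, used as bin centres for
# normal directions (b_n = 12).
ICO = np.array([(0, s1, s2 * PHI) for s1 in (-1, 1) for s2 in (-1, 1)] +
               [(s1, s2 * PHI, 0) for s1 in (-1, 1) for s2 in (-1, 1)] +
               [(s1 * PHI, 0, s2) for s1 in (-1, 1) for s2 in (-1, 1)],
               dtype=float)
ICO /= np.linalg.norm(ICO, axis=1, keepdims=True)

def joint_histogram(intensities, normals, b_i=16, b_n=12):
    """Histogram over b_i * b_n + 1 bins: intensity x normal direction,
    plus one final bin for samples with no data (given here as None).
    Intensities are assumed to lie in [0, 256)."""
    h = np.zeros(b_i * b_n + 1)
    for inten, n in zip(intensities, normals):
        if inten is None or n is None:
            h[-1] += 1                         # the "no data" bin
            continue
        i_bin = min(int(inten * b_i / 256.0), b_i - 1)
        n = np.asarray(n, float)
        n_bin = int(np.argmax(ICO @ (n / np.linalg.norm(n))))  # nearest vertex
        h[i_bin * b_n + n_bin] += 1
    return h
```

Nearest-vertex binning is just the maximal dot product with the 12 unit vertices, which is why the icosahedron (or any Platonic solid) makes the otherwise awkward uniform-sphere binning trivial.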
From these constructions, lines are detected in the same way as in the 2D implementation. However, if a resulting line can be extended by a small amount such that it is partly off the image, it is considered as being part of two faces. In this case, its endpoint is extended along the neighbouring face and its saliency value is computed there at pixel intervals. The new (cubic) position of the line is deemed to be where this attains its maximum value. Note that the area either side of the line is still well-defined in this case (as the union of areas on each face), meaning its saliency and its reprojection (in the next section) operate in the same manner.
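The cross-face endpoint refinement above amounts to a one-dimensional search over extensions. A minimal sketch, where `saliency_of_extension` is a hypothetical callable standing in for the paper's saliency measure evaluated on the whole (possibly two-face) line:

```python
def best_extension(saliency_of_extension, max_extend):
    """Extend a line endpoint onto a neighbouring cube face in unit-pixel
    steps, keeping the extension at which the line's saliency attains its
    maximum. saliency_of_extension(ext) returns the saliency of the line
    extended by `ext` pixels; ext = 0 is the unextended line."""
    best_ext, best_val = 0, saliency_of_extension(0)
    for ext in range(1, max_extend + 1):
        val = saliency_of_extension(ext)
        if val > best_val:
            best_ext, best_val = ext, val
    return best_ext
```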

Line reprojection
Line reprojection is required in order to convert lines detected in the previous subsection into 3D line segments. This is not as trivial as simply reprojecting the endpoints back, since this may cause large errors when the endpoints are slightly misaligned and fall on different planes, or fail completely when there is no depth data available at one point. Therefore, we use the approach proposed by Buch et al. [13] which, for completeness, is briefly outlined here. They propose to reproject lines according to the type of line it is: whether the line is caused solely by a change in image intensity, a change in orientation of the normals, or a change in depth. Fig. 6 shows these three cases. Each case relies on locally approximating two planes (P_1, P_2) from rectangular regions either side of the line, or a plane P_all from a rectangular region surrounding the line, each by a RANSAC approach to plane estimation. The back-projected plane of the 2D line also needs to be considered here; it will be denoted by P_L.
If the distance between the centroids of the points in P_1 and P_2 is large, it is likely that the line is caused by a depth discontinuity; in this case, the reprojected line is the intersection of P_L and whichever of P_1 and P_2 is closer to the camera. If the angle between the normals of P_1 and P_2 is larger than a given threshold, then the line is due to an orientation discontinuity. Here, P_L is intersected with both P_1 and P_2 and the mean is selected as the reprojected line. Alternatively, if the angle is sufficiently small, the line is due to a change in image intensities; the reprojected line is thus the intersection of P_L and P_all.
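The three-way case selection can be sketched as follows. Planes are represented here as (normal, offset) pairs with n·x = d; the helper names, the plane representation, and the threshold handling are our own assumptions layered on the logic described by Buch et al. [13].

```python
import numpy as np

def plane_intersection(n1, d1, n2, d2):
    """Line of intersection of planes n1.x = d1 and n2.x = d2,
    returned as (point, unit direction)."""
    n1, n2 = np.asarray(n1, float), np.asarray(n2, float)
    direction = np.cross(n1, n2)
    direction /= np.linalg.norm(direction)
    # A third constraint (perpendicular to the line) pins down one point.
    A = np.vstack([n1, n2, direction])
    p = np.linalg.solve(A, np.array([d1, d2, 0.0]))
    return p, direction

def reproject_line(PL, P1, P2, Pall, c1, c2, cam, dist_thresh, ang_thresh):
    """Choose the reprojection case: depth discontinuity, orientation
    discontinuity, or intensity change. PL, P1, P2, Pall are
    (normal, offset) pairs; c1, c2 are the centroids of the points
    supporting P1 and P2; cam is the camera centre."""
    c1, c2, cam = (np.asarray(x, float) for x in (c1, c2, cam))
    if np.linalg.norm(c1 - c2) > dist_thresh:
        # Depth discontinuity: intersect PL with the plane nearer the camera.
        near = P1 if np.linalg.norm(c1 - cam) < np.linalg.norm(c2 - cam) else P2
        return plane_intersection(*PL, *near)
    if abs(np.dot(P1[0], P2[0])) < np.cos(ang_thresh):
        # Orientation discontinuity: mean of the two intersections.
        p_a, v_a = plane_intersection(*PL, *P1)
        p_b, v_b = plane_intersection(*PL, *P2)
        if np.dot(v_a, v_b) < 0:
            v_b = -v_b
        v = v_a + v_b
        return (p_a + p_b) / 2, v / np.linalg.norm(v)
    # Intensity change: intersect PL with the single surrounding plane.
    return plane_intersection(*PL, *Pall)
```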

Line clustering in 3D
In this stage, reprojected lines from multiple LiDAR scans are combined and clustered. This may be done using affinity propagation as previously defined; note that the distance between line segments (10) is well-defined in any dimension. However, from multiple LiDAR scans, some reprojected lines are more accurately located than others (due to the relative positions between the lines and the scanners). Hence, the distance d(L_i, L_j) (10) is redefined as d(L_i, L_j) = d(L_i, L_j) / A(L_j), where A(L_j) denotes the accuracy of line L_j, in order to favour more accurate line segments.
To compute the accuracy, first define the vector from the camera centre to the midpoint of L_j as v. Let n denote the normal to the plane that L_j lies on (if L_j lies on the intersection of two planes, compute the accuracy with respect to each plane and take the average). Denote the angle between v and n as θ, and denote the field of view per pixel as ϕ. Then the 3D distance subtended by one pixel is given by d_P = ||v|| ϕ / cos θ. Subsequently the accuracy is defined as A = 1 / d_P², measuring how many (square) pixels subtend a square metre from the image.
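A minimal sketch of this accuracy measure and the resulting weighted clustering distance. The pixel-subtense formula d_P = ||v|| ϕ / cos θ used below is our reading of the (garbled in extraction) original equation, and the helper names are assumptions:

```python
import numpy as np

def line_accuracy(v, n, fov_per_pixel):
    """Accuracy A = 1/d_P^2 of a reprojected line, where d_P is the 3D
    distance subtended by one pixel on the line's supporting plane.

    v: vector from camera centre to the line midpoint; n: normal of the
    supporting plane; fov_per_pixel: angular size of one pixel (radians).
    """
    v, n = np.asarray(v, float), np.asarray(n, float)
    dist = np.linalg.norm(v)
    cos_theta = abs(np.dot(v, n)) / (dist * np.linalg.norm(n))
    d_p = dist * fov_per_pixel / cos_theta   # foreshortening grows d_P
    return 1.0 / d_p ** 2

def weighted_distance(d_ij, A_j):
    """Accuracy-weighted distance d(L_i, L_j) / A(L_j), favouring more
    accurately located segments in the affinity propagation stage."""
    return d_ij / A_j
```

A fronto-parallel plane 2 m away with a 1 mrad pixel gives d_P = 2 mm, i.e. A = 250 000 pixels² per m²; an oblique plane (θ near 90°) drives A towards zero, so its lines are disfavoured in clustering.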

Experiments
In this section we evaluate the performance of our proposed approaches against other line detectors. We compare against the Progressive Probabilistic Hough Transform (PPHT) [41], a classical method for line detection, and the state-of-the-art LSD algorithm [49] by Grompone von Gioi et al. Three variants of our approach are used: the full saliency detector Sal; a pure filtering approach applied to LSD lines, referred to as LSDF; and a filtering approach with subsequent localisation using our saliency measure, referred to as LSDF-Loc. We start by giving implementation details of our approaches in Section 5.1 and describe the evaluation measures used (repeatability and registration accuracy) in Section 5.2. Subsequently, results are presented in Section 5.3 for 2D line detection and in Section 5.4 for 3D line detection.

Implementation details
It was stated in the methodology section that the algorithm Sal goes through each possible line segment (L, s), determining its saliency value and accepting it if its saliency is above a given threshold and it satisfies a number of other conditions. Affinity propagation is subsequently used to determine the most representative set of lines.
In the first instance, all line segments are considered by evaluating the saliency measure across all horizontal lines, then repeating this process r times on evenly spaced rotations of the image, where we take r = 45. To do so requires JSD_est(n, m) to be determined from a set of pixels. Here the pixel intensities are bilinearly interpolated into 16 bins. The line segments beyond the end of the line (L_L and L_R) are of a fixed length of 6 pixels. We use S_thresh = 0.3, J_min = 0.15 and s_min = 2. In the affinity propagation stage, we have found the parameter d(L_i, L_i) = 700 to be effective.
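A sketch of a Jensen-Shannon divergence estimate between two intensity histograms, of the kind JSD_est(n, m) requires. The symmetric Dirichlet regularisation with the prior α (mentioned in Section 5.4) is our assumption about the estimator's form; the exact estimator of the paper may differ:

```python
import numpy as np

def jsd_est(counts_a, counts_b, alpha=1.0):
    """Jensen-Shannon divergence (in bits) between two histograms of
    counts, regularised by a symmetric Dirichlet prior alpha before
    normalising. Bounded in [0, 1]."""
    def kl(p, q):
        mask = p > 0                      # 0 * log(0) = 0 by convention
        return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))
    a = np.asarray(counts_a, float) + alpha
    b = np.asarray(counts_b, float) + alpha
    p, q = a / a.sum(), b / b.sum()
    m = (p + q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With α > 0 the estimate of two sparsely populated histograms is pulled towards zero, which is why a large number of bins (as in the 3D case) calls for a smaller prior.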
Since it is a particularly slow algorithm (O(N^5) for an N × N image), the image is initially downsampled to a width of 200 pixels, and detected lines are subsequently refined using the algorithm outlined in Section 3.4 at the image's true size (in a coarse-to-fine approach).

Evaluation measures
In this subsection the terms repeatability and registration accuracy are defined. They are both measures that are defined between sets of line segments detected on a pair of images under a known homography.

Repeatability
For a pair of images with known homography relating them, the repeatability for a set of line segments detected on each image by a given detector is computed as follows: first, the known homography is applied to one of the sets of lines. Define the distance between two lines as the minimum Euclidean distance between the lines' endpoints. Then, for each line projected under the known homography, its nearest neighbour (NN) is computed in the other set. If the distance between the two lines is less than a given threshold, this is deemed a correspondence. Then the repeatability is the number of correspondences divided by the minimum of the number of lines in each set.
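The repeatability computation can be sketched as below. The `line_dist` pairing of endpoints (taking the better of the two endpoint assignments) is our own reading of "minimum Euclidean distance between the lines' endpoints", so treat it as an assumption:

```python
import numpy as np

def line_dist(l1, l2):
    """Distance between two segments, each given as a pair of endpoints:
    the smaller total endpoint-to-endpoint distance over the two
    possible endpoint pairings, halved to give a per-endpoint value."""
    a1, b1 = np.asarray(l1[0], float), np.asarray(l1[1], float)
    a2, b2 = np.asarray(l2[0], float), np.asarray(l2[1], float)
    same = np.linalg.norm(a1 - a2) + np.linalg.norm(b1 - b2)
    swap = np.linalg.norm(a1 - b2) + np.linalg.norm(b1 - a2)
    return min(same, swap) / 2

def repeatability(lines_a, lines_b, thresh):
    """Fraction of nearest-neighbour correspondences within thresh,
    divided by the smaller set size. lines_a are assumed to have been
    projected into the second image's frame already."""
    if not lines_a or not lines_b:
        return 0.0
    matches = sum(1 for la in lines_a
                  if min(line_dist(la, lb) for lb in lines_b) < thresh)
    return matches / min(len(lines_a), len(lines_b))
```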
However, Hauagge and Snavely [27] note that this measure is biased towards detectors that produce a lot of features, and propose a measure that is invariant to the number of lines detected. We proceed as follows: for each set of lines on an image, order them in decreasing value of saliency. For LSD, the lines are ordered in decreasing order of another response value: the probability of detection in random noise. For PPHT the lines are simply ordered by length. Then, for given natural numbers k up to a specified limit (we take 150 here), take the first k lines in each set. The repeatability of these k lines is subsequently calculated, and a graph can be plotted of repeatability against the k most responsive lines in each set (here, the repeatability is determined within a distance threshold t of 5, 10, 15, and 20 pixels).

Registration accuracy
Here, a pair of images are registered by computing the homography between them. The registration accuracy gives an indication of the similarity between this and the ground truth homography. Again, we perform this in a way that is invariant to the number of lines detected, plotting the proportion of homographies recovered within a threshold against the most responsive k lines. To compute a homography, we implement the MSLD [51] descriptor for line segments, allowing us to determine putative correspondences between line segments in different images by the similarity of their descriptor. The homography is subsequently recovered using the Direct Linear Transform (DLT) with small sets of corresponding endpoints, and using RANSAC to determine the homography with the largest number of inliers.
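The DLT-plus-RANSAC stage can be sketched as follows, treating matched line endpoints as point correspondences. This is a standard textbook sketch rather than the paper's code; the iteration count, inlier tolerance, and function names are assumptions:

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct Linear Transform: homography H with dst ~ H @ src, from
    four or more point correspondences (N x 2 arrays)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    return Vt[-1].reshape(3, 3)           # null vector of A, up to scale

def apply_h(H, pts):
    """Apply homography H to N x 2 points (homogeneous normalisation)."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=500, tol=3.0, rng=None):
    """RANSAC over 4-point samples of putative endpoint correspondences,
    keeping the homography with the most inliers."""
    rng = rng or np.random.default_rng(0)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best_H, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = dlt_homography(src[idx], dst[idx])
        err = np.linalg.norm(apply_h(H, src) - dst, axis=1)
        inliers = int(np.sum(err < tol))
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H
```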
The homography could instead have been calculated using line correspondences, where a line is defined to be infinitely long and information about its endpoints is discarded [52]. However, we observed that the results were poorer for this approach than for point-based homography estimation with line endpoints; this could be for two reasons. Firstly, infinitely long lines discard valuable information and are redundant in cases where a continuous line is segmented in many places (as is often the case in urban scenes). Secondly, we have good reason to assume the endpoints of the lines are matched up reasonably accurately, since the MSLD descriptor has already matched the line segments; this would not be the case if the endpoints were not sufficiently aligned.
To determine the registration accuracy we aim to give a measure of how accurate the recovered homography is relative to the ground truth homography. To do so, one might decompose the homography into rotation and translation parameters and compare their errors; however, this can only be done if the intrinsic parameters are known (which they are not). We therefore resort to other measures. Our measure of goodness of homography estimation is as follows: take a pixel on the first image, apply the known homography and the estimated homography to it, and find the squared distance between the two projected points. Take the average of this over all pixels in the image. Then do the same in the other direction (i.e. with the inverse homographies), and square root the final result (to give an RMS error). Thus our measure is an approximation to the following:

d(G, H) = sqrt( (1 / 2XY) Σ_{x,y} ( ||G(x, y) − H(x, y)||² + ||G⁻¹(x, y) − H⁻¹(x, y)||² ) ),   (13)

where G and H are homography transformations and X and Y are the number of rows and columns respectively in the pair of images. We are unable to find a closed-form solution to Eq. (13) (note that H(x, y) and G(x, y) are non-linear since computations are done via projective space), hence we resort to the approximation outlined above. Finally, so as to be robust to outlying homography estimates, we determine the proportion of homography estimates such that d(G, H) < t, where t is equal to 5, 10, 15, and 20 pixels.
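The pixel-averaged approximation to Eq. (13) can be sketched directly. The `step` parameter (subsampling the pixel grid for speed) is our addition; with `step=1` it averages over every pixel as described:

```python
import numpy as np

def homography_rms_error(G, H, X, Y, step=1):
    """Symmetric RMS transfer error between homographies G and H over an
    X-row by Y-column image: average the squared distance between
    G(x, y) and H(x, y) in both directions, then take the square root."""
    xs, ys = np.meshgrid(np.arange(0, Y, step), np.arange(0, X, step))
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)

    def proj(M, p):
        q = np.hstack([p, np.ones((len(p), 1))]) @ M.T
        return q[:, :2] / q[:, 2:3]       # homogeneous normalisation

    fwd = np.mean(np.sum((proj(G, pts) - proj(H, pts)) ** 2, axis=1))
    Gi, Hi = np.linalg.inv(G), np.linalg.inv(H)
    bwd = np.mean(np.sum((proj(Gi, pts) - proj(Hi, pts)) ** 2, axis=1))
    return float(np.sqrt((fwd + bwd) / 2))
```

For instance, comparing a pure 3-pixel translation against the identity gives an RMS error of exactly 3 pixels, as every pixel is displaced by the same amount in both directions.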

2D line detection results
In this section, both qualitative and quantitative results are presented across a range of imagery, with qualitative results presented in Section 5.3.1. For a quantitative evaluation, the performance of the line detectors is tested on a set of images of building facades from [16] (Section 5.3.2); their robustness to Gaussian noise is tested on the same set of images (Section 5.3.3); and their robustness to a range of image transformations is tested on the dataset presented in [53] (Section 5.3.4). Finally, the performance of existing line detectors at different scales is tested (Section 5.3.5).
The repeatability and registration accuracy are determined between pairs of images under their known homography (which has been calculated manually for the building facade dataset [16], and is provided with the dataset presented in [53]).

Qualitative results
Qualitative results for 2D line detection are shown in Fig. 7. It is noticeable that Sal naturally avoids repetitive areas in the brick facades of the top two images, and detects the geometric structure of the scene in the third image. In the fourth and fifth images, Sal further avoids repetitive areas in the scene, while LSDF and LSDF-Loc avoid them to a lesser extent. The sixth and seventh images show the effects of compression and occlusion on line detection respectively [53], where it can be seen that Sal detects the broad underlying structure of the scene. This implies our approach has potential applications for compression tasks, as further demonstrated by the quantitative results in Section 5.3.4. The bottom two images are of building facades from the experiments presented in Section 5.3.2.
Across the range of images, PPHT detects many erroneous lines, largely due to the fact that it does not take into account the direction of the gradient of pixels in its lines. LSD detects all line segments on the image based purely on the local image derivative, whereas Sal tends to detect the structurally important lines. LSDF and LSDF-Loc avoid some of the repetitive areas and cull many non-salient lines detected by LSD.

Quantitative evaluation on building facades
In this section, the performance of the line detectors is tested on a set of 12 image pairs of building facades taken from the dataset presented in [16]; see the top of Fig. 9 for examples of the dataset and Fig. 7 for some qualitative results. The average number of line segments detected per image for this dataset is as follows: PPHT: 634.9; LSD: 1738.7; Sal: 274.3; LSDF and LSDF-Loc: 1137.6. The average execution times are: PPHT: 0.167 s; LSD: 0.182 s; Sal: 325.96 s; LSDF: 6.05 s; and LSDF-Loc: 14.02 s. The detection of a large number of lines is potentially problematic in a registration context since it can lead to fragmentation of prominent lines; the detection of many similar, repetitive lines that are difficult to match; or a significantly slower registration process if correspondences between lines also need to be established. Therefore, qualitative results for the top-50 lines are shown in Fig. 8 to take account of the number of lines per detector. Here it can be seen that, while PPHT and LSD detect the longer lines, there is more repetition in their detections; Sal, on the other hand, provides a more complete description given the same number of lines.
The quantitative results are shown in Fig. 9. The left-most graph simply shows repeatability against threshold, without taking into account the number of features produced by each detector. LSD performs the best here, with LSDF and LSDF-Loc performing similarly for smaller thresholds, but slightly worse for larger thresholds. The repeatability results for various thresholds are shown on the first row of Fig. 9, where it can be seen that LSDF-Loc performs the best, with LSDF close behind. For k < 100, Sal performs better than LSD. The second row shows results for registration accuracy, where similar conclusions can be drawn: regardless of the threshold used, all three of our proposed methods (LSDF, LSDF-Loc and Sal) perform better than the other methods, while PPHT consistently performs poorly.

Robustness to noise
Here the performance of the line detectors in the presence of Gaussian noise is tested. The same dataset of building facades as in the previous section is used, with varying levels of Gaussian noise added to each image. The top section of Fig. 10 shows qualitative results of line detection under increasing noise. With the exception of PPHT, all methods detect fewer lines in noisier images.
Again, the repeatability and registration accuracy are measured for increasing levels of noise. In the first case, the repeatability of the top k lines of each detector is measured for a threshold t of 10 and 20, and where k is equal to 50 and 100 (thus producing four graphs); see the top four graphs in Fig. 10. For smaller levels of noise, LSDF-Loc performs best, with Sal performing better at higher levels: Sal records very little drop in performance with increasing noise.
In the second case, the proportion of homography estimates with error less than a threshold t is measured under increasing noise. Again, four graphs are produced by varying t and k in the same manner. It is observed in the bottom four graphs of Fig. 10 that Sal and LSDF-Loc outperform the other methods, with Sal performing better when only the top 50 lines are used rather than 100. This shows the strength of salient line segment detection: its ability to detect segments indicative of the underlying geometry of the scene, unaffected by local perturbations of the image.

Robustness to image transformations
In this section, the performance of the line detectors is tested across a range of image transformations, according to the dataset by Zhang and Koch [53]. This includes eight groups of transformations with six images in each group, with a known underlying homography between the images of each group. Three of the groups are taken from [42]. Two example images from each group are shown at the top of Fig. 11, with the results at the bottom. Qualitative results for a compressed image and an occluded image from the dataset are shown in Fig. 7, where it is observed that Sal more easily detects the salient line segments than the other approaches; this explains its strong quantitative results (Fig. 11).
We solely test the repeatability for the top-50 and top-100 lines here. It is observed that our approaches consistently outperform PPHT and LSD. The only exceptions are in low texture and with scale changes, where they obtain a similar performance. Particularly for low texture this is not surprising: our approach is beneficial due to its ability to naturally avoid textured areas, which clearly gives no benefit for low-textured scenes. Sal performs particularly well for both compression and blurring, transformations that remove fine details but preserve the broad structure of the scene; this is consistent with the idea that it detects the salient aspects of the image. Again, LSDF-Loc often outperforms LSDF; however, it can never perform as well as Sal for some transformations (e.g. compression) where the initial set of lines obtained by LSD is poor. Furthermore, the results demonstrated here are, overall, better than those given in the previous section, where Sal obtained a similar performance to LSD when the top-100 lines were selected.

Scale variant evaluation
In this section we compare the existing state-of-the-art line detector, LSD, at different scales against our proposed approach Sal. To do so, an image is downscaled by a given percentage and the LSD algorithm is run on the downscaled image. Results are shown in Fig. 12, where LSD is tested on downscaled images of 25%, 50%, 75%, and 100% (i.e. full resolution). The quantitative results are performed on the building facade dataset presented in [16]: exactly the same quantitative evaluation is performed as in Section 5.3.2. The qualitative results show that, at the higher scales, LSD detects fewer lines in repetitive structures (particularly evident in the first image of Fig. 12). This is to be expected, as downscaling typically results in an image without fine detail. It is further demonstrated in the quantitative results, where LSD at 75% and 50% performs slightly better than LSD at 100% (but Sal still performs significantly better). However, LSD is scale-variant, and it is difficult to know a priori the optimal scale. The results suggest that a downscaling to 25% is too coarse to be of use, yet LSD at 50% still detects some fine-detail structures (e.g. the windows of the church). Our approach, Sal, is scale-invariant, allowing it to naturally avoid repetitive structures while detecting lines of variable widths. Furthermore, its generalisable formulation, dependent on image statistics rather than image gradient, allows it to be naturally extended to depth imagery.

3D line detection results
In this section we evaluate our method for line detection on LiDAR data as described in Section 4. The parameters used are the same as for the 2D saliency detector, with the exception of the prior α and the parameter d(L_i, L_i) used in the affinity propagation stage. In the first case, α is decreased to 0.25, because the distributions are split into many more bins and α = 1 was found to favour uniform distributions too strongly for such a large number of bins. For the second case, d(L_i, L_i) is set in proportion to the size of the model: 0.002 times the diameter of the bounding box of the model is used. Whilst Section 4 describes line detection from an {intensity + depth} image, the method can just as easily be implemented by reprojecting lines using just the intensity or just the depth data separately. We shall refer to results from these three cases as both, intensity, and depth respectively. There are four datasets used: three of them from [35] (Courtyard, Plaza, Reception)⁴ and one from the SCENE project [1] (Room), all of which are shown in Fig. 13. They have been generated from multiple LiDAR scans, with the largest, Plaza, generated from seven scans.

⁴ Courtyard refers to the Outdoor capture in Section 2 of the dataset presented in [35]. Plaza and Reception are both in Section 5 of the dataset of [35].

Qualitative evaluation
Qualitative results for these are given in Fig. 14, from which two observations can be made. Firstly, lines are missing in regions where the scanners do not have complete spherical vision, and the same happens when lines from other line detectors are reprojected to 3D (see Fig. 15). Secondly, there is, for the most part, a reasonably high overlap between lines from intensity and lines from depth, typically due to depth discontinuities in the data. This may be observed particularly on the windows of the Courtyard dataset, where there is no data present.

Quantitative evaluation
Here, we give multi-modal results for when just the intensity component and just the depth component are considered. Fig. 16 gives an example of such images: the depth component is rendered in such a way that the colour represents the direction of the normal.
Any other 2D line detector, as used in the previous section, may be used to detect lines on each face of a cubic image when just the intensity or depth component is considered. Hence, for a single LiDAR scan, we may consider only one of the components, detect 2D lines using any other approach (e.g. PPHT, LSD) on each face of its cubic image, and backproject to 3D. However, with other approaches, lines should not be combined from multiple LiDAR scans using affinity propagation; this is designed to find a representative set of clusters, rather than to cull a small number of repeated segments from multiple views. Hence, for a fair qualitative comparison, we compare reprojected line segments taken from one component of just a single LiDAR scan.
Qualitative results from four LiDAR scans (one from each dataset) are shown in Fig. 15. It can be observed that, similarly to the results in 2D (Fig. 7), Sal naturally avoids repetitive parts of the scene where others do not, particularly for the brickwork near the LiDAR scanner in the Courtyard dataset, and the tiled ceiling in Reception. The reprojection to 3D further demonstrates the ability of our approach to detect lines that are representative of the underlying aspects of the scene. This results in an often greater similarity between intensity and depth for Sal than there is for other methods, further demonstrating its applicability for multi-modal data.

Quantitative results, between lines detected solely from intensity and solely from depth, are shown in Fig. 17. They indicate that Sal performs the best, particularly so for smaller numbers of lines. This demonstrates that Sal detects lines that are often geometrically salient and are potentially applicable for multi-modal registration (e.g. for the case of registering an image to an untextured LiDAR scan). Furthermore, we wish to emphasise that the results here use Sal only for the sake of comparison (by constraining it to only the depth component), and that it has the further qualitative advantages of being able to detect both textural and geometric lines simultaneously, as well as naturally combining line segments between LiDAR scans.

Conclusions and future work
In this paper we have presented a novel, distribution-based approach to line detection. Whereas other line detectors simply detect lines based on the image gradient, our approach explicitly takes into account the surroundings of a line, resulting in a line segment detector that naturally avoids repetitive areas and returns lines that are representative of the structure of the scene. Furthermore, its highly generalisable formulation makes it readily applicable to other modalities, as demonstrated by an extension to depth imagery, where lines that jointly delineate changes in surface orientation or texture are detected. For fast salient line segment detection, a filtering approach is proposed, often yielding similar results to the full saliency approach. The results indicate that our approaches achieve superior repeatability across a range of transformations compared to other line detectors, and the multi-modal results indicate that they naturally detect lines representative of the structure of the underlying scene. Not only are they of potential use in registration contexts, as evaluated here, but also for compression-related tasks, as demonstrated by the high repeatability under this transformation.
There are potential areas for further improvement; in particular, the good results obtained by the filtering methods (LSDF and LSDF-Loc) indicate that an approach combining local and regional information about a line has potential benefits. Such a local-and-regional approach would have similarities with approaches to the more general problem of saliency detection in images. However, since our approach is, to the best of our knowledge, the first distribution-based approach to line detection, we consider such a two-tier system beyond the scope of this research.
Future work will include the registration of lines between different modalities (e.g. 2D and 3D). For this problem, the correspondences between line segments also need to be determined; it is thus referred to as the Simultaneous Pose and Correspondence (SPC) problem. It is a computationally expensive problem [20] (for N 2D lines and M 3D lines, it has complexity O(M²N)), so any method that has a high repeatability for a smaller number of lines will be far better suited to this kind of problem. Hence it is anticipated that the approach proposed here will be of great use for the more general problem of pose estimation, not only for its ability to detect the structure of a scene in a small number of lines, but also for its unified approach to line detection in multi-modal data.

Research data
To facilitate repeatable research, all data used here that is not currently available as research data is now made available. Details are available for the Room dataset at [36]; for the Courtyard, Plaza and Reception datasets at [34]; and at [12] for images used that are not part of any cited dataset.

Conflict of interest
None declared.