Combining Low-Light Scene Enhancement for Fast and Accurate Lane Detection

Lane detection is a crucial task in autonomous driving, as it enables vehicles to navigate the road safely by interpreting the high-level semantics of traffic signs. However, lane detection remains challenging under low-light conditions, occlusion, and lane line blurring. These factors make lane features ambiguous and uncertain, and therefore hard to distinguish and segment. To tackle these challenges, we propose low-light enhancement fast lane detection (LLFLD), a method that integrates the automatic low-light scene enhancement network (ALLE) with a lane detection network to improve lane detection performance under low-light conditions. Specifically, we first use the ALLE network to enhance the brightness and contrast of the input image while reducing excessive noise and color distortion. We then introduce a symmetric feature flipping module (SFFM) and a channel fusion self-attention mechanism (CFSAT), which refine the low-level features and exploit richer global contextual information, respectively. Moreover, we devise a novel structural loss function that leverages the inherent geometric prior constraints of lanes to optimize the detection results. We evaluate our method on the CULane dataset, a public benchmark for lane detection under various lighting conditions. Our experiments show that our approach surpasses other state-of-the-art methods in both daytime and nighttime settings, especially in low-light scenarios.


Introduction
Lane detection is a critical and actively studied task for safe autonomous driving and advanced driver assistance systems (ADAS) [1]. Computer vision technology has driven significant advances in this field, making it a widely researched topic in academia and industry alike. The ability to detect lanes accurately plays a crucial role in guiding vehicles safely along the road.
In recent years, various deep learning methods have emerged for lane detection, improving detection performance and adaptability across diverse scenarios. Nevertheless, several challenges remain. One of the most pressing is the robustness of models to adverse real-world conditions, such as extreme lighting [2] and weather variations. Furthermore, lane markings are frequently occluded by other objects, such as cars, in autonomous driving situations. This requires lane detection models to possess a high-level semantic understanding of the scene and to distinguish accurately between lane markings and other objects. Traditional methods are simple and fast, but their detection results are often unsatisfactory. Addressing these challenges is crucial for the reliable deployment of autonomous driving systems in real-world environments. To use visual information more efficiently, the segmentation model SCNN [3] employs specialized convolution operations that aggregate information across different dimensions. These operations process sliced features and add them one by one to aggregate information. Despite its effectiveness, this method is relatively slow in practice. SAD [4] has emerged as a promising approach to reducing the parameters required for lane detection, but it still suffers from suboptimal processing speed, which hinders its practical applicability. UFLD [5] transforms the task into a simpler classification problem, which significantly reduces the computational overhead and enables ultra-fast lane detection. However, the accuracy of UFLD still needs improvement, as it does not perform well in low-light environments or in capturing small targets and details.
In this paper, we present a novel approach named low-light enhancement fast lane detection (LLFLD), which leverages the automatic low-light scene enhancement (ALLE) network for adaptive image enhancement prior to detection. Specifically, we introduce a perceptual brightness threshold and enhance only the images that fall below this threshold before detection, resulting in improved detection performance in low-light scenarios. To further increase accuracy, we exploit the symmetric distribution of lane markings in the driving perspective and design a symmetric feature flipping module (SFFM) that enhances low-level features for more precise lane localization. In addition to the main branch for detection, we present an auxiliary segmentation module that is activated only during training.
In our auxiliary segmentation module, we incorporate the channel fusion self-attention mechanism (CFSAT) to enhance the acquisition of global context information. This is achieved by establishing connections between the lane markings and the wider global feature map. Moreover, our approach involves the introduction of a novel structural loss function that is based on the geometric shapes of lanes. Through this loss function, we optimize the segmentation and classification performance of our model by leveraging the inherent structure of the lane markings. By employing these strategies, we are able to effectively capture important contextual features while simultaneously mitigating unnecessary redundancy, resulting in a more robust and efficient detection algorithm.
Our approach has been tested extensively on the benchmark CULane dataset [3], and we present comprehensive experimental results along with comparisons to other state-of-the-art methods. Additionally, we conducted an ablation study to analyze the impact of our design choices on the performance of the model. In summary, our contributions are:

• We propose low-light enhancement fast lane detection (LLFLD), a lane detection system that combines a low-light image enhancement network (ALLE) with a lane detection network. Our approach significantly enhances the performance of the network in low-light environments while maintaining an ultra-fast detection speed.
• We propose a symmetric feature flipping module (SFFM), which refines the low-level features and gains more precise lane localization.
• We propose a channel fusion self-attention mechanism (CFSAT) in the auxiliary segmentation module, which captures and utilizes more global context information.
• We propose a novel structural loss function that leverages the inherent geometric constraints of lanes to optimize the detection results.

Traditional Methods
Autonomous driving heavily relies on the accurate detection of lanes, which serves as a fundamental task in achieving self-driving capabilities. This task involves identifying the lanes on the road and providing relevant information such as lane ID, direction, curvature, width, length, and visualization. Over the years, a plethora of computer vision technologies have been developed to achieve robust lane detection without making assumptions about the number of lanes present, involving the location and tracking of lane boundaries in road scenes. Traditional approaches typically utilize hand-crafted features, such as edge detection, the Hough transform, and color filtering [6][7][8]. However, these techniques are known to be susceptible to various environmental factors, including illumination changes, occlusions, and complex road layouts, rendering them less effective in challenging driving scenarios.

Segmentation Methods
Semantic segmentation is an active research area in computer vision whose primary aim is to assign a semantic label to each pixel in an image. Two primary categories of methods can be identified, namely region-based and end-to-end approaches. Region-based methods often use region proposal networks (RPNs) [9] or sliding windows to extract regions of interest (ROIs) [10] from the input image. A segmentation network is then applied to each ROI to generate precise masks. VPGNet [11] exemplifies the region-based approach: it generates ROIs for lanes and other road markings using RPNs and applies a multi-task network for pixel classification within the ROIs. End-to-end methods, on the other hand, apply a segmentation network directly to the entire input image without using any ROIs, which is more computationally efficient. Among end-to-end methods, SCNN [3] employs a specialized convolution operation in its segmentation module that aggregates features from different dimensions. It processes sliced features and incrementally adds them together to capture multi-scale contextual information. However, the method can be computationally expensive. To meet the demands of real-time applications, researchers have proposed lightweight semantic segmentation methods, such as self-attention distillation (SAD) [4]. To transfer knowledge between high-level semantic information and low-level location feature attention, SAD uses an attention distillation module that treats the former as a teacher and the latter as a student. LaneNet [12] employs an instance segmentation pipeline to handle a varying number of lines; however, generating line instances requires post-inference clustering.

Anchor-Based Methods
The anchor-based approach is a widely used technique for lane detection, in which a set of predefined anchors is employed to localize the lane markings. Among its variants, the linear anchor-based method uses line anchors as references for accurate lane regression. The pioneering work of Line-CNN [13] introduced line anchors to lane detection; later, Feng et al. [14] proposed a Bézier curve-based regression model to address the challenge of a large number of polynomial regression parameters. The Bézier curve model is not only computationally efficient but also highly stable. LaneATT [15] proposed a novel anchor-based attention mechanism that aggregates global information and achieves outstanding performance with high efficiency. Some recent methods employ classification based on row selection [5,16] as a form of custom down-sampled segmentation in units of one pixel, but they still require post-processing. Another noteworthy anchor-based method is UFLD [5], which frames lane detection as a row anchor presetting and row selection task that exploits the continuity of lane lines to improve detection speed. It achieves ultra-fast detection speed, although its accuracy leaves room for further improvement.

Low Light Image Enhancement
Low-light image enhancement is a relatively mature field [17][18][19]. Traditional image enhancement relies primarily on two families of techniques: histogram equalization and dehazing-based methods. Histogram equalization is a classical method that enhances images by stretching their histogram distribution so that brightness is spread uniformly across the entire range. Zhang et al. [20] use various image processing techniques, including histogram equalization, as data normalization methods for machine learning, and compare their performance with that of z-score normalization on a face-based authentication algorithm using SVM and random forest classifiers. FEBD [21] is a fast expansion-bins-determination method for multiple-histogram-modification-based reversible data hiding (RDH), a technique that embeds data into images while preserving the original information content. Specifically, it introduces an efficient algorithm that determines optimal expansion bins quickly and accurately, with only a negligible performance loss compared to the original method. Among dehazing-based approaches, Ref. [22] presents a single-image dehazing algorithm based on the dark channel prior, featuring an improved atmospheric light estimation method and a low-complexity morphological reconstruction technique that generate high-quality dehazed images with significantly reduced computational complexity compared to previous approaches. To handle the scattering of sunlight in the atmosphere, Ref. [23] proposes an image enhancement approach that inverts low-light images and applies image dehazing with an atmospheric light scattering model.
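As a point of reference for the classical baseline mentioned above, the following is a minimal sketch of textbook histogram equalization on a flat list of gray levels. It is a generic illustration, not the procedure of any cited work; the function name `equalize_histogram` and the toy input are assumptions made for this example.

```python
def equalize_histogram(pixels, levels=256):
    """Equalize a flat list of integer gray levels in [0, levels - 1]."""
    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # The smallest non-zero CDF value maps the darkest occupied level to 0.
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: there is nothing to stretch.
        return list(pixels)

    # Look-up table that makes the output CDF approximately linear.
    lut = [
        round(max(c - cdf_min, 0) / (total - cdf_min) * (levels - 1))
        for c in cdf
    ]
    return [lut[p] for p in pixels]


# A dark, low-contrast toy image: four gray levels packed into [10, 16].
dark = [10, 10, 12, 12, 14, 14, 16, 16]
stretched = equalize_histogram(dark)
```

In this toy case the four occupied levels are spread over the full 0 to 255 range, which also illustrates why the method can amplify noise: small intensity differences in dark regions become large ones.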
Unlike traditional methods that rely on hand-crafted features, deep learning methods use neural networks to learn a pixel mapping function that transforms low-light images into high-quality outputs. Recently, Ref. [24] introduced a new dataset of low-light images and videos, along with an online platform that hosts various popular methods for evaluation and comparison. Unlike methods that rely on Retinex theory and labels [25], Zero-DCE [26] is a label-free approach that learns an image brightness enhancement mapping supervised by several no-reference luminosity and color loss functions. Accordingly, the algorithm exhibits enhanced robustness, a wider range for adjusting the dynamic range of images, and reduced computational costs.

Overall Pipeline
LLFLD is a lightweight lane detection model that integrates low-light image enhancement. Our approach achieves very low computational cost by transforming the segmentation task into a selection and classification task over predetermined line anchors. An overview of the method is shown in Figure 1. Our framework comprises two parts, a primary classification detection branch and an auxiliary segmentation branch, and processes RGB images $I \in \mathbb{R}^{3 \times H \times W}$ captured by a forward-facing vehicle camera. The outputs of the system are the selected lane boundary points corresponding to the row anchors. To generate these outputs, the input images undergo automatic low-light enhancement before being fed to the classification detection branch, which includes a ResNet backbone. Because the low-level feature maps after convolution still retain strong location information, we flip them symmetrically with the symmetric feature flipping module (SFFM) before they enter the ResNet backbone, which strengthens the visual information often lost to occlusion and blur and thereby locates the lanes more effectively. We then obtain $F \in \mathbb{R}^{C_{lane} \times H_{slice} \times W_{slice}}$ by passing the backbone output through a fully connected layer and a reshape layer, where $C_{lane}$, $H_{slice}$, and $W_{slice}$ denote the number of lanes in each image, the number of row divisions, and the number of column divisions, respectively. Finally, we predict the output from $F$ by treating segmentation as an anchor-based selection and classification problem, which significantly improves detection efficiency.
The auxiliary segmentation branch upsamples feature maps from different stages of the ResNet backbone at varying scales, then concatenates and convolves them, similar to the structure of FPN [27], which fuses high-level features at the top with low-level features at the bottom. We update the hierarchically fused feature map with global information through the channel fusion self-attention module (CFSAT) and generate segmentation results to assist parameter learning in the backbone classification branch. Notably, the auxiliary segmentation branch is active only during training and is deactivated during testing and detection. In this way, the approach facilitates parameter learning in the backbone branch while preserving fast detection speed.

Figure 1. Overview of the proposed LLFLD. The pipeline consists of a primary classification detection branch and an auxiliary segmentation branch. Notably, the auxiliary segmentation module is exclusively activated during the training phase.

Automatic Low-Light Scene Enhancement (ALLE)
In practical applications, low brightness can significantly impede visual perception, particularly in situations such as nighttime driving. Traditional methods of improving image brightness often introduce additional noise and lack the necessary dynamic range and robustness. As a result, although they increase the visibility of the target lane lines, they also bring unwanted information that interferes with the detection results. Given the complex and variable driving conditions, not every image requires brightness enhancement; applying the same enhancement to normal and high-brightness images can cause overexposure, which is also detrimental to lane detection. To overcome these challenges, our objective is to enhance low-light images automatically, with a specific emphasis on detection performance under low-light conditions. We propose the automatic low-light scene enhancement (ALLE) module, which employs a Zero-DCE network to estimate the best mapping from a low-light image to a normally exposed image, as depicted in Figure 2. We perform adaptive enhancement according to image brightness: only low-brightness images are enhanced, while normal and high-brightness images are left unchanged, rather than preprocessing all images uniformly or randomly.
We first calculate the root mean square (RMS) of the RGB values of the input image $I \in \mathbb{R}^{3 \times H \times W}$, from which the perceived brightness $P_b$ [28] of $I$ in RGB space is determined by Equation (1), where $R$, $G$, and $B$ denote the RMS-normalized RGB channel values of $I$. Our study reveals that perceived brightness values below 60, calculated using Equation (1), noticeably affect human visual perception, whereas for values above 80 the image brightness is perceived as normal. Thus, we apply deep enhancement processing to images whose perceived brightness falls below the threshold $\delta$, which is set to 70.
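The brightness gate described above can be sketched as follows. Since the exact form of Equation (1) is not reproduced here, this sketch assumes a weighted RMS with the common luma weights (0.299, 0.587, 0.114) and interprets the RMS-normalized channel values as the per-channel RMS over all pixels; both are assumptions, as are the helper names. Only the threshold of 70 on a 0 to 255 scale comes from the text.

```python
import math

# ASSUMPTION: luma-style channel weights; the paper's Equation (1) may differ.
WEIGHTS = (0.299, 0.587, 0.114)
DELTA = 70.0  # brightness threshold stated in the text (0-255 scale)


def channel_rms(values):
    """Root mean square of one channel's pixel values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def perceived_brightness(pixels):
    """pixels: list of (r, g, b) tuples with values in [0, 255]."""
    r = channel_rms([p[0] for p in pixels])
    g = channel_rms([p[1] for p in pixels])
    b = channel_rms([p[2] for p in pixels])
    wr, wg, wb = WEIGHTS
    return math.sqrt(wr * r * r + wg * g * g + wb * b * b)


def needs_enhancement(pixels, delta=DELTA):
    """True when the image is dark enough to be routed through enhancement."""
    return perceived_brightness(pixels) < delta


def adaptive_enhance(pixels, enhance_fn, delta=DELTA):
    """Enhance only low-light images; pass all other images through unchanged."""
    return enhance_fn(pixels) if needs_enhancement(pixels, delta) else pixels


# Toy usage with a hypothetical stand-in for the enhancement network.
night = [(40, 40, 40)] * 16
day = [(120, 120, 120)] * 16
brighten = lambda px: [tuple(min(255, 2 * c) for c in p) for p in px]
night_out = adaptive_enhance(night, brighten)
day_out = adaptive_enhance(day, brighten)
```

Gating on a single scalar keeps the decision cheap relative to running the enhancement network, and leaving bright images untouched avoids the overexposure problem noted in the text.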
Figure 2. The framework of automatic low-light scene enhancement (ALLE). A Zero-DCE network is designed to infer a mapping $\Phi_n$ that corresponds to the optimal light enhancement strategy. The network enhances an input dark image iteratively, with $n$ representing the number of iterations, which regulates the extent of image enhancement. For normally exposed images, the original images are used as the outputs.
To accomplish low-light image enhancement, we leverage the Zero-DCE network, which formulates the problem as a non-reference, multi-level recursive curve estimation task. The network adopts a lightweight encoder-decoder architecture for feature extraction and image enhancement generation. This approach enables end-to-end enhancement of images under low-light conditions. Furthermore, the perceived brightness of different images varies greatly, which hinders the training of a single network; minimizing this variation as much as possible facilitates network optimization. Specifically, our objective is to train a mapping function $\Phi_n$ capable of transforming low-light images into normally exposed ones:
$$LE_n(x) = LE_{n-1}(x) + \Phi_n \, LE_{n-1}(x) \, \bigl(1 - LE_{n-1}(x)\bigr) \tag{2}$$
where $\Phi$ is a trainable parameter map with the same dimensions as the input image, $x$ refers to the RMS-normalized channel values of the input image, and $n$ denotes the number of iterations, which regulates the curvature of the curve. The enhanced version $LE_n(x)$ is obtained from the previous version $LE_{n-1}(x)$. Hence, $LE_1(x)$ can be expressed as:
$$LE_1(x) = x + \Phi_1 \, x \, (1 - x) \tag{3}$$
To learn the best-fitting enhancement mapping with the ALLE network without any reference, we introduce a suite of differentiable luminosity and color loss functions that assess the quality of enhanced images without requiring any supervisory information. Our ALLE network is trained using the following three types of loss functions.
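The recursive curve can be illustrated with a short sketch. It assumes the standard Zero-DCE quadratic curve, $LE_n = LE_{n-1} + \Phi_n LE_{n-1}(1 - LE_{n-1})$, with intensities scaled to [0, 1] and one parameter map per iteration; the function name `apply_curve` and the toy parameter values are assumptions, since in practice the maps are predicted by the network.

```python
def apply_curve(image, phi_maps):
    """Iteratively apply LE_n = LE_{n-1} + phi_n * LE_{n-1} * (1 - LE_{n-1}).

    image:    H x W nested list of intensities in [0, 1].
    phi_maps: list of H x W nested lists, one parameter map per iteration,
              each entry assumed to lie in [-1, 1].
    """
    out = [row[:] for row in image]
    for phi in phi_maps:
        out = [
            [v + p * v * (1.0 - v) for v, p in zip(row, phi_row)]
            for row, phi_row in zip(out, phi)
        ]
    return out


# Toy usage: a 1 x 2 image enhanced for two iterations with hand-picked maps.
image = [[0.2, 0.5]]
phi_maps = [[[0.5, 1.0]], [[0.5, 0.0]]]
enhanced = apply_curve(image, phi_maps)
```

With parameters in [-1, 1] the quadratic curve is monotonic and keeps values inside [0, 1], so repeated application brightens dark pixels without clipping, and a zero parameter leaves a pixel unchanged.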
Intensity consistency loss. The loss function $L_{Ints}$ fosters intensity consistency of the images after enhancement. This is achieved by minimizing the difference between the adjacent domains of the dark images and their corresponding enhanced domains (Equation (4)), where $M$ represents the number of divided domains and $\omega(i)$ denotes the set of five domains comprising the center domain $i$ and its upper, lower, left, and right neighbors.
Exposure constraint loss. To prevent overexposure or underexposure in the enhanced version, we incorporate an exposure constraint loss function $L_{exp}$ to regulate the exposure level:
$$L_{exp} = \frac{1}{N} \sum_{k=1}^{N} \left| Z_k - E \right| \tag{5}$$
where $N$ denotes the number of non-overlapping divided domains of size 8 × 8, $Z_k$ is the average intensity value of the $k$-th divided domain in the enhanced version, and $E$ represents the satisfactory exposure level of an image, which is set to 0.55. The function encourages the network to produce enhanced low-light images with well-exposed domains and penalizes overexposed or underexposed domains.
Color channel loss. The color channel loss function $L_{col}$ is formulated as follows:
$$L_{col} = \sum_{(p,q) \in \varepsilon} (J_p - J_q)^2, \quad \varepsilon = \{(R,G), (R,B), (G,B)\} \tag{6}$$
where $J_p$ and $J_q$ denote the average intensity values of the $p$-th and $q$-th channels in the enhanced image, and $(p, q)$ ranges over the channel pairs (R, G), (R, B), and (G, B). The color channel loss encourages the network to produce color-consistent enhanced images by minimizing the differences between color channels.
Total enhancement loss. The total loss combines $L_{Ints}$, $L_{exp}$, and $L_{col}$ (Equation (7)), where $w_{col}$ is the weight of the color channel loss.
The proposed low-light enhancement network (ALLE) is suited only to dimly lit settings and does not address other adverse conditions that hinder visibility, such as fog and torrential rain. In other words, ALLE maps low-light images to normal-light conditions and is not designed to remove obstructions that impede clear sight.
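The exposure and color terms can be sketched in a few lines. The sketch assumes the exposure loss is the mean absolute deviation of each non-overlapping 8 × 8 patch average from the target level 0.55, and that the color loss is the sum of squared differences between the three channel means, following the definitions of $Z_k$, $E$, $J_p$, and $J_q$ in the text. Function names and toy inputs are assumptions, and a training implementation would use a differentiable tensor library instead of plain lists.

```python
def exposure_loss(gray, patch=8, target=0.55):
    """Mean |Z_k - E| over non-overlapping patch x patch regions.

    gray: H x W nested list of intensities in [0, 1]; H and W are assumed
    to be multiples of `patch`.
    """
    height, width = len(gray), len(gray[0])
    deviations = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            block = [
                gray[y][x]
                for y in range(top, top + patch)
                for x in range(left, left + patch)
            ]
            z_k = sum(block) / len(block)  # average intensity of region k
            deviations.append(abs(z_k - target))
    return sum(deviations) / len(deviations)


def color_loss(mean_r, mean_g, mean_b):
    """Sum of squared differences over the pairs (R, G), (R, B), (G, B)."""
    return (
        (mean_r - mean_g) ** 2
        + (mean_r - mean_b) ** 2
        + (mean_g - mean_b) ** 2
    )


# Toy usage: an 8 x 16 image whose left patch is well exposed (0.55)
# and whose right patch is too bright (0.75).
gray = [[0.55] * 8 + [0.75] * 8 for _ in range(8)]
l_exp = exposure_loss(gray)
l_col = color_loss(0.6, 0.5, 0.4)
```

Because both terms depend only on the enhanced image, they need no paired ground truth, which is what allows training without reference images.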

Symmetric Feature Flipping Module (SFFM)
Viewed from the first-person driving perspective, the global structure of lanes exhibits symmetry. To exploit this observation, we propose the symmetric feature flipping module (SFFM), as illustrated in Figure 1. The SFFM comprises two distinct convolution and normalization layers that transform each feature map independently. The transformed feature maps are then combined and passed through a ReLU activation function. The module operates on the lower-level feature map, which contains abundant position information.
To obtain a new feature map, we flip the lower-level feature map symmetrically along its longitudinal axis. As shown in Figure 1, the blue and orange regions of the feature map are symmetrically reversed and then concatenated together. A significant proportion of the examples in both our training and test sets consist of straight roads or gentle, nearly straight turns. Approximately 70% of these are symmetric or nearly symmetric, 28% are not completely symmetric, and 1.2% are sharp turns. Road lanes are not strictly axisymmetric, but this processing step improves the attention scores of blocked and lost lanes. Notably, we replace the standard convolution with deformable convolution [29] to address problems caused by camera angle changes, such as jitter and rotation. The proposed module enhances the low-level features and yields more precise lane localization.
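The flip-and-concatenate step can be sketched as below. Only the mirroring about the vertical (longitudinal) axis and the channel-wise concatenation are shown; the convolution, normalization, deformable convolution, and ReLU layers of the SFFM are omitted, and the function names are assumptions.

```python
def flip_horizontal(feature_map):
    """Mirror a C x H x W nested list about its vertical (longitudinal) axis."""
    return [[row[::-1] for row in channel] for channel in feature_map]


def flip_and_concat(feature_map):
    """Concatenate a feature map with its mirrored copy along the channel axis."""
    return feature_map + flip_horizontal(feature_map)


# Toy usage: one channel in which the lane response appears only on the
# left side, for example because the right lane line is occluded.
features = [[[1, 0, 0],
             [1, 0, 0]]]
fused = flip_and_concat(features)  # shape 2 x 2 x 3
```

In the mirrored channel the response appears on the right, so subsequent layers see evidence at the location where a symmetric counterpart of the visible lane would be expected.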

Channel Fusion Self-Attention Mechanism (CFSAT)
Lane detection is a critical computer vision task that demands a balance between understanding high-level semantic information and refining precise low-level localization features. To attain this balance, we propose a two-stage approach in which we first detect lanes using high-level semantic information and then refine the results based on low-level features. By leveraging the complementary nature of these stages, our method produces more accurate lane detections. In particular, we introduced the symmetric feature flipping module (SFFM) in the previous section to refine the low-level features. However, since a simple segmentation network may not gather adequate global context information, we developed the channel fusion self-attention mechanism (CFSAT) in the auxiliary segmentation branch, as illustrated in Figure 3. This mechanism enhances the representation of high-level semantic features of lanes by incorporating more global context information.
The updated feature map, denoted $M_{update} \in \mathbb{R}^{C \times H \times W}$, is given by Equation (8), in which $M \in \mathbb{R}^{C \times H \times W}$ denotes the feature map before updating with the channel fusion self-attention mechanism (CFSAT) and Conv represents a convolutional operation. To further enhance the feature map, an attention operation referred to as SA updates the map with attention weights. SA uses an attention matrix to weigh different features in the map, which improves lane detection accuracy. The SA operation is defined in Equation (9), which involves a concatenation operation, global average pooling (GA), and global maximum pooling (GM). The function $\psi^2_{sum}$ computes the square of the sum of corresponding pixels across channels. We choose $\psi^2_{sum}$ for channel fusion following SAD [4], whose experiments showed that this operator yields the largest performance improvement.
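The three operators named above can be sketched individually, following their verbal definitions. How CFSAT combines them is specified by Equations (8) and (9), which this sketch does not attempt to reproduce; the function names and the toy feature map are assumptions.

```python
def global_avg_pool(feature_map):
    """GA: one average per channel of a C x H x W nested list."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def global_max_pool(feature_map):
    """GM: one maximum per channel of a C x H x W nested list."""
    return [max(max(row) for row in channel) for channel in feature_map]


def psi_sum_sq(feature_map):
    """Channel fusion: square of the sum over channels at each pixel (H x W)."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [sum(channel[y][x] for channel in feature_map) ** 2 for x in range(width)]
        for y in range(height)
    ]


# Toy usage on a 2 x 2 x 2 feature map.
m = [[[1, 2], [3, 4]],
     [[0, -1], [2, 0]]]
ga = global_avg_pool(m)
gm = global_max_pool(m)
fused = psi_sum_sq(m)
```

The pooling operators summarize each channel with a single global statistic, while the channel fusion collapses all channels into one spatial map, so the two kinds of output carry complementary channel-wise and position-wise information.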

A Novel Lane Structural Loss Function
Most detection models only use segmentation loss or classification loss to supervise detection results, disregarding the structure of lane lines. However, lane lines possess strong geometric prior information. Specifically, lane lines are typically thin, white strips with symmetric properties. Furthermore, due to perspective, the absolute value of the slope of the lane lines on both sides of the lane in which the vehicle is travelling is greater than that of the lane lines of adjacent lanes, as illustrated in Figure 4. In this paper, we introduce $P_{i,j,k}$ to indicate the probability that the grid cell located at $(j, k)$ belongs to the $i$-th lane line. To ensure differentiability of the function, we use the softmax function to obtain the probability of different locations:

$$\mathrm{Probability}_{i,j,:} = \mathrm{softmax}(P_{i,j,1:w}) \qquad (10)$$

where $\mathrm{Probability}_{i,j,:}$ represents the probability at each location, $P_{i,j,1:w}$ is a $w$-dimensional vector, and $1:w$ indicates columns 1 to $w$. It is worth noting that $P_{i,j,:}$ is a $(w+1)$-dimensional vector, where the $(w+1)$-th dimension indicates the probability of presence of a lane. The column localization $\mathrm{Pos}_{i,j}$ of the $i$-th lane in the $j$-th row is computed as a mathematical expectation over these probabilities, where $w$ represents the number of columns for partition. In practice, the localization points of lane lines in adjacent rows should be as close as possible. However, due to perspective, there should be a geometric difference of $\alpha$ between the lane localization points of adjacent rows. To address this issue, we propose a novel structural loss function, denoted $L_{struc}$. In its definition, $C$ denotes the number of lanes, $h$ denotes the number of preset row demarcations, and $M$ is a binary mask with a value of 1 when $i = l_1$ or $i = l_2$, and 2 otherwise. Here, $l_1$ and $l_2$ correspond to the lanes with the largest and second-largest absolute values of slope, respectively.

To regulate the output of the primary branch, we use the classification loss function $L_{cls}$, which is defined in terms of $GT_{i,j,:}$, the one-hot label of the correct locations, and $L_{CE}$, the cross-entropy loss function. In addition to $L_{cls}$, we also use cross-entropy as the auxiliary segmentation loss function, denoted $L_{seg}$, in the auxiliary segmentation branch. Combining these loss functions, the overall loss of our method can be expressed as:

$$L_{total} = L_{cls} + w_{seg} L_{seg} + w_{struc} L_{struc} \qquad (16)$$

where $w_{seg}$ and $w_{struc}$ are the weights of the segmentation loss function $L_{seg}$ and the structural loss function $L_{struc}$, respectively.
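The softmax step of Equation (10), the expectation-based column localization, and the weighted sum of Equation (16) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: column indices are taken to run from 1 to $w$, the last of the $w + 1$ entries is excluded before the softmax, and the function names are hypothetical. The structural loss itself is not implemented here because its formula is not given in this section; it is treated as an already-computed scalar.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_column(logits_row: List[float]) -> float:
    """Expected column index for one lane in one row.

    logits_row holds w + 1 values; the last value is excluded so that
    the softmax runs over columns 1..w only (assumed index convention).
    """
    probs = softmax(logits_row[:-1])
    return sum(k * p for k, p in enumerate(probs, start=1))


def total_loss(l_cls: float, l_seg: float, l_struc: float,
               w_seg: float = 0.8, w_struc: float = 0.8) -> float:
    """Weighted sum of Equation (16); the loss terms are given scalars."""
    return l_cls + w_seg * l_seg + w_struc * l_struc


# w = 4 columns plus one extra entry.
uniform_pos = expected_column([0.0, 0.0, 0.0, 0.0, 5.0])
peaked_pos = expected_column([0.0, 20.0, 0.0, 0.0, 0.0])
combined = total_loss(1.0, 2.0, 3.0)
```

Because the expectation is a smooth function of the logits, the resulting position is differentiable, unlike an argmax over columns. With uniform logits the expected position falls midway between the first and last column, and with a sharply peaked logit it approaches the peaked column.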

Experiments
CULane [3] is one of the most comprehensive and challenging publicly available datasets for lane detection, which covers a wide range of scenarios and conditions. The dataset contains images with a resolution of 1640 × 590 pixels, and categorizes them into nine different classes according to the degree of difficulty and complexity. To evaluate the performance and robustness of our proposed method, we conducted extensive experiments on the CULane dataset, which comprises 133,235 images in total. The dataset is split into three subsets: a training set with 88,880 images, a validation set with 9675 images, and a test set with 34,680 images. In this section, we first introduce the evaluation metrics and some of the implementation details that are used in our experiments. Then, we report and analyze the experimental results on the CULane dataset and compare them with several state-of-the-art methods. Finally, we present an ablation study to investigate the effectiveness of different components and settings in our method.

Evaluation Metrics
For the CULane dataset, the official metric is the F1 score proposed in [3]:

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

where $Precision = \frac{TP}{TP + FP}$ and $Recall = \frac{TP}{TP + FN}$. Under the CULane evaluation protocol, the ground-truth and predicted lane lines are represented as 30-pixel-wide curves in image space. Matching is computed for each whole line: a predicted line and a ground-truth line are considered a match if their pixel IoU is over 0.5.
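As a small worked illustration of the metric, the sketch below computes the F1 score from match counts and applies the IoU threshold to two lines represented as sets of pixel coordinates. The function names and the set-of-pixels representation are assumptions for illustration; the official evaluation rasterizes 30-pixel-wide curves, which is not reproduced here.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def is_match(pred: Set[Pixel], gt: Set[Pixel], threshold: float = 0.5) -> bool:
    """A prediction matches a ground-truth line if pixel IoU exceeds the threshold."""
    union = pred | gt
    if not union:
        return False
    iou = len(pred & gt) / len(union)
    return iou > threshold


score = f1_score(tp=8, fp=2, fn=2)
borderline = is_match({(0, 0), (0, 1), (0, 2)}, {(0, 1), (0, 2), (0, 3)})
overlapping = is_match({(0, 0), (0, 1), (0, 2)}, {(0, 0), (0, 1), (0, 2), (0, 3)})
```

In the borderline case the IoU is exactly 0.5, which does not count as a match because the threshold must be exceeded.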

Implementation Details
In our implementation, we set the row demarcation for the CULane dataset according to the image height, which is 590 pixels. The row demarcation is a set of horizontal lines that divide the image into several regions for lane detection. We set the row demarcation to start from 0 pixels and end at 530 pixels with an interval of 10 pixels, considering that the lane lines consistently appear at the bottom of the image (0 pixels high) but are not visible in the top portion of the image (around 530 pixels high), which typically features the sky, mountains, or distant objects in the driver's view. The interval determines the number of divided rows, and each grid cell provides an anchor to be selected for lane prediction. To balance model efficiency and accuracy, we opted for a relatively modest interval and number of grid cells: the number of grid cells per row is set to 155 for this dataset.
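The row demarcation described above can be enumerated with a short Python sketch. It assumes that the end point of 530 pixels is inclusive, which is not stated explicitly in the text; the function and constant names are hypothetical.

```python
from typing import List


def row_anchors(start: int = 0, end: int = 530, interval: int = 10) -> List[int]:
    """Heights (in pixels) of the row demarcation lines, end point included."""
    return list(range(start, end + 1, interval))


GRID_CELLS_PER_ROW = 155  # number of grid cells per row stated in the text

anchors = row_anchors()
num_rows = len(anchors)
num_candidate_cells = num_rows * GRID_CELLS_PER_ROW
```

Under the inclusive-end assumption this yields 54 rows, so each lane is localized by choosing among 155 cells in each of those rows rather than by classifying every pixel of the image.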
We resized the images to 288 × 800 following the method in [3] in the optimization process. We trained our model using the SGD-momentum optimizer [31] with a cosine decay learning rate strategy [30]; the momentum was set to 0.9 and the learning rate was initialized to $4 \times 10^{-4}$. We employed ResNet as our backbone network, specifically ResNet-18 and ResNet-34. For this architecture, the convolution kernel size of the backbone network is 3 × 3, and both the stride and padding were set to 1. The loss coefficients $w_{seg}$ and $w_{struc}$ in Equation (16) were both set to 0.8 to balance the segmentation loss and the structural loss. We used a batch size of 16 and trained the model for 60 epochs on the CULane dataset. All training and testing were performed using PyTorch 1.9 [32] on a machine with an Intel(R) Core(TM) i7-4790K CPU and an NVIDIA GTX 1080Ti GPU with 11 GB memory.
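To make the learning rate schedule concrete, the following sketch evaluates a standard cosine decay with the stated initial rate and epoch count. It assumes that the rate is updated once per epoch and decays to a minimum of zero; neither detail is specified in the text, and the function name is hypothetical.

```python
import math


def cosine_decay_lr(epoch: float, total_epochs: int = 60,
                    base_lr: float = 4e-4, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr to min_lr over total_epochs (assumed per-epoch)."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


lr_start = cosine_decay_lr(0)
lr_middle = cosine_decay_lr(30)
lr_end = cosine_decay_lr(60)
```

The schedule starts at the initial learning rate, reaches half of it at the midpoint of training, and approaches the minimum at the final epoch, so updates become progressively smaller as training converges.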

Results
We present the quantitative results of LLFLD on the CULane dataset in Table 1, where we compare our method with other advanced methods in terms of total F1 score and running speed. We also show qualitative results for nine scenarios randomly selected from the test set of the CULane dataset in Figure 5, where we visualize the lane lines predicted by the proposed method. As can be seen from Table 1, our method achieves the best performance among all the compared methods on the CULane dataset, with an F1 score of 75.2% while running at 177 FPS. Compared to UFLD, which follows the same type of approach as ours, our method has a significant advantage in accuracy while maintaining a comparable speed. For example, our lightweight model based on ResNet-18 achieves an F1 score of 72.6% while running at 330 FPS, while UFLD's lightweight model based on ResNet-18 achieves an F1 score of 68.7% while running at 322 FPS on the same machine. Moreover, our method also surpasses LaneATT, an advanced method that uses an attention mechanism to enhance the feature representation for lane detection. For instance, our model based on ResNet-34 achieves an F1 score of 75.2% while running at 176.3 FPS, while LaneATT's model based on ResNet-34 achieves an F1 score of 74.1% while running at 140 FPS on the same machine. Furthermore, our method shows remarkable improvement in some challenging scenarios, such as night and shadow scenes. For example, in the "night" scene, our method achieves an F1 score of 66.0%, which is 3.4 percentage points higher than LaneATT's F1 score of 62.6%. Similarly, in the "shadow" scene, our method achieves an F1 score of 66.9%, which is 2.3 percentage points higher than LaneATT's F1 score of 64.6%.
These results demonstrate that our framework and structural loss function can effectively handle low-light and low-contrast situations and produce accurate and robust lane detection results, because our method enhances images of low-light scenes and uses abundant global and geometric information to tackle the no-visual-clue problem.
We show some visualizations of our proposed LLFLD method on the CULane dataset in Figure 5, where we display the lane lines predicted by our method under nine different road conditions. As can be seen from the figure, our method performs well under various scenarios and challenges, such as normal, crowded, and dazzle. The figure demonstrates that our method can accurately and robustly detect the lane lines under different illumination, occlusion, and curvature situations.

In Table 1, the best and second-best results across methods are in bold and underlined, respectively. All the images are resized to 288 × 800, and all the experiments were computed on a machine with an Intel(R) Core(TM) i7-4790K CPU and a GTX 1080Ti GPU.

Method Total Normal Crowded Dazzle Shadow No line Arrow Curve Cross Night FPS MACs(G)
Res50-Seg [33] 66

Ablation Study
This experiment aims to evaluate the impact of each major part of the proposed design on the lane detection performance. The four major parts are: automatic low-light scene enhancement (ALLE), symmetric feature flipping module (SFFM), channel fusion self-attention mechanism (CFSAT), and the proposed new structural loss function $L_{struc}$. To evaluate the performance of our proposed modules, we conducted a series of experiments under the same training settings but with different combinations of modules. The quantitative results of our modules, measured by the same metrics, are presented in Table 2.
From Table 2, we can conclude that all proposed designs achieve a significant improvement in performance compared to the baseline, which provides evidence for the effectiveness of the proposed modules.
To demonstrate the effectiveness of the ALLE module, an adaptive low-light enhancement module that adjusts the brightness and contrast of the input image according to the ambient illumination, we compared the detection results on night scenes of the CULane dataset before and after turning on the ALLE module, as shown in Figure 6. Figure 6 shows that the detection results of the model with ALLE are better than those without ALLE in the same low-light scenes, which provides more intuitive evidence for the effectiveness of ALLE. Specifically, the ALLE module can help the model detect lanes accurately, even in low-light conditions when they may not be visible to the human eye. This is a significant advantage of the proposed method, as it can improve the safety and reliability of lane detection in real-world scenarios with low-light conditions.

To assess the efficacy of the ALLE module in low-light scenarios, we used the F1 values in shadow and night scenarios as our evaluation indicator. Table 3 presents the ablation experiments performed under two states: with ALLE activated and with ALLE deactivated. From Table 3, we can conclude that activating ALLE achieves a significant improvement in performance compared to deactivating it, which provides evidence for the effectiveness of the proposed module in low-light scenarios such as shadow and night.

Table 3. Quantitative comparison of detection results on "shadow" and "night" scenarios with and without automatic low-light scene enhancement (ALLE). The F1-measure is computed on the CULane testing set with an IoU threshold of 0.5.

W/O ALLE W/ALLE

Conclusions
In this paper, we proposed LLFLD, a novel lane detection model based on row anchor selection that is specifically designed to handle low-light scenes. We introduced the automatic low-light scene enhancement (ALLE), an adaptive low-light enhancement module that adjusts the brightness and contrast of the input image according to the ambient illumination to optimize the lane detection results in low-light scenarios. Our approach leverages feature flipping and channel self-attention mechanisms to effectively collect and utilize both low-level location information and high-level semantic information from the feature maps. Additionally, our new structural loss function leverages the geometric prior of lanes to optimize the detection results. Moreover, our design based on row selection ensures that the model is fast and lightweight, which makes it suitable for real-time applications. Experimental results on the popular CULane dataset demonstrate the favorable performance (measured by F1 score) of our proposed model. Specifically, the ResNet-34 version of our method achieves 177 FPS while maintaining comparable performance at the same resolution. Overall, our proposed method is effective in enhancing low-light scenes for better lane detection results while maintaining fast inference speed and low computational complexity.