A non‐contact vision‐based system for multipoint displacement monitoring in a cable‐stayed footbridge

Vision‐based monitoring receives increased attention for measuring displacements of civil infrastructure such as towers and bridges. Currently, most field applications rely on artificial targets for video processing convenience, leading to high installation effort and focus on only single‐point displacement measurement, for example, at mid‐span of a bridge. This study proposes a low‐cost and non‐contact vision‐based system for multipoint displacement measurement based on a consumer‐grade camera for video acquisition and a custom‐developed package for video processing. The system has been validated on a cable‐stayed footbridge for deck deformation and cable vibration measurement under pedestrian loading. The analysis results indicate that the system provides valuable information about bridge deformation of the order of a few centimetres induced, in this application, by pedestrian passing. The measured data enable accurate estimation of modal frequencies of either the bridge deck or the bridge cables and could be used to investigate variations of modal frequencies under varying pedestrian loads.

Deformation is another important metric for bridge condition and performance assessment. For example, measurement of deformation during controlled vehicle load testing helps to estimate bridge load carrying capacity. [4,5] Displacement is related to the structural stiffness, and extreme values might indicate either an extreme load or a deficiency in the structure. When recorded at high sample rates, displacement data provide valuable information about dynamic characteristics and, hence, changes in structural condition.
Conventional displacement sensors such as linear variable differential transformers and dial gauges require a stationary reference point, which can be difficult to provide in field tests. Global positioning systems are suitable only for monitoring flexible large-scale structures because their measurement accuracy is limited (i.e., sub-centimetre [6] or centimetre level [7] ). The indirect method of integrating acceleration measurements is usually applied to short-duration signals (e.g., a few seconds) and might fail to estimate static or quasi-static displacement components. These limitations of traditional displacement sensing technologies have driven research in non-contact optical sensing.

| Review of vision-based approaches
Vision-based systems have advantages over other sensors, for example, easy installation, remote non-contact operation, and distributed sensing that allows a single camera to measure multiple points simultaneously. Efforts have been made to develop advanced vision-based systems that provide accurate and robust displacement measurement, primarily of high-rise buildings, [8] short-span bridges, [9][10][11][12] and long-span bridges. [10,13-16] Previous studies have indicated the significant potential of vision-based systems for structural condition evaluation, especially for system identification. [17][18][19] Other applications based on camera-measured displacement include finite element model calibration, [20] damage detection, [21] and bridge weigh-in-motion where another camera is used for traffic monitoring. [22] However, vision-based systems still face several field challenges, such as the requirement for stable camera mounting, [11] measurement error caused by lighting changes, [23] and atmospheric effects on light refraction, particularly for long-range measurements.
Most existing applications have relied on artificial targets for video processing convenience, which requires direct access to the structure and increases installation effort. Moreover, the focus is commonly on single-point displacement measurement only, for example, at the bridge mid-span, although camera sensors support multipoint simultaneous sensing.

| Non-contact sensing
There have been relatively few field applications using completely non-contact vision-based systems. In most examples, an artificial target or a set of targets with salient features and some known dimensions [13,19,24] were attached to a structure for the convenience of stable target tracking and, more importantly, for providing point or line correspondences to determine the projection transformation relating the image coordinate system and the structural coordinate system. Recent non-contact field applications [17,25-27] have eliminated the dependency on artificial targets by using a scaling factor for camera projection transformation. The scaling factor is the simplest method to obtain the projection transformation provided that either the camera-to-target distance or a feature dimension near the region of interest is known. The scaling factor estimated from the camera-to-target distance is sensitive to the tilt angle of the camera optical axis; laboratory validation tests at short distance (≤3.7 m) suggest keeping this angle below 10°. [28] Camera positioning is less critical for the scaling factor estimated from a known dimension, [9] but the estimated scaling factor is only reliable for displacement measurement along the same direction as the provided dimension.
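The two ways of obtaining a scaling factor described above can be sketched as follows. This is a minimal illustration, not code from the cited studies: the function names and all numerical values are hypothetical, and the distance-based form assumes a simple pinhole camera whose optical axis is perpendicular to the target plane.

```python
def scale_from_known_dimension(length_mm, length_px):
    """Scaling factor (mm per pixel) from a feature of known physical length.

    Only reliable for motion along the direction of the measured feature.
    """
    return length_mm / length_px


def scale_from_distance(distance_mm, focal_length_mm, pixel_size_mm):
    """Scaling factor (mm per pixel) from the camera-to-target distance.

    Assumes a pinhole camera with the optical axis perpendicular to the
    target plane; any tilt of the axis biases this estimate.
    """
    return distance_mm * pixel_size_mm / focal_length_mm


def displacement_mm(pixel_shift, scale_mm_per_px):
    """Convert a tracked image shift (pixels) to a displacement (mm)."""
    return pixel_shift * scale_mm_per_px


# Hypothetical example: a 500 mm bolt spacing spans 250 pixels in the image.
scale = scale_from_known_dimension(500.0, 250.0)
shift = displacement_mm(1.2, scale)
```

The known-dimension form needs no information about the lens or camera position, which is why it is less sensitive to camera placement than the distance-based form.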

| Distributed sensing
Vision-based systems allow a single camera to measure structural displacements of multiple points in a structure. The feature of distributed sensing has been used in laboratory structures [17,19,29,30] for multistorey displacement measurement and system identification, as well as for cable vibration monitoring [31][32][33][34] aimed at the estimation of modal frequencies or cable tensions, but applications in bridge deformation measurement are limited, with only a few examples. [27,35]

| Purpose of this study
The purpose of this study is to investigate the potential of non-contact vision-based systems for multipoint measurement in field applications. Realisation of the two features, completely non-contact and multipoint simultaneous sensing, is the focus of this study. In most applications to date, the hardware used is a professional high-resolution camera with a long-focus lens; thus, only a local region of the whole structure, for example, the mid-span of a bridge, is covered in the field of view. In this study, a low-cost consumer-grade camera with a wide-angle lens is used for video acquisition with a wide area of the bridge included in the field of view. A custom-developed package is used for the video processing that supports non-contact sensing for both deck deformation and cable vibration measurement. The developed system enables quick installation and removal, requires no access to the bridge structure, and provides simultaneous multipoint displacement measurement.
To that end, Section 2 provides descriptions of the proposed vision-based system including the hardware and video processing methods used. Section 3 describes the validation test on a simple laboratory beam structure and evaluates the measurement accuracy of the vision-based system. Section 4 introduces a field test on a cable-stayed footbridge under pedestrian crowd loading. Section 5 provides the results of the field test including the bridge deck deformation and cable vibration in the time and frequency domains. The analysis results illustrate the changing bridge modal frequencies under varying pedestrian loads.

| PROPOSED VISION-BASED SYSTEM
Applying a vision-based system for structural displacement measurement requires setting up one or more cameras in a stable location aimed at the "targets" of interest and deriving target motions through video processing techniques. Targets could be either artificial (e.g., light-emitting diode markers or planar targets with chessboard pattern) or natural structure features (e.g., bolts or holes). Natural targets are preferred on site because they reduce the installation effort for the monitoring system. In the proposed system, the hardware comprises one consumer-grade camera (i.e., GoPro Hero 4 Black) and a tripod shown in Figure 1. The recorded video files are post-processed in a custom-developed video processing package to extract the displacement information of the structure. The package is written in C++ in Visual Studio 2015 and partly relies on the OpenCV library.
The role of the video processing package is tracking the target locations in image sequences and transforming the target location information in images to a time history of structural displacements. The procedures could be fitted into a three-component framework shown in Figure 2, namely, camera calibration, target tracking, and structural displacement calculation. The measured displacement data could be interpreted for the evaluation of structural condition, for example, system identification.
FIGURE 1 Hardware of vision-based system consisting of a GoPro camera and a tripod
When the monitoring campaign is only for system identification and precise spatial measurements are not necessary, [36,37] for example, cable vibration measurement, target tracking may be the only part of the whole video processing procedure needed. The prerequisite is that the cable's depth change is much smaller than the camera-to-cable distance and that the cable location is close to the camera's principal axis so that the mapping between the world coordinates and the image plane becomes approximately linear. [36] This section mainly introduces the video processing package and data interpretation methods used in bridge monitoring campaigns. Sections 2.1 and 2.2 provide details of the two components, camera calibration and target tracking, whereas Section 2.3 demonstrates the system identification methods to analyse the monitoring data.

| Two-step camera calibration
Camera calibration is aimed at determining the transformation metric between the image natural units (pixels) and the real world units (e.g., millimetres), and the structural displacement could be easily derived from the change of structural coordinates given the image location of a target (output of target tracking) and a transformation metric (output of camera calibration).
Mathematically, the projection process from a three-dimensional (3D) spatial domain to a two-dimensional (2D) image plane loses some geometric information of the target. Thus, in a single-camera system, the calibration is realised by reducing the dimensions of target motion, that is, assuming that the target moves within a plane in 3D space. The projection is then simplified as a 2D-to-2D transformation, enabling the recovery of the 2D structural displacement. For bridge applications, the dominant motions under traffic or pedestrian loads are in the vertical direction, making it feasible to neglect the lateral or the longitudinal motions in a short-time monitoring campaign.
Several methods are available to determine the transformation metric.
• Scaling factor is the simplest method, based on one-dimensional feature correspondence or camera-to-target distance, and is thus very popular in civil applications, for example, Liao et al., [8] Feng et al., [9] Stephen et al., [13] Ye et al., [16] and Yoon et al. [17] This method is based on an assumption that the camera principal axis is perpendicular to the structural surface plane or the two motion directions of interest, which sets constraints on camera position and orientation on site. Although the scaling factor estimated by a known dimension is less sensitive to camera positioning, the calibration should be applied separately to each target using one adjacent dimensional feature along the same direction as the movement of interest.
• Planar homography matrix is a transformation metric that links the 2D image plane with the 2D structural surface plane and is applied for the 2D motion estimation. [38,39] The calibration is based on at least four sets of 2D-to-2D point correspondences, [40] that is, structural coordinates of points in 2D structural surface plane and image coordinates of their projections in 2D image plane.
• Full projection matrix is the general form of transformation metric between the 2D image plane and the 3D structural coordinate system with no assumption, with example applications in Oh et al., [19] Martins et al., [24] Park et al., [41] and Kim et al. [42] The calibration process usually comprises two steps, (a) offline calibration in the laboratory to determine camera intrinsic parameters [43] and (b) site calibration to estimate the camera extrinsic matrix (i.e., camera position and orientation relative to the structural coordinate system) based on at least four sets of point correspondences. The full projection matrix is the product of the camera intrinsic matrix and the camera extrinsic matrix.
Scaling factor is inappropriate for site applications due to the prerequisite of a perpendicular camera configuration or the required dimensional features. Estimation of either planar homography matrix or full projection matrix requires some known geometric information in the structure that is commonly acquired with the assistance of some artificial targets, for example, planar targets [42,44] and a 3D calibration object. [24] Because the offline camera calibration step in the full projection matrix method could consider lens distortion that is common in consumer-grade cameras, the full projection matrix method is used in the video processing package.
FIGURE 2 Procedures of video processing package and the corresponding output at each step. 2D = two-dimensional
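To make the planar homography option concrete, the sketch below estimates the 3 × 3 matrix from exactly four 2D-to-2D point correspondences and uses it to map an image point onto the structural plane. It is an illustrative direct linear solution with the last matrix entry fixed to 1, not the calibration routine of any cited study; the function names and the example coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(image_pts, plane_pts):
    """Homography mapping image (u, v) to plane (X, Y) from four point pairs.

    Each pair gives two linear equations in the eight unknown entries
    (the ninth entry is fixed to 1).
    """
    a, b = [], []
    for (u, v), (X, Y) in zip(image_pts, plane_pts):
        a.append([u, v, 1, 0, 0, 0, -u * X, -v * X])
        b.append(X)
        a.append([0, 0, 0, u, v, 1, -u * Y, -v * Y])
        b.append(Y)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def image_to_plane(H, u, v):
    """Map an image point (pixels) to structural plane coordinates."""
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    X = (H[0][0] * u + H[0][1] * v + H[0][2]) / w
    Y = (H[1][0] * u + H[1][1] * v + H[1][2]) / w
    return X, Y


# Hypothetical control points: image pixels and plane coordinates in metres.
image_pts = [(100.0, 400.0), (900.0, 380.0), (880.0, 100.0), (120.0, 120.0)]
plane_pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.5), (0.0, 1.5)]
H = fit_homography(image_pts, plane_pts)
```

With more than four correspondences, a least-squares fit would normally replace the exact solve; the four-point case is shown because it is the minimum stated in the text.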
In the package, offline camera calibration is performed in the laboratory using the camera to observe a chessboard target in different views in order to obtain the camera intrinsic matrix and the lens distortion parameters. For the site calibration, the camera extrinsic matrix is derived based on at least four sets of 2D-to-3D point correspondences. Because completely non-contact sensing is preferred, the required geometrical information is acquired from the as-built drawings, for example, the bridge span length and the pylon height. To consider the lens distortion, instead of correcting the full frame before the target tracking step, the correction occurs after the target tracking step only to the image coordinates observed from the raw frame in order to save computation efforts. Finally, the 2D structural displacement along the vertical and longitudinal directions is derived based on the corrected image coordinates and the full projection matrix.
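The idea of correcting only the tracked image coordinates, rather than undistorting every full frame, can be sketched as below. The example assumes a two-coefficient radial distortion model and a fixed-point inversion; the source does not state which distortion model or solver the package uses, and the function names and parameter values here are hypothetical.

```python
def distort_pixel(u, v, fx, fy, cx, cy, k1, k2):
    """Forward radial model: ideal pixel -> pixel observed in the raw frame."""
    x, y = (u - cx) / fx, (v - cy) / fy
    r2 = x * x + y * y
    s = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * s * fx + cx, y * s * fy + cy


def undistort_pixel(u, v, fx, fy, cx, cy, k1, k2, iterations=20):
    """Correct one tracked pixel by inverting the radial model iteratively.

    Only the target coordinates are corrected, so the cost per frame is a
    handful of arithmetic operations instead of a full-image remap.
    """
    xd, yd = (u - cx) / fx, (v - cy) / fy
    x, y = xd, yd  # initial guess: the distorted normalised coordinates
    for _ in range(iterations):
        r2 = x * x + y * y
        s = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / s, yd / s
    return x * fx + cx, y * fy + cy


# Hypothetical intrinsics and distortion coefficients for a 1,920 x 1,080 frame.
params = dict(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0, k1=-0.25, k2=0.05)
raw_u, raw_v = distort_pixel(1500.0, 900.0, **params)
corrected = undistort_pixel(raw_u, raw_v, **params)
```

The fixed-point loop converges quickly for moderate distortion; for very strong wide-angle distortion a different model or solver may be needed.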

| Target tracking techniques
Target tracking is aimed at determining target locations in frame sequences of a video record, with several techniques available:
• Correlation-based template matching is a classic and widely used technique, [8,9,13,14,16] which is realised by searching for an area in a new frame most closely resembling the reference (or template) that is predefined as a rectangular subset in the initial frame.
• Optical flow estimation is an established method that detects motions or flows of each pixel within the predetermined target region based on one temporal and one spatial constraint. [45] Lucas and Kanade optical flow estimation [46] has been validated in a laboratory test of a multistorey metal tower for system identification [17] and applied in field monitoring of bridge stay cables during normal operation. [18]
• Feature point matching is an efficient approach that detects key-points in two images independently and then finds point correspondences based on their local appearance. Currently, the applications in structural monitoring are limited, two examples being a displacement monitoring test in a stadium structure using FREAK matching [26] and one in a viaduct system using SIFT matching. [25]
• Shape-based tracking is used to match special target shapes and patterns between two images, for example, a line-type target [34] or custom-made targets with white and black squares. [47,48] These methods do not generalise to arbitrary target patterns.
Target tracking methods for the deck and cable targets are chosen separately considering their pattern features.

| Tracking deck targets
Correlation-based template matching and feature point matching are the two potential approaches for tracking the deck target regions. Correlation-based template matching has been applied for structural displacement monitoring on a railway bridge, [9] a long-span bridge, [16] and a high-rise building for tracking specific patterns, [13,14,44] light-emitting diode lamp targets, [16] and feature targets on structural surfaces. [9] One critical advantage of this method is the minimal user intervention, limited to specifying the template region in the reference frame. However, the method is sensitive to lighting changes [23,49] and changes of background conditions. [50] Also, the method is not the ideal choice for tracking slender structural components, because a rectangular template might include background pixels that move differently from the structural elements. Feature point matching is an alternative for the target tracking based on the key-point detection and matching. Key-points in computer vision are those that are stable, distinctive, and invariant to image transformation such as building corners, connection bolts, or other patches with interesting shapes. [51] Instead of the raw image intensities, a feature descriptor is used for matching that is a complex representation based on the shape and appearance of a small window around the key-point. Thus, this technique is less sensitive to illumination change, shape change, and scale variation. However, feature point matching requires the target region to have rich textures for saliency during the whole recording period. Also several threshold parameters need to be specified according to users' experience or judgement, for example, contrast threshold for the feature detector and distance threshold in the matching criteria. These threshold values might depend on environmental conditions, for example, the threshold for outlier removal varies with the illumination condition. [26] The existing applications are mainly focused on the short-range measurement, [25,26,50,52] whereas the feasibility for long-range monitoring and the stability over several hours are not validated yet.
For the studied footbridge, natural features near the bridge deck along the length direction are available for tracking, but these features are not very distinctive. The monitoring was continued over several hours, recording the different occupation states of the bridge. Thus, the automatic tracking with little user adjustment under varied environmental conditions was preferred. With these considerations, correlation-based template matching is used for deck target tracking in the video processing package.
The tracking process is given in Figure 3. A target region is selected as the template that is a subset image in the reference frame. A correlation criterion is defined to evaluate the similarity degree between the template and the new frame. Zero-mean normalised cross-correlation coefficient is used as the correlation criterion that offers robust noiseproof performance and is insensitive to offset and linear scale in illumination. [53] The target location in the new frame corresponds to the peak location in the zero-mean normalised cross-correlation coefficient matrix that has the resolution at pixel level. Subpixel interpolation schemes [9] are required to refine the tracking results. The interpolation method used in this study is zero-padding in frequency domain using matrix multiplication form of discrete Fourier transform. [54]
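The pixel-level part of this tracking step can be sketched as an exhaustive search for the peak of the zero-mean normalised cross-correlation coefficient. The example below is a plain-Python illustration on nested lists, not the package's C++ implementation, and it deliberately omits the frequency-domain subpixel refinement; the function names are hypothetical.

```python
import math


def zncc(patch, template):
    """Zero-mean normalised cross-correlation between two equal-size patches."""
    n = len(template) * len(template[0])
    patch_mean = sum(map(sum, patch)) / n
    templ_mean = sum(map(sum, template)) / n
    num = patch_var = templ_var = 0.0
    for patch_row, templ_row in zip(patch, template):
        for p, t in zip(patch_row, templ_row):
            num += (p - patch_mean) * (t - templ_mean)
            patch_var += (p - patch_mean) ** 2
            templ_var += (t - templ_mean) ** 2
    denom = math.sqrt(patch_var * templ_var)
    return num / denom if denom > 0 else 0.0  # flat patches carry no pattern


def match_template(frame, template):
    """Return (score, row, col) of the best-matching top-left corner."""
    th, tw = len(template), len(template[0])
    best = (-2.0, 0, 0)
    for r in range(len(frame) - th + 1):
        for c in range(len(frame[0]) - tw + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = zncc(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best
```

Because the mean is removed and the result is normalised, a patch that differs from the template only by a brightness offset and a positive gain still scores close to 1, which is the insensitivity to illumination offset and linear scale noted above.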

| Tracking cable targets
To enable a wider field of view covering the majority of the bridge, bridge cables are projected to be slender lines in a camera image, for example, with less than four pixels along the width direction. Correlation-based template matching is inappropriate in this case because pixels within a selected template (a rectangular subset from the reference frame) might cover cable segments and some background (e.g., clouds and tree branches) with inconsistent motions. Optical flow estimation method faces challenges due to the limited numbers of salient feature points. The cable tracking method based on edge detection is more robust to the variations of local features and is used in the video processing package. The cable tracking consists of two steps: edge detection and motion estimation. Edge detection is aimed at determining the cable location in a small subset window, whereas the cable motion is estimated from the distance between two extracted edges.
In the edge detection step, a region of interest including a small cable segment is selected for tracking, as shown in Figure 4a. Because edge points have significant local changes in image intensity that lead to a local peak in the first derivative, image gradient is a common measure for edge detection. One of the gradient-based edge detectors, the Sobel operator, [55] is used to detect the probable edge points (at pixel level) by calculating the image gradients over a three-by-three neighbourhood and thresholding the magnitude of the gradients. The Zernike moment operator [56,57] is then applied to relocate the edge precisely from the points detected by the Sobel operator in Figure 4b. Zernike moments are constructed by mapping the image onto a set of complex polynomials through convolving the image intensity matrix with three predetermined masks. Three edge parameters (i.e., step height, perpendicular distance from the mask centre, and the edge direction with respect to one image axis) are estimated from Zernike moments for the probable edge points, and the edge parameters are then used as criteria to remove outliers and to refine image coordinates for the remaining edge points (at subpixel level). The direction of the cable segment within the selected image region is determined by line fitting among the remaining edge points in Figure 4d.
The cable motion is then estimated from the distance between two edge lines along one assumed motion direction (see Figure 5). Even if the assumed motion direction deviates from the true direction, the motion estimate remains proportional to the true motion, so the identification of cable modal frequencies is not affected.
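The proportionality can be made explicit with a short derivation. It is a sketch under two assumptions not stated in the source: the cable segment translates without rotating inside the region of interest, and the true motion stays along one fixed direction. All symbols are introduced here for illustration only.

```latex
% n : unit normal of the fitted edge line
% m : assumed unit motion direction (user choice), with n . m != 0
% e : true unit motion direction, a(t) : true displacement amplitude
% s(t) : distance between reference and current edge lines measured along m
\begin{align}
  \mathbf{d}(t) &= a(t)\,\mathbf{e}, \\
  s(t) &= \frac{\mathbf{n}\cdot\mathbf{d}(t)}{\mathbf{n}\cdot\mathbf{m}}
        = \frac{\mathbf{n}\cdot\mathbf{e}}{\mathbf{n}\cdot\mathbf{m}}\;a(t)
        = c\,a(t).
\end{align}
```

Since c is constant in time, s(t) and a(t) have the same spectral peak locations; only the amplitude scale differs.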

| Monitoring data interpretation
The acquired displacement data from the video processing package could be used for data interpretation, for example, extracting the structural dynamic properties. Because the monitored structure in this study is a cable stayed footbridge loaded by crowds of passing pedestrians, the modal properties might vary with the loading.
For system identification, Welch's method [58] was used to estimate the power spectral densities of monitoring data by computing the average of periodograms, and the modal frequencies were estimated through peak picking. The datadriven stochastic subspace identification (SSI) method [59] was also used to extract the modal frequencies and mode shapes through estimating a state-space model from measurement data and performing eigenvalue decomposition to the state-space model.
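The Welch estimate followed by peak picking can be sketched as below. This is a minimal plain-Python illustration using a Hann window, 50% overlap, and a direct discrete Fourier transform; these settings and the function names are assumptions, since the source does not give the window or segment parameters, and the SSI step is not covered.

```python
import cmath
import math


def welch_psd(x, fs, nperseg, overlap=0.5):
    """Average of windowed periodograms over overlapping segments.

    The constant one-sided scaling factor is omitted because it does not
    change where the spectral peaks are.
    """
    step = max(1, int(nperseg * (1.0 - overlap)))
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / nperseg)
              for n in range(nperseg)]
    norm = fs * sum(w * w for w in window)
    nfreq = nperseg // 2 + 1
    psd = [0.0] * nfreq
    count = 0
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        mean = sum(seg) / nperseg
        seg = [(v - mean) * w for v, w in zip(seg, window)]
        for k in range(nfreq):
            coeff = sum(v * cmath.exp(-2j * math.pi * k * n / nperseg)
                        for n, v in enumerate(seg))
            psd[k] += abs(coeff) ** 2 / norm
        count += 1
    if count == 0:
        raise ValueError("signal is shorter than one segment")
    freqs = [k * fs / nperseg for k in range(nfreq)]
    return freqs, [p / count for p in psd]


def peak_frequency(freqs, psd):
    """Peak picking: frequency of the largest spectral ordinate."""
    return freqs[max(range(len(psd)), key=psd.__getitem__)]
```

The frequency resolution is fs / nperseg, so longer segments separate closely spaced modes better at the cost of fewer averages and a noisier estimate.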
In time-frequency analysis, the continuous wavelet transform (CWT) was used to acquire the time-frequency distribution of measured signals including the displacement and acceleration responses. The complex Morlet wavelet was set as the mother wavelet with the relevant parameters (frequency bandwidth and central frequency) tuned according to the    [60][61][62] The instantaneous frequencies were extracted from the ridges of wavelet transform modulus using the modulus maxima method. [63]

| VALIDATION TEST IN LABORATORY
The proposed vision-based system was first validated on a beam structure in controlled laboratory conditions. Because the cable tracking method based on the edge detection has been validated in the previous work, [23] the focus of this laboratory test is to investigate the working performance of correlation-based template matching method for structural displacement measurement. The system was applied to measure the displacement responses of several points in the beam when repeatedly set into free vibration. The measured data during the stationary periods were used to evaluate the measurement accuracy, whereas the measured data under the excitation periods were evaluated by comparison with the accelerometer measurement. Section 3.1 describes the tested beam structure and sensors used, and Section 3.2 evaluates the accuracy of the displacement data measured by vision-based system.

| Description of a beam structure and the monitoring test
A simply supported beam structure was created in the Structures Laboratory of University of Exeter by mounting a steel circular hollow section (tube) on the top of two columns and holding in place with C-clamps. The beam structure, with a span of 5.70 m, is shown in Figure 6.
To generate vertical free vibration, the beam was repeatedly pulled down with a rope and released, and the free vibration response was monitored by a combination of vision-based system and four wireless accelerometers.
In the vision-based system, a GoPro Hero 4 Black camera was mounted on the top of a tripod approximately 4.59 m from beam mid-span. The entire beam was in the field of view with one sample frame indicated in Figure 6b showing an obvious lens distortion effect in the four corner regions of the captured frame. The nominal sample rate was set as 60 Hz, whereas the actual rate was 59.94 Hz. Narrow field of view setting was selected with the corresponding focal length equivalent to 30-34 mm. The image dimensions were 1,920 × 1,080 pixels.
The video processing consists of three main steps to extract the time histories of beam displacement: camera calibration, target tracking, and displacement calculation. In terms of camera calibration, the camera intrinsic matrix and lens distortion parameters were predetermined by analysing chessboard images taken from different views in the laboratory. The lens distortion parameters were used to correct the lens distortion influence with the corrected frame shown in Figure 6c. The camera extrinsic matrix was determined based on five pairs of point correspondences between the structural coordinate system and the image plane. The structural coordinate system was defined with the origin at the mid-span of the beam, the X axis along the beam span direction, and the Y axis in the vertical direction (see Figure 6a). The control points (CPs) with known structural coordinates used for calibration are marked in Figure 6c as red dots. The camera extrinsic matrix was derived by minimising the total reprojection error between the observed image points and the image points projected using the estimated projection relation. The reprojected image points according to the estimated camera extrinsic matrix and the structural coordinates are indicated in Figure 6c as the "+" markers in light colour. The coordinate information and reprojection errors are given in Table 1. An obvious deviation between the observed and the projected image points (reprojection error = 10.4 pixels) occurs at CP3, which is the mid-span point of the beam. This deviation might be caused by an error in the provided structural coordinates of CP3 because the initial deformation of the beam induced by self-weight was not considered.
In the second step of target tracking, four targets (T1~4) were chosen for tracking along the span direction located at 1/8, 1/4, 3/8, and 1/2 span points of the beam, respectively. When using the correlation-based template matching method, a planar area with a proper projection size in the video frame (e.g., 40 pixels) is required as the target region. This could be easily satisfied in the bridge monitoring test, for example, using the deck area in Baker Bridge. However, the required planar area was not available in the structural surface of this simply-supported beam due to the long, thin shape. Thus, several planar boards with black random patterns were attached to the beam for selection as regions of interest, shown in Figure 6a.
In the last step, the structural displacement along the vertical and longitudinal directions was estimated based on the camera calibration and target tracking results.
Reference sensors were required for evaluating the measurement by the vision-based system. Conventional sensors for displacement measurement such as linear variable differential transformer and dial gauges would not work because the target regions on the beam structure were over 2 m higher than the stationary base (the ground). Integration schemes from accelerometer data have some limitations due to low frequency noise and attempts to filter it out that result in loss of quasi-static components. However, for a short signal, for example, a few seconds, it is possible to derive the reliable displacement information from accelerometer measurement. [64] With the consideration of feasibility and installation effort, wireless accelerometers were chosen as the reference in this laboratory test. Four triaxial APDM Opal™ wireless sensors were attached to the beam to record the acceleration responses. The sensors were fixed at the top right of target plate using black tape, as indicated in Figure 6a. The Opal sample rate was set to 128 Hz.
The GoPro camera and the wireless sensors had independent clocks but were separately synchronised with an online reference time. Before comparing the signals, the small time shift (i.e., 0.004 s) between the clocks of the two sensing systems was corrected by finding the maximum of the cross-correlation of two velocity signals, derived respectively from the acceleration data and the displacement data and both resampled at 256 Hz.
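The time-shift correction can be sketched as a search for the lag that maximises the cross-correlation of the two velocity signals. The example is illustrative only: the function names are hypothetical, the synthetic decaying sinusoid merely stands in for a measured free-vibration response, and both signals are assumed to be already resampled to a common rate.

```python
import math


def best_lag(a, b, max_lag):
    """Lag in samples (positive when b is delayed relative to a) that
    maximises the cross-correlation of a and b within +/- max_lag."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ai in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += ai * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def decaying_sine(fs, n_samples, freq_hz=4.5, decay=1.5):
    """Synthetic stand-in for a free-vibration velocity record."""
    return [math.exp(-decay * n / fs) * math.sin(2 * math.pi * freq_hz * n / fs)
            for n in range(n_samples)]


fs = 256.0
reference = decaying_sine(fs, 512)
delayed = [0.0] + reference[:-1]          # same signal, one sample late
shift_seconds = best_lag(reference, delayed, max_lag=10) / fs
```

At 256 Hz one sample corresponds to about 0.0039 s, so the resolution of this search is of the same order as the 0.004 s shift reported above.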

| Measurement and analysis results
Four points (T1~4) along the half span of the beam structure were tracked by the vision-based system with the measured displacement in the vertical direction shown in Figure 7. Free vibration was induced three times by pulling and releasing at mid-span, and in each case, vibrations decayed within 5 s. The third free vibration response is the strongest and clearest and is replotted in Figure 7b with an expanded timescale for clearer visualisation. The maximum deformation at the mid-span (T4) reaches 5.93 mm at the time T = 40.5 s with the corresponding deflection at the 1/8, 1/4, and 3/8 points (T1/2/3) at 1.20, 3.09, and 4.85 mm, respectively. Vibrations decayed to less than 0.3 mm within 2 s and showed a modal frequency of approximately 4.5 Hz. The displacement measurement shown in Figure 7a includes data during several stationary periods (i.e., the time intervals of [0, 12], [17,24], [31,38], and [46,55] s). The data samples collected during these periods were used to evaluate the measurement accuracy of the vision-based system. The non-zero measured data are regarded as the measurement error because the true value of displacement is zero. The estimated distribution of measurement error is shown in Figure 8, indicating that the standard deviations of measurement error at the four targets (T1~4) are almost identical, varying from 0.018 to 0.019 mm. The measurement accuracy at the 95% confidence level was estimated by multiplying the standard deviation by a critical value determined from the t distribution (i.e., ±1.96). Thus, the measurement accuracy during the stationary period was ±0.037 mm. For dynamic displacement data, the measurement accuracy might be lower because any deviation in the estimated projection transformation has a greater effect on measured displacements of larger amplitude.
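The accuracy figure follows from a one-line calculation, sketched below. The function name is hypothetical; the critical value of 1.96 is the large-sample limit of the t distribution, which is reasonable here given the many samples in the stationary periods.

```python
import statistics


def accuracy_95(stationary_samples_mm, critical_value=1.96):
    """Half-width of the 95% interval of the measurement error, in mm.

    The samples are displacements recorded while the structure is at rest,
    so any non-zero value is treated as measurement error.
    """
    return critical_value * statistics.stdev(stationary_samples_mm)


# Using the largest standard deviation reported for the four targets:
reported_accuracy_mm = round(1.96 * 0.019, 3)
```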
Accelerometers were used as reference sensors to evaluate the measurement accuracy of vision-based system because their acceleration resolution at 4.5 Hz, limited by noise of 128 μg/√Hz, translates to velocity resolution of 0.044 mm/s and displacement resolution of 0.0016 mm in the band 4.5 ± 0.5 Hz. Ideally, the displacement data could be directly recovered from the accelerometer measurement through double integration. However, to mitigate the amplification and accumulation of acceleration error during the integration procedures, the accelerometer measurement was integrated to velocity response that was then compared with velocity derived from vision-based measurement.
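The quoted velocity and displacement resolutions can be reproduced with the short calculation below. It is a sketch under stated assumptions: the noise is taken as white over a 1 Hz band (4.5 ± 0.5 Hz), g is taken as 9.81 m/s², and the conversion treats the noise as a harmonic component at 4.5 Hz, so each integration divides by the angular frequency. The function name is hypothetical.

```python
import math

G = 9.81  # m/s^2, assumed value of standard gravity


def resolution_from_noise_density(noise_ug_per_rthz, bandwidth_hz, freq_hz):
    """Return (velocity resolution in mm/s, displacement resolution in mm)."""
    # RMS acceleration noise in the band, in m/s^2
    acc_rms = noise_ug_per_rthz * 1e-6 * G * math.sqrt(bandwidth_hz)
    omega = 2 * math.pi * freq_hz
    velocity = acc_rms / omega          # one integration: divide by omega
    displacement = velocity / omega     # second integration
    return velocity * 1e3, displacement * 1e3


vel_mm_s, disp_mm = resolution_from_noise_density(128.0, 1.0, 4.5)
```

Because the conversion divides by the angular frequency twice, the same accelerometer noise corresponds to a much coarser displacement resolution at lower frequencies.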
Acceleration response at the beam mid-span (T4) during the third period of free vibration was truncated for comparison with displacement data for the corresponding period shown in Figure 7b. Figure 9a shows the acceleration measured by the Opal accelerometer and the displacement measured by the vision-based system. The derived velocity results are shown in Figure 9b and are highly similar, with a cross-correlation coefficient of 98.86%; the root mean square deviation between the two velocity signals is 3.70 mm/s, compared with a maximum velocity amplitude of 95.89 mm/s.
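A minimal sketch of this velocity-domain comparison is given below. It is not the authors' processing chain: the common 128 Hz sample rate, the linear detrending after integration, and the synthetic decaying 4.5 Hz signal are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def velocity_from_acceleration(acc, fs):
    """Integrate acceleration with the trapezoidal rule, then remove linear drift."""
    steps = 0.5 * (acc[1:] + acc[:-1]) / fs
    vel = np.concatenate(([0.0], np.cumsum(steps)))
    idx = np.arange(vel.size)
    return vel - np.polyval(np.polyfit(idx, vel, 1), idx)

def velocity_from_displacement(disp, fs):
    """Differentiate displacement with central differences."""
    return np.gradient(disp, 1.0 / fs)

def compare(v_a, v_b):
    """Return (correlation coefficient, root mean square deviation)."""
    r = np.corrcoef(v_a, v_b)[0, 1]
    rmsd = np.sqrt(np.mean((v_a - v_b) ** 2))
    return r, rmsd

# Hypothetical decaying free vibration, sampled at 128 Hz by both sensors
fs, f0, decay, amp = 128.0, 4.5, 1.5, 3.4          # Hz, Hz, 1/s, mm
t = np.arange(0.0, 5.0, 1.0 / fs)
w = 2.0 * np.pi * f0
env = amp * np.exp(-decay * t)
disp = env * np.sin(w * t)                                                     # mm
acc = env * ((decay**2 - w**2) * np.sin(w * t) - 2 * decay * w * np.cos(w * t))  # mm/s^2

v_acc = velocity_from_acceleration(acc, fs)
v_vis = velocity_from_displacement(disp, fs)
r, rmsd = compare(v_acc, v_vis)
print(f"correlation = {r:.4f}, RMSD = {rmsd:.2f} mm/s")
```

Comparing in the velocity domain needs only one integration of the acceleration and one differentiation of the displacement, which limits both drift accumulation and noise amplification.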
Having demonstrated the reliability of the vision-based system for displacement measurement in the laboratory, the proposed system was applied in a monitoring test of a full-scale footbridge having dominant vibration modes with frequencies below 2.5 Hz. The low frequencies make it more challenging to recover the deflection information from accelerometer measurement. Therefore, the vision-based system has the advantage in quantifying the quasi-static deflection under heavy loads and also provides the capacity for the evaluation of bridge dynamic performance.

| FIELD TEST ON A CABLE-STAYED FOOTBRIDGE
The vision-based system was applied in a monitoring test of a cable-stayed footbridge, Baker Bridge in Exeter, UK. This section describes both the bridge and the configuration of the vision-based system on site.

| Bridge description
Baker Bridge is a 109-m cable-stayed footbridge crossing the A379 dual-carriageway in Exeter, UK (see Figure 10). The bridge provides cyclist and pedestrian access to Sandy Park Stadium (south side of bridge), the home ground of Exeter Chiefs Rugby Club, and thus experiences heavy pedestrian traffic on match days. The bridge comprises a single A-shaped tower that supports the continuous steel deck over a simple support at the pylon cross-beam and via seven pairs of stay cables.
In a previous ambient modal test, [65] four modal frequencies below 2.5 Hz were observed in the vertical direction, that is, 0.94, 1.62, 2.0, and 2.24 Hz. Thus, the bridge has noticeable vibration response due to pedestrian traffic.
The video processing procedures consist of three main steps similar to those in the laboratory validation test. For camera calibration, the camera intrinsic parameters were determined ahead of the test. A sample frame corrected for lens distortion is shown in Figure 11b. The camera extrinsic matrix was determined on site based on several pairs of 2D-to-3D point correspondences. The structural coordinate system was specified with the origin at the deck height of the tower section, the Y axis along the vertical direction, and the Z axis along the transverse direction (see Figure 10a,b). The CPs are marked in Figure 11b with known structural coordinates from the as-built drawings provided by Devon County Council: CP1-4 along the edge of the bridge tower and CP5-11 near the outer section of the crossbeams to which the cables are secured.
For target tracking, four targets (D1-4) along the longitudinal direction of the deck and two targets (C1-2) at the cable edges were chosen, all on the south-west side of the bridge in Figure 11a. The pixel dimensions of the selected targets in video frames are indicated in Table 2. Due to the limited availability of stable features on the bridge deck, the height of the deck targets is approximately 20 pixels, smaller than the value (40 pixels) suggested in the previous study. [23] For the deck targets (D1-4), the structural displacement along the longitudinal and vertical directions was estimated based on the camera calibration and target tracking results. For the cable targets (C1-2), the cable motion estimated in the target tracking step was output directly.
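One way to turn a tracked image position into longitudinal (X) and vertical (Y) structural displacement is sketched below. It assumes that the target moves within a plane of known transverse coordinate Z, which the text does not state explicitly; the camera pose, intrinsic values, and function names are hypothetical.

```python
import numpy as np

def project(P, xyz):
    """Project a 3D structural point to pixel coordinates with a 3x4 matrix P."""
    uvw = P @ np.append(xyz, 1.0)
    return uvw[:2] / uvw[2]

def backproject_on_plane(P, uv, z):
    """Recover (X, Y) of a point with known Z from its pixel position (u, v).

    u = (p1 . Xh) / (p3 . Xh) and v = (p2 . Xh) / (p3 . Xh) are linear in X and Y
    once Z is fixed, giving a 2x2 linear system."""
    u, v = uv
    A = np.array([
        [P[0, 0] - u * P[2, 0], P[0, 1] - u * P[2, 1]],
        [P[1, 0] - v * P[2, 0], P[1, 1] - v * P[2, 1]],
    ])
    b = -np.array([
        (P[0, 2] - u * P[2, 2]) * z + (P[0, 3] - u * P[2, 3]),
        (P[1, 2] - v * P[2, 2]) * z + (P[1, 3] - v * P[2, 3]),
    ])
    return np.linalg.solve(A, b)

# Hypothetical camera: intrinsics K, 10 degree yaw, camera centre C (metres)
theta = np.deg2rad(10.0)
c, s = np.cos(theta), np.sin(theta)
R_yaw = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
R = np.diag([1.0, -1.0, -1.0]) @ R_yaw           # world Y up -> image y down
C = np.array([30.0, 2.0, 50.0])
K = np.array([[2000.0, 0.0, 960.0], [0.0, 2000.0, 540.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([R, (-R @ C).reshape(3, 1)])

rest = np.array([20.0, 0.0, 3.0])                # target at rest
moved = rest + np.array([0.002, -0.050, 0.0])    # 2 mm along X, 50 mm downward
xy_rest = backproject_on_plane(P, project(P, rest), z=3.0)
xy_moved = backproject_on_plane(P, project(P, moved), z=3.0)
displacement = xy_moved - xy_rest                # metres, (X, Y)
print(displacement)
```

Any error in the assumed plane or in P scales directly into the recovered displacement, consistent with the later remark that calibration quality limits measurement accuracy.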

| Description of a monitoring test
As well as the vision-based system, six triaxial wireless accelerometers (APDM Opal™) were installed on the bridge: four (B1-4) on the deck parapet and two (A1-2) on the cables, with locations marked in Figure 10a. The purpose of the Opal sensors was to corroborate the modal parameters of the bridge deck and cables identified using the vision-based system. The Opal sensors B1-4 corresponded to the target regions D1-4 in the vision-based system, whereas the sensors A1-2 were mounted on the same cables as the target regions C1-2. The sample rate was set to 128 Hz.

| MEASUREMENT AND ANALYSIS RESULTS
In this section, the measurement results obtained by the vision-based system are illustrated in the time and frequency domains. The time interval for analysis, from 16:39 to 17:14 (35 min), included periods when large crowds of spectators crossed the bridge on the way home after the match. The measured data from the vision-based system were analysed to investigate the dynamic properties of the bridge, including the changing modal frequencies under varying pedestrian loads. Sections 5.1 and 5.2 present the measurement and analysis results of bridge deck displacement and cable vibration, respectively.

| Measurement and analysis of deck displacement
The vertical displacements of the four deck targets along the bridge span are described in this section. The measured data are presented in time domain (Section 5.1.1) and frequency domain (Section 5.1.2), respectively.

| Time history measurement of vertical displacement
Four deck targets (D1-4) were tracked with the time histories of vertical displacement shown in Figure 12 and four extracted frames from the video files in Figure 13. During the recording, the bridge changed from almost empty (Figure 13a) to almost full (Figure 13b), then reverting to a trickle of pedestrians (Figure 13d).
In Figure 12, an obvious downward trend of the bridge deck is observed from 800 to 1,250 s in the measured data at D1 and D2, with the maximum deformation reaching 72.58 and 64.10 mm, respectively. A quick deformation recovery is seen at approximately 1,300 s in the measurements at D1-D3, which should correspond to a sudden reduction in bridge loading. The captured frame at 1,315 s (Figure 13c) shows a clear gap (approximately 16.5 m) between two groups of pedestrians, which accords with observations from the measured data.

| Frequency components of vertical displacement
The power spectral densities of the vertical displacement and acceleration measurements were estimated using Welch's method with a window length of 1 min and a 50% overlap. Figure 15 illustrates the estimation results for the signals recorded during three time intervals, that is, [0, 400], [800, 1,200], and [1,600, 2,000] s. The first time interval ([0, 400] s) was at the end of the match (during stoppage time), and thus, few pedestrians crossed the bridge; the second ([800, 1,200] s) was after the rugby match, and the bridge was almost fully occupied by pedestrians; and the third was after most spectators had left, with a few pedestrians still crossing the bridge. Acceleration data of the deck at B3 were not available due to a faulty battery.
In Figure 15a, four apparent modal frequencies are identified using peak-picking with the values of 0.92, 1.61, 2.00, and 2.23 Hz, which match well with the results from acceleration measurement in (b). The displacement measurement at the deck point D1 contains significant quasi-static response due to the local deformations resulting from passing pedestrians (see Figure 14), preventing identification of the first modal frequency at 0.92 Hz.

FIGURE 16
Mode shapes and frequency estimates of the bridge longer span: blue curves represent the mode shapes extracted from the previous ambient modal test [65] corresponding to the longer span closest to the stadium; and red dots represent the mode shapes extracted from displacement data measured by vision-based system

In the second time interval shown in (c), only the second modal frequency is clearly indicated, with the value decreased to 1.48 Hz (from 1.61 Hz). The signal power near this frequency is increased sharply compared with the data in the other two periods. The shift of the second modal frequency is also observed in (d) from acceleration data.
In the third time interval shown in (e), the second mode still contains the highest power with frequency value shifted back to 1.59 Hz. The first and third modal frequencies are identified with the same values (0.92 and 2.00 Hz) as in the first time interval. The observations match well with the analysis results of the acceleration data shown in (f).
The analysis indicates that
• The data measured by the vision-based system capture the modal frequencies of the bridge deck accurately, as shown by the comparison with the acceleration data;
• The second mode of the bridge deck is very sensitive to the occupation status of the bridge: its frequency reduced from 1.61 to 1.48 Hz with full pedestrian occupancy, a reduction of 8%.
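The spectral estimation and peak-picking steps used for Figure 15 can be sketched as follows. This is a numpy-only illustration of Welch's method with 1-min Hann windows and 50% overlap; the Hann window, the 30 Hz sample rate, the peak-picking rule, and the synthetic four-mode signal are assumptions, not details taken from the study.

```python
import numpy as np

def welch_psd(x, fs, window_s=60.0, overlap=0.5):
    """One-sided Welch power spectral density using Hann-windowed segments."""
    nperseg = int(round(window_s * fs))
    step = int(round(nperseg * (1.0 - overlap)))
    win = np.hanning(nperseg)
    scale = fs * np.sum(win ** 2)
    spectra = []
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        seg = seg - seg.mean()                       # remove the segment mean
        spectra.append(np.abs(np.fft.rfft(seg * win)) ** 2 / scale)
    psd = np.mean(spectra, axis=0)
    if nperseg % 2 == 0:
        psd[1:-1] *= 2.0                             # keep DC and Nyquist single-sided
    else:
        psd[1:] *= 2.0
    return np.fft.rfftfreq(nperseg, d=1.0 / fs), psd

def pick_peaks(freqs, psd, n_peaks, f_min=0.5, f_max=2.5):
    """Return the n_peaks strongest local maxima of the PSD within [f_min, f_max]."""
    idx = [i for i in range(1, len(psd) - 1)
           if psd[i] > psd[i - 1] and psd[i] > psd[i + 1]
           and f_min <= freqs[i] <= f_max]
    idx.sort(key=lambda i: psd[i], reverse=True)
    return sorted(float(freqs[i]) for i in idx[:n_peaks])

# Hypothetical 400 s record at 30 Hz containing the four reported deck frequencies
rng = np.random.default_rng(1)
fs = 30.0
t = np.arange(0.0, 400.0, 1.0 / fs)
modes = [0.92, 1.61, 2.00, 2.23]
x = sum(np.sin(2 * np.pi * f * t) for f in modes) + 0.1 * rng.standard_normal(t.size)

freqs, psd = welch_psd(x, fs)
peaks = pick_peaks(freqs, psd, n_peaks=4)
print(peaks)
```

A 1-min window gives a frequency spacing of 1/60 ≈ 0.017 Hz, so picked peaks can differ from the true frequencies by up to about half that spacing.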
The SSI method [59] was used to identify the modal frequencies and mode shapes from the data collected in the third time interval (i.e., [1,600, 2,000] s), and the analysis results were compared with those observed from a previous ambient modal test using APDM Opal sensors. [65] Figure 16 compares results for the two bending modes in the vertical direction:
• The second modal frequency estimated from displacement data is 1.58 Hz, lower than the value (1.62 Hz) reported in Brownjohn et al. [65] This is attributed to more frequent pedestrian crossings on the test day.
• The third modal frequency (2.00 Hz) estimated from displacement data matches the value in Brownjohn et al. [65]
• For these two bending modes, the mode shape ordinates (red circular dots) at the points D1-4 predicted by the vision-based measurement match well with the mode shapes (blue curves) previously estimated in Brownjohn et al. [65]
The frequency responses of the measured signals in Figure 15 depend on time; studying this dependency requires time-frequency analysis rather than methods based on the Fourier transform (e.g., Welch's method), which are designed for the analysis of stationary signals. CWT analysis was therefore used to acquire the time-frequency distribution of the displacement and acceleration measurements. Figure 17a indicates the CWT results for the displacement measurement at the deck target D1 over the frequency range from 1.3 to 1.8 Hz, which covers the variations of the second modal frequency of the bridge deck (varying from 1.48 to 1.61 Hz in Figure 15). During the analysis, the two parameters of the complex Morlet wavelet (frequency bandwidth f_b and central frequency f_c) were tuned by minimising the Shannon wavelet entropy, reaching optimal values at f_b = 4.5 Hz and f_c = 29 Hz. To account for the edge effect, the influenced region was estimated to be of 40-s duration according to Yan and Miyamoto, [61] and the signal was padded by reflection at both ends to mitigate the edge effect.
A threshold (e.g., −0.5 in Figure 17a) was set for the wavelet transform modulus value during the plotting for a clear visualisation. The instantaneous frequencies were estimated by the modulus maxima at each time step and are shown as the sparse dots in the figure. The results in Figure 17a indicate an obvious variation of the second modal frequency during the recording period.
• In the first 500 s, the modal frequency varies little and stays above 1.60 Hz.
• During the time interval from 600 to 1,100 s, a sharp decrease of the modal frequency is observed, with the lowest value approximately 1.37 Hz, a reduction of 15%.
• The data after 1,500 s reflect a recovery of the frequency to approximately 1.58 Hz.
These observations match well with the analysis results for acceleration data (B1) shown in Figure 17b.
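The time-frequency analysis described above (complex Morlet CWT, reflection padding, instantaneous frequency from modulus maxima) can be sketched as follows. The wavelet form follows the common "cmor" convention with f_b = 4.5 and f_c = 29 treated as wavelet parameters; the frequency-domain evaluation, the 10 Hz sample rate, and the synthetic signal are assumptions, since the source does not say how the transform was computed.

```python
import numpy as np

def cwt_cmor(x, fs, freqs, fb=4.5, fc=29.0, pad_s=40.0):
    """CWT with a complex Morlet wavelet, evaluated in the frequency domain.

    Wavelet: psi(t) = (pi*fb)**-0.5 * exp(2j*pi*fc*t) * exp(-t**2 / fb),
    whose Fourier transform is exp(-pi**2 * fb * (nu - fc)**2)."""
    x = np.asarray(x, dtype=float)
    pad = int(round(pad_s * fs))
    xp = np.pad(x, pad, mode="reflect")              # reflect both ends (edge effect)
    X = np.fft.fft(xp)
    nu = np.fft.fftfreq(xp.size, d=1.0 / fs)
    coeffs = np.empty((len(freqs), x.size), dtype=complex)
    for i, f in enumerate(freqs):
        a = fc / f                                    # scale that centres the wavelet on f
        psi_hat = np.exp(-np.pi ** 2 * fb * (a * nu - fc) ** 2)
        w = np.sqrt(a) * np.fft.ifft(X * psi_hat)
        coeffs[i] = w[pad:pad + x.size]               # drop the padded samples
    return coeffs

def ridge(coeffs, freqs):
    """Instantaneous frequency taken as the modulus maximum at each time step."""
    return np.asarray(freqs)[np.argmax(np.abs(coeffs), axis=0)]

# Hypothetical signal: 1.60 Hz for the first 300 s, then 1.45 Hz
fs = 10.0
t = np.arange(0.0, 600.0, 1.0 / fs)
f_inst = np.where(t < 300.0, 1.60, 1.45)
x = np.sin(2.0 * np.pi * np.cumsum(f_inst) / fs)

freqs = np.arange(1.30, 1.801, 0.01)
inst = ridge(cwt_cmor(x, fs, freqs), freqs)
print(inst[1500], inst[4500])                         # ridge at t = 150 s and t = 450 s
```

With these parameters the wavelet is very narrow in frequency and correspondingly long in time (tens of seconds), which suits slowly drifting modal frequencies but smooths out rapid changes.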

| Measurement and analysis of cable vibration
This section presents the measurement results of cable vibration using the vision-based system. The measured data were used directly to estimate cable modal frequencies by peak-picking from power spectral densities and by using SSI. To evaluate the variations of cable modal frequency with changing pedestrian loads, CWT analysis was performed on the measured data to identify the time-frequency distribution of cable vibration, which was compared with the observations from acceleration measurement.
Two cable targets C1 and C2 shown in Figure 11a were tracked, with the time histories of cable motion shown in Figure 18a,c. The cable motion here corresponds to the motion of the cable projection in the image, in pixel units. The power spectral densities of the cable motions during the three time intervals (i.e., [0, 400], [800, 1,200], and [1,600, 2,000] s) are indicated in Figure 18b,d for the cable targets C1 and C2, respectively.
• The modal frequency of the cable C1 is approximately 1.66 Hz for the first and third time intervals and slightly increased for the second time window.
• For the cable C2, the modal frequency could be identified as approximately 2.10 Hz during the first and third time intervals, whereas the analysis result for the data from the second time interval indicates no obvious peak frequency, but rather a frequency range with higher energy near 2.2 Hz.
The SSI method was used to identify the modal frequencies from the displacement and acceleration data collected during the third time interval (i.e., [1,600, 2,000] s). Cable target C1 (in Figure 11a) and accelerometer A1 (in Figure 10) correspond to the same bridge cable (the longest one, on the south-west side), although the sensor locations were different: C1 was at approximately the ¼ span point close to the bridge tower, and A1 was lower down, close to the cable end where it is attached to the bridge deck. Similarly, C2 and A2 correspond to the same cable (the second longest).
The displacement signal at C1 indicates two close modal frequencies at 1.59 and 1.66 Hz, whereas the acceleration signal at A1 captures the first modal frequency at 1.63 Hz as well as the higher modal frequencies at 4.96, 6.62, and 8.27 Hz. Through comparison, the fundamental frequency of the longest cable (C1) was at approximately 1.66 Hz. The mode at 1.59 Hz identified from the displacement signal might correspond to the second bending mode of the bridge deck. The first modal frequency estimated from the acceleration data is different from the estimated fundamental frequency, which might be due to mixing of frequency responses between the cable fundamental mode and the second bending mode of bridge deck.
The displacement signal at C2 indicates modal frequencies at 2.10 and 4.16 Hz, whereas the acceleration data at A2 capture modal frequencies at 2.12, 6.30, 8.36, and 10.48 Hz. Therefore, the fundamental frequency of the second longest cable (C2) was at approximately 2.10 Hz.
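The comparison between the low-frequency estimates from the vision data and the higher modes from the accelerometers can be illustrated with a simple harmonic check. The sketch assumes taut-string behaviour, in which the n-th cable frequency is n times the fundamental (sag and bending stiffness are neglected); this assumption and the function name are introduced here for illustration only.

```python
def fundamental_from_harmonics(freqs_hz, f1_guess):
    """Assign each frequency its harmonic order n = round(f / f1_guess) and
    return the orders with the least-squares fundamental sum(n*f) / sum(n*n)."""
    orders = [max(1, round(f / f1_guess)) for f in freqs_hz]
    f1 = sum(n * f for n, f in zip(orders, freqs_hz)) / sum(n * n for n in orders)
    return orders, f1

# Higher modal frequencies identified from the accelerometers (values from the text)
orders_a1, f1_a1 = fundamental_from_harmonics([4.96, 6.62, 8.27], f1_guess=1.66)
orders_a2, f1_a2 = fundamental_from_harmonics([6.30, 8.36, 10.48], f1_guess=2.10)

print(orders_a1, round(f1_a1, 3))
print(orders_a2, round(f1_a2, 3))
```

Under this assumption, the higher modes at A1 and A2 imply fundamentals of about 1.65 and 2.09 Hz, close to the 1.66 and 2.10 Hz estimated from the vision-based data and further from the 1.59 Hz component attributed to the deck.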
The analysis indicates that the vision-based system works better to capture the lower modal frequencies of cables, whereas the accelerometers provide reliable estimations of higher frequency modes.
CWT analysis was performed to acquire the time-frequency distribution of cable vibrations from the vision-based system and accelerometers. The analysis results are shown in Figures 19 and 20 for the cables C1 and C2, respectively. For the accelerometer measurements, instead of directly plotting the results near the fundamental frequency of the cable, higher frequency ranges, that is, near the fifth modal frequency for the measurement at A1 and near the third modal frequency for the measurement at A2, are illustrated in Figures 19b and 20b, with the corresponding values near the fundamental frequency marked on the right y axis.

FIGURE 19
Contour plot of continuous wavelet transform analysis results of cable vibration for the longest cable in the south-west side of the bridge: (a) wavelet transform modulus for the cable motion (C1) measured by vision-based system at the frequency range of [1.4, 1.8] Hz with the estimated instantaneous frequencies marked as sparse dots and (b) wavelet transform modulus for the cable vibration measured by the accelerometer (A1) at the frequency range of [7,9] Hz with the estimated instantaneous frequencies marked as sparse dots

Figure 19a indicates the CWT analysis results for the measurement at the cable C1 by the vision-based system. The instantaneous frequencies estimated by the modulus maxima (shown in the figure as sparse dots) started at approximately 1.66 Hz, rose to over 1.70 Hz during the time interval from 900 to 1,160 s, and then recovered to 1.66 Hz after 1,500 s. Because the time interval from 900 to 1,160 s corresponds to the period in which the deck points D1 and D2 experienced a large deformation (see Figure 12), the observations indicate that heavy pedestrian loads on the bridge lead to a rise in cable modal frequency of 2.4%, probably by increasing the cable tension. Compared with the analysis result of the acceleration measurement shown in Figure 19b, the time-frequency distribution acquired by vision-based measurement captures the general trend of frequency shift under pedestrian loads over the whole 35 min.
However, some details of frequency variation within a short-time range are only identified by acceleration measurement, for example, a sharp decrease and recovery of cable modal frequency at approximately 1,280 s.
As well as the cable C1 modal frequency at approximately 1.66 Hz, another less obvious mode is indicated in Figure 19a with frequency value lower than the cable fundamental frequency. This mode is salient in the lighter loading condition, for example, (a) in the time interval from 200 to 900 s with the modal frequency decreasing from 1.62 to 1.46 Hz and (b) the time interval from 1,400 to 2,100 s with the modal frequency increasing from 1.53 to 1.60 Hz. The observed mode shows a similar trend as the variation of the second modal frequency of the bridge deck. Therefore, this mode might be due to forced vibration of the cable by motion of the bridge deck.
The CWT analysis results for the measurement by the vision-based system at the cable C2 are indicated in Figure 20a, with the estimated instantaneous frequencies shown as sparse dots. The cable modal frequency started at approximately 2.08 Hz when the bridge was occupied by only a few pedestrians. An obvious rise of modal frequency (exceeding 2.2 Hz) is observed during the time range from 900 to 1,260 s, when the bridge was under heavy pedestrian loads. Compared with the quiet period, the maximum shift of cable modal frequency reaches 9.1%, with the frequency reaching 2.27 Hz. In the period after 1,500 s, the modal frequency of cable vibration recovered to approximately 2.1 Hz. These observations match well with those from Figure 20b corresponding to the acceleration measurement (A2) of the same cable. However, the analysis results of the acceleration data show better resolution of the variations of cable modal frequency with time, especially during the heavy load period from 900 to 1,260 s.
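The quoted frequency shifts, and what they might imply for cable tension, can be checked with a short calculation. The tension estimate rests on a taut-string idealisation in which frequency is proportional to the square root of tension; that idealisation is an assumption made here, and the resulting tension changes are indicative only, not measured values.

```python
def relative_shift(f_ref, f_new):
    """Relative change of frequency with respect to the reference value."""
    return (f_new - f_ref) / f_ref

def implied_tension_change(f_ref, f_new):
    """Taut-string assumption: f is proportional to sqrt(T), so T scales with f**2."""
    return (f_new / f_ref) ** 2 - 1.0

# (quiet-period frequency, peak frequency under heavy load) in Hz, from the text
cases = {"C1": (1.66, 1.70), "C2": (2.08, 2.27)}
shifts = {name: relative_shift(f0, f1) for name, (f0, f1) in cases.items()}
tension = {name: implied_tension_change(f0, f1) for name, (f0, f1) in cases.items()}

for name in cases:
    print(f"{name}: frequency +{100 * shifts[name]:.1f}%, "
          f"implied tension +{100 * tension[name]:.1f}%")
```

The frequency shifts reproduce the 2.4% and 9.1% quoted above; for small shifts the implied tension change is roughly twice the frequency change.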

FIGURE 20
Contour plot of continuous wavelet transform analysis results of cable vibration for the second longest cable in the south-west side of the bridge: (a) wavelet transform modulus for the cable motion (C2) measured by vision-based system at the frequency range of [1.9, 2.3] Hz with the estimated instantaneous frequencies marked as sparse dots and (b) wavelet transform modulus for the cable vibration measured by the accelerometer (A2) at the frequency range of [7,9] Hz with the estimated instantaneous frequencies marked as sparse dots

| DISCUSSION OF MEASUREMENT ACCURACY OF VISION-BASED SYSTEM
In the field test, the multipoint displacement measurements by the vision-based system were validated to be viable for tracking both deformation induced by passing pedestrians and modal properties of the deck and cables under varying pedestrian loads. Based on this demonstration, the procedure would be viable for other (e.g., larger) bridges.
The measurement accuracy of a vision-based system is critical but hard to quantify, especially on site. In the laboratory validation test, the accuracy level was evaluated from measurements of a stationary structure, as well as by comparing measurements from the vision system with accelerometer data, both converted to velocity. However, the measurement accuracy might not be directly comparable with that obtained in other applications such as in the field.
The measurement accuracy of a vision-based system depends on several parameters, for example, camera-to-target distance, [26] estimation of camera intrinsic parameters, dimension information, [24] and dispersion of target tracking results in images. [67] Theoretically, displacement measurement using a vision-based system is derived from two parts: (a) the target tracking results and (b) the transformation between the real structure and its projection in the image.
• In terms of target tracking, the nominal algorithm resolution can be better than 0.01 pixel with an interpolation scheme, whereas the reported accuracy varies from 0.5 to 0.01 pixel. [67] In this study, the tracking accuracy was quantified to be 0.013 pixel in the laboratory condition, whereas the tracking accuracy in the field test was not evaluated. The ideal image size of the target for correlation-based template matching is suggested to be no less than 40 × 40 pixels [23] to ensure good performance. For field application, the choice of camera set-up location should balance measurement accuracy against the possibility of monitoring a large portion of the bridge.
• During the camera calibration in this case, the structural coordinates of the CPs were derived from the as-built drawings, which might not represent the current condition, for example, effects of self-weight deflection. Because the camera extrinsic matrix is calibrated by minimising the total reprojection error between the detected image points and the calculated image projection points based on least-squares optimisation, a better and more stable estimation might be made using more CPs.
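The role of control points and reprojection error in calibration can be illustrated with a simplified sketch. Note the substitution: the study estimates only the extrinsic matrix by least-squares minimisation of reprojection error, whereas the code below uses the direct linear transformation (DLT), which estimates the full 3×4 projection matrix by minimising an algebraic error. The camera, the 11 synthetic control points, and the function names are hypothetical.

```python
import numpy as np

def estimate_projection_dlt(xyz, uv):
    """Estimate a 3x4 projection matrix from >= 6 non-coplanar 3D-2D pairs (DLT)."""
    rows = []
    for (X, Y, Z), (u, v) in zip(xyz, uv):
        p = [X, Y, Z, 1.0]
        rows.append(p + [0.0] * 4 + [-u * c for c in p])
        rows.append([0.0] * 4 + p + [-v * c for c in p])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)          # right singular vector of the smallest singular value

def project_points(P, xyz):
    """Project an (n, 3) array of structural points to (n, 2) pixel coordinates."""
    h = np.hstack([xyz, np.ones((len(xyz), 1))]) @ P.T
    return h[:, :2] / h[:, 2:3]

def reprojection_rms(P, xyz, uv):
    """Root mean square distance (pixels) between projected and detected points."""
    return np.sqrt(np.mean(np.sum((project_points(P, xyz) - uv) ** 2, axis=1)))

# Hypothetical camera and 11 synthetic, non-coplanar control points (metres)
K = np.array([[1500.0, 0.0, 960.0], [0.0, 1500.0, 540.0], [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])                       # world Y up -> image y down
C = np.array([30.0, 12.0, 50.0])                     # camera centre
P_true = K @ np.hstack([R, (-R @ C).reshape(3, 1)])

rng = np.random.default_rng(2)
cps = np.column_stack([rng.uniform(10, 50, 11),
                       rng.uniform(0, 25, 11),
                       rng.uniform(-3, 3, 11)])
uv = project_points(P_true, cps)                     # noise-free detections

P_est = estimate_projection_dlt(cps, uv)
rms = reprojection_rms(P_est, cps, uv)
print(f"reprojection RMS = {rms:.2e} pixel")
```

With noise-free, non-coplanar points the reprojection error is essentially zero; with real detections and as-built coordinates it is not, and additional well-distributed control points give the least-squares fit more constraints.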
Other factors on site might influence the measurement accuracy and stability using a vision-based system. For example, Figure 12 shows that the displacement measurement at the deck point D1 did not recover to the initial condition; this might be due to the error caused by camera movement. During the recording, data loss was found due to partial obstruction of the target and pattern blur by raindrops. The measurement could also be influenced by atmospheric refraction and turbulence. [68] For a robust sensing system, the measurement accuracy and uncertainty are required for quality assurance and metrological traceability; thus, further study is necessary.

| CONCLUSIONS
A single-camera vision-based system used for non-contact measurement of bridge displacement provided results comparable with those obtained using an array of wireless accelerometers and offered additional information about quasi-static response to varying pedestrian loads.
In the laboratory validation test, the measurement accuracy of vision-based system was evaluated to be ±0.037 mm under the camera-to-target distance of 5.70 m, but it was not possible to test accuracy directly in the field application, only to compare with another measurement, in this case using the accelerometers.
The multipoint deformation data obtained using the vision system proved to be effective for tracking cable dynamic properties at the same time as bridge deformation, allowing for the effect of varying load on cable tensions to be observed. This provides a powerful diagnostic capability for larger cable-supported structures.