The goal of pedestrian trajectory prediction is to predict the future trajectory according to the historical one of pedestrians. Multimodal information in the historical trajectory is conducive to perception and positioning, especially visual information and position coordinates. However, most of the current algorithms ignore the significance of multimodal information in the historical trajectory. We describe pedestrian trajectory prediction as a multimodal problem, in which historical trajectory is divided into an image and coordinate information. Specifically, we apply fully connected long short-term memory (FC-LSTM) and convolutional LSTM (ConvLSTM) to receive and process location coordinates and visual information respectively, and then fuse the information by a multimodal fusion module. Then, the attention pyramid social interaction module is built based on information fusion, to reason complex spatial and social relations between target and neighbors adaptively. The proposed approach is validated on different experimental verification tasks on which it can get better performance in terms of accuracy than other counterparts. |
ACCESS THE FULL ARTICLE
No SPIE Account? Create one
CITATIONS
Cited by 1 scholarly publication.
Visualization
Information visualization
Computer programming
Video
Image fusion
Photonic integrated circuits
Performance modeling