STSI: Efficiently Mine Spatio-Temporal Semantic Information between Different Multimodal for Video Captioning

IEEE International Conference on Visual Communications and Image Processing (VCIP), 2022

Spatial-Semantic Attention For Grounded Image Captioning

International Conference on Image Processing (ICIP), 2022

What Happens in Crowd Scenes: A New Dataset about Crowd Scenes for Image Captioning

CrowdCaption is a new, challenging image captioning dataset for complex real-world crowd scene understanding; it aims to describe crowd scenes.

RefCrowd: Grounding the Target in Crowd with Referring Expressions

Proceedings of the 30th ACM International Conference on Multimedia (ACMMM), 2022

CrossDet: Crossline Representation for Object Detection

IEEE International Conference on Computer Vision (ICCV), 2021

Multi-stage Tag Guidance Network in Video Caption

Proceedings of the 28th ACM International Conference on Multimedia (ACMMM Workshop), 2020