2018
@article{singh2018predicting,
title = {Predicting action tubes},
author = {Gurkirt Singh and Suman Saha and Fabio Cuzzolin},
url = {http://openaccess.thecvf.com/content_ECCVW_2018/papers/11131/Singh_Predicting_Action_Tubes_ECCVW_2018_paper.pdf},
doi = {10.5281/zenodo.3362942},
year = {2018},
date = {2018-08-23},
abstract = {In this work, we present a method to predict an entire `action tube' (a set of temporally linked bounding boxes) in a trimmed video by observing only a smaller subset of it. Predicting where an action is going to take place in the near future is essential to many computer vision-based applications, such as autonomous driving or surgical robotics. Importantly, it has to be done in real time and in an online fashion. We propose a Tube Prediction network (TPnet) which jointly predicts the past, present and future bounding boxes along with their action classification scores. At test time TPnet is used in a (temporal) sliding-window setting, and its predictions are fed into a tube estimation framework to construct/predict video-long action tubes, not only for the observed part of the video but also for the unobserved part. Additionally, the proposed action tube predictor helps in completing action tubes for unobserved segments of the video. We quantitatively demonstrate the latter ability, and the fact that TPnet improves state-of-the-art detection performance, on one of the standard action detection benchmarks, the J-HMDB-21 dataset.},
note = {Proceedings of the ECCV 2018 Workshop on Anticipating Human Behaviour (AHB 2018), Munich, Germany, Sep 2018},
keywords = {Artificial Intelligence, Computer Science, Computer vision, Object recognition, Pattern Recognition, Robot, Robotics},
pubstate = {published},
tppubtype = {article}
}
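The abstract above turns on two steps that are easy to misread: how per-frame detections are linked into a tube, and how an observed tube is continued into unobserved frames. Below is a minimal sketch of both in Python/NumPy. It is not the authors' TPnet: the greedy IoU-plus-score linking rule and the constant-velocity extrapolation are illustrative stand-ins for the learned future-box regression the paper describes.

```python
# Hedged sketch: greedy tube linking plus naive future extrapolation.
# Not TPnet; box format [x1, y1, x2, y2] and the linking rule are assumptions.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def link_tube(frame_dets):
    """frame_dets: list over frames of (boxes [n,4], scores [n]) pairs.
    Returns one tube [T,4]: per frame, the box maximising its own score
    plus IoU with the previous tube box (greedy, online, single pass)."""
    tube = []
    for boxes, scores in frame_dets:
        if not tube:
            tube.append(boxes[int(np.argmax(scores))])
            continue
        link = [scores[i] + iou(tube[-1], boxes[i]) for i in range(len(boxes))]
        tube.append(boxes[int(np.argmax(link))])
    return np.stack(tube)

def extrapolate(tube, n_future):
    """Continue the tube into n_future unobserved frames by repeating the
    mean per-frame displacement (assumes at least two observed frames);
    a crude stand-in for TPnet's learned future-box prediction."""
    velocity = np.diff(tube, axis=0).mean(axis=0)
    future = [tube[-1] + velocity * (k + 1) for k in range(n_future)]
    return np.concatenate([tube, np.stack(future)], axis=0)
```

Feeding `link_tube` per-frame (boxes, scores) pairs from any detector yields one tube over the observed frames, which `extrapolate` then extends into the unobserved part.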
@proceedings{singh2018tramnet,
title = {TraMNet - Transition Matrix Network for Efficient Action Tube Proposals},
author = {Gurkirt Singh and Suman Saha and Fabio Cuzzolin},
url = {https://arxiv.org/abs/1808.00297},
year = {2018},
date = {2018-08-01},
abstract = {Current state-of-the-art methods solve spatio-temporal action localisation by extending 2D anchors to 3D-cuboid proposals on stacks of frames, to generate sets of temporally connected bounding boxes called action micro-tubes. However, they fail to consider that the underlying anchor proposal hypotheses should also move (transition) from frame to frame, as the actor or the camera do. Assuming we evaluate n 2D anchors in each frame, the number of possible transitions from each 2D anchor to the next, for a sequence of f consecutive frames, is of the order O(n^f), expensive even for small values of f. To avoid this problem we introduce a Transition-Matrix-based Network (TraMNet) which relies on computing transition probabilities between anchor proposals while maximising their overlap with ground-truth bounding boxes across frames, and enforcing sparsity via a transition threshold. As the resulting transition matrix is sparse and stochastic, this reduces the proposal hypothesis search space from O(n^f) to the cardinality of the thresholded matrix. At training time, transitions are specific to cell locations of the feature maps, so that a sparse (efficient) transition matrix is used to train the network. At test time, a denser transition matrix can be obtained either by decreasing the threshold or by adding to it all the relative transitions originating from any cell location, allowing the network to handle transitions in the test data that might not have been present in the training data, and making detection translation-invariant. Finally, we show that our network is able to handle sparse annotations such as those available in the DALY dataset, while allowing for both dense (accurate) and sparse (efficient) evaluation within a single model. We report extensive experiments on the DALY, UCF101-24 and Transformed-UCF101-24 datasets to support our claims.},
keywords = {Computer Science, Computer vision, Electrical Engineering, Image processing, Pattern Recognition, Robot, Robotics, Systems Science, Visual processing},
pubstate = {published},
tppubtype = {proceedings}
}
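The core construction in the abstract above, a sparse stochastic transition matrix built by matching anchors to ground-truth boxes in consecutive frames and then thresholded, can be summarised in a few lines. The sketch below assumes a flat list of anchor boxes and ground-truth tubes given as [T,4] arrays; the threshold value and the anchor layout are illustrative assumptions, not TraMNet's exact feature-map-cell construction.

```python
# Hedged sketch: count anchor-to-anchor transitions along ground-truth
# tubes, normalise to a row-stochastic matrix, and threshold for sparsity.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    ua = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (ua + 1e-9)

def best_anchor(anchors, gt_box):
    """Index of the anchor with highest IoU against a ground-truth box."""
    return int(np.argmax([iou(a, gt_box) for a in anchors]))

def transition_matrix(anchors, gt_tracks, threshold=0.05):
    """anchors: [n,4] fixed anchor boxes; gt_tracks: list of ground-truth
    tubes, each a [T,4] array. Returns a sparse [n,n] matrix of anchor
    transition probabilities between consecutive frames; entries below
    the threshold are zeroed, shrinking the O(n^f) hypothesis space to
    the cardinality of the thresholded matrix."""
    n = len(anchors)
    counts = np.zeros((n, n))
    for track in gt_tracks:
        for t in range(len(track) - 1):
            i = best_anchor(anchors, track[t])
            j = best_anchor(anchors, track[t + 1])
            counts[i, j] += 1
    probs = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    probs[probs < threshold] = 0.0  # enforce sparsity, as in the abstract
    return probs
```

Lowering the threshold at test time recovers a denser matrix, mirroring the dense-versus-sparse evaluation trade-off the abstract describes.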
@proceedings{behl2018incremental,
title = {Incremental Tube Construction for Human Action Detection},
author = {Harkirat Singh Behl and Michael Sapienza and Gurkirt Singh and Suman Saha and Fabio Cuzzolin and Philip H. S. Torr},
booktitle = {British Machine Vision Conference (BMVC), Newcastle-upon-Tyne, UK},
url = {https://arxiv.org/abs/1704.01358},
year = {2018},
date = {2018-07-23},
abstract = {Current state-of-the-art action detection systems are tailored for offline batch-processing applications. However, for online applications like human-robot interaction, current systems fall short, either because they only detect one action per video, or because they assume that the entire video is available ahead of time. In this work, we introduce a real-time and online joint-labelling and association algorithm for action detection that can incrementally construct space-time action tubes on the most challenging action videos in which different action categories occur concurrently. In contrast to previous methods, we solve the detection-window association and action labelling problems jointly in a single pass. We demonstrate superior online association accuracy and speed (2.2ms per frame) as compared to the current state-of-the-art offline systems. We further demonstrate that the entire action detection pipeline can easily be made to work effectively in real-time using our action tube construction algorithm.},
keywords = {Action detection, Artificial Intelligence, Computer Science, Computer vision, Detection, Pattern Recognition, Robot},
pubstate = {published},
tppubtype = {proceedings}
}
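The single-pass, online association the abstract above describes can be illustrated with a greedy per-frame step: each new frame's detections are matched to existing tubes by IoU with the tube's last box, respecting class labels, and unmatched detections start new tubes. This is a hedged sketch, not the paper's joint labelling-and-association algorithm; the IoU threshold and the greedy per-class matching are assumptions.

```python
# Hedged sketch: incremental, online tube construction, one frame at a time.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    ua = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (ua + 1e-9)

def step(tubes, boxes, labels, iou_thresh=0.3):
    """tubes: list of dicts {'boxes': [box, ...], 'label': int}.
    boxes: [n,4] detections for the new frame; labels: [n] class ids.
    Greedily extends each tube with its best same-class detection (one
    pass, no lookahead) and opens new tubes for unmatched detections."""
    unmatched = set(range(len(boxes)))
    for tube in tubes:
        cand = [i for i in unmatched if labels[i] == tube['label']]
        if not cand:
            continue
        best = max(cand, key=lambda i: iou(tube['boxes'][-1], boxes[i]))
        if iou(tube['boxes'][-1], boxes[best]) >= iou_thresh:
            tube['boxes'].append(boxes[best])
            unmatched.discard(best)
    for i in unmatched:  # detections with no matching tube start new ones
        tubes.append({'boxes': [boxes[i]], 'label': int(labels[i])})
    return tubes
```

Calling `step` once per incoming frame keeps tube construction strictly online, the property the abstract contrasts against offline batch-processing systems.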