Singh, Gurkirt; Saha, Suman; Cuzzolin, Fabio. TraMNet - Transition Matrix Network for Efficient Action Tube Proposals. Proceedings, 2018.
title = {TraMNet - Transition Matrix Network for Efficient Action Tube Proposals},
author = {Gurkirt Singh and Suman Saha and Fabio Cuzzolin},
url = {https://arxiv.org/abs/1808.00297},
year = {2018},
date = {2018-08-01},
abstract = {Current state-of-the-art methods solve spatio-temporal action localisation by extending 2D anchors to 3D-cuboid proposals on stacks of frames, to generate sets of temporally connected bounding boxes called action micro-tubes. However, they fail to consider that the underlying anchor proposal hypotheses should also move (transition) from frame to frame, as the actor or the camera does. Assuming we evaluate $n$ 2D anchors in each frame, then the number of possible transitions from each 2D anchor to the next, for a sequence of $f$ consecutive frames, is in the order of $O(n^f)$, expensive even for small values of $f$. To avoid this problem, we introduce a Transition-Matrix-based Network (TraMNet) which relies on computing transition probabilities between anchor proposals while maximising their overlap with ground truth bounding boxes across frames, and enforcing sparsity via a transition threshold. As the resulting transition matrix is sparse and stochastic, this reduces the proposal hypothesis search space from $O(n^f)$ to the cardinality of the thresholded matrix. At training time, transitions are specific to cell locations of the feature maps, so that a sparse (efficient) transition matrix is used to train the network. At test time, a denser transition matrix can be obtained either by decreasing the threshold or by adding to it all the relative transitions originating from any cell location, allowing the network to handle transitions in the test data that might not have been present in the training data, and making detection translation-invariant. Finally, we show that our network is able to handle sparse annotations such as those available in the DALY dataset, while allowing for both dense (accurate) and sparse (efficient) evaluation within a single model. We report extensive experiments on the DALY, UCF101-24 and Transformed-UCF101-24 datasets to support our claims.},
keywords = {Computer Science, Computer vision, Electrical Engineering, Image processing, Pattern Recognition, Robot, Robotics, Systems Science, Visual processing},
pubstate = {published},
tppubtype = {proceedings}
}