SARAS-ESAD endoscopic vision challenge for surgeon action detection – July 09, 2020

The challenge provides a unique benchmark dataset for developing and testing action detection algorithms in medical computer vision.

Challenge description

Minimally Invasive Surgery (MIS) is a very sensitive medical procedure whose success depends on the competence of the human surgeons and on how effectively they coordinate. The SARAS (Smart Autonomous Robotic Assistant Surgeon) EU consortium is working towards replacing the assistant surgeon in MIS with two assistive robotic arms. This requires an artificial intelligence-based system that can not only understand the complete surgical scene but also detect the actions being performed by the main surgeon. This information can later be used to infer the response required from the autonomous assistant surgeon. Correctly detecting and localizing surgeon actions is critical for planning the trajectories of the robotic arms.

For this challenge, four sessions of complete prostatectomy procedures, performed by expert surgeons on real patients with prostate cancer, were recorded. Expert AI and medical professionals then annotated these complete surgical procedures with action labels. Multiple action instances might be present at any point during the procedure (e.g., the right arm and the left arm of the da Vinci robot operated by the main surgeon might perform different coordinated actions). Hence, each frame is labeled with multiple actions, and these actions can have overlapping bounding boxes.

The bounding boxes in the training data are selected to cover both the ‘tool performing the action’ and the ‘organ under the operation’. A set of 21 actions was selected for the challenge after consultation with expert medical professionals. From a technical point of view, a suitable online surgeon action detection system must be able to: (1) locate and classify multiple action instances in real time; (2) connect the bounding boxes associated with the detections.

To the best of our knowledge, this challenge presents the first benchmark dataset for action detection in the surgical domain, and paves the way for the introduction of partial or full autonomy in surgical robotics. Within computer vision, other datasets for action detection exist, but are of limited size.


The objective of this challenge is to provide a unique benchmark dataset for developing and testing action detection algorithms in medical computer vision. The challenge will help in evaluating different types of computer vision systems on this specific task. It will also lay the foundation for more robust algorithms to be used in future surgical systems for tasks such as autonomous surgical assistance, surgeon feedback, and surgical anomaly detection.


The task for this challenge is to detect the actions performed by the main surgeon or the assistant surgeon in the current frame. There are 21 action classes in the challenge dataset.

Evaluation Metrics: The task will use mean Average Precision (mAP) as the evaluation metric, which is the standard metric for detection tasks. As this is the first task of its kind and correct detection of actions in the surgical environment is difficult, we will use a slightly relaxed metric for the evaluation. The evaluation will be performed at three different Intersection over Union (IoU) thresholds: 0.1, 0.3 and 0.5. The final score will be the mean of all the Average Precision values.
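As a rough illustration of how such a score could be computed, here is a minimal Python sketch. It is not the official evaluation code. The box format (x1, y1, x2, y2), the function names, the greedy matching of detections to ground truth, and the all-point interpolated AP (in the style of Pascal VOC) are all assumptions made for this example; the only details taken from the description above are the three IoU thresholds and the final averaging.

```python
from collections import defaultdict

# The three IoU thresholds named in the challenge description.
IOU_THRESHOLDS = (0.1, 0.3, 0.5)


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold):
    """AP for a single action class (assumed protocol, see lead-in).

    detections:   list of (frame_id, score, box)
    ground_truth: dict mapping frame_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = defaultdict(set)  # ground-truth indices already used, per frame
    tp_flags = []
    # Visit detections from most to least confident.
    for frame_id, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth.get(frame_id, [])):
            if idx in matched[frame_id]:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx is not None and best_iou >= iou_threshold
        if is_tp:
            matched[frame_id].add(best_idx)
        tp_flags.append(is_tp)
    # Precision and recall after each ranked detection.
    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    # Make precision non-increasing, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def challenge_score(dets_by_class, gt_by_class):
    """Mean of the AP values over all classes and all three IoU thresholds."""
    aps = [
        average_precision(dets_by_class.get(cls, []), gt_by_class[cls], thr)
        for thr in IOU_THRESHOLDS
        for cls in gt_by_class
    ]
    return sum(aps) / len(aps)
```

With the low thresholds of 0.1 and 0.3, a detection that only loosely overlaps the annotated box can still count as correct, which is what makes the metric more forgiving than the usual 0.5-only setting.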

This challenge is part of the Medical Imaging with Deep Learning (MIDL, 2020) conference.
