Surgical Tracking Based on Stereo Vision and Depth Sensing

Project Goals:

The objective of this research is to integrate multiple sensors covering a broad spectrum, including stereo infrared (IR) cameras, color (RGB) cameras, and depth sensors, to perceive the surgical environment. Features extracted from each modality contribute to the understanding of complex surgical environments and procedures, and combining them can provide higher robustness and accuracy than any single sensing modality. As a preliminary study, we propose a multi-sensor fusion approach for localizing surgical instruments, and we developed an integrated dual-Kinect tracking system to validate the proposed hierarchical tracking approach.


This project considers the problem of improving surgical instrument tracking accuracy through multi-sensor fusion techniques from computer vision. We propose a hierarchical fusion algorithm that integrates the tracking results from the depth sensor, the IR camera pair, and the RGB camera pair. Fig. 1 summarizes the algorithm, which is divided into “low-level” and “high-level” fusion.

Fig. 1 Block diagram of hierarchical fusion algorithm.

The low-level fusion improves the speed and robustness of marker feature extraction before the tool tip position is triangulated in the IR and RGB camera pairs. The IR and RGB cameras are modeled as pin-hole cameras. The depth data of the tool serve as a prior for marker detection: the working area of the tracked tool is assumed to be confined to a reasonable volume v(x, y, z), which is used to narrow the search area for feature extraction and thereby reduce the computational cost for real-time applications.
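One possible reading of the search-area refinement is sketched below: the eight corners of the working volume are projected through the pin-hole model, and their bounding box becomes the region of interest for marker detection. The intrinsics (fx, fy, cx, cy), the axis-aligned box expressed in the camera frame, and the absence of lens distortion are all assumptions made for illustration, not values from the project.

```python
from itertools import product


def project_point(point, fx, fy, cx, cy):
    """Pin-hole projection of a 3D point (camera frame, z > 0) to pixels."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return fx * x / z + cx, fy * y / z + cy


def volume_to_roi(vmin, vmax, fx, fy, cx, cy, width, height):
    """Pixel bounding box (u0, v0, u1, v1) of an axis-aligned 3D volume."""
    corners = product(*zip(vmin, vmax))  # the 8 corners of the box
    pixels = [project_point(c, fx, fy, cx, cy) for c in corners]
    us = [p[0] for p in pixels]
    vs = [p[1] for p in pixels]
    # Clamp to the image so the ROI can be used directly for slicing.
    u0 = max(0, int(min(us)))
    v0 = max(0, int(min(vs)))
    u1 = min(width, int(max(us)) + 1)
    v1 = min(height, int(max(vs)) + 1)
    return u0, v0, u1, v1


# Hypothetical intrinsics for a 640x480 camera and a 20 cm cube about 1 m away.
roi = volume_to_roi((-0.1, -0.1, 0.9), (0.1, 0.1, 1.1),
                    fx=580.0, fy=580.0, cx=320.0, cy=240.0,
                    width=640, height=480)
```

With these placeholder numbers the detector would scan a 130 x 130 pixel window instead of the full 640 x 480 frame, which is where the saving in computation comes from.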
The high-level fusion produces a more accurate tracking result by fusing the two measurements, one from the IR camera pair and one from the RGB camera pair. We employ the covariance intersection (CI) algorithm to estimate a new tracking result with a smaller covariance.
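A minimal sketch of covariance intersection follows, using the standard CI equations P^-1 = w P1^-1 + (1 - w) P2^-1 and x = P (w P1^-1 x1 + (1 - w) P2^-1 x2). Choosing the weight w by a grid search that minimizes the trace of P is an assumption of this sketch, as are the example estimates and covariances; the project's own weight selection is not described here. Only the standard library is used so the example runs without dependencies.

```python
def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [row[n:] for row in a]


def mat_vec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def covariance_intersection(x1, p1, x2, p2, steps=100):
    """Fuse two estimates whose cross-correlation is unknown.

    The weight omega in [0, 1] is picked by a grid search that minimizes
    the trace of the fused covariance (an assumption of this sketch).
    """
    i1, i2 = inverse(p1), inverse(p2)
    n = len(x1)
    best = None
    for k in range(steps + 1):
        w = k / steps
        info = [[w * i1[r][c] + (1 - w) * i2[r][c] for c in range(n)]
                for r in range(n)]
        p = inverse(info)
        trace = sum(p[d][d] for d in range(n))
        if best is None or trace < best[0]:
            best = (trace, w, p)
    _, w, p = best
    y = [w * a + (1 - w) * b for a, b in zip(mat_vec(i1, x1), mat_vec(i2, x2))]
    return mat_vec(p, y), p, w


# Hypothetical tool tip estimates (mm) from the IR and RGB trackers.
x_ir, p_ir = [100.2, 50.1, 800.5], [[0.04, 0, 0], [0, 0.04, 0], [0, 0, 0.50]]
x_rgb, p_rgb = [100.0, 49.9, 799.8], [[0.30, 0, 0], [0, 0.30, 0], [0, 0, 0.10]]
x_fused, p_fused, omega = covariance_intersection(x_ir, p_ir, x_rgb, p_rgb)
```

Because the grid includes w = 0 and w = 1, the fused trace can never exceed that of the better single tracker, and CI stays consistent without knowing how the two trackers' errors are correlated.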


To demonstrate the proposed algorithm, we designed a hybrid marker-based tracking tool (Fig. 2) that combines a cross-based feature in the visible modality with a retro-reflective marker feature in the infrared modality, yielding a fused track of the customized tool tip. To evaluate the proposed method, we built an experimental setup from two Kinects; Fig. 3 shows the prototype of the multi-sensor fusion tracker. The experiments indicate that the CI-based fusion approach tends to outperform the separate IR tracker and RGB tracker: both the mean error and the deviation improve with fusion.
Fig. 2 Hybrid marker

Fig. 3 Dual Kinect tracking system

People Involved

Staff: Wei LIU, Shuang SONG, Andy Lim
Advisor: Dr. Hongliang Ren
Collaborator: Wei ZHANG


[1] H. Ren, W. Liu, and A. Lim, “Marker-Based Instrument Tracking Using Dual Kinect Sensors for Navigated Surgery,” IEEE Transactions on Automation Science and Engineering, 2013.
[2] W. Liu, H. Ren, W. Zhang, and S. Song, “Cognitive Tracking of Surgical Instruments Based on Stereo Vision and Depth Sensing,” in IEEE International Conference on Robotics and Biomimetics (ROBIO), 2013.

Related FYP Project

Andy Lim: Marker-Based Surgical Tracking With Multiple Modalities Using Microsoft Kinect



FYP: Surgical Tracking With Multiple Microsoft Kinects

FYP Project Goals

The aim of this project is to track surgical instruments using Kinect sensors. With recent advances in computing and imaging technologies, visual limitations during surgery, such as poor depth perception and a limited field of view, can be overcome with computer-assisted systems. 3D models of the patient’s anatomy (obtained during pre-operative planning via computed tomography or magnetic resonance imaging scans) can be combined with intraoperative information such as the 3D position and orientation of surgical instruments. Such a computer-assisted system can reduce surgical mistakes and help identify unnecessary or imperfect surgical movements, which may increase the success rate of surgeries.
For computer-assisted systems to work, accurate spatial information about the surgical instruments is required. Most surgical tools can move with 6 degrees of freedom (6DoF): translation along the x, y, and z axes and rotation about each of these axes. The introduction of the Microsoft Kinect sensor raises the possibility of an alternative optical tracking system for surgical instruments.
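To make the 6DoF description concrete, the sketch below turns three translations and three rotations into a tool tip position. The Z-Y-X (yaw, pitch, roll) Euler convention, the function names, and the tip offset are assumptions chosen for illustration; the project does not state which rotation parameterization it uses.

```python
import math


def rotation_zyx(roll, pitch, yaw):
    """Rotation matrix R = Rz(yaw) Ry(pitch) Rx(roll), angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def tool_tip_position(translation, roll, pitch, yaw, tip_offset):
    """Map the tip offset from the tool frame into the tracker frame."""
    rot = rotation_zyx(roll, pitch, yaw)
    return [sum(r * o for r, o in zip(row, tip_offset)) + t
            for row, t in zip(rot, translation)]


# Hypothetical pose: tool at (1, 2, 3) m, rotated 90 degrees about z,
# with the tip 0.1 m along the tool's own x axis.
tip = tool_tip_position((1.0, 2.0, 3.0), 0.0, 0.0, math.pi / 2, (0.1, 0.0, 0.0))
```

In this example the 90 degree yaw swings the tip offset from the tool's x axis onto the tracker's y axis, so the tip lands at approximately (1, 2.1, 3).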
The objective of this project is to develop an optical tracking system for surgical instruments that utilises the capabilities of the Kinect sensor. This part of the project focuses on marker-based tracking.


  • The setup for tracking surgical instruments consists of two Kinects placed side by side with overlapping fields of view.
  • The calibration board is used to find the intrinsic camera parameters as well as the relative position of the cameras. This allows us to calculate the fundamental matrix, which is essential for the epipolar geometry calculations used in 3D point reconstruction. The board is shown (a) without external LED illumination and (b) with LED illumination. The same board is used for RGB camera calibration.
  • Seeded region growing segments the retro-reflective markers from the duller background. The algorithm is implemented with OpenCV.
  • Corner detection: the cornerSubPix function from OpenCV refines the corner positions to sub-pixel accuracy.
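The idea behind seeded region growing can be sketched in plain Python as below. This is an illustration of the technique on a toy image, not the project's OpenCV implementation; the 4-connectivity, the fixed intensity tolerance, and the choice of the brightest pixel as the seed are all assumptions.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Pixels 4-connected to `seed` with intensity within `tolerance` of it."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def centroid(region):
    """Mean (row, col) of the pixels in a region."""
    n = len(region)
    return (sum(r for r, _ in region) / n, sum(c for _, c in region) / n)


# Toy 5x6 IR frame: a bright 2x2 marker blob on a dull background.
frame = [
    [10, 12, 11, 10, 9, 10],
    [11, 250, 248, 12, 10, 11],
    [10, 252, 251, 11, 12, 10],
    [9, 11, 10, 12, 11, 9],
    [10, 10, 11, 10, 10, 10],
]
seed = max(((r, c) for r in range(5) for c in range(6)),
           key=lambda rc: frame[rc[0]][rc[1]])  # brightest pixel as seed
marker = region_grow(frame, seed, tolerance=10)
marker_center = centroid(marker)
```

The region's centroid gives a marker position in each image, and matched positions in the two views are what the 3D point reconstruction consumes.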

Current Results

  • The RMS error ranges from 0.37 to 0.68 mm for IRR tracking and from 0.18 to 0.37 mm for checkerboard tracking over a range of 1.2 m. Checkerboard tracking is the more accurate of the two. The error increases with distance from the camera.
  • The jitter of the checkerboard tracking system ranges from 0.071 mm to 0.29 mm over the range of 1.2 m.
  • Jitter plotted against distance from the left camera: (dots) measurements; (line) a second-order polynomial fit used to analyze how jitter varies with depth.
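The two computations behind these results, an RMS error and a second-order polynomial fit of jitter against depth, can be sketched as follows. The depths and jitter values are synthetic placeholders generated from a known quadratic, not the project's measurements, and the least-squares fit via the normal equations is one reasonable choice rather than the method the project necessarily used.

```python
import math


def rms(errors):
    """Root-mean-square of a sequence of scalar errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(zs, ys):
    """Least-squares fit y = c0 + c1*z + c2*z**2 via the normal equations."""
    powers = [sum(z ** k for z in zs) for k in range(5)]
    a = [[powers[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * z ** i for z, y in zip(zs, ys)) for i in range(3)]
    return solve(a, b)


# Synthetic placeholder data (not measurements): depths in metres spanning
# 1.2 m, with jitter values generated from a known quadratic.
depths = [0.6 + 0.2 * i for i in range(7)]
jitter = [0.05 + 0.02 * z + 0.06 * z * z for z in depths]
c0, c1, c2 = fit_quadratic(depths, jitter)
residual_rms = rms([y - (c0 + c1 * z + c2 * z * z)
                    for z, y in zip(depths, jitter)])
```

On real measurements the residual RMS would indicate how well a quadratic describes the growth of jitter with depth; here it is close to zero only because the data were generated from a quadratic.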

People Involved

FYP Student: Andy Lim Yong Mong
Research Engineer: Liu Wei
Advisor: Dr. Ren Hongliang

Related Project

Surgical Tracking Based on Stereo Vision and Depth Sensing

