DisguisOR: Holistic Face Anonymization in the Operating Room

Abstract

Purpose: Recent advances in Surgical Data Science (SDS) have contributed to an increase in video recordings from hospital environments. While methods such as surgical workflow recognition show potential in increasing the quality of patient care, the quantity of video data has surpassed the scale at which images can be manually anonymized. Existing automated 2D anonymization methods underperform in Operating Rooms (OR), due to occlusions and obstructions. We propose to anonymize multi-view OR recordings using 3D data from multiple camera streams.

Methods: RGB and depth images from multiple cameras are fused into a 3D point cloud representation of the scene. We then detect each individual's face in 3D by regressing a parametric human mesh model onto detected 3D human keypoints and aligning the face mesh with the fused 3D point cloud. The mesh model is rendered into every acquired camera view, replacing each individual's face.

Results: Our method shows promise in locating faces at a higher rate than existing approaches. DisguisOR produces geometrically consistent anonymizations for each camera view, enabling more realistic anonymization that is less detrimental to downstream tasks.

Conclusion: Frequent obstructions and crowding in operating rooms leave significant room for improvement for off-the-shelf anonymization methods. DisguisOR addresses privacy on a scene level and has the potential to facilitate further research in SDS.

Pipeline

Input: RGB images, depth maps, and a fused 3D point cloud of the scene.
3D Human Pose Key-Point Estimation: Detect and fuse human pose keypoints in 3D for each person.
Human Mesh Fitting: Fit a parametric 3D human mesh for each person by regressing it onto that person's detected 3D keypoints in a global coordinate frame.
Head Alignment: Refine mesh positioning by aligning the face mesh with the fused 3D point cloud.
Face Extraction: Extract the face from the refined mesh.
Output: Texture the 3D mesh and back-project it into each camera view, replacing each individual's face with the rendered mesh model.
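
The geometric core of the first and last pipeline steps can be illustrated with a minimal sketch, assuming a standard pinhole camera model with known intrinsics and camera-to-world extrinsics. The function names, the parameters (fx, fy, cx, cy) and the example calibration values are hypothetical and are not taken from the DisguisOR implementation. The sketch lifts a depth pixel from one camera into the shared world frame, as needed for point cloud fusion, and projects a world point into a second camera, as needed to render the mesh into each view.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix with a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with metric depth into camera coordinates."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def camera_to_world(p_cam: Vec3, rotation: Mat3, translation: Vec3) -> Vec3:
    """Apply camera-to-world extrinsics: X_world = R * X_cam + t."""
    r = mat_vec(rotation, p_cam)
    return tuple(r[i] + translation[i] for i in range(3))


def world_to_camera(p_world: Vec3, rotation: Mat3, translation: Vec3) -> Vec3:
    """Invert the extrinsics: X_cam = R^T * (X_world - t)."""
    d = tuple(p_world[i] - translation[i] for i in range(3))
    r_t = [[rotation[c][r] for c in range(3)] for r in range(3)]
    return mat_vec(r_t, d)


def project(p_cam: Vec3, fx: float, fy: float,
            cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a camera-space point to pixel coordinates.

    Returns None for points behind the camera, which cannot be rendered.
    """
    x, y, z = p_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical calibration: two cameras with identical intrinsics,
# camera B shifted one metre along the world x-axis.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
FX = FY = 500.0
CX, CY = 320.0, 240.0

# A depth pixel at the image centre of camera A, two metres away.
point_world = camera_to_world(
    unproject(320, 240, 2.0, FX, FY, CX, CY), IDENTITY, (0.0, 0.0, 0.0))

# The same 3D point as seen by camera B.
pixel_b = project(
    world_to_camera(point_world, IDENTITY, (1.0, 0.0, 0.0)), FX, FY, CX, CY)
print(point_world)  # (0.0, 0.0, 2.0)
print(pixel_b)      # (70.0, 240.0)
```

Because every view is rendered from the same 3D geometry, the replaced face lands at geometrically consistent pixel locations in all cameras. A real renderer would additionally need a depth test against the scene so that the texture is not drawn over occluding objects.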

Holistic Recall

Holistic recall is a metric that counts a face as detected only if it is detected in every camera view in which it is at least partially visible. This matters in multi-view setups because a face missed in a single view can still reveal a person's identity. Consistent and complete anonymization across all camera angles therefore improves both privacy and data utility.
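
The difference from ordinary per-view recall can be made concrete with a small sketch. It assumes that ground truth and detections are given as (person, camera) pairs for a single frame; this data layout and the function names are illustrative assumptions, and the paper's exact evaluation protocol (for example, how a detection is matched to a face) may differ.

```python
from typing import Dict, Iterable, Set, Tuple

FaceView = Tuple[str, str]  # (person_id, camera_id), hypothetical layout


def per_view_recall(visible: Iterable[FaceView],
                    detected: Iterable[FaceView]) -> float:
    """Fraction of visible (person, camera) face instances that were detected."""
    visible_set = set(visible)
    if not visible_set:
        return 0.0
    return len(visible_set & set(detected)) / len(visible_set)


def holistic_recall(visible: Iterable[FaceView],
                    detected: Iterable[FaceView]) -> float:
    """Fraction of persons whose face was detected in every view showing it."""
    detected_set = set(detected)
    views_per_person: Dict[str, Set[str]] = {}
    for person, camera in visible:
        views_per_person.setdefault(person, set()).add(camera)
    if not views_per_person:
        return 0.0
    hits = sum(
        all((person, camera) in detected_set for camera in cameras)
        for person, cameras in views_per_person.items()
    )
    return hits / len(views_per_person)


# Person A is visible in three views but missed in cam3;
# person B is visible in two views and detected in both.
visible = [("A", "cam1"), ("A", "cam2"), ("A", "cam3"),
           ("B", "cam1"), ("B", "cam2")]
detected = [("A", "cam1"), ("A", "cam2"),
            ("B", "cam1"), ("B", "cam2")]

print(per_view_recall(visible, detected))  # 0.8
print(holistic_recall(visible, detected))  # 0.5
```

In this example a single missed view lowers per-view recall only slightly, while holistic recall treats person A as not anonymized at all, which reflects the privacy risk more directly.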

Results

Comparison of DisguisOR with the GAN-based method DeepPrivacy. DeepPrivacy struggles to generate a face appropriate to the environment, often replacing a masked face with an unmasked one, whereas our method fuses the mesh texture seamlessly into the image. Even at unusual angles, DisguisOR produces plausible, uncorrupted faces. However, the rendering technique sometimes draws the texture on top of objects that occlude the face (third image). When the person is wearing a skull cap, GAN-based methods fail to generate realistic faces, whereas DisguisOR still blends in the replaced face seamlessly.

BibTeX

@article{bastian2023disguisor,
  title={DisguisOR: holistic face anonymization for the operating room},
  author={Bastian, Lennart and Wang, Tony Danjun and Czempiel, Tobias and Busam, Benjamin and Navab, Nassir},
  journal={International Journal of Computer Assisted Radiology and Surgery},
  pages={1--7},
  year={2023},
  publisher={Springer}
}