Recently, I've been looking into ways to improve the depth map produced by an Intel RealSense D435i, and OpenCV's disparity filter ([1], [2], [3]) seemed like a good starting point. At a high level, a disparity map is first computed between the left and right camera images using either StereoBM or StereoSGBM. StereoBM computes disparity by comparing the sum of absolute differences (SAD) for each block of pixels, while StereoSGBM performs "semi-global" block matching, encouraging neighboring blocks to take similar disparities (credit for these descriptions goes to the original source). The resulting disparity map can then be filtered with a weighted least squares approach, implemented as a fast global image smoother [2].
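The two stages above can be illustrated with a minimal NumPy sketch. This is not OpenCV's implementation. `sad_disparity` is a brute-force SAD block matcher in the spirit of StereoBM, with no SGBM path aggregation, prefiltering, or subpixel refinement. `wls_smooth_1d` solves an edge-aware weighted least squares problem on a single row, roughly the kind of 1D subproblem the fast global smoother decomposes the 2D filter into. The function names, the exponential guide weight, and the defaults for `max_disp`, `block`, `lam`, and `sigma` are hypothetical choices made for illustration.

```python
import numpy as np


def sad_disparity(left, right, max_disp=16, block=5):
    """Brute-force SAD block matching on rectified grayscale images.

    For each pixel in `left`, try every shift d in [0, max_disp) and keep the
    one whose block in `right` (centered at x - d) has the lowest sum of
    absolute differences. Border pixels that cannot be searched stay 0.
    """
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    h, w = left.shape
    r = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1]
            costs = [
                np.abs(ref - right[y - r:y + r + 1, x - d - r:x - d + r + 1]).sum()
                for d in range(max_disp)
            ]
            disp[y, x] = int(np.argmin(costs))
    return disp


def wls_smooth_1d(f, guide, lam=50.0, sigma=0.1):
    """Edge-aware 1D weighted least squares smoothing (hypothetical helper).

    Minimizes sum_i (u_i - f_i)^2 + lam * sum_i w_i (u_{i+1} - u_i)^2 with
    w_i = exp(-|guide_{i+1} - guide_i| / sigma), so smoothing is weak across
    strong edges in the guide. Solved as (I + lam * L_w) u = f, where L_w is
    the weighted Laplacian of the path graph over the row.
    """
    f = np.asarray(f, dtype=np.float64)
    guide = np.asarray(guide, dtype=np.float64)
    n = f.size
    w = lam * np.exp(-np.abs(np.diff(guide)) / sigma)
    i = np.arange(n - 1)
    a = np.eye(n)
    a[i, i] += w
    a[i + 1, i + 1] += w
    a[i, i + 1] -= w
    a[i + 1, i] -= w
    # Dense solve for clarity; a fast global smoother uses O(n) tridiagonal solves.
    return np.linalg.solve(a, f)
```

In OpenCV itself, the corresponding pipeline pairs `cv2.StereoSGBM_create` (or `cv2.StereoBM_create`) with `cv2.ximgproc.createDisparityWLSFilter` from the opencv-contrib `ximgproc` module.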

Ideally, for a fixed camera pointing at a static scene, the generated depth map should not change over time. Unfortunately, this is rarely the case, and how much the depth map varies over time for such a scene depends on how it is generated.

To compare the depth map output by librealsense with the one produced by this disparity filter, we can compute several metrics on the same scene for each method:

  • Pixel Variation: the average per-pixel standard deviation across 30 frames for a static scene
    • units: mm; bounds: [0, ∞); lower is better
  • Structural Similarity Index (SSIM): a relative metric between two images that compares luminance, contrast, and structure, ranging from -1 (least similar) to 1 (most similar)
    • units: arbitrary; bounds: [-1, 1]; higher is better
  • R-SSIM: a variation of SSIM that accounts for "holes" in the images, where depth values are invalid or missing
    • units: arbitrary; bounds: [-1, 1]; higher is better
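The Pixel Variation metric can be sketched in NumPy, assuming the 30 depth frames are stacked into a `(T, H, W)` array in millimeters and that a value of 0 marks a missing depth reading. The function name `pixel_variation`, the zero-as-invalid convention, and the use of the population standard deviation (`ddof=0`) are assumptions; the post does not state how invalid pixels were handled.

```python
import numpy as np


def pixel_variation(frames, invalid=0):
    """Mean over pixels of each pixel's standard deviation across frames.

    `frames` has shape (T, H, W). Readings equal to `invalid` are ignored,
    and pixels that are never valid are left out of the average.
    """
    stack = np.asarray(frames, dtype=np.float64)
    masked = np.where(stack == invalid, np.nan, stack)
    ever_valid = ~np.isnan(masked).all(axis=0)  # (H, W) boolean mask
    # Population std (ddof=0) per pixel, skipping invalid readings.
    per_pixel_std = np.nanstd(masked[:, ever_valid], axis=0)
    return float(per_pixel_std.mean())
```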

Note that while SSIM was originally designed for comparing RGB images, several depth image filtering papers I've seen use it to compare depth maps. In addition, Malpica et al. explain well how "luminance," "contrast," and "structure" translate to depth images; the mapping is summarized in the following table:

RGB       | Depth
Luminance | Range
Contrast  | Surface Roughness
Structure | 3D Structure
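To show how the three SSIM terms read on depth data, here is a simplified single-window (global) SSIM in NumPy. The standard SSIM computes these terms over local windows (typically an 11x11 Gaussian) and averages the resulting map, so this is a sketch rather than a replacement for library implementations. The optional `mask` restricts the statistics to pixels valid in both images; it is a hypothetical hole-aware variant for illustration, not the published R-SSIM formula. The constants K1 = 0.01, K2 = 0.03, and C3 = C2/2 follow the usual SSIM defaults, and `data_range` (the depth range in mm) is an assumed parameter.

```python
import numpy as np


def global_ssim(x, y, data_range, mask=None, k1=0.01, k2=0.03):
    """Single-window SSIM between two depth images.

    Returns (ssim, (luminance, contrast, structure)). For depth maps these
    terms compare mean depth (range), depth spread (surface roughness), and
    correlation (3D structure). If `mask` is given, only pixels where it is
    True contribute to the statistics.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if mask is not None:
        x, y = x[mask], y[mask]
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    lum = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    con = (2 * sd_x * sd_y + c2) / (sd_x**2 + sd_y**2 + c2)
    struct = (cov + c3) / (sd_x * sd_y + c3)
    return lum * con * struct, (lum, con, struct)
```

For the standard windowed version, scikit-image provides `skimage.metrics.structural_similarity`, which takes a `data_range` argument.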


Experiment 1: Hallway

Left: depth map directly produced by the RealSense; Right: depth map via disparity filter

Left: metrics of depth map directly produced by the RealSense; Right: metrics of depth map via disparity filter

Metric          | Raw             | Filtered
Pixel Variation | 0.984 ± 0.003   | 0.996 ± 0.003
SSIM            | 0.739 ± 0.028   | 0.998 ± 0.002
R-SSIM          | 77.58 ± 13.65   | 0.454 ± 0.142


Experiment 2: Window

Left: depth map directly produced by the RealSense; Right: depth map via disparity filter

Left: metrics of depth map directly produced by the RealSense; Right: metrics of depth map via disparity filter

Metric          | Raw             | Filtered
Pixel Variation | 0.897 ± 0.028   | 0.968 ± 0.022
SSIM            | 0.788 ± 0.066   | 0.980 ± 0.020
R-SSIM          | 570.84 ± 119.84 | 1.988 ± 0.675


Experiment 3: Ceiling

Left: depth map directly produced by the RealSense; Right: depth map via disparity filter

Left: metrics of depth map directly produced by the RealSense; Right: metrics of depth map via disparity filter

Metric          | Raw             | Filtered
Pixel Variation | 0.997 ± 0.001   | 0.999 ± 0.001
SSIM            | 0.637 ± 0.019   | 0.999 ± 0.002
R-SSIM          | 19.07 ± 4.56    | 0.225 ± 0.243
