Presentation description
This work presents a compact, passive depth-imaging method that uses an extended depth-of-focus (EDoF) lens to capture a single all-in-focus image, from which a synthetic focal stack is generated. EDoF optics remove the need for mechanical focus adjustment, enabling smaller and simpler hardware; however, the spatial blur they introduce complicates depth reconstruction. To address this, we explore a transformer-based model, originally developed for mask-based lensless imaging, to translate EDoF input into varifocal outputs. To enable higher-fidelity reconstruction, we also redesigned the hardware as a dual-camera setup with better physical alignment and higher-quality sensors. Data collection with this improved configuration is still pending; we expect it to produce cleaner input images and more consistent training data, potentially reducing output artifacts such as pixelation and blur. Our model consistently generates sharp, depth-consistent focal slices, with good structure preservation and color separation, reduced color bleeding and edge artifacts, and quantitative scores of SSIM 0.556, MAE 0.129, and PSNR 16.03 dB. These results demonstrate that transformer-based approaches are not only feasible but also superior at learning depth cues from EDoF images. Our findings highlight key trade-offs in model design: resolution, dataset composition, and architectural choices all significantly affect depth-estimation accuracy and perceptual quality. By refining both the computational models and the hardware acquisition pipeline, this work helps bridge the gap between numerical performance and visual fidelity. Ultimately, we aim to advance lightweight, ML-enhanced depth-sensing systems for applications in robotics, surveillance, and autonomous navigation.
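The MAE and PSNR figures quoted above follow standard definitions. As a minimal sketch, assuming each predicted focal slice and its ground-truth slice are float arrays normalized to [0, 1] (the normalization, data range, and the function names `mae` and `psnr` are assumptions for illustration, not the authors' evaluation code), the two metrics can be computed as follows. SSIM is usually computed with a sliding window, for example with scikit-image's `structural_similarity`, and is omitted here.

```python
import numpy as np


def mae(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error between two images of the same shape."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(np.abs(diff)))


def psnr(pred: np.ndarray, target: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB.

    Assumes pixel values span [0, data_range]; data_range=1.0 is a
    hypothetical default for images normalized to [0, 1].
    """
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


# Usage: a prediction that is uniformly off by 0.1 from its target.
target = np.zeros((4, 4, 3))
pred = np.full((4, 4, 3), 0.1)
print(mae(pred, target))   # about 0.1
print(psnr(pred, target))  # about 20 dB, since MSE = 0.01
```

Because PSNR is a log transform of the mean squared error, a uniform error of 0.1 on [0, 1] images already yields 20 dB, which gives a rough sense of scale for the reported values.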
Ballroom