Illumination-Robust Camera-Based Heart-Rate Estimation for Physiological Sensing in Robots
This paper presents an end-to-end spatial-temporal transformer framework for remote heart-rate estimation from RGB camera images under varying illumination.
Applications
- Service robots
- Social robots
- Assistive robots
To understand this paper, make sure you know these concepts first:
- Knowledge of computer vision and machine learning
- Understanding of transformer frameworks
Abstract
Physiological awareness is important for service, social, and assistive robots that interact with humans in everyday environments. Remote photoplethysmography (rPPG) enables non-contact heart-rate (HR) estimation from an RGB camera, making it a promising sensing modality for robot-mounted vision systems. However, illumination variation remains a major barrier to robust deployment. This paper presents an end-to-end spatial-temporal transformer framework for remote HR estimation, evaluated on a new dataset with varied illumination. Our estimator integrates PRNet-based 3D face alignment, clip-level illumination augmentation, a Residual Temporal Standardization Module, and controlled hybrid temporal-frequency supervision. The training objective combines a Soft-Shifted Pearson waveform loss with a spectral Kullback-Leibler divergence loss, where a tuned weight $\beta$ controls the contribution of frequency-domain heart-rate guidance. Experiments on a static all-level mix protocol covering three illumination levels show that $\beta = 5$ gives the strongest result among the tested values of $\beta$, achieving a best-run HR mean absolute error (MAE) of 0.79 bpm and an HR correlation of 0.982. Compared with the PhysFormer baseline evaluated on our dataset, our estimator reduces HR MAE by 93.6% and increases HR correlation from 0.088 to 0.982, indicating that it remains usable when illumination varies.
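The hybrid temporal-frequency objective described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not give the form of the Soft-Shifted Pearson loss, so a plain negative Pearson loss stands in for it here. The Gaussian target distribution over candidate heart rates, the 40 to 180 bpm grid, the `sigma` width, and all function names are assumptions made for illustration. Only the overall shape (a waveform term plus $\beta$ times a spectral KL term) comes from the text.

```python
import math


def neg_pearson(pred, target):
    """Return 1 - Pearson correlation of two equal-length waveforms.

    Stand-in for the paper's Soft-Shifted Pearson loss (form not given).
    """
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in target))
    return 1.0 - cov / (sp * st + 1e-8)


def hr_band_spectrum(signal, fps, bpm_grid):
    """Normalized spectral power of `signal` at each candidate HR in bpm."""
    n = len(signal)
    mean = sum(signal) / n
    powers = []
    for bpm in bpm_grid:
        w = 2.0 * math.pi * (bpm / 60.0) / fps  # radians per frame
        re = sum((x - mean) * math.cos(w * k) for k, x in enumerate(signal))
        im = sum((x - mean) * math.sin(w * k) for k, x in enumerate(signal))
        powers.append(re * re + im * im)
    total = sum(powers) + 1e-12
    return [p / total for p in powers]


def spectral_kl(pred, target_bpm, fps, bpm_grid, sigma=1.0):
    """KL(target || predicted spectrum) over the HR grid.

    Assumption: the target is a Gaussian centred on the ground-truth HR.
    """
    q = hr_band_spectrum(pred, fps, bpm_grid)
    g = [math.exp(-0.5 * ((b - target_bpm) / sigma) ** 2) for b in bpm_grid]
    z = sum(g)
    p = [v / z for v in g]
    # Bins where the target underflows to zero contribute nothing.
    return sum(pi * math.log(pi / (qi + 1e-12))
               for pi, qi in zip(p, q) if pi > 0.0)


def hybrid_loss(pred, target_wave, target_bpm, fps, beta=5.0,
                bpm_grid=tuple(range(40, 181))):
    """Waveform loss plus beta-weighted spectral KL loss."""
    return (neg_pearson(pred, target_wave)
            + beta * spectral_kl(pred, target_bpm, fps, bpm_grid))


if __name__ == "__main__":
    fps, n = 30, 300  # a 10-second clip at 30 frames per second
    truth = [math.sin(2 * math.pi * 1.2 * k / fps) for k in range(n)]  # 72 bpm
    wrong = [math.sin(2 * math.pi * 2.0 * k / fps) for k in range(n)]  # 120 bpm
    print("loss, matching prediction:", hybrid_loss(truth, truth, 72.0, fps))
    print("loss, wrong-rate prediction:", hybrid_loss(wrong, truth, 72.0, fps))
```

With `beta=0` the objective reduces to the waveform term alone. Larger values of `beta` penalize predictions whose dominant frequency is far from the ground-truth heart rate, which is the role the abstract assigns to the tuned weight.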