DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion

Motion capture from a limited number of body-worn sensors, such as inertial measurement units (IMUs) and pressure insoles, has important applications in health, human performance, and entertainment. Recent work has focused on accurately reconstructing whole-body motion from a specific sensor configuration using six IMUs. Across applications, a common goal is to use the minimal number of sensors that achieves the required accuracy. However, the optimal arrangement of the sensors may differ from one application to another.

We propose a single diffusion model, DiffusionPoser, which reconstructs human motion in real time from arbitrary sensor configurations, including IMUs and pressure insoles. Unlike existing methods, our model lets users choose the number and arrangement of sensors to suit the specific activity of interest, without retraining. A novel autoregressive inference scheme ensures real-time motion reconstruction that closely matches the measured sensor signals. Because DiffusionPoser is generative, its reconstructions behave realistically, even for degrees of freedom that are not directly measured.
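The page does not describe how DiffusionPoser encodes a sensor configuration internally. One common way for a single model to accept arbitrary sensor subsets without retraining is to fix a catalogue of candidate sensor slots and pass zero-filled readings together with a binary availability mask. The minimal sketch below assumes that design. The slot list, the per-sensor feature size and the function `build_condition` are hypothetical illustrations and do not describe the authors' implementation. Pressure insoles would need their own slots, which are omitted here.

```python
import numpy as np

# Hypothetical catalogue of candidate sensor slots (assumption, not the paper's list).
SENSOR_SLOTS = ["pelvis", "head", "left_wrist", "right_wrist",
                "left_shank", "right_shank", "left_foot", "right_foot",
                "left_thigh", "right_thigh"]
# Assumption: 9 rotation-matrix entries + 3 accelerations per IMU.
FEATURES_PER_SENSOR = 12


def build_condition(readings):
    """Pack readings from any subset of sensors into one fixed-size input.

    readings: dict mapping slot name -> array of shape (FEATURES_PER_SENSOR,).
    Returns (values, mask), both flat arrays of length
    len(SENSOR_SLOTS) * FEATURES_PER_SENSOR. Absent sensors get zeros in
    `values` and 0.0 in `mask`. Present sensors get 1.0 in `mask`.
    """
    unknown = set(readings) - set(SENSOR_SLOTS)
    if unknown:
        raise ValueError(f"unknown sensor slots: {sorted(unknown)}")
    n = len(SENSOR_SLOTS)
    values = np.zeros((n, FEATURES_PER_SENSOR), dtype=np.float32)
    mask = np.zeros((n, FEATURES_PER_SENSOR), dtype=np.float32)
    for i, slot in enumerate(SENSOR_SLOTS):
        if slot in readings:
            x = np.asarray(readings[slot], dtype=np.float32)
            if x.shape != (FEATURES_PER_SENSOR,):
                raise ValueError(
                    f"{slot}: expected shape ({FEATURES_PER_SENSOR},), got {x.shape}")
            values[i] = x
            mask[i] = 1.0
    return values.ravel(), mask.ravel()


# Usage: a three-IMU setup (both feet and pelvis) with random stand-in readings.
rng = np.random.default_rng(0)
three_imu = {s: rng.standard_normal(FEATURES_PER_SENSOR)
             for s in ["left_foot", "right_foot", "pelvis"]}
values, mask = build_condition(three_imu)
print(values.shape, int(mask.sum()))  # (120,) 36
```

With this kind of design, switching from six IMUs to three changes only the mask, not the network's input size. This is one way a single trained model could serve every configuration.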

We reconstruct diverse motions while instrumenting the pelvis, head, wrists and shanks. This is the configuration used in a series of prior works ([Yi et al. 2021], [Yi et al. 2022], [Jiang et al. 2022]). In our paper, we show quantitatively that our reconstructions are as accurate as those of previous systems for this specific configuration. For this video, we replace our stick-figure reconstruction with a SMPL mesh and remove the delay, so that details can be appreciated better. The reconstruction is still computed with the live algorithm, but just after data collection rather than during it. In the following videos, we show the reconstruction on the monitor together with the real motion.

We remove one IMU and slightly change the configuration, which now consists of five IMUs attached to the feet, pelvis, left wrist and right thigh. This configuration matches locations where devices are conveniently worn for daily-life activity monitoring (shoes, belt, watch, smartphone). We reconstruct a similar set of diverse motions. Note that uninstrumented segments are reconstructed realistically (e.g., right-arm swing during walking), but specific motions of these segments might not be captured.

We remove another sensor (the pelvis) and now instrument the left wrist, right thigh and both feet, again reconstructing a similar sequence of diverse motions. This configuration mimics a realistic everyday setup with instrumented shoes, a watch and a smartphone.

Next, we use a setup with only three IMUs (left foot, right foot and pelvis). Such sparse setups are useful for monitoring over long periods of time. Clinical studies that aim to monitor patients and track rehabilitation would benefit from continuous gait analysis during daily life. Here we show that three sensors suffice to capture some typical gait deviations in different patient populations.

Toe-in and toe-out gait are two gait deviations that can reduce knee load in patients with knee pain (e.g., patients with knee osteoarthritis) [Uhlrich et al. 2018].

Scissor gait is a gait deviation (or compensation) observed in patients with hip adductor spasticity (e.g., patients with cerebral palsy).

Patients with a weak hip flexor on one side compensate by externally rotating the hip joint to shift the workload to the hip adductor. A weak hip flexor on one side is observed in patients with nerve damage or hip osteoarthritis.

Finally, we return to the setup with six sensors and reconstruct dynamic motion outdoors, recording a tennis player hitting against a wall.

Our generative model can complete motion realistically when sensor signals become unreliable or are lost (e.g., due to a lost connection or packet loss). Here we show reconstructions in which we drop the signal from all sensors for a few seconds at a certain point in time. Our generative model completes the motion during the gap. As soon as the signals return, it transitions naturally back to the underlying measured motion. Regression-based models such as PIP [Yi et al. 2022] lack this capacity to complete motion realistically.
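The page does not say how DiffusionPoser represents missing data internally. One hedged way to picture a signal drop is as a per-frame, per-sensor availability mask over a sliding window of recent frames. Frames whose mask entries are all zero carry no measurements, so a generative model conditioned on that mask would have to fill them in from its learned motion prior. In the minimal sketch below, the frame rate, window length and the function `dropout_mask` are hypothetical choices for illustration only.

```python
import numpy as np


def dropout_mask(num_frames, num_sensors, dropped_frames):
    """Per-frame, per-sensor availability mask (1.0 = measured, 0.0 = missing).

    dropped_frames: iterable of frame indices at which all sensors were lost.
    Indices outside [0, num_frames) are ignored.
    """
    mask = np.ones((num_frames, num_sensors), dtype=np.float32)
    for t in dropped_frames:
        if 0 <= t < num_frames:
            mask[t, :] = 0.0
    return mask


FPS = 60          # assumed sensor sampling rate (hypothetical)
WINDOW = 2 * FPS  # hypothetical 2 s sliding window of recent frames

# Simulate all six IMUs dropping out for 1 s, starting 0.5 s into the window.
lost = range(int(0.5 * FPS), int(1.5 * FPS))
mask = dropout_mask(WINDOW, 6, lost)
print(mask.shape, int((mask.sum(axis=1) == 0).sum()))  # (120, 6) 60
```

A regression model that maps measurements directly to poses has no natural output for the fully masked rows. A generative model conditioned on the mask can instead sample plausible motion for them and re-anchor to the data once the mask returns to one.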

Here are more examples of generative motion completion with DiffusionPoser when the signal is lost.

A video summarizing the results of DiffusionPoser for OpenSim. It includes more live demos, results with different sensor configurations, and a comparison to a purely regression-based model.


Reconstructions of the same sequence with PIP [Yi et al. 2022] using six sensors, and with ours (DiffusionPoser) using six, four, three and two IMUs.

Abstract

Live Demo: Diverse motion and locomotion with six IMU sensors

Live Demo: Diverse motion and locomotion with five IMU sensors

Live Demo: Diverse motion and locomotion with four IMU sensors

Live Demo: Diverse motion and locomotion with six IMU sensors, but with live visualization.

Live Demo: Clinically relevant gait deviations captured with 3 sensors (foot L/R and pelvis)

Live Demo: Tennis

Signal drop: the missing motion is completed by our generative model.

DiffusionPoser for OpenSim