
Pulse2Symphony

Biosignal-conditioned music generation from smartphone-camera PPG.

SDNN · RMSSD · LF/HF

HRV features extracted from camera PPG

Valence × Arousal

Russell's two-dimensional affect model

CNN-LSTM · Transformer decoder · REMI tokenisation · PPG / HRV
Repository and write-up are coming online shortly. Reach out by email if you'd like early access.

Overview

Pulse2Symphony is a mobile pipeline that turns a smartphone camera into a biosignal sensor for personalised music generation. It extracts heart-rate-variability (HRV) features from a short photoplethysmography (PPG) recording, classifies the user's affective state in a two-dimensional valence-arousal space, and generates MIDI music conditioned on that state through a Transformer decoder trained on REMI tokens.

Background

HRV is the variation in time between consecutive heartbeats. Short-term HRV statistics like SDNN, RMSSD, and the LF/HF ratio correlate reliably with autonomic nervous-system balance and have been used as proxies for arousal and, less directly, for valence. A smartphone camera with the flash on can recover the PPG waveform from fingertip perfusion, making HRV extraction possible without specialised hardware.
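The three statistics named above can be computed directly from a sequence of RR intervals. The sketch below is a minimal NumPy version: it uses a plain FFT periodogram over a resampled tachogram rather than the Welch or Lomb-Scargle estimators typically used in HRV toolkits, and the LF (0.04–0.15 Hz) and HF (0.15–0.40 Hz) band edges are the conventional short-term HRV definitions. The function name and resampling rate are illustrative, not from the project's codebase.

```python
import numpy as np

def hrv_features(rr_ms, fs_resample=4.0):
    """Compute SDNN, RMSSD, and LF/HF from RR intervals (milliseconds)."""
    rr = np.asarray(rr_ms, dtype=float)

    # Time-domain statistics.
    sdnn = rr.std(ddof=1)                       # overall variability
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))  # beat-to-beat variability

    # Frequency domain: resample the irregularly sampled RR tachogram
    # onto a uniform grid, then sum periodogram power in each band.
    t = np.cumsum(rr) / 1000.0                  # beat times in seconds
    grid = np.arange(t[0], t[-1], 1.0 / fs_resample)
    tach = np.interp(grid, t, rr)
    tach -= tach.mean()

    freqs = np.fft.rfftfreq(len(tach), d=1.0 / fs_resample)
    psd = np.abs(np.fft.rfft(tach)) ** 2
    lf = psd[(freqs >= 0.04) & (freqs < 0.15)].sum()
    hf = psd[(freqs >= 0.15) & (freqs < 0.40)].sum()
    return {"sdnn": sdnn, "rmssd": rmssd, "lf_hf": lf / hf}
```

In practice the RR intervals would first be recovered by peak-detecting the camera PPG waveform; the snippet assumes that step has already produced a clean interval series.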

Russell's circumplex model places affective states on two axes (valence and arousal), a compact representation that maps naturally onto music-generation conditioning. REMI tokenisation represents music as structured events (position, bar, note, duration) that a Transformer can learn from efficiently.
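To make the REMI idea concrete, here is a toy encoding of one bar of notes into a REMI-style event stream. The event names and the 16th-note position grid follow the general REMI scheme (bar, position, note attributes), but the exact vocabulary is illustrative, not the project's actual tokeniser.

```python
def to_remi(notes):
    """Flatten (position, pitch, duration) note events into a
    REMI-style token stream for a single bar."""
    tokens = ["Bar"]
    for pos, pitch, dur in notes:
        tokens += [f"Position_{pos}/16", f"Pitch_{pitch}", f"Duration_{dur}"]
    return tokens

# Hypothetical bar: C4 and E4 quarter notes, then a G4 half note,
# with positions and durations in 16th-note units.
notes = [(0, 60, 4), (4, 64, 4), (8, 67, 8)]
tokens = to_remi(notes)
```

Because position and bar markers are explicit tokens, the decoder does not have to infer metrical structure from note timings alone, which is the main reason REMI-style streams train efficiently.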

Approach

The pipeline has three stages. The first extracts SDNN, RMSSD, and LF/HF from the PPG signal using standard time- and frequency-domain methods. The second is a CNN-LSTM that maps the HRV feature window onto a point in Russell's valence-arousal plane. The third is an emotion-conditioned Transformer decoder that generates REMI token sequences from the predicted affect coordinates, which are decoded back into MIDI.
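One simple way to feed the predicted affect coordinates into the decoder is to quantise the valence-arousal point into a discrete conditioning token that is prepended to the REMI sequence. The quadrant scheme and token names below are an assumption for illustration; the actual model may instead use finer bins or continuous affect embeddings.

```python
def affect_token(valence, arousal):
    """Quantise a valence-arousal point (each in [-1, 1]) into one of
    four quadrant tokens of Russell's circumplex."""
    v = "Pos" if valence >= 0 else "Neg"
    a = "High" if arousal >= 0 else "Low"
    return f"Emotion_{v}V_{a}A"

def conditioned_prompt(valence, arousal, bos="<bos>"):
    # The decoder sees the emotion token before any music tokens,
    # so every generated REMI event is conditioned on the affect state.
    return [bos, affect_token(valence, arousal)]
```

For example, a calm-positive state (high valence, low arousal) yields the prompt `["<bos>", "Emotion_PosV_LowA"]`, after which the decoder samples REMI tokens autoregressively.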

Tech stack

Smartphone-camera PPG
Signal acquisition with flash-on fingertip method.
CNN-LSTM
Affect classification in Russell's valence-arousal plane.
Transformer decoder
Emotion-conditioned MIDI generation.
REMI tokenisation
Structured music token representation for the decoder.