UNDERSCORE
AI That Scores Your Video, Scene by Scene

Finding the right music for a video is slow and licence-fraught. Underscore watches your footage, understands each scene, and generates an original soundtrack matched to the cuts — composed locally, on your own machine.
Origin Story

An ambitious experiment: can a fully local AI pipeline actually compose to picture, the way a human composer scores to an edit?
How It's Built
It segments the video, uses Gemini to read the mood and content of each scene, then drives a local ACE-Step music model (GGUF) to generate audio. FFmpeg aligns the result to the edit, and YAMNet assists by analysing the footage's own sound.
Engineering Notes
Understanding feeds generation
Underscore segments a video, has Gemini read the mood and content of each scene, and turns each reading into the brief a music model composes from. It's multimodal understanding driving generation, scene by scene.
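As a rough illustration of that hand-off, here is a minimal sketch of how a per-scene reading might become a brief. It is not Underscore's actual code: `SceneAnalysis`, `MusicBrief`, `build_brief`, and the prompt wording are all hypothetical names and choices made up for this example, and the real brief format that ACE-Step receives may look quite different.

```rust
/// Hypothetical result of the per-scene analysis step (names are assumptions).
struct SceneAnalysis {
    start_s: f64,    // scene start, in seconds from the beginning of the video
    end_s: f64,      // scene end, in seconds
    mood: String,    // e.g. "tense", "wistful"
    content: String, // short description of what happens on screen
}

/// Hypothetical brief handed to the music model.
#[derive(Debug, PartialEq)]
struct MusicBrief {
    prompt: String,  // free-text description for the generator
    duration_s: f64, // how long the cue should run
}

/// Turn one scene's analysis into a brief: the mood and content shape the
/// prompt, and the scene length sets the cue duration.
fn build_brief(scene: &SceneAnalysis) -> MusicBrief {
    MusicBrief {
        prompt: format!(
            "{} instrumental underscore for: {}",
            scene.mood, scene.content
        ),
        // Guard against a malformed scene whose end precedes its start.
        duration_s: (scene.end_s - scene.start_s).max(0.0),
    }
}
```

Under these assumptions, a scene running from 12.0 s to 19.5 s with mood "tense" would produce a 7.5-second brief, so each cue is sized to the scene it scores.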
Local generative music
A local ACE-Step diffusion model (GGUF) generates an original score on-device, with no licensing and no cloud. FFmpeg aligns the audio to the cuts, and YAMNet assists the analysis. An early demo, Apple-Silicon-first.
Underscore is the most ambitious pipeline here — multimodal understanding feeding a local generative model. Early and rough, and exactly the kind of problem I find worth chasing.
What It Does
Scene Understanding
Gemini reads the mood and content of each scene, producing the brief the music generator works from.
Local Music Generation
A local ACE-Step diffusion model (GGUF) composes an original score on-device — no licensing, no cloud.
Cut-Aware
Audio is aligned to the edit with FFmpeg, with YAMNet helping analyse the footage's own sound.
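One way the alignment step could work is to delay each generated cue to its scene's start time and mix the cues into a single track. The sketch below builds FFmpeg arguments for that using the `adelay` and `amix` filters. It is an assumption about the approach, not the project's actual command: the `Cue` type and both function names are hypothetical, cues are assumed to be stereo, and `normalize=0` needs a reasonably recent FFmpeg build.

```rust
/// A generated cue and the scene start it should land on (hypothetical type).
struct Cue {
    path: String,  // audio file produced by the music model
    start_ms: u64, // scene start, in milliseconds from the start of the video
}

/// Build a `-filter_complex` graph: delay every cue to its scene start,
/// then mix the delayed cues into one stream labelled `[score]`.
/// Input 0 is assumed to be the video; inputs 1..=N are the cues.
fn build_filter_graph(cues: &[Cue]) -> Option<String> {
    if cues.is_empty() {
        return None; // nothing to mix
    }
    let mut parts: Vec<String> = Vec::new();
    let mut labels = String::new();
    for (i, cue) in cues.iter().enumerate() {
        // adelay takes one delay per channel, separated by '|' (stereo assumed).
        parts.push(format!(
            "[{}:a]adelay={}|{}[a{}]",
            i + 1,
            cue.start_ms,
            cue.start_ms,
            i
        ));
        labels.push_str(&format!("[a{}]", i));
    }
    // normalize=0 keeps amix from scaling the volume down per input.
    parts.push(format!(
        "{}amix=inputs={}:normalize=0[score]",
        labels,
        cues.len()
    ));
    Some(parts.join(";"))
}

/// Assemble the full argument list for an `ffmpeg` invocation that copies
/// the video stream untouched and replaces the audio with the mixed score.
fn ffmpeg_args(video: &str, cues: &[Cue], output: &str) -> Option<Vec<String>> {
    let graph = build_filter_graph(cues)?;
    let mut args = vec!["-i".to_string(), video.to_string()];
    for cue in cues {
        args.push("-i".to_string());
        args.push(cue.path.clone());
    }
    for s in [
        "-filter_complex", graph.as_str(),
        "-map", "0:v",
        "-map", "[score]",
        "-c:v", "copy",
        output,
    ] {
        args.push(s.to_string());
    }
    Some(args)
}
```

The returned list can be handed to `std::process::Command::new("ffmpeg").args(...)`. Note that this sketch drops the footage's original audio; keeping it would mean adding `[0:a]` as another `amix` input.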
Native Desktop
Tauri 2 + SvelteKit 5, Apple-Silicon-first, with the score generated locally.
Where It Stands
Underscore is an early, ambitious demo: a working Gemini-plus-ACE-Step pipeline that scores video on-device, Apple-Silicon-first. Longer clips and finer control over the generated score are where it goes next — the multimodal-to-generative loop is the hard part, and it already runs.
Underscore is Tauri 2 + SvelteKit 5 + Rust. Gemini handles per-scene mood analysis; a local ACE-Step diffusion music model (GGUF) generates audio on-device; FFmpeg and YAMNet handle alignment and audio analysis. It is an early demo, Apple-Silicon-only today.