Generative AI · 2026

UNDERSCORE

AI That Scores Your Video, Scene by Scene

Finding the right music for a video is slow and fraught with licensing issues. Underscore watches your footage, understands each scene, and generates an original soundtrack matched to the cuts, composed locally on your own machine.

Origin Story

An ambitious experiment: can an AI pipeline that generates its music entirely on-device actually compose to picture, the way a human composer scores to an edit?

How It's Built

It segments the video, uses Gemini to read the mood and content of each scene, and drives a local ACE-Step music model (GGUF) to generate audio. FFmpeg aligns everything to the edit, and YAMNet assists the analysis.
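The page doesn't say how the video is segmented. One common approach, assumed here, is FFmpeg's scene-change score: `ffmpeg -i input.mp4 -vf "select='gt(scene,0.4)',showinfo" -f null -` logs a `showinfo` line with a `pts_time:` value for each frame that crosses the threshold. The minimal sketch below parses those log lines into cut times and turns them into scene spans. The 0.4 threshold, the minimum scene length, and the names `Scene`, `parse_cut_times` and `scenes_from_cuts` are hypothetical, not Underscore's actual code.

```rust
/// A contiguous span of the source video, in seconds (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Scene {
    pub start: f64,
    pub end: f64,
}

/// Extracts every `pts_time:` value from FFmpeg `showinfo` log output.
/// Lines without a parseable `pts_time:` (progress lines etc.) are skipped.
pub fn parse_cut_times(log: &str) -> Vec<f64> {
    let key = "pts_time:";
    let mut cuts = Vec::new();
    for line in log.lines() {
        if let Some(idx) = line.find(key) {
            let rest = line[idx + key.len()..].trim_start();
            let token = rest.split_whitespace().next().unwrap_or("");
            if let Ok(t) = token.parse::<f64>() {
                cuts.push(t);
            }
        }
    }
    cuts
}

/// Turns cut times into scenes covering [0, duration].
/// Cuts outside (0, duration) are ignored; a cut that would create a scene
/// shorter than `min_len` is dropped, merging that scene into its neighbour.
pub fn scenes_from_cuts(cuts: &[f64], duration: f64, min_len: f64) -> Vec<Scene> {
    let mut sorted: Vec<f64> = cuts
        .iter()
        .copied()
        .filter(|&t| t > 0.0 && t < duration)
        .collect();
    sorted.sort_by(f64::total_cmp);

    let mut scenes = Vec::new();
    let mut start = 0.0;
    for t in sorted {
        if t - start >= min_len {
            scenes.push(Scene { start, end: t });
            start = t;
        }
    }
    // Close the final scene, or fold a too-short tail into the previous one.
    if duration - start >= min_len || scenes.is_empty() {
        scenes.push(Scene { start, end: duration });
    } else if let Some(last) = scenes.last_mut() {
        last.end = duration;
    }
    scenes
}
```

Merging very short scenes is a design choice: a music model presumably needs a few seconds to establish a cue, so flash cuts are folded into a neighbouring scene rather than scored on their own.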

Engineering Notes

Understanding feeds generation

Underscore segments a video, uses Gemini to read the mood and content of each scene, and turns that into the brief a music model composes from. It's multimodal understanding driving generation, scene by scene.
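To make the brief step concrete, here is a sketch that assumes Gemini has been asked for structured per-scene output (a mood, a short content description and a 0-to-1 energy score) and that the music model is prompted with a comma-separated tag string plus a target duration. `SceneAnalysis`, `MusicBrief`, `music_brief` and the energy-to-tempo mapping are hypothetical illustrations, not Underscore's code or ACE-Step's API.

```rust
/// Hypothetical per-scene result, e.g. parsed from a structured Gemini response.
#[derive(Debug, Clone)]
pub struct SceneAnalysis {
    pub mood: String,       // e.g. "tense"
    pub content: String,    // e.g. "night chase through city streets"
    pub energy: f32,        // assumed scale: 0.0 (calm) to 1.0 (intense)
    pub duration_secs: f64, // length of the scene the cue must fill
}

/// Hypothetical brief handed to the music model.
#[derive(Debug, Clone, PartialEq)]
pub struct MusicBrief {
    pub prompt: String,
    pub duration_secs: f64,
}

/// Maps the assumed 0..1 energy score linearly onto 70..140 BPM.
fn tempo_for(energy: f32) -> u32 {
    let e = energy.clamp(0.0, 1.0);
    (70.0 + e * 70.0).round() as u32
}

/// Builds a tag-style prompt from one scene's analysis.
pub fn music_brief(scene: &SceneAnalysis) -> MusicBrief {
    let intensity = if scene.energy >= 0.66 {
        "driving"
    } else if scene.energy >= 0.33 {
        "steady"
    } else {
        "ambient"
    };
    let prompt = format!(
        "instrumental, cinematic, {}, {}, {} BPM, score for: {}",
        scene.mood.trim(),
        intensity,
        tempo_for(scene.energy),
        scene.content.trim()
    );
    MusicBrief { prompt, duration_secs: scene.duration_secs }
}
```

Carrying the scene's duration in the brief is the point of the exercise: each cue is generated to fill one scene, so the cut points are known before any audio exists.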

Local generative music

A local ACE-Step diffusion model (GGUF) generates an original score on-device, with no licences to clear and no cloud round-trip. FFmpeg aligns the audio to the cuts, and YAMNet assists the analysis. It is an early demo, Apple-Silicon-first.
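Aligning a generated cue to a cut mostly means making its length match the scene exactly. A minimal sketch, assuming each cue is rendered to its own file and then fitted with FFmpeg's `apad` filter (pad with silence if the cue is short), `afade` (fade out just before the cut) and the `-t` output limit (trim if it runs long). The function name `fit_to_scene_args` and the fade length are hypothetical; the returned arguments would be passed to `std::process::Command::new("ffmpeg").args(...)`.

```rust
/// Builds FFmpeg arguments that pad or trim a generated clip to exactly
/// `scene_secs`, fading out over the last `fade_secs` so the cue ends on the cut.
pub fn fit_to_scene_args(input: &str, output: &str, scene_secs: f64, fade_secs: f64) -> Vec<String> {
    // The fade can't be longer than the scene itself.
    let fade = fade_secs.max(0.0).min(scene_secs);
    let fade_start = scene_secs - fade;
    // apad pads with silence indefinitely; `-t` then cuts the output to length.
    let filter = format!("apad,afade=t=out:st={:.3}:d={:.3}", fade_start, fade);
    vec![
        "-y".to_string(),
        "-i".to_string(),
        input.to_string(),
        "-af".to_string(),
        filter,
        "-t".to_string(),
        format!("{:.3}", scene_secs),
        output.to_string(),
    ]
}
```

Fitting each cue to its own scene keeps every cut boundary exact; the fitted cues can then be concatenated in scene order to form the full score.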

Underscore is the most ambitious pipeline here — multimodal understanding feeding a local generative model. Early and rough, and exactly the kind of problem I find worth chasing.

What It Does

Scene Understanding

Gemini reads the mood and content of each scene, producing the brief the music generator works from.

Local Music Generation

A local ACE-Step diffusion model (GGUF) composes an original score on-device: no licences to clear, no cloud.

Cut-Aware

FFmpeg aligns the audio to the edit, and YAMNet helps analyse the footage's own sound.
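How Underscore uses YAMNet isn't spelled out here, so the following is an assumption. YAMNet takes 16 kHz mono audio (which `ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 audio.wav` can extract) and returns one row of 521 AudioSet class scores per 0.96 s frame, with a 0.48 s hop. A simple way to summarise a scene's own sound is to average the frames whose centre falls inside the scene and take the strongest class. `dominant_class` is a hypothetical helper that works on scores already produced by the model.

```rust
/// YAMNet's frame hop and window length, in seconds.
const HOP_SECS: f64 = 0.48;
const WINDOW_SECS: f64 = 0.96;

/// Averages per-frame class scores over [start, end) and returns
/// (class_index, mean_score) for the strongest class, or None if no
/// frame centre falls inside the scene.
pub fn dominant_class(scores: &[Vec<f32>], start: f64, end: f64) -> Option<(usize, f32)> {
    let mut sum: Vec<f32> = Vec::new();
    let mut count = 0usize;
    for (i, frame) in scores.iter().enumerate() {
        // Frame i covers [i * hop, i * hop + window]; use its centre.
        let centre = i as f64 * HOP_SECS + WINDOW_SECS / 2.0;
        if centre >= start && centre < end {
            if sum.is_empty() {
                sum = vec![0.0; frame.len()];
            }
            for (s, &v) in sum.iter_mut().zip(frame) {
                *s += v;
            }
            count += 1;
        }
    }
    if count == 0 {
        return None;
    }
    sum.iter()
        .enumerate()
        .map(|(k, &s)| (k, s / count as f32))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

The class index maps to a label through YAMNet's class map; a scene dominated by speech, for example, might call for a sparser cue than one dominated by ambient noise.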

Native Desktop

Tauri 2 + SvelteKit 5, Apple-Silicon-first, with music generation and audio alignment running locally.

Where It Stands

Underscore is an early, ambitious demo: a working Gemini-plus-ACE-Step pipeline that scores video with on-device music generation, Apple-Silicon-first. Longer clips and finer control over the generated score come next. The multimodal-to-generative loop is the hard part, and it already runs.

// UNDER THE HOOD

Underscore is Tauri 2 + SvelteKit 5 + Rust. Gemini handles per-scene mood analysis; a local ACE-Step diffusion music model (GGUF) generates audio on-device; FFmpeg and YAMNet handle alignment and audio analysis. It is an early demo, Apple-Silicon-only today.