Composer · Scene 07 · Spatial Index

01 · Problem

Most browser audio tools flatten space into a timeline.

Conventional DAWs are built around tracks, lanes, and left-right pan. That works for arrangement, but it collapses the feeling of multiple voices sharing a room into a narrow strip of controls.

Composer starts from a different assumption: source position is part of the composition itself. The project's core interface is a large bird's-eye radar where every sound source is visible relative to the listener and can be selected, dragged, frozen, pinned, or muted directly in space.

"Not a timeline with panning. A field with inhabitants."


Composer · spatial radar interface · binaural playback

02 · Context

Spatial audio as interface, not post-production effect.

The project sits between web audio tooling and installation logic. Instead of treating HRTF rendering as polish applied at the end, Composer makes spatial placement the primary authoring metaphor from the first interaction.

The live system separates ambient environment from composition structure. That makes it possible to pair a mood layer with anchored conversational layouts, so the same field can work as an instrument, a staging surface, or a spatial interview room.

03 · Approach

A radar layout backed by compositions, drift, and diarization.

Sources are positioned across a radar interface and rendered binaurally through Resonance Audio. The placement model is intentionally direct: what you see in the field is what you hear in the headphones. In ambient mode, clouds drift through the field as looping objects; in composition mode, named anchor slots define stable relational positions for voices.
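As a rough illustration of the "what you see is what you hear" mapping, the sketch below converts a point on the radar into listener-relative coordinates and advances a drifting cloud by one step. The function names, the pixel-to-meter scale, and the wrap-around drift rule are assumptions made for this example, not Composer's actual code. The axis convention (+x right, +y up, -z forward) follows the Web Audio default listener orientation.

```javascript
// Hypothetical helper: map a point on the radar canvas to listener-relative
// coordinates. Assumes the listener sits at the radar center, "up" on screen
// is straight ahead, and the radar edge corresponds to maxDistance meters.
function radarToScene(px, py, radarSize, maxDistance) {
  const half = radarSize / 2;
  // Normalize to [-1, 1]: +right toward the screen's right, +forward toward the top.
  const right = (px - half) / half;
  const forward = (half - py) / half;
  const distance = Math.hypot(right, forward) * maxDistance;
  // Azimuth in degrees: 0 = straight ahead, positive = to the listener's right.
  const azimuth = (Math.atan2(right, forward) * 180) / Math.PI;
  // Assumed axis convention: +x right, +y up, -z forward.
  return {
    azimuth,
    distance,
    position: { x: right * maxDistance, y: 0, z: -forward * maxDistance },
  };
}

// Hypothetical drift step for an ambient cloud: move along a velocity vector
// (pixels per second) and wrap at the radar edges so the cloud keeps looping.
function driftStep(cloud, dtSeconds, radarSize) {
  const wrap = (v) => ((v % radarSize) + radarSize) % radarSize;
  return {
    ...cloud,
    px: wrap(cloud.px + cloud.vx * dtSeconds),
    py: wrap(cloud.py + cloud.vy * dtSeconds),
  };
}
```

In a browser, the resulting `position` could be passed to a Resonance Audio source via `source.setPosition(x, y, z)` on each animation frame, so the rendered position always tracks the drawn one.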

Composer supports layouts such as Dyad, Council, Journal, and Interview. Each template defines azimuth, elevation, and distance slots that can be filled manually or via speaker diarization.
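A template of this kind can be pictured as a list of named slots in spherical coordinates. The sketch below is illustrative only: the slot roles, angles, and distances are invented for the example, and `slotToPosition` is a hypothetical helper that converts a slot to Cartesian coordinates using the same assumed convention (+x right, +y up, -z forward).

```javascript
// Illustrative template shape. The roles, angles, and distances are
// assumptions for this sketch, not Composer's real values.
const TEMPLATES = {
  dyad: [
    { role: "left", azimuth: -30, elevation: 0, distance: 1.5 },
    { role: "right", azimuth: 30, elevation: 0, distance: 1.5 },
  ],
};

// Convert one slot (degrees, meters) to Cartesian coordinates.
// Azimuth 0 = straight ahead, positive = to the right; elevation positive = up.
function slotToPosition({ azimuth, elevation, distance }) {
  const az = (azimuth * Math.PI) / 180;
  const el = (elevation * Math.PI) / 180;
  const horizontal = distance * Math.cos(el); // projection onto the floor plane
  return {
    x: horizontal * Math.sin(az),
    y: distance * Math.sin(el),
    z: -horizontal * Math.cos(az),
  };
}

// Resolve every slot of a named template to a position.
function layoutPositions(name) {
  return TEMPLATES[name].map((slot) => ({
    role: slot.role,
    position: slotToPosition(slot),
  }));
}
```

Keeping slots in azimuth, elevation, and distance rather than raw x/y/z means a template stays readable as a relational arrangement ("30 degrees to the right, 1.5 m away") and can be rescaled by changing distance alone.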

AssemblyAI handles speaker detection when a public audio URL or uploaded mix is ingested. Detected speakers are routed into the active composition as positioned sources. When the number of detected speakers exceeds the slots available in the current layout, Composer offers a mismatch flow so the field can either switch compositions or expand to fit.
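The routing and mismatch logic described here could look something like the following. This is a minimal sketch under stated assumptions: `routeSpeakers`, the result shape, and the slot counts per layout are all hypothetical, and speakers are simply assigned to slots in detection order.

```javascript
// Hypothetical routing step: assign diarized speaker labels to the slots of
// the active layout, in order. `layouts` maps a layout name to its slot names.
function routeSpeakers(speakers, layoutName, layouts) {
  const slots = layouts[layoutName];
  if (speakers.length <= slots.length) {
    const assignments = speakers.map((speaker, i) => ({ speaker, slot: slots[i] }));
    return { status: "ok", layout: layoutName, assignments };
  }
  // More speakers than slots: report the overflow and list the layouts that
  // are large enough to hold everyone, smallest first.
  const alternatives = Object.keys(layouts)
    .filter((name) => layouts[name].length >= speakers.length)
    .sort((a, b) => layouts[a].length - layouts[b].length);
  return {
    status: "mismatch",
    layout: layoutName,
    overflow: speakers.length - slots.length,
    alternatives,
  };
}

// Example with assumed slot counts (not Composer's real templates).
const exampleLayouts = {
  dyad: ["left", "right"],
  council: ["north", "east", "south", "west"],
};
const exampleResult = routeSpeakers(["A", "B", "C"], "dyad", exampleLayouts);
// exampleResult.status is "mismatch"; exampleResult.alternatives is ["council"]
```

Returning a mismatch result instead of silently dropping speakers leaves the choice (switch layout or expand the field) to the interface.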

04 · Stack

Web-native spatial mixing with automated voice parsing.

Rendering

Vanilla JS · Web Audio API · Resonance Audio · browser UI radar field

Backend / Data

AssemblyAI diarization · ElevenLabs voice generation · uploaded audio and public URL ingestion

Pipeline

Audio source → drifting cloud or anchored slot → speaker parsing when needed → positioned source graph → binaural render
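For the "speaker parsing" stage of this pipeline, one plausible intermediate step is grouping diarized utterances into per-speaker time segments before positions are assigned. The sketch below assumes each utterance carries a `speaker` label plus `start` and `end` times in milliseconds, which is how AssemblyAI's speaker-labelled utterances are commonly shaped. The function name and output format are hypothetical.

```javascript
// Hypothetical parsing step: collect diarized utterances into per-speaker
// segment lists. Assumes `start` and `end` are in milliseconds.
function groupBySpeaker(utterances) {
  const bySpeaker = new Map();
  for (const { speaker, start, end } of utterances) {
    if (!bySpeaker.has(speaker)) bySpeaker.set(speaker, []);
    // Store segment boundaries in seconds, the unit Web Audio scheduling uses.
    bySpeaker.get(speaker).push({ start: start / 1000, end: end / 1000 });
  }
  return bySpeaker;
}
```

Each map entry can then become one positioned source, with its segments marking when that voice is active in the field.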

Deploy

Vercel · composer.spatial-index.xyz

05 · Reflections

Spatial composition becomes an editing decision much earlier.

The main lesson is that diarization is not just a transcription feature. Once each voice becomes a movable or anchorable source, analysis turns directly into arrangement. The browser stops being a playback surface and becomes a staging environment.

The larger idea in the progress notes is persistence across sessions: shared arrangements, ghost traces, and saved spatial structures that make the field feel less like a one-off mixer and more like a place you return to.

06 · Build Log

From source ingestion to spatial field.

2026

Radar composition model

Defined the browser composition surface around a central radar, making spatial layout the primary editing metaphor instead of track lanes.

2026

Cloud and composition system

Built interactive sound clouds with drag, freeze, mute, and pin controls, then added named composition templates for anchored voice layouts.

2026

Speaker routing

Integrated AssemblyAI diarization so uploaded mixes could be segmented by speaker and routed into compatible compositions, including mismatch handling for overflow cases.

2026

Spatial environments

Expanded the instrument into a spatial conversation tool with environment presets, radar-centered interaction, uploaded sources, and compositional anchor roles.

2026

Launch

composer.spatial-index.xyz live with browser-native binaural composition, drifting clouds, anchored voice layouts, and diarized source placement.