Virtual Consistency for Audio Editing - Audio Samples

Matthieu Cervera, Francesco Paissan, Mirco Ravanelli, Cem Subakan

Welcome to the companion website, where you can listen to the edited samples used in the user study.

Paper | Code

Abstract: Free-form, text-based audio editing remains a persistent challenge, despite progress in inversion-based neural methods. Current approaches rely on slow inversion procedures, limiting their practicality. We present a virtual-consistency-based audio editing system that bypasses inversion by adapting the sampling process of diffusion models. Our pipeline is model-agnostic, requiring no fine-tuning or architectural changes, and achieves substantial speed-ups over recent neural editing baselines. Crucially, it achieves this efficiency without compromising quality, as demonstrated by quantitative benchmarks and a user study involving 16 participants.

Keywords: Neural Audio Editing, Diffusion Models, Virtual Inversion, Consistency Models

Source Prompt : A woman is singing a gospel song, accompanied by guitar and drums.

Target prompt : A violin is playing a gospel song accompanied by guitar and drums.

Input

VCI (ours)

ZETA

DDIM

SDEDIT

MusicGen

Source Prompt : a recording of piano music with no percussion propeller sound and male narrative voice at the end at a moderate tempo

Target prompt : a recording of trumpet music with no percussion propeller sound and male narrative voice at the end at a moderate tempo

Input

VCI (ours)

ZETA

DDIM

SDEDIT

MusicGen

Source Prompt : A recording of a grunge rock song.

Target prompt : A recording of an arcade game soundtrack.

Input

VCI (ours)

ZETA

DDIM

SDEDIT

MusicGen

Source Prompt : A recording of an upbeat disco song.

Target prompt : A recording of an epic movie soundtrack.

Input

VCI (ours)

ZETA

DDIM

SDEDIT

MusicGen

Source Prompt : A man is singing while a guitar, bass and drums playing in the background.

Target prompt : A recording of a techno song.

Input

VCI (ours)

ZETA

DDIM

SDEDIT

MusicGen

Source Prompt : a recording of a vibrant sax concerto with a [fast] tempo

Target prompt : a recording of a vibrant sax concerto with a [slow] tempo

Input

VCI (ours)

ZETA

DDIM

SDEDIT

MusicGen

Source Prompt : a recording of a melancholic string symphony featuring violin viola and cello

Target prompt : a recording of a melancholic string symphony featuring flute viola and cello

Input

VCI (ours)

ZETA

DDIM

SDEDIT

MusicGen

Source Prompt : A high quality recording of a rock song. a man singing while a guitar, bass and drums playing in the background, when later a woman joins.

Target prompt : A cool 60s jazz song of a saxophone playing while a guitar, bass and drums accompany him.

Input

VCI (ours)

ZETA

DDIM

SDEDIT

MusicGen

Source Prompt : a recording of italian baroque music with cello and harpsichord

Target prompt : a recording of italian baroque music with flute and harpsichord

Input

VCI (ours)

ZETA

DDIM

SDEDIT

MusicGen