Virtual Consistency for Audio Editing - Audio Samples
Matthieu Cervera, Francesco Paissan, Mirco Ravanelli, Cem Subakan
Welcome to the companion website, where you can listen to the edited samples used in the user study.
Abstract: Free-form, text-based audio editing remains a persistent challenge, despite progress in inversion-based neural methods. Current approaches rely on slow inversion procedures, limiting their practicality. We present a virtual-consistency-based audio editing system that bypasses inversion by adapting the sampling process of diffusion models. Our pipeline is model-agnostic, requiring no fine-tuning or architectural changes, and achieves substantial speed-ups over recent neural editing baselines. Crucially, it achieves this efficiency without compromising quality, as demonstrated by quantitative benchmarks and a user study involving 16 participants.
Keywords: Neural Audio Editing, Diffusion Models, Virtual Inversion, Consistency Models
Source Prompt : A woman is singing a gospel song, accompanied by guitar and drums.
Target prompt : A violin is playing a gospel song accompanied by guitar and drums.
Input
VCI (ours)
ZETA
DDIM
SDEDIT
MusicGen
Source Prompt : a recording of piano music with no percussion propeller sound and male narrative voice at the end at a moderate tempo
Target prompt : a recording of trumpet music with no percussion propeller sound and male narrative voice at the end at a moderate tempo
Input
VCI (ours)
ZETA
DDIM
SDEDIT
MusicGen
Source Prompt : A recording of a grunge rock song.
Target prompt : A recording of an arcade game soundtrack.
Input
VCI (ours)
ZETA
DDIM
SDEDIT
MusicGen
Source Prompt : A recording of an upbeat disco song.
Target prompt : A recording of an epic movie soundtrack.
Input
VCI (ours)
ZETA
DDIM
SDEDIT
MusicGen
Source Prompt : A man is singing while a guitar, bass and drums playing in the background.
Target prompt : A recording of a techno song.
Input
VCI (ours)
ZETA
DDIM
SDEDIT
MusicGen
Source Prompt : a recording of a vibrant sax concerto with a [fast] tempo
Target prompt : a recording of a vibrant sax concerto with a [slow] tempo
Input
VCI (ours)
ZETA
DDIM
SDEDIT
MusicGen
Source Prompt : a recording of a melancholic string symphony featuring violin viola and cello
Target prompt : a recording of a melancholic string symphony featuring flute viola and cello
Input
VCI (ours)
ZETA
DDIM
SDEDIT
MusicGen
Source Prompt : A high quality recording of a rock song. a man singing while a guitar, bass and drums playing in the background, when later a woman joins.
Target prompt : A cool 60s jazz song of a saxophone playing while a guitar, bass and drums accompany him.
Input
VCI (ours)
ZETA
DDIM
SDEDIT
MusicGen
Source Prompt : a recording of italian baroque music with cello and harpsichord
Target prompt : a recording of italian baroque music with flute and harpsichord
Input
VCI (ours)
ZETA
DDIM
SDEDIT
MusicGen