Video editor with the ability to filter out only a single person's voice?

I’ve recently done some talks for my school’s cybersecurity club, and now I want to edit them.

My actual video editing needs are very simple: I just need to clip parts of the video out, which, as I understand it, basically every editor can do.

However, my videos were recorded on my phone, and I don’t have a presentation mic or anything of the sort, meaning background noise, including people talking, has slipped in. From my understanding, it’s trivial to filter general noise out of audio, even “live” (during recording or during a game), because human voices occupy a specific frequency range, but filtering out other voices is harder.
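
To illustrate the frequency-based part, here is a minimal Python sketch that builds an ffmpeg command using its `highpass` and `lowpass` audio filters to keep roughly the speech band. The 200 Hz and 3000 Hz cutoffs and the file names are assumptions for illustration, not tuned values. This trims rumble and hiss outside that band, but other people talking occupy the same band, so it will not remove them.

```python
import shlex


def bandpass_command(src, dst, low_hz=200, high_hz=3000):
    """Build an ffmpeg command that keeps roughly the speech band.

    The cutoff values are illustrative assumptions, not recommendations.
    """
    # highpass attenuates content below low_hz (rumble, handling noise);
    # lowpass attenuates content above high_hz (hiss).
    audio_filter = f"highpass=f={low_hz},lowpass=f={high_hz}"
    # "-c:v copy" leaves the video stream untouched; only audio is re-encoded.
    return ["ffmpeg", "-i", src, "-af", audio_filter, "-c:v", "copy", dst]


# Hypothetical file names, used only to show the resulting command line.
cmd = bandpass_command("talk.mp4", "talk_filtered.mp4")
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.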

However, it seems that AI can do this:

https://scribe.rip/axinc-ai/voicefilter-targeted-voice-separation-model-6fe6f85309ea

However, it seems to only work on .wav audio files, meaning I would need to separate out the audio track first, convert it to WAV, and then merge it back in.

Before I go learning how to do this, I’m wondering if there is already an existing FOSS video editor, or a plugin for an editor, that lets me filter the video itself, or similar software that works on the audio of videos.
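
For reference, the extract / convert / merge round trip can be sketched with two ffmpeg invocations, built here in Python. The file names are hypothetical, and the mono 16 kHz 16-bit PCM settings are an assumption about what a separation model might expect, so check the model's own documentation before relying on them.

```python
import shlex


def extract_wav_command(video, wav, sample_rate=16000):
    """Build an ffmpeg command that writes the audio track as PCM WAV."""
    return [
        "ffmpeg", "-i", video,
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        "-ar", str(sample_rate),  # sample rate (assumed value)
        "-ac", "1",               # downmix to mono (assumed requirement)
        wav,
    ]


def remux_command(video, cleaned_wav, out):
    """Build an ffmpeg command that pairs the original video with new audio."""
    return [
        "ffmpeg", "-i", video, "-i", cleaned_wav,
        "-map", "0:v:0",  # video from the first input
        "-map", "1:a:0",  # audio from the second input
        "-c:v", "copy",   # do not re-encode the video
        "-c:a", "aac",    # encode the cleaned audio for the container
        "-shortest",      # stop at the shorter of the two streams
        out,
    ]


# Hypothetical file names, used only to show the resulting command lines.
extract_cmd = extract_wav_command("talk.mp4", "talk.wav")
remux_cmd = remux_command("talk.mp4", "talk_clean.wav", "talk_clean.mp4")
print(shlex.join(extract_cmd))
print(shlex.join(remux_cmd))
```

The voice-separation step would run between the two commands, reading `talk.wav` and writing `talk_clean.wav`. Each list could be run with `subprocess.run(cmd, check=True)`.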

  • WastedJobe@feddit.de
    7 months ago

    Audio engineer here. Not sure Ardour can open video, but it’s a capable DAW and open source. Reaper is closed source, but it can open (and even render) pretty much any video format. To actually separate a single voice, you do need additional plugins though, no matter which DAW you’re using.
    I think iZotope RX could do it, but it is fairly expensive. I haven’t seen any open source audio tools that can do this at all. It is pretty much guaranteed to require some kind of machine learning, as parametrically separating by EQ or phase won’t work if you have only one source signal (even with two or more microphones, it would be really, really hard).
    A very good spectral editor might technically work, but it would take several days of manually deleting select frequencies at an almost single-sample level and would still sound bad, especially if the noise is nearly the same level as the signal.

    • datavoid@lemmy.ml
      7 months ago

      I would recommend RX as well, but it is pricey if you can’t wait for a sale (and if you’re only using it once, it’s expensive even on sale).

      Assuming your footage isn’t super long, I’d be happy to try running it through RX for you. Feel free to send me a message.

  • WolfLink@lemmy.ml
    7 months ago

    Even at a couple of years old, the technique you linked is pretty cutting-edge. You aren’t going to find it or something similar built into any video editing software. Maybe someone’s made a plugin for one, if you are lucky.

    However, there are a lot of free tools that make it easy to split or rejoin audio and video, and to convert between different formats.

    I’d recommend:

    • Audacity if you want a GUI
    • FFmpeg if you want a command line tool
    • VLC can also do a lot of the conversions FFmpeg does if you dig through its features (it relies on FFmpeg’s libraries for much of its decoding and encoding)
  • The Bard in Green@lemmy.starlightkel.xyz
    7 months ago

    Any AI solution you find is probably going to be command line / Python and is going to require some debugging of your Python environment and dependencies to get it working. And that means yes, you will need to separate the audio and video tracks and then recombine them. For that kind of work, I’m only familiar with Linux tools. I’ve used a tool called Vidcutter that is buggy but powerful, and has a semi-intuitive GUI.

    That said, the results from those AI tools can be a game changer if you can figure them out.

  • IrritableOcelot@beehaw.org
    7 months ago

    It’s possible that there’s a reason it requires lossless audio: it may need an uncompressed signal to work. For instance, if the ML model was trained on uncompressed data, it may need audio which has never been compressed.
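
As a rough illustration of checking what a WAV file actually contains before feeding it to a model, the sketch below uses Python's standard `wave` module. The mono 16 kHz 16-bit target is an assumption for the demo, not a documented requirement of any particular model. Note that this only confirms the file holds uncompressed PCM; it cannot tell whether the audio was lossily compressed at some earlier stage, for example by the phone that recorded it.

```python
import os
import tempfile
import wave


def write_silence(path, seconds=1, sample_rate=16000):
    """Write a mono 16-bit PCM WAV of silence, used here only as demo input."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)


def describe_wav(path):
    """Return the basic format parameters stored in a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "compression": wav.getcomptype(),  # "NONE" for plain PCM
        }


# Demo on a generated file so the sketch runs without any real recording.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    write_silence(demo_path)
    info = describe_wav(demo_path)

print(info)
```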