Old 14th September 2020
  #77

Moving forward in the science of extraction...pulling, say, three guitars out of a mono mix...which is time-consuming to do circa 2020....

MIT is almost three years into a project that can extract difficult mixed instruments...like separate guitars...via a novel approach, the kind where you go...oh yeah, that makes sense.

Let's say you have an old audio mix of three guitars. For the moment, let's say it's stereo instead of a mono mix. A steel string guitar being flat picked on the left, a classical fingerpicked guitar on the right.....separation not too good...and...right on top of everything, sorta spread across the panorama, a metal-fingerpicked electric guitar.

A nice tapestry, nice arrangement, each guitar doing a unique thing but....because there are sooooo many notes....and the guitars sound so similar, no matter how well your brain can isolate them.....it ranges from difficult to nearly impossible to extract the three onto separate mono tracks....circa 2020.

Well......

To shorten this most interesting AI research development story......

Take the recorded audio....get in front of a camera (you yourself with a guitar), make sure you yourself know how to play those 3 guitar parts pretty damn well...and....

do a video of you sitting there, playing along with the recording of the left guitar....NO MIC. Remember, the left guitar is a flat-picked acoustic. The camera just has to SEE the movements of both your hands.

Next, do the video for the classical guitar part....with a classical guitar.

Third...do the electric guitar part.

Do body/head movements that are appropriate for each pass.

Edit the videos so all three of "you" are now simultaneously on the screen "playing" the synchronized audio.

The point?

The MIT project software (I'd name it Hal 9000) uses AI...."watches" you, listens to the recorded guitars, instantly figures out....by seeing you....which guitar part goes with which guy....and then.....splits the guitars into mono tracks....that can be turned up/down/isolated.
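
For the curious, here's a toy sketch of just that last "splitting" step. This is not the MIT code. It assumes something upstream has already watched the video and produced, for each guitar, a score for every time-frequency bin of the mix. The `scores` numbers and the `ratio_masks` and `apply_masks` helpers are all made up for illustration. Each guitar gets a share of every bin in proportion to its score (a plain ratio mask). Whether the MIT software does it exactly this way is an assumption here.

```python
# Toy ratio-mask separation in pure Python (no audio I/O, no neural net).
# All values below are hypothetical stand-ins for what a model would produce.

def ratio_masks(scores):
    """scores[k][f][t] >= 0 is source k's claim on frequency bin f at time t."""
    n_src, n_freq, n_time = len(scores), len(scores[0]), len(scores[0][0])
    masks = [[[0.0] * n_time for _ in range(n_freq)] for _ in range(n_src)]
    for f in range(n_freq):
        for t in range(n_time):
            total = sum(scores[k][f][t] for k in range(n_src))
            for k in range(n_src):
                # If nobody claims the bin, split it evenly.
                masks[k][f][t] = scores[k][f][t] / total if total > 0 else 1.0 / n_src
    return masks

def apply_masks(mixture, masks):
    """Scale the mixture's magnitude spectrogram by each source's mask."""
    return [
        [[m[f][t] * mixture[f][t] for t in range(len(mixture[0]))]
         for f in range(len(mixture))]
        for m in masks
    ]

# Mixture magnitudes: 2 frequency bins x 3 time frames (made-up numbers).
mixture = [[1.0, 2.0, 0.5],
           [0.8, 0.0, 1.2]]

# Hypothetical per-guitar scores (left acoustic, right classical, electric).
scores = [
    [[3.0, 1.0, 0.0], [1.0, 0.0, 1.0]],
    [[1.0, 1.0, 1.0], [1.0, 0.0, 2.0]],
    [[0.0, 2.0, 1.0], [2.0, 0.0, 1.0]],
]

separated = apply_masks(mixture, ratio_masks(scores))
```

Because the masks in each bin add up to one, the three separated tracks sum back to the original mix, so turning one guitar up or down is just a matter of rescaling its track before summing.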

Apparently, it's like how our minds work at a party...you may hear a cluster of voices, but if your eyes focus on a guy twenty feet away who is using various body motions, your brain can pretty solidly radar in and "isolate" his voice so that you can catch just that conversation.

Pretty neat eh?

MIT was calling the thing PixelPlayer three years ago but has just renamed it as the research advanced over the past six months.

Works on existing video as well. My tutorial was for the case where you have no video....but want to extract very difficult stuff.

Heck...you could extract a mixed trumpet/trombone/sax/tuba via video....even if you don't play those instruments. You'd simply hold them and mimic the movements for the camera.

Such interesting possibilities. Such interesting times.

I post this here on purpose for Steinberg/SL to see.

Someone's gonna grab the code and work it into what I've described.

Who will be first?