FFT and breaking down a sound into its harmonics?
Old 10th February 2018
  #1
Lives for gear
 

Thread Starter
FFT and breaking down a sound into its harmonics?

Hi, sorry if this question doesn't make sense; I'm just trying to get to grips with the concept.

But say I have a .wav file of an EDM kick drum, so it has a bit of high, mid, low and sub. Could I use an FFT to break it down into its sine wave heritage (for lack of a better word)?

So assuming all of the above is correct, the next step would be that I have a bunch of sine waves at different frequencies. How would I then combine them to get back to the original sound of that EDM kick drum?
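The round trip being described here is exactly what a forward and inverse FFT do. A minimal Python/NumPy sketch; the "kick" is just a made-up decaying sine and the sample rate is arbitrary:

```python
import numpy as np

# A tiny "kick"-like test signal: a decaying 60 Hz sine (purely illustrative).
sr = 8000                                   # assumed sample rate in Hz
t = np.arange(sr) / sr                      # one second of sample times
kick = np.exp(-8 * t) * np.sin(2 * np.pi * 60 * t)

# Forward FFT: break the waveform into sine/cosine components per frequency bin.
spectrum = np.fft.rfft(kick)
freqs = np.fft.rfftfreq(len(kick), 1 / sr)  # the frequency (Hz) of each bin

# Inverse FFT: recombine every component to get the original waveform back.
reconstructed = np.fft.irfft(spectrum, n=len(kick))
print(np.allclose(kick, reconstructed))     # True: the round trip is numerically exact
```

Combining all the components recovers the original exactly; the interesting choices start when you keep only some of them.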

There are probably huge gaps in my understanding, but if you could correct me where I've gone wrong it would be much appreciated.

Thanks
Old 10th February 2018
  #2
Lives for gear
An FFT works with regular, repeating waves... you can isolate the most important sine waves, and recombining them will usually give you a recognizable version of the sound. That's how the Synclavier worked, 40 years ago when computers weren't particularly powerful.
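The "keep only the most important sine waves" idea can be sketched like this (a hypothetical NumPy example, not how the Synclavier actually did it):

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
# A steady, pitched test tone with three harmonics (made up for illustration).
tone = (1.00 * np.sin(2 * np.pi * 110 * t)
        + 0.50 * np.sin(2 * np.pi * 220 * t)
        + 0.25 * np.sin(2 * np.pi * 330 * t))

spectrum = np.fft.rfft(tone)
magnitudes = np.abs(spectrum)

# Keep only the N strongest bins, discard the rest, then resynthesize.
n_keep = 3
strongest = np.argsort(magnitudes)[-n_keep:]
pruned = np.zeros_like(spectrum)
pruned[strongest] = spectrum[strongest]
approx = np.fft.irfft(pruned, n=len(tone))

# For a clean harmonic tone, a few bins capture essentially the whole sound.
print(np.max(np.abs(tone - approx)) < 1e-6)   # True
```

For a noisy, fast-changing sound like a kick drum, the same pruning throws away much more, which is the point being made in this post.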

Most drum sounds can be thought of as random noise bursts that are processed through filters and envelopes. You can create electronic drums that go "poing" for disco sounds, but not by analyzing the original noise.
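That noise-burst recipe can itself be sketched in a few lines; every parameter below (decay rate, filter coefficient) is made up for illustration:

```python
import numpy as np

sr = 8000
n = sr // 2                                 # half a second of samples
rng = np.random.default_rng(0)

noise = rng.standard_normal(n)              # the raw noise burst
envelope = np.exp(-12 * np.arange(n) / sr)  # fast exponential decay

# One-pole low-pass filter to darken the noise (coefficient is arbitrary).
a = 0.9
filtered = np.empty(n)
state = 0.0
for i in range(n):
    state = a * state + (1 - a) * noise[i]
    filtered[i] = state

drum = filtered * envelope                  # filtered, enveloped noise: a basic drum hit
```

Swapping the filter and envelope, or mixing in a pitched "body" underneath, is what turns the same recipe into kicks, snares, or hats.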

Some modern apps, like Audionamix, are using Neural Networks to isolate various instruments in a track. FFT is part of the input process, but the magic happens within the hidden layer by using pattern recognition of the changes in frequency/level over time.

That's my understanding, and I'm still wrapping my head around Neural Networks in audio. I could be wrong.

What are you trying to accomplish?
Old 11th February 2018
  #3
Lives for gear
 

Thread Starter
Quote:
Originally Posted by Jay Rose View Post
An FFT works with regular, repeating waves... you can isolate the most important sine waves, and recombining them will usually give you a recognizable version of the sound. That's how the Synclavier worked, 40 years ago when computers weren't particularly powerful.

Most drum sounds can be thought of as random noise bursts that are processed through filters and envelopes. You can create electronic drums that go "poing" for disco sounds, but not by analyzing the original noise.

Some modern apps, like Audionamix, are using Neural Networks to isolate various instruments in a track. FFT is part of the input process, but the magic happens within the hidden layer by using pattern recognition of the changes in frequency/level over time.

That's my understanding, and I'm still wrapping my head around Neural Networks in audio. I could be wrong.

What are you trying to accomplish?
You hit the nail on the head mate!

Basically I'm doing a master's degree in AI and I want to use machine learning to generate sounds. So far my experimentation with neural networks has come out quite **** but, to be honest, I haven't trained them for long enough or had enough samples. I was training off the raw .wav data.

I was thinking that if I could split it into its harmonics, so the fundamental and then the frequency of the next one and so on, I'd get a bunch of sine waves that I could try to train my neural net on. Basically I'm thinking of different ways to convert/expand the data to give it more meaning/definition, so it's easier for a neural net or whatever algorithm to learn.
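Extracting "the fundamental and the next partials" as features could look like this (a hypothetical sketch using FFT peak-picking, which only works cleanly for steady harmonic signals, not percussive ones):

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
# Test signal: a 100 Hz fundamental plus two partials (illustrative values).
signal = (np.sin(2 * np.pi * 100 * t)
          + 0.6 * np.sin(2 * np.pi * 200 * t)
          + 0.3 * np.sin(2 * np.pi * 300 * t))

magnitudes = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), 1 / sr)

# Take the strongest bins as (frequency, amplitude) feature pairs for a net.
n_partials = 3
strongest = np.argsort(magnitudes)[-n_partials:]
features = sorted((freqs[i], magnitudes[i]) for i in strongest)
print([f for f, _ in features])   # → [100.0, 200.0, 300.0]
```

A compact (frequency, amplitude) list like this is a far lower-dimensional input than raw .wav samples, which is exactly the kind of "give the data more meaning" step being described.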
Old 12th February 2018
  #4
Lives for gear
 

You'd need to tell your computer whether it's looking at an FFT of a single sound source or of multiple sound sources...
Old 12th February 2018
  #5
Lives for gear
DD! (or deedeeyeah), that wouldn't matter so long as your training samples were properly answer-keyed. The general way to train a NN for a sound is to give it the spectral information (x amplitude in y frequency band, across the whole spectrum of interest) of the sample input, and then the same information for the desired result. Repeat -- and repeat -- until the hidden layers show some reliability at coming up with the same result for a given sample. Audio complicates things a little, since there's also a time variability that's important, but the principle is the same.
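A toy version of that training loop, with a hand-rolled one-hidden-layer network and made-up tones as the answer-keyed data (every size and rate here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
sr, n = 8000, 1024
t = np.arange(n) / sr

def band_energies(x, n_bands=16):
    """'x amplitude in y frequency band': a coarse, normalized spectrum."""
    mags = np.abs(np.fft.rfft(x))
    energies = np.array([band.sum() for band in np.array_split(mags, n_bands)])
    return energies / (energies.sum() + 1e-9)

# Answer-keyed training set: low tones are class 0, high tones class 1.
freqs = (100, 120, 140, 900, 1000, 1100)
X = np.array([band_energies(np.sin(2 * np.pi * f * t)) for f in freqs])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# One hidden layer, trained by repeated full-batch gradient-descent passes.
W1 = rng.standard_normal((X.shape[1], 8)) * 0.5
W2 = rng.standard_normal(8) * 0.5
lr = 0.5
for _ in range(3000):
    h = np.tanh(X @ W1)                 # hidden-layer activations
    p = 1 / (1 + np.exp(-(h @ W2)))     # predicted probability of class 1
    grad_out = (p - y) / len(y)         # cross-entropy gradient at the output
    grad_h = np.outer(grad_out, W2) * (1 - h ** 2)
    W2 -= lr * h.T @ grad_out
    W1 -= lr * X.T @ grad_h

print(np.round(p))                      # after enough passes, matches y
```

The "repeat -- and repeat --" above is literally the loop; real audio nets add the time dimension by feeding a sequence of such spectral frames instead of one.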
Old 12th February 2018
  #6
Lives for gear
 

Thanks for the further technical insight (with which I'm somewhat familiar: I owned a Fairlight!)

I was just wondering about the results if the training material for the machines were 'just' samples of a single instrument versus more complex waveforms (music played by several instruments), and whether it would (probably?) be difficult to use an FFT for the latter...
Old 13th February 2018
  #7
Lives for gear
The problem with training a NN on samples of just one instrument is that it never learns to generalize what that instrument is and isn't. Once the network learns those samples, it'll be fine at identifying those samples again... if it hears that instrument solo. But if the trumpet it's learned is mixed with a flute it has never heard before, it won't have a clue whether this is two instruments (and how to tell them apart), or even if it's a flumpt.

Check the article on my website, jayrose.com. There's a sidebar about training.
Old 14th February 2018
  #8
Lives for gear
 

Thread Starter
I might just use a bunch of bandpass filters to isolate each frequency, since I'm still not fully clued up on how the FFT works.
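A band-pass filter bank along those lines might look like this; the filter is a standard "audio cookbook" biquad, and the center frequencies and Q below are arbitrary:

```python
import numpy as np

def bandpass(x, sr, f0, q=5.0):
    """Cookbook band-pass biquad (0 dB peak gain), run sample by sample."""
    w0 = 2 * np.pi * f0 / sr
    alpha = np.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * np.cos(w0), 1 - alpha
    y = np.zeros_like(x)
    x1 = x2 = y1 = y2 = 0.0
    for i, xn in enumerate(x):
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[i] = yn
    return y

sr = 8000
t = np.arange(sr) / sr
mix = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)

# A small filter bank: one band-pass per center frequency of interest.
bands = {f0: bandpass(mix, sr, f0) for f0 in (100, 1000)}
# Each output is now dominated by its own band's component of the mix.
```

Conceptually this is close to what an FFT gives you anyway (energy per frequency region), just with freely chosen band centers and widths.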

At the moment I'm just training on 40 samples, which are all synths, so there's lots of variation. I might also try a convolutional neural net to see if I get better results.

One of the problems is that I'm only using 40 samples; most ANNs need way more, in the tens of thousands. I'm thinking of making a training set starting from a bunch of sine waves and then incrementally making changes to them, and hopefully it will be easier for the net to learn that way.
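The "start from sine waves and make incremental changes" idea could be sketched as a data generator (all the perturbation sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(42)
sr, n = 8000, 2048
t = np.arange(n) / sr

def make_example(base_freq, step):
    """One training example: a sine with small, incremental perturbations."""
    freq = base_freq * (1 + 0.01 * step)    # drift the pitch slightly per step
    amp = 1.0 - 0.02 * step                 # drift the level slightly per step
    phase = rng.uniform(0, 2 * np.pi)       # random phase for each example
    return amp * np.sin(2 * np.pi * freq * t + phase)

# Grow a training set from a handful of base sines.
dataset = [make_example(f, s) for f in (110, 220, 440) for s in range(10)]
print(len(dataset))                         # → 30 examples from 3 base frequencies
```

The same pattern scales to tens of thousands of examples by widening the frequency list and step count, which is one standard way around a 40-sample dataset.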
Old 15th February 2018
  #9
Lives for gear
 
acreil's Avatar
 

Quote:
Originally Posted by complex View Post
I might just use a bunch of bandpass filters to filter each frequency. Since I'm still not fully clued up on how the FFT works.
What you're thinking of is a constant-Q transform. Gammatone filters and wavelets are other options. The constant-Q transform and gammatone filters correspond more closely to human hearing than a short-time Fourier transform does. Machine learning applications such as speech recognition typically use mel-frequency cepstral coefficients.
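For reference, a minimal mel filterbank (the first step toward mel-frequency cepstral coefficients) might be sketched like this; the mel formulas are standard, while the filter count and FFT size are arbitrary:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bin_points = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bin_points[i], bin_points[i + 1], bin_points[i + 2]
        for b in range(lo, mid):
            bank[i, b] = (b - lo) / max(mid - lo, 1)   # rising slope
        for b in range(mid, hi):
            bank[i, b] = (hi - b) / max(hi - mid, 1)   # falling slope
    return bank

sr, n_fft = 8000, 512
bank = mel_filterbank(20, n_fft, sr)

# Apply to a magnitude spectrum: 257 FFT bins collapse to 20 mel bands.
t = np.arange(n_fft) / sr
mags = np.abs(np.fft.rfft(np.sin(2 * np.pi * 440 * t)))
mel_energies = bank @ mags
print(mel_energies.shape)   # → (20,)
```

Like the constant-Q transform, the bands get wider as frequency rises, which is what makes the representation perceptually motivated.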