electro-music.com   Dedicated to experimental electro-acoustic
and electronic music
 
    Front Page  |  Radio
 |  Media  |  Forum  |  Wiki  |  Links
Forum with support of Syndicator RSS
 FAQFAQ   CalendarCalendar   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   LinksLinks
 RegisterRegister   ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in  Chat RoomChat Room 
 Forum index » DIY Hardware and Software » Developers' Corner
Anyone familiar with mel frequency cepstral coefficients?
Post new topic   Reply to topic Moderators: DrJustice
Page 1 of 1 [5 Posts]
View unread posts
View new posts in the last week
Mark the topic unread :: View previous topic :: View next topic
Author Message
dewdrop_world



Joined: Aug 28, 2006
Posts: 858
Location: Guangzhou, China
Audio files: 4

PostPosted: Mon Sep 11, 2006 1:22 pm    Post subject:  Anyone familiar with mel frequency cepstral coefficients? Reply with quote  Mark this post and the followings unread

I'm looking for pseudocode to calculate MFCCs (mel frequency cepstral coefficients) from a time-domain signal. All I can find online are engineering papers (which cover theory but not implementation), or very basic definitions with absolutely nothing about how to do it. Or, "use this library" and be done with it. No help if I want to do an implementation in SuperCollider.

The intent is to grab microsound events from an input stream using feature detection (SC has PV_JensenAndersen and PV_HainsworthFoote feature detectors) and then analyze them for spectral similarity so that I can spit out grain clouds of similar-but-not-exactly-the-same events. Eventually this will be based on live mic input.

I've been trying all week to work with raw FFT data but it's just not good enough... two sounds can be very different sonically but have surprisingly similar FFT magnitudes.

Thanks.

hjh

(I will also be looking on musicdsp.org but I can't access the site from work, for an unknown reason.)

_________________
ddw online: http://www.dewdrop-world.net
sc3 online: http://supercollider.sourceforge.net
Back to top
View user's profile Send private message Visit poster's website AIM Address
DrJustice



Joined: Sep 13, 2004
Posts: 2112
Location: Morokulien
Audio files: 4

PostPosted: Tue Sep 12, 2006 2:53 pm    Post subject: Reply with quote  Mark this post and the followings unread

Hmm... no replies yet?

Well, I'll give it a try although I'm not familiar with MFCC as such. However, I got curious, and you're right there's very little in the way of real examples out there. I include the definition (for a magnitude only output vector) for the benefit of our readers:

mfcc[] = fft( mel( log( fft( signal[] ) ) ) )

Of course it's the mel() mapping of the spectrum from Hertz to Mel that is the crux. I have no idea how to best choose mel frequency scaling and possibly truncation before the last fft, but I guess experimentation could shed some light on that. The equations to convert from Hz to mel and vice versa are:

mel = 1127.01048 * log( 1 + fhz / 700 )
fhz = 700 *( exp( mel / 1127.01048 ) - 1 )

If you can do the fft and log steps, the bit that's left is the spectral mapping. I assume you can simply interpolate in the Hz spectrum to pick the magnitudes at the frequencies corresponding to your chosen "mel bins" before the last fft, truncating (windowing?) the mel series as necessary when/if you overshoot the Hz spectrum. Perhaps the scaling/truncation can be dealt with by scaling the mel spectrum to exactly span the full Hz spectrum for the number of mel bins you choose.

Below is some code to implement the latter mapping. It assumes that you have the log magnitude spectrum in mag_hz_v[] and that the fft has has enough points (more than the mel spectrum) to give adequate resolution in the low frequencies. The resultant vector, mag_mel_v[], is the remapped spectrum ready for the final fft. The code is a working Lua program; note that the table index in Lua starts at 1.

Code:

-- Mapping of equidistant Hz based spectrum
-- to equidistant Mel based spectrum

fhz_bins  = 16
frq_hz_v  = {0,1000,2000,3000,4000,5000,6000,7000,8000,9000,10000,11000,12000,13000,14000,15000}
mag_hz_v  = {0,  15,  14,  13,  12,  11,  10,   9,   8,   7,    6,    5,    4,    3,    2,    1}

mel_bins  = 8
mag_mel_v = {}

function mel2hz( mel )
  return 700 *(math.exp(mel/1127.01048) -1)
end

function hz2mel( fhz )
  return 1127.01048 * math.log(1+ fhz/700)
end

mel_max      = hz2mel( frq_hz_v[fhz_bins] )
mel_interval = mel_max / (mel_bins-1)
mag_mel_v[1] = mag_hz_v[1]  -- DC component

for i = 2,mel_bins,1 do
  local frq_mel,frq_hz,j
  -- Find the the closest Hz bin
  frq_mel = mel_interval * (i-1)
  frq_hz  = mel2hz( frq_mel )
  j  = 2
  while( frq_hz_v[j] < frq_hz ) do
    j = j + 1
  end
  -- Use the magnitude of the closest Hz bin
  -- (In real life, interpolate between the two closest Hz bins)
  mag_mel_v[i] = mag_hz_v[j]
  print( "frq_mel, j, mag,  = ", frq_mel, j, mag_mel_v[i] )
end

-- Last step of MFCC algorithm, get the coefficients
-- mfcc_v = math.rfft( mel_bins, mag_mel_v )



Don't know if this is of any help, or if it is indeed correct - you may be well past this stage anyway...

DJ
--
Back to top
View user's profile Send private message Visit poster's website
dewdrop_world



Joined: Aug 28, 2006
Posts: 858
Location: Guangzhou, China
Audio files: 4

PostPosted: Thu Sep 14, 2006 12:42 pm    Post subject:   Reply with quote  Mark this post and the followings unread

Thanks, quite helpful.

Turns out I didn't need to do the log and the second fft. I had been differentiating the fft curve and locating places where the slope changes from positive to negative to identify magnitude peaks. Simply mapping the linear frequency scale onto mel frequencies and applying that algorithm is getting entirely acceptable results.

I will post the sc code sooner or later, just have some cleanup to do.

Thanks!
hjh

_________________
ddw online: http://www.dewdrop-world.net
sc3 online: http://supercollider.sourceforge.net
Back to top
View user's profile Send private message Visit poster's website AIM Address
DrJustice



Joined: Sep 13, 2004
Posts: 2112
Location: Morokulien
Audio files: 4

PostPosted: Fri Sep 15, 2006 2:27 pm    Post subject: Reply with quote  Mark this post and the followings unread

I'm glad it was of some use Smile

Do you mean that just taking the fft of the signal and doing a Hz to Mel mapping is sufficient for use as a feature extraction in your case?

I was thinking that it might be nice to give the output of the code so people can see what is going on - I.e. that there is increased spectral resolution Hertz-wise in the lower frequencies (still no interpolation though, so the magnitudes are a bit off):

Code:

The print() statement was changed to include some formatting:

print( string.format("frq_mel = %9.3f (%9.3fHz) mag = %2d", frq_mel,  mel2hz(frq_mel), mag_mel_v[i] ) )

And the output is:

frq_mel =   500.769 (  391.616Hz) mag = 15
frq_mel =  1001.537 ( 1002.321Hz) mag = 14
frq_mel =  1502.306 ( 1954.685Hz) mag = 14
frq_mel =  2003.075 ( 3439.851Hz) mag = 12
frq_mel =  2503.843 ( 5755.893Hz) mag = 10
frq_mel =  3004.612 ( 9367.647Hz) mag =  6
frq_mel =  3505.381 (15000.000Hz) mag =  1


DJ
--
Back to top
View user's profile Send private message Visit poster's website
dewdrop_world



Joined: Aug 28, 2006
Posts: 858
Location: Guangzhou, China
Audio files: 4

PostPosted: Sat Sep 23, 2006 7:53 pm    Post subject: Reply with quote  Mark this post and the followings unread

Yes, the resolution in terms of real frequency is better at the low end.

Here's the beginning and end of the 256-element table. mel increases linearly, fhz starts out with a delta below 20 Hz at first, increasing to around 1300 Hz by the time we hit Nyquist.

I know this isn't truly mathematically ideal but it gets reasonable results to the ear Smile

hjh

mel: 0, fhz: 0 .. 0
mel: 15.325758062803, fhz: 0 .. 21.533203125
mel: 30.651516125605, fhz: 21.533203125 .. 21.533203125
mel: 45.977274188408, fhz: 21.533203125 .. 43.06640625
mel: 61.30303225121, fhz: 43.06640625 .. 43.06640625
mel: 76.628790314013, fhz: 43.06640625 .. 64.599609375
mel: 91.954548376815, fhz: 64.599609375 .. 64.599609375
mel: 107.28030643962, fhz: 64.599609375 .. 86.1328125
mel: 122.60606450242, fhz: 86.1328125 .. 86.1328125
mel: 137.93182256522, fhz: 86.1328125 .. 107.666015625

...

mel: 3816.1137576378, fhz: 19982.8125 .. 20262.744140625
mel: 3831.4395157006, fhz: 20262.744140625 .. 20564.208984375
mel: 3846.7652737634, fhz: 20564.208984375 .. 20844.140625
mel: 3862.0910318263, fhz: 20844.140625 .. 21145.60546875
mel: 3877.4167898891, fhz: 21145.60546875 .. 21447.0703125
mel: 3892.7425479519, fhz: 21447.0703125 .. 21748.53515625
mel: 3908.0683060147, fhz: 21748.53515625 .. 22050


// partial code to make the table, taken out of context
Code:
      // depends on these functions
   ~freqToMel = { |self, fhz|
      1127.01048 * log(1 + (fhz/700))
   };
   
   ~melToFreq = { |self, mel|
      exp(mel / 1127.01048) - 1 * 700
   };

   // ... snip ...

      ~fftsize = nextPowerOfTwo(~fftsize);
      
      resolution = ~sampleRate.(self) / ~fftsize;
      ~melfreqsize = (~fftsize * 0.125).nextPowerOfTwo.asInteger;
      melreso = ~freqToMel.(self, ~sampleRate.(self) * 0.5) / ~melfreqsize;
      
         // melmap[mel_bin_index] == first bin index into fft
      ~melmap = Array.fill(~melfreqsize+1, { |i|
         fhz = ~melToFreq.(self, melreso * i);
         ((fhz + (resolution * 0.5)) / resolution).asInteger
      });

_________________
ddw online: http://www.dewdrop-world.net
sc3 online: http://supercollider.sourceforge.net
Back to top
View user's profile Send private message Visit poster's website AIM Address
Display posts from previous:   
Post new topic   Reply to topic Moderators: DrJustice
Page 1 of 1 [5 Posts]
View unread posts
View new posts in the last week
Mark the topic unread :: View previous topic :: View next topic
 Forum index » DIY Hardware and Software » Developers' Corner
Jump to:  

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Forum with support of Syndicator RSS
Powered by phpBB © 2001, 2005 phpBB Group
Copyright © 2003 through 2009 by electro-music.com - Conditions Of Use