mrjason
Joined: Jul 25, 2006 Posts: 8 Location: Austin, Tx
Posted: Fri Dec 26, 2008 1:03 pm Post subject: what does a wav file data sample value mean?
Hi all,
I've been trying to understand how a wav file works. I'm a software developer by profession, and I understand the header and layout of the data chunks just fine.. but I'm having a hard time figuring out what actual data values translate to in the real world.
For example if we have an 8 bit data chunk in 44.1khz sample for the left channel with a value of "511", what is that? Is that a directive that says make the speaker play a frequency close to 22khz? Are some of the bits there a volume level specifier and some of the bits frequency specifiers?
I'm just having trouble understanding what a sample chunk actually represents. I am a total amateur when it comes to music theory, so maybe I'm missing something here. I've read that an "A" note is 440hz or something like that... so if we had a wav file that has a solid A note fading from zero volume to high volume is that just a 440hz directive over and over or is there some multiplier that will increase volume somehow by setting a subsequent sample frame to a value that means something like 880hz? |

JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
Posted: Fri Dec 26, 2008 1:32 pm Post subject: Re: what does a wav file data sample value mean?
mrjason wrote: | Hi all,
I've been trying to understand how a wav file works. I'm a software developer by profession, and I understand the header and layout of the data chunks just fine.. but I'm having a hard time figuring out what actual data values translate to in the real world.
For example if we have an 8 bit data chunk in 44.1khz sample for the left channel with a value of "511", what is that? |
8 bits can represent a maximum of only +127 down to -128. 511 cannot exist in an 8 bit data stream.
Quote: | Is that a directive that says make the speaker play a frequency close to 22khz? |
No. Wavefile "directives" are found only in the header. The data portion of the file is simply a list of the samples that were acquired during recording. It's just a list of numbers that specify an instantaneous voltage.
Quote: | Are some of the bits there a volume level specifier and some of the bits frequency specifiers? |
Speaker volume is not modulated or controlled by the wav file, other than the fact that (for example) a sinewave with peaks at +127 and -128 will be "twice as loud" as a sinewave with peaks at +63 and -64. The data can thus vary the loudness of the sound in a very natural (if storage-consuming) way.
General speaker volume or "level" is an effect that is added after the wav file is processed. This is done by the sound card's driver and utilities. Again, the sample data portion is simply a list of voltage values that are to be played into a DAC at the sample rate specified in the header. If those samples track a sinewave at 50 Hz, then that's what comes out of the speaker. Once the sample data portion of the file starts - that's all you get. It's an incredibly simple/dumb format.
It's important to note that if you simply graph the data samples from one channel - you'd be able to "see" both frequency and volume. Higher volume portions of the file will have taller peaks. Higher frequency fundamentals will appear with more zero crossings per unit linear measure on the X axis.
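To make that concrete, here's a minimal Python sketch that generates one second of a 440 Hz sinewave and writes it as an 8-bit mono wav file; the filename and constants are just illustrative. (One format detail: 8-bit PCM in a wav file is actually stored as unsigned bytes, 0-255 with 128 as the zero level, so the signed -128..+127 view used in this thread is the same data shifted by 128.)
Code: |
# Minimal sketch: one second of a 440 Hz sine written as an 8-bit mono WAV.
# Note: 8-bit PCM in a WAV is stored unsigned (0..255, 128 = silence level);
# the signed -128..+127 view used in this discussion is the same data
# shifted by 128.
import math
import wave

SAMPLE_RATE = 44100          # samples per second
FREQ = 440.0                 # tone frequency in Hz
AMPLITUDE = 127              # peak value in the signed view

frames = bytearray()
for n in range(SAMPLE_RATE):                                # 1 second of samples
    v = math.sin(2.0 * math.pi * FREQ * n / SAMPLE_RATE)    # -1.0 .. +1.0
    frames.append(128 + int(round(AMPLITUDE * v)))          # unsigned byte

with wave.open("sine440.wav", "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(1)        # 1 byte = 8 bits per sample
    w.setframerate(SAMPLE_RATE)
    w.writeframes(bytes(frames))
|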
Quote: | I'm just having trouble understanding what a sample chunk actually represents. |
After the header, the data is just samples, one after another. For stereo, the data words alternate L-R-L-R-L-R... until the end of the wav file. These sample values are intended to be played or sent to the DAC at whatever sample rate was used to create the file.
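A tiny sketch of that interleaving, assuming the two sample lists already exist (the function name is just for illustration):
Code: |
# Sketch: interleave two equal-length sample lists into the L-R-L-R order
# used by a stereo wav data chunk.
def interleave(left, right):
    frames = []
    for l, r in zip(left, right):
        frames.append(l)   # left channel sample
        frames.append(r)   # right channel sample
    return frames

# e.g. interleave([L0, L1, L2], [R0, R1, R2]) -> [L0, R0, L1, R1, L2, R2]
|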
Quote: | I am a total amateur when it comes to music theory so maybe i'm missing something here. I've read that an "A" note is 440hz or something like that.. |
440 Hz is considered to be in "concert pitch" tuning for the A above middle C. It is a standard used to make sure all instruments in an ensemble play together harmoniously.
Quote: | so if we had a wav file that has a solid A note fading from zero volume to high volume is that just a 440hz directive over and over or is there some multiplier that will increase volume somehow by setting a subsequent sample frame to a value that means something like 880hz? |
Assuming the signal is a sinewave, the values you would see in the data portion of the wav file will start near zero (representing the quietest portion of the fade) and grow to a full scale output where the peaks of the sinewave are at +127 and -128. If the peaks of the sinewave were at +63 and -64, then the sinewave would be at "half volume". In other words, the wav file is an "image" of the signal that was recorded.
If I may suggest - if you open a wav file in a wave editor, you will see what I'm talking about. The waveform you see is just a simple Cartesian graph of the data samples.
This method of recording allows representation of any frequency, at any volume level, with any waveshape possible, given the constraints of the width of the data word and the sample rate.
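As a rough sketch of the fade-in case discussed above (same assumptions as the earlier Python snippet), the fade is nothing more than each sample value being scaled by a ramp:
Code: |
# Sketch: a 440 Hz tone fading in from silence to full scale over one second.
# The "fade" is nothing but each sample value being scaled by a ramp that
# grows from 0.0 to 1.0 - there is no volume directive anywhere in the data.
import math

SAMPLE_RATE = 44100
FREQ = 440.0

samples = []
for n in range(SAMPLE_RATE):
    ramp = n / SAMPLE_RATE                                  # 0.0 -> 1.0
    v = math.sin(2.0 * math.pi * FREQ * n / SAMPLE_RATE)    # -1.0 .. +1.0
    samples.append(int(round(127 * ramp * v)))              # signed view
# Early samples hover near 0; the last ones swing between about -127 and +127.
|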
I hope that helps... _________________ FPGA, dsPIC and Fatman Synth Stuff
Time flies like a banana. Fruit flies when you're having fun. BTW, Do these genes make my ass look fat? corruptio optimi pessima

BananaPlug
Joined: Jul 04, 2007 Posts: 307 Location: Philly
Audio files: 5
Posted: Fri Dec 26, 2008 2:49 pm
Like he said, very low level data. Similar to the representation of pixels in an image file. |

mrjason
Joined: Jul 25, 2006 Posts: 8 Location: Austin, Tx
Posted: Sun Dec 28, 2008 8:41 pm
Yes, thank you for explaining a bit.
Apologies for the confusion. You're correct in that there are only 256 values in 8 bits; I did some quick (bad) math when posting earlier.
I've seen the audio representation in DAWs before, but I assumed (incorrectly?) that what I was seeing was just volume levels.
I'm still a bit confused though. Am I right in understanding, now, that the value for any given sample is simply a volume level?
The confusion is probably best explained this way: consider, for example's sake, a mono wav file with a 1000hz sample rate. Suppose this wav file is one second in length and contains a simple, pure, 440hz max-volume signal for the duration of the file. So we would have 1000 8 bit samples, with 440 of those samples being a value of 127 (or 128?) and the non-440 samples being a value of 0 (or -127?)? Does that sound right?
If we further complicate things, let's say we have the same 1000hz mono sample rate, the same 1 second duration, and the same 440hz tone throughout the entire 1 second... and we add in a 220hz tone for the duration as well. If in our example both tones have max volume, then for starters for the 440hz tone we'd have 440 samples with a value of 128, then half of those same samples would have "128" added to them for the 220hz values and we'd essentially clip on all 220 samples? yeah?
But if we did the same thing and had a volume/voltage value of "15" for both the 440hz and 220hz tones, then we'd have 220 samples with a volume of "30" (where 220 and 440 overlap) and 220 with a value of "15" (where 440 is, but 220 is not), and the other samples would be empty/mute samples?
Is that how it works? |

mrjason
Joined: Jul 25, 2006 Posts: 8 Location: Austin, Tx
Posted: Sun Dec 28, 2008 8:48 pm
Also, one more bit, as to the sinewaves aspect of this conversation: when you say values of +127 and -127, how does that translate to what's going on with volume? How does that translate to what's going on with the speaker/voltage?
Is it possible to have an off-axis-centered sinewave in the wav file with peaks on the positive side being like +100 and the negative side being -20? What would that sound like, or is that not something that happens?
My basic understanding of speakers is that there's an electromagnet that's moving air at a rapid rate... put a certain voltage on the speaker and the magnet attracts or repels and moves air. Perhaps my understanding is too limited though; I thought the voltage was basically on, off, on, off... these negative values make it sound more like perhaps the speaker at rest state is at "0" on the sinewave map and a positive voltage would make the speaker magnet attract and a negative would make it repel... ?
---
and as to the hertz, does a 440hz sample mean the 0 axis is crossed by the sinewave 440 (or 880) times per second? or does that mean there are 440 peaks per second, or both?
and since this is sinewaves rather than simple binary "on, off".. would a 440hz sample have tons of non-zero values but just 440 of them would be the peaks (or axis-cross, or whatever)? |

JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
Posted: Sun Dec 28, 2008 9:10 pm
mrjason wrote: | yes thank you for explaining a bit.
apologies for the confusion. you're correct in that there are only 256 values in 8 bits, I did some quick (bad) math when posting earlier. |
been there... way too often.
Quote: | I've seen the audio representation in DAWs before but I assumed (incorrectly?) that what I was seeing was just volume levels. |
I wouldn't use the term volume, more like voltage, but essentially, yes, each sample is a representation of the signal voltage at one instant in time. The actual full scale voltage doesn't matter except that it's greater than zero and is constant. Full scale in our 8 bit case would be the values 127 and -128, where they might represent +5 volts and -5 volts respectively. A zero would be for zero volts.
Quote: | I'm still a bit confused though. Am I right in understand, now, that the value for any given sample is simply a volume level? |
a level - at an instant in time.
Quote: | The confusion is probably best explained this way: consider for example's sake we have a wav file that is a mono 1000hz sample rate. Suppose this wav file is one second in length and contains a simple, pure, 440hz max volume signal for the duration of the file. So we would have 1000 8 bit samples, with 440 of those samples being a value of 127 (or 128?) and the non-440 samples being a value of 0? (or -127?)? Does that sound right? |
The samples, as a stream, are just values at instants in time. When you "play" them, the sample values each describe an instant in time. So if the first one is a zero, then zero volts are output. The next sample will be played after a specific time delay. This goes on for each sample in the file. Their values then define the waveshape. Like points on a graph, you can see the shape.
Quote: | If we further complicate things let's say we have the same 1000hz mono sample rate, and the same 1 second duration, and the same 440hz tone throughout the entire 1 second.. and we add in a 220hz tone for the duration as well. If in our example both tones have max volume, then for starters for the 440hz tone we'd have 440 samples with a value of 128, then half of those same samples would have "128" added to it for the 220hz values and we'd essentially clip on all 220 samples? yeah? |
Well, you can do the math to know the exact places where that occurs. It doesn't matter though, because you don't want that to happen (usually). When you mix two signals in DSP, you add them - but you're right that two full scale signals mixed (or added) will cause clipping. Whenever you add N signals together, you divide the sum by N and the clipping isn't a problem.
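A minimal sketch of that mixing rule, assuming two full-scale tones and Python as the illustration language:
Code: |
# Sketch: mix a 220 Hz and a 440 Hz tone by adding the samples and dividing
# the sum by the number of signals, so the result can never exceed full scale.
import math

SAMPLE_RATE = 44100
N_SIGNALS = 2

mixed = []
for n in range(SAMPLE_RATE):
    t = n / SAMPLE_RATE
    a = 127 * math.sin(2.0 * math.pi * 220.0 * t)   # full-scale 220 Hz tone
    b = 127 * math.sin(2.0 * math.pi * 440.0 * t)   # full-scale 440 Hz tone
    mixed.append(int(round((a + b) / N_SIGNALS)))   # stays within -127..+127
|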
Quote: | But if we did the same thing and had a volume/voltage value of "15" for both the 440hz and 220 hz tones then we'd have 220 samples with a volume of "30" (where 220 and 440 overlap) and 220 with a value of "15" (where 440 is, but 220 is not) and the other samples would be empty/mute samples?
Is that how it works? |
?? not altogether sure what you're describing - do you mean this:
Given 2 signals, one with a frequency of 220 Hz and a maximum level (the level of the positive peaks of the waveform) of 15, and a second signal with a frequency of 440 Hz also with a maximum level of 15, there will be places within the sample stream that could be as high as 30.
If you make each signal have a value of 63 for a maximum level, the combined signal will play without clipping. _________________ FPGA, dsPIC and Fatman Synth Stuff
Time flies like a banana. Fruit flies when you're having fun. BTW, Do these genes make my ass look fat? corruptio optimi pessima

JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
Posted: Sun Dec 28, 2008 9:23 pm
mrjason wrote: | also, one more bit, as to the sinewaves aspect of this conversation. when you say values of +127 and -127, how does that translate to what's going on with volume? how does that translate to what's going on with the speaker/voltage? |
A signal with peaks at +127 and -127 will have twice the amplitude of one with peaks at +63 and -63. If you saw them on an oscilloscope, you'd see that one is taller than the other, the taller being the louder.
Quote: | is it possible to have an off-axis-centered sound sinewav in the wav file with peaks on the positive side being like +100 and the neg side being -20? what would that sound like, or is that not something that happens? |
What makes you say "that's a violin I hear" is the waveshape of the signal from the instrument. A sinewave sounds different from a square wave because of its shape (and the harmonics it possesses - a sinewave has only one, where the square wave has an infinite number). The fact that a sinewave has a DC offset doesn't change how it sounds. Timbre comes from the number and level of harmonics a signal has. DC offset is silent.
Quote: | my basic understanding of speakers is that there's an electro magnet that's moving air at a rapid rate.. put a certain voltage rate on the speaker and the magnet attracts or repels and moves air. perhaps my understanding is too limited though, I thought the voltage was basically on, off, on, off.. |
No, not on and off. Varying through a range of voltages, like from -10 volts to +10 volts. A sinewave has both negative and positive peaks. Negative values just move the speaker cone in the opposite direction from positive values. Higher values (absolute values, actually) move the cone farther than smaller absolute values. That's how the samples represent the waveshape, by gradually or quickly moving the cone as described by the stream of numbers in the wav file.
Quote: | these negative values make it sound more like perhaps the speaker at rest state is at "0" on the sinewave map and a positive voltage would make the speaker magnet attract and a negative would make it repel... ? |
Yes. It really depends on how the speakers are hooked up, but + values might make the cone move out while - values make it move in. This way, values that go from + to - and back again thousands of times per second make a sound you can hear; they make the cone vibrate.
Quote: | and as to the hertz, does a 440hz sample mean the 0 axis is crossed by the sinewave 440 (or 880) times per second? or does that mean there are 440 peaks per second, or both?
|
A 440 Hz sinewave will have 880 zero crossings in a second. You also have 2 peaks per cycle, so those too would number 880.
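If you want to check the 880 figure yourself, a quick sketch that counts sign changes in one second of a sampled 440 Hz sinewave:
Code: |
# Sketch: count sign changes (zero crossings) in one second of a 440 Hz sine
# sampled at 44.1 kHz.
import math

SAMPLE_RATE = 44100
FREQ = 440.0

samples = [math.sin(2.0 * math.pi * FREQ * n / SAMPLE_RATE)
           for n in range(SAMPLE_RATE)]
crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
print(crossings)   # roughly 880: two crossings for each of the 440 cycles
|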
Quote: | and since this is sinewaves rather than simple binary "on, off".. would a 440hz sample have tons of non-zero values but just 440 of them would be the peaks (or axis-cross, or whatever)? |
Yes. well, 880, but yes. Tons of nonzero values is correct. Last edited by JovianPyx on Sun Dec 28, 2008 9:27 pm; edited 1 time in total |

mrjason
Joined: Jul 25, 2006 Posts: 8 Location: Austin, Tx
Posted: Sun Dec 28, 2008 9:24 pm
thanks for the clarification.
i thought of a better example to sum all of this up.
consider a 1000hz mono sample 1-second duration wav file.
that file will have 1000 8 bit samples in it that should be played 1 millisecond apart.
if this file had a single pure max-voltage 100hz tone in it then one cycle should last 10 samples, correct:?
sample 1: 0% of 127
sample 2: 50% of 127
sample 3: 100% of 127
sample 4: 50% of 127
sample 5: 0% of 127
sample 6: -50% of 127
sample 7: -100% of 127
sample 8: -50% of 127
sample 9: 0% of 127
(clearly my math is bad here because I've got 9 samples, and really the first 8 would be the cycle that repeats.. if I divide 1000hz by 8 samples per cycle I get that this example above is actually .. 125hz?)
or something along those lines? |

JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
Posted: Sun Dec 28, 2008 9:30 pm
mrjason wrote: | thanks for the clarification.
i thought of a better example to sum all of this up.
consider a 1000hz mono sample 1-second duration wav file.
that file will have 1000 8 bit samples in it that should be played 1 millisecond apart.
if this file had a single pure max-voltage 100hz tone in it then one cycle should last 10 samples, correct:?
sample 1: 0% of 127
sample 2: 50% of 127
sample 3: 100% of 127
sample 4: 50% of 127
sample 5: 0% of 127
sample 6: -50% of 127
sample 7: -100% of 127
sample 8: -50% of 127
sample 9: 0% of 127
(clearly my math is bad here because I've got 9 samples, and really the first 8 would be the cycle that repeats.. if I divide 1000hz by 8 samples per cycle I get that this example above is actually .. 125hz?)
or something along those lines? |
Yes. One thing about the math: I see what you're doing, but in reality you don't always get a true zero crossing represented by a zero value in the data. More likely you'll see a sample with a very small positive value followed by another sample with a very small negative value. The zero happened sometime between those samples.
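To see the actual numbers for that 8-sample cycle, here's a small sketch (125 Hz at a 1000 Hz sample rate, peak 127):
Code: |
# Sketch: the actual 8-bit signed values for one cycle of a 125 Hz sine
# sampled at 1000 Hz (8 samples per cycle), peak value 127.
import math

SAMPLE_RATE = 1000
FREQ = 125.0

for n in range(8):
    v = math.sin(2.0 * math.pi * FREQ * n / SAMPLE_RATE)
    print(n, int(round(127 * v)))
# Prints 0, 90, 127, 90, 0, -90, -127, -90: the "50%" points are really
# sin(45 degrees), about 71% of full scale, and with a different starting
# phase the zero crossings would fall between samples rather than on them.
|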
_________________ FPGA, dsPIC and Fatman Synth Stuff
Time flies like a banana. Fruit flies when you're having fun. BTW, Do these genes make my ass look fat? corruptio optimi pessima

mrjason
Joined: Jul 25, 2006 Posts: 8 Location: Austin, Tx
Posted: Sun Dec 28, 2008 9:40 pm
I see. Thanks so much for clarifying all of this for me. |

urbanscallywag
Joined: Nov 30, 2007 Posts: 317 Location: sometimes
Posted: Sun Dec 28, 2008 11:15 pm
Hah, what's interesting there is that you can see the negative DC bias effect of truncation. The magnitude of the positive half cycle is one less than the negative half cycle. DC offset is a nightmare, analog or digital!  |

BananaPlug
Joined: Jul 04, 2007 Posts: 307 Location: Philly
Audio files: 5
Posted: Mon Dec 29, 2008 3:21 pm
Quote: | i thought of a better example to sum all of this up.
consider a 1000hz mono sample 1-second duration wav file.
that file will have 1000 8 bit samples in it that should be played 1 millisecond apart. |
Forgive me if I missed something but the whole idea of sampling rate seems to have been left by the side of the road. Typically a WAV file is going to have a sampling rate of 44.1kHz (like a CD), so your 1-second duration wav file would have 44,100 samples regardless of its content. |

JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
Posted: Mon Dec 29, 2008 4:06 pm
Well, while it's typical for a wav file to have a 44.1 KHz sample rate, it is not cast in stone. It could actually have a sample rate of 1000 Hz, though that would not be useful for signals above 500 Hz (realistically, somewhat less than that).
In fact, the digital synths I've designed have sample rates well above 44.1 KHz, one with a sample rate of 1.0 MHz, the others are 250 KHz and 200 KHz.
Also, there are uses for very low sample rates - when the signal being sampled has a low frequency. For example, if you are sampling a 60 Hz power signal you could use a low sample rate (perhaps 6 KHz or even 600 Hz) to conserve data storage space, assuming that you can still extract the desired information at that sample rate.
For audio, yes, 44.1 KHz is typical, just not universal. _________________ FPGA, dsPIC and Fatman Synth Stuff
Time flies like a banana. Fruit flies when you're having fun. BTW, Do these genes make my ass look fat? corruptio optimi pessima

BananaPlug
Joined: Jul 04, 2007 Posts: 307 Location: Philly
Audio files: 5
Posted: Mon Dec 29, 2008 5:04 pm
Quote: | For audio, yes, 44.1 KHz is typical, just not universal. |
Yup. That's exactly why I said "typically."
I only brought it up because taking enough samples is every bit as important as taking accurate samples. |

JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
Posted: Mon Dec 29, 2008 5:25 pm
I understand, and you were quite correct; I was merely expanding, really for the OP, who seems interested in DSP and, being a programmer, probably already has the math skills to do interesting things. I just wanted to make sure he understands that really all sample rates are useful, but they must be appropriate for the sampled signals, i.e., the Nyquist limit.
I'll give a bizarre example you may appreciate: I have a Cirrus CS4344 DAC. 24 bit stereo (delta-sigma) with sample rates up to 200 KHz both channels driven. It will go much lower, however. Anyway, it's an I2S interface and has some master clock constraints as well as demanding that the master clock be a multiple N of shift clock, that N being one of 8 (I think) different values based on 32 and 48. The largest of those "dividers" is 1152. But the DAC has an upper limit of 50 MHz for master clock.
50 MHz / 1152 = 43.4 KHz (approx).
43.4 KHz is only a hair under 44.1 KHz. But the fact that the setup naturally provides for 1152 clocks per DAC enable means that I can do a lot of math and logic per sample. I think the difference between the standard 44.1 KHz and 43.4 KHz will go unnoticed. The lower Nyquist means small adjustments, but realistically it should be as capable as 44.1 KHz in terms of music.
The point is that 43.4 KHz is anything but standard, but it works well enough that the clock count advantage makes a bigger difference (to me).
Sorry for the blather. Got nuttin' to do today...  _________________ FPGA, dsPIC and Fatman Synth Stuff
Time flies like a banana. Fruit flies when you're having fun. BTW, Do these genes make my ass look fat? corruptio optimi pessima

urbanscallywag
Joined: Nov 30, 2007 Posts: 317 Location: sometimes
Posted: Mon Dec 29, 2008 5:58 pm
After designing some audio DSP I don't think I'd go back to 44.1 ksps instead of 48 ksps. The transition band between the signal of interest and the half sample rate is just too small at 44.1 ksps.
I also wouldn't use 96+ ksps unless necessary. |

JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
Posted: Mon Dec 29, 2008 6:16 pm
I use the highest sample rate a design will allow when designing a synthesizer (which I design for my own use). This comes down to balancing sample rate against other factors such as voice complexity or voice count.
And you may not agree with this - but I've had quite satisfactory results with synthesizer designs using high sample rates as a way of reducing audible aliasing artifacts as opposed to a more complex design that can run at lower sample rates. (though I've experimented with other designs using lower sample rates with very good results)
I have an odd way of looking at sample rate, kind of like the calculus notion of 1/n where as you increase n, the value approaches zero. As you increase sample rate, the representation approaches a continuous time signal. Approaches, but never attains. So I like to keep sample rates as high as I can. This is one reason I enjoy designing for FPGAs. _________________ FPGA, dsPIC and Fatman Synth Stuff
Time flies like a banana. Fruit flies when you're having fun. BTW, Do these genes make my ass look fat? corruptio optimi pessima

urbanscallywag
Joined: Nov 30, 2007 Posts: 317 Location: sometimes
Posted: Mon Dec 29, 2008 7:10 pm
I agree that you'll have less aliasing at high sample rates in waveform generation, for example. It might give better results than a more complicated design at a lower sample rate.
But (in general) I don't agree with keeping data at those higher sample rates when it isn't necessary. It's largely a waste of computations/storage/etc. Some exceptions come in digital-to-analog and analog-to-digital conversion.
For example, suppose I have an unnecessarily oversampled audio signal, say 192 ksps. The number of taps in a FIR filter is inversely proportional to the transition band over the sample rate, Ft/Fs. So if my sample rate could have been 48 ksps but I'm using 192 ksps, I have to use 4 times as many taps to build the same FIR filter (and it's running at 4 times the clock rate, right?!). Recursive filters have to use higher orders, and they are more sensitive to finite arithmetic. The impulse response gets longer, making the filter sound bad. You can't use polyphase tricks if you aren't willing to go to a lower sample rate either.
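As a rough illustration of that scaling, here's a sketch using a common rule-of-thumb estimate for FIR length (the "harris approximation", N ~ atten_dB / (22 * transition/Fs)), not anything from this thread:
Code: |
# Sketch: a common rule-of-thumb estimate ("harris approximation") for FIR
# length, N ~ atten_dB / (22 * transition_Hz / sample_rate_Hz), used here
# only to illustrate how the tap count scales with sample rate.
def fir_taps_estimate(atten_db, transition_hz, sample_rate_hz):
    return int(round(atten_db / (22.0 * transition_hz / sample_rate_hz)))

# Same 2 kHz transition band and 60 dB stopband at two sample rates:
print(fir_taps_estimate(60.0, 2000.0, 48000.0))    # ~65 taps
print(fir_taps_estimate(60.0, 2000.0, 192000.0))   # ~262 taps, about 4x
|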
So high sample rates have some use, but in general stick to minimal oversampling. |

JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
Posted: Mon Dec 29, 2008 7:26 pm
It really depends on the application. I can see where both approaches are quite valid, equally so. _________________ FPGA, dsPIC and Fatman Synth Stuff
Time flies like a banana. Fruit flies when you're having fun. BTW, Do these genes make my ass look fat? corruptio optimi pessima

urbanscallywag
Joined: Nov 30, 2007 Posts: 317 Location: sometimes
Posted: Mon Dec 29, 2008 7:29 pm
Like I said, there are places for oversampling, but most of the time it's a waste. |

JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
Posted: Mon Dec 29, 2008 7:58 pm
I would disagree that "most of the time" it's a waste. It really depends on what you are trying to accomplish. I have a set of applications that demand very high oversampling at one stage. The result uses a lower sample rate to the DAC, but internally it runs much higher. I am about to take the internal sample rate even higher since I've heard from others who've tried it with improved results. One example of this is the use of highly oversampled generation of naive waveforms sent to a large FIR filter with a cutoff at the DAC sample rate's Nyquist frequency. A higher oversample rate allows higher musical fundamentals to be used with less amplitude ripple at the high end. Currently, I'm getting to about 4.7 KHz before I hear the ripple. I'd like to push that up to 6 KHz if I can.
I am writing about synth design, not general audio design. Is that the difference? With synth design, I see a relaxation of other parameters when sample rate is increased. As long as my device (FPGA) will do it, I put no constraint on sample rate. There is no "too high" in my design world. _________________ FPGA, dsPIC and Fatman Synth Stuff
Time flies like a banana. Fruit flies when you're having fun. BTW, Do these genes make my ass look fat? corruptio optimi pessima

urbanscallywag
Joined: Nov 30, 2007 Posts: 317 Location: sometimes
Posted: Mon Dec 29, 2008 8:08 pm
Hey, it's OK - I know a guy who designed a sample rate converter with 140dB dynamic range and I don't agree with that either. Waste away.  |

urbanscallywag
Joined: Nov 30, 2007 Posts: 317 Location: sometimes
Posted: Mon Dec 29, 2008 9:00 pm
Do you use FIR or IIR as a "musical" filter? Do they get more relaxed requirements at higher sample rates?
I haven't had time to investigate the infamous state variable filter that actually works better at high sample rates.  |

JovianPyx
Joined: Nov 20, 2007 Posts: 1988 Location: West Red Spot, Jupiter
Audio files: 224
Posted: Mon Dec 29, 2008 10:32 pm
I've used only IIR filters for musical filters.
I first tried a simple single stage IIR filter which was lackluster.
I then tried a SVF, which was much, much better - but I knew about its higher sample rate constraint. OK, so I run the design at an appropriately high rate. What is nice about a state variable filter is that it is resonant and very easy to tune: simply change one number to change cutoff, change another number to change Q. It is also a common analog construct. But because it has positive gain in the passband near cutoff, amplitude must be compensated. I have a simple subtractive monosynth (MIDI) that runs at a 1 MHz sample rate, has an SVF and sounds very analog (to me...). It does portamento without artifacts.
Here are two samples:
http://www.fpga.synth.net/pmwiki/uploads/FPGASynth/GateMan_II_notes.mp3 (crappy clip, I know, it clips, but it still demonstrates the filter)
http://www.fpga.synth.net/pmwiki/uploads/FPGASynth/metal_pipe1.mp3
Portamento is definitely in use in the second of them. The second uses two oscillators hard-synced to two other oscillators. Note that portamento is implemented by passing the pitch signal through a single-stage IIR filter.
According to literature I've read, an SVF can handle a cutoff up to only 1/6 of the sample rate. This is why I chose to run the sample rate as high as possible (1.0 MHz) to get the widest Fc range. Arithmetic word size determines the Q range, wider being better - hence the freedom FPGAs give in this regard.
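For reference, a textbook Chamberlin-style SVF sketch in Python (not the poster's FPGA implementation), showing the one-coefficient cutoff and damping controls and the roughly Fs/6 cutoff limit:
Code: |
# Sketch of a textbook Chamberlin-style digital state variable filter:
# one coefficient sets cutoff, one sets damping (1/Q). The usual guidance
# is to keep fc below roughly fs/6 for stability.
import math

class StateVariableFilter:
    def __init__(self, fs, fc, q):
        self.f = 2.0 * math.sin(math.pi * fc / fs)   # cutoff coefficient
        self.damp = 1.0 / q                          # damping = 1/Q
        self.low = 0.0
        self.band = 0.0

    def process(self, x):
        self.low += self.f * self.band
        high = x - self.low - self.damp * self.band
        self.band += self.f * high
        return self.low, self.band, high             # lowpass, bandpass, highpass

# Example: a 1 kHz resonant filter at a 48 kHz sample rate
svf = StateVariableFilter(fs=48000.0, fc=1000.0, q=5.0)
lp, bp, hp = svf.process(1.0)
|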
Also, GateManPoly uses one SVF per voice (8 voices). The higher sample rate allows this simplicity. _________________ FPGA, dsPIC and Fatman Synth Stuff
Time flies like a banana. Fruit flies when you're having fun. BTW, Do these genes make my ass look fat? corruptio optimi pessima
Last edited by JovianPyx on Mon Dec 29, 2008 11:26 pm; edited 4 times in total |