The evolution of a data structure - the WAVEFORMAT.

10/18/2007

In the beginning, there was a need to be able to describe the format contained in a stream of audio data.

And thus the WAVEFORMAT structure was born in Windows 3.1.

typedef struct WAVEFORMAT {
    WORD  wFormatTag;
    WORD  nChannels;
    DWORD nSamplesPerSec;
    DWORD nAvgBytesPerSec;
    WORD  nBlockAlign;
} WAVEFORMAT;

The problem with WAVEFORMAT is that it was OK at describing audio streams whose samples were a power-of-2 size. It had no way to represent streams whose samples were some other size (such as 24-bit samples), because it has no field that records the number of bits in each sample.

So the PCMWAVEFORMAT was born.

typedef struct PCMWAVEFORMAT {
WAVEFORMAT wf;
WORD wBitsPerSample;
} PCMWAVEFORMAT;

If the application passed in a WAVEFORMAT with a wFormatTag of WAVE_FORMAT_PCM, it was required to actually pass in a PCMWAVEFORMAT so that the audio infrastructure could determine the number of bits per sample.
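For PCM, the fields are tied together in a fixed way:

- nBlockAlign is nChannels * wBitsPerSample / 8.
- nAvgBytesPerSec is nSamplesPerSec * nBlockAlign.

Here's a minimal sketch of filling in a PCMWAVEFORMAT under those rules. It makes these assumptions:

- The WORD/DWORD typedefs and the structures are stand-ins so it compiles without the Windows headers. Real code would include <windows.h> and <mmsystem.h>.
- The structures are byte-packed, as the SDK headers declare them.
- make_pcm_format is a hypothetical helper, not a Windows API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the Windows types (assumption: real code uses the SDK headers). */
typedef uint16_t WORD;
typedef uint32_t DWORD;

#define WAVE_FORMAT_PCM 1

#pragma pack(push, 1)   /* the SDK declares these structures byte-packed */
typedef struct {
    WORD  wFormatTag;
    WORD  nChannels;
    DWORD nSamplesPerSec;
    DWORD nAvgBytesPerSec;
    WORD  nBlockAlign;
} WAVEFORMAT;

typedef struct {
    WAVEFORMAT wf;
    WORD       wBitsPerSample;
} PCMWAVEFORMAT;
#pragma pack(pop)

static_assert(sizeof(PCMWAVEFORMAT) == 16, "expected byte-packed layout");

/* Hypothetical helper: describe an uncompressed PCM stream. */
static PCMWAVEFORMAT make_pcm_format(WORD channels, DWORD samplesPerSec,
                                     WORD bitsPerSample)
{
    PCMWAVEFORMAT pcm;
    pcm.wf.wFormatTag      = WAVE_FORMAT_PCM;
    pcm.wf.nChannels       = channels;
    pcm.wf.nSamplesPerSec  = samplesPerSec;
    /* One block holds one sample for every channel. */
    pcm.wf.nBlockAlign     = (WORD)(channels * bitsPerSample / 8);
    pcm.wf.nAvgBytesPerSec = samplesPerSec * pcm.wf.nBlockAlign;
    /* The field WAVEFORMAT was missing. */
    pcm.wBitsPerSample     = bitsPerSample;
    return pcm;
}
```

For CD-quality audio (2 channels, 44,100 samples per second, 16 bits) this gives a 4-byte block and 176,400 bytes per second.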

That worked fine and solved that problem, but the powers that be quickly realized that relying on the format tag for extensibility was going to be a problem in the future.

So once again, the structure was extended, and for Windows NT 3.5 and Windows 95, we got the WAVEFORMATEX that we know and love:

typedef struct tWAVEFORMATEX
{
    WORD        wFormatTag;         /* format type */
    WORD        nChannels;          /* number of channels (i.e. mono, stereo...) */
    DWORD       nSamplesPerSec;     /* sample rate */
    DWORD       nAvgBytesPerSec;    /* for buffer estimation */
    WORD        nBlockAlign;        /* block size of data */
    WORD        wBitsPerSample;     /* number of bits per sample of mono data */
    WORD        cbSize;             /* the count in bytes of the size of */
                                    /* extra information (after cbSize) */
} WAVEFORMATEX, *PWAVEFORMATEX, NEAR *NPWAVEFORMATEX, FAR *LPWAVEFORMATEX;

This solved the problem somewhat, but it introduced a new one. Although all the APIs were changed to take a WAVEFORMATEX, some applications still passed a WAVEFORMAT to the APIs, and some WAV files had been authored with WAVEFORMAT structures. The root of the issue is that there was no way to distinguish between a WAVEFORMAT (which has no cbSize field) and a WAVEFORMATEX (which does). For WAVEFORMAT structures stored in files, the file metadata (the size of the "fmt " chunk that holds the structure) records the size of the structure. That size can be used to tell the various forms apart.
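Here's a sketch of the file case. It assumes a parser that has already located the "fmt " chunk of a RIFF/WAV file and read its size. The size alone tells the forms apart, because the byte-packed structures have these lengths:

- WAVEFORMAT: 14 bytes.
- PCMWAVEFORMAT: 16 bytes.
- WAVEFORMATEX: 18 + cbSize bytes.

The names classify_fmt_chunk and cbsize_fits are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Which structure a "fmt " chunk holds, judged only by its size. */
typedef enum {
    FMT_INVALID,
    FMT_WAVEFORMAT,      /* 14 bytes: no wBitsPerSample, no cbSize */
    FMT_PCMWAVEFORMAT,   /* 16 bytes: wBitsPerSample, but no cbSize */
    FMT_WAVEFORMATEX     /* 18+ bytes: cbSize present, extra bytes may follow */
} FmtKind;

/* Hypothetical helper: classify a chunk by the size recorded in its header. */
static FmtKind classify_fmt_chunk(uint32_t chunkSize)
{
    if (chunkSize == 14) return FMT_WAVEFORMAT;
    if (chunkSize == 16) return FMT_PCMWAVEFORMAT;
    if (chunkSize >= 18) return FMT_WAVEFORMATEX;
    return FMT_INVALID;
}

/* Hypothetical helper: for a WAVEFORMATEX, check that the extra bytes
   announced by cbSize (bytes 16-17, little-endian) fit inside the chunk. */
static int cbsize_fits(const uint8_t *fmt, uint32_t chunkSize)
{
    assert(chunkSize >= 18);   /* caller must have classified the chunk first */
    uint16_t cbSize = (uint16_t)(fmt[16] | (fmt[17] << 8));
    return chunkSize >= 18u + cbSize;
}
```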

When the structure is passed as a parameter to a function, there is no such size to go on. In that case, the code that parses a WAVEFORMATEX structure must rely on a convention. If the wFormatTag field is WAVE_FORMAT_PCM, the structure is actually a PCMWAVEFORMAT. The code treats that as a WAVEFORMATEX whose cbSize is 0; the cbSize field itself isn't present, so it must not be read. For all other formats, the code simply assumes that the caller passed in a WAVEFORMATEX structure.
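The parameter case can be sketched as follows. Some names here are assumptions:

- wave_format_size and to_waveformatex are hypothetical helpers, not Windows APIs.
- The WAVEFORMATEX typedef is a byte-packed stand-in for the SDK declaration.

The key point is that for WAVE_FORMAT_PCM the code never reads cbSize, because a caller who passed a 16-byte PCMWAVEFORMAT doesn't have one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t WORD;    /* stand-ins for the Windows types */
typedef uint32_t DWORD;
#define WAVE_FORMAT_PCM 1

#pragma pack(push, 1)
typedef struct {
    WORD  wFormatTag;
    WORD  nChannels;
    DWORD nSamplesPerSec;
    DWORD nAvgBytesPerSec;
    WORD  nBlockAlign;
    WORD  wBitsPerSample;
    WORD  cbSize;
} WAVEFORMATEX;
#pragma pack(pop)

static_assert(sizeof(WAVEFORMATEX) == 18, "expected byte-packed layout");

/* Hypothetical helper: how many bytes the caller actually handed us. */
static size_t wave_format_size(const WAVEFORMATEX *pwfx)
{
    if (pwfx->wFormatTag == WAVE_FORMAT_PCM)
        return offsetof(WAVEFORMATEX, cbSize);   /* 16: a PCMWAVEFORMAT */
    return sizeof(WAVEFORMATEX) + pwfx->cbSize;  /* trust cbSize */
}

/* Hypothetical helper: copy the caller's format into a full WAVEFORMATEX,
   leaving cbSize at 0 when only a PCMWAVEFORMAT was supplied. */
static WAVEFORMATEX to_waveformatex(const WAVEFORMATEX *pwfx)
{
    WAVEFORMATEX out;
    size_t n = wave_format_size(pwfx);
    memset(&out, 0, sizeof out);
    memcpy(&out, pwfx, n < sizeof out ? n : sizeof out);
    return out;
}
```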

Unfortunately, the introduction of WAVEFORMATEX wasn't quite enough. When you're dealing with two-channel audio streams, it's easy to simply say that channel 0 is left and channel 1 is right (or whatever). But with a multichannel audio stream, there's no way to determine which channel goes with which speaker. In addition, WAVEFORMATEX still has a problem with non-power-of-2 formats. This time, the problem shows up when you take 24-bit samples and pack each one into a 32-bit container. Doing this can dramatically speed up any manipulation that needs to be done on the samples, so it's highly desirable. However, WAVEFORMATEX can only say that each sample occupies 32 bits (via wBitsPerSample); it has no way to say that only 24 of those bits are valid.
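Here's a minimal sketch of what packing 24-bit samples into 32-bit containers means in practice. It makes these assumptions:

- The input is little-endian 24-bit PCM.
- The valid bits go in the most significant bits of the container, which is the usual convention.
- The function names are hypothetical.

Once widened, each sample is a naturally aligned int32_t. It can be scaled, mixed or compared with ordinary integer arithmetic, with no byte shuffling per operation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen one packed 24-bit little-endian sample (3 bytes) into a 32-bit
   container: the 24 valid bits go in the top of the word and the low byte
   is zero, so the sample's sign bit lands on bit 31. */
static int32_t widen_24_to_32(const uint8_t in[3])
{
    uint32_t bits = ((uint32_t)in[0] << 8)
                  | ((uint32_t)in[1] << 16)
                  | ((uint32_t)in[2] << 24);
    int32_t out;
    memcpy(&out, &bits, sizeof out);   /* reinterpret the two's-complement bits */
    return out;
}

/* Widen a whole buffer of `count` packed 24-bit samples. */
static void widen_buffer(const uint8_t *in, int32_t *out, size_t count)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < count; i++)
        out[i] = widen_24_to_32(in + 3 * i);
}
```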

So one final enhancement was made to the WAVEFORMAT structure, the WAVEFORMATEXTENSIBLE (introduced in Windows 2000):

typedef struct {
    WAVEFORMATEX    Format;
    union {
        WORD wValidBitsPerSample;       /* bits of precision */
        WORD wSamplesPerBlock;          /* valid if wBitsPerSample==0 */
        WORD wReserved;                 /* If neither applies, set to zero. */
    } Samples;
    DWORD           dwChannelMask;      /* which channels are */
                                        /* present in stream */
    GUID            SubFormat;
} WAVEFORMATEXTENSIBLE, *PWAVEFORMATEXTENSIBLE;

The WAVEFORMATEXTENSIBLE contains the old WAVEFORMATEX and adds a couple of fields:

- wValidBitsPerSample lets the caller specify how the samples are packed, by saying how many bits of each container are meaningful.
- dwChannelMask lets the caller describe which channels in the stream should be routed to which speakers.

The channels appear in the stream in the order of the bits set in dwChannelMask, from the least significant bit up. For example, suppose dwChannelMask is SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT | SPEAKER_LOW_FREQUENCY | SPEAKER_TOP_FRONT_LEFT. Then channel 0 is the front left channel, channel 1 is the front right channel, channel 2 is the subwoofer, and channel 3 is the top front left speaker.

The way you identify a WAVEFORMATEXTENSIBLE is that the Format.wFormatTag field is set to WAVE_FORMAT_EXTENSIBLE and the Format.cbSize field is set to 0x16. That is 22 bytes, the size of the fields that follow the embedded WAVEFORMATEX.
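Here's a sketch that does two things:

- It builds a WAVEFORMATEXTENSIBLE for the four-channel example above, with 24 valid bits in 32-bit containers.
- It maps a channel index back to its speaker bit.

It makes these assumptions:

- The type and constant definitions are stand-ins for the ones in <mmreg.h> and <ksmedia.h>, assumed to match the SDK values.
- SUBTYPE_PCM mirrors KSDATAFORMAT_SUBTYPE_PCM.
- make_extensible and speaker_for_channel are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t WORD;    /* stand-ins for the Windows types */
typedef uint32_t DWORD;
typedef struct { uint32_t Data1; uint16_t Data2, Data3; uint8_t Data4[8]; } GUID;

#define WAVE_FORMAT_EXTENSIBLE 0xFFFE
#define SPEAKER_FRONT_LEFT     0x1
#define SPEAKER_FRONT_RIGHT    0x2
#define SPEAKER_LOW_FREQUENCY  0x8
#define SPEAKER_TOP_FRONT_LEFT 0x1000

#pragma pack(push, 1)
typedef struct {
    WORD wFormatTag, nChannels;
    DWORD nSamplesPerSec, nAvgBytesPerSec;
    WORD nBlockAlign, wBitsPerSample, cbSize;
} WAVEFORMATEX;

typedef struct {
    WAVEFORMATEX Format;
    union { WORD wValidBitsPerSample, wSamplesPerBlock, wReserved; } Samples;
    DWORD dwChannelMask;
    GUID  SubFormat;
} WAVEFORMATEXTENSIBLE;
#pragma pack(pop)

static_assert(sizeof(WAVEFORMATEXTENSIBLE) - sizeof(WAVEFORMATEX) == 0x16,
              "cbSize of a WAVEFORMATEXTENSIBLE is 22");

/* KSDATAFORMAT_SUBTYPE_PCM: 00000001-0000-0010-8000-00AA00389B71 */
static const GUID SUBTYPE_PCM =
    {0x00000001, 0x0000, 0x0010, {0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71}};

/* Hypothetical helper: 24-bit PCM samples carried in 32-bit containers. */
static WAVEFORMATEXTENSIBLE make_extensible(DWORD mask, WORD channels, DWORD rate)
{
    WAVEFORMATEXTENSIBLE x = {0};
    x.Format.wFormatTag      = WAVE_FORMAT_EXTENSIBLE;
    x.Format.nChannels       = channels;
    x.Format.nSamplesPerSec  = rate;
    x.Format.wBitsPerSample  = 32;                 /* container size */
    x.Format.nBlockAlign     = (WORD)(channels * 4);
    x.Format.nAvgBytesPerSec = rate * x.Format.nBlockAlign;
    x.Format.cbSize          = 0x16;
    x.Samples.wValidBitsPerSample = 24;            /* bits of precision */
    x.dwChannelMask          = mask;
    x.SubFormat              = SUBTYPE_PCM;
    return x;
}

/* Channel n is the n-th lowest bit set in the mask (0 if there is none). */
static DWORD speaker_for_channel(DWORD mask, unsigned channel)
{
    for (DWORD bit = 1; bit != 0; bit <<= 1) {
        if (mask & bit) {
            if (channel == 0)
                return bit;
            channel--;
        }
    }
    return 0;
}
```

Because the order follows from the mask, the mask says which speakers are present, and their position in the stream is implied rather than chosen freely.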

That's where things live for now - who knows if there will be another revision in the future.

Comments

Anonymous
October 18, 2007
Hi Larry, Sorry if this post is slightly off-topic, but I've been searching for a couple of simple answers and your name is popping up everywhere ... you're obviously the guy to ask! I'm struggling with getting our music app to operate with the new mixer structure under Vista. It uses the traditional midiOutXXX to generate sound, waveInXXX to capture it (for VU meter display and WAV/MP3 file output) and mixerXXX to select the capture source. With all versions of Windows we would use the mux control with the mixer API to select the source for waveIn, typically this would be "MIDI Synth" for a hardware midi device and "Wave/MP3" for a softsynth. Now with Vista, there are no input lines at all ... so what exactly determines the input to the waveInXXX functions? Also the output lines are very limited, is it no longer possible to control MIDI Synth output separately from Wave output?
Anonymous
October 18, 2007
"ok at expressing audio streams that contained samples whose size was a power of 2, but there was no way of representing audio streams that contained samples whose size was something other than that" nSamplesPerSec = 12; nAvgBytesPerSec = 33; // average sample size is 2.75 bytes, is it not? "must rely on the fact that if the wFormatTag field in the WAVEFORAMAT structure was WAVE_FORMAT_PCM, then the WAVEFORMAT structure is actually a PCMWAVEFORMAT" Yes but "which is the same as a WAVEFORMATEX with a cbSize field set to 0." no, the cbSize field is absent, unknown, filenotfound, isverynull. 0 would require the full WAVEFORMATEX structure including that WORD with a zero value. "For all other formats, the code simply assumes that the caller is passing in a WAVEFORMATEX structure." Surely no? For old formats as detected in the wFormatTag field, surely only the old WAVEFORMAT structure should be assumed? For old formats there shouldn't even be a test of whether the cbSize field is present, because even if a tentative test suggests that it might be present, a 1-in-65536 chance doesn't mean next Tuesday it means the next millisecond. Just don't try it. "take a 24bit waveformat and try to pack it into 32bit samples" Use a graphics chip for that ^_^
Anonymous
October 18, 2007
This is why "EX" suffixes are scary :) (i.e., what do you call the next one when "*EX" isn't good enough any more?) [that was rhetorical, incidentally] Though I note that WAVEFORMATEXTENSIBLE doesn't appear to have a way to actually map the channels -- it just says which ones are present. What happens if the wave provider wants to send the subwoofer as channel 0 and the front right as channel 1? If mapping were supported, that would also provide an easy means to swap left/right speakers, which was an option that you used to always have in the DOS days. (I don't know if any of this is actually necessary or desirable, I'm just musing.)
Anonymous
October 18, 2007
WAVEFORMATEXTENSIBLE, huh? Doesn't seem any more extensible than WAVEFORMATEX to me. Maybe it should have been WAVEFORMATEXEX? :-) Such are the perils of versioning... Granted, the COM approach of Interface, Interface2, Interface3 isn't that much more appealing. (And as far as versioning is concerned, COM has it easy. Versioning structs reliably is a lot more tricky.)
Anonymous
October 19, 2007
What about making the first member of the struct a version of the struct? ie, version 0 structs are one size, version 1 structs have these other members, and so on. Then you don't need to keep making new struct types.
Anonymous
October 19, 2007
No problem - Just let me borrow your time machine so I can go back to 1991 and ask the guys on the Windows team to change the structure.
Anonymous
October 19, 2007
Good explanation. Is this a follow up on the problem I had using my 10 year old Scotty wav file? Bill
Anonymous
October 19, 2007
Bill, nope - just a bit of history.
Anonymous
October 20, 2007
Hey Larry, According to http://en.wikipedia.org/wiki/Features_removed_from_Windows_Vista, "the ability to choose a different hardware or software MIDI synthesizer other than the default Microsoft GS Wavetable Software synth has been removed from the user interface for audio configuration in Windows Vista." If this is true, I'm shocked beyond anything. Can MS fix this in SP1 since it is only a matter of the UI?
Anonymous
October 21, 2007
"What about making the first member of the struct a version of the struct?" Some exstructables do that. In the wave format exstruction it finally got in as the 7th element, so they only have to avoid that element for a limited number of known formats. "Just let me borrow your time machine so I can go back to 1991" Borrow the FBI's time machine. Theirs goes back to 1990. That's when they destroyed documents that didn't even exist yet.
Anonymous
October 22, 2007
someone: The simple answer to your question is: It got cut because there was only so much time to get stuff done in Vista, and some things got left on the floor. The only scenarios we're aware of that are affected by this involve simple MIDI playback from web pages or WMP, and for those, the builtin software synth should work fine. All of the scenarios we're aware of that involve more sophisticated MIDI rendering involve applications that already allow the user to choose which MIDI synth is used for the application. Is there some significant scenario that we've missed? If there is, I'd love to hear about it.
Anonymous
December 07, 2007
So if that feature got cut, didn't MS have enough time between Vista RTM and SP1 to restore it? Why isn't it in SP1? This is a pretty important feature to users, especially musicians who connect external synths and play music through media players that use the default MIDI mapper of Windows. Applications like WinGroove, YAMAHA XG WDM SoftSynthesizer or the Roland Software Sound Canvas all get killed by Vista due to this removal. I also used to select the MIDI synthesizer depending on the program that I ran; e.g., Microsoft's Age of Empires includes music which sounds good only on an FM synthesizer or the Yamaha XG synthesizer. I am now forced to listen to MIDI music using Microsoft's GS Wavetable synth if I decide to use Windows Media Player. If not, I'll have to use some other player which allows the user to choose which MIDI synth is used for the application, as you said above.
