… and the Beat Goes On
Understanding how MIDI time is kept by your DAW and how plug-ins sync to track tempo may not make you a better musician, but it might make you a smarter one.
by David Baer, Sept. 2013
Beat Me Daddy, Four to the Bar
In this article we’re going to take a close look at how MIDI handles tempo and time signatures. After that, we’ll briefly examine how tempo-synced effects do their magic. This information will not make you a better musician. It will be of interest only to those of you with the Geek gene in your DNA. If that’s not you, I’ll tell you up front that reading on would probably be a waste of your time. But if you are fascinated by what goes on inside the black box that is your DAW, you might like to learn a little something about these topics. At the very least, you’ll likely come away with a greater appreciation for the developers of DAWs, sequencers and plug-in effects, given the complexity they wrestle with.
I once came across a forum thread about DAW tempo handling that turned to the question of what constitutes a “beat”. Someone commented something like “What’s a beat? A beat is when I tap my foot!” Musically, it’s hard to argue with that, but your DAW may very well think otherwise.
I haven’t worked with every DAW/sequencer out there by any means, but in my experience, a beat in a DAW means a quarter-note … period, no argument, no exceptions. Much of the time, this isn’t controversial or confusing. But if you happen to have a passage in a time signature such as 3/8, you are faced with the awkward reality of one and a half beats in a bar. You’ll just have to accept it!
At the beginning of the MIDI era, affordable personal computers suited to music production were still a few years off. I suspect the designers of the initial MIDI specification gave no thought at all to encoding time. When a key is pressed on a MIDI keyboard, the keyboard sends a note-on message to some recipient, which deals with it ASAP. No time information is required. Pretty simple.
But with the advent of affordable personal computers, sequencers came to play a major role in music production. For those not familiar with the term, a “sequencer” is a program that records and plays back MIDI … like a DAW but without any audio capability. Once we had sequencers at our disposal, a standard MIDI interchange format became useful, and thus was born the Standard MIDI File format, supported by the MMA (MIDI Manufacturers Association, http://www.midi.com), the same organization that maintains the MIDI specification itself.
And this may be where the convention of a beat being a quarter-note, regardless of time signature, originated. We’ll look at how time information is encoded in MIDI files in a moment, but first, a quick review of the three MIDI file types:
- Type 0 – A single track holding everything in the performance.
- Type 1 – Multiple tracks, holding a single performance.
- Type 2 – Multiple performances in a single file. This option is not widely used or supported.
We’ll just consider Type 0 and Type 1 files in what follows.
For either Type 0 or Type 1, a MIDI file begins with a header chunk. One field in the header is relevant to timing: ticks per quarter-note. The same field can alternatively hold SMPTE-based timing information, but we won’t get into SMPTE here. Since this value is in the header, it is clearly constant for the entire sequence. Your DAW probably has a user-modifiable ticks-per-quarter-note setting. For SONAR, the default value is 960 ticks per quarter-note; for Cubase, it’s 480. But how fast is a quarter-note? This is where things start to get interesting.
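For the curious, here’s a minimal Python sketch of pulling the timing field out of that header. It assumes the file starts with the standard 14-byte MThd chunk: four ID bytes, a 32-bit length (6), then the format type, the track count and the 16-bit division field, all big-endian. When the division’s top bit is clear, the field is ticks per quarter-note; when it’s set, it holds the SMPTE alternative. The function name parse_smf_header is my own invention, and error handling is kept to a bare minimum.

```python
import struct

def parse_smf_header(data: bytes):
    """Parse the MThd chunk at the start of a Standard MIDI File.

    Returns (format_type, num_tracks, division_info), where division_info is
    ("ticks_per_quarter", n) or ("smpte", frames_per_second, ticks_per_frame).
    """
    if data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File: missing MThd")
    # Big-endian 32-bit chunk length; 6 for the header as currently defined.
    (length,) = struct.unpack(">I", data[4:8])
    if length < 6:
        raise ValueError("header chunk too short")
    fmt, ntrks, division = struct.unpack(">HHH", data[8:14])
    if division & 0x8000:
        # Top bit set: SMPTE timing. The high byte is the negated frame rate
        # (two's complement); the low byte is ticks per frame.
        fps = -struct.unpack(">b", data[12:13])[0]
        return fmt, ntrks, ("smpte", fps, data[13])
    # Top bit clear: ticks per quarter-note.
    return fmt, ntrks, ("ticks_per_quarter", division)
```

Fed the header of a Type 1 file with four tracks saved at 960 ticks per quarter-note, it would return (1, 4, ("ticks_per_quarter", 960)).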
Tracking Tempo and Time Signature
A sequencer does not need time signature information to play back a sequence, however nuanced its tempo might be. But anything that involves visual representation, like a piano-roll view or a score view, will need to know the time signature. A DAW metronome will need this information too. Finally, there are tempo-synced effects like delays, or anything using a tempo-synced LFO. These effects do not need the time signature, but they do need to know when beat boundaries are crossed.
For Type 0 MIDI files, there is only a single track, so all data must reside there. For Type 1 files, it is recommended that all timing events (tempo and time signature changes) go in the first track, and that the first track hold no other MIDI data. Let’s assume that this is so and move on.
Next we need to address how timing is encoded in a MIDI file. Every event in a track begins with a delta time, that is, the time interval between the previous event and this one (or the start of the track, if the event is the first one in the track). This is true whether the event is a MIDI event, like a note-on or controller message, or a meta-event. There are a number of meta-event types, covering things like copyright notices and lyrics. The two of concern to us here are Set Tempo and Time Signature events.
But back to timing, here’s how it works. The delta-time value is the number of ticks (the quarter-note subdivisions defined in the header) that have elapsed since the previous event. How many ticks in a quarter-note? That’s what we picked up from the header. But how fast is a quarter-note? Be patient, we’re almost there.
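The bookkeeping is easy to sketch in Python, assuming the delta times have already been decoded into plain integers (the helper names here are hypothetical): running sums of the deltas give absolute tick positions, and dividing by the header’s ticks-per-quarter-note gives positions in DAW “beats”.

```python
from itertools import accumulate

def absolute_ticks(deltas):
    """Turn a track's delta times into absolute tick positions."""
    # Each delta is relative to the previous event, so a running sum
    # gives the position of every event measured from the track start.
    return list(accumulate(deltas))

def ticks_to_quarter_notes(ticks, ticks_per_quarter):
    """Express a tick position in quarter-notes (the DAW's 'beats')."""
    return ticks / ticks_per_quarter
```

With 960 ticks per quarter-note, deltas of [0, 960, 480, 480] place events on beats 0, 1, 1.5 and 2. Note that nothing here says how long a beat lasts in seconds; that still needs a tempo.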
But we need to look at one additional aspect of the delta time first. In the days of early personal computers, storage, both memory and disk, was tightly constrained. 5.25” floppy disks held only 360KB. Hard drives, where they were installed at all, offered 10 or 20MB. So file size was an issue, and the MIDI file spec was optimized for it. Sequencer programs most assuredly use internal data structures very different from the file format.
To keep size to a minimum, the MIDI file standard employs a clever trick that lets the delta-time value take as few bytes as possible: it is stored as a variable-length quantity with seven value bits per byte. If a value fits into seven bits (delta times are never negative), it takes a single byte in the file. Fourteen bits require two bytes, twenty-one bits three bytes, and twenty-eight bits four bytes. The leftmost bit of each byte in the delta-time byte sequence indicates whether it’s the last byte in the sequence: 1 means more bytes follow, 0 means this is the final byte. So, what’s the maximum delta time? Four bytes carry 28 value bits, so the largest delta is 2^28 − 1, or 268,435,455 ticks. Assume a fast tempo of 240 quarter-notes per minute and a dense 960 ticks per quarter-note: that’s 230,400 ticks per minute, so we’ve got about 1,165 minutes, or roughly 19 hours, before we hit the maximum. Unless you’re recording some multi-day avant-garde extravaganza (yes, a few of these have actually been written), this should be plenty.
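Here’s a minimal Python sketch of that variable-length encoding, following the rules just described: seven value bits per byte, the high bit set on every byte except the last, and at most four bytes. The function names are mine, not from any particular library.

```python
def encode_vlq(value: int) -> bytes:
    """Encode a delta time as a MIDI variable-length quantity (max 4 bytes)."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("delta time must fit in 28 bits")
    out = [value & 0x7F]                      # final byte: high bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)     # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(out))               # most significant group first

def decode_vlq(data: bytes, pos: int = 0):
    """Decode a VLQ starting at data[pos]; return (value, next_pos)."""
    value = 0
    for i in range(4):
        byte = data[pos + i]
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                   # high bit clear: last byte
            return value, pos + i + 1
    raise ValueError("VLQ longer than four bytes")
```

For example, encode_vlq(0x80) gives the two bytes 81 00, and the four-byte ceiling 0x0FFFFFFF encodes as FF FF FF 7F. The maximum-duration arithmetic is just 0x0FFFFFFF / (240 * 960) / 60, which comes to about 19.4 hours.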
So, with the timing out of the way, let’s look at the meta-events themselves. A meta-event has the format “FF” “xx” “ln” “data”. “FF” says “meta-event”, and “xx” indicates which type. “ln” is the number of data bytes that follow; like the delta time, it is stored as a variable-length quantity, though for the events we care about here it fits in a single byte.
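As a sketch, a generic meta-event reader in Python might look like this. It assumes pos points at the FF byte (that is, the delta time has already been consumed) and reads the length as a variable-length quantity. The names read_vlq and parse_meta_event are hypothetical, and running status and SysEx handling, which a real track parser needs, are left out.

```python
def read_vlq(data: bytes, pos: int):
    """Read a MIDI variable-length quantity; return (value, next_pos)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:            # high bit clear: last byte
            return value, pos

def parse_meta_event(data: bytes, pos: int = 0):
    """Parse a meta-event (FF type length data) starting at data[pos].

    Returns (meta_type, payload, next_pos).
    """
    if data[pos] != 0xFF:
        raise ValueError("not a meta-event")
    meta_type = data[pos + 1]
    length, pos = read_vlq(data, pos + 2)   # length is itself a VLQ
    payload = data[pos:pos + length]
    return meta_type, payload, pos + length
```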
So, for a tempo change (the Set Tempo event), the format is FF 51 03 tt tt tt, where tt tt tt is a three-byte positive integer specifying the number of microseconds per quarter-note. At last … we finally know how fast a quarter-note is! Microsecond resolution should provide adequate accuracy for any application. At around 120 quarter-notes per minute, an hour-long performance contains about 7,200 quarter-notes, so even if each quarter-note’s duration were off by up to one microsecond due to rounding, the total drift from absolute precision would be no more than about seven milliseconds.
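The arithmetic is simple enough to sketch in a few lines of Python: microseconds per quarter-note is 60,000,000 divided by the quarter-note tempo. Rounding to the nearest microsecond is my choice here, and the function names are hypothetical.

```python
def tempo_event_bytes(bpm: float) -> bytes:
    """Build a Set Tempo meta-event (FF 51 03 tt tt tt) for a quarter-note tempo."""
    usec_per_quarter = round(60_000_000 / bpm)
    if not 0 < usec_per_quarter < (1 << 24):
        raise ValueError("tempo out of range for a 3-byte field")
    return bytes([0xFF, 0x51, 0x03]) + usec_per_quarter.to_bytes(3, "big")

def bpm_from_tempo_payload(payload: bytes) -> float:
    """Recover quarter-notes per minute from the 3-byte tempo payload."""
    return 60_000_000 / int.from_bytes(payload, "big")
```

At 120 quarter-notes per minute that’s exactly 500,000 µs, so tempo_event_bytes(120) produces FF 51 03 07 A1 20. At 140, the exact value would be about 428,571.43 µs, which rounds to 428,571; that fraction of a microsecond per quarter-note is where the tiny drift comes from.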
A sequence with no tempo events is assumed to run at 120 quarter-notes per minute. For a sequence that does have tempo events, it’s reasonable to expect one at the very start of the track (at delta time zero).
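To tie the header and the tempo events together, here’s a Python sketch that converts an absolute tick position into seconds. It assumes the Set Tempo events have already been collected into a sorted list of (tick, microseconds-per-quarter) pairs; the name ticks_to_seconds and that list layout are hypothetical. It uses the 120 quarter-notes-per-minute default until the first tempo event.

```python
def ticks_to_seconds(tick, tempo_map, ticks_per_quarter):
    """Convert an absolute tick position to seconds.

    tempo_map: list of (tick, usec_per_quarter) pairs sorted by tick.
    Before the first entry (or if the list is empty), the default of
    500,000 us per quarter-note (120 per minute) applies.
    """
    seconds = 0.0
    current_tick = 0
    current_tempo = 500_000            # default tempo
    for change_tick, usec in tempo_map:
        if change_tick >= tick:
            break                      # later changes don't affect this position
        # Time spent between the previous change and this one, at the old tempo.
        seconds += (change_tick - current_tick) * current_tempo / ticks_per_quarter / 1e6
        current_tick, current_tempo = change_tick, usec
    # Remaining distance at the tempo in force at the target tick.
    seconds += (tick - current_tick) * current_tempo / ticks_per_quarter / 1e6
    return seconds
```

With 960 ticks per quarter-note and a map of [(0, 500_000), (960, 1_000_000)] (120, then 60 quarter-notes per minute), tick 1920 lands at 0.5 + 1.0 = 1.5 seconds.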
Time signature meta-events are a little more ornate. They have the form FF 58 04 nn dd cc bb. The first two data bytes, “nn dd”, represent the numerator and denominator of the time signature. “nn” is the numerator as written, but “dd” is a power of two: 02 represents a quarter-note (2^2 = 4), 03 an eighth-note (2^3 = 8), 04 a sixteenth, and so on. The next two bytes, “cc bb”, are of interest only to certain types of MIDI software. “cc” specifies the number of MIDI clock ticks in a metronome click, where there are 24 MIDI clock ticks in a quarter-note. “bb” specifies the number of notated 32nd-notes in a MIDI quarter-note (24 MIDI clocks), normally 8. I believe these last two fields catered to a prevailing practice at the time the MIDI file standard was conceived. I honestly do not know whether they have much relevance today.
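Here’s a Python sketch that decodes the four data bytes of a time signature event and, as a bonus, works out the bar length in quarter-notes, which is what a DAW counts as beats. The function name and the dictionary keys are my own.

```python
from fractions import Fraction

def decode_time_signature(payload: bytes):
    """Decode the 4-byte payload of an FF 58 04 time signature event."""
    nn, dd, cc, bb = payload
    denominator = 2 ** dd            # dd is a power of two: 2 -> 4, 3 -> 8
    return {
        "numerator": nn,
        "denominator": denominator,
        "clocks_per_click": cc,      # MIDI clocks (24 per quarter) per metronome click
        "32nds_per_quarter": bb,     # notated 32nd-notes per MIDI quarter-note
        # Bar length in quarter-notes, i.e. in the DAW's "beats".
        "quarters_per_bar": Fraction(4 * nn, denominator),
    }
```

A 6/8 event, FF 58 04 06 03 24 08, decodes to a click every 36 MIDI clocks (a dotted quarter) and a bar of 3 quarter-notes; a 3/8 bar comes out as 3/2, the one and a half DAW beats mentioned earlier.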
If you’re still with me, it’s a safe bet that you have that Geek chromosome in your DNA, so let’s look at another technical aspect of this. We’ve seen so far how your DAW can read in a MIDI file and extract all the relevant timing information. But how does that get communicated to plug-in effects that need it? We’re talking about things like soft synths that do tempo-based LFO modulation, for example, or a delay whose delay times are based on divisions of the current tempo.
We’ll look at how this is done for plug-ins built to Steinberg’s VST model. A naïve assumption would be that the host DAW identifies the plug-ins that need this sort of information and does something like:
- “tap-on-the-shoulder … excuse me but a new quarter-note has just started”
- “tap-on-the-shoulder … excuse me but a 24th of a quarter note has elapsed”
- “tap-on-the-shoulder … excuse me but a 24th of a quarter note has elapsed”
- … and so on.
But nothing of the sort happens. In fact, it’s just the opposite: a plug-in needing timing information must always ask the host for it. Typically, a plug-in gets called anywhere from dozens to several hundred times a second, depending on the buffer size, to process a few milliseconds of audio each time. A tempo-synced plug-in would normally call back to the host for timing information at the start of each of these invocations.
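To make the pull model concrete, here’s a toy simulation in Python. Everything in it (ToyHost, ToySyncedPlugin, get_time_info, TimeInfo) is hypothetical and far simpler than the real VST interface, where the request goes through the host callback (the audioMasterGetTime opcode in VST 2.x). The point is only the direction of the calls: the host calls the plug-in’s process routine, and the plug-in calls back to ask where it is.

```python
from dataclasses import dataclass

@dataclass
class TimeInfo:
    sample_pos: int       # samples since the transport started
    sample_rate: float    # samples per second
    tempo: float          # quarter-notes per minute

class ToyHost:
    """Hypothetical host that only answers timing questions when asked."""
    def __init__(self, sample_rate: float, tempo: float):
        self.sample_rate = sample_rate
        self.tempo = tempo
        self.sample_pos = 0

    def get_time_info(self) -> TimeInfo:
        return TimeInfo(self.sample_pos, self.sample_rate, self.tempo)

    def run(self, plugin, block_size: int, num_blocks: int):
        for _ in range(num_blocks):
            plugin.process(self, block_size)   # host calls the plug-in...
            self.sample_pos += block_size

class ToySyncedPlugin:
    """Hypothetical plug-in that asks for timing at the start of every block."""
    def __init__(self):
        self.block_starts_in_quarters = []

    def process(self, host: ToyHost, block_size: int):
        info = host.get_time_info()            # ...and the plug-in asks back
        seconds = info.sample_pos / info.sample_rate
        self.block_starts_in_quarters.append(seconds * info.tempo / 60.0)
        # (audio processing for block_size samples would happen here)
```

With a 44.1 kHz sample rate, 120 quarter-notes per minute and (unrealistically large) 22,050-sample blocks, the plug-in sees block starts at quarter-notes 0, 1, 2 and 3. At a more typical 256-sample block, the same host would call process about 172 times a second, each call covering about 5.8 ms.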
The good news is that tempo-synced plug-ins rarely, if ever, require time signature information. They may need to know where beat boundaries are, but they seldom need the time signature. We should all offer a sigh of relief that this is so. Can you imagine the complications if our plug-ins had to wrestle with the complexities of time signatures?
According to the VST specification, the host isn’t obligated to provide any answers when asked about timing. Furthermore, if it does provide an answer, it doesn’t have to provide a complete one. A DAW can be expected to cooperate to a reasonable degree; if it did not, none of your tempo-synced plug-ins would function. But a simple VST host, such as one whose only job is to let you run a soft synth that has no standalone version, may not bother with timing-information requests.
Assuming the host does provide timing information, here are some of the data fields that may be offered:
- samplePos – sample position (number of samples since the host transport started)
- sampleRate – samples per second
- ppqPos – the quarter-note position corresponding to samplePos (the integer portion is the zero-based quarter-note number, and the fraction is the position within that quarter-note beat)
- tempo – in beats per minute (quarter-notes per minute!)
- kVstTransportPlaying – a bit in the structure’s flags field indicating whether the host transport is playing or stopped
- cycleStartPos, cycleEndPos, nanoSeconds, and more.
All of the above fields are returned in a structure called VstTimeInfo. If you want to see the structure in all its glory, you can find it here:
But here’s the catch. If a host returns timing information at all, it does not have to supply all the fields (but it does have to send back a set of flags noting which fields it did supply). Only samplePos and sampleRate are required. Any decent DAW will certainly provide tempo and the transport-playing flag.
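Here’s a small Python sketch of honoring those validity flags. The three bit values mirror the VstTimeInfoFlags enum in the VST 2.4 SDK header aeffectx.h as I understand it; check them against your own copy of the SDK. The helper functions, and the choice to fall back to 120 quarter-notes per minute when no tempo is supplied, are my own assumptions.

```python
# Bit values as in the VST 2.4 SDK's VstTimeInfoFlags (aeffectx.h);
# verify against your own copy of the SDK before relying on them.
kVstTransportPlaying = 1 << 1
kVstPpqPosValid = 1 << 9
kVstTempoValid = 1 << 10

def usable_fields(flags: int) -> dict:
    """Report which optional VstTimeInfo fields the host says it filled in."""
    return {
        "playing": bool(flags & kVstTransportPlaying),
        "ppqPos": bool(flags & kVstPpqPosValid),
        "tempo": bool(flags & kVstTempoValid),
        # samplePos and sampleRate are always valid, so they have no flag.
    }

def tempo_or_default(flags: int, tempo: float, fallback: float = 120.0) -> float:
    """Trust the host's tempo only if the host marked it valid."""
    return tempo if flags & kVstTempoValid else fallback
```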
With that, the plug-in developer can do pretty much anything needed. If ppqPos is needed but not supplied, it can be calculated from samplePos, sampleRate and tempo while the transport is playing, provided the tempo hasn’t changed since the start of the timeline.
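As a sketch of that calculation in Python, assuming a constant tempo since sample 0 (with tempo changes you’d want the host’s own ppqPos): the first function derives ppqPos, and the second finds the sample offset, within the current block, of the next quarter-note boundary, which is what a tempo-synced LFO or delay actually needs. Both function names are hypothetical.

```python
import math

def ppq_from_samples(sample_pos: float, sample_rate: float, tempo: float) -> float:
    """Quarter-note position, assuming a constant tempo since sample 0."""
    return sample_pos / sample_rate * tempo / 60.0

def next_beat_offset(sample_pos: int, sample_rate: float, tempo: float,
                     block_size: int):
    """Offset (in samples) of the first sample at or after the next
    quarter-note boundary, or None if no boundary falls in this block."""
    samples_per_quarter = sample_rate * 60.0 / tempo
    # Working in samples rather than in ppq keeps the rounding well behaved.
    next_beat = math.ceil(sample_pos / samples_per_quarter)
    offset = math.ceil(next_beat * samples_per_quarter - sample_pos)
    return offset if offset < block_size else None
```

At 44.1 kHz and 120 quarter-notes per minute a quarter-note is 22,050 samples, so a 256-sample block starting at sample 22,000 contains a beat boundary 50 samples in.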
But when the transport is stopped, samplePos will not have a value of any relevance. This means that a tempo-synced plug-in will also have to maintain its own internal clock for use when the transport is stopped, presumably running at the tempo value passed back by the host. However, if several tempo-synced plug-ins are chained together, each running on its own internal clock, they may disagree about where beat boundaries are. If this happens, cut the developers some slack. You should now appreciate just what a marvelous job they’ve done making all this work in the first place!
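A free-running clock of the kind described here might look like the following Python sketch. The class FreeRunningClock is hypothetical: it keeps its own quarter-note phase, advances it by each block’s duration, and adopts the host’s tempo whenever one is offered.

```python
class FreeRunningClock:
    """Hypothetical internal beat clock for use while the transport is stopped."""

    def __init__(self, fallback_tempo: float = 120.0):
        self.ppq = 0.0                  # own quarter-note phase
        self.tempo = fallback_tempo     # quarter-notes per minute

    def advance(self, block_size: int, sample_rate: float, host_tempo=None) -> float:
        """Advance by one block; return the phase at the start of the block."""
        if host_tempo is not None:
            self.tempo = host_tempo     # follow the host's tempo when offered
        start = self.ppq
        self.ppq += block_size / sample_rate * self.tempo / 60.0
        return start
```

Two instances created one block apart (here 11,025 samples at 44.1 kHz and 120 quarter-notes per minute) will disagree by half a quarter-note for as long as they run, which is exactly the difference of opinion described above.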