Of Digital Bits and deciBels
In this tutorial we look at two important mixing factors, computer sound internal formats and sound levels, and explore how they’re related.
by David Baer, July 2013
I pulled one of my CDs out the other day that was released in 1998. I noticed that the cover proudly announced “recorded in 20 bits!” I had to chuckle since these days recording in anything less than 24 bits is regarded as unacceptable. But it also made me recall how back then, at a time before I had done any studying of digital sound technology, I was completely puzzled by what that meant. I knew CDs used 16-bit encoding, so just how the heck did they stuff those 20 bits onto the CD? What was the point?
Now, of course, it’s easy to grasp, but that’s only after I’ve spent many hours on computer sound forums and read numerous in-depth explanations of various aspects of digital recording. And the time on the forums has made it clear that this topic is one most digital sound enthusiasts struggle with at some point in their education.
So, in this tutorial we’re going to look at two things: computer sound internal formats and sound levels and explore how they’re related. Those of you who are already familiar with computer fixed and floating point data representations will probably want to skip the first part of the discussion, but still may be interested in the second where we’ll look at the relationships between the formats and the implications they have on sound levels and recording/mixing practice.
What we’re not going to get into is recording rates or compression issues. We’ll keep things simple and just assume everything is recorded at 44.1K samples per second, just like the data on a CD. Just accept that this rate is sufficient to encode sound that encompasses the range of frequencies audible to humans. Also assume we’re interested only in recorded sound that will be released in CD format. Consideration of compression schemes like mp3 would needlessly complicate the discussion at this point.
Fixed Point Representations
As mentioned above, CDs use a 16-bit number to represent one sample of sound for one channel. Before we look at wave files, where there are more options, let’s understand exactly what that means. 16 bits can represent exactly 65,536 discrete numeric values. We can represent integers between zero and 65,535 for unsigned numbers and between -32,768 and 32,767 for signed. Or, and this is the way things work with digital audio, they can represent 65,536 unique output voltages coming out of a digital-to-analog converter (DAC) in your sound card.
So, say your sound card outputs between exactly -1.0 and +1.0 volts for a total range of 2.0 volts, the extremes will correspond to internal values of -32,768 and 32,767. Each increment of 1 in the 16-bit value represents an increment of 2/65,536 volts in the output. That is to say, we have 2 volts divided into values that are 2/65,536 (or 1/32,768 if you prefer) volts apart in magnitude.
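The step-size arithmetic above can be checked with a few lines of Python. This is only a sketch of the hypothetical ±1.0 volt sound card from the text; the function name `step_size_volts` is made up for illustration.

```python
def step_size_volts(bits, span_volts=2.0):
    """Voltage difference between adjacent codes for a given word length.

    Assumes the article's hypothetical card: 2**bits codes spread evenly
    over a total span of span_volts (2.0 V for a -1.0..+1.0 V output).
    """
    return span_volts / (2 ** bits)


step_16 = step_size_volts(16)
print(step_16)               # 3.0517578125e-05 volts per increment
print(step_16 == 1 / 32768)  # True: 2/65,536 is the same as 1/32,768
```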
But, you say: there’s no number that corresponds to exactly zero volts using this scheme. Aha, well spotted. While true, in practice this is unimportant. You’ll see later that at 90dB down from the maximum, not having an exact zero value is going to be utterly insignificant.
If in the process of recording sound we capture 16-bit values, then we want to use most of the available bits to capture as much detail as possible. The release format is going to be 16 bits one way or the other. If we don’t capture a full signal, detail will be lost. A mathematically rigorous explanation could be offered, but really this is just common sense.
The fixed point numeric format has non-negotiable maximum positive and negative magnitudes. If we try to assign a number that exceeds those limits, high order information is discarded leaving unpredictable values behind. When this happens, it’s called clipping. It can sound something like radio static being sent over a dodgy connection to a speaker with a damaged cone. So, long story short, 16-bit recording/mixing should be done using Goldilocks levels: just right – high enough to capture detail but not so high as to clip.
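To see what "high order information is discarded" can look like, here is a small Python sketch. `wrap_int16` keeps only the low 16 bits of an oversized value, which is the behaviour described above. `saturate_int16` instead pins the value at the nearest limit, which is how many converters and DAWs handle overs in practice. Both function names are hypothetical.

```python
def wrap_int16(value):
    """Keep only the low 16 bits, as an overflowing fixed-point store would."""
    return ((value + 32768) % 65536) - 32768


def saturate_int16(value):
    """Pin out-of-range values to the nearest 16-bit limit (hard clipping)."""
    return max(-32768, min(32767, value))


print(wrap_int16(32767 + 1))   # -32768: one step too far flips to the other extreme
print(wrap_int16(40000))       # -25536
print(saturate_int16(40000))   # 32767
print(saturate_int16(-40000))  # -32768
```

Either way the waveform that comes out is no longer the one that went in, which is why the levels have to be right before the value is stored.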
Fortunately, these days we can use a 24-bit fixed format for representing sound data … up until the time it gets distributed on CD where it necessarily needs to be trimmed to 16-bit. Happily, sound cards with 24-bit analog/digital conversion capabilities are well within the price range of even amateur enthusiasts.
With 24 bits, our value range is between -8,388,608 and 8,388,607. This does not mean that we have a signal that is 256 times as strong as with 16 bits. Our hypothetical sound card still produces between -1.0 and +1.0 volts for the two extremes. What we have is 256 times as much detail. The upshot of using 24 bits for recording is that we can stay comfortably below levels where clipping occurs and yet still have more detail in the signal than can be retained when we finally convert back to 16-bit for release.
The wave file format is flexible. It supports a 16-bit value scheme identical to CD data values, but can support other formats, including 24-bit and floating point which we’ll look at next. Other sound file types also are flexible in this fashion. The point is that we can retain 24-bit recorded sound in files as long as we want on our computers. It’s not until we want to do CD distribution that reduction in number of bits will enter the picture.
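The ranges and the factor of 256 follow directly from the bit counts. A short Python sketch, with `signed_range` as an illustrative name:

```python
def signed_range(bits):
    """Smallest and largest values of a signed integer with this many bits."""
    half = 2 ** (bits - 1)
    return -half, half - 1


print(signed_range(16))   # (-32768, 32767)
print(signed_range(24))   # (-8388608, 8388607)

# Same voltage span, but 2**8 = 256 times as many steps to divide it into.
print(2 ** 24 // 2 ** 16)  # 256
```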
Computers have an alternative numeric representation that is preferable to fixed point for many types of calculations; it is called floating point. Consider the number 1,234. We can also write this as 1.234×1000 or 1.234×10³. This is the idea behind floating point. There’s a base number that’s always close to 1, called the mantissa, and a multiplication factor expressed as an exponent. Floating point comes in two varieties: single and double precision, which use 32 and 64 bits respectively. The term “floating point” without a “single” or “double” qualifier normally implies single precision. The range of magnitudes representable using even single precision floating point is between incomprehensibly small and incomprehensibly large. The double precision limits are even more incomprehensible.
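Python can show the same split between a base number and a multiplication factor. The sketch below prints 1,234 in decimal scientific notation, then uses the standard library's `math.frexp` to show the binary version the computer actually works with. Note that `frexp` scales its base number to lie between 0.5 and 1 rather than between 1 and 2, but the idea is identical.

```python
import math

value = 1234.0

# Decimal view: a number near 1 times a power of ten.
print(f"{value:.3e}")           # 1.234e+03

# Binary view: value == fraction * 2**exponent, with 0.5 <= fraction < 1.
fraction, exponent = math.frexp(value)
print(fraction, exponent)       # 0.6025390625 11
print(fraction * 2 ** exponent) # 1234.0
```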
24-bit fixed numbers can be thought of as having 23 numeric bits plus a sign bit. For comparison, floating point offers 24 numeric bits and double precision offers 53. So, floating point offers more detail than 24-bit fixed. But wait … there’s more. There are always 24 bits available for recording detail using floating point. If you’re dealing with low levels, you are not “wasting” high order bits as you would with fixed point. As to high levels, with floating point clipping is essentially nonexistent!
Clipping occurs using fixed point when the values needing to be represented exceed the capacity of the format. But with floating point, numbers can get arbitrarily large, so as long as you remain in floating point, clipping is effectively impossible. Now, the moment we convert back to fixed, making sure the conversion does not produce clipping is imperative. But at least we only need to worry about it at that one point.
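A small Python sketch of that idea: two loud signals are summed in floating point, the result overshoots full scale without any damage, and clipping only enters the picture at the conversion to 16-bit. The `to_int16` helper and the convention that ±1.0 means full scale are assumptions for illustration, not any particular DAW's behaviour.

```python
def to_int16(sample):
    """Convert a float sample (nominal range -1.0..+1.0) to 16-bit, hard clipping."""
    scaled = round(sample * 32767)
    return max(-32768, min(32767, scaled))


a, b = 0.9, 0.8
mix = a + b                 # about 1.7: over full scale, but a float holds it fine

print(to_int16(mix))        # 32767: clipped because we converted without care
print(to_int16(mix * 0.5))  # 27852: pull the level down first and nothing is lost
```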
So, does this mean you can run floating point signals through your DAW and not worry about gain settings until the final output? To a certain extent this is true. But your metering will be difficult to read (although we may not care about metering anyway if we’re ignoring levels). A more compelling reason is your control over a mix. Sit down at your computer and bring up your DAW (or just look at the pictured SONAR faders). Set a fader at zero and adjust it either way by 3dB. No problem. Now set the level down to, say, -30dB. Try the 3dB adjustment now. Not so easy, I think you’ll agree. So, even with the safety net of floating point, a Goldilocks approach to things has advantages.
Formats and Levels
At this point you have a sufficient understanding of the capacities of the available numeric data types. Tying this knowledge to the concept of decibels should fill in the remaining piece of the puzzle. The decibel (more commonly written “dB”) permeates the language of sound technology. By itself, the dB does not connote a unit the way “gallon”, “meter” or “ounce” does. It always denotes a relationship between two values of like units or, if you prefer, a ratio. Those units can be volts or watts, air pressure units or anything else.
The dB is a logarithmic unit. Don’t know what that means? Don’t worry. It works like this:
- “B is 20dB less than A” means B is 1/10th the intensity of A. Alternately we can say: “B is 20dB down from A.”
- “B is down 40dB from A” means B is 1/100th the intensity of A.
- “B is down 60dB from A” means B is 1/1000th the intensity of A.
- And so on …
Note that the above numbers are valid for voltage and for sound levels of tracks within a DAW. Power (watts) is another story, but the explanation will have to wait for another time.
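For voltage-like quantities, the rule behind the list above is the standard formula dB = 20 × log10(ratio). A minimal Python sketch, with hypothetical function names:

```python
import math


def ratio_to_db(ratio):
    """Level difference in dB for a voltage (amplitude) ratio."""
    return 20 * math.log10(ratio)


def db_to_ratio(db):
    """Voltage (amplitude) ratio for a level difference in dB."""
    return 10 ** (db / 20)


print(round(ratio_to_db(1 / 10), 1))    # -20.0
print(round(ratio_to_db(1 / 100), 1))   # -40.0
print(round(ratio_to_db(1 / 1000), 1))  # -60.0
print(round(ratio_to_db(1 / 2), 2))     # -6.02: halving the level costs about 6dB
print(round(db_to_ratio(-40), 4))       # 0.01
```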
In your DAW and other sound software, scales will often be calibrated in dB, such as the faders pictured earlier. You will probably not even see “dB” anywhere, but it’s usually obvious because the numbers are not evenly distributed on the scale. Here the level 0 is associated with the maximum signal before clipping happens (or would happen if the underlying numeric format were fixed point).
Again, dB implies a comparison of two levels and it’s always important to understand to what the comparison is being made. Most places in sound software, the implication is that zero on a dB scale means “loudest advisable”.
To illustrate, consider the figure of the single cycle of a saw wave in SoundForge. The scale on the left is clearly logarithmic and the maximum value is 0 dB. Note also we have 0 dB pictured at both the positive and negative extremes. I saved this single cycle as a wave file (one channel, 16-bit), extracted the data into a spreadsheet in Excel and graphed it. The slope is just about what we see in the SoundForge depiction, but in this case the y-axis units are the fixed-point values encoded in the wave file. So, you can readily see that whether we’re using dB or integer values, the essential information is the same.
The Most Important Things …
Let’s take a brief detour and consider the compelling demonstration that can be seen/heard here:
This article by audio authority Ethan Winer demonstrates, with audio examples, how a signal that’s down 45 to 55 dB in a mix becomes inaudible. This is true even when the lower-level signal is blatantly irritating and the higher-level content is smooth music devoid of percussive transients. This is such a useful piece of information that you should never forget it when making day-to-day decisions in your tracking and mixing.
Let’s look at some further basic truths. The following figures were derived using the dB Voltage Ratio calculator available at:
Our DAW quantities are digital numbers, not volts, but they end up in volts coming out of your sound card, so a voltage calculator (as opposed to wattage) is the correct one to use.
The largest value that can be encoded on a CD is over 90dB greater than the smallest in magnitude (closest to zero). Thus, we see how the commonly quoted fact that the dynamic range of a CD is 90dB was arrived at.
Likewise, we can see that a 24-bit format (23 bits numeric plus sign) offers an immense dynamic range of over 138dB, well in excess of the range of human hearing in the most capable of listeners.
Using 20 bit fixed format vs. 24 bit causes us to lose about 24dB in detail. But we’ve still got a dynamic range of 114dB, comfortably in excess of what can end up on a CD.
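These figures can be reproduced without an online calculator. The sketch below follows the same approach as the text: compare the largest magnitude (all numeric bits, excluding the sign bit) with the smallest non-zero magnitude, which is 1. The function name is illustrative.

```python
import math


def dynamic_range_db(bits):
    """dB between the largest and smallest non-zero magnitudes of a signed format."""
    largest = 2 ** (bits - 1)   # numeric bits only; one bit is the sign
    smallest = 1
    return 20 * math.log10(largest / smallest)


for bits in (16, 20, 24):
    print(bits, round(dynamic_range_db(bits), 1))
# 16 90.3
# 20 114.4
# 24 138.5
```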
So, finally we are armed with some iron-clad facts we can apply to making informed decisions about recording practice.
Let’s assume we want to record to 24-bit fixed point. If we set our peaks at 24dB down from maximum, we’re effectively keeping the high order four bits for headroom, and as a result, we’re accepting that we have four fewer bits at the other end for detail. But even with 20 bits of usable precision, we’ve still got 4 bits of “extra detail” compared to 16 bit format. In practice, however, it’s far more common to leave about 12dB (or 2 high order bits) of headroom, which is normally plenty for unanticipated peaks.
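The link between headroom in dB and bits given up is that each bit is worth 20 × log10(2), or about 6.02dB. A short Python sketch of the trade-off described above, with hypothetical names:

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB for every bit


def headroom_bits(headroom_db):
    """How many high order bits a given amount of headroom sets aside."""
    return headroom_db / DB_PER_BIT


print(round(DB_PER_BIT, 2))              # 6.02
print(round(headroom_bits(24), 2))       # 3.99: roughly four bits
print(round(headroom_bits(12), 2))       # 1.99: roughly two bits
print(24 - round(headroom_bits(24)))     # 20 bits left for detail when recording at 24-bit
```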
Your DAW may give you the choice of using single or double precision floating point for internal signal processing (even if the tracks are stored externally as fixed integer). While state-of-the-art PCs can handle double precision without too much extra strain, it does require more processing power, more memory and, if exported as such, more disk space. How much does using double precision buy you?
Figure it this way: in single precision, you’ve always got 24 bits, irrespective of signal level. Eventually that will be reduced to 16 fixed point bits for final export. Calculations in effects processors will result in some level of round-off error. Staying with double effectively reduces that accumulated round-off error to zero in the final export. But you’ve already got a “buffer” of 8 bits in single precision floating point with which to shield the significant high-level bits from seeing the effects of the round-off error. So, I’m not asserting that you will never, ever hear a difference between using single and double precision floating point. But I am suggesting that the majority of the time, common sense tells us there will be no audible difference.
This is not to say that you’re completely avoiding double precision when you tell your DAW to use single. Some plug-in developers will determine that double is advantageous or even essential in calculations for their purposes. In such cases they may use it “under the covers” without your knowledge or consent, converting back to single precision only when passing the results to output ports.
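The 8-bit "buffer" argument can be tried out with a toy experiment. The Python sketch below pushes one sample through 200 gain changes, once rounding to single precision after every step and once staying in double, then compares the difference with the size of one 16-bit step. It is only an illustration under simple assumptions (plain gain changes, one sample, a made-up chain); real effects processing is far more varied, so treat the result as a plausibility check rather than proof.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def run_chain(sample, gains, single):
    """Apply each gain in turn, optionally rounding to single precision every step."""
    for gain in gains:
        sample = sample * gain
        if single:
            sample = to_float32(sample)
    return sample


# 200 gain changes: alternately about 0.6 dB down and 0.6 dB back up.
gains = [0.5 ** 0.1, 2 ** 0.1] * 100
start = to_float32(0.3)

single_result = run_chain(start, gains, single=True)
double_result = run_chain(start, gains, single=False)

error = abs(single_result - double_result)
lsb_16 = 1 / 32768   # one 16-bit step when full scale is +/- 1.0

print(error)
print(error < lsb_16)   # True: the accumulated round-off stays below one 16-bit step
```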
Finally, there’s the subject of dithering. This is a technique wherein very low-level noise is added when reducing the number of bits, usually when exporting from our DAW. The low level noise is in the form of random assignment of values to the lowest level bit in each data value. Some people assert that dithering, while it does no harm, is completely unnecessary. Others claim it is important and a few individuals have even boasted they can tell whether it’s being used or not. Well, let’s see. Dithering would be 90dB down from the loudest signal on a CD, and let’s say 65dB to 70dB down from a typical level. But Ethan Winer has shown how even at 45dB to 55dB down, a weak signal becomes inaudible in the presence of a stronger one. So, go ahead and agonize over which dithering algorithm you’re going to go with if you’re truly concerned. Personally, I’m not going to be losing any sleep over it, one way or the other.
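For the curious, here is a minimal Python sketch of dithering during conversion to 16-bit. It uses triangular noise spanning one step either side, a common textbook choice that stands in for whatever algorithm a particular DAW offers. The example feeds in a constant level of 0.3 of one 16-bit step: without dither it always rounds to zero and vanishes, while with dither the output flickers between neighbouring values and its average preserves the level. The function name and the ±1.0 full scale convention are assumptions.

```python
import random


def to_int16(sample, dither=False, rng=random):
    """Convert a float sample (nominal range -1.0..+1.0) to a 16-bit integer."""
    scaled = sample * 32767.0
    if dither:
        # Triangular noise between -1 and +1 steps: difference of two uniform draws.
        scaled += rng.random() - rng.random()
    return max(-32768, min(32767, round(scaled)))


rng = random.Random(1)     # fixed seed so the run is repeatable
quiet = 0.3 / 32767.0      # a level of 0.3 of one 16-bit step
count = 20000

plain = [to_int16(quiet) for _ in range(count)]
dithered = [to_int16(quiet, dither=True, rng=rng) for _ in range(count)]

print(sum(plain) / count)      # 0.0: the detail is simply gone
print(sum(dithered) / count)   # close to 0.3: the detail survives as an average
```

Whether that surviving detail is audible some 90dB below full scale is, of course, exactly the question raised above.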