A Layman’s Explanation of an Audio Engine

If you’ve worked with professional audio, you’ve certainly used a DAW before.  If you’ve worked in games, you’ve likely worked with middleware and certainly heard of an audio engine before.

But do you actually know what the heck an audio engine really is, does, and how programmers control and manipulate it?

For the vast majority of you, I bet you would shrug your shoulders or try and quickly search for a definition or explanation online.  Not too long ago, I would’ve done the same exact thing.

But after quite a bit of time working with code, fumbling in darkness, and trying hard to understand what the heck an audio engine really is – I’m happy to say I can explain it, at least in layman’s terms.

Mind you – any semi-serious audio software engineer will likely read this and either cringe or want to crucify me, because it’s not exactly right.  I’m not talking about timing, threading, memory, or anything else.  The real thing can be, and almost always is, much more complicated than what I’m about to explain.

But I want you to understand, at least on some level, what you’re actually working with when you use audio software every day.  It will help you get better at what you do because you’ll understand more of what’s going on.  You’ll also be able to walk up to programmers you work with and say “I UNDERSTAND WHAT A BUFFER IS!”  (please, seriously, I hope you do this)

A Buffer

If you’ve ever worked with Pro Tools, then there’s a good chance you’ve heard of the “buffer.”

When I first started using Pro Tools, my instructors taught me vaguely what this is and how it’s supposed to work.  The explanation was something like…

“The buffer size determines the amount of latency in Pro Tools.  The larger the buffer, the more effects you can use without the system bottlenecking – but there’s more latency.  The smaller the buffer, the less latency, but you’re more likely to break Pro Tools.”

What they meant by “break Pro Tools” is that the system would hitch, glitch out, and playback would immediately stop.

You can see this in action if you have Pro Tools: make your buffer size small and add a bunch of reverb effects to your session.  Press play, and your session will likely crap out.

But why does it do that, exactly?

“Well, reverb requires a lot of processing power.”

Yes, but what does that mean?

Most of you imagine you’re running out of CPU, memory, hard drive space, something weird – right?

Well, kind of.

The explanation for this all comes back to defining exactly what the buffer is.

Explaining the Buffer

Within a computer system, all audio is just numbers.  No, I don’t just mean binary ones and zeroes, even though technically that’s correct.

For our example (this isn’t always exactly how it is), the audio data in your system is numbers between negative 1 and positive 1.  For example…

Waveform
Image from commons.wikimedia.org

Notice how positive 1 and negative 1 represent the polarity peaks.

If you understand how audio sampling works, you know that the system takes “snapshots” of whatever audio signal you’re working with at the given sample rate (44.1 kHz, 48 kHz, etc.).  Each snapshot is one sample, and each sample is one of those numbers.
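To make the “snapshots” idea concrete, here is a minimal Python sketch that samples a sine wave.  The function name, the 441 Hz tone, and the 44.1 kHz rate are just assumptions picked for illustration; real systems often store samples in other formats too.

```python
import math

SAMPLE_RATE = 44100  # snapshots per second (44.1 kHz)

def sample_sine(frequency_hz, num_samples, sample_rate=SAMPLE_RATE):
    """Take num_samples snapshots of a sine wave, one every 1/sample_rate seconds."""
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(num_samples)]

# A 441 Hz tone at 44.1 kHz repeats every 100 samples.
samples = sample_sine(441.0, 100)

print(samples[0])    # the wave starts at zero
print(samples[25])   # a quarter of the way through, it sits at the positive peak (about 1)
print(min(samples), max(samples))  # everything stays between -1 and 1
```

Every value in that list is one sample, which is exactly the kind of number the rest of this article is talking about.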

The audio buffer is a collection of these samples being utilized by the system at one time.  For those of you who’ve done a little programming in your life – a buffer is an array of audio samples.  That’s it.

So when you’re giving a size to your buffer in Pro Tools, you’re simply telling the system how many audio samples it should be working with at any given time.  No, the system does not load all of your audio file(s) into the buffer at once; it only loads bits and pieces, more and more over time.
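Here is a rough Python sketch of both ideas: feeding audio through in buffer-sized pieces, and why a bigger buffer means more latency.  The helper names are made up for this example, and the latency figure only counts the time one buffer of audio represents; a real DAW has other sources of delay on top of that.

```python
def split_into_buffers(samples, buffer_size):
    """Yield consecutive chunks of at most buffer_size samples."""
    for start in range(0, len(samples), buffer_size):
        yield samples[start:start + buffer_size]

def buffer_latency_ms(buffer_size, sample_rate):
    """How much audio, in milliseconds, one full buffer holds."""
    return 1000.0 * buffer_size / sample_rate

audio = [0.0] * 1000  # stand-in for a (very short) audio file
chunks = list(split_into_buffers(audio, 256))
print([len(chunk) for chunk in chunks])  # [256, 256, 256, 232]

print(buffer_latency_ms(64, 48000))    # small buffer: about 1.33 ms
print(buffer_latency_ms(1024, 48000))  # large buffer: about 21.33 ms
```

The system has to wait for a full buffer before it can hand it off, so the more samples you ask it to hold, the longer the wait between a sound going in and coming out.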

How the Buffer Works

There are multiple types of buffers you can use in an audio system.  I’m going to cover two – a double buffer, and a ring buffer.

Mind you, other computer systems also use buffers (has your Netflix or YouTube video ever “buffered”?) – so when you learn this, you’ll be able to apply it to much more than audio.

With any type of buffer, what the system is doing when it’s live and running is both writing and reading at the same time.

That means that data is being written to the buffer (aka a small chunk of your computer’s active memory) at the same time that (for this example) your sound card is reading from the buffer.

That whole “sound card is reading the buffer” part is where your computer is making noise you can hear, by the way.

Both of these writing and reading events happen rapidly, and the trick is to make sure that your system isn’t writing so slowly that your sound card catches up and has nothing left to read.  When that happens, in Pro Tools for example, your buffer is too small!

Therefore, as you can see, these buffers are a little delicate and must be handled with care.
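The race between writing and reading can be sketched with a toy model in Python.  Everything here is a made-up assumption for illustration: the cost of 10 microseconds of effects math per sample, the single 3 ms stall, and the idea that the only deadline is “finish writing before the sound card finishes playing one buffer.”

```python
SAMPLE_RATE = 48000

def playback_time(buffer_size, sample_rate=SAMPLE_RATE):
    """Seconds the sound card needs to read (play) one full buffer."""
    return buffer_size / sample_rate

def fill_time(buffer_size, per_sample_cost, hiccup=0.0):
    """Seconds the system needs to write one full buffer (made-up cost model)."""
    return buffer_size * per_sample_cost + hiccup

def underruns(buffer_size, per_sample_cost, hiccups):
    """Count buffers that were not written in time for the sound card."""
    deadline = playback_time(buffer_size)
    return sum(1 for hiccup in hiccups
               if fill_time(buffer_size, per_sample_cost, hiccup) > deadline)

# Hypothetical session: four buffers, one of which is delayed by a 3 ms stall.
hiccups = [0.0, 0.0, 0.003, 0.0]

print(underruns(64, 10e-6, hiccups))    # small buffer: the stall causes 1 glitch
print(underruns(1024, 10e-6, hiccups))  # large buffer: the stall is absorbed, 0 glitches
print(underruns(1024, 25e-6, hiccups))  # too many effects: all 4 buffers are late
```

In this model a bigger buffer gives the system more slack to ride out a stall, but if the effects cost more per sample than the sample rate allows, no buffer size will save you.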

Double Buffer

With a double buffer system, you’re working with, kind of obviously, two buffers.  You have one buffer to write to and a second buffer to read from.  When the writing buffer is full and the reading buffer has been read through, the two swap roles.  This helps, in some ways, to prevent bottlenecking.  In theory, you can also write rules for what the computer should do if the system hasn’t finished writing by the time the reading buffer has been read through.
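A bare-bones Python sketch of the idea might look like this.  The class and its names are invented for this example, and to keep it simple the writing and reading happen one after the other; in a real system they would be happening at the same time.

```python
class DoubleBuffer:
    """Two equally sized buffers: one being written, one being read."""

    def __init__(self, size):
        self.write_buffer = [0.0] * size
        self.read_buffer = [0.0] * size

    def swap(self):
        """Hand the freshly written buffer to the reader and reuse the old one for writing."""
        self.write_buffer, self.read_buffer = self.read_buffer, self.write_buffer

buffers = DoubleBuffer(4)
source = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
played = []

for start in range(0, len(source), 4):
    # The "system" fills the write buffer...
    buffers.write_buffer[:] = source[start:start + 4]
    # ...the two buffers trade places...
    buffers.swap()
    # ...and the "sound card" reads the one that was just filled.
    played.extend(buffers.read_buffer)

print(played == source)  # True: every sample came out in order
```

The point of having two is that the sound card never reads from a buffer that is only half written.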

Ring (or Circular) Buffer

A ring buffer is a single buffer that has a reading and a writing “position”.  The system writes to the buffer, and a bit later the system begins to read from the buffer (which is the start of playback).  When either position reaches the end of the buffer, it wraps back around to the beginning, hence the “ring.”  If you’ve ever watched the playback bar on a YouTube video load, you’ve seen more and more data get loaded into memory over time while you watch the video (your playback position is the reading position).

Here’s a visualization of it:

In the image, the “Tail” position is where new data is written to the buffer, while the “Head” is the position where data is read out of the buffer.  At no point should the head catch up to the tail.

That means dire consequences!
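For the programmers in the room, here is a minimal ring buffer sketched in Python.  The class and method names are my own for this example, and it tracks a simple count of unread samples, which is only one of several ways to do it.  The read position plays the role of the “Head” and the write position plays the role of the “Tail.”

```python
class RingBuffer:
    """A fixed-size buffer with a read position and a write position that wrap around."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.size = size
        self.write_pos = 0  # the "Tail": where the next sample is written
        self.read_pos = 0   # the "Head": where the next sample is read
        self.count = 0      # samples written but not yet read

    def write(self, sample):
        if self.count == self.size:
            return False  # full: writing now would overwrite unread audio
        self.data[self.write_pos] = sample
        self.write_pos = (self.write_pos + 1) % self.size  # wrap at the end
        self.count += 1
        return True

    def read(self):
        if self.count == 0:
            return None  # empty: the reader caught up with the writer
        sample = self.data[self.read_pos]
        self.read_pos = (self.read_pos + 1) % self.size  # wrap at the end
        self.count -= 1
        return sample

ring = RingBuffer(4)
for sample in [0.1, 0.2, 0.3]:
    ring.write(sample)

print(ring.read(), ring.read())  # 0.1 0.2
ring.write(0.4)
ring.write(0.5)                  # this one wraps around to the first slot
print(ring.write_pos)            # 1
print(ring.read(), ring.read(), ring.read(), ring.read())  # 0.3 0.4 0.5 None
```

That final `None` is the dire consequence: the reader asked for a sample and there was nothing there to play.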

DSP, Plugins, and the Buffer

So cool – you now understand that a buffer is where the system writes live audio data, and where it reads that same data back to play out of your sound card.

But how do audio plugins, for example, fit into this scenario?

Well, plugins affect the data just prior to the actual writing of the data into your buffer.  This is going to get a bit weird with words…

The “Write Position” is only a point on your buffer.  Let’s say it’s the 3rd audio sample into your buffer.  When your system reaches that position, it can then affect the data currently stored (or add data if there is none) at that position.

A plugin is simply a mathematical algorithm that the system uses to process a number, which it then writes to the buffer.  So when you tweak values in your plugin, you’re essentially changing the values in a math problem that your computer is solving before it stores the result.

Then your sound card comes along and reads that data and turns it into noise.
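As a toy illustration in Python, here is about the simplest “plugin” imaginable: a volume knob.  The function name is made up, and clamping the result to stay between -1 and 1 is just an assumption to match the number range used in this article; real plugins are far more involved.

```python
def gain_plugin(sample, gain):
    """A made-up 'volume' plugin: multiply the sample, then keep it within -1..1."""
    return max(-1.0, min(1.0, sample * gain))

incoming = [0.0, 0.25, -0.5, 0.9]  # samples on their way into the buffer
buffer = [0.0] * len(incoming)

for write_position, sample in enumerate(incoming):
    # The plugin does its math first, and the result is what gets written.
    buffer[write_position] = gain_plugin(sample, gain=2.0)

print(buffer)  # [0.0, 0.5, -1.0, 1.0]
```

Turning the `gain` knob changes one number in the math problem, and every sample written to the buffer changes with it.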

Pretty cool, right?

…and you say you’re bad at math!

Caveats

Again, bear in mind this is simply how buffering works, which is only one core part of playback in an audio system.  I would advise not walking up to your programming team and bragging that you know all about how an audio engine works now – you don’t!

But, my hope is that you walk away from this understanding a bit more about how audio works within a computer system, and you “get” what you’re actually doing a bit more!  It’s not as insanely complicated as you might think!



Copyright 2016-2021, NIR LLC, all rights reserved.