Introduction

Sound in Chip-8 is very easy: if the sound timer isn’t 0, play a tone. XO-Chip adds another level of complexity. It uses a 16-byte pattern buffer with an associated pitch to create unique tones that can change every frame. Thus, XO-Chip can create basic music with its expanded audio.

XO-Chip Audio

This was by far the hardest thing to support while developing Chipped-8, especially because I have even less experience with audio processing than I do with hardware.

XO-Chip sets a pitch and a 16-byte (128-bit) audio pattern. The pattern is a square wave, with each bit being one step of the wave: 1 is high and 0 is low. The pattern plays in a loop until audio is turned off or the pattern is changed. Audio is off when the sound timer is 0.
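
Reading the pattern one bit at a time is the building block for everything below. This is a minimal sketch, assuming the bits are played most significant bit first within each byte, starting at byte 0; `pattern_bit` is a hypothetical helper name, not something taken from Chipped-8.

```python
PATTERN_BYTES = 16
PATTERN_BITS = PATTERN_BYTES * 8  # 128 bits in a full pattern


def pattern_bit(pattern: bytes, index: int) -> int:
    """Return bit `index` (0-127) of the pattern: 1 for high, 0 for low.

    Assumes bits are read most significant bit first within each byte.
    """
    byte = pattern[index // 8]
    return (byte >> (7 - index % 8)) & 1


# Example: 0xF0 is 11110000, so each byte gives four high steps then four low.
pattern = bytes([0xF0] * PATTERN_BYTES)
print([pattern_bit(pattern, i) for i in range(8)])  # [1, 1, 1, 1, 0, 0, 0, 0]
```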

The pitch is turned into a frequency using the formula 4000 * 2^((pitch - 64) / 48). This frequency is the rate, in bits per second, at which the pattern is played back. Thankfully, the documentation included this formula. However, it doesn’t go any further or describe how to turn the pitch and pattern into actual sound.
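
The formula translates directly into code. In this sketch `pitch_to_rate` is a hypothetical name; the pitch is the byte value the program sets. Pitch 64 plays the pattern at 4000 bits per second, and every 48 steps up or down doubles or halves that rate.

```python
def pitch_to_rate(pitch: int) -> float:
    """Convert the XO-Chip pitch value to a playback rate in Hz.

    Implements 4000 * 2^((pitch - 64) / 48).
    """
    return 4000.0 * 2.0 ** ((pitch - 64) / 48)


print(pitch_to_rate(64))   # 4000.0
print(pitch_to_rate(112))  # 8000.0
print(pitch_to_rate(16))   # 2000.0
```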

Generating sound

Sound from the 16-byte pattern is generated one frame at a time. Chip-8 runs at 60 FPS, so each frame is 1/60th of a second, or about 16.67 ms.

The audio output device is going to have an output frequency (sample rate), typically 48000 or 44100 Hz. This is the number of samples per second the output plays. At 48000 samples per second, 1/60th of a second is 48000 / 60 = 800 samples per frame.
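
The per-frame arithmetic, as a small sketch with a hypothetical `samples_per_frame` helper:

```python
FPS = 60  # Chip-8 timers tick 60 times per second


def samples_per_frame(output_rate: int, fps: int = FPS) -> int:
    """Number of output samples needed to cover one frame."""
    return output_rate // fps


print(samples_per_frame(48000))  # 800
print(samples_per_frame(44100))  # 735
print(round(1000 / FPS, 2))      # 16.67 (frame length in ms)
```

Both common rates divide evenly by 60. For a rate that doesn’t, an implementation would need to carry the leftover fraction of a sample from one frame to the next.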

So far, we know we need 800 samples, and we have a 128-bit pattern to create the square wave from. 800 is a lot bigger than 128, so we need to stretch the square wave over 800 samples. We advance through the 128-bit pattern using an advancement step in order to stretch it into the 800 samples. However, the pattern has an associated pitch that needs to be taken into account when we stretch it.

If we stepped through the pattern one bit per output sample, it would play at 48000 Hz instead of whatever frequency the pitch specifies. To keep the correct frequency, we divide the frequency for the pitch by the output frequency. This gives us our advancement step through the pattern, something like 0.2, meaning we advance one bit through the 128-bit pattern every five samples. This stretches the short pattern into a much longer one.
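
A minimal sketch of the step calculation, assuming a 48000 Hz output; `advancement_step` is a hypothetical helper. For pitch 127 (the pitch used for the weight measurements later on) the step comes out to about 0.207, close to the 0.2 used as an example above, which is one pattern bit roughly every 4.8 output samples.

```python
def pitch_to_rate(pitch: int) -> float:
    """XO-Chip pitch to pattern playback rate in Hz."""
    return 4000.0 * 2.0 ** ((pitch - 64) / 48)


def advancement_step(pitch: int, output_rate: int = 48000) -> float:
    """How far to move through the 128-bit pattern for each output sample."""
    return pitch_to_rate(pitch) / output_rate


step = advancement_step(127)
print(round(step, 3))      # 0.207
print(round(1 / step, 1))  # 4.8 output samples per pattern bit
```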

However, our step won’t line up exactly with 800 samples. In this case, we get through all the bits in the pattern in 640 samples (128 / 0.2), leaving 160 extra samples to fill in. We could do some crazy math and interpolate further. However, with XO-Chip we can just loop back to the start of the pattern to keep filling in samples. The spec implies this is correct by saying the tone should repeat if it isn’t changed. I verified this is what Octo (the emulator made by the creator of XO-Chip) does.
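
Putting the pieces together, here is a minimal sketch of filling one frame, looping back to the start of the pattern when the bits run out. It assumes MSB-first bit order, an "on" amplitude of 64 (the value used in the filtering examples), and a simple nearest-bit lookup with no interpolation. `render_frame` and its `position` parameter are hypothetical. Whether Chipped-8 restarts the pattern every frame or carries the position over isn’t stated, so the function returns the position and leaves that choice to the caller.

```python
def pitch_to_rate(pitch: int) -> float:
    return 4000.0 * 2.0 ** ((pitch - 64) / 48)


def pattern_bit(pattern: bytes, index: int) -> int:
    # Assumed order: most significant bit first within each byte.
    return (pattern[index // 8] >> (7 - index % 8)) & 1


def render_frame(pattern: bytes, pitch: int, output_rate: int = 48000,
                 fps: int = 60, amplitude: int = 64, position: float = 0.0):
    """Fill one frame of samples from the pattern, looping when it runs out.

    Returns (samples, position); position is where the next frame would resume.
    """
    total_bits = len(pattern) * 8            # 128 for a 16-byte pattern
    step = pitch_to_rate(pitch) / output_rate
    samples = []
    for _ in range(output_rate // fps):      # 800 samples at 48000 Hz
        bit = pattern_bit(pattern, int(position))
        samples.append(amplitude if bit else 0)
        # Move forward by the step and wrap to the start after the last bit.
        position = (position + step) % total_bits
    return samples, position


samples, next_position = render_frame(bytes([0xF0] * 16), pitch=127)
print(len(samples))  # 800
```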

Rounding the square wave

At this point we have a square wave that’s very square and doesn’t sound great. It can be smoothed using a number of different filters. I went with a very simple IIR (Infinite Impulse Response) filter.

The filtering will change this:

[ 0, 0, 0, 0, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 0, 0, 0, 0 ]

To this:

[ 0, 0, 0, 0, 64, 89, 99, 103, 105, 106, 106, 106, 106, 105, 103, 99, 89, 64, 0, 0, 0, 0 ]

Each non-zero value is part of a peak, and its magnitude translates directly to a volume level: 0 means no sound and 255 means maximum.

As you can see, using only an IIR filter does increase the volume, but it also creates a rounder wave that doesn’t sound as jarring.

Formula

I’m using the equation output[i] = output[i-1] * weight + input[i]. The weight, a decimal from 0 to 1, determines how much the previous output contributes to the current output. The higher the weight, the more influence the previous sample has.
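
A minimal sketch of the recurrence as written, assuming each result is truncated to an integer and stored as an unsigned 8-bit value (emulated here with `& 0xFF`); `iir_smooth` is a hypothetical name. It reproduces the rising edge of the first example above (64, 89, 99, 103, 105, 106) and the start of the 0.80 example further down (64, 115, 156, 188, 214, 235, 252). On its own, this forward recurrence decays gradually once the input drops back to 0, so it does not produce the mirrored falling edge shown in the first example.

```python
def iir_smooth(samples, weight):
    """Apply output[i] = output[i-1] * weight + input[i].

    Each result is truncated to an int and masked to 8 bits to mimic storing
    it in an unsigned 8-bit sample (an assumption about the storage type).
    """
    out = []
    prev = 0  # output[-1] is treated as silence
    for x in samples:
        prev = int(prev * weight + x) & 0xFF
        out.append(prev)
    return out


square = [0] * 4 + [64] * 14 + [0] * 4
print(iir_smooth(square, 0.40)[:11])
# [0, 0, 0, 0, 64, 89, 99, 103, 105, 106, 106]
```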

Weight

I tested various weight values and found that they produce different numbers of samples that ramp up from 0 to the maximum. This was with a pitch of 127 and a 48000 Hz output frequency. Other pitches and output frequencies will have slightly different numbers of ramp-up and ramp-down samples.

Weight | Smoothed samples before/after peak
0.35 | 4
0.40 | 5
0.45 | 6
0.55 | 8
0.65 | 11
0.75 | 16
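
The counts in the table can be reproduced from the formula alone, assuming the input stays at 64 long enough for the output to settle (so the pitch doesn’t cut the ramp short) and each result is truncated to an integer. `ramp_length` is a hypothetical helper that counts how many outputs keep rising after the first 64.

```python
def ramp_length(weight: float, amplitude: int = 64, limit: int = 1000) -> int:
    """Count how many outputs keep rising after the first sample of a peak.

    Runs output[i] = int(output[i-1] * weight + amplitude) with the input
    held at `amplitude`, starting from the first sample (equal to amplitude).
    """
    prev = amplitude
    count = 0
    for _ in range(limit):
        out = int(prev * weight + amplitude)
        if out <= prev:  # truncation has settled the wave at its peak
            break
        count += 1
        prev = out
    return count


for w in (0.35, 0.40, 0.45, 0.55, 0.65, 0.75):
    print(f"{w:.2f} {ramp_length(w)}")  # 4, 5, 6, 8, 11, 16 respectively
```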

I noticed that weights of 0.60 and higher started to sound distorted. The wave peaks were becoming too long, and the pitch started to be affected.

When I tried 0.80, it got weird. Instead of a smooth wave it started to look like this:

[ 0, 0, 0, 0, 64, 115, 156, 188, 214, 235, 252, 10, 72, 121, 160, 192, 217, 237, 253, 11, 72, 121, 160, ... 96, 76, 60, 48, 38, 30, 24, 19, 15, 12, 9, 7, 5, 4, 3, 2, 1, 0, 0, 0, 0 ]

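At that weight or higher, noise was introduced into the wave. The sudden drops, like 252 jumping to 10, happen because each sample is an unsigned 8-bit integer: once the filtered value passes 255, it wraps around to a small number.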

I ended up using 0.40 as the weight because overall it sounded the best. That said, this is super basic, and while it’s accurate, it doesn’t sound nearly as good as Octo’s sound.

Volume

One thing to keep in mind: the amplitude of the wave determines the volume coming out of the sound device. The higher the value in the sample, the louder it will be. Using the IIR filter increases the maximum value. For example, starting with 64, a weight of 0.40 increases the value to 106, which makes it louder.
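
The level the filter climbs toward can be worked out directly. Once the output stops changing, y = y * weight + amplitude, so y = amplitude / (1 - weight) for weights from 0 up to (but not including) 1. A quick sketch with a hypothetical `filter_ceiling` helper: for 64 and 0.40 it gives about 106.7, matching the 106 above once truncated, and for 0.80 it gives 320, more than an unsigned 8-bit sample can hold.

```python
def filter_ceiling(amplitude: float, weight: float) -> float:
    """Level that output[i] = output[i-1] * weight + amplitude settles toward.

    At steady state y = y * weight + amplitude, so y = amplitude / (1 - weight).
    Valid for 0 <= weight < 1.
    """
    return amplitude / (1.0 - weight)


for w in (0.40, 0.80):
    peak = filter_ceiling(64, w)
    verdict = "fits in uint8" if peak <= 255 else "overflows uint8"
    print(w, round(peak, 2), verdict)
# 0.4 106.67 fits in uint8
# 0.8 320.0 overflows uint8
```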

The range can be managed in a few different ways.

This is the approach I’ve taken: I’m using unsigned 8-bit integers per sample, so there is a natural maximum that will be reached based on the weight and starting value. Using a starting value of 16 and a weight of 0.45, you’ll get a maximum of 37, which is a reasonable volume.

Another method is to set a maximum value and, if the output goes over it, use the maximum instead. This has the disadvantage of squaring off the rounded, sine-like wave we’re trying to construct from a square wave.
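
A sketch of the clamping approach, using a hypothetical `iir_smooth_clamped` helper and an arbitrary cap of 200 chosen for illustration. With a weight of 0.80 the output no longer wraps around, but the top of the wave flattens into a plateau, which is the squaring effect described above.

```python
def iir_smooth_clamped(samples, weight, max_value):
    """IIR smoothing where any result above max_value is clipped to max_value."""
    out = []
    prev = 0
    for x in samples:
        # Clip instead of letting the value wrap past the 8-bit range.
        prev = min(int(prev * weight + x), max_value)
        out.append(prev)
    return out


print(iir_smooth_clamped([64] * 10, 0.80, max_value=200))
# [64, 115, 156, 188, 200, 200, 200, 200, 200, 200]
```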

Finally, proper low-pass, high-pass, and/or band-pass filtering that keeps output values in range could be used. This would make for the nicest sound, but it’s way more complex than I want to try understanding right now.
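
For comparison, one common textbook adjustment is to scale the input by (1 - weight), which turns the same recurrence into a normalized one-pole low-pass filter (exponential smoothing) whose output can’t climb above the input level. This is a swapped-in technique, not what Chipped-8 does, and `one_pole_lowpass` is a hypothetical name. It is still far simpler than the high-pass and band-pass designs mentioned above.

```python
def one_pole_lowpass(samples, weight):
    """Normalized one-pole low-pass: y[i] = weight * y[i-1] + (1 - weight) * x[i].

    The two gains add up to 1, so the output approaches the input level
    instead of overshooting it; a steady input of 64 never exceeds 64.
    """
    out = []
    prev = 0.0
    for x in samples:
        prev = weight * prev + (1.0 - weight) * x
        out.append(int(prev))  # truncate to an integer sample value
    return out


print(one_pole_lowpass([64] * 6, 0.40))  # [38, 53, 59, 62, 63, 63]
```

The trade-off is that the peak no longer gets louder than the input, so the "on" amplitude directly sets the volume.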

Conclusion

Sound is hard. Really hard. There is still a lot more I can do with Chipped-8 sound, and I probably will in the future. But right now I think I’m at a good stopping point with sound. It works pretty well, and I’d say it’s a step up from good enough.