Introduction
A very common task when working with binary data in C is converting it to and from hex. Unfortunately, the C standard library doesn’t provide a function that takes care of this for us. However, it’s easy to implement a small set of functions to handle it.
Hex encoding is always twice the size of the binary data. Since hex is base 16, any unsigned char value from 0 to 255 can be represented in two hex digits, 0x00 to 0xFF.
When dealing with hex encoding, always use two characters per byte, even if the numeric value fits within one hex digit (0-F). Consistent sizing is very important, because without it 0FAB could be 0F AB or 00 0F 0A 0B. I can’t stress this enough: always using the width of the largest value (FF) means you always know how many characters represent each value. Here, one unsigned char is always two hex characters, and going back to binary, every two hex characters always convert back to exactly one byte.
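To make the ambiguity concrete, here is a minimal sketch that formats bytes with printf-style conversions, once with "%02X" (always two digits) and once with "%X" (as few digits as needed). format_hex is a hypothetical helper written only for this illustration; it is not part of the article's code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for illustration only: writes each byte of `bin` into
 * `out` using "%02X" (fixed two-digit width) when fixed_width is nonzero,
 * or "%X" (variable width) otherwise. `out` must hold at least 2*len + 1
 * chars. Returns the number of hex characters written. */
size_t format_hex(const unsigned char *bin, size_t len, int fixed_width, char *out)
{
    char *p = out;
    size_t i;

    *p = '\0';  /* an empty input still yields a valid (empty) string */
    for (i = 0; i < len; i++) {
        /* sprintf returns the count of characters written, excluding '\0'. */
        p += sprintf(p, fixed_width ? "%02X" : "%X", (unsigned int)bin[i]);
    }
    return strlen(out);
}
```

With fixed width, the two bytes {0x0F, 0xAB} become "0FAB". With variable width, the four bytes {0x00, 0x0F, 0x0A, 0x0B} also become "0FAB", so the decoder has no way to tell the two inputs apart. Fixed width turns the second input into "000F0A0B" instead, and the ambiguity disappears.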
Binary to hex
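#include <stdlib.h>

/* Returns a newly allocated, NUL-terminated hex string (the caller frees it),
 * or NULL on invalid input or allocation failure. */
char *bin2hex(const unsigned char *bin, size_t len)
{
    char   *out;
    size_t  i;

    if (bin == NULL || len == 0)
        return NULL;

    /* Two hex characters per byte plus the terminating NUL. */
    out = malloc(len*2+1);
    if (out == NULL)
        return NULL;

    for (i=0; i<len; i++) {
        out[i*2]   = "0123456789ABCDEF"[bin[i] >> 4];    /* high nibble */
        out[i*2+1] = "0123456789ABCDEF"[bin[i] & 0x0F];  /* low nibble */
    }
    out[len*2] = '\0';

    return out;
}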
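“0123456789ABCDEF” is a string literal, and we can index it directly because a string literal is an array of char (here 16 digits plus the terminating NUL). In an expression the array decays to a pointer to its first character, so subscripting it works just like subscripting any other array. Assigning it to a const char * variable would simply make that variable point to the same address; either way it’s just a memory address, so we can access it as an array.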
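Here is a minimal sketch showing that the literal can be subscripted and measured on its own, without storing it in a variable first. hex_digit and hex_digits_array_size are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: maps a 4-bit value to its hex digit by indexing the
 * string literal directly, the same way bin2hex does. */
char hex_digit(unsigned int nibble)
{
    /* Only the low 4 bits are used, so the index is always 0..15. */
    return "0123456789ABCDEF"[nibble & 0x0F];
}

/* Hypothetical helper: sizeof applied to the literal reports the size of the
 * whole array, 16 digits plus the terminating '\0'. */
size_t hex_digits_array_size(void)
{
    return sizeof "0123456789ABCDEF";
}
```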
There are a total of 16 hex digits. An unsigned char is 8 bits on any mainstream platform, which we split into two 4-bit halves (nibbles). 4 bits can hold a value from 0 to 15, exactly one hex digit’s worth, so each half is used as an index into the digit string. The right shift (>> 4) moves the high half down into the low 4 bits to give the first hex digit, and the 0x0F mask keeps only the low half to give the second hex digit.
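As a worked example, the byte 0xA7 splits into the high nibble 0xA and the low nibble 0x7, which index the digits 'A' and '7'. A minimal sketch of just that step, using a hypothetical split_nibbles helper:

```c
/* Hypothetical helper: splits one byte into its high and low 4-bit halves,
 * using the same shift and mask as bin2hex. */
void split_nibbles(unsigned char byte, unsigned char *high, unsigned char *low)
{
    *high = byte >> 4;    /* shift the high half down: 0xA7 >> 4 == 0x0A */
    *low  = byte & 0x0F;  /* keep only the low half: 0xA7 & 0x0F == 0x07 */
}
```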
Hex to binary
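/* Converts one hex digit to its 4-bit value and stores it in *out.
 * Returns 1 on success, or 0 if `hex` is not a hex digit or out is NULL. */
int hexchr2bin(const char hex, char *out)
{
    if (out == NULL)
        return 0;

    if (hex >= '0' && hex <= '9') {
        *out = hex - '0';
    } else if (hex >= 'A' && hex <= 'F') {
        *out = hex - 'A' + 10;
    } else if (hex >= 'a' && hex <= 'f') {
        *out = hex - 'a' + 10;
    } else {
        return 0;
    }

    return 1;
}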
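Every hex digit needs to be turned back into a 4-bit binary value: 0 = 0, 1 = 1, …, A = 10, …, E = 14, F = 15. The base character of the digit’s range ('0', 'A' or 'a') is subtracted from the character, and for the letters 10 is added, since they represent the values 10 and up. For example, 'c' - 'a' + 10 = 2 + 10 = 12. This calculation relies on the numeric values of the characters in the ASCII encoding table: the C standard guarantees that '0' to '9' are contiguous, and in ASCII the letters 'A' to 'F' and 'a' to 'f' are contiguous as well.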
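If you would rather not depend on the letters being contiguous (the C standard only promises that for the digits '0' to '9'), a table lookup is one alternative. This is a swapped-in technique, not the article's approach, and hexchr_value is a hypothetical name: it finds the character in a digit string and uses its position as the value.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical alternative to hexchr2bin that makes no assumption about
 * letter codes: the value of a digit is its position in the table.
 * Returns -1 for anything that is not a hex digit. */
int hexchr_value(char hex)
{
    static const char digits[] = "0123456789ABCDEF";
    const char *p;

    /* strchr would also "find" the terminating '\0', so reject it first. */
    if (hex == '\0')
        return -1;

    /* toupper expects a value representable as unsigned char (or EOF). */
    p = strchr(digits, toupper((unsigned char)hex));
    return p == NULL ? -1 : (int)(p - digits);
}
```

The trade-off is a short linear search per character instead of a couple of comparisons, which rarely matters but is worth knowing about.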
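#include <stdlib.h>
#include <string.h>

/* Decodes a hex string into a newly allocated buffer stored in *out (the
 * caller frees it). Returns the number of bytes, or 0 on invalid input or
 * allocation failure, in which case *out is set to NULL. */
size_t hexs2bin(const char *hex, unsigned char **out)
{
    size_t len;
    char   b1;
    char   b2;
    size_t i;

    if (out == NULL)
        return 0;
    *out = NULL;
    if (hex == NULL || *hex == '\0')
        return 0;

    /* Every byte is exactly two hex characters. */
    len = strlen(hex);
    if (len % 2 != 0)
        return 0;
    len /= 2;

    *out = malloc(len);
    if (*out == NULL)
        return 0;

    for (i=0; i<len; i++) {
        if (!hexchr2bin(hex[i*2], &b1) || !hexchr2bin(hex[i*2+1], &b2)) {
            /* Invalid digit: release the buffer so the caller doesn't leak it. */
            free(*out);
            *out = NULL;
            return 0;
        }
        /* First digit is the high nibble, second is the low nibble. */
        (*out)[i] = (b1 << 4) | b2;
    }

    return len;
}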
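The first thing we do is check that the input has an even length, since every byte is encoded as exactly two characters, then determine the size of the output buffer (half the number of hex characters) and allocate it. Then we can move on to the main part, where we combine the two 4-bit values into one 8-bit unsigned char, with the first digit shifted into the high half.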
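As a worked example of the combining step, the hex pair "41" gives b1 = 0x4 and b2 = 0x1, and (0x4 << 4) | 0x1 = 0x40 | 0x01 = 0x41, which is 'A' in ASCII. A minimal sketch of that step on its own; combine_nibbles is a hypothetical helper, and masking both inputs to 4 bits is an extra safety measure that the article's code does not need, since hexchr2bin only ever produces 0 to 15.

```c
/* Hypothetical helper: combines two 4-bit values into one byte, the first
 * becoming the high nibble and the second the low nibble. */
unsigned char combine_nibbles(unsigned char high, unsigned char low)
{
    /* Masking keeps each part within 4 bits before they are merged. */
    return (unsigned char)(((high & 0x0F) << 4) | (low & 0x0F));
}
```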
Testing
Finally, here is a simple test app to demonstrate the use of each function.
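#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char    *a = "Test 123! - jklmn";
    char          *hex;
    unsigned char *bin = NULL;  /* NULL so free() is safe even if decoding fails */
    size_t         binlen;

    hex = bin2hex((const unsigned char *)a, strlen(a));
    if (hex == NULL)
        return 1;
    printf("%s\n", hex);

    binlen = hexs2bin(hex, &bin);
    if (binlen == 0) {
        free(hex);
        return 1;
    }
    /* The decoded data is not NUL-terminated, so print exactly binlen chars. */
    printf("%.*s\n", (int)binlen, (char *)bin);

    free(bin);
    free(hex);
    return 0;
}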
You might notice that the input variable a is a string. Well, it’s still data, and we can treat it as binary. Using a string makes it easier to verify the decode, since we can print it out and see the result.
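A printable string never contains a 0x00 byte, though, so it doesn’t exercise every case. As a sketch, here is a round-trip check over raw bytes that include 0x00 and 0xFF. To keep the block compilable on its own, it repeats the article’s three functions in condensed form (with NULL checks and cleanup on invalid input added); roundtrip_ok is a hypothetical helper, not part of the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Condensed copy of bin2hex from above, with a malloc check. */
static char *bin2hex(const unsigned char *bin, size_t len)
{
    char *out;
    size_t i;

    if (bin == NULL || len == 0)
        return NULL;
    out = malloc(len * 2 + 1);
    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        out[i * 2]     = "0123456789ABCDEF"[bin[i] >> 4];
        out[i * 2 + 1] = "0123456789ABCDEF"[bin[i] & 0x0F];
    }
    out[len * 2] = '\0';
    return out;
}

/* Condensed copy of hexchr2bin from above. */
static int hexchr2bin(const char hex, char *out)
{
    if (hex >= '0' && hex <= '9')      *out = hex - '0';
    else if (hex >= 'A' && hex <= 'F') *out = hex - 'A' + 10;
    else if (hex >= 'a' && hex <= 'f') *out = hex - 'a' + 10;
    else return 0;
    return 1;
}

/* Condensed copy of hexs2bin from above; *out is NULL on any failure. */
static size_t hexs2bin(const char *hex, unsigned char **out)
{
    size_t len, i;
    char b1, b2;

    if (out == NULL)
        return 0;
    *out = NULL;
    if (hex == NULL || *hex == '\0' || strlen(hex) % 2 != 0)
        return 0;
    len = strlen(hex) / 2;
    *out = malloc(len);
    if (*out == NULL)
        return 0;
    for (i = 0; i < len; i++) {
        if (!hexchr2bin(hex[i * 2], &b1) || !hexchr2bin(hex[i * 2 + 1], &b2)) {
            free(*out);  /* don't leak the buffer on bad input */
            *out = NULL;
            return 0;
        }
        (*out)[i] = (unsigned char)((b1 << 4) | b2);
    }
    return len;
}

/* Hypothetical helper: encodes `len` bytes, decodes them again and reports
 * whether the result matches byte for byte. memcmp is used rather than
 * strcmp because binary data may contain 0x00 bytes. */
int roundtrip_ok(const unsigned char *data, size_t len)
{
    char *hex = bin2hex(data, len);
    unsigned char *bin = NULL;
    size_t binlen;
    int ok;

    if (hex == NULL)
        return 0;
    binlen = hexs2bin(hex, &bin);
    ok = (binlen == len && memcmp(bin, data, len) == 0);
    free(bin);
    free(hex);
    return ok;
}
```

Because hexs2bin returns the decoded length rather than relying on a terminator, embedded zero bytes survive the round trip unchanged.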