Introduction

Once again I needed to embed some text files into a C application. The right way to do this is to turn the data into a byte array and compile it in. At least, it’s the most portable way, because some compilers limit the length of string literals.

xxd -i is the easiest way to format the data so it can be compiled in. However, just like the last time, I don’t have access to xxd -i. Thankfully, I already wrote a Lua implementation. Unfortunately, I couldn’t use it because my current project is written in Python. I couldn’t pull in Lua just for this conversion.

My solution is to write a Python version. This is a nice complement to the header-to-binary Python code I wrote. Like that one, I’ll put this in my Bin-Header GitHub repo.

The Code

bin2header.py

import argparse
import sys

def bin2header(data, var_name='var'):
    out = []
    out.append('unsigned char {var_name}[] = {{'.format(var_name=var_name))
    # Split the data into 12-byte chunks, one chunk per output line.
    l = [ data[i:i+12] for i in range(0, len(data), 12) ]
    for i, x in enumerate(l):
        # bytearray yields integers on both Python 2 (str) and Python 3 (bytes).
        line = ', '.join([ '0x{val:02x}'.format(val=c) for c in bytearray(x) ])
        out.append('  {line}{end_comma}'.format(line=line, end_comma=',' if i<len(l)-1 else ''))
    out.append('};')
    out.append('unsigned int {var_name}_len = {data_len};'.format(var_name=var_name, data_len=len(data)))
    return '\n'.join(out)

def main():
    parser = argparse.ArgumentParser(description='Generate binary header output')
    parser.add_argument('-i', '--input', required=True, help='Input file')
    parser.add_argument('-o', '--out', required=True, help='Output file')
    parser.add_argument('-v', '--var', required=True, help='Variable name to use in file')

    # parse_args() exits on its own if the arguments are invalid.
    args = parser.parse_args()

    # Read in binary mode so the bytes are not decoded or newline-translated.
    with open(args.input, 'rb') as f:
        data = f.read()

    out = bin2header(data, args.var)
    with open(args.out, 'w') as f:
        f.write(out)

    return 0

if __name__ == '__main__':
    sys.exit(main())

The code explained

I find it amusing that the setup code in main is larger than the actual conversion code.

def bin2header(data, var_name='var'):

This is the function that does the conversion. We need a variable name for the array, but we’ll allow a default one. It really should be called with an explicit variable name to avoid conflicts when running this over multiple files.

    out = []
    out.append('unsigned char {var_name}[] = {{'.format(var_name=var_name))

First we need to make a list that will hold each line of the output. It’s more efficient to build and join than constantly concatenating strings.

We start by adding the variable that’s going to store our binary output.

    l = [ data[i:i+12] for i in range(0, len(data), 12) ]

xxd -i outputs 12 columns of data. We want this output to be identical, so we’re going to pull off the data in 12-byte chunks. This gives us the data split into a list whose elements are the size we need. The last one might be smaller, which is fine.
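
To see what that slicing produces, here is a small sketch run on a made-up 30-byte input. The sample data is only an assumption for illustration; any byte string would do.

```python
# Hypothetical sample input: 30 bytes with the values 0..29.
data = bytes(bytearray(range(30)))

# Same slicing as in bin2header: step through the data 12 bytes at a time.
chunks = [data[i:i+12] for i in range(0, len(data), 12)]

# Two full chunks of 12 bytes, then a short final chunk of 6 bytes.
print([len(c) for c in chunks])
```

Slicing past the end of the data does not raise an error, which is why the short last chunk needs no special handling.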

    for i, x in enumerate(l):
        line = ', '.join([ '0x{val:02x}'.format(val=c) for c in bytearray(x) ])

Now we go through each element, which represents one line of output. We wrap the chunk in bytearray so that iterating over it yields integers on both Python 2 and Python 3, format each byte as hex, and put a comma and a space between the values. One thing to keep in mind is that join does not add a comma after the last element. It only puts a comma between elements.
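
As a quick illustration of the formatting step on its own, here is a minimal sketch. The three-byte chunk is made up, and it assumes the chunk is a bytearray so that iterating over it yields integers.

```python
# Hypothetical three-byte chunk, used only for illustration.
chunk = bytearray(b'\x00\x0a\xff')

# Format each byte as a two-digit, zero-padded, lowercase hex literal
# and separate the values with a comma and a space.
line = ', '.join('0x{val:02x}'.format(val=c) for c in chunk)

print(line)  # 0x00, 0x0a, 0xff
```

Note that there is no comma after 0xff, since join only separates elements.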

        out.append('  {line}{end_comma}'.format(line=line, end_comma=',' if i<len(l)-1 else ''))

Which leads us to adding the line to our out list. Each line is preceded by two spaces. The {end_comma} will be placed on all lines except the last. This takes care of the previous join not placing a trailing comma. We don’t want the last line to end with the comma because, well, it’s the end of data so there isn’t anything after.
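
The trailing comma logic can be tried in isolation. This is just a sketch; the pre-formatted lines below are invented stand-ins for the joined hex values.

```python
# Hypothetical, already-formatted lines standing in for the joined hex values.
lines = ['0x01, 0x02', '0x03, 0x04', '0x05']

out = []
for i, line in enumerate(lines):
    # Every line except the last one gets a trailing comma.
    end_comma = ',' if i < len(lines) - 1 else ''
    out.append('  {line}{end_comma}'.format(line=line, end_comma=end_comma))

print('\n'.join(out))
```

Only the first two lines end with a comma, so the array initializer stays one continuous comma-separated list across line breaks.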

    out.append('};')
    out.append('unsigned int {var_name}_len = {data_len};'.format(var_name=var_name, data_len=len(data)))
    return '\n'.join(out)

Finally, we add the closing brace, add the length variable, and return everything joined into a single string.
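
To tie the steps together, here is a sketch of the whole function run on a small sample. It restates bin2header so the block runs on its own and assumes the input is a byte string; the sample text and the hello_txt variable name are invented for the example.

```python
def bin2header(data, var_name='var'):
    # Restated so this block is self-contained; assumes bytes-like input.
    out = ['unsigned char {var_name}[] = {{'.format(var_name=var_name)]
    chunks = [data[i:i+12] for i in range(0, len(data), 12)]
    for i, x in enumerate(chunks):
        # bytearray makes iteration yield integers on Python 2 and 3.
        line = ', '.join('0x{val:02x}'.format(val=c) for c in bytearray(x))
        out.append('  {line}{end_comma}'.format(
            line=line, end_comma=',' if i < len(chunks) - 1 else ''))
    out.append('};')
    out.append('unsigned int {var_name}_len = {data_len};'.format(
        var_name=var_name, data_len=len(data)))
    return '\n'.join(out)

# Hypothetical 14-byte sample input.
header = bin2header(b'Hello, World!\n', 'hello_txt')
print(header)
# unsigned char hello_txt[] = {
#   0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64,
#   0x21, 0x0a
# };
# unsigned int hello_txt_len = 14;
```

The first line holds the full 12 bytes and the remaining 2 bytes spill onto a second line with no trailing comma.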

Conclusion

Overall, this turned out to be easier than I expected. Python’s list comprehensions ended up making this a lot smaller than the Lua version.