Introduction

Lists (dynamic arrays) are yet another super useful data structure that C just doesn’t have. C arrays are great for when you can get away with a fixed size but not so fun if you need to dynamically expand because you don’t know quite how many elements you’ll need.

You could use a series of reallocs and memmoves but that’s going to get old really fast. It’s also error prone and not obvious when growth is needed. Not to mention you’ll end up with them all over the place.

The biggest issue I have with trying to manage a dynamic list manually is, it’s hard to track meta-data about the list. For example, how many elements are in the list. That’s pretty important if you ask me.

One alternative to C arrays is a linked list. Not what I want because I want continues memory access and I don’t want the overhead of having a node per element.

Just like we created a generic hashtable, we can create a generic list using void pointers. This generic container will leverages void pointers to allow holding any type of object. We’ll make type safe wrappers around this generic container in order to get back the type safety we lost due to using void pointers.

Design

We can’t get around the fact that C arrays are how we need to create a dynamic list but we can abstract them so we don’t get into the mess of realloc and memmove. Plus having one generic data type means we will have less error prone and easier to understand code. Not to mention just like the generic hashtable, we can wrap it for type safety.

Header

list.h

#ifndef __LIST_H__
#define __LIST_H__

#include <stdbool.h>
#include <stddef.h>

struct list;
typedef struct list list_t;

typedef int (*list_eq)(const void *a, const void *b);
typedef void *(*list_copy_cb)(void *);
typedef void (*list_free_cb)(void *);

typedef struct {
    list_eq      leq;
    list_copy_cb lcopy;
    list_free_cb lfree;
} list_cbs_t;

typedef enum {
    LIST_NONE = 0,
    LIST_SORT = 1 << 0
} list_flags_t;

list_t *list_create(const list_cbs_t *cbs, list_flags_t flags);
void list_destroy(list_t *l);

size_t list_len(list_t *l);

bool list_append(list_t *l, void *v);
bool list_insert(list_t *l, void *v, size_t idx);
bool list_remove(list_t *l, size_t idx);

bool list_index_of(list_t *l, void *v, size_t *idx);
void *list_get(list_t *l, size_t idx);
void *list_take(list_t *l, size_t idx);

void list_start_bulk_add(list_t *l);
void list_end_bulk_add(list_t *l);

void list_sort(list_t *l, list_eq e);

#endif /* __LIST_H__ */

The header consists of callbacks, flags and functions. Lets look at the callbacks first.

Callbacks

typedef int (*list_eq)(const void *a, const void *b);
typedef void *(*list_copy_cb)(void *);
typedef void (*list_free_cb)(void *);

typedef struct {
    list_eq      leq;
    list_copy_cb lcopy;
    list_free_cb lfree;
} list_cbs_t;

The list can take ownership of anything put into it if the copy and free functions are set. Anything in the list will automatically be destroyed when it’s either removed or the list itself is destroyed. This is great because it means we don’t have to worry about managing the life of objects we’ve put into the array.

We also have a qsort style equality function which seems odd. Yes, we need it. It allows us to do two very important things. We can check if a value already exists in the list. It also allows us to have a sortable list.

Flags

typedef enum {
    LIST_NONE = 0,
    LIST_SORT = 1 << 0
} list_flags_t;

Next up we have the flags. Right now there is only one flag that does something. That’s the sort flag. In the future we can easily add more flags and extend the list’s functionality.

Functions

Most of the functions are self explanatory but we should talk about a few of them.

bool list_index_of(list_t *l, void *v, size_t *idx);

list_index_of is a special function because it servers two purposes. First, it will tell us if that object already exists in the list. Second, if it does exist it will give us the index as an out parameter. We’ll make idx optional so we can use this just to check for existence.

void *list_get(list_t *l, size_t idx);
void *list_take(list_t *l, size_t idx);

We have list_get and list_take which seem to be the same thing as decried by the prototype. We’re going to make a distinction between these two where get returns the object but leaves it in the list and take does not.

This means when using get, the list still owns the object and is responsible for it (if a free callback was set it will destroy it). We need this generic implementation to return the object non-const but most wrappers will have it const to indicate ownership.

Now for take. Basically, we’re providing a way to take things out of the list. In this case we’re giving ownership to the caller. They’re going to be responsible for any destruction (free) and what not.

Since both of these are not const and they deal with ownership we need to be very careful because we could end up in a double free situation with get and a memory leak with take. This is another reason why type safe wrappers are very important

void list_start_bulk_add(list_t *l);
void list_end_bulk_add(list_t *l);

When the list was built as a sorted list it will automatically sort new entires when they’re added. If there is a large amount of data that’s going to be added we can optimize adding by delaying the sort until everything has been added. Bulk insert start suspends the auto sorting and end will sort and reenable sorting on add. note, The list should not be used in any way other than add while bulk insert is enabled.

void list_sort(list_t *l, list_eq e);

Finally, we have a manual sort function. This allows us to sort the items in an unsorted list. Assuming an equality function was registered.

Implementation

list.c

Includes

#include <stdlib.h>
#include <string.h>

#include "list.h"

Object Data

static const size_t list_block_size = 32;

struct list {
    void         **elements;
    size_t          alloced;
    size_t          len;
    list_cbs_t      cbs;
    list_flags_t    flags;
    bool            inbulk;
};

The best way for us to handle the list internals is to just have it be an array. I mean that’s technically what a list is, right? All we’re doing is making an easy to use object which takes care of all the reallocs and what not for us in a safe and transparent way.

We’ll use an over allocation strategy but we aren’t going to use growth by a power of two. Instead we’re going to grow by blocks of list_block_size. Why not use growth by a power of two you ask. We’ll we could and there isn’t anything wrong with it but we’re not. Go ahead and tweak list_insert if you don’t like my block based growth.

Finally, we have a flag so we know if we’re in a bulk insert. This way we can disable insert sorting temporarily.

Default Callbacks

static int list_passthrough_eq(const void *a, const void *b)
{
    void *sa = *(void **)a;
    void *sb = *(void **)b;

    return sa - sb;
}

All callbacks are optional but we still need to have an equality function. We can’t just compare the value of the pointer because this is a generic container which could hold any kind of object. Normally the comparison function parameters are the address of two elements in the array. Meaning, the parameters will need to be dereferenced before they can be compared. However, we can’t do that because we don’t know the actual object type of the parameters. Instead we’re going to compare the pointer address of each object in the array.

Create and Destroy

list_t *list_create(const list_cbs_t *cbs, list_flags_t flags)
{
    list_t *l;

    l           = malloc(sizeof(*l));
    l->elements = malloc(sizeof(*l->elements)*list_block_size);
    l->alloced  = list_block_size;
    l->len      = 0;

    memset(&(l->cbs), 0, sizeof(l->cbs));
    if (cbs != NULL) {
        l->cbs.leq   = cbs->leq;
        l->cbs.lcopy = cbs->lcopy;
        l->cbs.lfree = cbs->lfree;
    }
    if (l->cbs.leq == NULL)
        l->cbs.leq = list_passthrough_eq;

    l->flags = flags;

    return l;
}

This is just some basic initialization we need to happen. There isn’t really anything interesting going on. We just allocate the object data and set the callbacks.

void list_destroy(list_t *l)
{
    size_t i;

    if (l == NULL)
        return;

    if (l->cbs.lfree != NULL) {
        for (i=0; i<l->len; i++) {
            l->cbs.lfree(l->elements[i]);
        }
    }

    free(l->elements);
    free(l);
}

Now destroy on the other hand, is slightly more interesting. Since we can have the list take ownership of everything and we can associate a free function, we’ll need to use it here. This is pretty simple, we just iterate the list of elements and destroy each one.

Length

size_t list_len(list_t *l)
{
    if (l == NULL)
        return 0;
    return l->len;
}

The object has a variable to track the number of items it’s holding. This makes it easy for a caller to know how many items are in the list.

Insertion

There are two parts to insertion. With sorting enabled and with sorting disabled. Where items are inserted differs between these two modes. When sorted, the insertion index will be found using the quality function against items currently in the list. Whereas, unsorted the insert will happen at the given index.

Insert sorting

The absolute worst way to handle sorting on insert is insertion sort and the best way is using a variation of binary search. While C has a standard bsearch function which does binary search it only finds a matching element. This is fine if we’re looking for something but we need to find where something should go in the array. Yet we still need a binary search to find elements for the index_of function.

So we end up with a binary search function that works with indexes. It will let us find the index of something within the array as well as the index where something should be inserted. Thankfully, we already have binary search and insert functions we can use.

We need it to return a bool because when searching we need to know if the value is found. Otherwise, with insert it shouldn’t fail (except for misuse). Just for fun (Okay, it’s a helpful addition) we’ll make this stable. If there are matching values insert will always return the index after the last one. Matching will always return the first match. The only downside is a little bit extra processing.

Append

bool list_append(list_t *l, void *v)
{
    if (l == NULL || v == NULL)
        return false;
    return list_insert(l, v, l->len);
}

We have insert and append function and it makes sense that append calls insert and puts the object at the end when unsorted. For a sorted list, append will function exactly the same as insert.

Insert

bool list_insert(list_t *l, void *v, size_t idx)
{
    if (l == NULL || v == NULL)
        return false;

    if (l->alloced == l->len) {
        l->alloced += list_block_size;
        l->elements = realloc(l->elements, sizeof(*l->elements)*l->alloced);
    }

    if (l->cbs.lcopy != NULL)
        v = l->cbs.lcopy(v);

    if (l->flags & LIST_SORT && !l->inbulk)
        binary_searchidx(l->elements, l->len, sizeof(*l->elements), v, &idx, true, l->leq);

    if (idx > l->len) {
        idx = l->len;
    } else if (idx < l->len) {
        memmove(l->elements+(idx+1), l->elements+idx, (l->len-idx)*sizeof(*l->elements));
    }
    l->elements[idx] = v;

    l->len++;
    return true;
}

Before we delve into the inner workings of this function lets look at the return type. Basically, this is only ever going to return false if either the list or value are NULL. So it’s safe to ignore the return. We could have made these void like the hashtable but… this is more correct. Personally I could go either way on this.

Let’s go into how this function works in a bit more depth.

    if (l->alloced == l->len) {
        l->alloced += list_block_size;
        l->elements = realloc(l->elements, sizeof(*l->elements)*l->alloced);
    }

If we’ve run out of space we, obviously, need to grow so we can put more stuff into the list. Pay special attention to the realloc. We must multiply the number of allocated elements by the size of a void pointer! I’m talking about this part: sizeof(*l->elements)*l->alloced.

Now why do we need to do this? We have an array of void pointers and a void pointer isn’t 1 byte. It can be 4 on 32 bit systems and 8 on 64 bit systems. If we had an array of chars we would have 1 byte for each element but we’re not so we need each element to be size of void *. We’ll use this logic whenever we’re manipulating array members.

    if (l->cbs.lcopy != NULL)
        v = l->cbs.lcopy(v);

If necessary we copy the value being put into the list. We could have made a passthrough function like list_passthrough_eq and always call copy but why bother. It’s less work to just check for NULL. Especially since this is the only function we’re going to ever copy the value. If this were to change, we might want to think about adding a copy passthrough function.

    if (l->flags & LIST_SORT && !l->inbulk)
        idx = binary_insert(l->elements, l->len, sizeof(*l->elements), v, l->leq);

    if (idx > l->len) {
        idx = l->len;
    } else if (idx < l->len) {
        memmove(l->elements+(idx+1), l->elements+idx, (l->len-idx)*sizeof(*l->elements));
    }
    l->elements[idx] = v;

If we’re have a sorted list we first need to determine the index we need to insert at. The index passed into this function doesn’t matter and is ignored when sorted because, well, the list is in sorted order!

Once we know the index, we need to verify it’s valid. I want to make this an easy to use object so we’re going to make any index above the last append. This means someone using this could be sloppy but I’d rather have something easy to use than something as ridged as an ironing board.

The memmove is precisely why we need an object like this! Look at that thing! Do you really want to have this all over the place in your code? All that just to make space and insert something into the list.

Remove

Think about the header a bit. We have a take function which will remove from the list without calling lfree. remove will remove from the and call lfree. These are 99% the same so we’ll have a common function which will optionally lfree.

static bool list_remove_int(list_t *l, size_t idx, bool do_free)
{
    if (l == NULL || idx >= l->len)
        return false;

    if (do_free && l->cbs.lfree != NULL)
        l->cbs.lfree(l->elements[idx]);

    l->len--;
    if (idx == l->len)
        return true;

    memmove(l->elements+idx, l->elements+(idx+1), (l->len-idx)*sizeof(*l->elements));
    return true;
}

All we need to do is call lfree if requited and shift the elements to fill the gap after removing something causes. Once again we have a very nasty memmove which is nicely abstracted.

bool list_remove(list_t *l, size_t idx)
{
    return list_remove_int(l, idx, true);
}

Finally, the actual remove is just a wrapper around remove_int.

Index of

bool list_index_of(list_t *l, void *v, size_t *idx)
{
    size_t i;
    bool   found = false;

    if (l == NULL || v == NULL)
        return 0;

    if (l->flags & LIST_SORT && !l->inbulk) {
        if (binary_search(l->elements, l->len, sizeof(*l->elements), v, idx, l->leq) != NULL) {
            return true;
        }
        return false;
    }

    for (i=0; i<l->len; i++) {
        if (l->cbs.leq(&v, &l->elements[i]) == 0) {
            found = true;
            break;
        }
    }

    if (found && idx != NULL)
        *idx = i;
    return found;
}

This is there where the binary search comes in. If the list is sorted and we’re not in bulk insert situation, we can use the binary search we implemented previously. Hooray for O(log n) performance! Finally something reasonable. Sadly binary search only works with sorted data so we can’t always use it.

Unfortunately, when not sorted, we have to use an O(n) linear search (it’s a list after all). I really hate this but there isn’t a way around it for an unsorted list. So we just need to go though each element and check it.

Get and take

void *list_get(list_t *l, size_t idx)
{
    if (l == NULL || idx >= l->len)
        return NULL;
    return l->elements[idx];
}

We want get to return the element but keep it in the list so remember we CANNOT destroy the element while it’s in the list. That could cause all sorts of problems. Don’t think for a second you can use get and destroy it if you haven’t passed a lfree function. Well, there is one instance where you can and that’s if you iterate, free and call list_destroy. But that’s the only exception and you shouldn’t do this.

If you destroy an object from get the list still thinks it’s valid so if you call index_of, for example, it will try to compare to an already freed memory address and the app will probably crash. Not to mention if you do have an lfree function then calling list_destroy will have the list try to destroy the object and we’ll have a double free followed by a crash.

void *list_take(list_t *l, size_t idx)
{
    void *out;

    if (l == NULL || idx >= l->len)
        return NULL;

    out = list_get(l, idx);
    list_remove_int(l, idx, false);
    return out;
}

If you do want to destroy an object that’s in the list and for some reason you didn’t pass an lfree function you can use take. Like get, it returns the object but unlike get, it removes it from the list. That’s why we made the internal remove_int function.

Bulk Operations

void list_start_bulk_add(list_t *l)
{
    if (l == NULL)
        return;
    l->inbulk = true;
}

void list_end_bulk_add(list_t *l)
{
    if (l == NULL)
        return;
    l->inbulk = false;
    list_sort(l, NULL);
}

These just set the inbulk flag so other functions know if they should skip sorting. When we’ve finished bulk adding we’ll just sort the list.

Manual sort

This isn’t solely manual since end_bulk also uses it but it can be used manually.

bool list_sort(list_t *l, list_eq e)
{
    if (l == NULL)
        return false;

    if (l->flags & LIST_SORT) {
        if (e != NULL) {
            l->leq = e;
        } else {
            e = l->leq;
        }
    }

    if (e == NULL)
        return false;

    mergesort(l->elements, l->len, sizeof(*l->elements), eq);
}

If the list is a sorted list, the comparison function e is optional. When passed, the sorting function will be changed to e. If the list is not sorted, then e is required. Otherwise, we wouldn’t know how to sort the list.

I’m using mergesort because it’s stable. We’re using stable insert and search so we need to use a stable sort. That said, mergesort does use some memory and might not always be the fastest. If you don’t care about stable sorting then qsort is a better option.

Next steps

Now we have a very flexible generic list, but it still uses void pointers which isn’t always ideal. What we can, and will do, is create a set of type safe wrappers. This isn’t hard at all and will give us back compiler provided type safety. We’ll go into this in detail in the future.