Introduction
Lists (dynamic arrays) are yet another super useful data structure that C just doesn’t have. C arrays are great for when you can get away with a fixed size but not so fun if you need to dynamically expand because you don’t know quite how many elements you’ll need.
You could use a series of realloc
s and memmove
s but that’s going to
get old really fast. It’s also error prone and not obvious when growth
is needed. Not to mention you’ll end up with them all over the place.
The biggest issue I have with trying to manage a dynamic list manually is, it’s hard to track meta-data about the list. For example, how many elements are in the list. That’s pretty important if you ask me.
One alternative to C arrays is a linked list. Not what I want because I want continues memory access and I don’t want the overhead of having a node per element.
Just like we created a generic hashtable, we can create a generic list using void pointers. This generic container will leverages void pointers to allow holding any type of object. We’ll make type safe wrappers around this generic container in order to get back the type safety we lost due to using void pointers.
Design
We can’t get around the fact that C arrays are how we need to create
a dynamic list but we can abstract them so we don’t get into the
mess of realloc
and memmove
. Plus having one generic data type
means we will have less error prone and easier to understand code.
Not to mention just like the generic hashtable, we can wrap it for type safety.
Header
list.h
#ifndef __LIST_H__
#define __LIST_H__
#include <stdbool.h>
#include <stddef.h>
struct list;
typedef struct list list_t;
typedef int (*list_eq)(const void *a, const void *b);
typedef void *(*list_copy_cb)(void *);
typedef void (*list_free_cb)(void *);
typedef struct {
list_eq leq;
list_copy_cb lcopy;
list_free_cb lfree;
} list_cbs_t;
typedef enum {
LIST_NONE = 0,
LIST_SORT = 1 << 0
} list_flags_t;
list_t *list_create(const list_cbs_t *cbs, list_flags_t flags);
void list_destroy(list_t *l);
size_t list_len(list_t *l);
bool list_append(list_t *l, void *v);
bool list_insert(list_t *l, void *v, size_t idx);
bool list_remove(list_t *l, size_t idx);
bool list_index_of(list_t *l, void *v, size_t *idx);
void *list_get(list_t *l, size_t idx);
void *list_take(list_t *l, size_t idx);
void list_start_bulk_add(list_t *l);
void list_end_bulk_add(list_t *l);
void list_sort(list_t *l, list_eq e);
#endif /* __LIST_H__ */
The header consists of callbacks, flags and functions. Lets look at the callbacks first.
Callbacks
typedef int (*list_eq)(const void *a, const void *b);
typedef void *(*list_copy_cb)(void *);
typedef void (*list_free_cb)(void *);
typedef struct {
list_eq leq;
list_copy_cb lcopy;
list_free_cb lfree;
} list_cbs_t;
The list can take ownership of anything put into it if the copy and free functions are set. Anything in the list will automatically be destroyed when it’s either removed or the list itself is destroyed. This is great because it means we don’t have to worry about managing the life of objects we’ve put into the array.
We also have a qsort
style equality function which seems odd. Yes, we need it.
It allows us to do two very important things. We can check if a value already exists
in the list. It also allows us to have a sortable list.
Flags
typedef enum {
LIST_NONE = 0,
LIST_SORT = 1 << 0
} list_flags_t;
Next up we have the flags. Right now there is only one flag that does something. That’s the sort flag. In the future we can easily add more flags and extend the list’s functionality.
Functions
Most of the functions are self explanatory but we should talk about a few of them.
bool list_index_of(list_t *l, void *v, size_t *idx);
list_index_of
is a special function because it servers two purposes. First, it
will tell us if that object already exists in the list. Second, if it does exist
it will give us the index as an out parameter. We’ll make idx
optional so we
can use this just to check for existence.
void *list_get(list_t *l, size_t idx);
void *list_take(list_t *l, size_t idx);
We have list_get
and list_take
which seem to be the same thing as decried
by the prototype. We’re going to make a distinction between these two where
get
returns the object but leaves it in the list and take
does not.
This means when using get
, the list still owns the object and is responsible
for it (if a free callback was set it will destroy it). We need this generic implementation
to return the object non-const
but most wrappers will have it const
to indicate
ownership.
Now for take
. Basically, we’re providing a way to take
things out of the
list. In this case we’re giving ownership to the caller. They’re going to be
responsible for any destruction (free
) and what not.
Since both of these are not const
and they deal with ownership we need to be very
careful because we could end up in a double free situation with get
and a memory
leak with take
. This is another reason why type safe wrappers are very important
void list_start_bulk_add(list_t *l);
void list_end_bulk_add(list_t *l);
When the list was built as a sorted list it will automatically sort new entires when they’re added. If there is a large amount of data that’s going to be added we can optimize adding by delaying the sort until everything has been added. Bulk insert start suspends the auto sorting and end will sort and reenable sorting on add. note, The list should not be used in any way other than add while bulk insert is enabled.
void list_sort(list_t *l, list_eq e);
Finally, we have a manual sort function. This allows us to sort the items in an unsorted list. Assuming an equality function was registered.
Implementation
list.c
Includes
#include <stdlib.h>
#include <string.h>
#include "list.h"
Object Data
static const size_t list_block_size = 32;
struct list {
void **elements;
size_t alloced;
size_t len;
list_cbs_t cbs;
list_flags_t flags;
bool inbulk;
};
The best way for us to handle the list internals is to just have it
be an array. I mean that’s technically what a list is, right? All we’re
doing is making an easy to use object which takes care of all the realloc
s
and what not for us in a safe and transparent way.
We’ll use an over allocation strategy but we aren’t going to use growth by
a power of two. Instead we’re going to grow by blocks of list_block_size
.
Why not use growth by a power of two you ask. We’ll we could and there isn’t
anything wrong with it but we’re not. Go ahead and tweak list_insert
if you don’t like my block based growth.
Finally, we have a flag so we know if we’re in a bulk insert. This way we can disable insert sorting temporarily.
Default Callbacks
static int list_passthrough_eq(const void *a, const void *b)
{
void *sa = *(void **)a;
void *sb = *(void **)b;
return sa - sb;
}
All callbacks are optional but we still need to have an equality function. We can’t just compare the value of the pointer because this is a generic container which could hold any kind of object. Normally the comparison function parameters are the address of two elements in the array. Meaning, the parameters will need to be dereferenced before they can be compared. However, we can’t do that because we don’t know the actual object type of the parameters. Instead we’re going to compare the pointer address of each object in the array.
Create and Destroy
list_t *list_create(const list_cbs_t *cbs, list_flags_t flags)
{
list_t *l;
l = malloc(sizeof(*l));
l->elements = malloc(sizeof(*l->elements)*list_block_size);
l->alloced = list_block_size;
l->len = 0;
memset(&(l->cbs), 0, sizeof(l->cbs));
if (cbs != NULL) {
l->cbs.leq = cbs->leq;
l->cbs.lcopy = cbs->lcopy;
l->cbs.lfree = cbs->lfree;
}
if (l->cbs.leq == NULL)
l->cbs.leq = list_passthrough_eq;
l->flags = flags;
return l;
}
This is just some basic initialization we need to happen. There isn’t really anything interesting going on. We just allocate the object data and set the callbacks.
void list_destroy(list_t *l)
{
size_t i;
if (l == NULL)
return;
if (l->cbs.lfree != NULL) {
for (i=0; i<l->len; i++) {
l->cbs.lfree(l->elements[i]);
}
}
free(l->elements);
free(l);
}
Now destroy on the other hand, is slightly more interesting. Since we can have
the list take ownership of everything and we can associate a free
function,
we’ll need to use it here. This is pretty simple, we just iterate the list of
elements and destroy each one.
Length
size_t list_len(list_t *l)
{
if (l == NULL)
return 0;
return l->len;
}
The object has a variable to track the number of items it’s holding. This makes it easy for a caller to know how many items are in the list.
Insertion
There are two parts to insertion. With sorting enabled and with sorting disabled. Where items are inserted differs between these two modes. When sorted, the insertion index will be found using the quality function against items currently in the list. Whereas, unsorted the insert will happen at the given index.
Insert sorting
The absolute worst way to handle sorting on insert is insertion sort and the best
way is using a variation of binary search. While C has a standard bsearch
function
which does binary search it only finds a matching element. This is fine if we’re looking
for something but we need to find where something should go in the array. Yet we still
need a binary search to find elements for the index_of
function.
So we end up with a binary search function that works with indexes. It will let us find the index of something within the array as well as the index where something should be inserted. Thankfully, we already have binary search and insert functions we can use.
We need it to return a bool
because when searching we need to know if the value
is found. Otherwise, with insert it shouldn’t fail (except for misuse). Just for fun
(Okay, it’s a helpful addition) we’ll make this stable. If there are matching values
insert will always return the index after the last one. Matching will always
return the first match. The only downside is a little bit extra processing.
Append
bool list_append(list_t *l, void *v)
{
if (l == NULL || v == NULL)
return false;
return list_insert(l, v, l->len);
}
We have insert
and append
function and it makes sense that append
calls insert
and puts the object at the end when unsorted. For a
sorted list, append
will function exactly the same as insert
.
Insert
bool list_insert(list_t *l, void *v, size_t idx)
{
if (l == NULL || v == NULL)
return false;
if (l->alloced == l->len) {
l->alloced += list_block_size;
l->elements = realloc(l->elements, sizeof(*l->elements)*l->alloced);
}
if (l->cbs.lcopy != NULL)
v = l->cbs.lcopy(v);
if (l->flags & LIST_SORT && !l->inbulk)
binary_searchidx(l->elements, l->len, sizeof(*l->elements), v, &idx, true, l->leq);
if (idx > l->len) {
idx = l->len;
} else if (idx < l->len) {
memmove(l->elements+(idx+1), l->elements+idx, (l->len-idx)*sizeof(*l->elements));
}
l->elements[idx] = v;
l->len++;
return true;
}
Before we delve into the inner workings of this function lets look at the
return type. Basically, this is only ever going to return false if either the
list or value are NULL. So it’s safe to ignore the return. We could have made
these void
like the hashtable but… this is more correct. Personally I could
go either way on this.
Let’s go into how this function works in a bit more depth.
if (l->alloced == l->len) {
l->alloced += list_block_size;
l->elements = realloc(l->elements, sizeof(*l->elements)*l->alloced);
}
If we’ve run out of space we, obviously, need to grow so we can put more stuff
into the list. Pay special attention to the realloc
. We must multiply the
number of allocated elements by the size of a void pointer! I’m talking about this part:
sizeof(*l->elements)*l->alloced
.
Now why do we need to do this? We have an array of void pointers and a void pointer
isn’t 1 byte. It can be 4 on 32 bit systems and 8 on 64 bit systems. If we had an
array of char
s we would have 1 byte for each element but we’re not so we need
each element to be size of void *
. We’ll use this logic whenever we’re manipulating
array members.
if (l->cbs.lcopy != NULL)
v = l->cbs.lcopy(v);
If necessary we copy the value being put into the list. We could have made a passthrough
function like list_passthrough_eq
and always call copy but why bother. It’s less work
to just check for NULL. Especially since this is the only function we’re going to ever
copy the value. If this were to change, we might want to think about adding a copy passthrough
function.
if (l->flags & LIST_SORT && !l->inbulk)
idx = binary_insert(l->elements, l->len, sizeof(*l->elements), v, l->leq);
if (idx > l->len) {
idx = l->len;
} else if (idx < l->len) {
memmove(l->elements+(idx+1), l->elements+idx, (l->len-idx)*sizeof(*l->elements));
}
l->elements[idx] = v;
If we’re have a sorted list we first need to determine the index we need to insert at. The index passed into this function doesn’t matter and is ignored when sorted because, well, the list is in sorted order!
Once we know the index, we need to verify it’s valid. I want to make this an easy to use object so we’re going to make any index above the last append. This means someone using this could be sloppy but I’d rather have something easy to use than something as ridged as an ironing board.
The memmove
is precisely why we need an object like this! Look at that thing! Do you really
want to have this all over the place in your code? All that just to make space and insert
something into the list.
Remove
Think about the header a bit. We have a take
function which will remove from
the list without calling lfree
. remove
will remove from the and call
lfree
. These are 99% the same so we’ll have a common function which will
optionally lfree
.
static bool list_remove_int(list_t *l, size_t idx, bool do_free)
{
if (l == NULL || idx >= l->len)
return false;
if (do_free && l->cbs.lfree != NULL)
l->cbs.lfree(l->elements[idx]);
l->len--;
if (idx == l->len)
return true;
memmove(l->elements+idx, l->elements+(idx+1), (l->len-idx)*sizeof(*l->elements));
return true;
}
All we need to do is call lfree
if requited and shift the elements to fill the gap after
removing something causes. Once again we have a very nasty memmove
which is nicely abstracted.
bool list_remove(list_t *l, size_t idx)
{
return list_remove_int(l, idx, true);
}
Finally, the actual remove
is just a wrapper around remove_int
.
Index of
bool list_index_of(list_t *l, void *v, size_t *idx)
{
size_t i;
bool found = false;
if (l == NULL || v == NULL)
return 0;
if (l->flags & LIST_SORT && !l->inbulk) {
if (binary_search(l->elements, l->len, sizeof(*l->elements), v, idx, l->leq) != NULL) {
return true;
}
return false;
}
for (i=0; i<l->len; i++) {
if (l->cbs.leq(&v, &l->elements[i]) == 0) {
found = true;
break;
}
}
if (found && idx != NULL)
*idx = i;
return found;
}
This is there where the binary search comes in. If the list is sorted and we’re not in bulk insert situation, we can use the binary search we implemented previously. Hooray for O(log n) performance! Finally something reasonable. Sadly binary search only works with sorted data so we can’t always use it.
Unfortunately, when not sorted, we have to use an O(n) linear search (it’s a list after all). I really hate this but there isn’t a way around it for an unsorted list. So we just need to go though each element and check it.
Get and take
void *list_get(list_t *l, size_t idx)
{
if (l == NULL || idx >= l->len)
return NULL;
return l->elements[idx];
}
We want get
to return the element but keep it in the list so remember we
CANNOT destroy the element while it’s in the list. That could cause all sorts
of problems. Don’t think for a second you can use get
and destroy it if you
haven’t passed a lfree
function. Well, there is one instance where you can
and that’s if you iterate, free
and call list_destroy
. But that’s the only
exception and you shouldn’t do this.
If you destroy an object from get
the list still thinks it’s valid so if you call
index_of
, for example, it will try to compare to an already free
d memory address
and the app will probably crash. Not to mention if you do have an lfree
function then
calling list_destroy
will have the list try to destroy the object and we’ll have a double free
followed by a crash.
void *list_take(list_t *l, size_t idx)
{
void *out;
if (l == NULL || idx >= l->len)
return NULL;
out = list_get(l, idx);
list_remove_int(l, idx, false);
return out;
}
If you do want to destroy an object that’s in the list and for some reason you didn’t
pass an lfree
function you can use take. Like get
, it returns the object but unlike
get
, it removes it from the list. That’s why we made the internal remove_int
function.
Bulk Operations
void list_start_bulk_add(list_t *l)
{
if (l == NULL)
return;
l->inbulk = true;
}
void list_end_bulk_add(list_t *l)
{
if (l == NULL)
return;
l->inbulk = false;
list_sort(l, NULL);
}
These just set the inbulk
flag so other functions know if they should
skip sorting. When we’ve finished bulk adding we’ll just sort the list.
Manual sort
This isn’t solely manual since end_bulk
also uses it but it can be used manually.
bool list_sort(list_t *l, list_eq e)
{
if (l == NULL)
return false;
if (l->flags & LIST_SORT) {
if (e != NULL) {
l->leq = e;
} else {
e = l->leq;
}
}
if (e == NULL)
return false;
mergesort(l->elements, l->len, sizeof(*l->elements), eq);
}
If the list is a sorted list, the comparison function e
is optional. When
passed, the sorting function will be changed to e
. If the list is not sorted,
then e
is required. Otherwise, we wouldn’t know how to sort the list.
I’m using mergesort
because it’s stable. We’re
using stable insert and search so we need to use a stable sort. That said,
mergesort
does use some memory and might not always be the fastest. If you
don’t care about stable sorting then qsort
is a better option.
Next steps
Now we have a very flexible generic list, but it still uses void pointers which isn’t always ideal. What we can, and will do, is create a set of type safe wrappers. This isn’t hard at all and will give us back compiler provided type safety. We’ll go into this in detail in the future.