Introduction

We have this amazing generic hashtable and we can put pretty much anything we want into it, but it has a few flaws. It uses void pointers and has a pretty verbose setup with that callback struct. If you’re using the same types over and over again you’ll have a lot of redundant code.

There is also a much more pressing issue of void pointers. They remove type safely. It would be really bad if you passed the wrong type to a hashtable meant for another. As I’m sure you know, the compiler will happily let you pass anything to a void pointer without a peep. I really don’t want to lose the compiler saying to me, “hey dummy you can’t pass a char * to a my_type_t.” you can’t use void pointers if you want type safety! Or can you?

Design

Let’s do some wrapping! What I mean is, wrapping the base implementation in a type safe wrapper. It’s another layer but it makes things a lot easier on us. The most useful wrapper in my opinion is one that handles string key and string values.

Header

htstrstr.h

#ifndef __HTABLE_STRSTR_H__
#define __HTABLE_STRSTR_H__

#include <stdbool.h>

struct htable_strstr;
typedef struct htable_strstr htable_strstr_t;

struct htable_strstr_enum;
typedef struct htable_strstr_enum htable_strstr_enum_t;

typedef enum {
    HTABLE_STR_NONE = 0,
    HTABLE_STR_CASECMP
} htable_strstr_flags_t;

htable_strstr_t *htable_strstr_create(unsigned int flags);
void htable_strstr_destroy(htable_strstr_t *ht);

void htable_strstr_insert(htable_strstr_t *ht, const char *key, const char *val);
void htable_strstr_remove(htable_strstr_t *ht, const char *key);

bool htable_strstr_get(htable_strstr_t *ht, const char *key, const char **val);
const char *htable_strstr_get_direct(htable_strstr_t *ht, const char *key);

htable_strstr_enum_t *htable_strstr_enum_create(htable_strstr_t *ht);
bool htable_strstr_enum_next(htable_strstr_enum_t *he, const char **key, const char **val);
void htable_strstr_enum_destroy(htable_strstr_enum_t *he);

#endif /* __HTABLE_STRSTR_H__ */

This looks pretty much exactly the same as the generic base implementation… and it should because it’s a wrapper! That said, creation is simplified because the wrapper is for a specific type, so it already knows all the callbacks to use.

Note, the create function does take flags. The base didn’t take flags but it very well could (probably should for future extensibility). By default, we’ll have keys compared case sensitive (since this is the norm) but we also want a flag to have case insensitive comparison.

I like having the option of case insensitive comparison because often I’ll work with XML data from the internet and the tags can be all sorts of cases. I don’t want to change the case when I eventually output the keys so allowing case insensitive comparison keeps the tags the way they are and lets me still use them in the hashtable.

Implementation

htstrstr.c

Includes

#include <ctype.h>
#include <string.h>
#include <stdlib.h>

#include "htable.h"
#include "htstrstr.h"

Object Data

We don’t have any object data because we’re wrapping the generic object. However, we did declare an object in the header for a very good reason.

struct htable_strstr;
typedef struct htable_strstr htable_strstr_t;

We need an object defined but we don’t have, nor do we need, an implementation! We only need a forward declaration of a type we’ll never implement. The compiler will think this is a real object that really exists. We get type safety without having do so something like wrap the base object in a string wrapper object! Just wait a bit and you’ll see how this works.

Hash Function

static unsigned int fnv1a_hash_str_int(const void *in, size_t len, unsigned int seed,
        bool ignore_case)
{
    unsigned int  h = seed; /* default offset_basis: 2166136261 */
    unsigned int  c;
    size_t        i;

    for (i=0; i<len; i++) {
        c = ((const unsigned char *)in)[i];
        if (ignore_case)
            c = tolower(c);
        h ^= c;
        h *= 16777619; /* FNV prime */
    }

    return h;
}

static unsigned int fnv1a_hash_str(const void *in, unsigned int seed) 
{
    return fnv1a_hash_str_int(in, strlen((const char *)in), seed, false);
}

static unsigned int fnv1a_hash_str_casecmp(const void *in, unsigned int seed) 
{
    return fnv1a_hash_str_int(in, strlen((const char *)in), seed, true);
}

Since this is a hashtable we need a hash function. I’m using FNV1a here because it’s easy, has good distribution, and is quick. We could have a long drawn out argument about the best hash function but you can replace this with something else if you prefer.

That said, in my testing the distribution of FNV1a being worse than other hash functions was offset by its speed. Basically, the time it took for calculating the hash and following a few chains (rehashing means we never get very many) was faster with FNV1a than just calculating the hash with other functions. FNV1a is a good general hash function but if you need to tune for your data set, it’s easy enough to swap in something else.

Since we want a case sensitive and insensitive comparison we also need the equivalent hashing. Since these are similar we can have an internal hash function that can handle both and a public wrapper for each type of comparison.

Comparison

static bool htable_str_eq(const void *a, const void *b)
{
    return (strcmp(a, b)==0)?true:false;
}

static bool htable_str_caseeq(const void *a, const void *b)
{
    return (strcasecmp(a, b)==0)?true:false;
}

Just like how we need two hash functions to accommodate our sensitivity we also need two comparison functions.

Create

htable_strstr_t *htable_strstr_create(unsigned int flags)
{
    htable_hash hash = fnv1a_hash_str;
    htable_keq  keq  = htable_str_eq;
    htable_cbs  cbs = {
        (void *(*)(void *))strdup,
        free,
        (void *(*)(void *))strdup,
        free 
    };

    if (flags & HTABLE_STR_CASECMP) {
        hash = fnv1a_hash_str_casecmp;
        keq  = htable_str_caseeq;
    }

    return (htable_strstr_t *)htable_create(hash, keq, &cbs);
}

Here we’re going to setup the base hashtable with all of our string specific stuff. Note, how we flip out the appropriate function pointers based on our flags.

Bringing It All Together (Everything Else)

void htable_strstr_destroy(htable_strstr_t *ht)
{
    htable_destroy((htable_t *)ht);
}

void htable_strstr_insert(htable_strstr_t *ht, const char *key, const char *val)
{
    htable_insert((htable_t *)ht, (void *)key, (void *)val);
}

void htable_strstr_remove(htable_strstr_t *ht, const char *key)
{
    htable_remove((htable_t *)ht, (void *)key);
}

bool htable_strstr_get(htable_strstr_t *ht, const char *key, const char **val)
{
    return htable_get((htable_t *)ht, (void *)key, (void **)val);
}

const char *htable_strstr_get_direct(htable_strstr_t *ht, const char *key)
{
    return htable_get_direct((htable_t *)ht, (void *)key);
}

htable_strstr_enum_t *htable_strstr_enum_create(htable_strstr_t *ht)
{
    return (htable_strstr_enum_t *)htable_enum_create((htable_t *)ht);
}

bool htable_strstr_enum_next(htable_strstr_enum_t *he, const char **key, const char **val)
{
    return htable_enum_next((htable_enum_t *)he, (void **)key, (void **)val);
}

void htable_strstr_enum_destroy(htable_strstr_enum_t *he)
{
    htable_enum_destroy((htable_enum_t *)he);
}

You might have noticed in create we’re casting the return value to the wrapper type. This is the magic that gives us the type safety we crave. Everything from this point forward is just, call the base function and cast the heck out of it.

The compiler happily checks the input parameter type and while passing in what is actually an htable_t. All we’re doing is hiding the actual object type but in a good way.

Testing

We didn’t make a test app for the base hashtable because it’s much easier to test it though a wrapper. We have to test the wrapper anyway because it’s hash, compare, functions and what not need to be tested. So why not kill two birds with one stone as the saying goes.

#include <stdio.h>

#include "htable_strstr.h"

int main(int argc, char **argv)
{
    htable_strstr_t      *h;
    htable_strstr_enum_t *he;
    const char           *a   = "xyz";
    const char           *b;
    size_t                num = 20;
    size_t                i;
    char                  t1[64];
    char                  t2[64];

    h = htable_strstr_create(HTABLE_STR_CASECMP);

    htable_strstr_insert(h, "Abc", a);
    b = htable_strstr_get_direct(h, "Abc");
    printf("%s: %p, %p\n", b, a, b);

    htable_strstr_insert(h, "aBc", "orange");
    b = htable_strstr_get_direct(h, "Abc");
    printf("%s: %p, %p\n", b, a, b);

    for (i=0; i<num; i++) {
        snprintf(t1, sizeof(t1), "a%zu", i);
        snprintf(t2, sizeof(t2), "%zu", (i*100)+i+(i/2));
        htable_strstr_insert(h, t1, t2);
    }

    he = htable_strstr_enum_create(h);
    while (htable_strstr_enum_next(he, &a, &b)) {
        printf("key=%s, val=%s\n", a, b);
    }
    htable_strstr_enum_destroy(he);

    htable_strstr_destroy(h);
    return 0;
}

The first thing we need to test is overwriting values. I like oranges so we’ll replace our letters with one of those. I also think we should test enumeration so we’ll shove some data in and see what comes back out.

The really cool thing about this test app is it lets us easily verify the seed is working. Each time we run the test we should see the enumerated keys output in a different order. This shows us that the keys are hashing to different values because they’re going into different buckets each time we run the test app.