Binary Search and Insert

Introduction

When dealing with arrays it’s often necessary to keep their elements in sorted order. The easiest way is to use an insert algorithm that always puts element into the array in sorted order. That said, you don’t know where elements will end up in the array because if you already knew the order you wouldn’t need to sort in the first place. At some point you will need to pull elements back out and you will want a quick way to find a given element.

Binary search and insert are great all around algorithms that do both of these. It’s a simple divide and conquer approach that is conceptually vey simple. The array is divided into two parts and the search value is compared to to each part. One side is selected based on greater or less than. Then the side is divided and the comparison takes place again. This happens until the value is found or there are two values one above and one below (for insert).

Big-O complexity:

Best Average Worst
Time O(1) O(log n) O(log n)
Space O(n)

By the way, I’m going to use C for all the cod examples.

Search

While this is part of C89 and should be always be available it’s good to know how it works. There is the off chance that you want to use a language that doesn’t have a built in binary search function.

void *binary_search(const void *base, size_t nel, size_t width, const void *key, int (*cmp)(const void *, const void *))
{
	size_t mid = 0;
	size_t left;
	size_t right;
	int    eq  = -1;

	left  = 0;
	right = nel;

	if (base == NULL || nel == 0 || key == NULL || cmp == NULL)
		return NULL;

	while (left < right) {
		mid = (left + right) / 2;
		eq  = cmp(&key, base+(mid*width));
		if (eq < 0) {
			right = mid;
		} else if (eq > 0) {
			left = mid+1;
		} else {
			break;
		}
	}

	if (eq != 0)
		return NULL;

	/* Return the first matching element. */
	while (mid > 0 && mid >= left) {
		eq = cmp(&key, base+((mid-1)*width));
		if (eq != 0) {
			break;
		}
		mid--;
	}

	return base+(mid*width);
}

This variation ensures that if multiple elements match the first is found.

Insert

Insert is very similar to searching and can be integrated into one function with a search or insert flag. I’m keeping them separate to keep the logic simple.

Also, while C89 mandates binary search it does not mandate binary insert. Finding something prebuilt for insert isn’t very common either. Like I said earlier, insert is useful because you can insert into a sorted list and maintain proper sorting. This way you don’t have to insert then run something like qsort on the list each and every time you put something in. Actually, running a sorting algorithm after every insert is horribly inefficient. If you can’t use an insert that algorithm that inserts in sorted order you should do a bulk insert and sort once at the end.

size_t binary_insert(const void *base, size_t nel, size_t width, const void *key, int (*cmp)(const void *, const void *))
{
	size_t mid = 0;
	size_t left;
	size_t right;
	int    eq = 0;

	if (base == NULL || nel == 0 || key == NULL || cmp == NULL)
		return 0;

	left  = 0;
	right = nel;

	while (left < right) {
		mid = (left + right) / 2;
		eq  = cmp(&key, base+(mid*width));
		if (eq < 0) {
			right = mid;
		} else if (eq > 0) {
			left = mid+1;
		} else {
			break;
		}
	}

	/* Insert after the index. */
	if (eq > 0)
		mid++;

	/* Make the insert stable. */
	while (mid < right && eq == 0) {
		mid++;
		eq = cmp(&key, base+(mid*width));
	}

	return mid;
}

This only finds the insertion index and does not actually insert the data. It’s up to the caller to handle shifting the data in the array to make room for the new element.

This implementation has additional logic for stable insertion. When the value is found in the array the next value will be checked for equality. This will keep happening until either there are no more elements to compare or the element is not a match. The index of the non-matching element (or the index length specifying that this needs to be appended) will be returned.

Search and Insert combined

Now that we have a good handle on how both searching and inserting, we can combine them into one function. The searching sequence itself is the same so combining them will reduce code duplication. Also, we can use the combined function to determine the index of a match. Which can be extremely useful.

bool binary_search_insert_int(const void *base, size_t nel, size_t width, const void *key, size_t *idx, bool is_insert, int (*cmp)(const void *, const void *))
{
    size_t mid = 0;
    size_t left;
    size_t right;
    int    eq  = -1;

    if (base == NULL || nel == 0 || key == NULL || idx == NULL || cmp == NULL)
        return false;

    left  = 0;
    right = nel;

    while (left < right) {
        mid = (left + right) / 2;
        eq  = cmp(&key, base+(mid*width));
        if (eq < 0) {
            right = mid;
        } else if (eq > 0) {
            left = mid+1;
        } else {
            break;
        }
    }

    if (is_insert) {
        /* Insert after the index. */
        if (eq > 0) {
            mid++;
        }

        /* Make the insert stable. */
        while (mid < right && eq == 0) {
            mid++;
            eq = cmp(&key, base+(mid*width));
        }

        *idx = mid;
        return true;
    } else {
        if (eq != 0) {
            return false;
        }

        /* Return the first matching element. */
        while (mid > 0 && mid >= left) {
            eq = cmp(&key, base+((mid-1)*width));
            if (eq != 0) {
                break;
            }
            mid--;
        }

        *idx = mid;
        return true;
    }

    return false;
}

Since this is a combined function the return type is now a `bool`. This is mainly because insert can’t return a pointer like search would. The first part of the algorithm which does the actual search is unchanged. The search and insert specific parts are largely unchanged as well.

The big change with this function is search relies on the index being returned. It no longer returns the pointer to the element. Anyone using this function would need to handle that part themselves. Most likely we’ll want to make this an internal function and provide easier to use public wrappers.

void *binary_search(const void *base, size_t nel, size_t width, const void *key, size_t *idx, int (*cmp)(const void *, const void *))
{
    size_t  myidx;

    if (idx == NULL)
        idx = &myidx;

    if (!binary_search_insert_int(base, nel, width, key, idx, false, cmp))
        return NULL;
    return (void *)base+(*idx*width);
}

size_t binary_insert(const void *base, size_t nel, size_t width, const void *key, int (*cmp)(const void *, const void *))
{
    size_t idx = 0;

    if (!binary_search_insert_int(base, nel, width, key, &idx, true, cmp))
        return 0;
    return idx;
}

These functions provides an API that people are more accustomed to using. Unlike the internal search function the public one will return the index, if requested, as well as the value. The insert API is unchanged from what we had before.