Searching in a Sorted List: A Fast Search Algorithm Derived from Optimized Binary Search With Fewer Comparisons To Condense Bounds
Eightomic developed an optimal search algorithm as a substantial improvement to Binary Search, Exponential Search, Fibonnaci Search, Interpolation Search and other sorted-list search algorithms.
Library
Source
#include <stdbool.h>
bool eightomic_search_ascending(unsigned long low, unsigned long high,
unsigned short *haystack,
unsigned short needle,
unsigned long *position) {
unsigned long _gap;
unsigned long gap;
if (haystack[high] == needle) {
*position = high;
return true;
}
if (
haystack[low] < needle &&
haystack[high] > needle
) {
high--;
gap = high - low;
while (
haystack[high] != needle &&
gap > 1
) {
_gap = high >> 3;
if (
_gap > low &&
haystack[_gap] >= needle
) {
high = _gap;
gap = high - low;
}
while (
haystack[high] > needle &&
gap > 1
) {
high -= gap >> 1;
gap = (gap + 1) >> 1;
}
while (
haystack[high] < needle &&
gap > 1
) {
high += gap >> 1;
gap = (gap + 1) >> 1;
}
}
low = high;
}
if (haystack[low] == needle) {
*position = low;
return true;
}
return false;
}
bool eightomic_search_descending(unsigned long low, unsigned long high,
unsigned short *haystack,
unsigned short needle,
unsigned long *position) {
unsigned long _gap;
unsigned long gap;
if (haystack[high] == needle) {
*position = high;
return true;
}
if (
haystack[low] > needle &&
haystack[high] < needle
) {
high--;
gap = high - low;
while (
haystack[high] != needle &&
gap > 1
) {
_gap = high >> 3;
if (
_gap > low &&
haystack[_gap] <= needle
) {
high = _gap;
gap = high - low;
}
while (
haystack[high] < needle &&
gap > 1
) {
high -= gap >> 1;
gap = (gap + 1) >> 1;
}
while (
haystack[high] > needle &&
gap > 1
) {
high += gap >> 1;
gap = (gap + 1) >> 1;
}
}
low = high;
}
if (haystack[low] == needle) {
*position = low;
return true;
}
return false;
}
Reference
eightomic_search_ascending() is the searching function for a list of elements sorted in ascending order that accepts the following 5 arguments.
low is the lowest index bound to search in haystack.
high is the highest index bound to search in haystack.
The low value must be less than or equal to the high value and both must be valid indices in haystack.
haystack is the unsigned short array of elements to search. The data type is interchangeable with any integral data type.
needle is the unsigned short element to search for in haystack. The data type should match the haystack data type.
position is the unsigned long pointer containing the index of the searched element.
The return value data type is bool.
When the return value is true, position contains the index of the found needle.
When the return value is false, a new index value isn't assigned to position.
eightomic_search_descending() is the searching function for a list of elements sorted in descending order that accepts the following 5 arguments.
low is the lowest index bound to search in haystack.
high is the highest index bound to search in haystack.
The low value must be less than or equal to the high value and both must be valid indices in haystack.
haystack is the unsigned short array of elements to search. The data type is interchangeable with any integral data type.
needle is the unsigned short element to search for in haystack. The data type should match the haystack data type.
position is the unsigned long pointer containing the index of the searched element.
The return value data type is bool.
When the return value is true, position contains the index of the found needle.
When the return value is false, a new index value isn't assigned to position.
Requirements
C compiler with C99 (ISO/IEC 9899:1999) standard compatibility.
Explanation
This search algorithm is designed to speed up Binary Search by decreasing the average number of comparisons with a heuristic based on both high and low bound values instead of haystack element values.
It's portable for both 32-bit and 64-bit systems.
It meets compliance, portability and security requirements on all devices.
It doesn't use modulus, multiplication or division arithmetic operations.
In the rare instance where the sort order is unknown, the following eightomic_search() function searches in either ascending or descending order as a convenience.
#include <stdbool.h>
bool eightomic_search(unsigned long low, unsigned long high,
unsigned short *haystack, unsigned short needle,
unsigned long *position) {
unsigned long _gap;
unsigned long gap;
if (haystack[high] == needle) {
*position = high;
return true;
}
if (
(
haystack[low] < needle &&
haystack[high] > needle
) ||
(
haystack[low] > needle &&
haystack[high] < needle
)
) {
high--;
gap = high - low;
if (haystack[low] < needle) {
while (
haystack[high] != needle &&
gap > 1
) {
_gap = high >> 3;
if (
_gap > low &&
haystack[_gap] >= needle
) {
high = _gap;
gap = high - low;
}
while (
haystack[high] > needle &&
gap > 1
) {
high -= gap >> 1;
gap = (gap + 1) >> 1;
}
while (
haystack[high] < needle &&
gap > 1
) {
high += gap >> 1;
gap = (gap + 1) >> 1;
}
}
} else {
while (
haystack[high] != needle &&
gap > 1
) {
_gap = high >> 3;
if (
_gap > low &&
haystack[_gap] <= needle
) {
high = _gap;
gap = high - low;
}
while (
haystack[high] < needle &&
gap > 1
) {
high -= gap >> 1;
gap = (gap + 1) >> 1;
}
while (
haystack[high] > needle &&
gap > 1
) {
high += gap >> 1;
gap = (gap + 1) >> 1;
}
}
}
low = high;
}
if (haystack[low] == needle) {
*position = low;
return true;
}
return false;
}
In each iteration, the high bound jumps ahead with high >> 3 whenever it doesn't exceed the position of the needle element.
This jump decreases the number of gap calculations and comparisons in the average instance without slowing down other instances.
Instead of a conditional statement that determines which direction to step, segmented while loops traverse in opposite directions until either the position of the needle element is reached or the gap is decreased to 1.
This increases speed in the average case by recalculating a new high bound jump whenever the direction switches twice.
Compared to other sorted-list search algorithms, it's 1% to 20% faster on average across a range of data types and data, both randomized and uniform.
Interpolation Search isn't recommended as some of the worst test instances were found trivially with speeds that were 10% as fast on average. This tradeoff isn't worth the marginal speed gains in specific instances.