In-Place Unstable Sorting: A Fast Sequence Calculation To Improve Shell Sort Performance Without Auxiliary Memory or Hard-Coded Increments

Eightomic developed a bitwise calculation for an optimal sequence of unstable Shell Sort gaps with a library in C99 as a substantial improvement to Ciura, Knuth, Sedgwick and Tokuda sequences.

Library

Source

void eightomic_sort_ascending(unsigned long input_count,
                              unsigned short *input) {
  unsigned short _input;
  unsigned long gap = (input_count >> 5) + (input_count >> 3) + 1;
  unsigned long i;
  unsigned long j;

  while (gap > 0) {
    i = gap;

    while (i < input_count) {
      _input = input[i];
      j = i;

      while (
        j >= gap &&
        input[j - gap] > _input
      ) {
        input[j] = input[j - gap];
        j -= gap;
      }

      input[j] = _input;
      i++;
    }

    if (
      gap > 3 ||
      gap == 1
    ) {
      gap = (gap >> 5) + (gap >> 2);
    } else {
      gap = 1;
    }
  }
}

void eightomic_sort_descending(unsigned long input_count,
                               unsigned short *input) {
  unsigned short _input;
  unsigned long gap = (input_count >> 5) + (input_count >> 3) + 1;
  unsigned long i;
  unsigned long j;

  while (gap > 0) {
    i = gap;

    while (i < input_count) {
      _input = input[i];
      j = i;

      while (
        j >= gap &&
        input[j - gap] < _input
      ) {
        input[j] = input[j - gap];
        j -= gap;
      }

      input[j] = _input;
      i++;
    }

    if (
      gap > 3 ||
      gap == 1
    ) {
      gap = (gap >> 5) + (gap >> 2);
    } else {
      gap = 1;
    }
  }
}

Reference

eightomic_sort_ascending() is the sorting function in ascending order that accepts the following 2 arguments.

input_count is the count of elements in the input array.

input is the unsigned short array to sort. The data type is interchangeable with any integral data type.

The return value data type is void.

eightomic_sort_descending() is the sorting function in descending order that accepts the following 2 arguments.

input_count is the count of elements in the input array.

input is the unsigned short array to sort. The data type is interchangeable with any integral data type.

The return value data type is void.

Requirements

C compiler with C99 (ISO/IEC 9899:1999) standard compatibility.

Explanation

This gap sequence algorithm is designed to both increase the speed and decrease resource usage for all Shell Sort implementations.

It's portable for both 32-bit and 64-bit systems.

It meets compliance, portability and security requirements without extra stack calls required from QuickSort partitions.

It doesn't use modulus, multiplication or division arithmetic operations.

Before sorting, it doesn't pre-calculate the upper limit with a loop starting from 0. Instead, the gap numbers in each sorting instance are dynamically-calculated with inconsistencies among different input_count values.

The sequence calculation formula derived from the following attempt to optimize Cocktail Shaker Sort with unstable increments, which is about 55% as fast as Shell Sort.

#include <stdbool.h>

void optimized_cocktail_shaker_sort_ascending(unsigned long input_count,
                                              unsigned short *input) {
  unsigned short _input;
  unsigned long gap;
  unsigned long i;
  unsigned long j;
  bool is_sorted;

  if (input_count > 1) {
    i = 0;
    j = (input_count >> 12) + (input_count >> 7) + 1;
    is_sorted = false;

    while (is_sorted == false) {
      gap = (input_count >> 15) + (input_count >> 14) + 1;
      is_sorted = true;

      while (i < j) {
        while (j != input_count) {
          if (input[i] > input[j]) {
            _input = input[i];
            input[i] = input[j];
            input[j] = _input;
          }

          i++;
          j++;
        }

        if (((input_count - i) >> 12) != 0) {
          gap++;
        }

        i += gap;

        if (i >= j) {
          i = j - 1;
        }

        while (i != 0) {
          i--;
          j--;

          if (input[i] > input[j]) {
            _input = input[i];
            input[i] = input[j];
            input[j] = _input;
            is_sorted = false;
          }
        }

        j--;
      }

      if (is_sorted == false) {
        i = 0;

        if (input_count > 128) {
          j = (input_count >> 15) + (input_count >> 14) + 8;
        } else {
          j = 1;
        }
      }
    }
  }
}

This sorting function calculates an optimal high and low starting index based on input_count and swaps elements while iterating with both indices.

When a bound is reached, the gap between indices is decreased based on input_count, then gap is incremented and the iteration direction is reversed.

When gap is decreased to 1, it sorts as a stable Cocktail Shaker Sort and checks if the elements are sorted.

If they're not sorted, the gap value is reset.

This process repeats until all elements are sorted.

After gap reaches 1 for the first time, the starting high index is decreased to reduce iterations in subsequent passes.

Examples with calculated starting indices and gap values for different array sizes are demonstrated in the following table.

Input     First Pass   Starting   Subsequent Pass
Count     High Index   Gap        High Index

2         1            1          8
4         1            1          8
8         1            1          8
16        1            1          8
32        1            1          8
64        1            1          8
128       2            1          8
256       3            1          8
512       5            1          8
1024      9            1          8
2048      17           1          8
4096      34           1          8
8192      67           1          8
16384     133          2          9
32768     265          4          11
65536     529          7          14
131072    1057         13         20
262144    2113         25         32
524288    4225         49         56
1048576   8449         97         104

Although this failed to exceed the performance of any Shell Sort variant, the optimal starting gap in Shell Sort was discovered as a derivative of the aforementioned Cocktail Shaker Sort high and low double-shift optimization.

When using the discovered sequence calculation algorithm, elements are always guaranteed to be sorted within bounds using conditional statements that guarantee the final pass is always a regular insertion sort with a gap value of 1.

After each pass, gap > 3 prevents any result of gap calculation from jumping from either 3 or 2 to 0 instead of 1 based on the following calculation output table.

Gap   Calculation
      Result

0     0
1     0
2     0
3     0
4     1
5     1
6     1
7     1
8     2
9     2
10    2

The following speed test results were performed with 1 million randomized elements.

Compared to the following optimized Shell Sort sequence, it's 31% faster.

void shell_sort_ascending(unsigned long input_count, unsigned short *input) {
  unsigned short _input;
  unsigned long gap = input_count >> 1;
  unsigned long i;
  unsigned long j;

  while (gap > 0) {
    i = gap;

    while (i < input_count) {
      _input = input[i];
      j = i;

      while (
        j >= gap &&
        input[j - gap] > _input
      ) {
        input[j] = input[j - gap];
        j -= gap;
      }

      input[j] = _input;
      i++;
    }

    gap >>= 1;
  }
}

Compared to the following optimized Knuth's Sequence, it's 24% faster without requiring either division operations or pre-calculated gap increments.

Furthermore, it's still faster when the gap increments are hard-coded.

void knuth_sort_ascending(unsigned long input_count, unsigned short *input) {
  unsigned short _input;
  unsigned long gap = 1;
  unsigned long i;
  unsigned long j;

  while (gap < input_count) {
    gap += (gap << 1) + 1;
  }

  while (gap > 0) {
    i = gap;

    while (i < input_count) {
      _input = input[i];
      j = i;

      while (
        j >= gap &&
        input[j - gap] > _input
      ) {
        input[j] = input[j - gap];
        j -= gap;
      }

      input[j] = _input;
      i++;
    }

    gap = gap / 3;
  }
}

Compared to the following optimized Ciura's Sequence, Sedgewick Sequence and Tokuda Sequence, it's marginally-faster without a hard-coded upper limit of elements in auxiliary memory.

void ciura_sort_ascending(unsigned long input_count, unsigned short *input) {
  unsigned short _input;
  unsigned long gaps[16] = {
    227011, 100894, 44842, 19930, 8858, 3937, 1750, 701,
    701,    301,    132,   57,    23,   10,   4,    1
  };
  unsigned long gap;
  unsigned char i = 0;
  unsigned long j;
  unsigned long k;

  while (i != 16) {
    gap = gaps[i];
    j = gap;

    while (j < input_count) {
      _input = input[j];
      k = j;

      while (
        k >= gap &&
        input[k - gap] > _input
      ) {
        input[k] = input[k - gap];
        k -= gap;
      }

      input[k] = _input;
      j++;
    }

    i++;
  }
}

void sedgewick_sort_ascending(unsigned long input_count,
                              unsigned short *input) {
  unsigned short _input;
  unsigned long gaps[16] = {
    260609, 146305, 64769, 36289, 16001, 8929, 3905, 2161,
    929,    505,    209,   109,   41,    19,   5,    1
  };
  unsigned long gap;
  unsigned char i = 0;
  unsigned long j;
  unsigned long k;

  while (i != 16) {
    gap = gaps[i];
    j = gap;

    while (j < input_count) {
      _input = input[j];
      k = j;

      while (
        k >= gap &&
        input[k - gap] > _input
      ) {
        input[k] = input[k - gap];
        k -= gap;
      }

      input[k] = _input;
      j++;
    }

    i++;
  }
}

void tokuda_sort_ascending(unsigned long input_count, unsigned short *input) {
  unsigned short _input;
  unsigned long gaps[16] = {
    345152, 153401, 68178, 30301, 13467, 5985, 2660, 1182,
    525,    233,    103,   46,    20,    9,    4,    1
  };
  unsigned long gap;
  unsigned char i = 0;
  unsigned long j;
  unsigned long k;

  while (i != 16) {
    gap = gaps[i];
    j = gap;

    while (j < input_count) {
      _input = input[j];
      k = j;

      while (
        k >= gap &&
        input[k - gap] > _input
      ) {
        input[k] = input[k - gap];
        k -= gap;
      }

      input[k] = _input;
      j++;
    }

    i++;
  }
}

The speed is faster with a variety of input_count values and input data types as shown in the following benchmarking table.

10k 2-Byte Inputs
Ciura                  0.005s   Tied
Eightomic              0.005s   Tied
Knuth                  0.005s   Tied
SedgeWick              0.005s   Tied
Tokuda                 0.005s   Tied
Shell                  0.006s   ----

100k 2-Byte Inputs
Ciura                  0.040s   ---
Eightomic              0.037s   Won
Knuth                  0.040s   ---
SedgeWick              0.039s   ---
Tokuda                 0.040s   ---
Shell                  0.047s   ---

250k 2-Byte Inputs
Ciura                  0.103s   ---
Eightomic              0.091s   Won
Knuth                  0.110s   ---
SedgeWick              0.101s   ---
Tokuda                 0.102s   ---
Shell                  0.126s   ---

500k 2-Byte Inputs
Ciura                  0.211s   ---
Eightomic              0.191s   Won
Knuth                  0.243s   ---
SedgeWick              0.212s   ---
Tokuda                 0.209s   ---
Shell                  0.291s   ---

1m 1-Byte Inputs
Ciura                  0.302s   ---
Eightomic              0.266s   Won
Knuth                  0.349s   ---
SedgeWick              0.291s   ---
Tokuda                 0.304s   ---
Shell                  0.380s   ---

1m 2-Byte Inputs
Ciura                  0.461s   ---
Eightomic              0.423s   Won
Knuth                  0.570s   ---
SedgeWick              0.453s   ---
Tokuda                 0.451s   ---
Shell                  0.634s   ---

It's faster than every other sequence as well, including Hibbard, Papernov & Stasevich and Pratt.

Games

Contrivity

Spawn into the hostile quantum laboratory and destroy wave after wave of oscillations.

v1.0.14