The first examples are written in the simplified programming language *kabas.online* and are runnable programs. There is an **interactive online tutorial**. Alternatively, you can simply read them as more formal pseudocode examples.

There is some textual explanation of the different sorting algorithms. For a better understanding, it is very helpful to also study pictorial representations of the algorithms, which can be found on *Wikipedia* and other sites.

The test code calls the sorting routine with arrays initialized with random integer values and measures the time needed. The same harness is used for all of the following sorting algorithms.

```
subr sort
  # sorting code
.
subr init
  for i = 0 to size data[] - 1
    data[i] = random 1000000
  .
.
subr out
  for i = 0 to size data[] - 1
    pr data[i]
  .
  pr ""
.
len = 500
while t# < 0.3
  size data[] len
  call init
  pr "Sorting " & len & " numbers ..."
  t# = time
  call sort
  t# = time - t#
  pr t# & "s"
  pr ""
  len = len * 10
.
```

*Bubblesort*: we have - say 10 - random numbers in an array that we want to sort in ascending order. The idea is quite simple: go through the array comparing adjacent items and swap them if necessary. After one pass, the largest number has been swapped to the right end. Repeat this process, each time excluding the rightmost element, until only one item is left.

```
subr sort
  for n = size data[] - 1 downto 1
    for i = 0 to n - 1
      if data[i] > data[i + 1]
        swap data[i] data[i + 1]
      .
    .
  .
.
```

```
Sorting 500 numbers ...
0.029s
Sorting 5000 numbers ...
2.455s
```

*Selection sort* works similarly to *Bubblesort*, but is faster: find the maximum value and put it at the right end of the array. Repeat this process, each time excluding the rightmost element, until only one item is left.

```
subr sort
  for n = size data[] - 1 downto 1
    imax = n
    max = data[n]
    for i = 0 to n - 1
      if data[i] > max
        max = data[i]
        imax = i
      .
    .
    data[imax] = data[n]
    data[n] = max
  .
.
```

```
Sorting 500 numbers ...
0.015s
Sorting 5000 numbers ...
1.169s
```

*Insertion sort* is usually faster than *Selection sort*, but that also depends on the programming language and hardware. It works like this: go through the array, keeping the left part sorted. Move the current element to the left until it is in its correct position.

```
subr sort
  for i = 1 to size data[] - 1
    h = data[i]
    j = i - 1
    while j >= 0 and h < data[j]
      data[j + 1] = data[j]
      j -= 1
    .
    data[j + 1] = h
  .
.
```

```
Sorting 500 numbers ...
0.016s
Sorting 5000 numbers ...
1.514s
```

*Heapsort* is much faster than *Insertion sort* for larger arrays. It can be considered a much better variant of *Selection sort*: Each round, the maximum value is moved to the right side and the size of the array is reduced by one. The big improvement is that the maximum value is selected very quickly - this is achieved by keeping the left part in a *heap* structure.

In a max-*heap*, each node is larger than (or equal to) its children. When the heap is stored in an array with a node at index *i*, its left child is at *2i+1* and its right child at *2i+2*. The first sorting step is to convert the array into a *heap*. The maximum value of a *heap* is always at index *0*. After moving that maximum into the sorted part of the array, the remaining left part must be made into a *heap* again - this can be done quickly.

```
subr make_heap
  for i = 1 to n - 1
    if data[i] > data[(i - 1) / 2]
      j = i
      while data[j] > data[(j - 1) / 2]
        swap data[j] data[(j - 1) / 2]
        j = (j - 1) / 2
      .
    .
  .
.
subr sort
  n = size data[]
  call make_heap
  for i = n - 1 downto 1
    swap data[0] data[i]
    j = 0
    ind = 1
    while ind < i
      if ind + 1 < i and data[ind + 1] > data[ind]
        ind += 1
      .
      if data[j] < data[ind]
        swap data[j] data[ind]
      .
      j = ind
      ind = 2 * j + 1
    .
  .
.
```

```
Sorting 500 numbers ...
0.004s
Sorting 5000 numbers ...
0.042s
Sorting 50000 numbers ...
0.491s
```

*Mergesort* is about as fast as *Heapsort*, but takes up much more space. This is how it works: imagine you have a stack of numbered cards to sort. The *Mergesort* method is: merge pairs of cards into sorted stacks of 2 cards, then merge pairs of 2-card stacks into sorted 4-card stacks, and so on.

The *merge* subroutine merges two sorted subarrays into one sorted array. First the two subarrays are copied to a temporary location, then the temporary subarrays are scanned, with the lower element selected at each step and placed into the original array.

The *sort* subroutine turns many small subarrays, starting with size one, into larger ones by repeatedly merging two of them into one, until a single sorted array remains.

```
subr merge
  mid = left + sz
  if mid > sz_data
    mid = sz_data
  .
  right = mid + sz
  if right > sz_data
    right = sz_data
  .
  for i = left to right - 1
    tmp[i] = data[i]
  .
  l = left
  r = mid
  for i = left to right - 1
    if r = right or l < mid and tmp[l] < tmp[r]
      data[i] = tmp[l]
      l += 1
    else
      data[i] = tmp[r]
      r += 1
    .
  .
.
subr sort
  sz_data = size data[]
  size tmp[] sz_data
  sz = 1
  while sz < sz_data
    left = 0
    while left < sz_data
      call merge
      left += sz + sz
    .
    sz += sz
  .
.
```

```
Sorting 500 numbers ...
0.003s
Sorting 5000 numbers ...
0.032s
Sorting 50000 numbers ...
0.370s
```

*Quicksort* is even quicker than *Mergesort* and does not need much extra space. The way it works is not complicated: imagine you have a stack of 200 numbered cards to sort. The *Quicksort* way is: split the cards, putting cards 1 to 100 onto one stack and the rest onto a second stack. Then you only have to sort stacks of 100 cards, which is easier. For this, you split each 100-card stack into two stacks of 50 cards, and so on.

The *partition* subroutine splits the subarray into two subarrays, using one element - called the pivot element - for comparison. Smaller numbers go to the left side, larger numbers to the right side, and the pivot element goes between them.

The *sort* subroutine calls *partition* for all unsorted subarrays, which are kept in the *stack* to-do list. *partition* creates two subarrays: the larger one is added to the to-do list and sorting continues with the smaller one. When the to-do list is empty, the whole array is sorted.

```
subr partition
  swap data[(left + right) / 2] data[left]
  piv = data[left]
  mid = left
  for i = left + 1 to right
    if data[i] < piv
      mid += 1
      swap data[i] data[mid]
    .
  .
  swap data[left] data[mid]
.
subr sort
  size stack[] 64
  # this is sufficient to sort 4 billion numbers
  left = 0
  right = size data[] - 1
  while left <> -1
    if left < right
      call partition
      if mid < (right + left) / 2
        stack[sp] = mid + 1
        stack[sp + 1] = right
        right = mid - 1
      else
        stack[sp] = left
        stack[sp + 1] = mid - 1
        left = mid + 1
      .
      sp += 2
    elif sp > 0
      sp -= 2
      left = stack[sp]
      right = stack[sp + 1]
    else
      # finished
      left = -1
    .
  .
.
```

```
Sorting 500 numbers ...
0.001s
Sorting 5000 numbers ...
0.016s
Sorting 50000 numbers ...
0.199s
Sorting 500000 numbers ...
2.331s
```

*Quicksort* is very fast - but for small arrays *Insertion sort* is faster. So let's combine the two algorithms: subarrays generated by the *partition* subroutine that are smaller than 15 elements are sorted with *Insertion sort*. In addition, the selection of the pivot element is improved by choosing it randomly.

```
subr insert_sort
  for i = left + 1 to right
    h = data[i]
    j = i - 1
    while j >= left and h < data[j]
      data[j + 1] = data[j]
      j -= 1
    .
    data[j + 1] = h
  .
.
subr partition
  swap data[left + random (right - left)] data[left]
  piv = data[left]
  mid = left
  for i = left + 1 to right
    if data[i] < piv
      mid += 1
      swap data[i] data[mid]
    .
  .
  swap data[left] data[mid]
.
subr sort
  size stack[] 64
  left = 0
  right = size data[] - 1
  while left <> -1
    if right - left < 15
      call insert_sort
      if sp > 0
        sp -= 2
        left = stack[sp]
        right = stack[sp + 1]
      else
        left = -1
      .
    else
      call partition
      if mid < (right + left) / 2
        stack[sp] = mid + 1
        stack[sp + 1] = right
        right = mid - 1
      else
        stack[sp] = left
        stack[sp + 1] = mid - 1
        left = mid + 1
      .
      sp += 2
    .
  .
.
```

```
Sorting 500 numbers ...
0.001s
Sorting 5000 numbers ...
0.015s
Sorting 50000 numbers ...
0.192s
Sorting 500000 numbers ...
2.076s
```

The following *Quicksort* variant is recursive. The code looks clean, but there is one danger: the call *stack* and its limited size. The stack size differs between operating systems, and on embedded systems it can be very small.

With the iterative variant and its reserved space for unfinished tasks, we could guarantee that an array up to a certain size can be sorted. This is not possible with the recursive variant: with already sorted input and “first element as pivot”, a stack overflow is likely. With the iterative variant, this case “only” costs the worst-case time complexity.

```
func qsort left right . data[] .
  #
  if left < right
    # partition
    piv = data[left]
    mid = left
    for i = left + 1 to right
      if data[i] < piv
        mid += 1
        swap data[i] data[mid]
      .
    .
    swap data[left] data[mid]
    #
    call qsort left mid - 1 data[]
    call qsort mid + 1 right data[]
  .
.
subr sort
  call qsort 0 size data[] - 1 data[]
.
```

```
Sorting 500 numbers ...
0.001s
Sorting 5000 numbers ...
0.013s
Sorting 50000 numbers ...
0.166s
Sorting 500000 numbers ...
1.982s
```

We see that for *Bubblesort*, *Selection sort* and *Insertion sort* the time needed grows quadratically - 10 times more items leads to about 100 times the time. This makes these sorting algorithms unusable for large arrays.

With *Heapsort* and *Mergesort* the time grows only in the order of *n log(n)*. Both need about the same time; *Heapsort* needs less memory, while *Mergesort* can be parallelized better.

*Quicksort* has the same average time behaviour and is usually faster. It is even faster in combination with *Insertion sort*. When the pivot element is selected randomly, the worst case - quadratic time growth - becomes very unlikely.
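
To make these growth rates concrete, the predicted slowdown for a tenfold larger input can be computed directly. A small illustrative sketch; the function names are made up for this example:

```cpp
#include <cmath>

// Predicted slowdown when the input grows from n to 10*n items.
// A quadratic algorithm slows down by (10n * 10n) / (n * n) = 100;
// an n*log(n) algorithm only by (10n * log(10n)) / (n * log(n)).
double quadratic_slowdown(double n) {
    return (10.0 * n) * (10.0 * n) / (n * n);
}
double nlogn_slowdown(double n) {
    return 10.0 * std::log(10.0 * n) / std::log(n);
}
// For n = 5000: quadratic_slowdown gives 100, nlogn_slowdown about 12.7 -
// close to the measured Heapsort ratio 0.491s / 0.042s (about 11.7).
```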

The following programs are compiled on Linux with the option `-O3` and tested on an older laptop with an *Intel i3* CPU.

The C library function *qsort* can sort any data with fixed-size, comparable elements. The caller only has to supply a compare function. Although the name *qsort* suggests *Quicksort*, a different sorting algorithm may be behind it.

```
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>

int cmpfunc(const void* a, const void* b) {
    return *(int*)a - *(int*)b;
}
void sort(int* data, int len) {
    qsort(data, len, sizeof(int), cmpfunc);
}
void init(int* data, int len) {
    int i = 0;
    while (i < len) {
        data[i] = rand();
        i += 1;
    }
}
double t(void) {
    static double t0;
    struct timeval tv;
    gettimeofday(&tv, NULL);
    double h = t0;
    t0 = tv.tv_sec + tv.tv_usec / 1000000.0;
    return t0 - h;
}
void test(int* data, int len) {
    int i = 1;
    while (i < len) {
        if (data[i] < data[i - 1]) {
            printf("*** data not sorted ***\n");
            break;
        }
        i += 1;
    }
}
#define SIZE (50 * 1000 * 1000)
int data[SIZE];

int main(void) {
    int len = SIZE;
    init(data, len);
    printf("Sorting %d numbers with libc qsort ...\n", len);
    t();
    sort(data, len);
    printf("%.2f s\n", t());
    test(data, len);
    return 0;
}
```

```
Sorting 50000000 numbers with libc qsort ...
18.15 s
```

```
// ...
#include <algorithm>

void sort(int* data, int len) {
    std::sort(data, data + len);
}
// ...
```

```
Sorting 50000000 numbers with C++ std::sort ...
10.12 s
```

That’s much faster. With *C++ std::sort*, the compiler can inline the comparisons being made.
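
The difference can be seen in the interface: *qsort* takes an opaque function pointer and must make an indirect call for every comparison, while *std::sort* receives the comparator as part of its template type, so the optimizer sees its body. A minimal sketch; the function name `sort_desc` is made up for this example:

```cpp
#include <algorithm>
#include <vector>

// std::sort can inline the lambda body at every comparison,
// because the comparator's type (and thus its code) is known
// at compile time - unlike a qsort-style function pointer.
std::vector<int> sort_desc(std::vector<int> v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
    return v;
}
```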

```
// ...
void insert_sort(int* left, int* right) {
    // put minimum to left position, so we can save
    // one inner loop comparison for insert sort
    int min = *left;
    int* pmin = left;
    int* pi = left + 1;
    while (pi <= right) {
        if (*pi < min) {
            pmin = pi;
            min = *pi;
        }
        pi += 1;
    }
    *pmin = *left;
    *left = min;
    pi = left + 2;
    while (pi <= right) {
        int h = *pi;
        int* pj = pi - 1;
        while (h < *pj) {
            *(pj + 1) = *pj;
            pj -= 1;
        }
        *(pj + 1) = h;
        pi += 1;
    }
}
#define swap(a, b) { int _h = a; a = b; b = _h; }
// sort three values with at most three comparisons
#define sort3fast(a, b, c) \
    if (b < a) { \
        if (c < a) { \
            if (c < b) { swap(a, c); } \
            else { int h = a; a = b; b = c; c = h; } \
        } \
        else { swap(a, b); } \
    } \
    else { \
        if (c < b) { \
            if (c < a) { int h = c; c = b; b = a; a = h; } \
            else { swap(b, c); } \
        } \
    }
int* partition(int* left, int* right) {
    int* left0 = left;
    int pivl = *left;
    left += 1;
    int* p = left0 + (right - left0) / 2;
    int piv = *p;
    *p = *left;
    *left = piv;
    int pivr = *right;
    sort3fast(pivl, piv, pivr);
    *right = pivr;
    *left0 = pivl;
    *left = piv;
    while (1) {
        do left += 1; while (*left < piv);
        do right -= 1; while (*right > piv);
        if (left >= right) break;
        swap(*left, *right);
    }
    *(left0 + 1) = *right;
    *right = piv;
    return right;
}
void sort(int* data, int len) {
    int* stack[64];
    int sp = 0;
    int* left = data;
    int* right = data + len - 1;
    while (1) {
        // for arrays smaller than 50 use insert sort
        if (right - left < 50) {
            insert_sort(left, right);
            if (sp == 0) break;
            sp -= 2;
            left = stack[sp];
            right = stack[sp + 1];
        }
        else {
            int* mid = partition(left, right);
            if (mid < left + (right - left) / 2) {
                stack[sp] = mid + 1;
                stack[sp + 1] = right;
                right = mid - 1;
            }
            else {
                stack[sp] = left;
                stack[sp + 1] = mid - 1;
                left = mid + 1;
            }
            sp += 2;
        }
    }
}
// ...
```

```
Sorting 50000000 numbers with Quicksort ...
9.18 s
```

That’s a bit better than the *C++ std::sort*.

We use the median of three fixed elements as the pivot element. This prevents the *O(N²)* worst case for random or already sorted data, but not for specially prepared input data. In that case, a good random number generator should be used to select the pivot element.

An effective but rarely used way to speed up programs is to use more than one processor core. This is not possible for every algorithm, but *Quicksort* is a *divide-and-conquer* algorithm, and such algorithms are usually well suited for *parallelisation*. After the *partition* function, we have two independent subarrays that could be sorted in parallel. Starting a *thread* takes time, and there are only a few processor cores anyway, so we only start a new *thread* when the subarray to sort is big enough and not too many threads are already running. When the last thread has finished its work, the whole job is done.

```
// ...
#include <pthread.h>
// cc -O3 -lpthread

#define MAX_THREADS 8
int n_threads;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void* sort_thr(void* arg) {
    int** stack = (int**)arg;
    int sp = 0;
    int* left = stack[sp];
    int* right = stack[sp + 1];
    while (1) {
        if (right - left < 50) {
            insert_sort(left, right);
            if (sp == 0) break;
            sp -= 2;
            left = stack[sp];
            right = stack[sp + 1];
        }
        else {
            int* l;
            int* r;
            int* mid = partition(left, right);
            if (mid < left + (right - left) / 2) {
                l = mid + 1;
                r = right;
                right = mid - 1;
            }
            else {
                l = left;
                r = mid - 1;
                left = mid + 1;
            }
            if (r - l > 1000000 && n_threads < MAX_THREADS) {
                // start a new thread - n_threads is a soft limit
                pthread_t thread;
                int** stack_new = malloc(64 * sizeof(int*));
                stack_new[0] = l;
                stack_new[1] = r;
                pthread_mutex_lock(&mutex);
                n_threads += 1;
                pthread_mutex_unlock(&mutex);
                pthread_create(&thread, NULL, sort_thr, stack_new);
            }
            else {
                stack[sp] = l;
                stack[sp + 1] = r;
                sp += 2;
            }
        }
    }
    free(stack);
    pthread_mutex_lock(&mutex);
    n_threads -= 1;
    if (n_threads == 0) pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mutex);
    return NULL;
}
void sort(int* data, int len) {
    pthread_t thread;
    int** stack = malloc(64 * sizeof(int*));
    stack[0] = data;
    stack[1] = data + len - 1;
    n_threads = 1;
    pthread_create(&thread, NULL, sort_thr, stack);
    // wait in a loop: this avoids a lost wakeup if the workers
    // finish before we start waiting on the condition variable
    pthread_mutex_lock(&mutex);
    while (n_threads > 0) pthread_cond_wait(&cond, &mutex);
    pthread_mutex_unlock(&mutex);
}
// todo: check if 'malloc' and 'pthread_create' are successful
// ...
```

```
Sorting 50000000 numbers with multithreaded Quicksort ...
3.38 s
```

That’s finally pretty fast - we can be happy with that.

For the sake of completeness: *C++17* defines a parallel *std::sort*. Since this was not available in *g++* at the time of writing, there are no test results for it.
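
For reference, the call itself is a one-liner on toolchains where the parallel algorithms are available (with *g++* this additionally requires linking against Intel TBB). In this sketch the parallel call is shown as a comment and the sequential call serves as a fallback; the function name `par_sort` is made up for this example:

```cpp
#include <algorithm>
#include <vector>
// #include <execution>  // C++17 parallel algorithms, where available

std::vector<int> par_sort(std::vector<int> v) {
    // With parallel algorithm support, this single call uses multiple cores:
    // std::sort(std::execution::par, v.begin(), v.end());
    std::sort(v.begin(), v.end());  // sequential fallback, same result
    return v;
}
```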

Iterative *Quicksort* combined with *Insertion sort* is fast. Multithreading can dramatically improve application performance.

kabas.online - easy programming in the browser