An Overview of Sorting Algorithms

The first examples are written in the simplified programming language kabas.online and are runnable programs. There is also an interactive online tutorial. Alternatively, you can simply read them as more formal pseudocode examples.

There is a short textual explanation of each sorting algorithm. For a better understanding, it is very helpful to also study pictorial representations of the algorithms on Wikipedia or other sites.

Test and initialization code

The test code calls the sorting routine on arrays initialized with random integer values and measures the time needed. The same test code is used for all of the following sorting algorithms.

subr sort
  # sorting code
.
#
length = 500
while time - t# < 0.3
  len data[] length
  for i = 0 to length - 1
    data[i] = random 1000000
  .
  pr "Sorting " & length & " numbers ..."
  t# = time
  call sort
  pr time - t# & "s"
  pr ""
  length = length * 10
.
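For comparison, the same test loop can be sketched in Python (not part of the original kabas program; the names `run_benchmark` and `sort_data` are my own):

```python
import random
import time

def sort_data(data):
    # stand-in for the sorting routine under test
    data.sort()

def run_benchmark(limit_s=0.3, start_length=500):
    # Sort arrays of random integers, multiplying the length by 10
    # each round, until one round takes longer than limit_s seconds.
    length = start_length
    elapsed = 0.0
    while elapsed < limit_s:
        data = [random.randrange(1000000) for _ in range(length)]
        print("Sorting", length, "numbers ...")
        t = time.perf_counter()
        sort_data(data)
        elapsed = time.perf_counter() - t
        print(round(elapsed, 3), "s")
        length *= 10
    return length
```

Calling `run_benchmark()` prints timings in the same format as the outputs listed after each algorithm below.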

Bubblesort - an easy entry to sorting

We have some random numbers - say 10 - in an array and want to sort them in ascending order. Bubblesort works quite simply: go through the array, comparing adjacent items and swapping them if necessary. After one pass, the largest number has been swapped to the right end. Repeat this process, each time excluding the rightmost element, until only one item is left.

subr sort
  for n = len data[] - 1 downto 1
    for i = 0 to n - 1
      if data[i] > data[i + 1]
        swap data[i] data[i + 1]
      .
    .
  .
.
Sorting 500 numbers ...
0.029s

Sorting 5000 numbers ...
2.455s
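The same algorithm as a Python sketch (the name `bubble_sort` is my own):

```python
def bubble_sort(data):
    # Each pass bubbles the largest remaining value to index n;
    # the next pass can therefore stop one element earlier.
    for n in range(len(data) - 1, 0, -1):
        for i in range(n):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]

nums = [5, 3, 8, 1, 9]
bubble_sort(nums)
print(nums)  # [1, 3, 5, 8, 9]
```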

Selection sort - easy to understand

It works similarly to Bubblesort, but is faster: find the minimum value and move it to the front of the array. Repeat this process, each time excluding the already placed leftmost elements, until only one item is left.

subr sort
  for i = 0 to len data[] - 2
    min_pos = i
    for j = i + 1 to len data[] - 1
      if data[j] < data[min_pos]
        min_pos = j
      .
    .
    swap data[i] data[min_pos]
  .
.
Sorting 500 numbers ...
0.016s

Sorting 5000 numbers ...
1.441s
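A Python sketch of the same idea (the name `selection_sort` is my own):

```python
def selection_sort(data):
    # Find the minimum of the unsorted part and swap it to its front.
    for i in range(len(data) - 1):
        min_pos = i
        for j in range(i + 1, len(data)):
            if data[j] < data[min_pos]:
                min_pos = j
        data[i], data[min_pos] = data[min_pos], data[i]

nums = [5, 3, 8, 1, 9]
selection_sort(nums)
print(nums)  # [1, 3, 5, 8, 9]
```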

Insertion sort - fast for small arrays

It is usually faster than Selection sort, depending on the hardware and software. It works like this: go through the array, keeping the left part sorted; move the current element to the left until it is in its correct position.

subr sort
  for i = 1 to len data[] - 1
    h = data[i]
    j = i - 1
    while j >= 0 and h < data[j]
      data[j + 1] = data[j]
      j -= 1
    .
    data[j + 1] = h
  .
.
Sorting 500 numbers ...
0.016s

Sorting 5000 numbers ...
1.514s
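In Python, the shifting of larger elements looks like this (a sketch; the name `insertion_sort` is my own):

```python
def insertion_sort(data):
    # data[0:i] is always sorted; shift larger elements one slot to
    # the right and drop the current value into the gap.
    for i in range(1, len(data)):
        h = data[i]
        j = i - 1
        while j >= 0 and h < data[j]:
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = h

nums = [5, 3, 8, 1, 9]
insertion_sort(nums)
print(nums)  # [1, 3, 5, 8, 9]
```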

Heapsort - fast for larger arrays

Heapsort is much faster than Insertion sort for larger arrays. It can be seen as a much improved variant of Selection sort: in each round, the maximum value is moved to the right end and the unsorted part of the array shrinks by one. The big improvement is that the maximum value is found very quickly - this is achieved by keeping the left part in a heap structure.

In a (max-)heap, each node is greater than or equal to its children. When the heap is stored in an array with a node at index i, its left child is at 2i+1 and its right child at 2i+2. The first sorting step is to convert the array into a heap. The maximum value of a heap is always at index 0. After moving the maximum value into the sorted part of the array, the remaining left part must be turned into a heap again - this can be done quickly.

subr make_heap
  for i = 1 to n - 1
    if data[i] > data[(i - 1) / 2]
      j = i
      while data[j] > data[(j - 1) / 2]
        swap data[j] data[(j - 1) / 2]
        j = (j - 1) / 2
      .
    .
  .
.
subr sort
  n = len data[]
  call make_heap
  for i = n - 1 downto 1
    swap data[0] data[i]
    j = 0
    ind = 1
    while ind < i
      if ind + 1 < i and data[ind + 1] > data[ind]
        ind += 1
      .
      if data[j] < data[ind]
        swap data[j] data[ind]
      .
      j = ind
      ind = 2 * j + 1
    .
  .
.
Sorting 500 numbers ...
0.004s

Sorting 5000 numbers ...
0.042s

Sorting 50000 numbers ...
0.491s
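A Python sketch that mirrors the kabas version - building the heap by sifting each element up, then repeatedly swapping the maximum to the end and sifting the new root down (the name `heap_sort` is my own):

```python
def heap_sort(data):
    n = len(data)
    # Build a max-heap by sifting each element up (mirrors make_heap):
    # the parent of index j is at (j - 1) // 2.
    for i in range(1, n):
        j = i
        while j > 0 and data[j] > data[(j - 1) // 2]:
            data[j], data[(j - 1) // 2] = data[(j - 1) // 2], data[j]
            j = (j - 1) // 2
    # Repeatedly move the maximum to the end and sift the new root down.
    for i in range(n - 1, 0, -1):
        data[0], data[i] = data[i], data[0]
        j = 0
        ind = 1
        while ind < i:
            # pick the larger of the two children
            if ind + 1 < i and data[ind + 1] > data[ind]:
                ind += 1
            if data[j] < data[ind]:
                data[j], data[ind] = data[ind], data[j]
            j = ind
            ind = 2 * j + 1

nums = [9, 4, 7, 1, 8, 2]
heap_sort(nums)
print(nums)  # [1, 2, 4, 7, 8, 9]
```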

Mergesort - fast but needs space

The speed of Mergesort is similar to Heapsort, but it needs much more space. This is how it works: imagine you have a stack of numbered cards to sort. The Mergesort method: take two cards at a time and merge them into sorted stacks of 2 cards. Then take two 2-card stacks at a time and merge them into sorted 4-card stacks, and so on.

The merge subroutine merges two sorted subarrays into one sorted array. First the two subarrays are copied to a temporary location, then the temporary subarrays are scanned, with the lower element selected at each step and placed into the original array.

The sort subroutine turns many small subarrays, starting with size one, into larger ones by repeatedly merging two of them into one, until a single sorted array remains.

subr merge
  mid = left + sz
  if mid > sz_data
    mid = sz_data
  .
  right = mid + sz
  if right > sz_data
    right = sz_data
  .
  for i = left to right - 1
    tmp[i] = data[i]
  .
  l = left
  r = mid
  for i = left to right - 1
    if r = right or l < mid and tmp[l] < tmp[r]
      data[i] = tmp[l]
      l += 1
    else
      data[i] = tmp[r]
      r += 1
    .
  .
.
subr sort
  sz_data = len data[]
  len tmp[] sz_data
  sz = 1
  while sz < sz_data
    left = 0
    while left < sz_data
      call merge
      left += sz + sz
    .
    sz += sz
  .
.
Sorting 500 numbers ...
0.003s

Sorting 5000 numbers ...
0.032s

Sorting 50000 numbers ...
0.370s
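The bottom-up approach can be sketched in Python as follows (the name `merge_sort` is my own; the `min` clamps handle runs at the end of the array, like the kabas code does):

```python
def merge_sort(data):
    n = len(data)
    tmp = [0] * n
    size = 1
    while size < n:
        # merge adjacent runs of length `size`
        for left in range(0, n, 2 * size):
            mid = min(left + size, n)
            right = min(mid + size, n)
            tmp[left:right] = data[left:right]
            l, r = left, mid
            for i in range(left, right):
                # Take from the left run unless it is exhausted or the
                # right run's head is smaller.
                if r == right or (l < mid and tmp[l] <= tmp[r]):
                    data[i] = tmp[l]
                    l += 1
                else:
                    data[i] = tmp[r]
                    r += 1
        size *= 2

nums = [5, 3, 8, 1, 9, 2, 7]
merge_sort(nums)
print(nums)  # [1, 2, 3, 5, 7, 8, 9]
```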

Quicksort - normally the fastest one

Quicksort is usually even quicker than Mergesort and does not need much extra space. The way it works is not complicated: imagine you have a stack of 200 numbered cards to sort. The Quicksort way: split the cards into one stack with the cards 1 to 100 and a second stack with the rest. Now you only have to sort stacks of 100 cards, which is easier. For this, split each 100-card stack into two stacks of 50 cards, and so on.

The partition subroutine splits a subarray into two parts using one element - the so-called pivot element - for comparison. Lower numbers go to the left side, higher numbers to the right side, and the pivot element ends up between them.

The sort subroutine calls the partition subroutine for all unsorted subarrays; these are kept on a stack as a to-do list. partition creates two subarrays: the larger one is pushed onto the to-do list and sorting continues with the smaller one. When the to-do list is empty, the job is done and the whole array is sorted.

subr partition
  swap data[(left + right) / 2] data[left]
  piv = data[left]
  mid = left
  for i = left + 1 to right
    if data[i] < piv
      mid += 1
      swap data[i] data[mid]
    .
  .
  swap data[left] data[mid]
.
subr sort
  len stack[] 64
  # this is sufficient to sort 4 billion numbers
  left = 0
  right = len data[] - 1
  while left <> -1
    if left < right
      call partition
      if mid < (right + left) / 2
        stack[sp] = mid + 1
        stack[sp + 1] = right
        right = mid - 1
      else
        stack[sp] = left
        stack[sp + 1] = mid - 1
        left = mid + 1
      .
      sp += 2
    elif sp > 0
      sp -= 2
      left = stack[sp]
      right = stack[sp + 1]
    else
      # finished
      left = -1
    .
  .
.
Sorting 500 numbers ...
0.001s

Sorting 5000 numbers ...
0.016s

Sorting 50000 numbers ...
0.199s

Sorting 500000 numbers ...
2.331s
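A Python sketch of the same iterative scheme (the names `partition` and `quick_sort` match the kabas subroutines; using a list as the explicit to-do stack is my choice):

```python
def partition(data, left, right):
    # Move the middle element to the front and use it as pivot
    # (Lomuto-style partitioning, as in the kabas code).
    m = (left + right) // 2
    data[left], data[m] = data[m], data[left]
    piv = data[left]
    mid = left
    for i in range(left + 1, right + 1):
        if data[i] < piv:
            mid += 1
            data[i], data[mid] = data[mid], data[i]
    data[left], data[mid] = data[mid], data[left]
    return mid

def quick_sort(data):
    # The larger half goes onto the to-do stack, the smaller half is
    # processed first, so the stack depth stays logarithmic.
    stack = []
    left, right = 0, len(data) - 1
    while True:
        if left < right:
            mid = partition(data, left, right)
            if mid < (left + right) // 2:
                stack.append((mid + 1, right))
                right = mid - 1
            else:
                stack.append((left, mid - 1))
                left = mid + 1
        elif stack:
            left, right = stack.pop()
        else:
            break

nums = [5, 3, 8, 1, 9, 2]
quick_sort(nums)
print(nums)  # [1, 2, 3, 5, 8, 9]
```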

Quicksort combined with Insertion sort - even faster

Quicksort is very fast - but for small arrays, Insertion sort is faster. So let's combine the two sorting algorithms: subarrays generated by the partition subroutine that are smaller than 15 elements are sorted with Insertion sort. In addition, the selection of the pivot element is improved by choosing it randomly.

subr insert_sort
  for i = left + 1 to right
    h = data[i]
    j = i - 1
    while j >= left and h < data[j]
      data[j + 1] = data[j]
      j -= 1
    .
    data[j + 1] = h
  .
.
subr partition
  swap data[left + random (right - left)] data[left]
  piv = data[left]
  mid = left
  for i = left + 1 to right
    if data[i] < piv
      mid += 1
      swap data[i] data[mid]
    .
  .
  swap data[left] data[mid]
.
subr sort
  len stack[] 64
  left = 0
  right = len data[] - 1
  while left <> -1
    if right - left < 15
      call insert_sort
      if sp > 0
        sp -= 2
        left = stack[sp]
        right = stack[sp + 1]
      else
        left = -1
      .
    else
      call partition
      if mid < (right + left) / 2
        stack[sp] = mid + 1
        stack[sp + 1] = right
        right = mid - 1
      else
        stack[sp] = left
        stack[sp + 1] = mid - 1
        left = mid + 1
      .
      sp += 2
    .
  .
.
Sorting 500 numbers ...
0.001s

Sorting 5000 numbers ...
0.015s

Sorting 50000 numbers ...
0.192s

Sorting 500000 numbers ...
2.076s
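The hybrid can be sketched in Python like this (the name `hybrid_sort` is my own; the cutoff 15 follows the kabas code and is not tuned):

```python
import random

def hybrid_sort(data):
    # Quicksort with a random pivot; ranges shorter than 15 elements
    # are finished off with insertion sort.
    stack = [(0, len(data) - 1)]
    while stack:
        left, right = stack.pop()
        while right - left >= 15:
            # random pivot, moved to the front
            p = random.randint(left, right)
            data[left], data[p] = data[p], data[left]
            piv = data[left]
            mid = left
            for i in range(left + 1, right + 1):
                if data[i] < piv:
                    mid += 1
                    data[i], data[mid] = data[mid], data[i]
            data[left], data[mid] = data[mid], data[left]
            # push the larger half, keep working on the smaller one
            if mid < (left + right) // 2:
                stack.append((mid + 1, right))
                right = mid - 1
            else:
                stack.append((left, mid - 1))
                left = mid + 1
        # small range: insertion sort
        for i in range(left + 1, right + 1):
            h = data[i]
            j = i - 1
            while j >= left and h < data[j]:
                data[j + 1] = data[j]
                j -= 1
            data[j + 1] = h

nums = [5, 3, 8, 1, 9, 2] * 4
hybrid_sort(nums)
print(nums == sorted([5, 3, 8, 1, 9, 2] * 4))  # True
```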

Quicksort recursive - watch the stack

The following Quicksort variant is recursive. The code looks clean and is fast. But there is one danger: the call stack and its limited size. The stack size differs between operating systems, and on embedded systems it can be very small.

func qsort left right . data[] .
  if left < right
    # partition 
    piv = data[left]
    mid = left
    for i = left + 1 to right
      if data[i] < piv
        mid += 1
        swap data[i] data[mid]
      .
    .
    swap data[left] data[mid]
    # 
    call qsort left mid - 1 data[]
    call qsort mid + 1 right data[]
  .
.
subr sort
  call qsort 0 len data[] - 1 data[]
.

With the iterative variant and its reserved space for the unfinished tasks, we could guarantee that an array of a given size can be sorted. This is not possible with this recursive variant: with already sorted arrays and "first element as pivot", we will probably get a stack overflow. In that case, the iterative variant "only" suffers the worst-case time complexity.

With tail call elimination, and by ensuring that the smaller subarray is always sorted first, we can make the recursive variant stack-safe as well.

func qsort left right . data[] .
  while left < right
    # partition 
    piv = data[left]
    mid = left
    for i = left + 1 to right
      if data[i] < piv
        mid += 1
        swap data[i] data[mid]
      .
    .
    swap data[left] data[mid]
    # 
    if mid < (right + left) / 2
      call qsort left mid - 1 data[]
      left = mid + 1
    else
      call qsort mid + 1 right data[]
      right = mid - 1
    .
  .
.
subr sort
  call qsort 0 len data[] - 1 data[]
.
Sorting 500 numbers ...
0.002s

Sorting 5000 numbers ...
0.016s

Sorting 50000 numbers ...
0.200s

Sorting 500000 numbers ...
2.137s
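The stack-safe variant translates to Python as follows (a sketch with the same structure as the kabas code; recursion goes into the smaller half only, the larger half is handled by the loop):

```python
def qsort(data, left, right):
    # Loop over the larger half (the eliminated tail call) and recurse
    # only into the smaller half, so the recursion depth stays at
    # about log2(n) even for already sorted input.
    while left < right:
        # partition with the first element as pivot
        piv = data[left]
        mid = left
        for i in range(left + 1, right + 1):
            if data[i] < piv:
                mid += 1
                data[i], data[mid] = data[mid], data[i]
        data[left], data[mid] = data[mid], data[left]
        if mid < (left + right) // 2:
            qsort(data, left, mid - 1)   # smaller left half
            left = mid + 1
        else:
            qsort(data, mid + 1, right)  # smaller right half
            right = mid - 1

# Already sorted input is the worst case for "first element as pivot",
# but the recursion depth stays small:
nums = list(range(1000))
qsort(nums, 0, len(nums) - 1)
print(nums == sorted(nums))  # True
```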

Comparison

We see that for Bubblesort, Selection sort and Insertion sort, the time needed grows quadratically - 10 times more items leads to about 100 times the time. That makes these sorting algorithms unusable for large arrays.

With Heapsort and Mergesort, the time grows only with n·log(n). Both need about the same time; Heapsort needs less memory, while Mergesort can be parallelized better.

Quicksort has the same average time behaviour and is usually faster - even more so in combination with Insertion sort. When the pivot element is selected randomly, the worst case - quadratic time growth - is very unlikely.


Multithreaded Quicksort in Go and in C

