In many situations, we want to sort some 'things' into a certain order. We might want to put books or CDs into alphabetical order, or we may be using a packing algorithm^{1} that requires the data to be pre-sorted^{2}. With a small amount of data, this is easy to do by sight. For larger amounts of data we need the help of clearly defined algorithms.

**Ordering**

Whatever data we have, there will always be a definite order that it can be put in. As an example, let us consider the problem of putting the numbers 7, 5, 2, 4, 10, 1 into ascending numerical order.

**The Interchange Sort Algorithm**

In this method, we find the smallest number in the set and interchange it with the number in the first position. Then we find the second smallest number, and swap it with the number in the second position, and so on.

In our example, we get the following (numbers in *italics* have been verified to be in the correct position by the algorithm):

Pass | Order
---|---
0 | 7 5 2 4 10 1
1 | *1* 5 2 4 10 7
2 | *1* *2* 5 4 10 7
3 | *1* *2* *4* 5 10 7
4 | *1* *2* *4* *5* 10 7
5 | *1* *2* *4* *5* *7* *10*

This example requires 15 comparisons to sort six numbers.
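The method above can be sketched in a few lines of Python (the `interchange_sort` helper is our own illustration, not part of the original entry), with a counter to confirm the number of comparisons made:

```python
def interchange_sort(items):
    """Interchange sort: on each pass, find the smallest remaining
    number and swap it into the next unsorted position."""
    items = list(items)       # work on a copy
    comparisons = 0
    for i in range(len(items) - 1):
        smallest = i
        for j in range(i + 1, len(items)):
            comparisons += 1  # one comparison per pair examined
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items, comparisons

print(interchange_sort([7, 5, 2, 4, 10, 1]))
# → ([1, 2, 4, 5, 7, 10], 15)
```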

There are a few things to notice here. Firstly, the fourth pass did not change the order of the data - it merely checked that 5 was in the correct position. Secondly, the final pass verifies the position of two numbers - if the second largest number has been put in the correct position, the largest must also be in the correct position. Because of this, we know that a set of *m* numbers will need *m* -1 passes to be put in the correct order.

The comparisons we are making between pairs of numbers are, in a sense, hidden in this algorithm. In each pass we need to find the smallest number of the remaining (unsorted) set^{3}. This will need *m* - *n* comparisons on the *n*^{th} pass, hence the total number of comparisons needed will be:

(*m* -1) + (*m* -2) + … + (*m* -[*m* -2]) + (*m* -[*m* -1]) = (*m* -1) + (*m* -2) + … + 2 + 1 = ½(*m*×(*m* -1))

So for our six-item example, this gives:

½(*m*×(*m* -1)) = ½(6×5) = ½×30 = 15

If we had twice as much data (that is, 12 items) we would have:

½(*m*×(*m* -1)) = ½(12×11) = ½×132 = 66
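The formula is easy to check numerically (a small sketch; the `max_comparisons` name is ours):

```python
def max_comparisons(m):
    # ½ × m × (m - 1): comparisons needed to sort m items this way
    return m * (m - 1) // 2

print(max_comparisons(6))   # → 15
print(max_comparisons(12))  # → 66
```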

**The Bubble Sort Algorithm**

This algorithm gets its name from the way each element in turn is 'floated' to the correct position. We first compare the numbers in the first and second positions, swapping them if the first is bigger than the second. We then compare the second and third, then the third and fourth... each time swapping the pair if the first is larger than the second. We repeat this process (ignoring numbers that have been 'floated' to the top on previous passes) until no more swaps are made.

With our example, we get the following (here '-' is used to denote a comparison being made):

Pass | Order
---|---
1 | 7 - 5 2 4 10 1
| 5 7 - 2 4 10 1
| 5 2 7 - 4 10 1
| 5 2 4 7 - 10 1
| 5 2 4 7 10 - 1
2 | 5 - 2 4 7 1 *10*
| 2 5 - 4 7 1 *10*
| 2 4 5 - 7 1 *10*
| 2 4 5 7 - 1 *10*
3 | 2 - 4 5 1 *7* *10*
| 2 4 - 5 1 *7* *10*
| 2 4 5 - 1 *7* *10*
4 | 2 - 4 1 *5* *7* *10*
| 2 4 - 1 *5* *7* *10*
5 | 2 - 1 *4* *5* *7* *10*
| *1* *2* *4* *5* *7* *10*

As with the Interchange Sort algorithm, this uses 15 comparisons to sort our example.
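In Python, the fixed-pass version analysed here might look like this (the `bubble_sort` name and the comparison counter are our additions for illustration):

```python
def bubble_sort(items):
    """Bubble sort: compare adjacent pairs, swapping when the first is
    larger, so each pass floats one number to the correct position."""
    items = list(items)
    comparisons = 0
    m = len(items)
    for n in range(1, m):           # passes 1 .. m-1
        for j in range(m - n):      # m - n comparisons on the nth pass
            comparisons += 1
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items, comparisons

print(bubble_sort([7, 5, 2, 4, 10, 1]))
# → ([1, 2, 4, 5, 7, 10], 15)
```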

You may have noticed that we compared some pairs of numbers more than once - so this is certainly not the most efficient algorithm for *this* set of data. In each pass, we float one number into the correct position. Therefore, *m* numbers will require *m* -1 passes to be ordered.

In the *n*^{th} pass, we will make *m* - *n* comparisons. Therefore the total number of comparisons we will need is:

(*m* -1) + (*m* -2) + … + (*m* -[*m* -2]) + (*m* -[*m* -1]) = (*m* -1) + (*m* -2) + … + 2 + 1 = ½(*m*×(*m* -1))

Therefore our six-item example will need ½(6×5)=15 comparisons.

If we again double the amount of data to 12 items, we will need ½(12×11)=66 comparisons.

**The Shuttle Sort Algorithm**

The Shuttle Sort builds on the Bubble Sort - in each pass through the numbers, we compare the *n*^{th} and (*n*+1)^{st} numbers. If we have to swap them, we then compare the smaller of the pair with the *preceding* number, and swap if necessary. Each time a swap is made, we compare with the preceding number again, so a number 'shuttles' backwards until it reaches its correct position.

Pass | Order
---|---
1 | 7 - 5 2 4 10 1
2 | 5 7 - 2 4 10 1
| 5 - 2 7 4 10 1
3 | 2 5 7 - 4 10 1
| 2 5 - 4 7 10 1
| 2 - 4 5 7 10 1
4 | 2 4 5 7 - 10 1
5 | 2 4 5 7 10 - 1
| 2 4 5 7 - 1 *10*
| 2 4 5 - 1 *7* *10*
| 2 4 - 1 *5* *7* *10*
| 2 - 1 *4* *5* *7* *10*
| *1* *2* *4* *5* *7* *10*

This uses just 12 comparisons to sort our example data.
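A Python sketch of the same process (again, the `shuttle_sort` helper is our own illustration), counting comparisons as it goes:

```python
def shuttle_sort(items):
    """Shuttle sort: compare each number with its predecessor and,
    after every swap, shuttle it further back until no swap is needed."""
    items = list(items)
    comparisons = 0
    for n in range(1, len(items)):  # passes 1 .. m-1
        j = n
        while j > 0:
            comparisons += 1
            if items[j - 1] > items[j]:
                items[j - 1], items[j] = items[j], items[j - 1]
                j -= 1
            else:
                break               # no swap, so this pass ends early
    return items, comparisons

print(shuttle_sort([7, 5, 2, 4, 10, 1]))
# → ([1, 2, 4, 5, 7, 10], 12)
```

Note that on an already-sorted list every pass stops after a single comparison, so only *m* -1 comparisons are needed in total - which is why this algorithm profits from partial ordering.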

Notice that it is not until the final pass that we can be sure that any numbers are in the correct position. Also, the fourth pass contained just a single comparison, since the two numbers being compared were already in the correct order. So the number of comparisons needed in each pass is not predictable. However, the *maximum* number of comparisons in the *n*^{th} pass is *n*. We will clearly need *m* -1 passes for *m* numbers, so the maximum number of comparisons needed to put *m* numbers in order is:

1 + 2 + … + (*m* -2) + (*m* -1) = ½(*m*×(*m* -1))

Hence our example would need *at most* ½(6×5)=15 comparisons. You should note that this is a *worst case* estimate - in general, we will need fewer comparisons than this.

For sorting 12 items, this algorithm would use *at most* ½(12×11)=66 comparisons.

**So Which Is the Best?**

Well... that rather depends on the data you're trying to sort. If the data has some *partial ordering*^{4}, Shuttle Sort will give faster results. Look at the initial order of our example data - and then see where Shuttle Sort 'missed out' comparisons that would have been made by the other two algorithms.

Conversely, both Interchange Sort and Bubble Sort will always need *m* - *n* comparisons on the *n*^{th} pass, after which we can be certain that *n* numbers are in the correct positions. This means that they will always use the maximum number of comparisons to sort the data - *even if it is already in the correct order*.

If we were to properly analyse the complexity of the algorithms, we would use the 'big O' notation, which would allow us to easily compare the complexity of many different algorithms.

We would also need to find an estimate for the time taken to actually swap the numbers over. With a large amount of data to be sorted, this will certainly need to be considered if we are to properly analyse the expected running time of an algorithm.

There are, of course, more advanced sorting algorithms. These include Quicksort (which is a recursive algorithm), and Heapsort.

**Further Information**

If you found this entry interesting, and are keen for further information, take a look at *The Art of Computer Programming (volume 3): Sorting and Searching*, by Donald E Knuth. The book examines many further algorithms, with a detailed analysis of each.