Sorting algorithms are useful and commonly used for the pre-processing of a data set. Here we look at three simply sorting algorithms that have a quadratic time complexity. We also visit the property of each of them, in particular, the stability of sorting algorithms.
Stability
Before delving into specific sorting algorithms, let’s look at one interesting metric of sorting—stability. A sorting algorithm is stable if the sorted result keeps the original order of repeated data elements. What does that matter? Well, a stable sorting algorithm is more efficient for sorting a data set with more than one key. For example, to sort a hand of cards based on first their suits (clubs, diamonds, hearts, spades) and then ranks, there can be two ways:
- We first sort the cards by rank (either stable or unstable), and then do a stable sort by suit.
- We first sort the cards by suit (either stable or unstable), and within each suit, sort them by rank (either stable or unstable).
While the second way doesn’t require stable sorts at all, compared to the first way, it involves one additional scan to identify the section scope of each suit. A scan performs in linear time, which is lighter than sorting itself. However, once the data set size becomes very large, such one additional linear scan can be heavy as well.
Bubble sort
Bubble sort keeps moving the larger data elements towards the end of a data set. It sequentially scans the whole data set and compares each pair of adjacent data elements—if the former is larger than the latter, it swaps them. It repeats such process until no swaps happen. The best-case performance of bubble sort is \(O(n)\) while the worst-case and average performance are both \(O(n^2)\). Bubble sort is stable—we swap a pair of adjacent data elements only if the former is larger than the later.
// a[0] to a[n-1] is the array to sort.
bool swapped;
do {
swapped = false;
for (int j = 0; j < n - 1; ++j)
{
if (a[j] > a[j+1])
{
swap(a[j], a[j+1]);
swapped = true;
}
}
} while (swapped);
The above implementation can be optimized by skipping the largest data element that has been selected in each round.
// a[0] to a[n-1] is the array to sort.
int i = 0;
bool swapped;
do {
swapped = false;
for (int j = 0; j < n - i - 1; ++j)
{
if (a[j] > a[j+1])
{
swap(a[j], a[j+1]);
swapped = true;
}
}
i++;
} while (swapped);
Insertion sort
Insertion sort simply takes individual data elements one by one and inserts them to their correct positions. For an in-place implementation over arrays, inserting a data element requires shifting all the following data elements. The best-case performance of insertion sort is \(O(n)\) while the worst-case and average performance are both \(O(n^2)\). Insertion sort is stable—we shift a data element only if it’s larger than the data element we are inserting.
// a[0] to a[n-1] is the array to sort.
for (int i = 1; i < n; ++i)
{
int value = a[i];
int j = i - 1;
for (; j >= 0; --j)
{
if (a[j] > value)
a[j+1] = a[j];
else
break;
}
a[j+1] = value;
}
Both bubble sort and insertion sort are stable. Next let’s look at one case of unstable sorting algorithms.
Selection sort
Similar to insertion sort, selection sort inserts individual data elements one by one, but every time it selects the minimum one from the rest data elements that are unsorted. For an in-place implementation over arrays, each insertion involves at most one swap. Since searching a minimum data element requires \(O(n)\) time and selection sort repeats that \(n\) times, the best-case, worst-case, and average performance are all \(O(n^2)\). Note that selection sort is not stable, because a swap (if any) can break the original order of repeated data elements.
// a[0] to a[n-1] is the array to sort.
for (int i = 0; i < n - 1; i++)
{
int minIndex = i;
for (int j = i + 1; j < n; j++)
{
if (a[j] < a[minIndex])
minIndex = j;
}
if (minIndex != i)
{
swap(a[i], a[minIndex]);
}
}
Applicability
First of all, all these three sorting algorithms have the same average time complexity and all of them can be implemented in-place over arrays.
Bubble sort and insertion sort are both stable. For each of them, the average time complexity of each single iteration (i.e., one round of bubbling or the insertion of one data element) is \(O(n)\). However, the specific time of bubbling is three times over that of insertion. This is because bubble sort does swapping which includes three assignment operations, whereas each data element shift of insertion sort applies only one assignment. With a very large volume of data elements, such gap can become significant.
When stability is not critical, compared to bubble sort, selection sort can be efficient where swapping is very expensive. With \(n\) data elements to sort, selection sort does at most \(n\) swaps.