Can sorting be even faster, when we mentioned in the previous article that comparison-based sorting can do no better than \(O(n\log n)\)? The answer is yes! We can sort in linear time if the data set meets certain conditions.
Bucket sort
Bucket sort is a divide-and-conquer sorting algorithm. We discuss it here because it can achieve linear-time performance. The basic idea of bucket sort is to distribute the data set into several buckets and then sort each bucket individually. The buckets themselves are ordered by the value ranges they cover, so once the data elements in each bucket are sorted, the whole data set becomes sorted.
Suppose we have a data set ranging from \(V_{min}\) to \(V_{max}\). To distribute it evenly into \(n\) buckets while ensuring that the whole range of data elements is covered, we calculate the interval of each bucket as follows (all divisions below are integer divisions).
$$ Interval_{data} = V_{max}-V_{min}+1 $$
$$ Interval_{bucket}=\begin{cases} Interval_{data}/n+1 &\text{if } Interval_{data} \mod n \ne 0 \\ Interval_{data}/n &\text{otherwise}. \end{cases} $$
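To make the formula concrete, here is a minimal bucket sort sketch in Python. The function name `bucket_sort`, the use of Python's built-in `sorted` for individual buckets, and the sample data are illustrative assumptions rather than part of the original discussion.

```python
def bucket_sort(data, n):
    """Distribute data into n equal-interval buckets, then sort each bucket."""
    v_min, v_max = min(data), max(data)
    interval_data = v_max - v_min + 1
    # Bucket interval per the formula above: round up when the range is not divisible by n.
    interval_bucket = interval_data // n + (1 if interval_data % n != 0 else 0)
    buckets = [[] for _ in range(n)]
    for x in data:
        # The bucket index is the element's offset from v_min divided by the bucket interval.
        buckets[(x - v_min) // interval_bucket].append(x)
    result = []
    for bucket in buckets:
        # Any sorting algorithm works here; its properties carry over to the whole sort.
        result.extend(sorted(bucket))
    return result

print(bucket_sort([29, 25, 3, 49, 9, 37, 21, 43], 4))  # [3, 9, 21, 25, 29, 37, 43, 49]
```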
Sometimes the distribution of data over buckets is all we want, and there is no need to sort individual buckets. This is the case when the number of buckets exactly matches the data range. For example, the score of an assignment/exam ranges from 0 to 100, so 101 buckets are enough to sort all assignment/exam scores by simply distributing them into the buckets.
Another interesting case is to find the maximum gap between two consecutive elements of an unsorted data set (consecutive in its sorted order):
Suppose the data set has \(n\) elements and not all of them are identical (otherwise the maximum gap is simply zero). We distribute them into \(n\) buckets with an equal-size interval based on the above formula. If the values of all data elements are consecutive, each bucket holds exactly one element and the maximum gap is one. Otherwise, note that the \(n-1\) gaps between consecutive sorted elements together cover the whole data range, so the maximum gap is at least one bucket interval. A gap between two elements inside the same bucket is always smaller than the bucket interval, so it can never be the maximum. Consequently, the maximum gap must lie between two consecutive non-empty buckets, namely the largest difference between the maximum data element of one non-empty bucket and the minimum data element of the next non-empty bucket. It is therefore enough to record only the minimum and maximum of each bucket and scan the buckets once.
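A sketch of this maximum-gap idea, assuming integer data and the bucket interval formula above (the function name `maximum_gap` and the sample data are made up for illustration):

```python
def maximum_gap(data):
    """Largest gap between consecutive elements in sorted order, found without full sorting."""
    n = len(data)
    v_min, v_max = min(data), max(data)
    if v_min == v_max:
        return 0                      # all elements identical
    interval_data = v_max - v_min + 1
    interval_bucket = interval_data // n + (1 if interval_data % n != 0 else 0)
    # Each bucket only records the smallest and largest value falling into it.
    lows = [None] * n
    highs = [None] * n
    for x in data:
        i = (x - v_min) // interval_bucket
        lows[i] = x if lows[i] is None else min(lows[i], x)
        highs[i] = x if highs[i] is None else max(highs[i], x)
    # The maximum gap lies between consecutive non-empty buckets.
    max_gap, prev_high = 0, v_min
    for i in range(n):
        if lows[i] is None:           # skip empty buckets
            continue
        max_gap = max(max_gap, lows[i] - prev_high)
        prev_high = highs[i]
    return max_gap

print(maximum_gap([3, 14, 6, 1, 20]))  # 8 (the gap between 6 and 14)
```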
For the case where only distribution is needed, bucket sort runs in \(O(n)\) and is stable. When individual buckets need to be sorted, the performance and stability of bucket sort depend on the sorting algorithm applied to each bucket.
Since buckets are independent of each other while being ordered by their value ranges, bucket sort also provides a good foundation for parallelism and external sorting.
Counting sort
Counting sort is a special case of bucket sort where the buckets match the distinct elements of a data set. One example is the assignment/exam scores mentioned before: we use 101 buckets that cover the full range of scores from 0 to 100 and count the occurrences of each score in the data set. Once the counting is done, the sorted result can be generated by a sequential scan of the buckets from score 0 to 100, emitting each score as many times as it was counted.
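A minimal sketch of this count-and-scan scheme for scores in the range 0 to 100 (the function name and sample scores are illustrative):

```python
def counting_sort_scores(scores):
    """Sort integer scores in [0, 100] by counting occurrences, then scanning the buckets."""
    counts = [0] * 101                  # one bucket per possible score
    for s in scores:
        counts[s] += 1
    result = []
    for score in range(101):            # sequential scan of buckets from score 0 to 100
        result.extend([score] * counts[score])
    return result

print(counting_sort_scores([87, 42, 100, 87, 0, 65]))  # [0, 42, 65, 87, 87, 100]
```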
Counting sort can be done in a stable manner. For that, we maintain an array \(A\) whose size equals the number of buckets and which records cumulative counts over the buckets. For example, suppose the count of each bucket is kept in an array \(C\); \(A[i]\) then contains the sum of \(C[0]\) through \(C[i]\). We then scan the data set in reverse, from the last data element to the first. For each data element scanned, its position in the final sorted result is determined by the corresponding value in \(A\); after placing it, we decrement that value by one. Note that we scan the data set in reverse because \(A[i]\) accumulates the range from \(C[0]\) to \(C[i]\), so the corresponding value marks the last free slot for that element in the sorted result. Alternatively, if \(A[i]\) accumulates only \(C[0]\) through \(C[i-1]\), we scan the data set from beginning to end and increment the corresponding value in \(A\) by one each time a data element is placed into the sorted result.
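The same scheme with the accumulative array \(A\) and the reverse scan might look as follows; the names `counting_sort`, `counts` (the array \(C\)) and `acc` (the array \(A\)) are our own, and the code assumes integer keys in the range \([0, n\_buckets)\):

```python
def counting_sort(data, n_buckets):
    """Stable counting sort for integer keys in the range [0, n_buckets)."""
    counts = [0] * n_buckets            # C: occurrences of each key
    for x in data:
        counts[x] += 1
    acc = [0] * n_buckets               # A: acc[i] accumulates counts[0] through counts[i]
    running = 0
    for v in range(n_buckets):
        running += counts[v]
        acc[v] = running
    result = [None] * len(data)
    # Reverse scan: equal keys keep their original relative order, making the sort stable.
    for x in reversed(data):
        acc[x] -= 1                     # decrement first, then use the value as the zero-based slot
        result[acc[x]] = x
    return result

print(counting_sort([78, 100, 92, 78, 55, 92], 101))  # [55, 78, 78, 92, 92, 100]
```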
When the number of buckets is not too large compared to the data set size, the above steps all run in linear time; counting sort is thus a linear-time sorting algorithm. It is most efficient when the number of buckets is much smaller than the data set size.
Radix sort
Radix sort leverages sorting stability and sorts data elements by their individual units, starting from the least significant one. For example, to sort a set of phone numbers, we first sort them based on their last digit, then apply another round of sorting based on their second-to-last digit, and so on. As long as each per-digit sort is stable, the process eventually produces a sorted phone-number sequence, and radix sort itself is stable as well. The rationale behind this strategy is that if a leading digit of one phone number is larger than that of another, that phone number is larger regardless of the remaining digits; the later rounds, which sort on more significant digits, enforce exactly this, while stability preserves the order established by earlier rounds whenever the more significant digits tie.
When the units of a data element have a narrow scope (e.g., the digits of a phone number range from 0 to 9), we can apply bucket sort or counting sort in each round. When the number of units per data element (e.g., 10 digits for a phone number) is not too large compared to the data set size, radix sort runs in linear time. Radix sort is efficient when the number of units per data element is much smaller than the data set size.
Since radix sort operates on individual units of data elements, it requires all data elements to have an equal size. If they do not, we can first pad the smaller data elements with nulls/zeros.
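Putting the pieces together, here is a least-significant-digit radix sort sketch that reuses the stable counting sort idea for each digit. The function name `radix_sort`, the `base` and `n_digits` parameters, and the sample phone numbers are assumptions for illustration; numbers shorter than `n_digits` are implicitly treated as having leading zeros, matching the padding described above.

```python
def radix_sort(data, n_digits, base=10):
    """LSD radix sort: stably sort by each digit, from least to most significant."""
    for d in range(n_digits):
        # One round of stable counting sort on digit d (d = 0 is the least significant digit).
        counts = [0] * base
        for x in data:
            counts[(x // base ** d) % base] += 1
        acc = [0] * base
        running = 0
        for v in range(base):
            running += counts[v]
            acc[v] = running
        result = [None] * len(data)
        for x in reversed(data):        # reverse scan keeps each round stable
            digit = (x // base ** d) % base
            acc[digit] -= 1
            result[acc[digit]] = x
        data = result
    return data

print(radix_sort([5551234, 5550001, 5559876, 5550100], 7))
# [5550001, 5550100, 5551234, 5559876]
```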