Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements the CombSort algorithm #54

Merged
merged 15 commits into from
Oct 9, 2022
42 changes: 41 additions & 1 deletion src/SortingAlgorithms.jl
Original file line number Diff line number Diff line change
Expand Up @@ -9,16 +9,42 @@ using Base.Order
import Base.Sort: sort!
import DataStructures: heapify!, percolate_down!

export HeapSort, TimSort, RadixSort
export HeapSort, TimSort, RadixSort, CombSort

struct HeapSortAlg <: Algorithm end
struct TimSortAlg <: Algorithm end
struct RadixSortAlg <: Algorithm end
struct CombSortAlg <: Algorithm end

const HeapSort = HeapSortAlg()
const TimSort = TimSortAlg()
const RadixSort = RadixSortAlg()

"""
nlw0 marked this conversation as resolved.
Show resolved Hide resolved
CombSort

Indicates that a sorting function should use the comb sort
algorithm. Comb sort traverses the collection multiple times
ordering pairs of elements with a given interval between them.
The interval decreases exponentially until it becomes 1, then
it switches to insertion sort on the whole input.

Characteristics:
- *not stable* does not preserve the ordering of elements which
compare equal (e.g. "a" and "A" in a sort of letters which
ignores case).
- *in-place* in memory.
- *parallelizable* suitable for vectorization with SIMD instructions
because it performs many independent comparisons.
- *complexity* worst-case only proven to be better than quadratic, but not `n*log(n)`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This algorithm has quadratic worst-case runtime.

julia> @time sort!(4^7*repeat(1:30, 4^7));
  0.027213 seconds (8 allocations: 11.258 MiB)

julia> @time sort!(4^7*repeat(1:30, 4^7); alg=CombSort);
  4.866824 seconds (4 allocations: 7.500 MiB)

Proof

Take an arbitrary k, let m = 4k, and let n = m*4^7. Consider the first 7 intervals for an input of length n: [n*(3/4)^i for i in 1:7] == [m*4^7*(3/4)^i for i in 1:7] == [m*4^(7-i)*3^i for i in 1:7]. Notice that each interval is divisible by m.

Now, construct a pathological input v = repeat(1:m, 4^7). This input has the property v[i] == v[i+*jm] for any intergers i and j which yield inbounds indices. Consequently, the first 7 passes cannot alter v at all.

Informal interlude: There are still a lot of low numbers near the end of the list, and the remaining passes will have a hard time moving them to the beginning because their intervals are fairly small.

Consider the elements 1:k that fall in the final quarter of v. There are k*4^7/4 = n/16 such elements. Each of them must end up in the first quarter of the list once sorted, so they must each travel a total of at least n/2 slots (in reality they must each travel more than this, but all I claim is a lower bound).

To recap, we have established n/16 elements that must travel at least n/2 slots, and that they do not travel at all in the first 7 passes. The remaining comb passes have intervals no greater than [n*(3/4)^i for i in 8:inf]. The furthest an elemental can move toward the start of the vector in a single pass is the interval size of that pass, so the furthest an element can move toward the start of the vector in all remaining passes combined is sum([n*(3/4)^i for i in 8:inf]) = n*(3/4)^8 / (1 - 3/4) = 4n*(3/4)^8 < 0.401n. Thus, after all the comb passes are compete, we will still have n/16 elements that have to move at least 0.099n slots toward the start of the vector. Insertion sort, which can only move one swap at a time will require 0.099n*n/4 > .024n^2 swaps to accomplish this. Therefore, the worst case runtime of this algorithm is Ω(n^2).

It is structurally impossible for this algorithm to take more than O(n^2) time, so we can conclude Θ(n^2) is a tight asymptotic bound on the worst case runtime of this implementation of combsort. (A similar analysis holds for any geometric interval distribution).


We can verify the math in this proof empirically:

code
function comb!(v)
    lo, hi = extrema(eachindex(v))
    interval = (3 * (hi-lo+1)) >> 2
    while interval > 1
        for j in lo:hi-interval
            a, b = v[j], v[j+interval]
            v[j], v[j+interval] = b < a ? (b, a) : (a, b)
        end
        interval = (3 * interval) >> 2
    end
    v
end

function count_insertion_sort!(v)
    count = 0
    lo, hi = extrema(eachindex(v))
    for i = lo+1:hi
        j = i
        x = v[i]
        while j > lo && x < v[j-1]
            count += 1
            v[j] = v[j-1]
            j -= 1
        end
        v[j] = x
    end
    count
end

K = 1:6
M = 4 .* K
N = M .* 4^7
swaps = [count_insertion_sort!(comb!(repeat(1:m, 4^7))) for m in M]

using Plots
plot(N, swaps, label="actual swaps", xlabel="n", ylabel="swaps", legend=:topleft)
plot!(N, .024N.^2, label="theoretical minimum")

Results
Screen Shot 2022-10-05 at 5 02 03 PM


The proof conveniently provides us with a pathological input to test. So, even more empirically, we can simply measure runtime.

Code
# multiply by a large number ot avoid dispatch to counting sort
make_vector(m) = 4^7*repeat(1:m, 4^7)
x = 1:20
n       = 4^7*x
comb    = [(x = make_vector(m); @elapsed(sort!(x; alg=CombSort))) for m in x]
default = [(x = make_vector(m); @elapsed(sort!(x              ))) for m in x]
theory  = .024n.^2 / 1.6e9 # 1.6 ghz clock speed

plot(n, comb, label="comb sort", xlabel="n", ylabel="time (s)", legend=:topleft)
plot!(n, default, label="default sort")
plot!(n, theory, label="theoretical minimum")

Results

Screen Shot 2022-10-05 at 5 18 43 PM

nlw0 marked this conversation as resolved.
Show resolved Hide resolved

## References
nlw0 marked this conversation as resolved.
Show resolved Hide resolved
- Werneck, N. L., (2020). "ChipSort: a SIMD and cache-aware sorting module. JuliaCon Proceedings, 1(1), 12, https://doi.org/10.21105/jcon.00012
- H. Inoue, T. Moriyama, H. Komatsu and T. Nakatani, "AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors," 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007, pp. 189-198, doi: 10.1109/PACT.2007.4336211.
- Vitányi, Paul M. B., (2007). "Analysis of Sorting Algorithms by Kolmogorov Complexity (A Survey)." doi: 10.1007/978-3-540-32777-6
nlw0 marked this conversation as resolved.
Show resolved Hide resolved
"""
const CombSort = CombSortAlg()


## Heap sort

Expand Down Expand Up @@ -578,4 +604,18 @@ function sort!(v::AbstractVector, lo::Int, hi::Int, ::TimSortAlg, o::Ordering)
return v
end

function sort!(v::AbstractVector, lo::Int, hi::Int, ::CombSortAlg, o::Ordering)
interval = (3 * (hi-lo+1)) >> 2

while interval > 1
@inbounds for j in lo:hi-interval
a, b = v[j], v[j+interval]
v[j], v[j+interval] = lt(o, b, a) ? (b, a) : (a, b)
end
interval = (3 * interval) >> 2
end

return sort!(v, lo, hi, InsertionSort, o)
end

end # module
6 changes: 3 additions & 3 deletions test/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ using Random

a = rand(1:10000, 1000)

for alg in [TimSort, HeapSort, RadixSort]
for alg in [TimSort, HeapSort, RadixSort, CombSort]
b = sort(a, alg=alg)
@test issorted(b)
ix = sortperm(a, alg=alg)
Expand Down Expand Up @@ -85,7 +85,7 @@ for n in [0:10..., 100, 101, 1000, 1001]
end

# unstable algorithms
for alg in [HeapSort]
for alg in [HeapSort, CombSort]
p = sortperm(v, alg=alg, order=ord)
@test isperm(p)
@test v[p] == si
Expand All @@ -99,7 +99,7 @@ for n in [0:10..., 100, 101, 1000, 1001]

v = randn_with_nans(n,0.1)
for ord in [Base.Order.Forward, Base.Order.Reverse],
alg in [TimSort, HeapSort, RadixSort]
alg in [TimSort, HeapSort, RadixSort, CombSort]
# test float sorting with NaNs
s = sort(v, alg=alg, order=ord)
@test issorted(s, order=ord)
Expand Down