improve std.sort.sortContext to use block sort implementation #11117

andrewrk · 2022-03-10T20:27:48Z

In #11109 I introduced a new function, std.sort.sortContext which allows the caller to override the swap function, which makes it possible to call std lib sort in the MultiArrayList.sort method:

zig/lib/std/multi_array_list.zig

Lines 395 to 421 in 0b82c02

    
           /// `ctx` has the following method: 
        
           /// `fn lessThan(ctx: @TypeOf(ctx), a_index: usize, b_index: usize) bool` 
        
           pub fn sort(self: Self, ctx: anytype) void { 
        
               const SortContext = struct { 
        
                   sub_ctx: @TypeOf(ctx), 
        
                   slice: Slice, 
        
                   pub fn swap(sc: @This(), a_index: usize, b_index: usize) void { 
        
                       inline for (fields) |field_info, i| { 
        
                           if (@sizeOf(field_info.field_type) != 0) { 
        
                               const field = @intToEnum(Field, i); 
        
                               const ptr = sc.slice.items(field); 
        
                               mem.swap(field_info.field_type, &ptr[a_index], &ptr[b_index]); 
        
                           } 
        
                       } 
        
                   } 
        
                   pub fn lessThan(sc: @This(), a_index: usize, b_index: usize) bool { 
        
                       return sc.sub_ctx.lessThan(a_index, b_index); 
        
                   } 
        
               }; 
        
               std.sort.sortContext(self.len, SortContext{ 
        
                   .sub_ctx = ctx, 
        
                   .slice = self.slice(), 
        
               }); 
        
           }

This, in turn, made it possible to add a sort method to ArrayHashMap.

However, I did not convert our main std lib sort function to utilize this context; instead it calls insertion sort:

zig/lib/std/sort.zig

Lines 835 to 839 in 0b82c02

    
           /// TODO currently this just calls `insertionSortContext`. The block sort implementation 
        
           /// in this file needs to be adapted to use the sort context. 
        
           pub fn sortContext(len: usize, context: anytype) void { 
        
               return insertionSortContext(len, context); 
        
           }

The main problem to solve here is that the block sort implementation does not exclusively swap elements; it also uses a "cache" of elements that it copies items into and out of. In order to generalize this sort implementation, and make it usable from, for example, MultiArrayList, the cache concept needs to either be eliminated (harming performance) or incorporated in a generalized manner into the context (complicating the API). Since we strive for optimality in Zig, we will pay the price of the more complicated API.

On the plus side, this will make the cache size configurable, which is something that we already should be exposing in the API.

The tricky part of this issue will be designing the Context API carefully such that block sort can be implemented in terms of it, but is as simple as possible otherwise for callers.

The text was updated successfully, but these errors were encountered:

alichraghi · 2022-03-11T06:51:19Z

there's a benchmark result in alichraghi/zort. as you can see Insertion sort is ~160 times slower than block sort. we can try other algorithms such as tim and quick

matu3ba · 2022-03-11T21:35:04Z

@alichraghi
Did you try to implement pattern-defeating quicksort aka pdqsort, timsort ?
Another optimization goal is the generated code size, for which I dont know any numbers yet.
Other possibilities are https://lib.rs/crates/dmsort for 80% sorted data.
Timesort uses a buffer to move whole sections of elements.

Notable for pd quicksort
The current algorithm is based on pattern-defeating quicksort by Orson Peters, which combines the fast average case of randomized quicksort with the fast worst case of heapsort, while achieving linear time on slices with certain patterns. It uses some randomization to avoid degenerate cases, but with a fixed seed to always provide deterministic behavior.
It is typically faster than stable sorting, except in a few special cases, e.g., when the slice consists of several concatenated sorted sequences.

Not really sure, if quadsort is worth the complexity+memory cost https://github.com/scandum/quadsort

Would also be nice to have the sorting algorithms as ascii table overview and which is (simple to ) parallelize etc.

alichraghi · 2022-09-22T19:25:48Z

Zort now has pdq, tim, and many more algorithms.

gantt
    title Sorting (ascending) 10000000 usize (Core i7-4600U CPU 2.10GHz)
    dateFormat x
    axisFormat %S s
    section random
    tim 1.932: 0,1932
    pdq 0.400: 0,400
    quick 0.882: 0,882
    radix 0.311: 0,311
    twin 1.126: 0,1126
    std_block_merge 1.402: 0,1402
    comb 1.631: 0,1631
    shell 2.943: 0,2944
    section sorted
    tim 0.010: 0,10
    pdq 0.011: 0,11
    quick 0.281: 0,281
    radix 0.303: 0,303
    twin 0.085: 0,85
    std_block_merge 0.063: 0,63
    comb 0.500: 0,500
    shell 0.332: 0,332
    section reverse
    tim 0.603: 0,603
    pdq 0.108: 0,108
    quick 0.474: 0,474
    radix 0.251: 0,251
    twin 0.431: 0,431
    std_block_merge 0.477: 0,477
    comb 0.559: 0,559
    shell 0.357: 0,357
    section ascending saw
    tim 0.325: 0,325
    pdq 0.408: 0,408
    quick 1.454: 0,1454
    radix 0.318: 0,318
    twin 0.285: 0,285
    std_block_merge 0.518: 0,518
    comb 0.836: 0,836
    shell 0.590: 0,590
    section descending saw
    tim 0.482: 0,482
    pdq 0.404: 0,404
    quick 49.500: 0,49519
    radix 0.375: 0,375
    twin 0.674: 0,674
    std_block_merge 0.746: 0,746
    comb 1.082: 0,1082
    shell 0.834: 0,834

Possible alternatives of block sort

PDQ (described by @matu3ba)
Timsort seems to be a reasonable choice since it's fast, stable and used in Python2, Java SE 7, GNU Octave, V8, Swift and Rust's current sort is an inspiration of timsort (requires allocator)
Quadsort is also stable (hasn't implemented in zig yet) but twinsort is in the benchmark above. which is a simplified, smaller and slower variant of quadsort (requires allocator)

tensorush · 2022-09-29T08:59:11Z

@alichraghi Nice work on zorting algorithms! It might be better to open another issue if there is a need for other sorting algorithms to be included in the std. Since std.sort.sort is implemented as a stable, zero-allocation, in-place one the current block sort (a.k.a WikiSort) is a pretty good fit here, however, it could be upgraded to an enhanced GrailSort (see issue #2651).

alichraghi · 2022-09-29T09:29:22Z

thank you. pdq, tail, tim and twin is @voroskoi work

Since std.sort.sort is implemented as a stable, zero-allocation, in-place one the current block sort (a.k.a WikiSort) is a pretty good fit here, however, it could be upgraded to an enhanced GrailSort (see issue #2651).

@SpexGuy has implemented GrailSort in Zig
let's see if he likes to include it into zort? so we can benchmark it with other algorithms

EDIT: bench result

gantt
    title Sorting (ascending) 10000000 usize
    dateFormat x
    axisFormat %S s
    section random
    pdq 0.403: 0,403
    grail 1.650: 0,1650
    std_block_merge 1.725: 0,1725
    section sorted
    pdq 0.009: 0,9
    grail 0.717: 0,717
    std_block_merge 0.064: 0,64
    section reverse
    pdq 0.133: 0,133
    grail 0.769: 0,769
    std_block_merge 0.513: 0,513
    section ascending saw
    pdq 0.464: 0,464
    grail 0.939: 0,939
    std_block_merge 0.367: 0,367
    section descending saw
    pdq 0.431: 0,431
    grail 1.043: 0,1043
    std_block_merge 0.688: 0,688

voroskoi · 2022-10-02T08:28:12Z

I think it would make sense to replace std.sort.sort() with std.sort.stable() and std.sort.unstable(). This would be similar to what other languages do.
PDQ is a good candidate for unstable sorting, I would like to see it more broadly tested.
Stable sort could be Timsort, but I would prefer something without allocation in the standard library, dunno.

cb22 · 2023-10-02T08:32:40Z

On the plus side, this will make the cache size configurable, which is something that we already should be exposing in the API.

This would be great for TigerBeetle (part of tigerbeetle/tigerbeetle#1191). In our workload, bumping the cache to 2MB works out to around a ~16% throughput improvement. (we can't use an unstable sort due to how we currently handle duplicates.)

It would also be nice to eliminate the stack overflow risk from [512]T that could happen with a large T, but that doesn't affect us currently.

alichraghi · 2024-07-07T01:24:47Z

Rust replaced both it's stable and unstable algorithms

andrewrk added this to the 0.11.0 milestone Mar 10, 2022

alichraghi mentioned this issue May 24, 2023

Unstable sorting in standard library should be faster #15388

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve std.sort.sortContext to use block sort implementation #11117

improve std.sort.sortContext to use block sort implementation #11117

andrewrk commented Mar 10, 2022 •

edited

Loading

alichraghi commented Mar 11, 2022 •

edited

Loading

matu3ba commented Mar 11, 2022 •

edited

Loading

alichraghi commented Sep 22, 2022 •

edited

Loading

tensorush commented Sep 29, 2022 •

edited

Loading

alichraghi commented Sep 29, 2022 •

edited

Loading

voroskoi commented Oct 2, 2022

cb22 commented Oct 2, 2023

alichraghi commented Jul 7, 2024

improve std.sort.sortContext to use block sort implementation #11117

improve std.sort.sortContext to use block sort implementation #11117

Comments

andrewrk commented Mar 10, 2022 • edited Loading

alichraghi commented Mar 11, 2022 • edited Loading

matu3ba commented Mar 11, 2022 • edited Loading

alichraghi commented Sep 22, 2022 • edited Loading

Possible alternatives of block sort

tensorush commented Sep 29, 2022 • edited Loading

alichraghi commented Sep 29, 2022 • edited Loading

voroskoi commented Oct 2, 2022

cb22 commented Oct 2, 2023

alichraghi commented Jul 7, 2024

andrewrk commented Mar 10, 2022 •

edited

Loading

alichraghi commented Mar 11, 2022 •

edited

Loading

matu3ba commented Mar 11, 2022 •

edited

Loading

alichraghi commented Sep 22, 2022 •

edited

Loading

tensorush commented Sep 29, 2022 •

edited

Loading

alichraghi commented Sep 29, 2022 •

edited

Loading