Specialize Prefix/Suffix Match for `Like/ILike` between Array and Scalar for StringViewArray #6231

xinlifoobar · 2024-08-13T03:40:55Z

Which issue does this PR close?

Part of #5951.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

xinlifoobar · 2024-08-13T03:42:35Z

Bench from my dev machine

# xinli @ arch-dev in ~/source/repos/arrow-rs on git:dev/xinli1/optimize_prefix o [11:42:12] C:130
$ uname -a
Linux arch-dev 6.10.3-zen1-2-zen #1 ZEN SMP PREEMPT_DYNAMIC Tue, 06 Aug 2024 07:47:21 +0000 x86_64 GNU/Linux

# xinli @ arch-dev in ~/source/repos/arrow-rs on git:dev/xinli1/optimize_prefix o [12:33:29] C:130
$ critcmp master optimize_prefix optimize_prefix_boxed_iter                         
group                               master                                 optimize_prefix                        optimize_prefix_boxed_iter
-----                               ------                                 ---------------                        --------------------------
like_utf8view scalar complex        1.01    172.7±1.45ms        ? ?/sec    1.00    170.3±1.47ms        ? ?/sec    1.00    171.0±1.55ms        ? ?/sec
like_utf8view scalar contains       1.04    129.3±6.07ms        ? ?/sec    1.01    126.2±1.46ms        ? ?/sec    1.00    124.9±1.44ms        ? ?/sec
like_utf8view scalar ends with      1.02     37.6±0.45ms        ? ?/sec    1.00     36.7±0.38ms        ? ?/sec    1.02     37.6±0.39ms        ? ?/sec
like_utf8view scalar equals         1.01     26.2±0.42ms        ? ?/sec    1.02     26.5±0.38ms        ? ?/sec    1.00     26.0±0.27ms        ? ?/sec
like_utf8view scalar starts with    1.90     32.8±1.58ms        ? ?/sec    1.00     17.2±0.45ms        ? ?/sec    1.89     32.5±0.45ms        ? ?/sec

xinlifoobar · 2024-08-13T03:45:26Z

Notably, this PR only covers prefix (starts with / case-insensitive starts with) LIKE/ILIKE matching between an array and a scalar. Array vs. array is trickier; I am looking into some feasible options...

xinlifoobar · 2024-08-13T03:49:47Z

I read through the conditions here. When the lhs is a scalar and the rhs is an array, wouldn't using op_scalar with the operands in reversed order be faster than creating a scalar iterator?

https://github.com/apache/arrow-rs/blob/a693f0f9c37567b2b121e261fc0a4587776d5ca4/arrow-string/src/like.rs#L204C1-L221C14

CC @alamb @XiangpengHao

xinlifoobar · 2024-08-13T04:34:34Z

xinlifoobar · 2024-08-13T04:35:00Z

I benchmarked the fix for the MSRV issue. Both the boxed iterator and the Vec approach hurt performance badly.
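The thread does not say what the MSRV issue was, so the following is only a sketch of the three iterator shapes being compared, with hypothetical function names. Returning `impl Iterator` lets the compiler see the concrete iterator type and inline the per-element predicate; a `Box<dyn Iterator>` routes every `next()` call through dynamic dispatch, and collecting into a `Vec` adds an intermediate heap allocation. Either can plausibly explain the slowdown.

```rust
/// Static dispatch: the concrete iterator type is visible to the compiler,
/// so the `starts_with` call can be inlined into the consumer's loop.
fn starts_with_static<'a>(values: &'a [&'a str], prefix: &'a str) -> impl Iterator<Item = bool> + 'a {
    values.iter().map(move |v| v.starts_with(prefix))
}

/// Boxed iterator: each `next()` goes through a vtable call.
fn starts_with_boxed<'a>(values: &'a [&'a str], prefix: &'a str) -> Box<dyn Iterator<Item = bool> + 'a> {
    Box::new(values.iter().map(move |v| v.starts_with(prefix)))
}

/// Vec: materializes all intermediate results in a heap allocation first.
fn starts_with_vec(values: &[&str], prefix: &str) -> Vec<bool> {
    values.iter().map(|v| v.starts_with(prefix)).collect()
}
```

All three produce the same results; only the dispatch and allocation behavior differ.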

alamb · 2024-08-13T21:27:31Z

2x faster on starts_with. not bad!

alamb

Thanks @xinlifoobar -- this is pretty cool

I think it would be even better if we could figure out how to use the StringView layout to speed up prefixes up to 12 bytes long as well.
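For context on the 4- and 12-byte thresholds: in the StringView layout each value has a 16-byte view whose first 4 bytes hold the length. Strings of at most 12 bytes are stored entirely inline in the remaining 12 bytes, while longer strings keep only a 4-byte prefix plus a buffer index and an offset. The sketch below decodes such a view from a little-endian `u128`, the representation arrow-rs uses for its views buffer. `DecodedView` and `decode_view` are hypothetical illustration helpers, not arrow-rs APIs.

```rust
/// Simplified decoded form of a 16-byte string view (illustration only).
#[derive(Debug, PartialEq)]
enum DecodedView {
    /// len <= 12: the whole string is stored inline (zero padded).
    Inline { len: u32, data: [u8; 12] },
    /// len > 12: a 4-byte prefix plus the location of the full string.
    Buffered { len: u32, prefix: [u8; 4], buffer_index: u32, offset: u32 },
}

fn decode_view(view: u128) -> DecodedView {
    let b = view.to_le_bytes();
    // Reads the little-endian u32 stored at byte offset `i`.
    let word = |i: usize| u32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]]);
    let len = word(0);
    if len <= 12 {
        let mut data = [0u8; 12];
        data.copy_from_slice(&b[4..16]);
        DecodedView::Inline { len, data }
    } else {
        DecodedView::Buffered {
            len,
            prefix: [b[4], b[5], b[6], b[7]],
            buffer_index: word(8),
            offset: word(12),
        }
    }
}
```

A prefix of up to 12 bytes can therefore be checked without touching a data buffer only when the string itself is inline; for long strings only 4 bytes are available in the view.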

arrow-string/src/like.rs

arrow-string/src/predicate.rs

xinlifoobar · 2024-08-15T14:27:52Z

Here is an updated benchmark for the latest code. It indicates the optimization only helps when the comparison can be decided within the first 4 (or 12) bytes stored in the view itself; any time it has to reach into the data buffer, performance drops. Given these results, I suspect it won't help complex cases such as regex matching, but I will test them.

# xinli @ arch-dev in ~/source/repos/arrow-rs on git:dev/xinli1/optimize_prefix o [22:23:48] 
$ critcmp master optimize_prefix                                                    
group                                                 master                                 optimize_prefix
-----                                                 ------                                 ---------------
like_utf8view scalar complex                          1.00    174.5±2.43ms        ? ?/sec    1.01    176.0±2.07ms        ? ?/sec
like_utf8view scalar contains                         1.00    129.2±1.50ms        ? ?/sec    1.02    131.6±5.38ms        ? ?/sec
like_utf8view scalar ends with                        1.01     38.0±0.68ms        ? ?/sec    1.00     37.7±0.38ms        ? ?/sec
like_utf8view scalar equals                           1.00     26.2±0.34ms        ? ?/sec    1.00     26.2±0.37ms        ? ?/sec
like_utf8view scalar starts with                      1.63     32.6±0.40ms        ? ?/sec    1.00     20.0±0.29ms        ? ?/sec
like_utf8view scalar starts with more than 4 bytes    1.07     33.8±0.38ms        ? ?/sec    1.00     31.7±0.28ms        ? ?/sec
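A minimal sketch of why the gains stop at 4/12 bytes, assuming the same 16-byte view layout (length in the low 32 bits, then either 12 inline bytes or a 4-byte prefix): a starts-with check can be answered from the view alone when the string is inline or when the pattern prefix is at most 4 bytes. Otherwise, unless the first 4 bytes already mismatch, the full string must be read from the data buffer. `starts_with_from_view` is a hypothetical helper, not the code in this PR.

```rust
/// Tries to decide `starts_with(prefix)` from a 16-byte view alone.
/// Returns `Some(result)` when the view holds enough bytes, or `None`
/// when the caller must read the string from its data buffer.
fn starts_with_from_view(view: u128, prefix: &[u8]) -> Option<bool> {
    let b = view.to_le_bytes();
    let len = view as u32 as usize; // low 32 bits hold the length
    if len < prefix.len() {
        return Some(false); // too short to match; no buffer access needed
    }
    if len <= 12 {
        // The whole string is inline in bytes 4..4 + len.
        return Some(&b[4..4 + prefix.len()] == prefix);
    }
    // Long string: only its first 4 bytes are stored in the view.
    let n = prefix.len().min(4);
    if &b[4..4 + n] != &prefix[..n] {
        return Some(false); // mismatch within the inline prefix
    }
    if prefix.len() <= 4 { Some(true) } else { None }
}
```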

arrow/Cargo.toml

alamb · 2024-08-15T22:26:58Z

I am hoping to find time to review this in more detail tomorrow

xinlifoobar · 2024-08-17T09:43:48Z

I got some better results on other string view predicates. Will update them in batch tonight.

arrow-string/src/predicate.rs


xinlifoobar · 2024-08-17T15:15:49Z

It seems the previous results were generated by faulty code. Let me do more iterations on this.

xinlifoobar · 2024-08-19T09:20:18Z

Updated the benchmark results for the latest version.

$ uname -a
Linux arch-dev 6.10.4-zen2-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Sun, 11 Aug 2024 16:18:46 +0000 x86_64 GNU/Linux

# xinli @ arch-dev in ~/source/repos/arrow-rs on git:dev/xinli/prefix_v2 x [17:34:42] 
$ critcmp master_08_19 optimized_prefix_suffix                                              
group                                                 master_08_19                           optimized_prefix_suffix
-----                                                 ------------                           -----------------------
like_utf8view scalar complex                          1.06   184.5±15.76ms        ? ?/sec    1.00    174.7±2.58ms        ? ?/sec
like_utf8view scalar contains                         1.03    130.7±3.52ms        ? ?/sec    1.00    127.5±2.69ms        ? ?/sec
like_utf8view scalar ends with                        1.10     37.8±0.54ms        ? ?/sec    1.00     34.3±0.56ms        ? ?/sec
like_utf8view scalar equals                           1.00     26.4±0.59ms        ? ?/sec    1.03     27.2±0.94ms        ? ?/sec
like_utf8view scalar starts with                      1.59     32.8±0.55ms        ? ?/sec    1.00     20.6±0.65ms        ? ?/sec
like_utf8view scalar starts with more than 4 bytes    1.07     34.0±0.53ms        ? ?/sec    1.00     31.9±0.44ms        ? ?/sec

xinlifoobar · 2024-08-19T09:26:02Z

Observations based on the changes:

The inline implementation of starts_with resulted in notable performance improvements.
The suffix_iter helps performance slightly and should also help memory, since we don't need the whole str and don't have to reverse it (logically).
Switching from &str to &[u8] showed only marginal performance gains (a few milliseconds), with a trade-off in flexibility and readability. Since str is essentially a byte container, the benefits do not justify the loss in code clarity.
There is no significant performance difference between using BooleanArray::from::<Vec<bool>> and BooleanArray::from_unary.
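To illustrate the `suffix_iter` idea, here is a sketch over plain byte slices (the function names are hypothetical, not the PR's API). The last `suffix_len` bytes of each value are compared in place, so nothing needs to be reversed; values shorter than the suffix yield an empty slice, which cannot equal the (necessarily non-empty) suffix, mirroring the `&[] as &[u8]` early return discussed in review.

```rust
/// Yields the last `suffix_len` bytes of each value, or an empty slice
/// when the value is shorter than `suffix_len`.
fn suffix_iter<'a>(values: &'a [&'a [u8]], suffix_len: usize) -> impl Iterator<Item = &'a [u8]> + 'a {
    values.iter().map(move |v| {
        if v.len() < suffix_len {
            &[] as &[u8]
        } else {
            &v[v.len() - suffix_len..]
        }
    })
}

/// "ends with" for every value, comparing trailing bytes without reversing.
fn ends_with_all(values: &[&[u8]], suffix: &[u8]) -> Vec<bool> {
    suffix_iter(values, suffix.len()).map(|s| s == suffix).collect()
}
```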

arrow-array/src/array/byte_view_array.rs

alamb

Thanks @xinlifoobar -- other than the suffix_iter producing potentially invalid &str I think this PR looks good to me
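Why a byte-based suffix can be an invalid `&str`: with a multibyte character such as 😈 (4 bytes in UTF-8), cutting off the last N bytes can split the character. Building a `&str` from such bytes without validation violates the `str` UTF-8 invariant and can lead to undefined behavior, so a safe sketch either stays on `&[u8]` or uses `str::get`, which returns `None` when the range does not fall on a character boundary. `byte_suffix_as_str` is a hypothetical helper for illustration.

```rust
/// Returns the last `n` bytes of `s` as a `&str`, or `None` when `n`
/// exceeds the length or the cut would split a multi-byte character.
fn byte_suffix_as_str(s: &str, n: usize) -> Option<&str> {
    let start = s.len().checked_sub(n)?;
    s.get(start..) // `get` checks the char boundary instead of panicking
}
```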

arrow-array/src/array/byte_view_array.rs

arrow-string/src/predicate.rs

xinlifoobar · 2024-08-20T05:53:33Z

Some minor improvements after using &[u8] directly.

like_utf8view scalar complex
----------------------------
optimized_prefix_suffix_bytes     1.00     171.9±2.78ms       ? ?/sec
optimized_prefix_suffix           1.02     174.7±2.58ms       ? ?/sec
master_08_19                      1.07    184.5±15.76ms       ? ?/sec

like_utf8view scalar contains
-----------------------------
optimized_prefix_suffix_bytes     1.00     124.7±1.66ms       ? ?/sec
optimized_prefix_suffix           1.02     127.5±2.69ms       ? ?/sec
master_08_19                      1.05     130.7±3.52ms       ? ?/sec

like_utf8view scalar ends with
------------------------------
optimized_prefix_suffix_bytes     1.00      33.6±0.52ms       ? ?/sec
optimized_prefix_suffix           1.02      34.3±0.56ms       ? ?/sec
master_08_19                      1.12      37.8±0.54ms       ? ?/sec

like_utf8view scalar equals
---------------------------
master_08_19                      1.00      26.4±0.59ms       ? ?/sec
optimized_prefix_suffix_bytes     1.01      26.7±0.56ms       ? ?/sec
optimized_prefix_suffix           1.03      27.2±0.94ms       ? ?/sec

like_utf8view scalar starts with
--------------------------------
optimized_prefix_suffix_bytes     1.00      20.4±0.27ms       ? ?/sec
optimized_prefix_suffix           1.01      20.6±0.65ms       ? ?/sec
master_08_19                      1.61      32.8±0.55ms       ? ?/sec

like_utf8view scalar starts with more than 4 bytes
--------------------------------------------------
optimized_prefix_suffix_bytes     1.00      31.8±0.35ms       ? ?/sec
optimized_prefix_suffix           1.00      31.9±0.44ms       ? ?/sec
master_08_19                      1.07      34.0±0.53ms       ? ?/sec

arrow-string/src/predicate.rs

alamb

Thanks @xinlifoobar -- I am running the benchmarks on this branch and will report back

arrow-string/src/predicate.rs

alamb · 2024-08-20T16:11:28Z

arrow-string/src/like.rs

+    // 😈 is four bytes long.
+    test_utf8_scalar!(
+        test_uff8_array_like_multibyte,
+        vec![


🤔 it occurs to me we should also be testing with Options as well (aka the test data should have nulls)
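A minimal sketch of the null semantics such tests would cover, using plain `Option<&str>` values instead of a real `StringViewArray` (`starts_with_nullable` is hypothetical): as with SQL LIKE, a null input yields a null result rather than `false`.

```rust
/// Applies a starts-with predicate to nullable values, propagating nulls.
fn starts_with_nullable(values: &[Option<&str>], prefix: &str) -> Vec<Option<bool>> {
    values.iter().map(|v| v.map(|s| s.starts_with(prefix))).collect()
}
```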

alamb · 2024-08-20T16:14:41Z

arrow-array/src/array/byte_view_array.rs

+            let len = (*v as u32) as usize;
+
+            if len < prefix_len {
+                return &[] as &[u8];


As you mentioned above, avoiding returning an empty slice only for the caller to immediately check it again might be another potential performance improvement.

What do you think about making this more general by taking a function? Maybe something like the following (untested):

/// Applies function `func` to the first `prefix_len` bytes of all views.
/// If the view length is less than `prefix_len`, `func` is invoked with `None`.
pub fn prefix_bytes_iter<F, T>(&self, prefix_len: usize, func: F) -> impl Iterator<Item = T>
where
    F: FnMut(Option<&[u8]>) -> T,
{
    ...
}

I am not sure this is a good idea, but figured it might be more general. But maybe not...
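One way the suggested closure-based API could look, sketched as a free function over byte slices instead of a `GenericByteViewArray` method (the signature is an untested illustration, not an arrow-rs API): `func` receives `Some(prefix)` when the value is long enough and `None` otherwise, so the caller never has to re-check an empty slice.

```rust
/// Calls `func` with the first `prefix_len` bytes of each value, or with
/// `None` when the value is shorter than `prefix_len`.
fn prefix_bytes_iter<'a, F, T>(
    values: &'a [&'a [u8]],
    prefix_len: usize,
    mut func: F,
) -> impl Iterator<Item = T> + 'a
where
    F: FnMut(Option<&[u8]>) -> T + 'a,
    T: 'a,
{
    values.iter().map(move |v| {
        if v.len() < prefix_len {
            func(None)
        } else {
            func(Some(&v[..prefix_len]))
        }
    })
}
```

For example, a starts-with predicate becomes `prefix_bytes_iter(&vals, 2, |p: Option<&[u8]>| p == Some(&b"ab"[..]))`.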

I concluded that passing a function to the *_iter methods was a bad decision. I actually did this in the first version of this PR, e.g.,

pub fn predicate<F, T>(&self, func: F) -> ArrayRef where F: FnMut(Option<&[u8]>) -> T { }
// or
pub fn predicate_prefix<F, T>(&self, func: F) -> ArrayRef where F: FnMut(Option<&[u8]>) -> T { }

This worked, but it introduced a circular dependency between the crates, i.e.,

# past
Predicate --evaluate_array--> Array
# after
Predicate --evaluate_array--> Array --predicate--> Predicate Function --evaluate--> Array Item

This could be solved by restructuring the code, but that would require a lot of changes.

Also, these functions become very specialized, which they should not be: the function signature is not flexible enough to generalize to all such requirements.

Makes sense -- thank you for the explanation. Let's keep exploring this method for now

alamb · 2024-08-20T16:15:28Z

🤔 the benchmarks fail now for me like


Benchmarking eq scalar StringViewArray: Warming up for 3.0000 sthread 'main' panicked at arrow/benches/comparison_kernels.rs:196:50:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Invalid comparison operation: Utf8 == Utf8View")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: bench failed, to rerun pass `-p arrow --bench comparison_kernels`

xinlifoobar · 2024-08-21T06:14:57Z

This issue can be reproduced on the master branch... Looking at the history, the benchmark should not have worked even when it was checked in. Commenting out this bench makes everything work.

https://github.com/alamb/arrow-rs/blob/8941cbf5325b380bf70ea1ee5950f570a102c873/arrow-ord/src/cmp.rs#L235-L239

xinlifoobar · 2024-08-21T07:05:36Z

This would be a more complex fix than expected. The downstream functions, including apply and apply_op*, expect lhs and rhs to have the same data type. How about doing the conversions beforehand?

alamb · 2024-08-21T16:07:29Z

Thanks @xinlifoobar -- indeed this does appear to be an issue on the master branch. I filed #6283 and will fix it shortly

alamb

Thanks again, @xinlifoobar. I think this PR is looking quite nice now.

I am sorry for all the back and forth on this PR, but I think we are on a good track now. Doing performance optimizations always seems to take much longer than I think/hope it should :)

I think the high level structure / idea of this PR is looking good

What I think is needed next is to run the benchmarks and see how much better this branch is than master (and validate if bytes_iter() makes a difference, for example)

I think once we have merged #6284 and merged up to this branch we should be able to test it out.

alamb · 2024-08-21T16:40:03Z

arrow-string/src/predicate.rs

+                            .collect::<Vec<_>>(),
+                    )
+                } else {
+                    BooleanArray::from_unary(array, |haystack| {


looking more carefully at BooleanArray::from_unary it will use the ArrayAccessor impl for StringViewArray

arrow-rs/arrow-array/src/array/byte_view_array.rs

Lines 547 to 557 in 7eb3c83

impl<'a, T: ByteViewType + ?Sized> ArrayAccessor for &'a GenericByteViewArray<T> {
    type Item = &'a T::Native;

    fn value(&self, index: usize) -> Self::Item {
        GenericByteViewArray::value(self, index)
    }

    unsafe fn value_unchecked(&self, index: usize) -> Self::Item {
        GenericByteViewArray::value_unchecked(self, index)
    }
}

It isn't clear to me how calling bytes_iter() would make this faster, as the code for value_unchecked is the same as bytes_iter.

I think the test is to benchmark it and see whether this improves things.

Yes, I think the only differences between bytes_iter and ArrayAccessor are:

For bytes_iter

self.views().has_next()? -> self.views().next() -> value_unchecked()

For ArrayAccessor

index = index + 1 -> self.views.get_unchecked(idx) -> str(value_unchecked()).as_bytes()

The only differences are indexing operations versus iterator methods, and the benchmarks show differences within about 5%.
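The two access patterns can be sketched with a toy array whose views are `(offset, len)` pairs into a single buffer. `ToyViewArray` is hypothetical and much simpler than the real 16-byte views. Index-based access and iterating over the views yield the same slices and differ only in how the next view is located, which fits the small differences measured.

```rust
/// Toy stand-in for a view array: each view is (offset, len) into one buffer.
struct ToyViewArray {
    views: Vec<(usize, usize)>,
    buffer: Vec<u8>,
}

impl ToyViewArray {
    /// Index-based access, analogous to going through `ArrayAccessor`.
    fn value(&self, i: usize) -> &[u8] {
        let (offset, len) = self.views[i];
        &self.buffer[offset..offset + len]
    }

    /// Iterator-based access, analogous to walking the views directly.
    fn bytes_iter(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.views.iter().map(move |&(offset, len)| &self.buffer[offset..offset + len])
    }
}
```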

I made a PR here to refine this documentation: #6306

Another difference is that bytes_iter() iterates over all array slots, including those that are null.
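A sketch of what iterating over all slots implies (hypothetical `eval_with_validity`; arrow-rs tracks validity with a null bitmap rather than a `&[bool]`): the predicate also runs on null slots, and the validity mask must then be applied so those slots come out as null.

```rust
/// Evaluates `pred` on every slot, null or not, then applies the validity
/// mask so that null slots produce `None`.
fn eval_with_validity<F>(slots: &[&[u8]], validity: &[bool], mut pred: F) -> Vec<Option<bool>>
where
    F: FnMut(&[u8]) -> bool,
{
    assert_eq!(slots.len(), validity.len());
    slots
        .iter()
        .zip(validity)
        .map(|(v, &valid)| {
            let result = pred(*v); // computed even for null slots
            if valid { Some(result) } else { None }
        })
        .collect()
}
```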

alamb · 2024-08-21T16:47:26Z

xinlifoobar · 2024-08-22T05:31:48Z

New benchmark results. It looks like the suffix_iter proves itself here.

# xinli @ arch-dev in ~/source/repos/arrow-rs on git:dev/xinli/prefix_v2 o [13:29:59] 
$ uname -a 
Linux arch-dev 6.10.4-zen2-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Sun, 11 Aug 2024 16:18:46 +0000 x86_64 GNU/Linux

# xinli @ arch-dev in ~/source/repos/arrow-rs on git:dev/xinli/prefix_v2 o [13:30:07] 
$ critcmp master_08_22 optimized_like_08_22     
group                                        master_08_22                           optimized_like_08_22
-----                                        ------------                           --------------------
like_utf8view scalar complex                 1.01    175.9±6.78ms        ? ?/sec    1.00    173.7±3.55ms        ? ?/sec
like_utf8view scalar contains                1.02    131.7±1.71ms        ? ?/sec    1.00    128.6±2.73ms        ? ?/sec
like_utf8view scalar ends with 13 bytes      1.07     33.1±0.54ms        ? ?/sec    1.00     30.9±0.54ms        ? ?/sec
like_utf8view scalar ends with 4 bytes       1.16     37.1±0.65ms        ? ?/sec    1.00     32.0±0.74ms        ? ?/sec
like_utf8view scalar ends with 6 bytes       1.21     38.4±0.57ms        ? ?/sec    1.00     31.8±1.08ms        ? ?/sec
like_utf8view scalar equals                  1.00     27.2±0.88ms        ? ?/sec    1.01     27.6±1.25ms        ? ?/sec
like_utf8view scalar starts with 13 bytes    1.00     30.7±0.53ms        ? ?/sec    1.00     30.7±0.67ms        ? ?/sec
like_utf8view scalar starts with 4 bytes     1.56     34.5±1.64ms        ? ?/sec    1.00     22.1±0.40ms        ? ?/sec
like_utf8view scalar starts with 6 bytes     1.11     34.9±0.62ms        ? ?/sec    1.00     31.3±0.53ms        ? ?/sec

alamb

TLDR is I think this is good to go. Thank you @xinlifoobar this is very nice

It would be nice to avoid the need for bytes_iter if possible before release. I will explore doing so in a follow on PR

I ran the benchmarks again 👨‍🍳 👌

++ critcmp master dev_xinli1_optimize_prefix
group                                                     dev_xinli1_optimize_prefix             master
-----                                                     --------------------------             ------
ilike_utf8 scalar complex                                 1.00      2.7±0.08ms        ? ?/sec    1.00      2.7±0.09ms        ? ?/sec
ilike_utf8 scalar contains                                1.00      4.2±0.07ms        ? ?/sec    1.00      4.2±0.08ms        ? ?/sec
ilike_utf8 scalar ends with                               1.00  1235.1±38.98µs        ? ?/sec    1.00  1240.9±41.24µs        ? ?/sec
ilike_utf8 scalar equals                                  1.00   773.0±23.01µs        ? ?/sec    1.01   780.1±20.32µs        ? ?/sec
ilike_utf8 scalar starts with                             1.00  1132.5±25.82µs        ? ?/sec    1.00  1135.2±39.42µs        ? ?/sec
ilike_utf8_scalar_dyn dictionary[10] string[4])           1.00     88.3±0.16µs        ? ?/sec    1.00     88.2±0.09µs        ? ?/sec
like_utf8 scalar complex                                  1.01  1876.9±53.81µs        ? ?/sec    1.00  1859.7±27.02µs        ? ?/sec
like_utf8 scalar contains                                 1.00  1726.5±15.40µs        ? ?/sec    1.03  1774.5±19.87µs        ? ?/sec
like_utf8 scalar ends with                                1.03    440.5±6.68µs        ? ?/sec    1.00   426.8±13.04µs        ? ?/sec
like_utf8 scalar equals                                   1.00     90.9±0.22µs        ? ?/sec    1.39    126.8±0.13µs        ? ?/sec
like_utf8 scalar starts with                              1.03    341.8±5.19µs        ? ?/sec    1.00    333.4±3.99µs        ? ?/sec
like_utf8_scalar_dyn dictionary[10] string[4])            1.00     88.2±0.18µs        ? ?/sec    1.00     88.0±0.11µs        ? ?/sec
like_utf8view scalar complex                              1.03    183.7±1.27ms        ? ?/sec    1.00    179.0±0.62ms        ? ?/sec
like_utf8view scalar contains                             1.00    129.9±0.28ms        ? ?/sec    1.04    135.4±0.22ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                   1.00     43.7±0.27ms        ? ?/sec    1.15     50.5±0.22ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                    1.00     44.7±0.16ms        ? ?/sec    1.22     54.4±0.21ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                    1.00     44.7±0.23ms        ? ?/sec    1.24     55.5±0.11ms        ? ?/sec
like_utf8view scalar equals                               1.00     32.3±0.09ms        ? ?/sec    1.07     34.5±0.07ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                 1.00     45.9±0.15ms        ? ?/sec    1.00     46.1±0.30ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                  1.00     25.2±0.11ms        ? ?/sec    1.93     48.6±0.10ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                  1.00     46.3±0.21ms        ? ?/sec    1.07     49.5±0.19ms        ? ?/sec

alamb · 2024-08-25T13:24:10Z

🚀 -- thanks again for sticking with this @xinlifoobar

github-actions bot added the parquet and arrow labels Aug 13, 2024

xinlifoobar closed this Aug 13, 2024

xinlifoobar reopened this Aug 13, 2024

alamb reviewed Aug 13, 2024

arrow-string/src/like.rs (outdated, resolved)

arrow-string/src/like.rs (outdated, resolved)

arrow-string/src/predicate.rs (outdated, resolved)

github-actions bot removed the parquet label Aug 15, 2024

xinlifoobar commented Aug 15, 2024

arrow/Cargo.toml (outdated, resolved)

alamb mentioned this pull request Aug 16, 2024

Improve documentation on StringArrayType trait apache/datafusion#12027

Merged

xinlifoobar marked this pull request as draft August 17, 2024 09:43

alamb reviewed Aug 17, 2024

arrow-string/src/predicate.rs (resolved)

v2 impl

894e797

xinlifoobar force-pushed the dev/xinli1/optimize_prefix branch from 4f0f49a to 894e797 August 17, 2024 14:28

xinlifoobar added 4 commits August 17, 2024 22:31

Add bench

a80431f

Merge branch 'master' of github.com:apache/arrow-rs into dev/xinli/pr…

e4ad6d9

…efix_v2

fix clippy

32d6dc2

fix endswith

f6f8f55

Finalize the prefix_v2 implementation

6bee687

xinlifoobar marked this pull request as ready for review August 19, 2024 09:27

stop reverse string for ends_with

3322905

alamb changed the title ~~Implement Prefix Match for Like/ILike between Array and Scalar~~ Specialize Prefix Match for Like/ILike between Array and Scalar for StringViewArray Aug 19, 2024

alamb reviewed Aug 19, 2024

arrow-array/src/array/byte_view_array.rs (outdated, resolved)

alamb reviewed Aug 19, 2024

arrow-array/src/array/byte_view_array.rs (outdated, resolved)

arrow-string/src/predicate.rs (outdated, resolved)

Fix comments

cd5886f

fix bad comment

7eb3c83

xinlifoobar changed the title ~~Specialize Prefix Match for Like/ILike between Array and Scalar for StringViewArray~~ Specialize Prefix/Suffix Match for Like/ILike between Array and Scalar for StringViewArray Aug 20, 2024

alamb reviewed Aug 20, 2024

arrow-string/src/predicate.rs (outdated, resolved)

alamb reviewed Aug 20, 2024


Correct equals sematics

4a27f96

alamb mentioned this pull request Aug 21, 2024

comparison_kernels benchmarks panic #6283

Closed

alamb mentioned this pull request Aug 21, 2024

Fix panic in comparison_kernel benchmarks #6284

Merged

alamb reviewed Aug 21, 2024


Merge remote-tracking branch 'origin' into dev/xinli/prefix_v2

0c9ac9a

alamb approved these changes Aug 25, 2024


alamb merged commit 855666d into apache:master Aug 25, 2024
25 checks passed

alamb mentioned this pull request Aug 25, 2024

Minor: Improve comments on GenericByteViewArray::bytes_iter(), prefix_iter() and suffix_iter() #6306

Merged

xinlifoobar mentioned this pull request Aug 27, 2024

Add Null Mask to Prefix and Suffix Iters #6312

Draft

alamb mentioned this pull request Sep 9, 2024

Optimize like/ilike kernels for StringView #5951

Closed
