Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorize text equality and LIKE #6189

Merged
merged 342 commits into from
Mar 28, 2024
Merged

Vectorize text equality and LIKE #6189

merged 342 commits into from
Mar 28, 2024

Conversation

@codecov
Copy link

codecov bot commented Oct 24, 2023

Codecov Report

Attention: Patch coverage is 87.22222% with 23 lines in your changes are missing coverage. Please review.

Project coverage is 80.95%. Comparing base (59f50f2) to head (c355c25).
Report is 87 commits behind head on main.

Files Patch % Lines
tsl/src/nodes/decompress_chunk/pred_text.c 76.00% 0 Missing and 12 partials ⚠️
tsl/src/nodes/decompress_chunk/compressed_batch.c 91.80% 2 Missing and 3 partials ⚠️
tsl/src/import/ts_like_match.c 96.07% 0 Missing and 2 partials ⚠️
tsl/src/nodes/decompress_chunk/planner.c 0.00% 1 Missing and 1 partial ⚠️
tsl/src/nodes/decompress_chunk/pred_vector_array.c 66.66% 0 Missing and 1 partial ⚠️
tsl/src/nodes/decompress_chunk/vector_predicates.c 90.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6189      +/-   ##
==========================================
+ Coverage   80.06%   80.95%   +0.88%     
==========================================
  Files         190      193       +3     
  Lines       37181    36664     -517     
  Branches     9450     9583     +133     
==========================================
- Hits        29770    29680      -90     
- Misses       2997     3168     +171     
+ Partials     4414     3816     -598     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@svenklemm svenklemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -47,7 +47,7 @@ vector_array_predicate_impl(VectorPredicate *vector_const_predicate, bool is_or,
char typalign;
get_typlenbyvalalign(ARR_ELEMTYPE(arr), &typlen, &typbyval, &typalign);

const char *array_data = (const char *) ARR_DATA_PTR(arr);
const char *restrict array_data = (const char *) ARR_DATA_PTR(arr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment on which compiler optimizations we are getting from the restrict type qualifier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just add restrict to every piece of data that is used in hot loops and lives on the heap. The general idea is to tell the compiler that they point to different chunks of memory, so that more efficient code can be generated. It's important for the data you write, otherwise technically everything has to be re-read for a new loop iteration, because it might have been modified by a write. I think it's not really important for read-only variables, but I find it simpler to stick it on everything to avoid the need of deeper analysis which can be error-prone. So this is the general idea, I think spelling this out in the code comments each time would be too verbose and not very useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not really important for read-only variables, but I find it simpler to stick it on everything to avoid the need of deeper analysis which can be error-prone.

Maybe I should use const for read-only objects and restrict for read/write.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static ArrowArray *
make_single_value_arrow(Oid pgtype, Datum datum, bool isnull)
make_single_value_arrow_arithmetic(Oid arithmetic_type, Datum datum, bool isnull)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment about the high-level difference between make_single_value_arrow_arithmetic and make_single_value_arrow_text? It might help understand the code better without comparing the details for the functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are different just because one creates arrow array for arithmetic types, and the other for text types, and the layouts of these arrays are different. I can write "this function creates arrow array for arithmetic types", but this is already clear from the name, so tell me if you have better ideas for a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if you add a comment and just mention that the array layouts are different would be enough for the reader to know at this point.

{ \
(p)++; \
(plen)--; \
} while ((plen) > 0 && (*(p) &0xC0) == 0x80)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment on why the *(p) &0xC0) == 0x80 part is needed and what the 0x80 constant represents?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This follows the PG code as well, I'll add more details to the comment above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just skipping the UTF8-encoded characters, and this check is checking for the UTF8 prefix. I try to avoid adding comments to the copied PG code, because it makes comparing with the original more difficult.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true. I compared it with the PG code and my function looked a bit different. So, I was not sure if you modified it or if this is upstream code from another version. If we have a comment about the PG version of the function, this should be fine.

@akuzm akuzm requested a review from jnidzwetzki March 27, 2024 11:42
@akuzm akuzm enabled auto-merge (squash) March 28, 2024 16:13
@akuzm akuzm merged commit a3d03ea into timescale:main Mar 28, 2024
42 of 45 checks passed
@akuzm akuzm deleted the bulk-text branch March 28, 2024 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants