
Cache all query carousels #9807

Merged · 13 commits · Sep 20, 2024

Conversation

@jimchamp (Collaborator) commented Aug 26, 2024

Closes #9058

This branch does the following:

  1. Updates the QueryCarousel macro template to request, by default, a cached copy of the carousel's markup
  2. Updates the function that generates cache keys so that the cached function's args and kwargs are hashed
  3. Adds a use_cache keyword argument to the QueryCarousel macro, which allows us to avoid caching select query carousels
  4. Updates the book page's "Related Works" carousel and the home page's "Classic Books" carousel so that they are not cached
  5. Updates the cache key prefix to include the macro name when macros are cached

Technical

The "Classic Books" carousel is not cached, as it appears on our cached homepage. Also, both the homepage and our macros are cached for the same amount of time (five minutes).

The "Related Works" carousel is always fetched asynchronously, and therefore not cached.

Cache key protocol

Cache keys will now take the following $-delimited form:

{key_prefix}${hashed_args}${hashed_kwargs}

If there are no kwargs, the key will not include the second $ character and the hashed kwargs:

{key_prefix}${hashed_args}
Examples:

Cache key containing hashed keyword arguments:
RawQueryCarousel.en$4587276919764249206$-1969269941747307046

Cache key when no keyword arguments are provided:
ia.get_metadata$6232568801773938868
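A minimal sketch of how keys in this layout could be assembled. The helper name and arguments are hypothetical, and the digests here are md5 hex strings (per the review feedback below about deterministic hashing) rather than the integer hashes shown in the examples above.

```python
import hashlib
import json

def make_cache_key(key_prefix, args, kwargs=None):
    # Hypothetical helper showing the $-delimited key layout described above.
    def digest(value):
        # JSON-encode with sorted keys so equal inputs always hash equally.
        return hashlib.md5(
            json.dumps(value, sort_keys=True).encode("utf-8")
        ).hexdigest()

    key = f"{key_prefix}${digest(list(args))}"
    if kwargs:
        # The second $-delimited segment only appears when kwargs exist.
        key += f"${digest(kwargs)}"
    return key
```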

Unit test changes

The test_encode_args unit test has been removed. I'm not sure how best to test the updated encode_args. The removed tests asserted that encode_args returned a specific expected string. Now that the arguments are hashed, comparisons against exact strings would fail whenever the hash algorithm is updated.
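One way around that problem is to assert structural properties of the key rather than exact digest strings. The encode_args below is a simplified stand-in (an assumption, not the real cache.py code) so the sketch is self-contained:

```python
import hashlib
import json

def encode_args(args, kw=None):
    # Stand-in for the hashed encode_args; an assumption, not the real code.
    a = json.dumps(list(args))[1:-1]
    digest = lambda s: hashlib.md5(s.encode("utf-8")).hexdigest()
    if kw:
        return f"{digest(a)}${digest(json.dumps(kw, sort_keys=True))}"
    return digest(a)

def test_encode_args_properties():
    # Assert determinism and structure instead of exact digest strings,
    # so the test survives a change of hash algorithm.
    assert encode_args(("a", 1)) == encode_args(("a", 1))
    assert "$" not in encode_args(("a", 1))
    assert encode_args(("a", 1), {"page": 2}).count("$") == 1
    assert encode_args(("b", 1)) != encode_args(("a", 1))
```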

Testing

Once deployed, load some of our /collections pages. Note how long the initial load for each page takes. Then, refresh the pages. Expect the load time to decrease noticeably.

Screenshot

Stakeholders

@github-actions github-actions bot added the Priority: 2 Important, as time permits. [managed] label Aug 26, 2024
@jimchamp jimchamp marked this pull request as draft August 26, 2024 23:32
@jimchamp jimchamp marked this pull request as ready for review August 27, 2024 19:21
```diff
  # strip [ and ] from key
  a = self.json_encode(list(args))[1:-1]

  if kw:
-     return a + "-" + self.json_encode(kw)
+     return f"{hash(a)}${hash(self.json_encode(kw))}"
```
Review comment (Collaborator):
I think we'll want to use an md5 hash function here; apparently Python's hash is non-deterministic between sessions.
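This matters because Python's built-in hash() for strings is randomized per interpreter session (PYTHONHASHSEED), so two web workers would compute different cache keys for the same arguments. A digest such as md5 is stable across processes. A small sketch (the helper name is hypothetical):

```python
import hashlib

def stable_digest(encoded_args: str) -> str:
    # md5 of the encoded args is identical in every process, unlike
    # hash(encoded_args), which is salted per interpreter session.
    return hashlib.md5(encoded_args.encode("utf-8")).hexdigest()
```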

(Review thread on openlibrary/core/cache.py: outdated, resolved)
@mekarpeles mekarpeles added Priority: 1 Do this week, receiving emails, time sensitive, . [managed] On testing.openlibrary.org This PR has been deployed to testing.openlibrary.org for testing and removed Priority: 2 Important, as time permits. [managed] labels Aug 30, 2024
Avoids caching the book page related works carousel, which is fetched
after the page loads.  Also prevents the homepage "Classic Books"
carousel from being cached, as the homepage itself is cached for the
same amount of time.
@mekarpeles mekarpeles added the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Sep 16, 2024
@mekarpeles (Member)

Hey @jimchamp! These changes are looking great and I'm excited for them. I think the feedback about tightening up the keys to use as few hash digests as possible is a good one; at the lookup scale we're operating at, it could make a difference. Otherwise, this seems to be working great on testing and I'm looking forward to getting it merged. Thank you for leading this!

@github-actions github-actions bot removed the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Sep 16, 2024
@jimchamp jimchamp removed the On testing.openlibrary.org This PR has been deployed to testing.openlibrary.org for testing label Sep 20, 2024
@mekarpeles mekarpeles added the On testing.openlibrary.org This PR has been deployed to testing.openlibrary.org for testing label Sep 20, 2024
@mekarpeles (Member)

I think this one is good to be squashed and merged; just testing now.

Just so everyone has confidence and clarity with the solution, the main changes are:

  1. The cache.memcache_memoize() constructor now takes an optional, overridable parameter hash_args=False (i.e., it defaults to False)
  2. cache.memcache_memoize's encode_args method now checks whether hash_args == True and, if so, returns f"{hashlib.md5(a.encode('utf-8')).hexdigest()}"
  3. render_cached_macro in utils uses cache.memcache_memoize(hash_args=True)
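The three points above can be sketched as a simplified, in-process memoizer. This is an illustration under stated assumptions, not the real cache.memcache_memoize, which also handles memcache clients, expiry, and stale values:

```python
import functools
import hashlib
import json

def memcache_memoize_sketch(f, key_prefix, hash_args=False):
    # Simplified sketch of the hash_args behaviour summarised above;
    # a plain dict stands in for memcached.
    def encode_args(args, kw):
        # strip [ and ] from the JSON-encoded args
        a = json.dumps(list(args))[1:-1]
        if hash_args:
            key = hashlib.md5(a.encode("utf-8")).hexdigest()
            if kw:
                key += "$" + hashlib.md5(
                    json.dumps(kw, sort_keys=True).encode("utf-8")
                ).hexdigest()
            return key
        # legacy, non-hashed encoding
        return a + ("-" + json.dumps(kw) if kw else "")

    store = {}

    @functools.wraps(f)
    def wrapper(*args, **kw):
        key = f"{key_prefix}${encode_args(args, kw)}"
        if key not in store:
            store[key] = f(*args, **kw)
        return store[key]

    wrapper.store = store  # exposed for inspection in this sketch
    return wrapper
```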

@mekarpeles mekarpeles merged commit d3bb158 into internetarchive:master Sep 20, 2024
3 checks passed
Labels: Needs: Testing; On testing.openlibrary.org (This PR has been deployed to testing.openlibrary.org for testing); Priority: 1 (Do this week, receiving emails, time sensitive) [managed]
Linked issue: Update Carousel Macros to be cacheable
3 participants