
Cache all query carousels #9807

Merged · 13 commits · Sep 20, 2024

Conversation

@jimchamp (Collaborator) commented Aug 26, 2024

Closes #9058

This branch does the following:

  1. Updates the QueryCarousel macro template to request, by default, a cached copy of the carousel's markup
  2. Updates the function that generates cache keys so that the cached function's args and kwargs are hashed
  3. Adds a use_cache keyword argument to the QueryCarousel macro, which allows us to avoid caching select query carousels
  4. Updates the book page's "Related Works" carousel and the home page's "Classic Books" carousel so that they are not cached
  5. Updates the cache key prefix to include the macro name when macros are cached

Technical

The "Classic Books" carousel is not cached, as it appears on our cached homepage. Also, both the homepage and our macros are cached for the same amount of time (five minutes).

The "Related Works" carousel is always fetched asynchronously, and therefore not cached.

Cache key protocol

Cache keys will now take the following $-delimited form:

{key_prefix}${hashed_args}${hashed_kwargs}

If there are no kwargs, the key will not include the second $ character and the hashed kwargs:

{key_prefix}${hashed_args}
Examples:

Cache key containing hashed keyword arguments:
RawQueryCarousel.en$4587276919764249206$-1969269941747307046

Cache key when no keyword arguments are provided:
ia.get_metadata$6232568801773938868
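A minimal sketch of how keys in this layout could be assembled. The helper name and arguments are hypothetical, and the digests here are md5 hex strings (per the review feedback below about deterministic hashing) rather than the integer hashes shown in the examples above.

```python
import hashlib
import json

def make_cache_key(key_prefix, args, kwargs=None):
    # Hypothetical helper showing the $-delimited key layout described above.
    def digest(value):
        # JSON-encode with sorted keys so equal inputs always hash equally.
        return hashlib.md5(
            json.dumps(value, sort_keys=True).encode("utf-8")
        ).hexdigest()

    key = f"{key_prefix}${digest(list(args))}"
    if kwargs:
        # The second $-delimited segment only appears when kwargs exist.
        key += f"${digest(kwargs)}"
    return key
```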

Unit test changes

The test_encode_args unit test has been removed. I'm not sure how best to test the updated encode_args. The removed tests asserted that encode_args returned a specific expected string. Now that the arguments are hashed, comparisons against exact strings would fail whenever the hash algorithm is updated.
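One way around that problem is to assert structural properties of the key rather than exact digest strings. The encode_args below is a simplified stand-in (an assumption, not the real cache.py code) so the sketch is self-contained:

```python
import hashlib
import json

def encode_args(args, kw=None):
    # Stand-in for the hashed encode_args; an assumption, not the real code.
    a = json.dumps(list(args))[1:-1]
    digest = lambda s: hashlib.md5(s.encode("utf-8")).hexdigest()
    if kw:
        return f"{digest(a)}${digest(json.dumps(kw, sort_keys=True))}"
    return digest(a)

def test_encode_args_properties():
    # Assert determinism and structure instead of exact digest strings,
    # so the test survives a change of hash algorithm.
    assert encode_args(("a", 1)) == encode_args(("a", 1))
    assert "$" not in encode_args(("a", 1))
    assert encode_args(("a", 1), {"page": 2}).count("$") == 1
    assert encode_args(("b", 1)) != encode_args(("a", 1))
```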

Testing

Once deployed, load some of our /collections pages. Note how long the initial load for each page takes. Then, refresh the pages. Expect the load time to decrease noticeably.

Screenshot

Stakeholders

@github-actions github-actions bot added the Priority: 2 Important, as time permits. [managed] label Aug 26, 2024
@jimchamp jimchamp marked this pull request as draft August 26, 2024 23:32
@jimchamp jimchamp marked this pull request as ready for review August 27, 2024 19:21
```diff
  # strip [ and ] from key
  a = self.json_encode(list(args))[1:-1]

  if kw:
-     return a + "-" + self.json_encode(kw)
+     return f"{hash(a)}${hash(self.json_encode(kw))}"
```
Review comment (Collaborator):
I think we'll want to use an md5 hash function here; apparently Python's hash is non-deterministic between sessions.
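This matters because Python's built-in hash() for strings is randomized per interpreter session (PYTHONHASHSEED), so two web workers would compute different cache keys for the same arguments. A digest such as md5 is stable across processes. A small sketch (the helper name is hypothetical):

```python
import hashlib

def stable_digest(encoded_args: str) -> str:
    # md5 of the encoded args is identical in every process, unlike
    # hash(encoded_args), which is salted per interpreter session.
    return hashlib.md5(encoded_args.encode("utf-8")).hexdigest()
```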

(Review thread on openlibrary/core/cache.py: outdated, resolved)
@mekarpeles mekarpeles added Priority: 1 Do this week, receiving emails, time sensitive, . [managed] On testing.openlibrary.org This PR has been deployed to testing.openlibrary.org for testing and removed Priority: 2 Important, as time permits. [managed] labels Aug 30, 2024
Avoids caching the book page related works carousel, which is fetched
after the page loads.  Also prevents the homepage "Classic Books"
carousel from being cached, as the homepage itself is cached for the
same amount of time.
@mekarpeles mekarpeles added the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Sep 16, 2024
@mekarpeles (Member)

Hey @jimchamp! These changes are looking great and I'm excited for them. I think the feedback about tightening up the keys to use as few hash digests as possible is a good one; at the lookup scale we're operating at, it could make a difference. Otherwise, this seems to be working great on testing and I'm looking forward to getting it merged. Thank you for leading this!

@github-actions github-actions bot removed the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Sep 16, 2024
@jimchamp jimchamp removed the On testing.openlibrary.org This PR has been deployed to testing.openlibrary.org for testing label Sep 20, 2024
@mekarpeles mekarpeles added the On testing.openlibrary.org This PR has been deployed to testing.openlibrary.org for testing label Sep 20, 2024
@mekarpeles (Member)

I think this one is good to be squashed and merged; just testing now.

Just so everyone has confidence and clarity with the solution, the main changes are:

  1. The cache.memcache_memoize() constructor now takes an optional, overridable parameter hash_args=False (i.e., it defaults to False)
  2. cache.memcache_memoize's encode_args method now checks whether hash_args == True and, if so, returns f"{hashlib.md5(a.encode('utf-8')).hexdigest()}"
  3. render_cached_macro in utils uses cache.memcache_memoize(hash_args=True)
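The three points above can be sketched as a simplified, in-process memoizer. This is an illustration under stated assumptions, not the real cache.memcache_memoize, which also handles memcache clients, expiry, and stale values:

```python
import functools
import hashlib
import json

def memcache_memoize_sketch(f, key_prefix, hash_args=False):
    # Simplified sketch of the hash_args behaviour summarised above;
    # a plain dict stands in for memcached.
    def encode_args(args, kw):
        # strip [ and ] from the JSON-encoded args
        a = json.dumps(list(args))[1:-1]
        if hash_args:
            key = hashlib.md5(a.encode("utf-8")).hexdigest()
            if kw:
                key += "$" + hashlib.md5(
                    json.dumps(kw, sort_keys=True).encode("utf-8")
                ).hexdigest()
            return key
        # legacy, non-hashed encoding
        return a + ("-" + json.dumps(kw) if kw else "")

    store = {}

    @functools.wraps(f)
    def wrapper(*args, **kw):
        key = f"{key_prefix}${encode_args(args, kw)}"
        if key not in store:
            store[key] = f(*args, **kw)
        return store[key]

    wrapper.store = store  # exposed for inspection in this sketch
    return wrapper
```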

@mekarpeles mekarpeles merged commit d3bb158 into internetarchive:master Sep 20, 2024
3 checks passed
Labels: Needs: Testing; On testing.openlibrary.org (This PR has been deployed to testing.openlibrary.org for testing); Priority: 1 (Do this week, receiving emails, time sensitive) [managed]
Linked issue: Update Carousel Macros to be cacheable
3 participants