
Spike generate popular tasks using BigQuery #3761

Draft · wants to merge 6 commits into base: main · Changes from 1 commit
18 changes: 13 additions & 5 deletions app/helpers/browse_helper.rb
@@ -10,15 +10,23 @@ def display_popular_tasks_for_slug?(slug)
   def popular_links_for_slug(slug)
     browse_page = slug(slug)
 
-    # Try to fetch the cache first
-    popular_task_data = Rails.cache.read("popular_tasks_#{browse_page}_#{Date.yesterday.strftime("%Y-%m-%d")}")
+    # Cache keys for the specific browse page
+    cache_key_latest = "popular_tasks_#{browse_page}_#{Date.yesterday.strftime("%Y-%m-%d")}"
+    cache_key_backup = "popular_tasks_backup_#{browse_page}"
 
-    # If cache is empty fetch fresh data and cache it
+    # Try to fetch the latest cache first
+    popular_task_data = Rails.cache.read(cache_key_latest)
+
+    # If the latest cache doesn't exist, fall back to the backup cache
     if popular_task_data.nil?
-      popular_task_data = PopularTasks.new.fetch_data("/browse/#{browse_page}")
+      # Falling back to backup cache
+      popular_task_data = Rails.cache.read(cache_key_backup)
     end
 
-    return [] unless popular_task_data
+    # If both caches are empty, fetch fresh data and cache it
+    if popular_task_data.nil?
+      popular_task_data = PopularTasks.new.fetch_data("/browse/#{browse_page}")
+    end
 
     popular_task_data
   end
12 changes: 9 additions & 3 deletions app/services/popular_tasks.rb
@@ -1,5 +1,6 @@
 class PopularTasks
   CACHE_EXPIRATION = 24.hours # Set the cache expiration time
+  BACKUP_CACHE_EXPIRATION = 7.days # Backup cache can have a longer expiration
Contributor:
I think we need to think a bit more about how this would work.

If the BigQuery data was unavailable for more than 7 days, then what happens?

I can think of other ways to do it, but this feels like a problem that must have been solved many times before, i.e. only expire the cache if fresh data is available to fill it.
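One common shape for the "only expire when fresh data is available" approach this comment describes: store the value with no expiry at all, and let a scheduled job overwrite it only when a fetch succeeds. A rough sketch of that idea, with a plain Hash standing in for a non-expiring cache store and all names hypothetical (the real implementation would use Rails.cache and a rake task):

```ruby
# Sketch: the cached value never expires; it is only replaced when a
# fresh fetch succeeds, so stale-but-usable data survives any outage.
class PopularTasksCache
  def initialize(fetcher)
    @fetcher = fetcher # callable returning fresh data, or nil/raising on failure
    @store = {}        # stand-in for a non-expiring cache store
  end

  # Run on a schedule (e.g. a daily job). The stored value is only
  # overwritten when the fetch returns data.
  def refresh(key)
    fresh = @fetcher.call
    @store[key] = fresh unless fresh.nil?
  rescue StandardError
    # Fetch failed: keep serving the last good value.
  end

  def read(key)
    @store.fetch(key, [])
  end
end
```

With this shape the 7-day question goes away: an outage of any length just means the last successful day's data keeps being served.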

Contributor:

I've caught up now... the cache will expire regardless of whether or not the API responds, so I understand the need for a backup. And I like the idea of writing to the backup at the same time as you fetch the fresh data.


def initialize; end

@@ -11,10 +12,10 @@ def fetch_data(browse_page, date: Date.yesterday)
     @fetch_data = client
     @date = date.strftime("%Y-%m-%d")
 
-    # Define cache keys for the specific browse page
-    cache_key = "popular_tasks_#{browse_page}_#{@date}"
+    cache_key_latest = "popular_tasks_#{browse_page}_#{@date}"
+    cache_key_backup = "popular_tasks_backup_#{browse_page}"
 
-    Rails.cache.fetch(cache_key, expires_in: CACHE_EXPIRATION) do
+    Rails.cache.fetch(cache_key_latest, expires_in: CACHE_EXPIRATION) do
       # If cache is empty, this block is executed
       query = <<~SQL
         WITH cte1 as (SELECT
@@ -60,6 +61,11 @@ def fetch_data(browse_page, date: Date.yesterday)
         }
       end
       @results.sort_by { |link| link[:rank] } # Order the links by their rank
+
+      # Cache the results in the backup cache as well
+      Rails.cache.write(cache_key_backup, @results, expires_in: BACKUP_CACHE_EXPIRATION)
+
+      @results
     end
   end
 end
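Taken together, the two changes give `popular_links_for_slug` a three-step lookup chain: today's cache, then the longer-lived backup, then a fresh BigQuery fetch. A minimal sketch of that chain, with a plain Hash standing in for Rails.cache and a stub for `PopularTasks#fetch_data` (names here are illustrative, not the app's real API):

```ruby
# Illustrative lookup chain: `store` stands in for Rails.cache and
# `fetcher` for PopularTasks#fetch_data.
def popular_links(store, fetcher, latest_key, backup_key)
  store[latest_key] ||   # 1. today's cache (24h expiry)
    store[backup_key] || # 2. longer-lived backup cache
    fetcher.call         # 3. fresh BigQuery fetch (which also refills both caches)
end
```

For example, when only the backup key is populated, the chain returns the backup value without calling the fetcher; when both caches are empty, it falls through to the fetch.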