Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backend usage metrics (offline entities) for 2024.2 #1198

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 71 additions & 3 deletions lib/model/query/analytics.js
Original file line number Diff line number Diff line change
Expand Up @@ -394,7 +394,6 @@ group by f."projectId"`);


// Datasets
/* eslint-disable no-tabs */
const getDatasets = () => ({ all }) => all(sql`
SELECT
ds.id, ds."projectId", COUNT(DISTINCT p.id) num_properties, COUNT(DISTINCT e.id) num_entities_total,
Expand Down Expand Up @@ -462,7 +461,6 @@ FROM datasets ds
WHERE ds."publishedAt" IS NOT NULL
GROUP BY ds.id, ds."projectId"
`);
/* eslint-enable no-tabs */

const getDatasetEvents = () => ({ all }) => all(sql`
SELECT
Expand All @@ -477,6 +475,72 @@ WHERE audits.action = 'entity.bulk.create'
GROUP BY ds.id, ds."projectId"
`);


// Offline entities

// Number of offline branches involving more than one update
// Updates from offline-enabled Collect will include branchId so it is not enough
// to count that but we can look at trunkVersion and branchBaseVersion to find
// versions that had a true (multi-step) offlne operation.
const countOfflineBranches = () => ({ oneFirst }) => oneFirst(sql`
SELECT COUNT(DISTINCT "branchId")
FROM entity_defs
WHERE "branchId" IS NOT NULL AND ("trunkVersion" IS NULL OR "branchBaseVersion" > "trunkVersion")
`);

// Look up offline branches that have another branchId
// interrupting them, E.g. abc, abc, xyz, abc
ktuite marked this conversation as resolved.
Show resolved Hide resolved
const countInterruptedBranches = () => ({ oneFirst }) => oneFirst(sql`
WITH sortedRows AS (
SELECT
"entityId",
"version",
"branchId",
LAG("branchId") OVER (PARTITION BY "entityId" ORDER BY "version") AS prevBranchId
FROM entity_defs
),
distinctRuns AS (
SELECT
"entityId",
"branchId"
FROM sortedRows
WHERE "branchId" != prevBranchId OR prevBranchId IS NULL -- Keep first row and changes
ktuite marked this conversation as resolved.
Show resolved Hide resolved
),
duplicateRuns AS (
SELECT
"entityId",
"branchId",
COUNT(*) AS runCount
FROM distinctRuns
GROUP BY "entityId", "branchId"
ktuite marked this conversation as resolved.
Show resolved Hide resolved
HAVING COUNT(*) > 1 -- Selects branchIds that occur more than once
)
SELECT COUNT(*)
FROM duplicateRuns;
`);

// Number of submissions temporarily held in backlog but were automatically
// removed from backlog when preceeding submission came in
const countSubmissionReprocess = () => ({ oneFirst }) => oneFirst(sql`
SELECT COUNT(*)
FROM audits
WHERE "action" = 'submission.reprocess'
`);

// Measure how much time entities whose source is a submission.create
// event take to process to look for a processing lag. We look at the
// submission create loggedAt timestamp (when sub was created) to when
// the event was processed, which will be after the entity version was
// created.
const measureEntityProcessingTime = () => ({ one }) => one(sql`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this exclude offline updates that are force-processed from the backlog? I think those would skew the metrics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is another place where getodk/central#720 would be helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion below, I think this isn't an issue after all. Even if an offline update ends up in the backlog, that's different from how quickly the submission.create event is processed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the same reason, it'd be great if we could exclude submission.create events that were processed when the approvalRequired flag was toggled, as we expect a delay in that case. I'm not really sure how to identify that though. 🤔 If it helped, I think it'd be OK to exclude entity creations (entity defs whose version number is 1), since those are the only entity defs that the approvalRequired flag will affect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query is kinda indirect in a way that I think is convenient for the metric we are getting at.

It's looking at the timestamps on the submission.create audit event which only gets updated by the jobs worker, the one that runs the first time the event comes in. The other things that process submission events to make entities (like the batch processing when toggling the approvalRequired flag and batch-processing the backlog) use the event to look up information, but they don't update anything about the underlying event.

So I think it's safe to to look at this time difference like this. It's not every submission create event, it's only the ones that ultimately turn into entity versions. It doesn't capture how long it takes overall for a submission to become an entity (potentially factoring in these other sources of delay), but it does capture how long a submission would be sitting around waiting for any initial entity processing (even if the result of that was it getting skipped to be reexamined later).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe what you were expecting was a more explicit (entity def creation date - source submission def creation date). I could add a query for that, too, that could include these other ways in which entity processing can get delayed. Then we could see if there's a noticeable difference... which would mean these delay-related mechanism are activated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking at the timestamps on the submission.create audit event which only gets updated by the jobs worker, the one that runs the first time the event comes in.

Ah that's interesting, I hadn't put that together. This sounds like a good way to approach the metric then. 👍 I don't feel like we need anything more explicit: it makes sense to me to start with this.

SELECT
MAX("processed"-"loggedAt") as max_wait,
AVG("processed"-"loggedAt") as avg_wait
FROM entity_def_sources
JOIN audits ON audits.id = "auditId"
WHERE action = 'submission.create'
`);

// Other
const getProjectsWithDescriptions = () => ({ all }) => all(sql`
select id as "projectId", length(trim(description)) as description_length from projects where coalesce(trim(description),'')!=''`);
Expand Down Expand Up @@ -750,5 +814,9 @@ module.exports = {
projectMetrics,
getLatestAudit,
getDatasets,
getDatasetEvents
getDatasetEvents,
countOfflineBranches,
countInterruptedBranches,
countSubmissionReprocess,
measureEntityProcessingTime
};
Loading