-
Notifications
You must be signed in to change notification settings - Fork 210
Conversation
…ning/closing and adding caching,
…F generation it does not provide an error response, https://github.com/OpenClinica/enketo-express-oc/issues/688
args, | ||
userDataDir, | ||
}); | ||
this.browser.on('disconnected', launchBrowser); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relaunch if the browser is shut down for any reason
.goto(urlObj.href, { waitUntil: 'networkidle0', timeout }) | ||
.catch((e) => { | ||
e.status = /timeout/i.test(e.message) ? 408 : 400; | ||
throw e; | ||
}); | ||
|
||
// Either a 401 error is thrown or goto succeeds (or encounters a real loading error) | ||
await Promise.race([detect401, goToPage]); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you can see, it was a challenge to properly catch a 401 response... I would certainly be impressed if y'all can point to a better solution :).
|
||
/** | ||
* This class approach makes it easy to open multiple browser instances with | ||
* different arguments in case that is ever required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, in OpenClinica's fork they actually do this with custom 'headless' API endpoints that serve to import records, run validation on them, and add comments to questions with errors. Those endpoints use a different headless browser config to optimize performance for that purpose (without stylesheets etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a first pass reviewing the functionality and implementation, I have a couple pretty serious concerns about this introducing denial of service risks.
-
The failure/retry logic is unconditional. In pathological failure cases, this would result in an infinite loop/recursion, only briefly yielding for however long launch failure takes. At best, this scenario would result in severe service degradation.
-
All browsers, and particularly Chromium, are notorious for consuming huge amounts of resources (especially memory) over time for long running processes. This is true even across multiple tabs (i.e. "page" in Puppeteer's terms), as multiple same-origin requests are generally serviced by a shared process. This is somewhat mitigated by the PDF logic calling
page.close
, making it more an attack risk than self-DoS, but it's a real concern.
I have a few smaller concerns about implementation details of the change, as well as the existing implementation it augments. But all of these concerns also prompted me to consider the functionality overall—how best to serve the general use case, as well as extensibility for e.g. OpenClinica. My gut instinct:
-
There is a general use case for PDF generation at a user level, but it may be better served as a client side feature without the machinery of a server orchestrated headless browser doing (roughly) the same thing a user's browser can do. There would be some work there, but I think it's worth considering.
-
To my understanding, OpenClinica has a more specific use case of batch/automated PDF generation. This may be better served by a headless browser, but I'm not sure it necessarily needs to be handled in the mainline Enketo project[s]. To the extent there's a need to extend the Enketo Express server to address this use case, the underlying express server library is already well suited for extensibility—i.e., OpenClinica or anyone else can add any route/request handlers they wish, to supplement or extend existing server functionality essentially arbitrarily.
I think it's best for now to hold off on this change, so we can put some more strategic thought into how we address both of these use cases in the longer term.
Thanks a lot for your review! I think there is indeed a risk with the automatic browser respawning (and it doesn't actually respond to a real issue that was reported). We may just remove that in the OpenClinica fork as well, so thanks for that! If you're interested in a PR without that feature, just let me know. I think the other changes will reduce the vulnerability to DOS slightly as it won't spawn a browser for each API request (compared to the current code). It will also make it a lot easier to keep sending PRs for PDF bug fixes (and keep the fork up to date). There are currently a couple of PDF fixes to offer including the one in this PR.
I don't think bringing non-batch PDF creation to the client is feasible though, unless you're okay with requiring users to load the form and use the browser's print menu manually. However, if you see no need for users to generate a PDF of an empty form or record with a click of a button in Central/etc, that may be fine. In that case I imagine you may want to consider removing the PDF API endpoints. I'm curious if you have some clever idea that 'some work' to generate a PDF on the client refers to, as I may of course be missing something! |
Offering some improvements that were made to PDF generation in OpenClinica's fork. PDF (record) generation is something their users use extensively.
I have verified this PR works with
How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?