Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are user-land callback queues in *uninstrumented* modules the responsibility of the APM agent? #2498

Closed
trentm opened this issue Dec 4, 2021 · 6 comments · Fixed by #2505
Assignees
Labels
8.1-candidate agent-nodejs Make available for APM Agents project planning. context propagation

Comments

@trentm
Copy link
Member

trentm commented Dec 4, 2021

(This stems from #2487 (comment) and the set of changes to instrumentations to change/fix some run-context handling.)

background

Many node.js packages, especially clients for databases, support internal queuing: The user-code makes one or more client requests, e.g.:

mysqlClient = new Client(...)
mysqlClient.connect()
// ...
mysqlClient.query('select foo', function onFoo (err, results) { ... })
mysqlClient.query('select bar', function onBar (err, results) { ... })

and the client internally queues those up, gets a connection to the database, dispatches the queued commands (serially or in parallel), and calls the given callbacks when results are in. Thomas dubbed these "user-land callback queues".

One of the jobs of the Node.js APM agent is to track asynchronous context, so that when the result of some action (say, reading a file via fs.readFile()) is ready sometime later, the user code that receives the results have a "current run context" the follows from the when the action was inititiated:

// If APM Transaction "t0" is current here...
fs.readFile('myfile.txt', function (err, results) {
    // ... then APM Transaction "t0" should be current here too.
})

The user-land callback queues mentioned above can break expectations. Using the MySQL example above, if the package creates its queue dispatcher when .connect()'ing, then the callbacks onFoo and onBar will be in the context of that .connect() call.

Using this "play-mysql2-run-context.js" as a demo:

const apm = require('./').start({ // elastic-apm-node
  captureSpanStackTraces: false,
  serviceName: 'play-mysql2-run-context',
  // disableInstrumentations: 'mysql2,mysql',
  // asyncHooks: false
})

var t0, t1, t2

// const mysql = require('mysql')
const mysql = require('mysql2')

t0 = apm.startTransaction('t0-connect')
const conn = mysql.createConnection({ user: 'root' })

t1 = apm.startTransaction('t1-first-select')
conn.query('SELECT 1 + 1 AS solution', function (err, results) {
  console.log('SELECT 1+1: err=%s results=%o %s', err, results, apm._instrumentation._runCtxMgr)
  var s = apm.startSpan('s1'); if (s) s.end() // manual span
})

t2 = apm.startTransaction('t2-second-select')
conn.query('SELECT 2 + 2 AS solution', function (err, results) {
  console.log('SELECT 2+2: err=%s results=%o %s', err, results, apm._instrumentation._runCtxMgr)
  var s = apm.startSpan('s2'); if (s) s.end() // manual span
})

// Lazily shutdown after everything above is finished.
setTimeout(() => {
  console.log('Done')
  t2.end()
  t1.end()
  t0.end()
  conn.end()
}, 1000)

and running with the APM agent disabled, the manual spans s1 and s2 end up on the transaction t0 that was active at the time the client connected, rather than the transactions active when the query calls were made:

transaction "t0-connect"
`- span "s1"
`- span "s2"
transaction "t1-first-select"
transaction "t2-second-select"

Normally the APM agent handles this when it instruments the module by "binding" the context to the callback, resulting in:

transaction "t0-connect"
transaction "t1-first-select"
`- span "SELECT"
`- span "s1"
transaction "t2-second-select"
`- span "SELECT"
`- span "s2"

the question

Now what should the APM agent do if the user disabled instrumentation for this mysql module?

disableInstrumentations: 'mysql2,mysql'

Should disableInstrumentations mean:

  1. don't generate spans/errors for this module; or
  2. don't monkey-patch or touch this module at all because I know or suspect the instrumentation is getting in my way (bugs, perf, lack-of-trust, binary-search debugging)?

The equivalent in OTel (not explicitly setting up a particular opentelemetry-plugin-<module>) is (2).

current state

Why this is coming up now, is that the current Elastic Node.js APM agent is doing user-land callback queue handling for some modules even when that module is listed in disableInstrumentations, but not for others. For example it does (or did) so for the mysql, pg (I think), ioredis, redis, and mysql2 packages (sometimes buggily); but not for express-queue even though its instrumentation was explicitly added for user-land callback queue handling in #373. I say "or did" because as part of #2430 work I had (somewhat unwittingly) been removing this handling because I did not fully understand it. There was a mention of "continuation patches" in the PR that added disableInstrumentations support, but no details. There are also no tests for these cases so I didn't notice breakage.

Hence this issue to discuss it and address it consistently.

At time of writing it has been removed for mysql, pg, ioredis, and redis; and PR #2487 is open which removes it for mysql2.

@trentm trentm self-assigned this Dec 4, 2021
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Dec 4, 2021
@trentm trentm changed the title Are user-land callback queues in *uninstrumented* modules to responsibility of the APM agent? Are user-land callback queues in *uninstrumented* modules the responsibility of the APM agent? Dec 4, 2021
@AlexanderWert
Copy link
Member

My expectation would be option (2) as well. And this is also how for example the Java agent deals with certain instrumentation modules. For instance, if in Java a user disables Vert.x instrumentation this also breaks the context propagation, not only stops capturing spans.

@felixbarny
Copy link
Member

+1 on option 2

@felixbarny
Copy link
Member

If an important use-case for disabling just the mysql span creation but enabling the context propagation, we could think about splitting the instrumentation into multiple parts.

  • disable_instrumentations=mysql disables both context propagation and span creation.
  • disable_instrumentations=mysql-spans disables span creation, context propagation is enabled
  • disable_instrumentations=mysql-context disables context propagation, span creation is enabled (not really useful though)
  • disable_instrumentations=mysql-spans,mysql-context is equivalent to disable_instrumentations=mysql

@trentm
Copy link
Member Author

trentm commented Dec 7, 2021

or disabling just the mysql span creation but enabling the context propagation

For interest, Thomas thought out loud about something similar (adding more structure to disable_instrumentations) for the issue that led to the Node.js APM agent's instrumentIncomingHTTPRequests config var: #434

As an alternative we could consider if the config option instead should focus on span types, so instead of disabling pg, you would disable db.postgresql (the full type is db.postgresql.query, but I guess we can just match against the start of the string, which also means that we could write db to disable all db spans).


As a workaround for a user that was relying on "no mysql span creation, but mysql run context propagation", they could use a span filter to filter out the created MySQL spans after the fact. That works, but still implies the overhead of that span creation.

@trentm
Copy link
Member Author

trentm commented Dec 7, 2021

[Alan (on private chat)]

no strong opinions

@trentm
Copy link
Member Author

trentm commented Dec 7, 2021

To re-state:

Q: Are user-land callback queues in uninstrumented modules the responsibility of the APM agent?
A: No.

I will add a note to the changelog for the 3.26.0 release about this and resolve this issue.

trentm added a commit that referenced this issue Dec 7, 2021
Fixes: #2498
@trentm trentm mentioned this issue Dec 7, 2021
3 tasks
trentm added a commit that referenced this issue Dec 7, 2021
trentm added a commit that referenced this issue Jan 31, 2022
When disableInstrumentations support was added in #353 it only
half-disabled these instrumentations, briefly mentioning "continuation
patches". I believe this was about user-land callback queue handling
that should no longer be necessary after run-context and #2430 work.

In #2498 it was discussed and decided that if listed in
disableInstrumentations the agent should not touch the module at all.

Refs: #2498
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.1-candidate agent-nodejs Make available for APM Agents project planning. context propagation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants