bug: ibis should release resources when an ibis.memtable op is GC'd #10044

Closed
jcrist opened this issue Sep 6, 2024 · 0 comments · Fixed by #10055
Labels
performance Issues related to ibis's performance
jcrist commented Sep 6, 2024

ibis.memtable is useful for constructing small tables out of in-memory data in a backend-agnostic way. When ibis executes an expression containing a memtable node, a new "table" is created in the backend with the backing data (only if it wasn't already created), and is then used like any other table when executing the query. How the "table" is created is backend-dependent:

  • For most SQL backends (postgres, bigquery, ...), registering a memtable is equivalent to con.create_table(unique_name, data, temp=True). These are temporary tables that are automatically cleaned up when the session is closed.
  • Some local backends (e.g. duckdb) have more native ways of registering in-memory data that don't require a copy. These are also cleaned up when the session ends.
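The SQL-backend path above amounts to creating a uniquely named temporary table holding the in-memory data. A minimal sketch of that idea, using stdlib sqlite3 as a stand-in for a real ibis backend (the register_memtable helper and naming scheme are illustrative, not ibis internals):

```python
import sqlite3
import uuid


def register_memtable(con, rows, columns):
    """Sketch of SQL-backend memtable registration: create a uniquely
    named temp table and load the in-memory rows into it. The temp table
    is dropped automatically when the connection is closed."""
    name = f"ibis_memtable_{uuid.uuid4().hex}"
    col_spec = ", ".join(columns)
    con.execute(f"CREATE TEMP TABLE {name} ({col_spec})")
    placeholders = ", ".join("?" for _ in columns)
    con.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
    return name


con = sqlite3.connect(":memory:")
name = register_memtable(con, [(1, "a"), (2, "b")], ["x", "y"])
print(con.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0])
```

Because the table is session-scoped, nothing in this scheme frees resources before disconnect, which is exactly the gap described below.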

Ibis currently avoids duplicate registrations of the same memtable in the same backend, but we don't do anything to automatically clean up memtables when they go out of scope; instead we rely on them being cleaned up when the connection is closed.

In #10041 (and #10042) it was noted that this behavior is non-ideal in the local case, since these backends generally keep references to the in-memory data alive even after the user has no way of accessing those tables anymore. The situation is similar for remote SQL backends, except that there the local memory usage stays flat and the cost is an excessive number of temp tables accumulating in the backend. The local backend case is definitely the bigger issue, but we might want to automatically de-register memtables on GC in all cases.

Sketch of the intended memtable behavior:

# Create a new memtable. Note that this doesn't do anything
# in any backend yet, it just creates a new expr node
t = ibis.memtable(...)

# Execute a memtable-backed expression on `con`
# This registers a new memtable in `con`
con.execute(t.select("a"))

# Execute another expression on `con` backed by the same memtable.
# Since this one was already registered, the backend just reuses the
# existing table ref.
con.execute(t.select("b"))

# Execute an expression on `con2` backed by the memtable.
# Since this is a new backend, a new memtable is registered
con2.execute(t.select("b"))

# Close `con2`, releasing resources (including those used in it by `t`)
con2.disconnect()

# Drop all references to `t`, `con` can now cleanup any resources used by `t`
del t
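One plausible way a backend could implement the final step above is to attach a weakref.finalize callback when a memtable is first registered, so the backend-side table is dropped when the op is GC'd. This is a hedged sketch, not ibis's actual implementation; Backend, Op, and the _memtables dict are all illustrative:

```python
import gc
import weakref


class Op:
    """Stand-in for a memtable op node: a name plus backing data."""

    def __init__(self, name, data):
        self.name = name
        self.data = data


class Backend:
    """Sketch of a backend that registers in-memory tables and
    de-registers them when the corresponding op is garbage collected."""

    def __init__(self):
        self._memtables = {}  # table name -> backing data

    def register_memtable(self, op):
        if op.name not in self._memtables:
            self._memtables[op.name] = op.data
            # weakref.finalize holds no strong reference to `op`, so the
            # callback fires as soon as `op` is collected, releasing the
            # backend-side resources without keeping the op alive.
            weakref.finalize(op, self._release_memtable, op.name)

    def _release_memtable(self, name):
        self._memtables.pop(name, None)


backend = Backend()
op = Op("t0", [1, 2, 3])
backend.register_memtable(op)
print("t0" in backend._memtables)  # registered

del op
gc.collect()
print("t0" in backend._memtables)  # released after GC
```

Note that the finalizer closes over the backend (via the bound method), so cleanup is a no-op if the backend itself is disconnected first; a real implementation would also need to guard against the connection already being closed.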