-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: sqlite-based transcript store #3402
Conversation
- refactor: hoist concise methods
- fix start / end position assertion - lint in verbose comment - never mind checking type of item in writeStreamItem; it causes a test to fail - resolve auto-commit issue
|
||
import lmdb from 'node-lmdb'; | ||
import sqlite3 from 'better-sqlite3'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems weird to have sqlite in the lmdbSwingStore file. will file this get renamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can take that out; it's a nod to ocap discipline where the sqliteSwingStore
caller supplies the sqlite IO capability. That would make more sense if it were passed in all the way from the host application, or at least from the controller or something
The sqliteStreamStore does its own import sqlite3
as a default, though; I can just use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As our storage story has gotten more complicated, the LMDB store vs. simple store distinction has become increasingly weird and nonsensical. Some kind of name refactoring is clearly called for, but not today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FUDCo has proposed splitting these packages up. We use LMDB for the key-value portion of a "swing store", and flat files for the stream-store portion. This PR changes the stream-store portion to use sqlite, without changing the key-value store. A better-factored world would put each portion in its own package, and make a "swing store" parent package, or something.
I'm ok with the naming weirdness for today.
closeStream, | ||
STREAM_START, | ||
}); | ||
const streamStore = sqlStreamStore(dirPath, { sqlite3 }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably makes sense to retain the fs-based stream stuff as a separate thing rather than deleting it. Assuming we can sort out the file descriptor issues, having an option to put the transcripts into simple text files has a lot of debugging utility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think we should remove n-readlines
from package.json
and yarn.lock
unless there's something lingering and unused that needs it. Rest seems sound.
@@ -21,6 +21,7 @@ | |||
}, | |||
"dependencies": { | |||
"@agoric/assert": "^0.3.2", | |||
"better-sqlite3": "^7.4.1", | |||
"n-readlines": "^1.0.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we remove n-readlines now?
@@ -370,10 +176,6 @@ function makeLMDBSwingStore(dirPath, forceReset, options) { | |||
txn.commit(); | |||
txn = null; | |||
} | |||
for (const fd of activeStreamFds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit()
needs to commit the sqlite txn too, unless you're committing on every write (which would be ok).
*/ | ||
export function sqlStreamStore(dbDir, io) { | ||
const { sqlite3 = sqlite3ambient } = io || {}; | ||
const filePath = `${dbDir}/streams.db`; // ISSUE: no path.join? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's name this .sqlite
or .sqlite3
position integer, | ||
item text, | ||
primary key (streamName, position) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How hard would it be to add a version
table and insert a 1
into it? Or, how hard is it to (later) sense the lack of a version
table, so we can treat that as version 0? Every time I've used sqlite I've had the version ID in there from the very beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to omit this for now, hoping that "sense the lack of a version
table" is easy.
insistStreamName(streamName); | ||
// TODO: enforce exclusive access? DB transactions are isolated. | ||
// assert( | ||
// !streamStatus.get(streamName), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll channel @FUDCo , let's restore this safety check, to prevent concurrent reads/writes/other-reads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Now that I'm looking at this commented out code I'm a little confused by what's happening when this code is commented out but some of the other interlock code is not.
values (?, ?, ?) | ||
on conflict(streamName, position) do update set item = ? | ||
`, | ||
).run(streamName, item, position.itemCount, item); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this do an auto-commit? that works.
` | ||
insert into streamItem (streamName, item, position) | ||
values (?, ?, ?) | ||
on conflict(streamName, position) do update set item = ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct in understanding that the on conflict...
verbiage handles the case where we're writing over the top of previously committed but abandoned transcript entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really like to retain the text-file based stuff somehow as a live option (rather than just as a historical artifact), but other than that this looks clean. And we should create an issue for doing some name refactoring.
}; | ||
|
||
const commit = () => { | ||
// We use the sqlite3 auto-commit API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What governs (or not) the use of auto-commit? Using it makes sense to me, but what tells the world that we actually are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found https://sqlite.org/lang_transaction.html that basically says you get auto-commit unless you explicitly create a transaction object first.
Also, the changes to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though I'm sad to lose the text file stuff. But we can deal with that later if it becomes an issue.
DRAFT TODOs:
yarn add better-sqlite3
better-sqlite3 provides synchronous sqlite3 bindings; they seem to be pretty mature: