Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: sqlite-based transcript store #3402

Merged
merged 13 commits into from
Jun 24, 2021
Merged

feat: sqlite-based transcript store #3402

merged 13 commits into from
Jun 24, 2021

Conversation

dckc
Copy link
Member

@dckc dckc commented Jun 23, 2021

DRAFT TODOs:

  • finish copying some function docstrings over
  • make sure all SDK tests pass

yarn add better-sqlite3

better-sqlite3 provides synchronous sqlite3 bindings; they seem to be pretty mature:

  • "With proper indexing, we've been able to achieve upward of 2000 queries per second with 5-way-joins in a 60 GB database, where each query was handling 5–50 kilobytes of real data."
  • 150K+ downloads/week; 452 Dependents
  • last published 18 days ago; 83 versions

@dckc dckc requested review from warner and FUDCo June 23, 2021 22:59

import lmdb from 'node-lmdb';
import sqlite3 from 'better-sqlite3';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems weird to have sqlite in the lmdbSwingStore file. will file this get renamed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take that out; it's a nod to ocap discipline where the sqliteSwingStore caller supplies the sqlite IO capability. That would make more sense if it were passed in all the way from the host application, or at least from the controller or something

The sqliteStreamStore does its own import sqlite3 as a default, though; I can just use that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As our storage story has gotten more complicated, the LMDB store vs. simple store distinction has become increasingly weird and nonsensical. Some kind of name refactoring is clearly called for, but not today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FUDCo has proposed splitting these packages up. We use LMDB for the key-value portion of a "swing store", and flat files for the stream-store portion. This PR changes the stream-store portion to use sqlite, without changing the key-value store. A better-factored world would put each portion in its own package, and make a "swing store" parent package, or something.

I'm ok with the naming weirdness for today.

closeStream,
STREAM_START,
});
const streamStore = sqlStreamStore(dirPath, { sqlite3 });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably makes sense to retain the fs-based stream stuff as a separate thing rather than deleting it. Assuming we can sort out the file descriptor issues, having an option to put the transcripts into simple text files has a lot of debugging utility.

Copy link
Member

@warner warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I think we should remove n-readlines from package.json and yarn.lock unless there's something lingering and unused that needs it. Rest seems sound.

@@ -21,6 +21,7 @@
},
"dependencies": {
"@agoric/assert": "^0.3.2",
"better-sqlite3": "^7.4.1",
"n-readlines": "^1.0.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove n-readlines now?

@@ -370,10 +176,6 @@ function makeLMDBSwingStore(dirPath, forceReset, options) {
txn.commit();
txn = null;
}
for (const fd of activeStreamFds) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit() needs to commit the sqlite txn too, unless you're committing on every write (which would be ok).

*/
export function sqlStreamStore(dbDir, io) {
const { sqlite3 = sqlite3ambient } = io || {};
const filePath = `${dbDir}/streams.db`; // ISSUE: no path.join?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's name this .sqlite or .sqlite3

position integer,
item text,
primary key (streamName, position)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How hard would it be to add a version table and insert a 1 into it? Or, how hard is it to (later) sense the lack of a version table, so we can treat that as version 0? Every time I've used sqlite I've had the version ID in there from the very beginning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to omit this for now, hoping that "sense the lack of a version table" is easy.

insistStreamName(streamName);
// TODO: enforce exclusive access? DB transactions are isolated.
// assert(
// !streamStatus.get(streamName),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll channel @FUDCo , let's restore this safety check, to prevent concurrent reads/writes/other-reads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Now that I'm looking at this commented out code I'm a little confused by what's happening when this code is commented out but some of the other interlock code is not.

values (?, ?, ?)
on conflict(streamName, position) do update set item = ?
`,
).run(streamName, item, position.itemCount, item);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this do an auto-commit? that works.

`
insert into streamItem (streamName, item, position)
values (?, ?, ?)
on conflict(streamName, position) do update set item = ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in understanding that the on conflict... verbiage handles the case where we're writing over the top of previously committed but abandoned transcript entries?

Copy link
Contributor

@FUDCo FUDCo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like to retain the text-file based stuff somehow as a live option (rather than just as a historical artifact), but other than that this looks clean. And we should create an issue for doing some name refactoring.

};

const commit = () => {
// We use the sqlite3 auto-commit API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What governs (or not) the use of auto-commit? Using it makes sense to me, but what tells the world that we actually are?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found https://sqlite.org/lang_transaction.html that basically says you get auto-commit unless you explicitly create a transaction object first.

@FUDCo
Copy link
Contributor

FUDCo commented Jun 24, 2021

Also, the changes to yarn.lock look scary. But I never trust changes to yarn.lock so...

@warner warner requested a review from FUDCo June 24, 2021 01:09
@warner warner self-assigned this Jun 24, 2021
@warner warner added the SwingSet package: SwingSet label Jun 24, 2021
@warner warner added this to the Testnet: Stress Test Phase milestone Jun 24, 2021
@warner warner marked this pull request as ready for review June 24, 2021 01:15
Copy link
Contributor

@FUDCo FUDCo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I'm sad to lose the text file stuff. But we can deal with that later if it becomes an issue.

@warner warner merged commit 960b013 into master Jun 24, 2021
@warner warner deleted the sqlite-stream-spike branch June 24, 2021 01:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants