Skip to content

Refactor: Normalize Database Schema with Libraries Table (fixes #86) #95

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 14 additions & 28 deletions ARCHITECTURE.md
Original file line number Diff line number Diff line change
Expand Up @@ -490,41 +490,27 @@ This hierarchy ensures:

## Testing Conventions

This section outlines conventions and best practices for writing tests within this project.
Our testing philosophy emphasizes verifying public contracts and ensuring tests are robust and maintainable.

### Mocking with Vitest
### 1. Test Public API Behavior

When mocking modules or functions using `vitest`, it's crucial to follow a specific order due to how `vi.mock` hoisting works. `vi.mock` calls are moved to the top of the file before any imports. This means you cannot define helper functions _before_ `vi.mock` and then use them _within_ the mock setup directly.
- Focus tests on public methods: verify correct outputs or observable side effects for given inputs.
- Avoid assertions on internal implementation details (private methods, internal state). Tests should remain resilient to refactoring.

To correctly mock dependencies, follow these steps:
### 2. Mocking Principles

1. **Declare the Mock:** Call `vi.mock('./path/to/module-to-mock')` at the top of your test file, before any imports or other code.
2. **Define Mock Implementations:** _After_ the `vi.mock` call, define any helper functions, variables, or mock implementations you'll need.
3. **Import the Actual Module:** Import the specific functions or classes you intend to mock from the original module.
4. **Apply the Mock:** Use the defined mock implementations to replace the behavior of the imported functions/classes. You might need to cast the imported item as a `Mock` type (`import { type Mock } from 'vitest'`).
- Mock only true external dependencies (e.g., databases, external APIs, file system).
- When mocking modules with Vitest, be aware that `vi.mock` is hoisted. Define top-level mock functions/objects before the `vi.mock` call, and assign them within the mock factory.
- Set default behaviors for mocks globally; override them locally in tests or describe blocks as needed.
- Use shared spies for repeated calls (e.g., a database statement's `.all()`), and reset them in `beforeEach`.

**Example Structure:**
### 3. Test Structure & Assertions

```typescript
import { vi, type Mock } from "vitest";
- Organize related tests with `describe`, and use `beforeEach`/`afterEach` for setup and cleanup.
- Assert expected return values and observable side effects. Use `expect(...).resolves/.rejects` for async code.
- Only assert direct calls to mocks if that interaction is a key part of the contract being tested.

// 1. Declare the mock (hoisted to top)
vi.mock("./dependency");

// 2. Define mock function/variable *after* vi.mock
const mockImplementation = vi.fn(() => "mocked result");

// 3. Import the actual function/class *after* defining mocks
import { functionToMock } from "./dependency";

// 4. Apply the mock implementation
(functionToMock as Mock).mockImplementation(mockImplementation);

// ... rest of your test code using the mocked functionToMock ...
// expect(functionToMock()).toBe('mocked result');
```

This structure ensures that mocks are set up correctly before the modules that depend on them are imported and used in your tests.
These guidelines help ensure tests are clear, maintainable, and focused on the system's observable behavior.

## Releasing

Expand Down
29 changes: 29 additions & 0 deletions db/migrations/002-normalize-library-table.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
-- Migration: Normalize schema by introducing libraries table and linking documents

-- 1. Create libraries table
CREATE TABLE IF NOT EXISTS libraries (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
initial_url TEXT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
last_updated_at DATETIME NULL,
last_scrape_duration_ms INTEGER NULL
);

-- 2. Add library_id to documents
ALTER TABLE documents ADD COLUMN library_id INTEGER REFERENCES libraries(id);

-- 3. Populate libraries table from existing documents
INSERT OR IGNORE INTO libraries (name)
SELECT DISTINCT library FROM documents;

-- 4. Update documents.library_id based on libraries table
UPDATE documents
SET library_id = (
SELECT id FROM libraries WHERE libraries.name = documents.library
);

-- 5. Add index on documents.library_id
CREATE INDEX IF NOT EXISTS idx_documents_library_id ON documents(library_id);

-- Note: Handling of documents_vec and FTS triggers will be addressed in a follow-up migration.
163 changes: 127 additions & 36 deletions src/store/DocumentStore.test.ts
Original file line number Diff line number Diff line change
@@ -1,4 +1,13 @@
import { type Mock, afterAll, beforeEach, describe, expect, it, vi } from "vitest";
import {
type Mock,
afterAll,
afterEach,
beforeEach,
describe,
expect,
it,
vi,
} from "vitest";
import { VECTOR_DIMENSION } from "./types";

// --- Mocking Setup ---
Expand All @@ -16,17 +25,19 @@ import { createEmbeddingModel } from "./embeddings/EmbeddingFactory";
embedDocuments: vi.fn(),
});

// Mock better-sqlite3
/**
* Initial generic mocks for better-sqlite3.
* Will be replaced with dynamic mocks after vi.mock due to hoisting.
*/
const mockStatementAll = vi.fn().mockReturnValue([]);
// Ensure the mock statement object covers methods used by *all* statements prepared in DocumentStore
const mockStatement = {
all: mockStatementAll,
run: vi.fn().mockReturnValue({ changes: 0, lastInsertRowid: 1 }), // Mock run for insert/delete
get: vi.fn().mockReturnValue(undefined), // Mock get for getById/checkExists etc.
run: vi.fn().mockReturnValue({ changes: 0, lastInsertRowid: 1 }),
get: vi.fn().mockReturnValue(undefined),
};
const mockPrepare = vi.fn().mockReturnValue(mockStatement);
let mockPrepare = vi.fn().mockReturnValue(mockStatement);
const mockDb = {
prepare: mockPrepare,
prepare: (...args: unknown[]) => mockPrepare(...args),
exec: vi.fn(),
transaction: vi.fn(
(fn) =>
Expand All @@ -36,9 +47,20 @@ const mockDb = {
close: vi.fn(),
};
vi.mock("better-sqlite3", () => ({
default: vi.fn(() => mockDb), // Mock the default export (constructor)
default: vi.fn(() => mockDb),
}));

/**
* Simplified mockPrepare: always returns a generic statement object.
* Test-specific SQL overrides are set up in each test/describe as needed.
*/
mockPrepare = vi.fn(() => ({
get: vi.fn().mockReturnValue(undefined),
run: vi.fn().mockReturnValue({ changes: 1, lastInsertRowid: 1 }),
all: mockStatementAll,
}));
mockDb.prepare = (...args: unknown[]) => mockPrepare(...args);

// Mock sqlite-vec
vi.mock("sqlite-vec", () => ({
load: vi.fn(),
Expand All @@ -59,14 +81,14 @@ describe("DocumentStore", () => {

beforeEach(async () => {
vi.clearAllMocks(); // Clear call history etc.
mockStatementAll.mockClear();
mockStatementAll.mockReturnValue([]);

// Reset the mock factory implementation for this test run
(createEmbeddingModel as ReturnType<typeof vi.fn>).mockReturnValue({
embedQuery: mockEmbedQuery,
embedDocuments: mockEmbedDocuments,
});
mockPrepare.mockReturnValue(mockStatement); // <-- Re-configure prepare mock return value

// Reset embedQuery to handle initialization vector
mockEmbedQuery.mockResolvedValue(new Array(VECTOR_DIMENSION).fill(0.1));

Expand Down Expand Up @@ -263,13 +285,31 @@ describe("DocumentStore", () => {
.mockResolvedValueOnce(firstBatchEmbeddings)
.mockResolvedValueOnce(secondBatchEmbeddings);

// Mock insertDocument to return sequential rowids
for (let i = 0; i < numDocuments; i++) {
mockStatement.run.mockReturnValueOnce({
changes: 1,
lastInsertRowid: BigInt(i + 1),
});
}
// Patch mockPrepare for this test to handle library id resolution for test-lib-large-batch
const originalMockPrepare = mockPrepare;
mockPrepare = vi.fn((sql: string) => {
if (sql.includes("SELECT id FROM libraries WHERE name = ?")) {
return {
get: (name: string) =>
name === "test-lib-large-batch" ? { id: 1 } : undefined,
run: vi.fn(),
all: mockStatementAll,
};
}
if (sql.includes("INSERT INTO libraries")) {
return {
run: vi.fn(),
get: vi.fn(),
all: mockStatementAll,
};
}
return originalMockPrepare(sql);
});
mockDb.prepare = (...args: unknown[]) => mockPrepare(...args);

// Re-instantiate DocumentStore after patching mockPrepare
documentStore = new DocumentStore(":memory:");
await documentStore.initialize();

await documentStore.addDocuments("test-lib-large-batch", "1.0.0", documents);

Expand All @@ -284,35 +324,85 @@ describe("DocumentStore", () => {
});

describe("Embedding Model Dimensions", () => {
let getLibraryIdByNameMock: Mock;
let insertLibraryMock: Mock;
let lastInsertedVector: number[];

beforeEach(() => {
getLibraryIdByNameMock = vi.fn().mockReturnValue({ id: 1 });
insertLibraryMock = vi.fn().mockReturnValue({ changes: 1, lastInsertRowid: 1 });
lastInsertedVector = [];

mockPrepare.mockImplementation((sql: string) => {
if (sql.includes("SELECT id FROM libraries WHERE name = ?")) {
return {
get: getLibraryIdByNameMock,
run: vi.fn(),
all: mockStatementAll,
};
}
if (sql.includes("INSERT INTO libraries")) {
return {
run: insertLibraryMock,
get: vi.fn(),
all: mockStatementAll,
};
}
if (sql.includes("INSERT INTO documents_vec")) {
return {
run: vi.fn((...args) => {
if (typeof args[3] === "string") {
try {
const arr = JSON.parse(args[3]);
if (Array.isArray(arr)) lastInsertedVector = arr;
} catch {}
}
return { changes: 1, lastInsertRowid: 1 };
}),
get: vi.fn(),
all: mockStatementAll,
};
}
return {
get: vi.fn(),
run: vi.fn().mockReturnValue({ changes: 1, lastInsertRowid: 1 }),
all: mockStatementAll,
};
});
});

afterEach(() => {
mockPrepare.mockImplementation(() => ({
get: vi.fn().mockReturnValue(undefined),
run: vi.fn().mockReturnValue({ changes: 1, lastInsertRowid: 1 }),
all: mockStatementAll,
}));
});

it("should accept a model that produces ${VECTOR_DIMENSION}-dimensional vectors", async () => {
// Mock a ${VECTOR_DIMENSION}-dimensional vector
mockEmbedQuery.mockResolvedValueOnce(new Array(VECTOR_DIMENSION).fill(0.1));
documentStore = new DocumentStore(":memory:");
await expect(documentStore.initialize()).resolves.not.toThrow();
});

it("should accept and pad vectors from models with smaller dimensions", async () => {
// Mock 768-dimensional vectors
mockEmbedQuery.mockResolvedValueOnce(new Array(768).fill(0.1));
mockEmbedDocuments.mockResolvedValueOnce([new Array(768).fill(0.1)]);

documentStore = new DocumentStore(":memory:");
await documentStore.initialize();

// Should pad to ${VECTOR_DIMENSION} when inserting
const doc = {
pageContent: "test content",
metadata: { title: "test", url: "http://test.com", path: ["test"] },
};

// This should succeed (vectors are padded internally)
await expect(
documentStore.addDocuments("test-lib", "1.0.0", [doc]),
).resolves.not.toThrow();
});

it("should reject models that produce vectors larger than ${VECTOR_DIMENSION} dimensions", async () => {
// Mock a 3072-dimensional vector (like text-embedding-3-large)
mockEmbedQuery.mockResolvedValueOnce(new Array(3072).fill(0.1));
documentStore = new DocumentStore(":memory:");
await expect(documentStore.initialize()).rejects.toThrow(
Expand All @@ -321,12 +411,11 @@ describe("DocumentStore", () => {
});

it("should pad both document and query vectors consistently", async () => {
// Mock 768-dimensional vectors for both init and subsequent operations
const smallVector = new Array(768).fill(0.1);
mockEmbedQuery
.mockResolvedValueOnce(smallVector) // for initialization
.mockResolvedValueOnce(smallVector); // for search query
mockEmbedDocuments.mockResolvedValueOnce([smallVector]); // for document embeddings
.mockResolvedValueOnce(smallVector)
.mockResolvedValueOnce(smallVector);
mockEmbedDocuments.mockResolvedValueOnce([smallVector]);

documentStore = new DocumentStore(":memory:");
await documentStore.initialize();
Expand All @@ -336,24 +425,26 @@ describe("DocumentStore", () => {
metadata: { title: "test", url: "http://test.com", path: ["test"] },
};

// Add a document (this pads the document vector)
mockStatementAll.mockImplementationOnce(() => [
{
id: "id1",
content: "content",
metadata: JSON.stringify({}),
vec_score: 1,
fts_score: 1,
},
]);

await documentStore.addDocuments("test-lib", "1.0.0", [doc]);

// Search should work (query vector gets padded too)
await expect(
documentStore.findByContent("test-lib", "1.0.0", "test query", 5),
).resolves.not.toThrow();

// Verify both vectors were padded (via the JSON stringification)
const insertCall = mockStatement.run.mock.calls.find(
(call) => call[0]?.toString().startsWith("1"), // Looking for rowid=1
);
const searchCall = mockStatementAll.mock.lastCall;

// Both vectors should be stringified arrays of length ${VECTOR_DIMENSION}
const insertVector = JSON.parse(insertCall?.[3] || "[]");
const searchVector = JSON.parse(searchCall?.[2] || "[]");
expect(insertVector.length).toBe(VECTOR_DIMENSION);

expect(lastInsertedVector.length).toBe(VECTOR_DIMENSION);
expect(searchVector.length).toBe(VECTOR_DIMENSION);
});
});
Expand Down
Loading