[Docs Site] Use vendored Markdown for llms-full.txt #23686

Merged
merged 3 commits into from
Jul 15, 2025
5 changes: 5 additions & 0 deletions .github/workflows/publish-production.yml
@@ -41,6 +41,11 @@ jobs:
cd distmd && zip -r markdown.zip .
npx wrangler r2 object put vendored-markdown/markdown.zip --file=markdown.zip --remote
rm markdown.zip

cd distllms
for file in $(find . -type f); do
npx wrangler r2 object put vendored-markdown/$file --file=$file --remote
done
- name: Upload vendored Markdown files to ZT DevDocs bucket
env:
AWS_ACCESS_KEY_ID: ${{ secrets.ZT_DEVDOCS_ACCESS_KEY_ID }}
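
The upload loop above builds each object path as `vendored-markdown/$file`, and `find .` prints paths with a leading `./` (for example `./workers/llms-full.txt`). The following is a minimal sketch of how those paths could be normalized so that the key matches the `pathname.slice(1)` lookup used by the Worker in this PR. The helper name `toObjectPath` is hypothetical, and whether `wrangler` already normalizes a `./` prefix is not stated in the PR, so treat the stripping step as an assumption.

```typescript
// Hypothetical helper: not part of the PR.
// Builds the "<bucket>/<key>" argument for `wrangler r2 object put`,
// dropping the "./" prefix that `find .` adds to every path.
function toObjectPath(bucket: string, file: string): string {
  const key = file.replace(/^\.\//, "");
  return `${bucket}/${key}`;
}

// Example inputs as `find . -type f` would print them inside distllms/.
const examples = ["./llms-full.txt", "./workers/llms-full.txt"];
for (const file of examples) {
  console.log(toObjectPath("vendored-markdown", file));
}
```

With this normalization, the key for `./workers/llms-full.txt` is `workers/llms-full.txt`, which is the same string the Worker derives from a request to `/workers/llms-full.txt`.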
1 change: 1 addition & 0 deletions .gitignore
@@ -1,6 +1,7 @@
# build output
dist/
distmd/
distllms/
# generated types
.astro/

48 changes: 46 additions & 2 deletions bin/generate-index-md.ts
@@ -1,12 +1,23 @@
import { readFileSync, writeFileSync, mkdirSync } from "node:fs";
import {
readFileSync,
writeFileSync,
mkdirSync,
appendFileSync,
} from "node:fs";

import glob from "fast-glob";
import { parse } from "node-html-parser";
import { htmlToMarkdown } from "~/util/markdown";

import YAML from "yaml";

const files = await glob("dist/**/*.html");

for (const file of files) {
if (file === "dist/index.html" || file === "dist/404.html") {
continue;
}

const html = readFileSync(file, "utf-8");
const dom = parse(html);

@@ -24,9 +35,42 @@ for (const file of files) {
continue;
}

const product = file.split("/")[1];
const path = file.replace("dist/", "distmd/").replace(".html", ".md");

mkdirSync(path.split("/").slice(0, -1).join("/"), { recursive: true });

writeFileSync(path, markdown);

const llmsFullContent = ["<page>", markdown, "</page>\n\n"].join("\n");

mkdirSync(`distllms/${product}`, { recursive: true });
appendFileSync("distllms/llms-full.txt", llmsFullContent);
appendFileSync(`distllms/${product}/llms-full.txt`, llmsFullContent);

try {
const path = await glob(`src/content/products/${product}.*`).then((arr) =>
arr.at(0),
);

if (!path) {
continue;
}

const yaml = YAML.parse(readFileSync(path, "utf-8"));
const group = yaml.product?.group?.replaceAll(" ", "-").toLowerCase();

if (!group) {
continue;
}

mkdirSync(`distllms/${group}`, { recursive: true });
appendFileSync(`distllms/${group}/llms-full.txt`, llmsFullContent);
} catch (error) {
if (error instanceof Error) {
console.error(
`Failed to find a product group for ${product}:`,
error.message,
);
}
}
}
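
The string handling in this script can be checked in isolation. The sketch below pulls the three transformations from the diff into small pure functions: the product and output path derived from a built HTML path, the `<page>` wrapper, and the group slug. The function names are hypothetical and do not exist in the PR. The slug uses `replace(/ /g, "-")`, which behaves the same as the `replaceAll(" ", "-")` call in the diff and is chosen here only so the sketch compiles on older targets.

```typescript
// Hypothetical helpers mirroring the logic in bin/generate-index-md.ts.

// Derive the product (first path segment under dist/) and the Markdown output path.
function deriveOutputs(file: string): { product: string; path: string } {
  const product = file.split("/")[1] ?? "";
  const path = file.replace("dist/", "distmd/").replace(".html", ".md");
  return { product, path };
}

// Wrap one page's Markdown in the <page> delimiters appended to llms-full.txt.
function wrapPage(markdown: string): string {
  return ["<page>", markdown, "</page>\n\n"].join("\n");
}

// Turn a product group name into a lowercase, hyphenated directory name.
function groupSlug(group: string | undefined): string | undefined {
  return group?.replace(/ /g, "-").toLowerCase();
}

const { product, path } = deriveOutputs("dist/workers/get-started/index.html");
console.log(product, path);
console.log(JSON.stringify(wrapPage("# Title")));
console.log(groupSlug("Developer platform"));
```

Because the script appends to the same three files for every page, each page's content lands in the global `llms-full.txt`, the product-level file, and (when a group is found) the group-level file.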
62 changes: 0 additions & 62 deletions src/pages/[area]/llms-full.txt.ts

This file was deleted.

50 changes: 0 additions & 50 deletions src/pages/[product]/llms-full.txt.ts

This file was deleted.

33 changes: 0 additions & 33 deletions src/pages/llms-full.txt.ts

This file was deleted.

11 changes: 11 additions & 0 deletions worker/index.ts
@@ -22,6 +22,17 @@ export default class extends WorkerEntrypoint<Env> {
});
}

if (request.url.endsWith("/llms-full.txt")) {
const { pathname } = new URL(request.url);
const res = await this.env.VENDORED_MARKDOWN.get(pathname.slice(1));

return new Response(res?.body, {
headers: {
"Content-Type": "text/markdown; charset=utf-8",
},
});
}

if (request.url.endsWith("/index.md")) {
const htmlUrl = request.url.replace("index.md", "");
const res = await this.env.ASSETS.fetch(htmlUrl, request);
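
The `llms-full.txt` branch added above maps the request path straight to a bucket key and returns whatever the lookup yields. The sketch below shows the same mapping with an explicit 404 when the object is missing. It is a variation under stated assumptions, not the code in this PR: `BucketLike` is a hypothetical stand-in for the `VENDORED_MARKDOWN` binding with a simplified string body, the function names are invented for illustration, and the match is done on the parsed pathname rather than on the raw URL.

```typescript
// Hypothetical stand-in for the bucket binding; only the lookup used here,
// with the object body simplified to a string.
interface BucketLike {
  get(key: string): Promise<{ body: string } | null>;
}

// Map a request URL to an object key, or null when it is not an llms-full.txt request.
function llmsKeyFor(requestUrl: string): string | null {
  const { pathname } = new URL(requestUrl);
  if (!pathname.endsWith("/llms-full.txt")) {
    return null;
  }
  return pathname.slice(1); // drop the leading "/"
}

// Serve the stored file, or respond with 404 when the key has no object.
async function serveLlmsFull(
  requestUrl: string,
  bucket: BucketLike,
): Promise<Response | null> {
  const key = llmsKeyFor(requestUrl);
  if (key === null) {
    return null; // let the other handlers deal with the request
  }

  const object = await bucket.get(key);
  if (object === null) {
    return new Response("Not found", { status: 404 });
  }

  return new Response(object.body, {
    headers: { "Content-Type": "text/markdown; charset=utf-8" },
  });
}
```

Matching on the pathname means a request such as `/llms-full.txt?x=1` still resolves to the key `llms-full.txt`, whereas a check on the full URL string would not match it.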
35 changes: 0 additions & 35 deletions worker/index.worker.test.ts
@@ -204,41 +204,6 @@ describe("Cloudflare Docs", () => {
const text = await response.text();
expect(text).toContain("# Cloudflare Developer Documentation");
});

it("llms-full.txt", async () => {
const request = new Request("http://fakehost/llms-full.txt");
const response = await SELF.fetch(request);

expect(response.status).toBe(200);

const text = await response.text();
expect(text).toContain("URL: https://developers.cloudflare.com/");
expect(text).toContain('from "~/components"');
});

it("product-specific llms-full.txt", async () => {
const request = new Request("http://fakehost/workers/llms-full.txt");
const response = await SELF.fetch(request);

expect(response.status).toBe(200);

const text = await response.text();
expect(text).toContain("URL: https://developers.cloudflare.com/");
expect(text).toContain('from "~/components"');
});

it("area-specific llms-full.txt", async () => {
const request = new Request(
"http://fakehost/developer-platform/llms-full.txt",
);
const response = await SELF.fetch(request);

expect(response.status).toBe(200);

const text = await response.text();
expect(text).toContain("URL: https://developers.cloudflare.com/");
expect(text).toContain('from "~/components"');
});
});

describe("index.md handling", () => {