Create devdocs as .pdf and .html #707

Draft
wants to merge 30 commits into
base: main
205 changes: 205 additions & 0 deletions .github/workflows/convert-docs-to-pdf.yml
@@ -0,0 +1,205 @@
name: Convert DevDocs to PDF

on:
  push:
    branches:
      - main
    paths:
      - '.github/workflows/convert-docs-to-pdf.yml'
  pull_request:
    # Always build on pull requests; the path filter is therefore disabled
    # paths:
    #   - '.github/workflows/convert-docs-to-pdf.yml'

concurrency:
  group: "${{ github.workflow }}-${{ github.head_ref || github.ref }}"
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '22'

      - uses: awalsh128/cache-apt-pkgs-action@latest
        with:
          packages: poppler-utils xmlstarlet
          version: 1.0

      - run: npm install puppeteer

      - name: Crawl URLs from sitemap
        run: |
          set -e

          mkdir -p urls

          # Debug: Check tool versions
          curl --version
          xmlstarlet --version

          # Step 1: Download sitemap
          curl -s https://devdocs.jabref.org/sitemap.xml > urls/sitemap.xml
          echo "=== sitemap.xml ==="
          cat urls/sitemap.xml

          # Step 2: Extract <loc> entries using xmlstarlet
          xmlstarlet sel \
            -N x="http://www.sitemaps.org/schemas/sitemap/0.9" \
            -t -m '//x:url/x:loc' -v . -n urls/sitemap.xml \
            > urls/raw.txt
          echo "=== raw.txt (raw <loc> entries) ==="
          cat urls/raw.txt

          # Step 3: Ensure full URLs (prefix relative paths)
          sed 's|^/|https://devdocs.jabref.org/|' urls/raw.txt > urls/list.txt
          echo "=== list.txt (final full URLs) ==="
          cat urls/list.txt

      - name: Download pages as PDFs
        run: |
          mkdir pdfs
          node <<'EOF'
          const fs = require('fs');
          const puppeteer = require('puppeteer');

          (async () => {
            const urls = fs.readFileSync('urls/list.txt', 'utf-8')
              .split('\n')
              .filter(Boolean);

            const browser = await puppeteer.launch({
              headless: "new",
              args: ["--no-sandbox", "--disable-setuid-sandbox"]
            });

            for (let i = 0; i < urls.length; i++) {
              const url = urls[i];
              // Map the URL path to a flat file name: "/" becomes "--", the site root becomes "index"
              const urlPath = new URL(url).pathname.replace(/^\/|\/$/g, '');
              const safeName = decodeURIComponent(urlPath).replace(/\//g, '--') || 'index';
              const basePath = `pdfs/${safeName.replace(/\.html$/, '')}`;

              console.log(`Rendering ${url} → ${basePath}.html/pdf`);

              const page = await browser.newPage();
              await page.goto(url, { waitUntil: 'networkidle2' });

              const content = await page.content();
              fs.writeFileSync(`${basePath}.html`, content);

              await page.pdf({
                path: `${basePath}.pdf`,
                format: 'A4',
                printBackground: true
              });

              await page.close();
            }

            await browser.close();
          })();
          EOF

      - name: Install TeX Live
        uses: zauguin/install-texlive@v4
        with:
          packages: >
            bigintcalc
            bitset
            catchfile
            cm
            epstopdf-pkg
            eso-pic
            etexcmds
            etoolbox
            gettitlestring
            graphics
            graphics-cfg
            graphics-def
            hycolor
            hyperref
            iftex
            infwarerr
            intcalc
            kvdefinekeys
            kvoptions
            kvsetkeys
            l3backend
            latex
            latex-bin
            latex-fonts
            latexconfig
            lm
            ltxcmds
            metafont
            mfware
            pdfescape
            pdflscape
            pdfpages
            pdftexcmds
            refcount
            rerunfilecheck
            stringenc
            tools
            uniquecounter
            url
            xcolor
            xstring

      - name: Merge PDFs into one PDF file
        run: |
          # File names containing % or # cannot be handled yet, so they are filtered out
          (echo pdfs/index.pdf; find pdfs/ -type f -name '*.pdf' ! -name 'index.pdf' | grep -v '%' | grep -v '#' | sort) | paste -sd, - > filelist.txt
          cat filelist.txt

          cat <<EOF > jabref-devdocs.tex
          \documentclass[a4paper]{article}
          \usepackage{pdfpages}
          \usepackage[bookmarks=true,linktoc=all]{hyperref}
          \usepackage{catchfile}
          \usepackage{xstring}
          \usepackage{etoolbox}

          \begin{document}

          \renewcommand{\do}[1]{%
            \StrBehind{#1}{/}[\filename]%
            \StrBefore{\filename}{.pdf}[\bookmarktitle]%
            \clearpage
            \phantomsection
            \addcontentsline{toc}{section}{\bookmarktitle}
            \includepdf[pages=-]{\detokenize{#1}}%
          }

          \CatchFileDef{\filelist}{filelist.txt}{}

          \pdfbookmark[1]{Contents}{Contents}%
          \tableofcontents

          \expandafter\docsvlist\expandafter{\filelist}

          \end{document}
          EOF

          # Run twice so that the table of contents and bookmarks are resolved
          pdflatex -shell-escape jabref-devdocs
          pdflatex -shell-escape jabref-devdocs
          mv jabref-devdocs.pdf pdfs/

      - uses: actions/upload-artifact@v4
        with:
          name: docs
          path: pdfs/

      - name: Publish
        # github.ref holds the fully qualified ref (refs/heads/<branch>) on push events
        if: github.ref == 'refs/heads/main' || github.head_ref == 'publish-devdocs-as-pdf'
        uses: peaceiris/actions-gh-pages@v4
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./pdfs
          publish_branch: devdocs
          force_orphan: true
          enable_jekyll: false
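
Review note: the workflow derives each PDF file name from the page URL in the Node step and later builds the merge order in shell. The sketch below restates both rules in plain JavaScript so they can be checked without running Puppeteer. The helper names `pdfBaseName` and `buildMergeList` are hypothetical and not part of this PR, the example URLs are made up for illustration, and the sort order only approximates what `sort` produces on the runner (which depends on the locale).

```javascript
// Hypothetical helpers that mirror the naming and ordering rules of the workflow.

// Same mapping as in the "Download pages as PDFs" step:
// strip the outer slashes, decode, replace "/" with "--", drop a trailing ".html".
function pdfBaseName(url) {
  const urlPath = new URL(url).pathname.replace(/^\/|\/$/g, '');
  const safeName = decodeURIComponent(urlPath).replace(/\//g, '--') || 'index';
  return safeName.replace(/\.html$/, '');
}

// Same idea as the shell pipeline in the merge step:
// index.pdf first, then all other PDFs sorted, skipping names with "%" or "#".
function buildMergeList(pdfPaths) {
  const rest = pdfPaths
    .filter((p) => p !== 'pdfs/index.pdf')
    .filter((p) => !p.includes('%') && !p.includes('#'))
    .sort(); // code-unit order; an assumption, since the shell sort is locale-dependent
  return ['pdfs/index.pdf', ...rest].join(',');
}

console.log(pdfBaseName('https://devdocs.jabref.org/'));
console.log(pdfBaseName('https://devdocs.jabref.org/a/b.html'));
console.log(buildMergeList(['pdfs/b.pdf', 'pdfs/index.pdf', 'pdfs/a--x.pdf']));
```

A page whose decoded path contains `#` (for example a hypothetical `/code-howtos/c%23/`) still gets rendered to a PDF, but it is left out of the merged document because of the filter in the merge step.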