Automate detection of dead doc links #15257

Closed
silverwind opened this issue Sep 8, 2017 · 16 comments
Labels: doc (Issues and PRs related to the documentation), stalled (Issues and PRs that are stalled)

Comments

@silverwind
Contributor

Links in docs regularly get broken (example), and it should be possible to have a script that iterates over all links in the docs and checks for an HTTP response code of < 400.

Probably not something we want to run as part of the CI, but I could see the script being run on demand regularly.
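
For illustration, a rough sketch of what such a script could look like (the urls list is a placeholder; extracting the links from the generated docs would be a separate step, and some servers may need a GET instead of HEAD):

'use strict';

// Hypothetical sketch: request each external URL and report anything that
// answers with a status code of 400 or higher. `urls` is a placeholder for
// whatever link extraction ends up feeding it.
const https = require('https');

const urls = [
  'https://nodejs.org/en/',
  'https://example.com/some-possibly-dead-link', // placeholder
];

for (const url of urls) {
  https.request(url, { method: 'HEAD' }, (res) => {
    if (res.statusCode >= 400)
      console.log(`${res.statusCode} ${url}`);
    res.resume();
  }).on('error', (err) => {
    console.log(`ERR ${url} (${err.message})`);
  }).end();
}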

@silverwind added the doc label on Sep 8, 2017
@Trott
Member

Trott commented Sep 8, 2017

I wonder if this might be something for @nodejs/website to figure out.

@phillipj
Member

phillipj commented Sep 8, 2017

IIRC @mikeal once said he created something for crawling every link found on a website?

@tniessen
Member

tniessen commented Sep 8, 2017

> that iterates over all links in the docs and checks for an HTTP response code of < 400.

This is probably not enough: changing headings within a page causes the #hash to change, and links won't jump to the correct section anymore. So we will need to parse the retrieved documents as well.
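
For illustration only, a rough sketch of that extra check, using naive string matching rather than real HTML parsing (the URL at the bottom is just a placeholder):

'use strict';

// Hypothetical sketch: a 2xx response is not enough when a link carries a
// #fragment, so fetch the page and verify that an element with that id
// exists. A real tool would parse the HTML instead of string matching.
const https = require('https');
const { URL } = require('url');

function checkFragment(link) {
  const { href, hash } = new URL(link);
  https.get(href, (res) => {
    res.setEncoding('utf8');
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      const id = hash.slice(1);
      if (id && !body.includes(`id="${id}"`) && !body.includes(`name="${id}"`))
        console.log(`missing anchor: ${link}`);
    });
  });
}

checkFragment('https://nodejs.org/api/all.html#some_section'); // placeholder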

@vsemozhetbyt
Contributor

Maybe we can use puppeteer for this.

@vsemozhetbyt
Contributor

vsemozhetbyt commented Sep 8, 2017

A strawman with puppeteer for simple detection of wrong hashes (intra-page links only):

script
'use strict';

const { URL } = require('url');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const { href, origin, pathname } = new URL('https://nodejs.org/api/all.html');
  await page.goto(href);

  // Collect same-page links whose #hash does not resolve to an element id.
  const wrongLinks = await page.evaluate((mainOrigin, mainPathname) => {
    return [...document.body.querySelectorAll('a[href]')]
           .filter(link => link.origin === mainOrigin &&
                           link.pathname === mainPathname &&
                           link.hash !== '' &&
                           document.body.querySelector(link.hash) === null)
           .map(link => `${link.innerText} : ${link.href}`)
           .join('\n');
  }, origin, pathname);

  console.log(wrongLinks);
  await browser.close();
})();

Currently, it detects these links:

output

cluster.settings : https://nodejs.org/api/all.html#clustersettings
verify.update() : https://nodejs.org/api/all.html#crypto_verifier_update_data_inputencoding
verify.verify() : https://nodejs.org/api/all.html#crypto_verifier_verify_object_signature_signatureformat
verify.update() : https://nodejs.org/api/all.html#crypto_verifier_update_data_inputencoding
verify.verify() : https://nodejs.org/api/all.html#crypto_verifier_verify_object_signature_signatureformat
TCP-based protocol : https://nodejs.org/api/all.html#debugger_tcp_based_protocol
Http2Session and Sockets : https://nodejs.org/api/all.html#http2_http2sesion_and_sockets
ALPN negotiation : https://nodejs.org/api/all.html#alpn-negotiation
ServerRequest : https://nodejs.org/api/all.html#http2_class_server_request
stream.pushStream() : https://nodejs.org/api/all.html#http2_stream-pushstream
readable._destroy : https://nodejs.org/api/all.html#stream_readable_destroy_err_callback
readable._destroy : https://nodejs.org/api/all.html#stream_readable_destroy_err_callback
stream._destroy() : https://nodejs.org/api/all.html#stream_readable_destroy_err_callback


@silverwind
Contributor Author

silverwind commented Sep 8, 2017

I was thinking more of external links when I opened this, but it's good to have checks for those relative links too. For external ones, I think a simple status code check should be enough.

For the internal ones, I could see a check like the one above being part of the CI, but I don't think we can include puppeteer in the repository; it's just too heavy.

@ghaiklor
Contributor

Also, there is a tool called html-proofer that can be used for such things. I use it on some of my static pages to check that all the resources (images, stylesheets, etc.) exist, and it also checks for any broken links on your website.

@vsemozhetbyt
Contributor

vsemozhetbyt commented Sep 10, 2017

A more meticulous (and tangled) variant for internal link checking (for hash-only links and for inter-document links inside the doc site). It still uses puppeteer, so it is not viable inside the repo or CI, but it can occasionally be used locally.

The current run has resulted in #15293 and #15291.

@TimothyGu
Member

Could we use something Node.js-based like jsdom or cheerio instead of Puppeteer? The latter sounds like overkill to me, while cheerio might even be small enough to be bundled in core.
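
For illustration, a rough cheerio-based sketch of the same intra-page hash check as the puppeteer strawman above (treating cheerio as a devDependency is an assumption here):

'use strict';

// Hypothetical sketch: the same intra-page hash check as the puppeteer
// strawman, but with cheerio instead of a headless browser.
const https = require('https');
const cheerio = require('cheerio');

https.get('https://nodejs.org/api/all.html', (res) => {
  res.setEncoding('utf8');
  let html = '';
  res.on('data', (chunk) => { html += chunk; });
  res.on('end', () => {
    const $ = cheerio.load(html);
    $('a[href^="#"]').each((i, el) => {
      const hash = $(el).attr('href');
      // Report links whose fragment does not resolve to an element id.
      if (hash.length > 1 && $(hash).length === 0)
        console.log(`${$(el).text()} : ${hash}`);
    });
  });
});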

@TimothyGu
Member

Or even better, a Markdown-based solution that could possibly be integrated with the doctool.
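
For illustration, a rough sketch of a Markdown-based pass (the paths and regexes are assumptions, and mapping intra-doc anchors to the doctool's generated ids is not attempted here):

'use strict';

// Hypothetical sketch: pull link targets straight out of doc/api/*.md and
// print the external ones so they can be fed to a status-code check.
// Handles inline links and reference-style definitions, nothing fancier.
const fs = require('fs');
const path = require('path');

const docDir = path.join(__dirname, '..', 'doc', 'api'); // assumed location
const inlineRe = /\]\((https?:\/\/[^)\s]+)\)/g;
const refRe = /^\[[^\]]+\]:\s*(https?:\/\/\S+)/gm;

for (const file of fs.readdirSync(docDir).filter((f) => f.endsWith('.md'))) {
  const text = fs.readFileSync(path.join(docDir, file), 'utf8');
  for (const re of [inlineRe, refRe]) {
    let match;
    while ((match = re.exec(text)) !== null)
      console.log(`${file}: ${match[1]}`);
  }
}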

@bnb
Contributor

bnb commented Sep 25, 2017

@TimothyGu I had someone PR Danger as a CI/CD tool for a markdown-only project of mine to detect broken links - it may be useful to run on docs updates?

http://danger.systems/js/
https://github.com/danger/danger-js
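
For illustration, a minimal dangerfile.js along those lines (the path filter and warning text are assumptions; the actual link check would still be a separate script):

'use strict';

// Hypothetical dangerfile.js: nudge PR authors to run a link check whenever
// Markdown docs are touched.
const { danger, warn } = require('danger');

const docChanges = danger.git.modified_files
  .concat(danger.git.created_files)
  .filter((f) => f.startsWith('doc/') && f.endsWith('.md'));

if (docChanges.length > 0)
  warn(`${docChanges.length} doc file(s) changed; remember to run the link checker.`);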

@jasnell
Member

jasnell commented Aug 12, 2018

There's been zero activity on this in 11 months. I recommend closing.

@jasnell added the stalled label on Aug 12, 2018
@timaschew

timaschew commented Aug 12, 2018

I wrote a tool, similar to the html-proofer API, based on Node.
I wrote it because html-proofer is too slow for > 1000 pages.
Since my tool uses cheerio, which uses htmlparser2, it's super fast.

https://github.com/timaschew/link-checker

@vsemozhetbyt
Contributor

FWIW, the internal doc system is checked now (see #21889), so we only need external link validation.

@jasnell
Member

jasnell commented Oct 17, 2018

Still no actual activity on this; should we keep it open?

@silverwind
Contributor Author

silverwind commented Oct 17, 2018

I agree, better to close it then.
