Reflink-like extension #1562

Closed
Arteneko opened this issue Nov 3, 2019 · 18 comments

@Arteneko

Arteneko commented Nov 3, 2019

I'd like to add a footnotes reference tag ([^id]).

I assume it'd work like the link reference (reflink, in the source-code), but I don't really know how to extend the parser to add this custom tag.

How should I proceed to integrate a tag that is defined in two places, one building a reference body list and one replacing link tags with some HTML to point to the right reference?

@UziTech
Member

UziTech commented Nov 3, 2019

Can you provide some markdown and resulting html to illustrate what you are looking for?

@Arteneko
Author

Arteneko commented Nov 4, 2019

This is an example paragraph with a reference[^ref].

[...] below

[^ref]: This is the cite reference that will be listed at bottom of the article.

This would either expose a list of {id, text} objects, letting the user place them at the end of the page, or generate the following example HTML.

<p>This is an example paragraph with a reference<sup id="backref:ref"><a href="#ref:ref">2</a></sup>.</p>

<!-- [...] -->

<hr />

<ul>
  <li id="ref:ref">This is the cite reference that will be listed at bottom of the article.
 <a href="#backref:ref">&larrhk;</a></li>
</ul>
  • A stray [^id]: text definition is simply ignored, as it isn't referenced anywhere.
  • A stray [^id] reference is simply ignored, as it isn't defined anywhere.

Edit: The goal is not to integrate such a feature into marked, but rather to ask what the best way would be to integrate such a feature (including the complexity of references) into the parsing pipeline.

@UziTech
Member

UziTech commented Nov 4, 2019

There are three ways you can change the output of marked:

  1. Convert the markdown to html before sending it to marked:

    const marked = require('marked');
    const markdown = '...';
    
    const markdownWithRefs = convertRefsToHTML(markdown);
    
    const html = marked(markdownWithRefs);
  2. Change tokens before sending to the parser:

    const marked = require('marked');
    const markdown = '...';
    
    const tokens = marked.lexer(markdown);
    
    const tokensWithRefs = addRefTokens(tokens);
    
    const html = marked.parser(tokensWithRefs);
  3. Change html after sending the markdown to marked:

    const marked = require('marked');
    const markdown = '...';
    
    const html = marked(markdown);
    
    const htmlWithRefs = convertRefsToHTML(html);

It would probably be best to combine these approaches: preprocess the markdown (for example, parse out and remove the footnote definitions) before sending it to marked, convert the references to links in the tokens, and then add the footnotes back after marked has rendered the rest.
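As a rough illustration of the preprocessing half of that combined approach, here is a minimal sketch that does not call marked at all. The helper names `extractFootnotes` and `linkReferences` are made up for this example, and it assumes single-line definitions of the form `[^label]: text`.

```javascript
// Hypothetical helper (not part of marked): split single-line footnote
// definitions of the form "[^label]: text" out of a markdown string.
function extractFootnotes(markdown) {
  const definition = /^\[\^([^\]]+)\]:[ \t]+(.+)$/;
  const footnotes = [];
  const bodyLines = [];

  for (const line of markdown.split('\n')) {
    const match = definition.exec(line);
    if (match) {
      footnotes.push({ label: match[1], text: match[2] });
    } else {
      bodyLines.push(line);
    }
  }

  return { body: bodyLines.join('\n'), footnotes };
}

// Hypothetical helper: replace each "[^label]" that has a definition with a
// numbered superscript link; references without a definition are left as-is.
function linkReferences(body, footnotes) {
  return body.replace(/\[\^([^\]]+)\]/g, (whole, label) => {
    const index = footnotes.findIndex(f => f.label === label);
    if (index === -1) return whole;
    return `<sup id="fnref:${label}"><a href="#fn:${label}">${index + 1}</a></sup>`;
  });
}

const source = 'A claim.[^a] Another.[^missing]\n\n[^a]: The source.';
const { body, footnotes } = extractFootnotes(source);

console.log(linkReferences(body, footnotes));
console.log(footnotes);
```

The resulting `body` could then be handed to marked, and the collected `footnotes` rendered and appended afterwards.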

@Arteneko
Author

Arteneko commented Nov 5, 2019

I managed to write a custom wrapper in front of marked; it can probably be improved.

I decided to use the following format:

An <id> is of format a-zA-Z0-9_-
A reference link is of format [^<id>]
A reference block is of format ^<id>: <text>
A reference block cannot be multi-line

I originally wanted to handle multi-line blocks, but that would require treating a whole paragraph as a single reference block, i.e. treating two successive reference-block lines as a single line.

Basically, to say:

^ref: Test test
^ref2: Another test

=>
^ref: "Test test ^ref2: Another test"

Do you think it makes sense?

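// The original marked function, wrapped by the custom marked() below
const _marked = require('marked');

function marked(text) {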
    // Reference definition
    const refblockRe = /^\^([\w\-]+): (.+)$/gm;
    // Reference link
    const reflinkRe = /\[\^([\w\-]+)\]/g;
    // Defined references
    const refs = [];
    // New token list (stripped of reference blocks)
    const editedToks = [];

    // Lexing to remove paragraph-level blocks
    const toks = _marked.lexer(text);
    editedToks.links = toks.links;
    for (const tok of toks) {
        if (tok.type !== 'paragraph'
            || !tok.text.match(refblockRe)) {
            editedToks.push(tok);
            continue;
        }

        let matches;
        while ((matches = refblockRe.exec(tok.text)) !== null) {
            refs.push({
                id: refs.length + 1,
                selector: matches[1],
                paragraph: _marked(matches[2]),
            });
        }
    }

    let parsedHtml = _marked.parser(editedToks);

    const errors = {
        refToUndefinedSelector: [], // Reference link to undefined block
        unusedSelector: [], // Reference block that is never linked
    };
    // Every block that is defined, then used.
    const usedSelectors = [];
    // Every reference link that should be transformed or removed in the HTML
    const reflinkTransformations = [];
    // Parse and replace reflinks
    let match;
    while ((match = reflinkRe.exec(parsedHtml)) !== null) {
        const selector = match[1];
        const ref = refs.find(ref => ref.selector === selector);

        // If reference to undefined selector
        // (no blockref for corresponding selector)
        if (!ref) {
            errors.refToUndefinedSelector.push(selector);
            reflinkTransformations.push({
                mode: 'delete',
                startIndex: match.index,
                length: match[0].length, // length of the matched reflink, not of the whole input
            });

            continue;
        }

        usedSelectors.push(selector);
        reflinkTransformations.push({
            mode: 'replace',
            startIndex: match.index,
            length: match[0].length,
            id: ref.id,
            selector: ref.selector,
        });
    }

    // Check for unused selectors
    errors.unusedSelector = refs
        .filter(ref => !usedSelectors.includes(ref.selector));

    // Inverse-order browse (by position) to apply transformations without breaking indexes
    for (const transformation of reflinkTransformations.sort((a, b) => b.startIndex - a.startIndex)) {
        const {id, selector} = transformation;
        const replacementValue = transformation.mode === 'replace'
            ? `<sup id="backref:${selector}"><a href="#ref:${selector}">${id}</a></sup>`
            : '';
        const before = parsedHtml.slice(0, transformation.startIndex);
        const after = parsedHtml.slice(transformation.startIndex + transformation.length, parsedHtml.length);
        parsedHtml = before + replacementValue + after;
    }

    return {
        references: refs,
        html: parsedHtml,
        errors,
    };
}

Edit: I don't know if it is possible, but it'd be nice to be able to somehow "inject" custom routines, much like middlewares, into the compiler, to simplify extension.
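To make the middleware idea concrete, here is a small sketch of what such an injection mechanism might look like. Everything in it is hypothetical: `createPipeline`, the `preprocess`/`postprocess` hook names, and the stand-in `fakeRender` function are assumptions for illustration, not an existing marked API.

```javascript
// Hypothetical extension pipeline; `render` stands in for the real renderer.
function createPipeline(render) {
  const extensions = [];
  return {
    use(extension) {
      extensions.push(extension);
      return this; // allow chaining
    },
    run(markdown) {
      // preprocess hooks run in registration order on the markdown source
      const prepared = extensions.reduce(
        (text, ext) => (ext.preprocess ? ext.preprocess(text) : text),
        markdown
      );
      // postprocess hooks run in registration order on the rendered HTML
      return extensions.reduce(
        (html, ext) => (ext.postprocess ? ext.postprocess(html) : html),
        render(prepared)
      );
    },
  };
}

// Stand-in renderer so the sketch runs without marked installed.
const fakeRender = text => `<p>${text}</p>`;

const pipeline = createPipeline(fakeRender).use({
  preprocess: text => text.trim(),
  postprocess: html => html.replace(/\[\^(\w+)\]/g, '<sup>$1</sup>'),
});

console.log(pipeline.run('  Text with a note[^1]  '));
```

A footnote extension written this way would strip definitions in its `preprocess` hook and append the rendered list in its `postprocess` hook.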

@UziTech
Member

UziTech commented Nov 5, 2019

It looks like there is a spec for footnotes at markdownguide.org that uses [^label] syntax.

It looks like you have the right idea with replacing paragraph tokens. I would also run the body of the footnotes through marked so you can use markdown in the footnotes.

> it'd be nice to be able to somehow "inject" custom routines, much like middlewares, into the compiler, to simplify extension.

We have talked about adding some sort of marked.use(marked-extension) method to allow extensions to hook into the process but there isn't a PR to implement that yet.

If you want to create a PR I would be happy to review it. 😁 👍

@Arteneko
Author

Arteneko commented Nov 5, 2019

The issue with the markdownguide.org spec is that it's interpreted as a link, something I didn't like.

> I would also run the body of the footnotes through marked so you can use markdown in the footnotes.

That is already done; see:

            refs.push({
                id: refs.length + 1,
                selector: matches[1],
                paragraph: _marked(matches[2]),
            });

> If you want to create a PR I would be happy to review it.

I may look into that once I have a bit more time to myself.

Would generators (two yields) or passing the lexer/parser work for the extension mechanism? (That should probably be discussed in another thread.)

@Arteneko
Author

Arteneko commented Nov 7, 2019

I tried to convert my code to instead use the [^id]: paragraph syntax as recommended by the markdownguide (which also was how I originally saw it).

That would mean the lexer treats those blocks as links.

However, the current lexer implementation doesn't parse multi-word links (which is logical), so I'd need to change the lexer internally for that purpose.

Such a change would probably mean adding a references property to the lexer array, which makes me think something is poorly architected: we have an array with some plugged-on properties that have nothing to do with the array itself.

IMHO, some breaking changes should ultimately be done:

  • The lexer should return an object comprised of the tokens list, the links list, and the optionally new reference list. Even without this new feature, this format would allow for easier and much cleaner extension.
  • The parser should take an object at least comprised of the tokens list, and the links list (basically, what is required).

For now, I'll stay with my single-line ^id: paragraph style, which is easier to handle overall, but I definitely think something should change here, even if it isn't the data structure itself.
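To illustrate the proposed shape, here is a sketch of an adapter between the current array-with-properties form and a plain object. The names `toLexResult` and `toTokenArray` and the `references` field are hypothetical; nothing here is part of marked.

```javascript
// Hypothetical adapter: turn a token array with a plugged-on `links`
// property into a plain object with separate fields.
function toLexResult(tokenArray) {
  return {
    tokens: Array.from(tokenArray), // plain copy, without the extra properties
    links: tokenArray.links || {},
    references: tokenArray.references || [],
  };
}

// Going the other way, for a parser that still expects the array form.
function toTokenArray(lexResult) {
  const tokenArray = Array.from(lexResult.tokens);
  tokenArray.links = lexResult.links;
  return tokenArray;
}

// Hand-written stand-in for lexer output.
const legacy = [{ type: 'paragraph', text: 'Hello' }];
legacy.links = { example: { href: 'https://example.com', title: null } };

const result = toLexResult(legacy);
console.log(Object.keys(result));
console.log(toTokenArray(result).links === legacy.links);
```

With an object like this, an extension could add its own field (such as `references`) without attaching more properties to the array.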

@UziTech
Member

UziTech commented Nov 7, 2019

Here is an implementation of footnotes that follows the spec.

marked.lexer breaks the markdown into block tokens (paragraphs, code blocks, etc) so it won't change anything inside a bracket to a link until it goes to the parser.

This code removes the footnotes from the block tokens, including multi-line footnotes, and changes the references to html before parsing the tokens. After parsing it adds the footnotes back to the html.

This code is in no way complete. There are probably edge cases that will fail but this should be a good start.

const marked = require('marked');

const markdown = `
Here's a simple footnote,[^1] and here's a longer one.[^bignote]

[^1]: This is the first footnote.

[^bignote]: Here's one with multiple paragraphs and code.

    Indent paragraphs to include them in the footnote.

    \`{ my code }\`

    Add as many paragraphs as you like.
`;

const footnotes = [];
const newTokens = [];
const footnoteTest = /^\[\^[^\]]+\]: /;
const footnoteMatch = /^\[\^([^\]]+)\]: ([\s\S]*)$/;
const referenceTest = /\[\^([^\]]+)\](?!\()/g;

// get block tokens
const tokens = marked.lexer(markdown);

// remove footnotes from tokens
for (let i = 0; i < tokens.length; i++) {
  if (tokens[i].type !== 'paragraph' || !footnoteTest.test(tokens[i].text)) {
    newTokens.push(tokens[i]);
    continue;
  }

  const match = tokens[i].text.match(footnoteMatch);
  const name = match[1].replace(/\W/g, '-');
  let note = match[2];

  // multiline notes will be considered indented code blocks
  if (i + 2 < tokens.length && tokens[i + 2].type === 'code' && tokens[i + 2].codeBlockStyle === 'indented') {
    note += '\n\n' + tokens[i + 2].text;
    i += 2;
  }

  footnotes.push({
    name,
    note: `${marked(note)} <a href="#fnref:${name}">↩</a>`
  });
}

// change references to superscript links
for (let i = 0; i < newTokens.length; i++) {
  if (newTokens[i].type === 'paragraph' || newTokens[i].type === 'text') {
    newTokens[i].text = newTokens[i].text.replace(referenceTest, (ref, value) => {
      const name = value.replace(/\W/g, '-');
      let code = ref;
      for (let j = 0; j < footnotes.length; j++) {
        if (footnotes[j].name === name) {
          code = `<sup id="fnref:${name}"><a href="#fn:${name}">${j + 1}</a></sup>`;
          break;
        }
      }
      return code;
    });
  }
}

newTokens.links = tokens.links;

let html = marked.parser(newTokens);

// add footnotes back to html, giving each item the id that the
// reference links (#fn:name) point to
if (footnotes.length > 0) {
  html += `
<hr />
<ol>
${footnotes.map(f => `  <li id="fn:${f.name}">${f.note}</li>`).join('\n')}
</ol>
`;
}

console.log(html);

@cyanzhong

Hi @UziTech, thanks for providing this solution. I just tried it, and it doesn't work for me with the latest version.

Is that possibly related to the token changes?

@UziTech
Member

UziTech commented May 22, 2020

Yes, the tokens returned by marked.lexer changed in v1.0.0 so they are in a tree instead of in an array. You can use marked.walkTokens instead of the for loop to iterate over all of the tokens.
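To show what walking a token tree involves, here is a simplified stand-in walker that runs without marked installed. The `walk` function and the hand-written `tree` are assumptions for illustration: the walker only follows the `tokens` and `items` properties and is not a reimplementation of marked.walkTokens.

```javascript
// Simplified stand-in for a token-tree walker: visits each token, then its
// children. Following only `tokens` and `items` is an assumption about the
// tree shape made for this sketch.
function walk(tokens, callback) {
  for (const token of tokens) {
    callback(token);
    if (Array.isArray(token.tokens)) walk(token.tokens, callback);
    if (Array.isArray(token.items)) walk(token.items, callback);
  }
}

// Hand-written tree for illustration only.
const tree = [
  { type: 'paragraph', text: 'See[^1]', tokens: [{ type: 'text', text: 'See[^1]' }] },
  {
    type: 'list',
    items: [
      { type: 'list_item', text: 'item[^2]', tokens: [{ type: 'text', text: 'item[^2]' }] },
    ],
  },
];

// Collect every footnote label referenced from a text token.
const labels = new Set();
walk(tree, token => {
  if (token.type !== 'text') return;
  for (const match of token.text.matchAll(/\[\^([^\]]+)\]/g)) {
    labels.add(match[1]);
  }
});

console.log([...labels]);
```

A flat for loop over the top-level array would miss the reference inside the list item, which is why the tree needs to be walked.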

@chrisparnin

chrisparnin commented Jun 12, 2020

@cyanzhong @UziTech I updated that code to work with the newer token structure.

const marked = require('marked');

const markdown = `
Here's a simple footnote,[^1] and here's a longer one.[^bignote]

[^1]: This is the first footnote.

[^bignote]: Here's one with multiple paragraphs and code.
    \`my code\`
    Indent paragraphs to include them in the footnote.
    Add as many paragraphs as you like.
`;

const footnotes = [];
const newTokens = [];
const footnoteTest = /^\[\^[^\]]+\]: /;
const footnoteMatch = /^\[\^([^\]]+)\]: ([\s\S]*)$/;
const referenceTest = /\[\^([^\]]+)\](?!\()/g;

// get block tokens
const tokens = marked.lexer(markdown);

// Check footnote
function checkFootnote(token) {
    if (token.type !== 'paragraph' || !footnoteTest.test(token.text)) {
        return;
    }

    const match = token.text.match(footnoteMatch);
    const name = match[1].replace(/\W/g, '-');
    let note = match[2];

    footnotes.push({
        name,
        note: `${marked(note)} <a href="#fnref:${name}">↩</a>`
    });

    // remove footnotes from tokens
    token.toDelete = true;
}

function checkReference(token) {
    if (token.type === 'paragraph' || token.type === 'text') {
        token.text = token.text.replace(referenceTest, (ref, value) => {
            const name = value.replace(/\W/g, '-');
            let code = ref;

            for (let j = 0; j < footnotes.length; j++) {
                if (footnotes[j].name === name) {
                    code = `<sup id="fnref:${name}"><a href="#fn:${name}">${j + 1}</a></sup>`;
                    break;
                }
            }
            return code;
        });

        if (token.type === 'paragraph') {
            // Override children
            token.tokens = marked.lexer(token.text)[0].tokens;
        }
    }
}

function visit(tokens, fn) {
    for (var token of tokens) {
        fn(token);
        // Visit children
        if (token.tokens) {
            visit(token.tokens, fn);
        }
    }
}

visit(tokens, (token) => { checkFootnote(token); });

// Remove tokens from AST, starting with top-level
let workList = [tokens];
do {
    let tokenList = workList.pop();

    for (var i = tokenList.length - 1; i >= 0; i--) {
        if (tokenList[i].toDelete) {
            tokenList.splice(i, 1);
        } else if (tokenList[i].tokens) {
            workList.push(tokenList[i].tokens);
        }
    }
} while (workList.length != 0);

visit(tokens, (token) => { checkReference(token); });

let html = marked.parser(tokens);

if (footnotes.length > 0) {
  html += `
  <hr />
  <ol>
    <li>${footnotes.map(f => f.note).join('</li>\n  <li>')}</li>
  </ol>
  `;
}

console.log(html);

This is the output:

<p>Here&#39;s a simple footnote,<sup id="fnref:1"><a href="#fn:1">1</a></sup> and here&#39;s a longer one.<sup id="fnref:bignote"><a href="#fn:bignote">2</a></sup></p>

  <hr />
  <ol>
    <li><p>This is the first footnote.</p>
 <a href="#fnref:1">↩</a></li>
  <li><p>Here&#39;s one with multiple paragraphs and code.
    <code>my code</code>
    Indent paragraphs to include them in the footnote.
    Add as many paragraphs as you like.</p>
 <a href="#fnref:bignote">↩</a></li>
  </ol>

@jrandolf
Contributor

jrandolf commented Jun 12, 2020

Here is some faster code (~30% more ops/sec) that uses Marked's renderer option. Just place this before parsing any markdown and after importing.

Warning: Unlike the code above, this does not programmatically guarantee that a footnote exists for each reference.

const footnoteMatch = /^\[\^([^\]]+)\]:([\s\S]*)$/;
const referenceMatch = /\[\^([^\]]+)\](?!\()/g;
const referencePrefix = "marked-fnref";
const footnotePrefix = "marked-fn";
const footnoteTemplate = (ref, text) => {
  return `<sup id="${footnotePrefix}:${ref}">${ref}</sup>${text}`;
};
const referenceTemplate = ref => {
  return `<sup id="${referencePrefix}:${ref}"><a href="#${footnotePrefix}:${ref}">${ref}</a></sup>`;
};
const interpolateReferences = (text) => {
  return text.replace(referenceMatch, (_, ref) => {
    return referenceTemplate(ref);
  });
}
const interpolateFootnotes = (text) => {
  return text.replace(footnoteMatch, (_, value, text) => {
    return footnoteTemplate(value, text);
  });
}
const renderer = {
  paragraph(text) {
    return marked.Renderer.prototype.paragraph.apply(null, [
      interpolateReferences(interpolateFootnotes(text))
    ]);
  },
  text(text) {
    return marked.Renderer.prototype.text.apply(null, [
      interpolateReferences(interpolateFootnotes(text))
    ]);
  }
};
marked.use({ renderer });

If you want to parse footnotes in other locations, just use the following template and place this in the renderer object.

  [token_type](text) {
    return marked.Renderer.prototype[token_type].apply(null, [
      interpolateReferences(interpolateFootnotes(text))
    ]);
  }
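Since the renderer approach does not verify that each reference has a matching footnote, a separate check over the markdown source could be run beforehand. The following is only a sketch under that assumption: `checkFootnotes` is a hypothetical helper, independent of marked, that handles definitions starting at the beginning of a line.

```javascript
// Hypothetical pre-check, independent of the renderer: report references
// without a definition and definitions that are never referenced.
function checkFootnotes(markdown) {
  const defined = new Set();
  const referenced = new Set();

  for (const line of markdown.split('\n')) {
    const definition = /^\[\^([^\]]+)\]:/.exec(line);
    if (definition) defined.add(definition[1]);

    // a reference is "[^label]" that is not followed by ":" or "("
    for (const match of line.matchAll(/\[\^([^\]]+)\](?![:(])/g)) {
      referenced.add(match[1]);
    }
  }

  return {
    undefinedReferences: [...referenced].filter(label => !defined.has(label)),
    unusedDefinitions: [...defined].filter(label => !referenced.has(label)),
  };
}

const report = checkFootnotes([
  'Text[^a] and more[^b].',
  '',
  '[^a]: Defined and used.',
  '[^c]: Defined but never used.',
].join('\n'));

console.log(report);
```

Running this before rendering would surface dangling references without slowing down the renderer itself.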

@Reedo0910

@jun-sheaf Thanks for your solution! It works great!
I found that the footnote text has to contain a space, i.e. it cannot be a single word; otherwise it doesn't work. I don't have a clue why, though.

@georggrab

Thanks a lot @jun-sheaf , can also confirm this works like a charm. Here's a version that will additionally add a section "References" (styleable with css class "marked-footnotes" see "footnoteContainerTemplate" below) around the footnotes on the bottom (I'm used to other markdown implementations doing this).

const footnoteMatch = /^\[\^([^\]]+)\]:([\s\S]*)$/;
const referenceMatch = /\[\^([^\]]+)\](?!\()/g;
const referencePrefix = "marked-fnref";
const footnotePrefix = "marked-fn";
const footnoteTemplate = (ref, text) => {
  return `<sup id="${footnotePrefix}:${ref}">${ref}</sup>${text}`;
};
const footnoteContainerTemplate = (text) => {
  return `<div class="marked-footnotes"><h2>References</h2>${text}</div>`
}
const referenceTemplate = ref => {
  return `<sup id="${referencePrefix}:${ref}"><a href="#${footnotePrefix}:${ref}">${ref}</a></sup>`;
};
const interpolateReferences = (text) => {
  return text.replace(referenceMatch, (_, ref) => {
    return referenceTemplate(ref);
  });
}
const interpolateFootnotes = (text) => {
  const found = text.match(footnoteMatch)
  if (found) {
    const replacedText = text.replace(footnoteMatch, (_, value, text) => {
        return footnoteTemplate(value, text);
    });
    return footnoteContainerTemplate(replacedText)
  }
  return text
}

const renderer = {
  paragraph(text) {
    return marked.Renderer.prototype.paragraph.apply(null, [
      interpolateReferences(interpolateFootnotes(text))
    ]);
  },
  text(text) {
    return marked.Renderer.prototype.text.apply(null, [
      interpolateReferences(interpolateFootnotes(text))
    ]);
  }
};
marked.use({ renderer });

@thediveo

thediveo commented Aug 21, 2021

@talkdirty I'm trying to use your footnotes example #1562 (comment) with docsify. Unfortunately, it creates multiple "References" headings in my case within the same document, one for each footnote. From what I see, renderer.text gets called with the separate footnote paragraphs. Any idea?

Some footnote[^1] and[^2] here.

[^1]: and here's the footnote paragraph.

[^2]: and even more foot notes.
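One possible way around the repeated heading is to collect the footnote bodies first and emit a single container at the end, rather than wrapping each footnote paragraph as it is rendered. The sketch below shows the idea on plain strings; `renderFootnoteSection` is a hypothetical helper that takes already-split paragraph texts, and it is not wired into marked or docsify.

```javascript
// Hypothetical collector: gather footnote bodies first, then emit one
// "References" container after the regular paragraphs.
function renderFootnoteSection(paragraphs) {
  const footnoteMatch = /^\[\^([^\]]+)\]:\s*([\s\S]*)$/;
  const body = [];
  const items = [];

  for (const text of paragraphs) {
    const match = footnoteMatch.exec(text);
    if (match) {
      items.push(`<li id="marked-fn:${match[1]}">${match[2]}</li>`);
    } else {
      body.push(`<p>${text}</p>`);
    }
  }

  if (items.length === 0) return body.join('\n');

  const section =
    `<div class="marked-footnotes"><h2>References</h2><ol>${items.join('')}</ol></div>`;
  return body.join('\n') + '\n' + section;
}

const html = renderFootnoteSection([
  'Some footnote[^1] and[^2] here.',
  "[^1]: and here's the footnote paragraph.",
  '[^2]: and even more foot notes.',
]);

console.log(html);
```

In a real setup the collection step would have to happen across renderer calls, with the container appended once rendering has finished.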

@sernaferna

> Thanks a lot @jun-sheaf , can also confirm this works like a charm. Here's a version that will additionally add a section "References" (styleable with css class "marked-footnotes" see "footnoteContainerTemplate" below) around the footnotes on the bottom (I'm used to other markdown implementations doing this).
>
> ...

Worked perfectly for me! Thanks so much, @talkdirty!

Here's a slightly updated TypeScript version, in which I added an ol to the footnoteContainerTemplate and li to the footnoteTemplate, as well as a link from the footnote back to the reference, to more closely match how GitHub does it.

import { marked } from 'marked';

const footnoteMatch = /^\[\^([^\]]+)\]:([\s\S]*)$/;
const referenceMatch = /\[\^([^\]]+)\](?!\()/g;
const referencePrefix = 'marked-fnref';
const footnotePrefix = 'marked-fn';

const footnoteTemplate = (ref: string, text: string) => {
  return `<li id="${footnotePrefix}:${ref}">${marked.parseInline(
    text
  )} <a href="#${referencePrefix}:${ref}">↩</a></li>`;
};
const footnoteContainerTemplate = (text: string) => {
  return `<div class="marked-footnotes"><h2>Footnotes</h2><ol>${text}</ol></div>`;
};
const referenceTemplate = (ref: string) => {
  return `<sup id="${referencePrefix}:${ref}"><a href="#${footnotePrefix}:${ref}">${ref}</a></sup>`;
};
const interpolateReferences = (text: string) => {
  return text.replace(referenceMatch, (_, ref) => {
    return referenceTemplate(ref);
  });
};
const interpolateFootnotes = (text: string) => {
  const found = text.match(footnoteMatch);
  if (!found) {
    return text;
  }
  const replacedText = text.replace(footnoteMatch, (_, value, text) => {
    return footnoteTemplate(value, text);
  });
  return footnoteContainerTemplate(replacedText);
};

export const footnotes: Partial<Omit<marked.Renderer<false>, 'options'>> = {
  paragraph(text) {
    return marked.Renderer.prototype.paragraph.apply(null, [interpolateReferences(interpolateFootnotes(text))]);
  },
  text(text) {
    return marked.Renderer.prototype.text.apply(null, [interpolateReferences(interpolateFootnotes(text))]);
  },
};

I define my marked extensions in separate files from each other, but of course it was imported and used by marked the usual way:

marked.use({ renderer: footnotes });

Note that I also had to tweak the TypeScript types, since @types/marked doesn't like the null first parameter to those last two apply() method calls.

@sdbbs

sdbbs commented Jul 24, 2023

Tried the code in #1562 (comment) - but it does not resolve footnotes in unnumbered lists, how to get that? Also if you use a URL in the footnote text it gets broken on a new line (actually, it seems that [^1] See https://en.wikipedia.org/wiki, will cause the link parser to interpret this as a link, before the reference/footnote engine can process it) so irritating...

@sernaferna

> Tried the code in #1562 (comment) - but it does not resolve footnotes in unnumbered lists, how to get that? Also if you use a URL in the footnote text it gets broken on a new line (actually, it seems that [^1] See https://en.wikipedia.org/wiki, will cause the link parser to interpret this as a link, before the reference/footnote engine can process it) so irritating...

I've made some updates to my codebase since then, and simplified my footnote extension(s). Give the following a shot:

import { marked } from 'marked';

const fnRefRE = /^\[\^([^\]]+)\](?!:)/;
const fnRE = /^\[\^([^\]]+)\]: /;

export const FootnoteRefExtension: marked.RendererExtension | marked.TokenizerExtension = {
  name: 'FootnoteRefExtension',
  level: 'inline',
  start(src) {
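    // fnRefRE is anchored with ^ and can only match at index 0 (and `0 || -1` is -1),
    // so search for the next possible start of a reference instead
    return src.match(/\[\^/)?.index;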
  },
  tokenizer(src, tokens) {
    const refMatch = fnRefRE.exec(src);
    if (!refMatch) {
      return;
    }

    const refToken: marked.Tokens.Generic = {
      type: 'FootnoteRefExtension',
      raw: refMatch[0],
      ref: refMatch[1],
    };
    return refToken;
  },
  renderer(token) {
    return `<sup><a href="#user-content-fn-${token.ref}" id="user-content-fnref-${token.ref}">${token.ref}</a></sup>`;
  },
};

export const Footnotes: Partial<Omit<marked.Renderer<false>, 'options'>> = {
  paragraph(text) {
    const fnMatch = fnRE.exec(text);
    if (!fnMatch) {
      return false;
    }

    let returnString = '<ul><li>';
    returnString += text.replace(fnMatch[0], '');
    returnString += ` <a href="#user-content-fnref-${fnMatch[1]}" id="user-content-fn-${fnMatch[1]}">↩</a>`;
    returnString += '</li></ul>';

    return returnString;
  },
};

I just confirmed that the following markdown works:

* list[^1] item
* list item

---
### Footnotes

[^1]: A footnote
