Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Markdown not escaping MD special characters #1215

Open
exiva opened this issue Jun 25, 2024 · 9 comments
Open

Markdown not escaping MD special characters #1215

exiva opened this issue Jun 25, 2024 · 9 comments

Comments

@exiva
Copy link

exiva commented Jun 25, 2024

Error details
When using markdown in fpdf2 it should respect escaped characters like underscores and asterisks. It does not currently.

Minimal code
Please include some minimal Python code reproducing your issue:

from fpdf import FPDF

pdf = FPDF(format=(101.6, 152.4))
pdf.set_margin(2)
pdf.add_page()
pdf.set_font("Times", size=16)

txt = "**This** is a markdown test. Theres\__double underscores"
pdf.multi_cell(w=50, text=txt, align='L', new_x='RIGHT',
                new_y='TOP', border=0, markdown=True,)

pdf.output("md.pdf")

Should output "This is a markdown test. Theres__double underscores" but outputs
Screenshot 2024-06-25 at 10 04 24 AM

Environment
Please provide the following information:

  • Operating System: macOS 14.5 Apple Silicon M2 Mac Mini
  • Python version: 3.11.9
  • fpdf2 version used: 2.7.9
@exiva exiva added the bug label Jun 25, 2024
@gmischler
Copy link
Collaborator

Thanks for reporting this, and welcome to fpdf2, @exiva !

Well, the documentation doesn't promise that our markup parser would honor backslash escapes, so technically this isn't a bug... 😉

it is a perfectly valid feature request, though!
Any chance you might want to submit a PR?

@andersonhc
Copy link
Collaborator

andersonhc commented Jun 25, 2024

If you want a head start, markdown markers are handled on fpdf _parse_chars() function.

fpdf2/fpdf/fpdf.py

Lines 3500 to 3515 in d574b07

if markdown:
if (
is_marker
and (not txt_frag or txt_frag[-1] != half_marker)
and (len(text) < 3 or text[2] != half_marker)
):
if txt_frag:
yield frag()
if text[:2] == self.MARKDOWN_BOLD_MARKER:
in_bold = not in_bold
if text[:2] == self.MARKDOWN_ITALICS_MARKER:
in_italics = not in_italics
if text[:2] == self.MARKDOWN_UNDERLINE_MARKER:
in_underline = not in_underline
text = text[2:]
continue

@gmischler
Copy link
Collaborator

Implemented in #1224

@Lucas-C
Copy link
Member

Lucas-C commented Aug 19, 2024

Hi.

I'd like to reopen this issue to further discuss this.

I may be mistaken, but the escaping implemented in PR #1224 does not seem to conform to the CommonMark standard.

You can test the rendering of \\**Lorem\\** \\\\__Ipsum\\\\__ --dolor-- in https://spec.commonmark.org/dingus/

The result is this:
image

Escaping double underscores in Markdown should be done this way:
\_\_Ipsum\_\_

What do you think @gmischler, @exiva & @david-fed?

@gmischler
Copy link
Collaborator

gmischler commented Aug 19, 2024

Escaping double underscores in Markdown should be done this way:
\_\_Ipsum\_\_

In "standard" markdown (including commonmark), escaping individual marker characters is necessary because both a single character as well as a double character sequence can act as a marker.

In our own decidedly non-standard markdown implementation, all markers consist of double character sequences. This has the advantage of reducing conflicts with actual text, which is why I hope we'll stick to the concept for any future feature additions. It also means that it is fundamentally incompatible with every other markdown implementation out there (which could be seen both as an advantage or disadvantage).
But the key consequence here is that there is no need - and no real reason - to adhere to any standard markdown escape specification. So as far as I am concerned, the current implementation is perfectly fine.

Added: The bug reported in #1236 obviously still needs to get fixed.

@Lucas-C
Copy link
Member

Lucas-C commented Aug 19, 2024

he key consequence here is that there is no need - and no real reason - to adhere to any standard markdown escape specification.

I do not quite agree with that 🙂

Yes, you are right, currently our Markdown implementation is relatively limited and far from following any standard.

However, I see several paths forward:

  1. we aim to progressively get closer to the standard CommonMark spec, which is why I find it problematic that fpdf2 currently renders \\\\__Ipsum\\\\__ the way it does, as it is not compatible with the CommonMark spec on backslash escapes: https://spec.commonmark.org/0.31.2/#backslash-escapes
  2. we drop the Markdown rendering feature in fpdf2 (in v2.8.0 ?), and simply recommend that users use mistletoe or any other Markdown-rendering library instead, because that's not really the goal of fpdf2 (which is what I do most of the time)
  3. we continue to have a limited / "broken" implementation of Markdown in fpdf2

IMHO we should aim for option 1. or 2., but not for option 3., which is the least interesting for fpdf2 in the long term.

@gmischler
Copy link
Collaborator

  1. we aim to progressively get closer to the standard CommonMark spec, which is why I find it problematic that fpdf2 currently renders \\\\__Ipsum\\\\__ the way it does, as it is not compatible with the CommonMark spec on backslash escapes: https://spec.commonmark.org/0.31.2/#backslash-escapes

Given the fundamental conceptual incompabilities and the way we currently handle deprecated features (without a clear deprecation strategy/policy), this would take decades and they would never actually be the same. It is better to keep it clearly distinct from commonmark, to avoid the two getting confused.

  1. we drop the Markdown rendering feature in fpdf2 (in v2.8.0 ?), and simply recommend that users use mistletoe or any other Markdown-rendering library instead, because that's not really the goal of fpdf2 (which is what I do most of the time)

  2. we continue to have a limited / "broken" implementation of Markdown in fpdf2

  1. Accept that our own markdown implementation is "special" (as mentioned, this actually has certain practical advantages vis-a-vis commonmark).

  2. If someone really absolutely wants a standards compliant markdown feature, add a "commonmark" parameter in parallel, which uses one of the several open source Python commonmark parsers, offering full compability. We have discussed this option before and you seemed to agree that it is a good idea.

@Lucas-C
Copy link
Member

Lucas-C commented Aug 20, 2024

5. If someone really absolutely wants a standards compliant markdown feature, add a "commonmark" parameter in parallel, which uses one of the several open source Python commonmark parsers, offering full compability. We have discussed this option before and you seemed to agree that it is a good idea.

In that comment I wrote:

So I'm not really sure of the path forward regarding Markdown support...

😅

We already have some documentation on how to use a Markdown library combined with fpdf2:
https://py-pdf.github.io/fpdf2/CombineWithMistletoeoToUseMarkdown.html

I'm really not sure that fpdf2 should include code specific to any Markdown renderer / CommonMark parser...
And even less sure that we should support "our own Markdown flavor" AND also be compatible with CommonMark, with an optional dependency...
This sounds a bit like feature creep to me.

But I'm honestly curious to know what end-users think about it:

  • should we keep our "very basic Markdown flavor"
  • should we get rid of it and better document how to use other Markdown libraries, that do a far better job than us, in conjunction with fpdf2?

Maybe we could start a GitHub discussion poll about this.

@gmischler
Copy link
Collaborator

So I'm not really sure of the path forward regarding Markdown support...

Ah, so I was overinterpreting your statements back then. Sorry for that.

All the same, making our markdown implementation "more like commonmark" in any meaningful way would require to change the fundamental concept of its syntax: Namely to use double characters for all formatting markers. Given our general promise to stay backwards compatible, often at great cost, I don't see this as a viable option.

So if we do want a more standard implementation (which essentially equals some flavour of commonmark), then there is no real alternative to adding it as an alternative in parallel.
If at the same time we don't want to bind ourselfes to any particular flavour/implementation of commonmark, maybe that would be another case for the data adapter concept? This one would need to be quite tightly integrated with our text preprocessing machinery, but I think it could reasonably be done.

@Lucas-C Lucas-C mentioned this issue Oct 4, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants