MSC1722: Support for displaying math(s) in messages #1722

Closed
wants to merge 4 commits into from


uhoreg
Member

@uhoreg uhoreg commented Nov 15, 2018

Rendered

I keep switching between preferring MathML and preferring LaTeX. This proposal uses MathML, but investigates LaTeX as an alternative in enough detail to hint at what a fully worked-out LaTeX proposal might look like.

This proposal is related to element-hq/element-web#1945

@uhoreg
Member Author

uhoreg commented Nov 15, 2018

FWIW, to me, the question of whether to use LaTeX or MathML boils down to: would we rather make clients deal with the potential security nightmare of processing a Turing-complete language, or would we rather use a format that displays horribly for clients that don't understand it?

@uhoreg uhoreg changed the title Support for displaying math(s) in messages MSC1722: Support for displaying math(s) in messages Nov 15, 2018
@Evidlo

Evidlo commented Nov 21, 2018

MathML should also be listed as a supported input for KaTeX: KaTeX/KaTeX#593

@uhoreg
Member Author

uhoreg commented Nov 21, 2018

MathML should also be listed as a supported input for KaTeX: KaTeX/KaTeX#593

That issue seems to be about rendering to MathML, rather than rendering from MathML. I don't see anything about using MathML as an input to KaTeX.

Member

@turt2live turt2live left a comment


I love the detail into the alternatives here. Based on the information presented, Presentation MathML does seem to be the right answer, I think.

proposals/1722-math.md
@uhoreg
Member Author

uhoreg commented Nov 28, 2018

Summarizing some considerations with MathML vs LaTeX:

  • accessibility: Presentation MathML is not that great for accessibility, since it loses some of the semantics (e.g. $\binom{1}{2}$ becomes a fraction with linethickness=0, which loses the meaning of the expression). But some screen readers support MathML natively, and MathJax and KaTeX use MathML for accessibility when rendering LaTeX. I think that for some math, MathML would have better accessibility, whereas for others, presenting raw LaTeX code would be better, so there is no clear winner here, IMHO.
  • fallbacks: Using MathML would force clients that don't support math to parse the HTML in order to ignore the MathML and extract the fallback information, whereas with LaTeX, the LaTeX code itself could be a reasonable fallback, or it can be embedded in a way that the fallback doesn't need any special processing from the clients. I think that LaTeX is the clear winner here.
  • implementing display: if a client wants to display math that it receives and can't use a suitable library, how hard would that be to implement? Implementing full support for either is pretty hard. However, the vast majority of math that people would send could be handled by supporting a smaller subset and using a fallback rendering for anything outside it. Most languages have an HTML parser (and if a client is already displaying the formatted_body, then it should have access to an HTML parser), so supporting MathML would be a matter of supporting individual MathML elements and attributes, which can be done incrementally. However, for LaTeX, it is likely that a client author would need to implement a TeX parser from scratch, and although it would probably not need to be a full TeX parser (e.g. it wouldn't need to implement things like \makeatletter), this would be extra work. This, along with the security considerations below, may discourage client authors from trying to implement math support. MathML seems to be the winner here.
  • implementing math input: If clients want to present some sort of fancy GUI for entering math, either format is probably equally easy to generate (although there may be existing libraries for input that will generate one but not the other). However, most clients will likely want to allow users to enter math using LaTeX since it is already a common format for math input, and is not too difficult to use. So clearly, if we use MathML, the client would need to convert from LaTeX to MathML, whereas if we use LaTeX, then the client does not need to do any conversion. LaTeX is the winner here.
  • security: MathML is just XML and does not introduce any new vulnerabilities, since clients are parsing HTML anyways. On the other hand, LaTeX is Turing complete, and malicious input could cause a LaTeX implementation that is not carefully written to behave unexpectedly. Most well-known libraries should be safe (with the notable exception that using latex itself would be very unsafe), but new implementations may be problematic. MathML seems to be the winner here.

MathML and LaTeX both have advantages and disadvantages. I think that the answer to which one is the best to use depends on the relative weights assigned to different factors.
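
To make the "implementing display" point concrete, here is a minimal sketch of incremental Presentation MathML support that renders a few elements to plain text and uses a fallback string for anything else. The names `render`, `render_or_fallback` and `Unsupported` are hypothetical, the handled subset (`math`, `mrow`, `mi`, `mn`, `mo`, `msup`, `mfrac`) is an arbitrary choice for illustration, and reading `linethickness="0"` as "choose" is only a guess, which is exactly the lost-semantics problem noted under accessibility. It is not taken from the proposal.

```python
import xml.etree.ElementTree as ET


class Unsupported(Exception):
    """Raised when a fragment uses an element this sketch does not handle."""


def local(tag):
    # Drop an XML namespace prefix such as "{http://www.w3.org/1998/Math/MathML}".
    return tag.rsplit("}", 1)[-1]


def render(node):
    """Render a small subset of Presentation MathML as plain text."""
    tag = local(node.tag)
    kids = [render(child) for child in node]  # unsupported children propagate
    if tag in ("mi", "mn", "mo"):
        return (node.text or "").strip()
    if tag in ("math", "mrow"):
        return " ".join(kids)
    if tag == "msup" and len(kids) == 2:
        return f"{kids[0]}^{kids[1]}"
    if tag == "mfrac" and len(kids) == 2:
        if node.get("linethickness") == "0":
            # Assumption: a line-less fraction is a binomial coefficient.
            return f"{kids[0]} choose {kids[1]}"
        return f"({kids[0]})/({kids[1]})"
    raise Unsupported(tag)


def render_or_fallback(mathml, fallback):
    """Use the sender's fallback text when the MathML cannot be rendered."""
    try:
        return render(ET.fromstring(mathml))
    except (Unsupported, ET.ParseError):
        return fallback
```

Support for further elements can be added one branch at a time without touching the rest. A real client receiving untrusted input would probably want a hardened XML or HTML parser rather than the bare standard-library one used here.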

@Half-Shot
Contributor

Half-Shot commented Nov 29, 2018

So for fallbacks, I'm strongly inclined to want a thumbnail (optionally) as a fallback mechanism for dumb clients at the risk of it looking a little unclean. That might help solve the ugliness around MathML fallbacks, although admittedly clients need to know to display the fallback (extensible events where art thou?).

Looking at the languages side by side though, I am deeply concerned with how verbose MathML is and it would likely bother folks to have to write a load of XML for a similar string of LaTeX. That's actually quite a major drawback for me as I don't see clients willing to spend a long time implementing complex UIs (at least initially) and this might severely reduce the practicality of this event.

@Half-Shot
Contributor

Hey @uhoreg, do you have a status update for this?

@uhoreg
Member Author

uhoreg commented Jan 18, 2019

Hey @uhoreg, do you have a status update for this?

I've been focusing on E2E stuff lately. I'll probably try to poke at this again in Feb.

However, this only works for certain notation when using only the subset of
HTML allowed by Matrix, and requires that users have a font installed that
supports the necessary characters. Most importantly, one cannot write
matrices using this method, and failing to support matrices in a protocol

🤣

@Ralith
Contributor

Ralith commented Jul 10, 2019

I am deeply concerned with how verbose MathML is and it would likely bother folks to have to write a load of XML for a similar string of LaTeX.

What about treating LaTeX as an input method for MathML, in the same vein that markdown is an input method for HTML-formatted messages?

@saad440

saad440 commented Jul 11, 2019

I am partial towards implementing it on the client side. We just transfer the math as LaTeX, which saves bandwidth, and we leave it up to the client whether it wants to render it as math or just leave it as it is. This way it works at least for those who are used to communicating mathematical expressions regularly.
As to which markup to use, there are a few other options than simply $ and \( \).

  • $$ expression $$ which can avoid false positives although it is now deprecated TeX syntax.
  • `$ expression $` which will enclose it in code tags which we can post-process to look for Math expressions.

There are other approaches listed at cben/mathdown/wiki/math-in-markdown, Yihui Xie / 2018-07-22 and one working example at Upmath.me.

@uhoreg
Member Author

uhoreg commented Jul 12, 2019

We're not going to debate delimiters in here, because in this part of the Math support, we're not going to be using plain text delimiters. The plain text message format should be left as-is by clients, and the HTML format should use HTML-like tags. If we embed LaTeX in messages, it will probably be something like <mx-maths>\sin x^2</mx-maths> in the message format. It's fine to debate delimiters in the Riot issue as far as it pertains to users typing in the math, but for the part that this proposal addresses, it's out of scope.

Preferring LaTeX over MathML due to bandwidth considerations is fine, but there are a lot of other considerations, which we need to weigh bandwidth usage against. I like LaTeX too, but there are many reasons why it could end up being problematic. I'd love it if someone could come up with a compelling case as to why LaTeX is objectively better than MathML that addresses all the downsides of LaTeX.
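
As a rough illustration of the tag-based embedding mentioned above, here is a sketch of how a client might pull the LaTeX out of such a tag using only an HTML parser. The tag name `mx-maths` is just the "something like" example from this comment, not a settled format, and `MathsExtractor` and `extract_maths` are hypothetical names.

```python
from html.parser import HTMLParser


class MathsExtractor(HTMLParser):
    """Collect the raw LaTeX found inside <mx-maths> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fragments = []
        self._buffer = None  # None while outside an <mx-maths> element

    def handle_starttag(self, tag, attrs):
        if tag == "mx-maths":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "mx-maths" and self._buffer is not None:
            self.fragments.append("".join(self._buffer))
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_maths(formatted_body):
    """Return the LaTeX source of every <mx-maths> element, in order."""
    parser = MathsExtractor()
    parser.feed(formatted_body)
    parser.close()
    return parser.fragments
```

Under these assumptions, `extract_maths(r"Consider <mx-maths>\sin x^2</mx-maths> here")` yields the LaTeX source unchanged, which a client could hand to a renderer or show verbatim.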

@eMPee584

eMPee584 commented Apr 3, 2020

[…] this would be super sweet to have right here right now 🤣

@turt2live turt2live added the kind:feature MSC for not-core and not-maintenance stuff label Apr 20, 2020
can support LaTeX, asciimath, and MathML inputs) and KaTeX (which can support LaTeX
inputs).
* Firefox and WebKit support MathML natively (though not perfectly, especially
with Content MathML), but Chrome and IE/Edge do not.

Member Author


Oooh. Nice!


https://www.igalia.com/chats/ecosystem-health-ii

Can we actually make something that’s interoperable here?” At the time the MathML spec was very hand wavy, you know, the test suite was not very rigorous. And so we just pointed at the guidelines we already had for what bar do we apply when deciding whether something was mature enough to ship. And, frankly, we were worried that Igalia was underestimating how much work it was going to take to get some subset of MathML to that bar, and we were worried that people would have, you know, some different expectations that, you know… Maybe if they could do a little bit of work, they could get to that bar… and that they would be disappointed when it came to an intent to ship. And so, we tried to spell it very clearly where we saw the risks with trying to meet that high bar that we have for interoperability. And now, the consensus is pretty clear that Igalia really stepped up, right? We were clear on what it would take to ship MathML, Igalia did the hard thankless work… With Google employees reviewing many of the patches - but Igalia did a lot of work to meet that bar… and I think, you know, I think the jury is still out on, well.. I don’t know what the current frames are looking like, but I fully expect MathML to ship at some point and it’ll ship in Chrome at the same time… Even though from Google’s business perspective, it probably wouldn’t have been a good return on investment for us to do it… But I’m thrilled that Igalia was able to do it, even though our judgment would have been not worth it.

Sounds like great news.

@ArniDagur

ArniDagur commented Oct 16, 2020

I am strongly in favour of transporting raw TeX math notation, instead of MathML.

For me, it boils down to the question: In a plain text environment (e.g. command line IRC clients), how do people communicate math? In my experience, for simple things, people use "programming syntax", e.g. ((1 + 3) * 8)^2 / 2 = 512. For anything more complicated, they either use TeX math notation (e.g. \sum_{i=1}^n i = \frac{n(n+1)}{2}), or perhaps most commonly a mix of both (e.g. \sum_{i=1}^n i = n(n+1) / 2), using TeX only for the parts that can't be described using the aforementioned "programming syntax".

Thus, I would imagine that the developer of a terminal-only Matrix client would like to display maths in raw TeX notation, if not simplifying to "programming syntax" whenever possible (e.g. \frac{n(n+1)}{2} becomes n(n+1)/2). That's what the user expects after all.
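
A rough sketch of the kind of simplification described above, for illustration only: it rewrites `\frac{a}{b}` as `a/b`, adding parentheses where the numerator has a top-level `+` or `-` or the denominator is more than a single alphanumeric token. The function names are hypothetical, escaped braces such as `\{` are not handled, and no attempt is made to fix precedence when a fraction appears as an exponent or subscript.

```python
def read_group(s, i):
    """Return (content, next_index) for the brace group starting at s[i]."""
    if i >= len(s) or s[i] != "{":
        raise ValueError("expected '{'")
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "{":
            depth += 1
        elif s[j] == "}":
            depth -= 1
            if depth == 0:
                return s[i + 1:j], j + 1
    raise ValueError("unbalanced braces")


def has_top_level_sum(expr):
    """True if expr contains '+' or '-' outside any bracket pair."""
    depth = 0
    for ch in expr:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch in "+-" and depth == 0:
            return True
    return False


def simplify_frac(tex):
    """Rewrite \\frac{a}{b} as a/b, leaving all other TeX untouched."""
    out = []
    i = 0
    while i < len(tex):
        if tex.startswith("\\frac{", i):
            try:
                num, j = read_group(tex, i + 5)  # i + 5 is the first '{'
                den, k = read_group(tex, j)
            except ValueError:
                out.append(tex[i])  # malformed: keep the source verbatim
                i += 1
                continue
            num, den = simplify_frac(num), simplify_frac(den)
            if has_top_level_sum(num):
                num = f"({num})"
            if not den.isalnum():
                den = f"({den})"
            out.append(f"{num}/{den}")
            i = k
        else:
            out.append(tex[i])
            i += 1
    return "".join(out)
```

With these rules, `\frac{n(n+1)}{2}` becomes `n(n+1)/2`, and anything the function does not recognise stays as raw TeX.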

If maths is transferred as MathML instead of the TeX one might expect, the maintainer of said terminal client would have to implement a converter from MathML to TeX, which AFAIK is a non-trivial task. Even if implemented, I'm also concerned that a TeX -> MathML -> TeX round-trip may not be entirely lossless. Since the sender's client probably uses TeX to input math, they probably expect that any clients that can't display TeX would just receive the same TeX code they sent verbatim; a lossy conversion from TeX to MathML and back is not what they'd expect or want.

Regarding security, I feel like talking about the Turing completeness of LaTeX, the monolithic typesetting framework, is not especially relevant to a discussion regarding TeX as maths notation. The common subset of LaTeX's maths notation that libraries such as MathJax and KaTeX support is not Turing complete. Furthermore, MathML has actually been the culprit behind numerous security vulnerabilities. Just last week, for example, I read about a bug in DOMPurify that used MathML to enable cross-site scripting. Google even dropped support for MathML from its Chromium browser, citing security and maintainability concerns.

@Cadair
Contributor

Cadair commented Oct 16, 2020

This MSC should probably be closed. The momentum is definitely behind the counterproposal for LaTeX, #2191, which has WIP implementations for element-web and -android.

@uhoreg
Member Author

uhoreg commented Oct 19, 2020

Good point. Given that we've had at least three implementations of #2191, and none for this, it seems #2191 is the one that most people want. I'm withdrawing this and renaming #2191 to be more generic.

@uhoreg uhoreg closed this Oct 19, 2020
@uhoreg uhoreg added abandoned A proposal where the author/shepherd is not responsive and removed proposal-in-review labels Oct 19, 2020
Labels
abandoned A proposal where the author/shepherd is not responsive kind:feature MSC for not-core and not-maintenance stuff proposal A matrix spec change proposal