Add cleanup action for "LaTeX to LaTeX aware Unicode" #8715

@JasonGross

Problem:

  • There is no cleanup action that converts (old) bibliographic data, still written with LaTeX escape commands instead of Unicode characters, to Unicode aware LaTeX formatting. Newer LaTeX setups (e.g. recent LaTeX2e releases) can now read most Unicode characters directly.
  • Current workarounds include converting from LaTeX to Unicode and then back to LaTeX, while manually checking whether any characters were wrongly converted. This is inefficient and slow.

Desired Solution:

  • Create a cleanup action for "LaTeX to Unicode aware LaTeX".

Example workflow:

  1. Have the following entry (BEFORE using the cleanup action):

    @Article{Testkey,
      author   = {Testauthor},
      title    = {Bibliographic data that can be read by LaTeX engines},
      a = {Here is a backslashed percentage sign \% and it should be excluded from conversion},
      b = {Here is a \textcopyright{} and it should be converted to Unicode}, 
    }
    

    (Comment: with the inputenc package, a © in the source is treated as equivalent to \textcopyright{}, so the "LaTeX to Unicode aware LaTeX" cleanup action should convert \textcopyright{} to ©.)

  2. Use cleanup action "LaTeX to Unicode aware LaTeX"

  3. AFTER using the cleanup action, the following result should emerge:

    @Article{Testkey,
      author   = {Testauthor},
      title    = {Bibliographic data that can be read by LaTeX engines},
      a = {Here is a backslashed percentage sign \% and it should be excluded from conversion},
      b = {Here is a © and it should be converted to Unicode}, 
    }
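
The workflow above can be sketched in a few lines. This is only an illustration of the requested behaviour, not JabRef code: the function name `to_unicode_aware_latex` and the tiny `COMMAND_TO_UNICODE` table are assumptions made for the example, and a real table would be far larger.

```python
import re

# Hypothetical, deliberately tiny mapping (assumption for this sketch).
COMMAND_TO_UNICODE = {
    "textcopyright": "©",
    "textregistered": "®",
}

# Either a control word (backslash + letters, optionally followed by an
# empty group or by the blanks that LaTeX would swallow), or a control
# symbol (backslash + one non-letter, e.g. \% or \&).
TOKEN = re.compile(r"\\([A-Za-z]+)(?:\{\}|[ \t]+)?|\\[^A-Za-z]", re.DOTALL)


def to_unicode_aware_latex(value):
    """Replace known LaTeX commands by Unicode, keep everything else."""

    def replace(match):
        name = match.group(1)
        if name is None:
            # Control symbol such as \% : excluded from conversion.
            return match.group(0)
        # Unknown control words are returned unchanged.
        return COMMAND_TO_UNICODE.get(name, match.group(0))

    return TOKEN.sub(replace, value)


print(to_unicode_aware_latex(r"Here is a \textcopyright{} and a \% sign"))
# prints: Here is a © and a \% sign
```

Matching control symbols as their own token means `\%` is consumed and returned as-is, so it can never be mistaken for a convertible command.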
    

"Special Symbols" that would need to be excluded from conversion:

  • The list should be similar to the symbols mentioned in Add integrity check for LaTeX special characters #8712.
  • At the very least, Table 1 (Page 15) of The Comprehensive LaTeX Symbol List, which lists the escapable special characters in LaTeX.
  • Maybe also Table 2 (Page 15) and Table 3 (Page 16).
  • There might be a lot more, but I am not knowledgeable enough to list them here. If you know of any, just post them in this thread.
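
To replace the manual "was anything wrongly converted?" check, the exclusion list could double as an automated check. The sketch below assumes that Table 1 covers the seven characters `$ % _ } & # {`; the names `ESCAPABLE_SPECIALS`, `escaped_specials` and `specials_preserved` are made up for this example.

```python
import re

# Assumption: the escapable special characters from Table 1.
ESCAPABLE_SPECIALS = "$%_}&#{"


def escaped_specials(value):
    """Return every backslash-escaped special character, in order."""
    # Tokenising on "backslash + one character" makes sure that a double
    # backslash is consumed as one unit and not misread as an escape.
    following = re.findall(r"\\(.)", value, flags=re.DOTALL)
    return ["\\" + ch for ch in following if ch in ESCAPABLE_SPECIALS]


def specials_preserved(before, after):
    """True if a conversion left all escaped special characters intact."""
    return escaped_specials(before) == escaped_specials(after)


print(specials_preserved(r"a \% b \textcopyright{}", r"a \% b ©"))  # True
print(specials_preserved(r"a \% b", "a % b"))                       # False
```

Adding the symbols from Table 2 and Table 3 would need a list of command names rather than single characters, which this sketch does not attempt.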

Additional Information

  • When working on this, The Comprehensive LaTeX Symbol List will be of help, especially the chapters on "Unicode" (Page 272) and "Special Characters" (Pages 15-16).
  • JabRef currently uses https://github.com/tomtung/latex2unicode. Maybe it can be adapted internally in JabRef (e.g. with some pre-processing). Another solution would be to fork it, or to ask tomtung about creating a LaTeX2UnicodeAwareLaTeX converter.
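
The pre-processing idea could look roughly like this: hide the escaped special characters behind placeholders, run any existing LaTeX-to-Unicode converter, then put the escapes back. Everything here is a sketch under assumptions: `convert_with_protection` is a made-up name, `naive_converter` is a stand-in and not the latex2unicode API, and it is assumed that the real converter passes private-use code points through unchanged.

```python
import re

ESCAPABLE_SPECIALS = "$%_}&#{"   # assumed exclusion list
OPEN, CLOSE = "\ue000", "\ue001"  # private-use placeholder delimiters


def convert_with_protection(value, convert):
    """Run convert() on value while shielding escapes such as \\%."""
    saved = []

    def mask(match):
        if match.group(1) not in ESCAPABLE_SPECIALS:
            return match.group(0)  # not protected: leave for convert()
        saved.append(match.group(0))
        return OPEN + str(len(saved) - 1) + CLOSE

    # Pre-processing: swap each protected escape for an indexed placeholder.
    masked = re.sub(r"\\(.)", mask, value, flags=re.DOTALL)
    converted = convert(masked)
    # Post-processing: restore the original escapes.
    return re.sub(OPEN + r"(\d+)" + CLOSE,
                  lambda m: saved[int(m.group(1))], converted)


def naive_converter(text):
    # Stand-in for a full LaTeX-to-Unicode converter, which would also
    # turn \% into a bare % sign.
    return text.replace("\\textcopyright{}", "©").replace("\\%", "%")


print(convert_with_protection(r"5\% \textcopyright{}", naive_converter))
# prints: 5\% ©
```

The advantage of this shape is that the converter itself stays untouched, so no fork would be needed as long as the placeholders survive the conversion.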

Originally posted by @ThiloteE in #8490 (comment)
