Hindi text is rendered incorrectly #365

namastevis · 2022-03-14T17:15:49Z

While trying to generate a PDF using fpdf2, the Hindi text is not rendered correctly. I have tried different fonts (Gargi, Mangal, Arjun-Wide, Mukta, Lohit), but all give a wrong result similar to what is shown below.
Correct hindi text: इण्टरनेट पर हिन्दी के साधन
What is printed: (screenshot not preserved)

It seems the issue happens in the following two scenarios:

1. When this sign (shown in a screenshot, not preserved) appears before a character, it moves to the next character when printed.
2. When two consonants are merged to generate a ligature in Hindi, they get split into two.

gmischler · 2022-03-14T22:28:31Z

Unfortunately, this is not a trivial problem to solve, and fpdf is a deliberately simple PDF generation library.

What you're seeing is a lack of support for automatic ligatures, more specifically Devanagari conjuncts. There are hundreds of those, many more than normal characters. A supporting font will include a table of character sequences that are supposed to be substituted with a ligature glyph. The most complex example in your text is this (separated with spaces on the left side, so the browser does not combine them):
न ् द ी ▶️ न्दी
Yes, that's four (4) individual Unicode characters in the text that together should result in a single glyph.
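To make the four code points visible, here is a small sketch using only the Python standard library. It is an illustration added for clarity, not part of fpdf2; the variable names are arbitrary.

```python
import unicodedata

# The conjunct from the example above, written with escapes so that no
# editor or browser combines the characters on display.
conjunct = "\u0928\u094d\u0926\u0940"  # renders as a single cluster

# Print each code point with its official Unicode name.
for ch in conjunct:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The same information as a list, handy for comparisons.
code_points = [f"U+{ord(ch):04X}" for ch in conjunct]
```

A Python string stores these as four separate characters, which is why a library that measures and prints text character by character cannot produce the combined glyph.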

Unfortunately, fpdf currently operates on a character-by-character basis, first when determining the width of each character and later when printing a suitable glyph from the selected font. Supporting ligatures would require their substitution to happen as the very first step. We would also need a custom data structure to represent them, because they cannot be represented by a single Python Unicode character.

Technically, all of that is certainly possible, but I wouldn't hold my breath for it right now. Anyone who knows enough about the internal structure of TTF fonts is of course welcome to contribute...

Btw: Ligatures exist in many other writing systems. Another peculiarity that might also be interesting is contextual forms, where a different glyph is used for the same character depending on whether it appears at the beginning, in the middle, or at the end of a word, or isolated (common e.g. in Arabic, Hebrew, Mongolian, etc.).

Lucas-C · 2022-04-23T14:03:44Z

This issue also affects Tamil text: global-healthy-liveable-cities/global_scorecards#7

MayankFawkes · 2022-08-11T12:26:49Z

@gmischler @Lucas-C Unfortunately, the problem is still the same. Pillow had the same problem rendering fonts; they then added a font layout engine, ImageFont.Layout.RAQM, which solves it. I am not an expert, but I guess libraqm could help if someone can add it to fpdf2. Useful link: https://github.com/python-pillow/Pillow/blob/main/src/_imagingft.c#L118

Lucas-C · 2022-08-11T13:22:31Z

This is an interesting lead, thank you @MayankFawkes.
One limitation is that libraqm is a C library that hasn't been packaged as a Python package, AFAIK.
Hence it won't be straightforward for fpdf2 to have a dependency on it.

gmischler · 2022-08-11T14:21:11Z

Interesting indeed!

Especially since we already have Pillow as a dependency...
Could we possibly "borrow" their layout engine? Or can that only be used to add text to an image?

MayankFawkes · 2022-08-11T14:42:13Z

@Lucas-C There are some ways we could use it. Prebuilt binaries of libraqm are available: on Linux it is as simple as apt-get install libraqm0 libraqm-dev, and for Windows there are third-party builds. We could release an optional update that supports the libraqm engine, which would solve the problem of rendering Hindi and other Unicode text.

First, there is the ctypes module in the standard library. It allows you to load a dynamic-link library (DLL on Windows, shared libraries .so on Linux) and call functions from these libraries, directly from Python. Such libraries are usually written in C. -- source
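As a rough sketch of what the quoted paragraph describes, the following shows how an optional shared library could be located and loaded with ctypes, falling back gracefully when it is not installed. The helper name load_optional_library is hypothetical, and the library name "raqm" is an assumption about how libraqm is registered on the system; no libraqm functions are called here.

```python
import ctypes
import ctypes.util


def load_optional_library(name):
    """Return a ctypes handle for a shared library, or None if unavailable."""
    # find_library returns None when the library cannot be located.
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        # The file was found but could not be loaded (wrong architecture, etc.).
        return None


# "raqm" is an assumed library name; the result depends on the local system.
raqm = load_optional_library("raqm")
print("libraqm available:", raqm is not None)
```

A caller could check the return value and only enable the extra layout features when the library was actually loaded.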

Dependency problem: Pillow uses libraqm but does not install it along with Pillow, because it is optional. If we want Pillow to lay out such text properly, we have to install libraqm manually. We could do the same. If we want to provide it as a dependency, the best way is to make our own builds for the different architectures and put them in the pip wheel file.

To add a libraqm dependency: there is a library written in C/C++ for decoding QR codes and barcodes called zbar, which also ships binary files. Someone made a wrapper for it called pyzbar, and this is how they build the wheel file to add support for the zbar binary (link): just adding the binaries to the wheel.

Here are some more links on adding C support in Python with ctypes:
digitalocean
betterprogramming

gmischler · 2022-08-12T08:01:55Z

If fpdf2 were Linux-only, using stuff like ctypes would be no problem.
But for Windows, macOS, and potentially other systems, only dependencies that can be installed via pip are practically realistic.

This specific issue here is "only" about ligatures, which are primarily necessary for Indic scripts.
For scripts derived from Aramaic (Arabic, Hebrew, Mongolian, etc.), a more complete solution would indeed also handle bidi text and positional context, so having a feature-complete "layout engine" would be nice.

A Python implementation of the bidi algorithm is available in python-bidi, though it doesn't look particularly complicated, so we could easily roll our own.
There's also python-arabic-reshaper. It reverses the direction and does the necessary contextual and ligature substitutions. Unfortunately, all the substitutions are hardcoded, so it is suitable for Arabic and Kurdish text only. But it can still serve as an example of a possible way to proceed and of what issues to look out for.

A general solution to ligatures requires a lookup of the substitutions in the font data. This seems straightforward, but we'll have to see what pitfalls we run into with it.
Note that we have some requirements that I haven't yet seen satisfied with any of the existing modules. Among other things, for any ligature glyph sequence, we need to preserve information about the original unicode code points. Those will be added as supplementary info to the PDF text, so that when you copy from a PDF viewer, you get the original unsubstituted text back.
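The requirement described above (substitute ligatures while keeping the original code points for each glyph) can be sketched in plain Python. This is a deliberately simplified illustration, not how fpdf2 or any shaping engine works: the table LIGATURES, the glyph names, and the ShapedGlyph class are all hypothetical, and a real implementation would read the substitutions from the font data and handle reordering.

```python
from dataclasses import dataclass

# Hypothetical substitution table with made-up glyph names.
# A real table would be read from the font's own data.
LIGATURES = {
    ("\u0928", "\u094d", "\u0926"): "na_virama_da",
    ("f", "i"): "f_i",
}


@dataclass
class ShapedGlyph:
    glyph: str   # name of the glyph to draw
    source: str  # original code points this glyph stands for


def substitute_ligatures(text, table=LIGATURES):
    """Greedy longest-match substitution that remembers the source text."""
    max_len = max((len(key) for key in table), default=1)
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, down to length 2.
        for n in range(min(max_len, len(text) - i), 1, -1):
            key = tuple(text[i:i + n])
            if key in table:
                glyphs.append(ShapedGlyph(table[key], text[i:i + n]))
                i += n
                break
        else:
            # No ligature starts here: emit the character itself.
            glyphs.append(ShapedGlyph(text[i], text[i]))
            i += 1
    return glyphs


shaped = substitute_ligatures("fig")
```

Because every ShapedGlyph carries its source, joining the source fields always reproduces the input text, which is the property needed for copying unsubstituted text out of a PDF viewer.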

So I suspect we can't just slap on a few more dependencies and let those do the work for us.
I'd suggest a step-by-step approach, that we essentially have already begun with:

1. Switch to fonttools (currently being worked on: Rewriting add_font() and _putfonts() using Fonttools library #477)
2. Support ligature substitutions
3. Support contextual substitutions
4. Support bidi text
5. Support vertical text

Obviously all of this won't happen within a few weeks. Care should also be taken at any step to take the possible requirements of the following steps into account, at least as far as can be predicted at the given time.

Lucas-C · 2023-08-02T10:44:22Z

@andersonhc's PR #820 has been merged today.

Could you test if that solved your issue @namastevis?

You can install fpdf2 directly from the master branch of this repo with this command:

pip install git+https://github.com/PyFPDF/fpdf2.git@master

The documentation is there: https://pyfpdf.github.io/fpdf2/TextShaping.html

mohindra9211 · 2023-09-30T14:21:49Z

I continue to encounter the same problem, and unfortunately, it remains unresolved. I have experimented with various Hindi Devanagari fonts, but the text still does not render correctly.

Original Text = परी कथाएँ काल्पनिक होते हुए भी मन को उड़ान देने वाली और शिक्षाप्रद होती हैं।

FPDF2 output

I'm perplexed by this issue. When I copy the output from FPDF2 and paste it into a web browser, it displays the correct output. I'm struggling to comprehend the source of this problem.

Browser output: परी कथाएँ काल्पनिक होते हुए भी मन को उड़ान देने वाली और शिक्षाप्रद होती हैं।

It seems the issue happens in the following two scenarios:

1. When this sign (shown in a screenshot, not preserved) appears before a character, it moves to the next character when printed.
2. When two consonants are merged to generate a ligature in Hindi, they get split into two.

I humbly seek your support. In my role as a data scientist, I have explored different libraries for PDF generation and have observed that FPDF consistently delivers better outcomes in comparison to ReportLab, which encounters the same issue.

Python Version 3.11.4
FPDF2 Latest Version

andersonhc · 2023-09-30T14:35:14Z

@mohindra9211 did you try "set_text_shaping()"?

Here is the small test I did:

from fpdf import FPDF

text = "परी कथाएँ काल्पनिक होते हुए भी मन को उड़ान देने वाली और शिक्षाप्रद होती हैं।"

pdf = FPDF()
pdf.add_page()
# Register the TrueType font file under the family name "Mangal"
pdf.add_font(family="Mangal", fname="C:\\Apps\\fpdf2\\test\\text_shaping\\Mangal 400.ttf")
pdf.set_font("Mangal", size=40)
# First paragraph: text shaping disabled, for comparison
pdf.set_text_shaping(False)
pdf.multi_cell(w=pdf.epw, txt=text, new_x="LEFT", new_y="NEXT")
pdf.ln()
# Second paragraph: same text with text shaping enabled
pdf.set_text_shaping(True)
pdf.multi_cell(w=pdf.epw, txt=text, new_x="LEFT", new_y="NEXT")
pdf.output("hindi.pdf")

And the results with text shaping enabled look correct.

mohindra9211 · 2023-09-30T14:57:12Z

andersonhc

Dear AndersonHC,

I want to express my sincere gratitude for your assistance. You've helped me resolve a significant issue. However, I've noticed a minor problem in the output, and I suspect it might be related to the font. I'll try using different fonts to see if that resolves the issue.

Thank you once again for your valuable help.

andersonhc · 2023-09-30T15:39:40Z

Can you tell me what font and text you used?
I'd love to have all those glitches corrected.

mohindra9211 · 2023-09-30T15:46:25Z

Can you tell me what font and text you used? I'd love to have all those glitches corrected.

This problem is solved
The "Karma" font (in the file "Karma-Regular.ttf") is the most suitable choice for displaying Hindi text. I have included a sample for your reference.

Thank you once again for your valuable help.

mohindra9211 · 2023-09-30T15:54:07Z

Can you tell me what font and text you used? I'd love to have all those glitches corrected.

Tomorrow, I'll provide a list of fonts that correctly support Hindi text. Please incorporate this information into your documentation. It will be particularly beneficial for fpdf2 users, especially those in India. I appreciate your support and prompt response. Thank you.

gmischler · 2023-10-02T19:28:25Z

Those fonts that don't work with fpdf2, do they produce correct results with other software?
If so, then maybe you can provide a list of that category as well.
If we can't make them work on our own, then it may actually be that harfbuzz (the library that does the actual text shaping) is unable to handle them. In that case, the developers there might be interested to learn about it.

sanjaykare · 2023-10-25T10:22:47Z

I tried the code @andersonhc posted above, but it is not working. I also tried it with the Mangal_Regular font. It may be a font problem; please correct it.

mohindra9211 · 2023-10-25T10:40:10Z

@sanjaykare Read the "AttributeError" message carefully and write the correct font path.

sanjaykare · 2023-10-25T12:53:11Z

Fixed, the file path was not correct. Thank you!
I have two more questions:
1) Can we use Hindi text with HTML tags?
2) Can we use Hindi and English text together?
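Since the failure above turned out to be a wrong file path, a small guard like the following can make that kind of mistake easier to spot. This is only a sketch: checked_font_path is a hypothetical helper, not part of fpdf2, and the path in the usage comment is a placeholder.

```python
from pathlib import Path


def checked_font_path(fname):
    """Return fname as a string if it points to an existing file, else raise."""
    path = Path(fname)
    if not path.is_file():
        # Fail early with a message that names the offending path.
        raise FileNotFoundError(f"Font file not found: {path}")
    return str(path)


# Possible usage with fpdf2 (placeholder path, not executed here):
# pdf.add_font(family="Mangal", fname=checked_font_path("fonts/Mangal.ttf"))
```

Checking the path before registering the font separates "file not found" problems from genuine rendering problems.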

mohindra9211 · 2023-10-26T05:51:59Z

To gain a better understanding of fpdf2, it is advisable to peruse the fpdf2 documentation along with its tutorials. It's worth noting that you can incorporate both Hindi and English text into your documents, depending on your coding proficiency.

namastevis added the bug label Mar 14, 2022

Lucas-C added the unicode label Mar 19, 2022

Lucas-C changed the title ~~Hindi text is printed correctly~~ Hindi text is rendered incorrectly Mar 19, 2022

carlhiggs mentioned this issue Apr 4, 2022

Tamil text appears incorrect, regardless of font global-healthy-liveable-cities/global_scorecards#7

Open

gmischler mentioned this issue Apr 7, 2022

fpdf2-manual.pdf: fix rendering of Hindi tutorial #381

Closed

Lucas-C mentioned this issue Apr 18, 2022

Translate the one-page tutorial in your language! #267

Open

gmischler mentioned this issue May 7, 2022

Switch to using fonttools #418

Closed

This was referenced Jun 22, 2022

Thai font collapse when using more than 1 tone marks #459

Closed

Missing and unknown text in outline when use pdf.start_section() #458

Closed

gmischler mentioned this issue Jul 21, 2022

Kannada characters are not displaying properly with Fpdf2 and Python3 #474

Closed

semaeostomea mentioned this issue Aug 16, 2022

added info about an arabic script fix, fixed typo #490

Merged

1 task

gmischler mentioned this issue Sep 7, 2022

Rewriting add_font() and _putfonts() using Fonttools library #477

Merged

5 tasks

Lucas-C added up-for-grabs hacktoberfest labels Sep 7, 2022

gmischler mentioned this issue Sep 15, 2022

New Feature: Support ligature Glyphs from TTF Fonts #540

Closed

eroux mentioned this issue Feb 22, 2023

I cannot render Khmer Unicode Properly in PDF file. #700

Closed

andersonhc mentioned this issue Jun 14, 2023

Text shaping #820

Merged

9 tasks

Lucas-C closed this as completed in #820 Aug 2, 2023

Lucas-C added the text-shaping label Aug 13, 2023


kreier mentioned this issue May 30, 2024

Support for Khmer contains errors kreier/timeline#35

Closed
