Bug report
Even with the CMap files installed, the Chinese characters in the output file remain as raw CID codes (CID:xxx).
While debugging, I found that PDFPageInterpreter.fontmap['F3'].cid2unicode is a dictionary of only 200+ items, which is obviously not large enough.
Hope it helps!
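The check described above can be repeated for every font on a page with something like the sketch below. It is not pdfminer code: `summarize_fontmap` is a hypothetical helper, and it assumes only that `fontmap` is a dict of font objects, some of which carry a `cid2unicode` dict as seen in the debugger.

```python
def summarize_fontmap(fontmap):
    """Map each font name to the size of its cid2unicode dict.

    Hypothetical helper: fonts without a cid2unicode attribute
    are reported as None rather than raising an error.
    """
    summary = {}
    for name, font in fontmap.items():
        mapping = getattr(font, "cid2unicode", None)
        summary[name] = len(mapping) if mapping is not None else None
    return summary
```

Calling it on the interpreter's `fontmap` after a page has been processed should show at a glance which fonts have a suspiciously small mapping, such as the 200+ entries seen for 'F3'.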
Include the command and/or script that you use. Also include the PDF that
you use.
A0095607-010169.pdf
python tools/pdf2txt.py -c utf8 -o output.html samples/A0095607-010169.pdf
(The result is the same with -c gbk or any other codec.)
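To measure how much of the output is affected, the unresolved placeholders can be counted with a short script. This is a minimal sketch: `count_unresolved_cids` is a hypothetical helper, and the placeholder format `(cid:N)` is an assumption based on the (CID:xxx) codes described above, matched case-insensitively.

```python
import re
from collections import Counter

# Assumed placeholder format: "(cid:123)", upper or lower case.
CID_PATTERN = re.compile(r"\(cid:(\d+)\)", re.IGNORECASE)


def count_unresolved_cids(text):
    """Return a Counter mapping each unresolved CID number to its occurrence count."""
    return Counter(int(cid) for cid in CID_PATTERN.findall(text))
```

Running it on the contents of output.html gives the number of distinct unmapped CIDs (`len(...)`) and the total number of affected glyphs (`sum(....values())`), which can be compared against the size of the font's `cid2unicode` dictionary.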