🐛 TypeError: a bytes-like object is required, not 'str' #541

Closed
astariul opened this issue Oct 29, 2020 · 2 comments · Fixed by #733

astariul commented Oct 29, 2020

File

aa.pdf

Description

When I try to extract the pages from this file, I get the following error:

File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/encodingdb.py", line 27, in name2unicode
name = name.split('.')[0]
TypeError: a bytes-like object is required, not 'str'

Code

from pdfminer.high_level import extract_pages
list(extract_pages("aa.pdf"))

Full stack-trace

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/high_level.py", line 149, in extract_pages
    interpreter.process_page(page)
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 895, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 908, in render_contents
    self.execute(list_value(streams))
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 933, in execute
    func(*args)
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 872, in do_Do
    ctm=mult_matrix(matrix, self.ctm))
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 906, in render_contents
    self.init_resources(resources)
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 354, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 190, in get_font
    font = PDFType3Font(self, spec)
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/pdffont.py", line 647, in __init__
    PDFSimpleFont.__init__(self, descriptor, widths, spec)
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/pdffont.py", line 575, in __init__
    self.cid2unicode = EncodingDB.get_encoding(name, diff)
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/encodingdb.py", line 108, in get_encoding
    cid2unicode[cid] = name2unicode(x.name)
  File "/home/remondn/.venv/stable_prospector/lib/python3.6/site-packages/pdfminer/encodingdb.py", line 27, in name2unicode
    name = name.split('.')[0]
TypeError: a bytes-like object is required, not 'str'

Environment

  • OS : Ubuntu 18.04.4 LTS (Bionic Beaver)
  • Python version : 3.6.9
  • pdfminer version : 20200726

How can I run extract_pages() without hitting this error?

pietermarsman (Member) commented

Hi @astariul-colanim,

Thanks for raising the issue. I can replicate it with python tools/pdf2txt.py ~/Downloads/aa.pdf. This should not happen.

It's related to these issues:

(I prefer to keep them all open because we need to make sure the issue is fixed for each of the input PDFs.)
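
For context on the message itself: this is the TypeError Python raises when `split` is called on a `bytes` object with a `str` separator, which suggests the glyph name reaching name2unicode is `bytes` rather than `str`. A minimal sketch that reproduces the same message outside pdfminer (the value of `glyph_name` is a made-up stand-in, since the actual name stored in aa.pdf is not shown in this thread):

```python
# Hypothetical stand-in for the glyph name; the real value in aa.pdf is unknown.
glyph_name = b"g123.alt"

try:
    # Same kind of call as encodingdb.py line 27: bytes receiver, str separator.
    glyph_name.split('.')
except TypeError as exc:
    message = str(exc)

print(message)
```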


astariul commented Nov 8, 2020

Thanks @pietermarsman

Can you give me a pointer to where I can monkey-patch the library to make it work? (A workaround is fine for me.)
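
One possible stopgap, going by the traceback alone, is to wrap `pdfminer.encodingdb.name2unicode` so that `bytes` names are rejected before `split` is called. This is only a sketch and not the fix that was merged. The helper names `tolerate_bytes` and `apply_patch` are invented for illustration. It assumes that `get_encoding` looks up `name2unicode` as a module global at call time, and that it catches `KeyError` and skips the glyph; both should be checked against the installed version.

```python
import functools


def tolerate_bytes(func):
    """Wrap a glyph-name lookup so that bytes names raise KeyError."""
    @functools.wraps(func)
    def wrapper(name):
        if isinstance(name, bytes):
            # Assumption: the caller (get_encoding) catches KeyError and
            # moves on, so the affected glyph keeps its default mapping.
            raise KeyError("glyph name is bytes, not str: %r" % (name,))
        return func(name)
    return wrapper


def apply_patch():
    """Swap pdfminer's name2unicode for the wrapped version (hypothetical helper)."""
    import pdfminer.encodingdb as encodingdb
    encodingdb.name2unicode = tolerate_bytes(encodingdb.name2unicode)
```

Calling `apply_patch()` once before `extract_pages("aa.pdf")` would install the wrapper. Glyphs with `bytes` names would then be skipped rather than decoded, so some characters on the affected pages may come out wrong or missing.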
