Missing pdfminer layout related objects "textboxhorizontal" and "textlinehorizontal" #359

Closed
frascuchon opened this issue Feb 23, 2021 · 2 comments
@frascuchon

Describe the bug

Hi. I'm not sure whether this is a bug or expected behaviour in newer versions, but the pdfminer layout objects are missing from the page.objects dict when a laparams configuration is passed.

Code to reproduce the problem

import pdfplumber

# <path> is a placeholder: substitute the path to any PDF file
with pdfplumber.open(<path>, laparams={}) as pdf:
    page = pdf.pages[0]
    assert "textboxhorizontal" in page.objects.keys()  # This fails on 0.5.26

Expected behavior

As in previous versions, passing the laparams configuration when opening the PDF should make the layout objects available.

Actual behavior

The objects "textboxhorizontal" and "textlinehorizontal", which come from pdfminer layout analysis and were provided by default in previous versions (0.5.16 at least), are missing.

Environment

  • pdfplumber version: 0.5.26
  • Python version: 3.7.6
  • OS: Mac

Additional context

I think the else branch in the function iter_layout_objects discards all high-level objects that could contain textual info (horizontal text boxes and horizontal text lines):

def iter_layout_objects(self, layout_objects):
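
The suspected behaviour can be illustrated with a small, library-free sketch. This is not pdfplumber's actual code: `Node`, `iter_leaves_only`, and `iter_with_containers` are hypothetical names, and the tree is a toy stand-in for a pdfminer layout. It only shows how a traversal that descends into containers without yielding them would lose the text boxes and text lines, while a traversal that yields each container before descending keeps them.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for a pdfminer layout object."""
    kind: str
    children: List["Node"] = field(default_factory=list)


def iter_leaves_only(nodes: List[Node]) -> Iterator[Node]:
    """Descend into containers but yield only leaves (containers are lost)."""
    for node in nodes:
        if node.children:
            yield from iter_leaves_only(node.children)
        else:
            yield node


def iter_with_containers(nodes: List[Node]) -> Iterator[Node]:
    """Yield every node, containers included, then descend into children."""
    for node in nodes:
        yield node
        if node.children:
            yield from iter_with_containers(node.children)


# Toy layout: one text box holding one text line holding two characters.
tree = [
    Node("textboxhorizontal", [
        Node("textlinehorizontal", [Node("char"), Node("char")]),
    ]),
]

leaf_kinds = {n.kind for n in iter_leaves_only(tree)}
all_kinds = {n.kind for n in iter_with_containers(tree)}
```

With the first traversal only the `char` objects survive, which would match the missing keys reported above. The second keeps all three kinds.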

@frascuchon frascuchon added the bug label Feb 23, 2021
@jsvine
Owner

jsvine commented Feb 23, 2021

Hi @frascuchon, and thanks for your interest in this library. I'll look into this.

@jsvine jsvine self-assigned this Feb 23, 2021
jsvine added a commit that referenced this issue Feb 26, 2021
This commit reinstates access to higher-level layout objects (such as
`textboxhorizontal`) when `laparams` is passed to
`pdfplumber.open(...)`. This access had been removed in `0.5.24` via 1f87898.

Also adds a test for this behavior.
jsvine added a commit that referenced this issue Feb 28, 2021
Re-add textboxhorizontal/etc. when laparams (#359)
@jsvine
Owner

jsvine commented Feb 28, 2021

Thanks again for opening the issue, @frascuchon. Those objects should be accessible again in the latest release, 0.5.27.

@jsvine jsvine closed this as completed Feb 28, 2021