to_image doesn't accept parameter "width" #798

Closed
pseudomonas opened this issue Jan 29, 2023 · 6 comments

pseudomonas commented Jan 29, 2023

Describe the bug

page.to_image(resolution=300) works fine (for any value of 300)

page.to_image(width=1000) does not work, despite the docs referring to https://docs.wand-py.org/en/latest/wand/image.html#wand.image.Image which says that width should be an accepted parameter

Traceback (most recent call last):
  File "[FILE_PATH_REDACTED]", line 151, in <module>
    im = page.to_image(width=1000)
  File "/home/username/miniconda3/envs/envname/lib/python3.10/site-packages/pdfplumber/page.py", line 386, in to_image
    return PageImage(self, **kwargs)
TypeError: PageImage.__init__() got an unexpected keyword argument 'width'

Process finished with exit code 1

Environment

  • pdfplumber version: 0.7.6
  • Python version: 3.10.4
  • OS: Ubuntu 22.04
  • Pillow: 9.1 (I tried with 9.2 then downgraded to 9.1; it did not appear to affect this issue)
@jsvine jsvine self-assigned this Feb 1, 2023
Owner

jsvine commented Feb 1, 2023

Hi @pseudomonas, and thanks for raising this issue. Right now, resolution is the only wand.image.Image kwarg that can be passed. I should either enable all kwargs or clarify the documentation on this point. To better understand the use case: what is your particular intent with passing width?

jsvine added a commit that referenced this issue Feb 3, 2023
Thanks to @pseudomonas in #798 for flagging.

Decided to go for a stricter approach, given the complexity of getting a
more flexible approach right. But open to PRs that provide the more
flexible approach.
Author

I had previously converted the PDF to images with Ghostscript, and done some image-processing on the images. I wanted to generate some equivalently-sized images showing the PDF annotations.

In the end, I found another way to do it. But it'd be really useful to have some methods to do scaling of objects, where I can convert the PDF measurement units to pixels given one of a page-image height, a page-image width, or a DPI value.

Owner

jsvine commented Feb 3, 2023

Thanks, that's helpful context. If I'm understanding correctly, I think your suggestion is this: to be able to call, e.g., page.to_image(width=1000) and have pdfplumber figure out the implied resolution based on the page width. Is that correct? If so, I think that makes sense and can see adding that.
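
The implied resolution described here can also be worked out by hand. A minimal sketch, assuming the page width is measured in PDF points at the default 72 per inch (as pdfplumber's `page.width` is); `implied_resolution` is a hypothetical helper for illustration, not part of pdfplumber:

```python
POINTS_PER_INCH = 72  # default PDF user unit is 1/72 inch


def implied_resolution(page_width_pts: float, target_width_px: float) -> float:
    """Return the DPI at which a page this wide renders target_width_px pixels wide."""
    if page_width_pts <= 0:
        raise ValueError("page width must be positive")
    return target_width_px * POINTS_PER_INCH / page_width_pts


# A US Letter page is 612 points wide, so a 1000-pixel-wide image
# corresponds to roughly 117.6 DPI.
print(round(implied_resolution(612, 1000), 1))
```

On a release that only accepts `resolution`, the result could presumably be passed as `page.to_image(resolution=implied_resolution(page.width, 1000))`, though the rendered width may be off by a pixel because of rounding.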

Author

My original use case was exactly as you say there, yes.

My second request was for a general translator of user-units to pixels so I could say:

page.convert_units_to_pixels(mychar["x0"], width=1000) and know how far mychar is from the border of my image (which I have colour-pre-processed in ways that are outside the scope of pdfplumber).

Author

pseudomonas commented Feb 3, 2023

(actual use-case of that: find the pixel region just to the right of the last character in each line, and see if there's an un-OCR'ed hyphen lurking there)

Actually a pair of methods convert_units_to_pixels and convert_pixels_to_units would be ideal. It's not exactly hard to do some division and rounding but it'd be a nice utility thing to have.
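
For illustration, the "division and rounding" could look something like the sketch below. Both function names are hypothetical (neither exists in pdfplumber), and the sketch assumes an uncropped page whose image covers the full page width, so the origin of both coordinate systems is the page's left edge:

```python
def units_to_pixels(value_pts: float, page_width_pts: float, image_width_px: float) -> int:
    """Convert a horizontal distance in PDF units to the nearest whole pixel."""
    return round(value_pts * image_width_px / page_width_pts)


def pixels_to_units(value_px: float, page_width_pts: float, image_width_px: float) -> float:
    """Convert a horizontal distance in pixels back to PDF units."""
    return value_px * page_width_pts / image_width_px


# Halfway across a 612-point page lands halfway across a 1000-pixel image.
print(units_to_pixels(306, 612, 1000))
print(pixels_to_units(500, 612, 1000))
```

Because the page-to-image scale is the same on both axes, the same ratio applies to vertical distances as well.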

jsvine added a commit that referenced this issue Feb 14, 2023
Owner

jsvine commented Feb 14, 2023

width and height keyword arguments for .to_image(...) now available in v0.8.0. Give them a spin and let me know what you think.
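
Code that has to run on both sides of this change could fall back to `resolution` when `width` is rejected. This is only a sketch: `to_image_with_width` is a hypothetical wrapper, and it relies on the `TypeError` shown in the traceback above being what older releases raise:

```python
def to_image_with_width(page, width_px):
    """Render `page` about width_px pixels wide on old and new pdfplumber releases."""
    try:
        # v0.8.0 and later accept `width` directly.
        return page.to_image(width=width_px)
    except TypeError:
        # Older releases (e.g. 0.7.6) only accept `resolution`, so derive
        # the DPI from the page width in points (72 points per inch).
        return page.to_image(resolution=width_px * 72 / page.width)
```

Catching `TypeError` this broadly could also hide an unrelated error raised inside `to_image`, so checking the installed version first may be the safer choice.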

As for your second request:

  • Feel free to open another issue specific to that feature request. Having separate issues makes it easier to discuss.
  • For now, you might find PageImage._reproject((x: float, y: float)) -> (x: float, y: float) achieves your goals. (It also has the advantage of not having to re-specify width/height each time.) Does it?
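
As a rough sketch of how that might serve the hyphen-hunting use case above: the hypothetical helper below takes any callable with the `(x, y) -> (x, y)` shape described for `PageImage._reproject` and returns the pixel box just to the right of a character. The `x1`, `top` and `bottom` keys are those of pdfplumber's char dictionaries; `region_right_of` and `pad_pts` are assumptions made for illustration. Since `_reproject` is a private method, its behaviour may change between releases.

```python
def region_right_of(char, reproject, pad_pts=10.0):
    """Return the pixel box (left, top, right, bottom) just right of `char`.

    `reproject` maps an (x, y) point in page units to image pixels.
    """
    left, top = reproject((char["x1"], char["top"]))
    right, bottom = reproject((char["x1"] + pad_pts, char["bottom"]))
    return tuple(round(v) for v in (left, top, right, bottom))
```

With `im = page.to_image(width=1000)`, a call might look like `region_right_of(last_char, im._reproject)`, and the resulting box could then be used to crop the separately processed image.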

@jsvine jsvine closed this as completed Feb 14, 2023