Merge pull request #364 from jsvine/issue-359
Re-add textboxhorizontal/etc. when laparams (#359)
jsvine authored Feb 28, 2021
2 parents 6846405 + 5bf3298 commit e46376c
Showing 4 changed files with 40 additions and 2 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -5,6 +5,7 @@ All notable changes to this project will be documented in this file. The format
## [0.5.27][Unreleased]
### Fixed
- Fix regression (introduced in `0.5.26`/[b1849f4](https://github.com/jsvine/pdfplumber/commit/b1849f4)) in closing files opened by `PDF.open`
- Reinstate access to higher-level layout objects (such as `textboxhorizontal`) when `laparams` is passed to `pdfplumber.open(...)`. Had been removed in `0.5.24` via [1f87898](https://github.com/jsvine/pdfplumber/commit/1f878988576017b64f5cd77e1eb21b401124c699). ([#359](https://github.com/jsvine/pdfplumber/issues/359) + [#364](https://github.com/jsvine/pdfplumber/pull/364))

## [0.5.26] — 2021-02-10
### Added
6 changes: 6 additions & 0 deletions README.md
@@ -72,6 +72,8 @@ The `open` method returns an instance of the `pdfplumber.PDF` class.

To load a password-protected PDF, pass the `password` keyword argument, e.g., `pdfplumber.open("file.pdf", password = "test")`.

To set layout analysis parameters to `pdfminer.six`'s layout engine, pass the `laparams` keyword argument, e.g., `pdfplumber.open("file.pdf", laparams = { "line_overlap": 0.7 })`.

Invalid metadata values are treated as a warning by default. If that is not intended, pass `strict_metadata=True` to the `open` method and `pdfplumber.open` will raise an exception if it is unable to parse the metadata.

### The `pdfplumber.PDF` class
@@ -201,6 +203,10 @@ Additionally, both `pdfplumber.PDF` and `pdfplumber.Page` provide access to two

[To be completed.]

### Obtaining higher-level layout objects via `pdfminer.six`

If you pass the `pdfminer.six`-handling `laparams` parameter to `pdfplumber.open(...)`, then each page's `.objects` dictionary will also contain `pdfminer.six`'s higher-level layout objects, such as `"textboxhorizontal"`.

## Visual debugging

__Note:__ To use `pdfplumber`'s visual-debugging tools, you'll also need to have two additional pieces of software installed on your computer:
7 changes: 5 additions & 2 deletions pdfplumber/page.py
@@ -207,9 +207,12 @@ def point2coord(pt):

     def iter_layout_objects(self, layout_objects):
         for obj in layout_objects:
-            # If object is, like LTFigure, a higher-level object
-            # then iterate through it's children
+            # If object is, like LTFigure, a higher-level object ...
             if hasattr(obj, "_objs"):
+                # and LAParams is passed, process the object itself.
+                if self.pdf.laparams is not None:
+                    yield self.process_object(obj)
+                # Regardless, iterate through its children
                 yield from self.iter_layout_objects(obj._objs)
             else:
                 yield self.process_object(obj)
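
To see the effect of this change in isolation, here is a minimal, self-contained sketch of the same traversal. `FakeLayoutObject` and the free function `iter_layout_objects(layout_objects, laparams)` are hypothetical stand-ins, not part of pdfplumber or `pdfminer.six`; the sketch yields object names where the real method yields `self.process_object(obj)`, and the example tree is invented for illustration.

```python
class FakeLayoutObject:
    """Hypothetical stand-in for a pdfminer.six layout object."""

    def __init__(self, name, children=None):
        self.name = name
        # Only container-like objects get an `_objs` attribute,
        # mirroring the `hasattr(obj, "_objs")` check in the diff.
        if children is not None:
            self._objs = children


def iter_layout_objects(layout_objects, laparams):
    for obj in layout_objects:
        if hasattr(obj, "_objs"):
            # With laparams set, the container itself is yielded ...
            if laparams is not None:
                yield obj.name
            # ... and its children are visited either way.
            yield from iter_layout_objects(obj._objs, laparams)
        else:
            yield obj.name


# Invented example tree: one text box holding one line of two chars, plus a rect.
tree = [
    FakeLayoutObject("textboxhorizontal", [
        FakeLayoutObject("textlinehorizontal", [
            FakeLayoutObject("char_a"),
            FakeLayoutObject("char_b"),
        ]),
    ]),
    FakeLayoutObject("rect"),
]

without_laparams = list(iter_layout_objects(tree, None))
with_laparams = list(iter_layout_objects(tree, {}))
```

Because the check is `is not None`, an empty dict still counts as "passed", which is why the new test can use `laparams={}`. Leaf objects come out identically in both modes; only the containers are added.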
28 changes: 28 additions & 0 deletions tests/test_laparams.py
@@ -0,0 +1,28 @@
#!/usr/bin/env python
import unittest
import pdfplumber
import os

import logging

logging.disable(logging.ERROR)

HERE = os.path.abspath(os.path.dirname(__file__))


class Test(unittest.TestCase):
    @classmethod
    def setup_class(self):
        self.path = os.path.join(HERE, "pdfs/issue-13-151201DSP-Fond-581-90D.pdf")

    def test_without_laparams(self):
        with pdfplumber.open(self.path, laparams=None) as pdf:
            objs = pdf.pages[0].objects
            assert "textboxhorizontal" not in objs.keys()
            assert len(objs["char"]) == 4408

    def test_with_laparams(self):
        with pdfplumber.open(self.path, laparams={}) as pdf:
            objs = pdf.pages[0].objects
            assert len(objs["textboxhorizontal"]) == 21
            assert len(objs["char"]) == 4408
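
The comparison these tests make can also be tried on any PDF with a small helper. The sketch below is illustrative only: `count_objects` and `first_page_counts` are hypothetical names, not pdfplumber functions. It relies only on `pdfplumber.open(..., laparams=...)` and the page's `.objects` dictionary, both of which appear in the diff above.

```python
def count_objects(objects):
    """Map each object type in a page's `.objects` dict to its count."""
    return {kind: len(items) for kind, items in objects.items()}


def first_page_counts(path, laparams=None):
    """Return object counts for the first page of the PDF at `path`."""
    # Imported lazily so that `count_objects` is usable without pdfplumber.
    import pdfplumber

    with pdfplumber.open(path, laparams=laparams) as pdf:
        return count_objects(pdf.pages[0].objects)
```

Calling `first_page_counts("file.pdf", laparams={})` and then `first_page_counts("file.pdf")` and comparing the keys should show which higher-level types, such as `"textboxhorizontal"`, appear only when `laparams` is passed; `"file.pdf"` is a placeholder path.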
