Tweak #467 code slightly, update CHANGELOG/thanks
jsvine committed Jul 15, 2021
1 parent 3ce87eb commit 9019854
Showing 3 changed files with 20 additions and 18 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,9 @@
 All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](http://keepachangelog.com/).

 ## [0.5.29] - [unreleased]
+## Changed
+- Change behavior of horizontal `text_strategy`, so that it uses the top and bottom of *every* word, not just the top of every word and the bottom of the last. ([#467](https://github.com/jsvine/pdfplumber/pull/467) + [#466](https://github.com/jsvine/pdfplumber/issues/466) + [#265](https://github.com/jsvine/pdfplumber/issues/265)) [h/t @bobluda + @samkit-jain]
+
 ### Development Changes
 - Add `CONTRIBUTING.md` ([#428](https://github.com/jsvine/pdfplumber/pull/428))

1 change: 1 addition & 0 deletions README.md
@@ -431,6 +431,7 @@ Many thanks to the following users who've contributed ideas, features, and fixes
 - [xv44586](https://github.com/xv44586)
 - [Alexander Regueiro](https://github.com/alexreg)
 - [Daniel Peña](https://github.com/trifling)
+- [bobluda](https://github.com/bobluda)

 ## Contributing

34 changes: 16 additions & 18 deletions pdfplumber/table.py
@@ -85,32 +85,30 @@ def words_to_edges_h(words, word_threshold=DEFAULT_MIN_WORDS_HORIZONTAL):
     min_x0 = min(map(itemgetter("x0"), rects))
     max_x1 = max(map(itemgetter("x1"), rects))

-    edges = [
-        {
-            "x0": min_x0,
-            "x1": max_x1,
-            "top": r["top"],
-            "bottom": r["top"],
-            "width": max_x1 - min_x0,
-            "orientation": "h",
-        }
-        for r in rects
-    ]
-
-    # For each detected row, we also add the 'bottom' line.
-    # This will generate extra edges, (some will be redundant with the next row
-    # 'top' line), but this catches the last row of every table.
+    edges = []
     for r in rects:
-        edges.append(
+        edges += [
+            # Top of text
+            {
+                "x0": min_x0,
+                "x1": max_x1,
+                "top": r["top"],
+                "bottom": r["top"],
+                "width": max_x1 - min_x0,
+                "orientation": "h",
+            },
+            # For each detected row, we also add the 'bottom' line. This will
+            # generate extra edges, (some will be redundant with the next row
+            # 'top' line), but this catches the last row of every table.
             {
                 "x0": min_x0,
                 "x1": max_x1,
                 "top": r["bottom"],
                 "bottom": r["bottom"],
                 "width": max_x1 - min_x0,
                 "orientation": "h",
-            }
-        )
+            },
+        ]

     return edges
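
For illustration only, here is a minimal standalone sketch of the edge generation this hunk arrives at: one horizontal edge at the top and one at the bottom of every row rect, all spanning the full x-range. It assumes `rects` is a list of dicts with `x0`, `x1`, `top`, and `bottom` keys, as in the diff. The helper name `edges_from_row_rects` and the sample coordinates are hypothetical and not part of pdfplumber; the real `words_to_edges_h` also clusters words into rows first, which is omitted here.

```python
from operator import itemgetter


def edges_from_row_rects(rects):
    """Return a top edge and a bottom edge for every row rect.

    Hypothetical helper: it mirrors only the loop touched by this commit,
    not the word clustering that precedes it in pdfplumber.
    """
    min_x0 = min(map(itemgetter("x0"), rects))
    max_x1 = max(map(itemgetter("x1"), rects))

    edges = []
    for r in rects:
        # Emit the top edge first, then the bottom edge, for each row.
        for y in (r["top"], r["bottom"]):
            edges.append(
                {
                    "x0": min_x0,
                    "x1": max_x1,
                    "top": y,
                    "bottom": y,
                    "width": max_x1 - min_x0,
                    "orientation": "h",
                }
            )
    return edges


# Two made-up text rows; the coordinates are illustrative only.
rows = [
    {"x0": 10, "x1": 90, "top": 100, "bottom": 110},
    {"x0": 12, "x1": 95, "top": 120, "bottom": 130},
]

edges = edges_from_row_rects(rows)
ys = [e["top"] for e in edges]
print(ys)  # [100, 110, 120, 130]
```

Every row contributes both of its boundaries, so the last row of a table gets a closing edge without special handling. Where one row's bottom coincides with the next row's top, the duplicate edge is redundant, as the code comment in the diff notes.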
