Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug that crashed table extraction when null value provided for (text|intersection)_(x|y)_tolerance keys #545

Merged
merged 1 commit into from
Nov 28, 2021

Conversation

samkit-jain
Copy link
Collaborator

Resolves bug raised in #539. The issue was happening because, in TableFinder, the settings were being cleaned up before the table extraction but in extract_table(s), the updated table settings were not being used and passing in None in the tolerance setting resulted in a comparison between numeric type and None type and that caused the exception. The issue was resolved by creating a static method to cleanup the user-provided table settings and using it for every operation post table extraction.

@samkit-jain samkit-jain self-assigned this Nov 22, 2021
…text|intersection)_(x|y)_tolerance` keys

Fixes #539
Thanks to @yoavxyoav for reporting
@codecov
Copy link

codecov bot commented Nov 22, 2021

Codecov Report

Merging #545 (a806bfa) into develop (c915a00) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head a806bfa differs from pull request most recent head 608db02. Consider uploading reports for the commit 608db02 to get more accurate results
Impacted file tree graph

@@           Coverage Diff            @@
##           develop     #545   +/-   ##
========================================
  Coverage    98.77%   98.77%           
========================================
  Files           10       10           
  Lines         1220     1226    +6     
========================================
+ Hits          1205     1211    +6     
  Misses          15       15           
Impacted Files Coverage Δ
pdfplumber/page.py 100.00% <100.00%> (ø)
pdfplumber/table.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c915a00...608db02. Read the comment docs.

@jsvine
Copy link
Owner

jsvine commented Nov 23, 2021

Thanks, @samkit-jain! I think it does make sense to do some settings cleanup/resolution, and so I'm definitely leaning toward merging this PR. But I wonder: Is it always safe to assume that a user passing None for a setting value means they want to use the default value? And if they want to use the default value, why pass the setting at all?

@samkit-jain
Copy link
Collaborator Author

I don't see any value in the settings that can have a meaningful null value and hence, think that the property not being provided or having the value as null can be treated the same. At https://github.com/jsvine/pdfplumber#table-extraction-settings, null value is being provided and that similar to in discussion #539 can confuse the users. We can also just remove the null and give those properties the default values.

@jsvine
Copy link
Owner

jsvine commented Nov 28, 2021

Thanks, @samkit-jain! That makes sense to me. And I agree: It's worth tweaking the README so that those None values in the table-settings documentation are listed as the default values. I can/will do that.

@jsvine jsvine merged commit f3a14bc into develop Nov 28, 2021
jsvine added a commit that referenced this pull request Nov 28, 2021
@samkit-jain samkit-jain deleted the discussion-539 branch November 29, 2021 06:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants