Issues: IBM/data-prep-kit

Issues list

[Bug] pdf2parquet must calculate hash and size on the file (label: bug)
#605 opened Sep 20, 2024 by sujee
1 of 2 tasks
[Feature] Enable pure python transforms in new spark runtime. (label: enhancement)
#586 opened Sep 12, 2024 by daw3rd
1 of 17 tasks
[Bug] possible regression on ededupe code in release dev3 (label: bug)
#585 opened Sep 11, 2024 by sujee
1 of 2 tasks
[Bug] Testing Rag notebook with latest release of pdf2Parquet, eDedup and DocID (label: bug)
#583 opened Sep 10, 2024 by touma-I
1 of 2 tasks
[Bug] issues running ray transformations on Google colab (label: bug)
#582 opened Sep 10, 2024 by sujee
1 of 2 tasks
[Feature] Need better documentation of fuzzy dedupe (label: enhancement)
#578 opened Sep 6, 2024 by sujee
2 tasks done
[Bug] Intermittent doc_id test-src failures in ci/cd. (label: bug)
#574 opened Sep 5, 2024 by daw3rd
2 tasks done
[Bug] improve performance of pdf2parquet (label: enhancement)
#573 opened Sep 5, 2024 by sujee
1 of 2 tasks
[Bug] test/publish-image targets are disabled for pii_redactor/ray due to OSError (label: bug)
#571 opened Sep 4, 2024 by daw3rd
1 of 2 tasks
[Feature] Remove or merge older examples from examples/notebooks/archive (label: enhancement)
#568 opened Sep 4, 2024 by daw3rd
2 tasks done
[Feature] HTML to Markdown (based on HTML2Parquet trafilatura code) (label: enhancement)
#559 opened Aug 30, 2024 by touma-I
2 tasks done
[Bug] header_cleanser fails in running in openshift (label: bug)
#557 opened Aug 30, 2024 by dtsuzuku-ibm
1 of 2 tasks
[Feature] Publish data-prep-kit core and transforms NIGHTLY into pypi (label: enhancement)
#554 opened Aug 29, 2024 by sujee
1 of 2 tasks
[Bug] pdf2parquet is now failing ci/cd builds (label: bug)
#552 opened Aug 28, 2024 by daw3rd
1 of 2 tasks
[Feature] Provide an operator that loads files content to parquet (label: enhancement)
#543 opened Aug 26, 2024 by touma-I
2 tasks done
[Feature] Allow selected metadata fields to be ignored during tests. (label: enhancement)
#536 opened Aug 23, 2024 by daw3rd
1 of 2 tasks
[Feature] Allow a transform to define the file extensions it supports (label: enhancement)
#535 opened Aug 23, 2024 by daw3rd
1 of 2 tasks
[Feature] Publish Single Wheel for Doc Quality Transform (label: enhancement)
#533 opened Aug 23, 2024 by touma-I
2 tasks done
Enhance Code2Parquet module to handle non-code text as well (label: enhancement)
#520 opened Aug 19, 2024 by shahrokhDaijavad
1 of 2 tasks
[Feature] New num_processors python launcher option needs doc (label: enhancement)
#503 opened Aug 14, 2024 by daw3rd
1 of 2 tasks