Initial version of profiler #269

blublinsky · 2024-06-12T19:53:10Z

Why are these changes needed?

New transform

Related issue number (if any).

https://github.ibm.com/ai-models-data/data-prep-kit-inner/issues/84

daw3rd

I also think this should be renamed to profiler. Aggregation is not what it is doing.

transforms/universal/aggregator/ray/images/exactdedup.png

transforms/universal/aggregator/ray/src/aggregator_transform_ray.py

daw3rd · 2024-06-13T13:56:59Z

transforms/universal/aggregator/ray/test-data/input/sample1.parquet

Maybe there should be 2 test files so aggregation is done across multiple calls to transform()

1 is good enough

blublinsky · 2024-06-13T14:25:11Z

@daw3rd one of the larger issues I am running into with this is testing, because:

1. I output CSV files, which the test framework does not currently support.
2. File names are not fixed. For the latter we also have an issue, https://github.ibm.com/ai-models-data/data-prep-kit-inner/issues/82, which asks for a timestamp for metadata.
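A minimal sketch of one way to test around both points, assuming the outputs can be compared by content instead of by file name. The helper names (`read_csv_rows`, `find_by_pattern`, `csv_dirs_match`) are hypothetical and are not part of the existing test framework.

```python
import csv
import fnmatch
from pathlib import Path


def read_csv_rows(path):
    """Read a CSV file into a sorted list of rows, so row order does not matter."""
    with open(path, newline="") as f:
        return sorted(tuple(row) for row in csv.reader(f))


def find_by_pattern(directory, pattern):
    """Return the files in `directory` whose names match a glob `pattern`."""
    return sorted(p for p in Path(directory).iterdir() if fnmatch.fnmatch(p.name, pattern))


def csv_dirs_match(produced_dir, expected_dir, pattern="*.csv"):
    """Compare the CSV outputs of two directories by content, ignoring exact file names."""
    produced = find_by_pattern(produced_dir, pattern)
    expected = find_by_pattern(expected_dir, pattern)
    if len(produced) != len(expected):
        return False
    # Compare the collection of file contents instead of pairing files by name.
    produced_rows = sorted(read_csv_rows(p) for p in produced)
    expected_rows = sorted(read_csv_rows(e) for e in expected)
    return produced_rows == expected_rows
```

With this approach a timestamp in a file name does not affect the result, since only the number of matching files and their rows are compared.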

daw3rd · 2024-06-13T16:00:38Z

@daw3rd one of the larger issues I am running into with this is testing, because:

1. I output CSV files, which the test framework does not currently support.

2. File names are not fixed. For the latter we also have an issue, https://github.ibm.com/ai-models-data/data-prep-kit-inner/issues/82, which asks for a timestamp for metadata.

Maybe promoting the methods added in spark (highlighted below) to the super class and using them somehow in the super class would do it. Then for this transform, you override _validate_metadata() or use some other mechanism to get any metadata file. Similar override of the unhighlighted method could be done for the csv file?
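The override idea could look roughly like the sketch below. Only `_validate_metadata()` is named in the comment above; the class names, `validate_directory()`, `_validate_data_files()`, and the `metadata.json` / `metadata*.json` file names are assumptions made for illustration, not the actual code in `abstract_test.py`.

```python
import json
from pathlib import Path


class BaseOutputTest:
    """Hypothetical stand-in for the shared test super class."""

    def validate_directory(self, produced_dir, expected_dir):
        produced_dir, expected_dir = Path(produced_dir), Path(expected_dir)
        # The super class drives validation; subclasses override the hooks.
        self._validate_metadata(produced_dir, expected_dir)
        self._validate_data_files(produced_dir, expected_dir)

    def _validate_metadata(self, produced_dir, expected_dir):
        # Default (assumed): the metadata file has a fixed name.
        assert (produced_dir / "metadata.json").exists(), "metadata.json missing"

    def _validate_data_files(self, produced_dir, expected_dir):
        # Default (assumed): parquet outputs carry the same names as the expected ones.
        produced = sorted(p.name for p in produced_dir.glob("*.parquet"))
        expected = sorted(p.name for p in expected_dir.glob("*.parquet"))
        assert produced == expected, "parquet file names differ"


class ProfilerOutputTest(BaseOutputTest):
    """Hypothetical override for a transform that writes CSV and timestamped metadata."""

    def _validate_metadata(self, produced_dir, expected_dir):
        # Accept any metadata file, whatever suffix (such as a timestamp) it carries.
        candidates = list(produced_dir.glob("metadata*.json"))
        assert candidates, "no metadata file found"
        for candidate in candidates:
            json.loads(candidate.read_text())  # must at least be valid JSON

    def _validate_data_files(self, produced_dir, expected_dir):
        # File names are not fixed, so only the number of CSV outputs is checked here.
        produced = list(produced_dir.glob("*.csv"))
        expected = list(expected_dir.glob("*.csv"))
        assert len(produced) == len(expected), "CSV file count differs"
```

Keeping the driver method in the super class means a transform with unusual outputs only replaces the hook it needs, and every other test keeps the default checks.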

transforms/universal/aggregator/ray/src/aggregator_transform_ray.py

blublinsky · 2024-06-13T20:23:53Z

profiler

done

transforms/universal/profiler/ray/test/test_profiler.py

data-processing-lib/python/src/data_processing/test_support/abstract_test.py

blublinsky marked this pull request as draft June 12, 2024 19:58

blublinsky added the enhancement New feature or request label Jun 13, 2024

blublinsky marked this pull request as ready for review June 13, 2024 13:03

daw3rd requested changes Jun 13, 2024

transforms/universal/aggregator/ray/src/aggregator_transform_ray.py (outdated)

blublinsky changed the title ~~Initial version of aggregator~~ Initial version of profiler Jun 13, 2024

blublinsky force-pushed the aggregator branch from f09955b to 4fd1b2e Compare June 13, 2024 20:33

blublinsky added 13 commits June 17, 2024 14:55

Initial version of aggregator

e0af1a8

Initial version of aggregator

cca54f9

bug fixes

51c99ea

bug fixes

ca94b5a

bug fixes

b0add10

nasty bug fixes

e96b896

addressed review comments

8d63695

fixed testing

ad7fac7

fixed testing

a34abe8

renamed

8fc1c15

renamed version

275cd78

renamed version

4522e65

renamed classes

5d970e5

blublinsky force-pushed the aggregator branch from 1e5f690 to 5d970e5 Compare June 17, 2024 14:02

blublinsky added 2 commits June 17, 2024 16:04

restored fdedup

1a52621

restored fdedup

e76572f

daw3rd requested changes Jun 17, 2024

transforms/universal/profiler/ray/test/test_profiler.py (outdated)

data-processing-lib/python/src/data_processing/test_support/abstract_test.py (outdated)

blublinsky added 3 commits June 17, 2024 20:57

address coments

c15c84e

address coments

41549a6

address coments

f7bb008

daw3rd approved these changes Jun 18, 2024

blublinsky merged commit e21a6ab into dev Jun 18, 2024
17 checks passed
