Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#610: SampleArray util method replaced by using arrow::compute::Take … #612

Merged
merged 13 commits into from
Sep 29, 2022

Conversation

mstaylor
Copy link
Collaborator

…and pass corresponding indices to it. Note:

1)Commented code that was deprecated by the resolution of this issue
2) Note comments in sample_binary_array function related to result status.

…te::Take and pass corresponding indices to it. Note:

 1)Commented code that was deprecated by the resolution of this issue
 2) Note comments in sample_binary_array function related to result status.
…te::Take and pass corresponding indices to it - partial
Comment on lines 312 to 316
auto arr_result = builder.Finish();

RETURN_ARROW_STATUS_IF_FAILED(arr_result.status());

auto indices = arr_result.ValueOrDie();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto arr_result = builder.Finish();
RETURN_ARROW_STATUS_IF_FAILED(arr_result.status());
auto indices = arr_result.ValueOrDie();
ARROW_ASSIGN_OR_RAISE(auto indices, builder.Finish());

This is a convenient macro where arrow::Result is returned.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be done

Comment on lines 320 to 324
auto sample_array = arrow::compute::Take(out, indices, take_options);

RETURN_ARROW_STATUS_IF_FAILED(sample_array.status());

out = sample_array.ValueOrDie().make_array();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be done

@@ -166,6 +166,7 @@ arrow::Status Duplicate(const std::shared_ptr<arrow::Table> &table, arrow::Memor
return arrow::Status::OK();
}

/**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove all commented out code

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be done

…te::Take and pass corresponding indices to it - changes from PR review
…te::Take and pass corresponding indices to it - changes from build to using std:vector and std:sort. Vector data is moved to an arrow::Buffer::Wrap (wrapper) that can be used by the actual arrow::Array.
@nirandaperera
Copy link
Collaborator

@mstaylor Hey Bob, I added a vector wrapping util while fixing this issue #615. I think you can reuse it in this PR, if you take a pull from the main

@mstaylor
Copy link
Collaborator Author

@mstaylor Hey Bob, I added a vector wrapping util while fixing this issue #615. I think you can reuse it in this PR, if you take a pull from the main

OK - I will check on that.

…te::Take and pass corresponding indices to it - changes from build to using std:vector and std:sort. Vector data is moved to an arrow::Buffer::Wrap (wrapper) that can be used by the actual arrow::Array (continued - adds call to WrapNumericVector function added by cylondata#615-cylondata#616)
Comment on lines 125 to 128
cylon::Status SampleTableUniform(const std::shared_ptr<Table> &local_sorted,
int num_samples, std::vector<int32_t> sort_columns,
std::shared_ptr<Table> &sample_result,
const std::shared_ptr<CylonContext> &ctx);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use all arrow objects in this header, and at the call site convert the arrow table to cylon table.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand why we would want to do this. Can you please elaborate on the reasoning.

Comment on lines 207 to 208
std::vector<int64_t> vector_indices(num_samples);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should do a reserve here. Otherwise vector would instantiate num_samples number of 0 in the vector.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just syntactic sugar. All utils in this header are accepting/ returning arrow objects. 🙂

Copy link
Collaborator Author

@mstaylor mstaylor Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok just pushed a change for this. Note my question on std::move. Since arrow::Table is an abstract class, std::move will not compile. Thoughts?

…ns and remove the incorrect initialization in the constructor.
@mstaylor
Copy link
Collaborator Author

@mstaylor Hey Bob, I added a vector wrapping util while fixing this issue #615. I think you can reuse it in this PR, if you take a pull from the main

OK - I will check on that.

This should be done

Copy link
Collaborator

@nirandaperera nirandaperera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made some minor comments. Can you also run a code formatter with Google styling (100 chars)?

Comment on lines 553 to 556
auto status = util::SampleTableUniform(merged_table->get_table(), num_split_points, sort_columns,
const_cast<std::shared_ptr<arrow::Table> &>(split_points->get_table()), ctx);

return cylon::Status(static_cast<int>(status.code()), status.message());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mstaylor this could be problematic. Hmm... This will actually create a NPE AFAIU.

Suggested change
auto status = util::SampleTableUniform(merged_table->get_table(), num_split_points, sort_columns,
const_cast<std::shared_ptr<arrow::Table> &>(split_points->get_table()), ctx);
return cylon::Status(static_cast<int>(status.code()), status.message());
std::shared_ptr<arrow::Table> atable;
RETURN_CYLON_STATUS_IF_ARROW_FAILS(util::SampleTableUniform(merged_table->get_table(),
num_split_points, sort_columns, atable, ctx);
return Table::FromArrowTable(ctx, std::move(atable), split_points);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed


std::shared_ptr<Table> cylon_sample_result;

Table::FromArrowTable(ctx, sample_result, cylon_sample_result);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Table::FromArrowTable(ctx, sample_result, cylon_sample_result);
RETURN_CYLON_STATUS_IF_TAILED(Table::FromArrowTable(ctx, sample_result, cylon_sample_result));

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed

@nirandaperera
Copy link
Collaborator

@mstaylor BTW please resolve the conflicts as well 🙂

# Conflicts:
#	cpp/src/cylon/table.cpp
@mstaylor
Copy link
Collaborator Author

made some minor comments. Can you also run a code formatter with Google styling (100 chars)?

Yes - the next build includes that requested change.

… applied google formatter per PR request. Validated tests complete successfully.
@mstaylor
Copy link
Collaborator Author

I've validated the build via the following command:

python build.py --cpp --test --python --pytest

@nirandaperera nirandaperera merged commit ef4c904 into cylondata:main Sep 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants