Arrow-flight connector #23032

Open: wants to merge 1 commit into master

@sabbasani (Contributor) commented Jun 19, 2024

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add Arrow Flight connector :pr:`23032`
* Add documentation for the :doc:`/connector/base-arrow-flight`  :pr:`23032`

If release note is NOT required, use:

== NO RELEASE NOTE ==

 

@sabbasani requested a review from a team as a code owner June 19, 2024 11:23
@sabbasani marked this pull request as draft June 19, 2024 11:42
@steveburnett (Contributor):

Consider adding documentation for the new connector.

Suggest revising the release note entry to follow the Release Note Guidelines:

== RELEASE NOTES ==

General Changes
* Add Arrow Flight connector :pr:`23032`

@tdcmeehan self-assigned this Jul 15, 2024

@tdcmeehan (Contributor) left a comment

It seems like there's a hardcoded presumption that the underlying datasource accepts a SQL query. Can you remove this from the PR? The service may not accept SQL.
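One way to drop the SQL presumption is for the base module to pass only opaque command bytes to the service (an Arrow Flight descriptor can carry an arbitrary command), leaving their construction to a service-specific implementation. The sketch below is a minimal, JDK-only illustration under that assumption; `FlightCommandBuilder`, `SqlCommandBuilder`, and `DatasetKeyCommandBuilder` are hypothetical names, not part of this PR or of Arrow.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical abstraction: the base connector only forwards opaque command bytes
// and never assumes those bytes contain SQL.
interface FlightCommandBuilder {
    byte[] buildCommand(String schemaName, String tableName);
}

// Hypothetical implementation for a service that accepts SQL text.
// It would live in a separate module that depends on the base connector.
// No identifier quoting or escaping is done here; illustration only.
class SqlCommandBuilder implements FlightCommandBuilder {
    @Override
    public byte[] buildCommand(String schemaName, String tableName) {
        String sql = "SELECT * FROM " + schemaName + "." + tableName;
        return sql.getBytes(StandardCharsets.UTF_8);
    }
}

// Hypothetical implementation for a service that addresses datasets by a path-like key.
class DatasetKeyCommandBuilder implements FlightCommandBuilder {
    @Override
    public byte[] buildCommand(String schemaName, String tableName) {
        return (schemaName + "/" + tableName).getBytes(StandardCharsets.UTF_8);
    }
}
```

With this split, a SQL-speaking implementation (for example one following the Flight SQL spec) could live in its own module, and the base connector would never build query text itself.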

public List<Field> getColumnsList(String schema, String table, ConnectorSession connectorSession)
{
try {
String dbSpecificSchemaName = getDBSpecificSchemaName(config, schema);
Contributor:

Avoid "DB" references, as the underlying data source might not be a database.

Contributor (PR author):

I have removed "DB" references

Comment on lines 35 to 36
@JsonProperty("jdbcType") int jdbcType,
@JsonProperty("jdbcTypeName") String jdbcTypeName,
Contributor:

We can't have references to JDBC in this connector.

Contributor (PR author):

I have removed the JDBC references.

@steveburnett (Contributor):

Suggest change to the release note entry as follows:

== RELEASE NOTES ==

General Changes
* Add Arrow Flight connector :pr:`23032`

The documentation for the Arrow Flight connector appears to be added in #23212, so it doesn't need to be mentioned in this release note.

@BryanCutler left a comment

I took a quick pass and had a few comments. I'll take another look at the rest a little later.

<air.main.basedir>${project.parent.basedir}</air.main.basedir>
<grpc.version>1.53.0</grpc.version>
<dep.okhttp.version>4.10.0</dep.okhttp.version>
<arrow.version>11.0.0</arrow.version>


This is a fairly old version of Arrow, can you use a more recent one?

Contributor (PR author):

@BryanCutler I have updated to the latest Arrow version.

presto-base-arrow-flight/pom.xml (resolved)
public ArrowColumnHandle(
@JsonProperty("columnName") String columnName,
@JsonProperty("columnType") Type columnType,
@JsonProperty("jdbcTypeHandle") ArrowTypeHandle arrowTypeHandle)


Did you mean "arrowTypeHandle" instead of "jdbcTypeHandle"?

return;
}
try {
RootAllocator allocator = new RootAllocator(Long.MAX_VALUE);


You would usually create a RootAllocator as a class member, and it should be closed when it is no longer used.

Contributor (PR author):

@BryanCutler I have made changes to close the allocator.


We are not reusing the Flight client. The root allocator is closed when ArrowFlightClient is closed, either explicitly or automatically.
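The ownership pattern the reviewer describes can be sketched as a client wrapper that creates its allocator once, as a member, and releases it in its own `close()`, so a single try-with-resources covers both. To keep the sketch runnable without Arrow on the classpath, `TrackingAllocator` is a hypothetical stand-in for `org.apache.arrow.memory.RootAllocator` (which is `AutoCloseable`), and `OwningFlightClient` is a hypothetical name, not the PR's `ArrowFlightClient`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for org.apache.arrow.memory.RootAllocator,
// so the sketch compiles and runs with the JDK alone.
class TrackingAllocator implements AutoCloseable {
    final AtomicBoolean closed = new AtomicBoolean(false);

    @Override
    public void close() {
        closed.set(true);
    }
}

// Hypothetical client wrapper that owns its allocator for its whole lifetime.
class OwningFlightClient implements AutoCloseable {
    private final TrackingAllocator allocator;

    OwningFlightClient() {
        // Created once, as a member, instead of inside each method call.
        this.allocator = new TrackingAllocator();
    }

    TrackingAllocator allocator() {
        return allocator;
    }

    @Override
    public void close() {
        // In real code, the underlying Flight client and any result streams
        // would be closed here first, and the allocator last.
        allocator.close();
    }
}
```

In Arrow itself, closing an allocator that still has outstanding buffers raises an error, which is one reason to close the Flight client and its streams before the allocator.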

}
}

logger.debug("location %s", location.getUri().toString());


Is this meant to stay?

Contributor (PR author):

@BryanCutler I have removed the log statement.

trustedCertificate.get().close();
}
shutdownTimer();
isClientClosed.set(true);


Do the calls to getClient() and close() need to be thread-safe?

Contributor (PR author):

@BryanCutler I have addressed the comments. Can you please review the changes?
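On the thread-safety question: one common approach, sketched here under the assumption that `close()` may be called from several threads, is to make it idempotent with `AtomicBoolean.compareAndSet`, so only the first caller releases resources. `GuardedFlightClient`, `ensureOpen()`, and `releaseCount` are hypothetical names used for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: close() releases resources exactly once,
// even if several threads call it concurrently.
class GuardedFlightClient implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    // Counts how many times resources were actually released.
    final AtomicInteger releaseCount = new AtomicInteger();

    void ensureOpen() {
        if (closed.get()) {
            throw new IllegalStateException("client is closed");
        }
    }

    @Override
    public void close() {
        // Only the caller that flips false -> true performs the release.
        if (closed.compareAndSet(false, true)) {
            releaseCount.incrementAndGet();
        }
    }
}
```

This only makes `close()` itself safe. A caller that passes `ensureOpen()` can still race with a concurrent `close()`; if `getClient()` and `close()` must never interleave, both would need to share a lock, for example by declaring them `synchronized`.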

linux-foundation-easycla bot commented Jul 17, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@steveburnett (Contributor) left a comment

I reviewed the base-arrow-flight.rst documentation in #23212. If this documentation is to be included in this PR, which I think would be a good idea, please address the comments from my most recent review of #23212 here.

@sabbasani force-pushed the arrow-connector branch 4 times, most recently from a69b92e to 7bd4ec9 on July 24, 2024 13:38
import static java.util.Objects.requireNonNull;
import static java.util.stream.Collectors.joining;

public class ArrowQueryBuilder
Contributor:

This might belong better in a submodule that depends on this module and implements the Flight SQL spec. I don't think it belongs here.

Contributor (PR author):

@tdcmeehan I have addressed the comments. Can you please review the changes?

@sabbasani force-pushed the arrow-connector branch 8 times, most recently from 04cdc53 to 824175b on July 25, 2024 21:05
@sabbasani (Contributor, PR author) commented Jul 25, 2024

@tdcmeehan @BryanCutler @steveburnett
We have addressed all the comments above. Can you please review?

@steveburnett (Contributor) left a comment

Thanks for the docs! Minor suggestions, mostly formatting.

A local doc build returns the following warning:

/Users/steveburnett/Documents/GitHub/presto/presto-docs/src/main/sphinx/connector/base-arrow-flight.rst: WARNING: document isn't included in any toctree

To address this warning,

@steveburnett (Contributor):

Suggest update of the release note to include the PR number in both entries, and to link to the new doc from the release note.

== RELEASE NOTES ==

General Changes
* Add Arrow Flight connector :pr:`23032`
* Add documentation for the :doc:`/connector/base-arrow-flight`  :pr:`23032`

@sabbasani force-pushed the arrow-connector branch 2 times, most recently from 123ab6b to e24a904 on July 30, 2024 15:45
@tdcmeehan (Contributor) left a comment

Have you investigated how to make an end-to-end test with an example implementation? It would be great to demonstrate real queries working with this connector.

<version>2.0.65.Final</version>
</dependency>

<dependency>
Contributor:

Can you help me understand why so many exclusions are required in this POM?

Contributor (PR author):

We excluded these dependencies to resolve security issues. Should we retain the dependencies instead of excluding them?

@tdcmeehan (Contributor), Jul 31, 2024:

The excludes are very brittle; if possible, please don't add them.

query = Optional.empty();
}

public ArrowAbstractFlightRequest(String schema, String table, Optional<String> query)
Contributor:

I still see schema, table and query being referenced as if this were JDBC. Can you please remove?


import static java.util.Objects.requireNonNull;

public class ArrowExpression
Contributor:

What is an arrow expression? Is this a SQL query?

@tdcmeehan (Contributor):

@sabbasani is this PR still being worked on?

@sabbasani (Contributor, PR author):

Yes @tdcmeehan, I am working on the test cases as discussed earlier.

@sabbasani force-pushed the arrow-connector branch 2 times, most recently from 7fafbd2 to e0daebb on September 2, 2024 11:07
@sabbasani (Contributor, PR author):

@tdcmeehan I have added test cases for the Arrow Flight connector. Can you please review?

@sabbasani force-pushed the arrow-connector branch 4 times, most recently from 8b97b44 to e824379 on September 4, 2024 09:10
@tdcmeehan (Contributor) left a comment

Thanks for adding these tests.

I'm wondering if we can use a more robust dataset, and then use a more robust test suite, such as by extending from AbstractTestQueries.

Please take a look at H2QueryRunner, which can be used to generate the complete TPCH dataset required by AbstractTestQueries. We should be able to generate the same dataset between the expected query runner and the Arrow query runner, and verify they return identical results in a new test suite that derives from AbstractTestQueries.

@Test
public void testShowSchemas()
{
assertEquals(queryRunner.execute("SHOW SCHEMAS FROM arrow").getRowCount(), 3);
Contributor:

Can you validate the contents as well?
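Checking contents rather than only the count could look like the following JDK-only helper, which fails unless the returned schema names match an expected set exactly. `SchemaAssertions` is a hypothetical helper; the actual schema names the test server exposes are not shown in this excerpt, so the names passed in would have to come from the test fixture.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical test helper: exact, order-insensitive comparison of schema names.
final class SchemaAssertions {
    private SchemaAssertions() {}

    static void assertSchemasEqual(List<String> actual, Set<String> expected) {
        Set<String> actualSet = new TreeSet<>(actual);
        // A duplicate name would make the list longer than its set.
        if (actualSet.size() != actual.size()) {
            throw new AssertionError("duplicate schema names: " + actual);
        }
        Set<String> expectedSet = new TreeSet<>(expected);
        if (!actualSet.equals(expectedSet)) {
            throw new AssertionError("expected " + expectedSet + " but got " + actualSet);
        }
    }
}
```

Unlike a row-count check, this also catches a wrong or renamed schema that happens to leave the count unchanged.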

Comment on lines 74 to 82
String query1 = "SELECT * FROM testdb.example_table1";
String expected1 = "1, John Doe, 1990-05-15, 50000.00, true";
String expected2 = "2, Jane Smith, 1985-11-20, 60000.00, false";
Contributor:

Please inline these variables

String expected2 = "2, Jane Smith, 1985-11-20, 60000.00, false";

String result1 = queryRunner.execute(query1).toString();
assertTrue(result1.contains(expected1), "Expected row not found: " + expected1);
Contributor:

Please also check that this is the entirety of the result set, or at least check the row count.
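A substring check on the rendered result still passes when extra rows are present. A stricter, order-insensitive comparison checks both the row count and every row; the sketch below is a hypothetical JDK-only helper (`ResultAssertions`) that works on rows already rendered as strings, such as the two rows the test expects from `testdb.example_table1`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical test helper: the actual rows must be exactly the expected rows,
// ignoring order.
final class ResultAssertions {
    private ResultAssertions() {}

    static void assertSameRows(List<String> actual, List<String> expected) {
        // Row count first, so the failure message is specific.
        if (actual.size() != expected.size()) {
            throw new AssertionError("expected " + expected.size() + " rows but got " + actual.size());
        }
        // Compare sorted copies so row order does not matter.
        List<String> a = new ArrayList<>(actual);
        List<String> e = new ArrayList<>(expected);
        Collections.sort(a);
        Collections.sort(e);
        if (!a.equals(e)) {
            throw new AssertionError("expected rows " + e + " but got " + a);
        }
    }
}
```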

{
// Ensure the catalog and schema names are correct
String query = "SHOW TABLES FROM testdb";
String[] expectedTables = {"example_table1", "example_table2", "example_table3"};
Contributor:

Just use one line for each

* ``ArrowAbstractFlightRequest.java``
Implement this class to define the request data, including the data source type, connection properties, the number of partitions and other data required to interact with database.

* ``ArrowAbstractMetadata.java``
Contributor:

Suggested change
- * ``ArrowAbstractMetadata.java``
+ * ``AbstractArrowMetadata.java``

* ``ArrowAbstractMetadata.java``
To retrieve metadata (schema and table information), implement the abstract methods in the ArrowAbstractMetadata class.

* ``ArrowAbstractSplitManager.java``
Contributor:

Suggested change
- * ``ArrowAbstractSplitManager.java``
+ * ``AbstractArrowSplitManager.java``


public ArrowFlightClient getClient(Optional<String> uri)
{
return initializeClient(uri);


We are not reusing the Flight client. This is a change from the approach described in https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md. While testing the connector against a multi-pod IBM Flight server on an OpenShift cluster, we noticed that reusing the Flight client does not distribute the Flight `getInfo` requests across the different Flight server pods, whereas opening a new Flight connection for every request distributes the client requests among the pods.
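The per-request pattern described here can be sketched as opening a fresh client inside try-with-resources for each call, so every request uses a new connection that is always closed afterwards. `PerRequestClient` and `PerRequestCaller` are hypothetical stand-ins for the Flight client and its caller, and the static counters exist only to make the behavior observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Flight client: each instance models one new connection.
class PerRequestClient implements AutoCloseable {
    static final AtomicInteger opened = new AtomicInteger();
    static final AtomicInteger closed = new AtomicInteger();

    final String uri;

    PerRequestClient(String uri) {
        this.uri = uri;
        opened.incrementAndGet();
    }

    String getInfo(String request) {
        return "info:" + request;
    }

    @Override
    public void close() {
        closed.incrementAndGet();
    }
}

final class PerRequestCaller {
    private PerRequestCaller() {}

    // A new client (and connection) per request, always closed afterwards,
    // so each request gets its own chance to be routed to a different pod.
    static String call(String uri, String request) {
        try (PerRequestClient client = new PerRequestClient(uri)) {
            return client.getInfo(request);
        }
    }
}
```

The trade-off is paying connection setup (including any TLS handshake) on every request instead of once; whether that cost matters for this connector would need to be measured.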

@elbinpallimalilibm force-pushed the arrow-connector branch 2 times, most recently from b274a2d to b450554 on October 10, 2024 06:36
import static org.testng.Assert.assertTrue;

public class TestArrowFlightSmoke
extends AbstractTestQueries


@tdcmeehan Following your suggestion, we are running all 500 or so tests in AbstractTestQueries, apart from the ones defined here.

@elbinpallimalilibm force-pushed the arrow-connector branch 4 times, most recently from 82adf97 to 32c9c08 on October 10, 2024 11:19
Co-authored-by: sai bhaskar reddy <sai.bhaskar.reddy.sabbasani1@ibm.com>
Co-authored-by: SthuthiGhosh9400 <Sthuthi.Ghosh@ibm.com>
Co-authored-by: lithinwxd <Lithin.Purushothaman@ibm.com>
6 participants