Arrow-flight connector #23032
base: master
Consider adding documentation for the new connector. Suggest revising the release note entry to follow the Release Note Guidelines:
It seems like there's a hardcoded presumption that the underlying data source accepts a SQL query. Can you remove this from the PR? The service may not accept SQL.
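For context, Arrow Flight does not itself require SQL: a FlightDescriptor carries either a path or an opaque command byte array that only the server interprets. Below is a minimal sketch of a SQL-agnostic request, assuming the base module exposes a pluggable request type. FlightRequest, KeyValueFlightRequest, and the `key=value;...` command encoding are hypothetical names chosen for illustration. They are not part of this PR or of the Arrow API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical request abstraction: the base connector only knows that a request
 * can be serialized into opaque Flight command bytes. Nothing here assumes SQL.
 */
interface FlightRequest
{
    byte[] getCommand();
}

/**
 * Hypothetical implementation for a service that accepts "key=value" pairs
 * separated by ';' as its command format.
 */
final class KeyValueFlightRequest
        implements FlightRequest
{
    private final Map<String, String> properties;

    KeyValueFlightRequest(Map<String, String> properties)
    {
        // A TreeMap copy gives a deterministic key order in the encoded command
        this.properties = new TreeMap<>(properties);
    }

    @Override
    public byte[] getCommand()
    {
        StringBuilder command = new StringBuilder();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            if (command.length() > 0) {
                command.append(';');
            }
            command.append(entry.getKey()).append('=').append(entry.getValue());
        }
        return command.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

With this split, a Flight SQL implementation could live in a separate module and produce its own command bytes, while the base connector never handles a query string.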
public List<Field> getColumnsList(String schema, String table, ConnectorSession connectorSession)
{
    try {
        String dbSpecificSchemaName = getDBSpecificSchemaName(config, schema);
Avoid "DB" references, as the underlying data source might not be a database.
I have removed the "DB" references.
@JsonProperty("jdbcType") int jdbcType,
@JsonProperty("jdbcTypeName") String jdbcTypeName,
We can't have references to JDBC in this connector.
I have removed the JDBC references.
Suggest changing the release note entry as follows:
The documentation for the Arrow Flight connector appears to be added in #23212, so it doesn't need to be mentioned in this release note.
I took a quick pass and had a few comments. I'll take another look at the rest a little later.
presto-base-arrow-flight/pom.xml (outdated)
<air.main.basedir>${project.parent.basedir}</air.main.basedir>
<grpc.version>1.53.0</grpc.version>
<dep.okhttp.version>4.10.0</dep.okhttp.version>
<arrow.version>11.0.0</arrow.version>
This is a fairly old version of Arrow, can you use a more recent one?
@BryanCutler I have updated to the latest Arrow version.
public ArrowColumnHandle(
        @JsonProperty("columnName") String columnName,
        @JsonProperty("columnType") Type columnType,
        @JsonProperty("jdbcTypeHandle") ArrowTypeHandle arrowTypeHandle)
Did you mean "arrowTypeHandle" rather than "jdbcTypeHandle"?
    return;
}
try {
    RootAllocator allocator = new RootAllocator(Long.MAX_VALUE);
You would usually create a RootAllocator as a class member, and it should be closed when it is no longer used.
@BryanCutler I have made changes to close the allocator.
We are not reusing the Flight client. The root allocator is closed when the ArrowFlightClient is closed, either explicitly or automatically.
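The ownership pattern described in this thread can be sketched as follows: the client holds its allocator as a field and closes it from its own close(), so a single try-with-resources releases both. Arrow's RootAllocator is AutoCloseable. To keep the sketch runnable without Arrow on the classpath, TrackedAllocator is a hypothetical stand-in for it, and PerRequestFlightClient is a hypothetical name, not the PR's ArrowFlightClient.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for Arrow's RootAllocator, which is also AutoCloseable. */
final class TrackedAllocator
        implements AutoCloseable
{
    private final AtomicBoolean closed = new AtomicBoolean();

    boolean isClosed()
    {
        return closed.get();
    }

    @Override
    public void close()
    {
        closed.set(true);
    }
}

/** Hypothetical per-request client that owns its allocator as a class member. */
final class PerRequestFlightClient
        implements AutoCloseable
{
    // Created once per client and released together with it
    private final TrackedAllocator allocator = new TrackedAllocator();

    TrackedAllocator getAllocator()
    {
        return allocator;
    }

    @Override
    public void close()
    {
        // Closing the client releases the allocator it owns
        allocator.close();
    }
}
```

In Arrow Java, closing an allocator that still has outstanding buffers raises an IllegalStateException, so closing it promptly with its owner also surfaces leaks early.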
}
}

logger.debug("location %s", location.getUri().toString());
Is this meant to stay?
@BryanCutler I have removed the log statement.
trustedCertificate.get().close();
}
shutdownTimer();
isClientClosed.set(true);
Do the calls to getClient() and close() need to be thread-safe?
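If they do, one common approach is to guard both methods with the same lock. An AtomicBoolean such as isClientClosed on its own does not prevent a check-then-act race, where getClient() sees the client as open just before a concurrent close() releases shared state. The sketch below assumes a hypothetical FlightClientHolder that takes a client factory and a callback standing in for closing the trusted certificate and shutting down the timer; it is an illustration, not the PR's code.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder in which getClient() and close() share one lock, so a client
 * is never created while close() is releasing shared resources.
 */
final class FlightClientHolder<C>
{
    private final Supplier<C> clientFactory;
    private final Runnable releaseSharedResources;
    private boolean closed;  // guarded by "this"

    FlightClientHolder(Supplier<C> clientFactory, Runnable releaseSharedResources)
    {
        this.clientFactory = clientFactory;
        this.releaseSharedResources = releaseSharedResources;
    }

    synchronized C getClient()
    {
        if (closed) {
            throw new IllegalStateException("FlightClientHolder is closed");
        }
        return clientFactory.get();
    }

    synchronized void close()
    {
        // Idempotent: shared resources are released at most once
        if (closed) {
            return;
        }
        closed = true;
        releaseSharedResources.run();
    }
}
```

Holding the lock while creating a client serializes client creation. If that turns out to be a bottleneck, a ReadWriteLock (read lock in getClient(), write lock in close()) is a common alternative.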
@BryanCutler I have addressed the comments. Can you please review the changes?
import static java.util.Objects.requireNonNull;
import static java.util.stream.Collectors.joining;

public class ArrowQueryBuilder
This might better belong in a submodule that depends on this module which implements the Flight SQL spec. I don't think it belongs here.
@tdcmeehan I have addressed the comments. Can you please review the changes?
@tdcmeehan @BryanCutler @steveburnett
Thanks for the docs! Minor suggestions, mostly formatting.
A local doc build returns the following warning:
/Users/steveburnett/Documents/GitHub/presto/presto-docs/src/main/sphinx/connector/base-arrow-flight.rst: WARNING: document isn't included in any toctree
To address this warning:
- Add connector/base-arrow-flight to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector.rst so the new page appears on the Connector page in the Presto documentation.
- Add the new line in alphabetical order, based on the new page title Arrow-flight Connector.
Suggest updating the release note to include the PR number in both entries and to link to the new doc.
Have you investigated how to make an end-to-end test with an example implementation? It would be great to demonstrate real queries working with this connector.
<version>2.0.65.Final</version>
</dependency>

<dependency>
Can you help me understand why so many exclusions are required in this POM?
We excluded these dependencies to resolve security issues. Should we retain the dependencies instead of excluding them?
The exclusions are very brittle; if possible, please don't add them.
query = Optional.empty();
}

public ArrowAbstractFlightRequest(String schema, String table, Optional<String> query)
I still see schema, table and query being referenced as if this were JDBC. Can you please remove?
import static java.util.Objects.requireNonNull;

public class ArrowExpression
What is an arrow expression? Is this a SQL query?
@sabbasani is this PR still being worked on?
Yes @tdcmeehan, I am working on the test cases as discussed earlier.
@tdcmeehan I have added test cases for the Arrow Flight connector. Can you please review?
Thanks for adding these tests.
I'm wondering if we can use a more robust dataset, and then a more robust test suite, for example by extending AbstractTestQueries. Please take a look at H2QueryRunner, which can be used to generate the complete TPCH dataset required by AbstractTestQueries. We should be able to generate the same dataset for both the expected query runner and the Arrow query runner, and verify that they return identical results in a new test suite that derives from AbstractTestQueries.
@Test
public void testShowSchemas()
{
    assertEquals(queryRunner.execute("SHOW SCHEMAS FROM arrow").getRowCount(), 3);
Can you validate the contents as well?
String query1 = "SELECT * FROM testdb.example_table1";
String expected1 = "1, John Doe, 1990-05-15, 50000.00, true";
String expected2 = "2, Jane Smith, 1985-11-20, 60000.00, false";
Please inline these variables
String expected2 = "2, Jane Smith, 1985-11-20, 60000.00, false";

String result1 = queryRunner.execute(query1).toString();
assertTrue(result1.contains(expected1), "Expected row not found: " + expected1);
Please also check that this is the entirety of the result set, or at least check the row count.
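A stricter check could compare the full result set instead of searching the toString() output. The sketch below is a hypothetical helper, ResultAssertions.assertExactRows, that fails on a row-count mismatch first and then compares the rows ignoring order. It works on plain strings in the same format as the expected rows in the test above; a real test would first convert the rows returned by the Presto query runner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical test helper: asserts that a result contains exactly the expected rows. */
final class ResultAssertions
{
    private ResultAssertions() {}

    static void assertExactRows(List<String> actualRows, List<String> expectedRows)
    {
        // Check the row count first so extra or missing rows give a clear message
        if (actualRows.size() != expectedRows.size()) {
            throw new AssertionError("Expected " + expectedRows.size() + " rows but got "
                    + actualRows.size() + ": " + actualRows);
        }
        // Compare sorted copies so the check ignores row order but not duplicates
        List<String> actual = new ArrayList<>(actualRows);
        List<String> expected = new ArrayList<>(expectedRows);
        Collections.sort(actual);
        Collections.sort(expected);
        if (!actual.equals(expected)) {
            throw new AssertionError("Expected rows " + expected + " but got " + actual);
        }
    }
}
```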
{
    // Ensure the catalog and schema names are correct
    String query = "SHOW TABLES FROM testdb";
    String[] expectedTables = {"example_table1", "example_table2", "example_table3"};
Just use one line for each
* ``ArrowAbstractFlightRequest.java``
Implement this class to define the request data, including the data source type, connection properties, the number of partitions, and other data required to interact with the database.

* ``ArrowAbstractMetadata.java``
Suggested change:
- * ``ArrowAbstractMetadata.java``
+ * ``AbstractArrowMetadata.java``
* ``ArrowAbstractMetadata.java``
To retrieve metadata (schema and table information), implement the abstract methods in the ArrowAbstractMetadata class.

* ``ArrowAbstractSplitManager.java``
Suggested change:
- * ``ArrowAbstractSplitManager.java``
+ * ``AbstractArrowSplitManager.java``
public ArrowFlightClient getClient(Optional<String> uri)
{
    return initializeClient(uri);
We are not reusing the Flight client. This is a change from the approach described in https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md. While testing the connector with a multi-pod IBM Flight server on an OpenShift cluster, we noticed that reusing the Flight client does not distribute the Flight getInfo requests across the different Flight server pods, whereas opening a new Flight connection for every request distributes the client requests among the pods.
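The per-request approach can be sketched as a factory with no cache: each getClient() call opens a new connection that the caller owns and closes, typically with try-with-resources. FlightConnection and PerRequestClientFactory are hypothetical names, and the connection-opening function is injected so the sketch runs without a Flight server. Whether separate connections actually land on different pods depends on the load balancer in front of the server, as observed in the comment above.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical connection type; close() is narrowed so callers need not catch Exception. */
interface FlightConnection
        extends AutoCloseable
{
    @Override
    void close();
}

/** Hypothetical factory that opens a new connection for every request instead of caching one. */
final class PerRequestClientFactory
{
    private final Function<String, FlightConnection> openConnection;
    private final String defaultUri;

    PerRequestClientFactory(Function<String, FlightConnection> openConnection, String defaultUri)
    {
        this.openConnection = openConnection;
        this.defaultUri = defaultUri;
    }

    FlightConnection getClient(Optional<String> uri)
    {
        // No cache: each call opens a fresh connection, and the caller owns and closes it
        return openConnection.apply(uri.orElse(defaultUri));
    }
}
```

The trade-off is the cost of a new connection (and TLS handshake, if enabled) per request, in exchange for letting the infrastructure spread requests across pods.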
import static org.testng.Assert.assertTrue;

public class TestArrowFlightSmoke
        extends AbstractTestQueries
@tdcmeehan Following your suggestion, we are running all of the 500 or so tests in AbstractTestQueries apart from the ones defined here.
Co-authored-by: sai bhaskar reddy <sai.bhaskar.reddy.sabbasani1@ibm.com>
Co-authored-by: SthuthiGhosh9400 <Sthuthi.Ghosh@ibm.com>
Co-authored-by: lithinwxd <Lithin.Purushothaman@ibm.com>
Description
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
If release note is NOT required, use: