Commit

Merge branch 'prestodb:master' into iceberg_disable_timestamptz

denodo-research-labs authored Jun 6, 2024
2 parents 9307809 + 635335b commit b513c81
Showing 67 changed files with 747 additions and 295 deletions.
8 changes: 1 addition & 7 deletions ARCHITECTURE.md
@@ -26,13 +26,7 @@ Presto aims to accomplish the above goals for users by creating a broad, powerfu

## Presto Community

-The Presto project believes that, while excellence in the code is table stakes for the project, of even greater importance is *how* the project develops the code. In particular, we value:
-
-* Politeness and professionalism in all public forums (GitHub, Slack, mailing lists).
-* Helping those who come to the project with questions, issues or code.
-* Collaboration and teamwork.
-
-The Presto community is always striving to be a welcoming and inclusive community. We believe that a diverse community is a stronger community, and we welcome all who wish to contribute to the project.
+The Presto project believes that, while excellence in the code is table stakes for the project, of even greater importance is *how* the project develops the code. For more information, see [Presto Community](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#presto-community).

## Presto Technical Architecture

40 changes: 34 additions & 6 deletions CONTRIBUTING.md
@@ -1,12 +1,40 @@
# Contributing to Presto

-Thanks for your interest in Presto. Our goal is to build a fast, scalable and reliable distributed SQL query engine for running low latency interactive and batch analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
+Thanks for your interest in Presto. Our goal is to build a fast, scalable, and reliable distributed SQL query engine for running low latency interactive and batch analytic queries against data sources of all sizes ranging from gigabytes to petabytes.

# What Would You Like to Do?

| Area | Information |
|------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Getting Started | 1. [Build Presto](README.md#user-content-building-presto) <br/>2. Look for [good first issue](https://github.com/prestodb/presto/labels/good%20first%20issue) tickets. <br/> 3. Reference [project boards](https://github.com/prestodb/presto/projects?query=is%3Aopen) for open work. |
| Report Bug | To report a bug, visit Presto's [open issues](https://github.com/prestodb/presto/issues). |
| Contributions | Please read the [contributions](#contributions) section to learn about how you can contribute to Presto, including the submission process, minimum expectations, and guidelines for designing your code. Ready to open a pull request? Be sure to review the [Pull Request guidelines](#pullrequests). |
| Contributor License Agreement ("CLA") | First-time contributors must sign a CLA. For more information see [Contributor License Agreement ("CLA")](#cla). |
| Supporting Users | Reply to questions on the [Slack channel](https://join.slack.com/t/prestodb/shared_invite/enQtNTQ3NjU2MTYyNDA2LTYyOTg3MzUyMWE1YTI3Njc5YjgxZjNiYTgxODAzYjI5YWMwYWE0MTZjYWFhNGMwNjczYjI3N2JhM2ExMGJlMWM), check Presto's [open issues](https://github.com/prestodb/presto/issues) for user questions, or help with [code reviews](#codereviews). |
| Need help? | For community support, [ask for help in Slack](https://join.slack.com/t/prestodb/shared_invite/enQtNTQ3NjU2MTYyNDA2LTYyOTg3MzUyMWE1YTI3Njc5YjgxZjNiYTgxODAzYjI5YWMwYWE0MTZjYWFhNGMwNjczYjI3N2JhM2ExMGJlMWM). |

## <a id="requirements">Requirements</a>

## Presto Community

The Presto community values:

* Politeness and professionalism in all public forums such as GitHub, Slack, and mailing lists.
* Helping those who come to the project with questions, issues, or code.
* Collaboration and teamwork.

We strive to be a welcoming and inclusive community. We believe that a diverse community is a stronger community, and we welcome all who wish to contribute to the project.

## Mission and Architecture

See [PrestoDB: Mission and Architecture](https://github.com/prestodb/presto/blob/master/ARCHITECTURE.md).

## Getting Started

-Presto's [open issues are here](https://github.com/prestodb/presto/issues). Tag issues that would make a good first pull request for new contributors with [good first issue](https://github.com/prestodb/presto/labels/good%20first%20issue) tag. An easy way to start helping the project is to *file an issue*. Issues can include bugs, new features, or documentation that looks outdated. For community support, [ask for help in Slack](https://join.slack.com/t/prestodb/shared_invite/enQtNTQ3NjU2MTYyNDA2LTYyOTg3MzUyMWE1YTI3Njc5YjgxZjNiYTgxODAzYjI5YWMwYWE0MTZjYWFhNGMwNjczYjI3N2JhM2ExMGJlMWM).
+Read Presto's [open issues](https://github.com/prestodb/presto/issues). Tag issues that would make a good first pull request for new contributors with a [good first issue](https://github.com/prestodb/presto/labels/good%20first%20issue) tag. An easy way to start helping the project is to [open an issue](https://github.com/prestodb/presto/issues/new/choose). Issues can include bugs, new features, or outdated documentation.
+For community support, [ask for help in Slack](https://join.slack.com/t/prestodb/shared_invite/enQtNTQ3NjU2MTYyNDA2LTYyOTg3MzUyMWE1YTI3Njc5YjgxZjNiYTgxODAzYjI5YWMwYWE0MTZjYWFhNGMwNjczYjI3N2JhM2ExMGJlMWM).

-## Contributions
+## <a id="contributions">Contributions</a>

Presto welcomes contributions from everyone.

@@ -334,7 +362,7 @@ Presto committers are defined as [code owners](https://docs.github.com/en/reposi

New committers are approved by majority vote of the TSC ([see TSC charter](https://github.com/prestodb/tsc/blob/master/CHARTER.md)). To become a committer, reach out to an [existing TSC member](https://github.com/prestodb/tsc#members) and ask for their feedback on your eligibility (see: [How to become a Presto Committer?](https://github.com/prestodb/presto/wiki/How-to-become-a-Presto-committer%3F)). Note: to expedite the process, consider creating a document that outlines your Github stats, such as the number of reviews, lines of code added, number of PRs, and outlines particularly outstanding code and review contributions. If the TSC member believes you are eligible, they will submit your nomination to a vote by the TSC, typically in the form of a PR that adds your handle to the `CODEOWNERS` file. The process is complete once the PR is merged.

-## Pull Requests
+## <a id="pullrequests">Pull Requests</a>
* #### PR size and structure
* A PR can consist of multiple small commits, preferably not more than 20.
* The total number of lines modified in a single PR shall not exceed 5000. An exception to this rule is for changes that include checked in code generated files (such as [presto_protocol.cpp](https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/presto_protocol/presto_protocol.cpp)).
@@ -398,7 +426,7 @@ We use the [Fork and Pull model](https://docs.github.com/en/pull-requests/collab
* Instead, review the related code, then draft initial documentation as a separate commit
* Submit without test cases or clear justification for lack thereof
-## Code Reviews
+## <a id="codereviews">Code Reviews</a>
#### What to do
* Provide explicit feedback on what is needed or what would be better
* Review code with the objective of helping someone land their changes
@@ -409,6 +437,6 @@ We use the [Fork and Pull model](https://docs.github.com/en/pull-requests/collab
Please refer to our [Code of Conduct](https://github.com/prestodb/tsc/blob/master/CODE_OF_CONDUCT.md).
-## Contributor License Agreement ("CLA")
+## <a id="cla">Contributor License Agreement ("CLA")</a>
To accept your pull request, you must submit a CLA. You only need to do this once, so if you've done this for one repository in the [prestodb](https://github.com/prestodb) organization, you're good to go. When you submit a pull request for the first time, the communitybridge-easycla bot notifies you if you haven't signed, and provides you with a link. If you are contributing on behalf of a company, you might want to let the person who manages your corporate CLA whitelist know they will be receiving a request from you.
2 changes: 1 addition & 1 deletion README.md
@@ -19,7 +19,7 @@ See [PrestoDB: Mission and Architecture](ARCHITECTURE.md).
* Python 2.4+ (for running with the launcher script)

<details> <!-- from: https://github.com/prestodb/presto/blob/master/README.md -->
-<summary><h2>Building Presto</h2></summary>
+<summary><a id="building-presto"><h2>Building Presto</h2></a></summary>

### Overview (Java)

2 changes: 1 addition & 1 deletion presto-docs/src/main/sphinx/sql/drop-schema.rst
@@ -26,7 +26,7 @@ Drop the schema ``web``::

Drop the schema ``sales`` if it exists::

-DROP TABLE IF EXISTS sales
+DROP SCHEMA IF EXISTS sales

See Also
--------
@@ -304,7 +304,7 @@ public ConnectorOutputTableHandle beginCreateTable(ConnectorSession session, Con
}

FileFormat fileFormat = getFileFormat(tableMetadata.getProperties());
-TableMetadata metadata = newTableMetadata(schema, partitionSpec, targetPath, populateTableProperties(tableMetadata, fileFormat));
+TableMetadata metadata = newTableMetadata(schema, partitionSpec, targetPath, populateTableProperties(tableMetadata, fileFormat, session));
transaction = createTableTransaction(tableName, operations, metadata);

return new IcebergWritableTableHandle(
@@ -172,7 +172,7 @@ public ConnectorOutputTableHandle beginCreateTable(ConnectorSession session, Con

try {
transaction = catalogFactory.getCatalog(session).newCreateTableTransaction(
-toIcebergTableIdentifier(schemaTableName), schema, partitionSpec, populateTableProperties(tableMetadata, fileFormat));
+toIcebergTableIdentifier(schemaTableName), schema, partitionSpec, populateTableProperties(tableMetadata, fileFormat, session));
}
catch (AlreadyExistsException e) {
throw new TableAlreadyExistsException(schemaTableName);
@@ -130,6 +130,7 @@
import static com.facebook.presto.iceberg.IcebergErrorCode.ICEBERG_INVALID_SNAPSHOT_ID;
import static com.facebook.presto.iceberg.IcebergErrorCode.ICEBERG_INVALID_TABLE_TIMESTAMP;
import static com.facebook.presto.iceberg.IcebergPartitionType.IDENTITY;
+import static com.facebook.presto.iceberg.IcebergSessionProperties.getCompressionCodec;
import static com.facebook.presto.iceberg.IcebergSessionProperties.isMergeOnReadModeEnabled;
import static com.facebook.presto.iceberg.IcebergTableProperties.getCommitRetries;
import static com.facebook.presto.iceberg.IcebergTableProperties.getFormatVersion;
@@ -178,6 +179,8 @@
import static org.apache.iceberg.TableProperties.DELETE_MODE_DEFAULT;
import static org.apache.iceberg.TableProperties.FORMAT_VERSION;
import static org.apache.iceberg.TableProperties.MERGE_MODE;
+import static org.apache.iceberg.TableProperties.ORC_COMPRESSION;
+import static org.apache.iceberg.TableProperties.PARQUET_COMPRESSION;
import static org.apache.iceberg.TableProperties.UPDATE_MODE;
import static org.apache.iceberg.TableProperties.WRITE_LOCATION_PROVIDER_IMPL;
import static org.apache.iceberg.types.Type.TypeID.BINARY;
@@ -1045,12 +1048,20 @@ public void close()
}
}

-public static Map<String, String> populateTableProperties(ConnectorTableMetadata tableMetadata, FileFormat fileFormat)
+public static Map<String, String> populateTableProperties(ConnectorTableMetadata tableMetadata, FileFormat fileFormat, ConnectorSession session)
{
ImmutableMap.Builder<String, String> propertiesBuilder = ImmutableMap.builderWithExpectedSize(5);
Integer commitRetries = getCommitRetries(tableMetadata.getProperties());
propertiesBuilder.put(DEFAULT_FILE_FORMAT, fileFormat.toString());
propertiesBuilder.put(COMMIT_NUM_RETRIES, String.valueOf(commitRetries));
+switch (fileFormat) {
+    case PARQUET:
+        propertiesBuilder.put(PARQUET_COMPRESSION, getCompressionCodec(session).getParquetCompressionCodec().get().toString());
+        break;
+    case ORC:
+        propertiesBuilder.put(ORC_COMPRESSION, getCompressionCodec(session).getOrcCompressionKind().name());
+        break;
+}
if (tableMetadata.getComment().isPresent()) {
propertiesBuilder.put(TABLE_COMMENT, tableMetadata.getComment().get());
}
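The new `switch` in `populateTableProperties` records the session's compression codec in the Iceberg table properties, using the key that matches the table's file format. A minimal standalone sketch of that shape (the `FileFormat` enum and method here are simplified stand-ins for the Iceberg and Presto classes in the patch, not their real APIs):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TablePropertiesSketch
{
    // Simplified stand-in for org.apache.iceberg.FileFormat
    enum FileFormat { PARQUET, ORC, AVRO }

    // Iceberg property keys corresponding to the patch's static imports
    static final String PARQUET_COMPRESSION = "write.parquet.compression-codec";
    static final String ORC_COMPRESSION = "write.orc.compression-codec";

    // Mirrors the new switch: only the codec matching the table's
    // file format is written into the table properties.
    static Map<String, String> compressionProperties(FileFormat fileFormat, String parquetCodec, String orcCodec)
    {
        Map<String, String> props = new LinkedHashMap<>();
        switch (fileFormat) {
            case PARQUET:
                props.put(PARQUET_COMPRESSION, parquetCodec);
                break;
            case ORC:
                props.put(ORC_COMPRESSION, orcCodec);
                break;
            default:
                break;
        }
        return props;
    }

    public static void main(String[] args)
    {
        compressionProperties(FileFormat.PARQUET, "GZIP", "ZLIB")
                .forEach((k, v) -> System.out.println(k + "=" + v));
        compressionProperties(FileFormat.ORC, "GZIP", "ZLIB")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

This is why the updated tests below expect `write.parquet.compression-codec` only on Parquet tables and `write.orc.compression-codec` only on ORC tables.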
@@ -89,6 +89,9 @@ public void init()
assertUpdate("INSERT INTO test_schema.test_table_drop_column VALUES ('c', 3, CAST('2019-09-09' AS DATE)), ('a', 4, CAST('2019-09-10' AS DATE)), ('b', 5, CAST('2019-09-10' AS DATE))", 3);
assertQuery("SELECT count(*) FROM test_schema.test_table_drop_column", "VALUES 6");
assertUpdate("ALTER TABLE test_schema.test_table_drop_column DROP COLUMN _varchar");

+assertUpdate("CREATE TABLE test_schema.test_table_orc (_bigint BIGINT) WITH (format_version = '1', format = 'ORC')");
+assertUpdate("INSERT INTO test_schema.test_table_orc VALUES (0), (1), (2)", 3);
}

@Test
@@ -205,6 +208,28 @@ protected void checkTableProperties(String tableName, String deleteMode)
.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.delete.mode", deleteMode)))
.anySatisfy(row -> assertThat(row)
.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.format.default", "PARQUET")))
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.parquet.compression-codec", "GZIP")))
.anySatisfy(row -> assertThat(row)
.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "commit.retry.num-retries", "4")));
}

+protected void checkORCFormatTableProperties(String tableName, String deleteMode)
+{
+assertQuery(String.format("SHOW COLUMNS FROM test_schema.\"%s$properties\"", tableName),
+"VALUES ('key', 'varchar', '', '')," + "('value', 'varchar', '', '')");
+assertQuery(String.format("SELECT COUNT(*) FROM test_schema.\"%s$properties\"", tableName), "VALUES 5");
+List<MaterializedRow> materializedRows = computeActual(getSession(),
+String.format("SELECT * FROM test_schema.\"%s$properties\"", tableName)).getMaterializedRows();
+
+assertThat(materializedRows).hasSize(5);
+assertThat(materializedRows)
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.delete.mode", deleteMode)))
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.format.default", "ORC")))
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.orc.compression-codec", "ZLIB")))
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.parquet.compression-codec", "zstd")))
+.anySatisfy(row -> assertThat(row)
@@ -217,6 +242,7 @@ public void testPropertiesTable()
// Test table properties for all supported format versions
checkTableProperties("test_table_v1", "copy-on-write");
checkTableProperties("test_table", "merge-on-read");
+checkORCFormatTableProperties("test_table_orc", "copy-on-write");
}

@Test
@@ -242,6 +268,7 @@ public void tearDown()
{
assertUpdate("DROP TABLE IF EXISTS test_schema.test_table");
assertUpdate("DROP TABLE IF EXISTS test_schema.test_table_v1");
+assertUpdate("DROP TABLE IF EXISTS test_schema.test_table_orc");
assertUpdate("DROP TABLE IF EXISTS test_schema.test_table_multilevel_partitions");
assertUpdate("DROP TABLE IF EXISTS test_schema.test_table_drop_column");
assertUpdate("DROP TABLE IF EXISTS test_schema.test_table_add_column");
@@ -98,6 +98,33 @@ protected void checkTableProperties(String tableName, String deleteMode)
.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.format.default", "PARQUET")))
.anySatisfy(row -> assertThat(row.getField(0)).isEqualTo("nessie.commit.id"))
.anySatisfy(row -> assertThat(row).isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "gc.enabled", "false")))
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.parquet.compression-codec", "GZIP")))
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.metadata.delete-after-commit.enabled", "false")))
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "commit.retry.num-retries", "4")));
}

+@Override
+protected void checkORCFormatTableProperties(String tableName, String deleteMode)
+{
+assertQuery(String.format("SHOW COLUMNS FROM test_schema.\"%s$properties\"", tableName),
+"VALUES ('key', 'varchar', '', '')," + "('value', 'varchar', '', '')");
+assertQuery(String.format("SELECT COUNT(*) FROM test_schema.\"%s$properties\"", tableName), "VALUES 8");
+List<MaterializedRow> materializedRows = computeActual(getSession(),
+String.format("SELECT * FROM test_schema.\"%s$properties\"", tableName)).getMaterializedRows();
+
+assertThat(materializedRows).hasSize(8);
+assertThat(materializedRows)
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.delete.mode", deleteMode)))
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.format.default", "ORC")))
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.orc.compression-codec", "ZLIB")))
+.anySatisfy(row -> assertThat(row.getField(0)).isEqualTo("nessie.commit.id"))
+.anySatisfy(row -> assertThat(row).isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "gc.enabled", "false")))
+.anySatisfy(row -> assertThat(row)
+.isEqualTo(new MaterializedRow(MaterializedResult.DEFAULT_PRECISION, "write.parquet.compression-codec", "zstd")))
.anySatisfy(row -> assertThat(row)