Description
Apache Iceberg version
0.8.1 (latest release)
Please describe the bug 🐞
I have an external volume in Snowflake pointing to an Azure ADLS2:
create or replace external volume ev_iceberg_tables
storage_locations =
((
name = 'iceberg_snowflake_managed'
storage_provider = 'AZURE'
storage_base_url = 'azure://[storage_account].blob.core.windows.net/catalog/snowflake_managed/'
azure_tenant_id = '[tenant]'
))
;
So the container is called “catalog” and the catalog in Open Catalog that I want to point to is called “snowflake_managed”. Then this is my catalog integration:
create or replace catalog integration i_iceberg_catalog
catalog_source = polaris
table_format = iceberg
catalog_namespace = 'default'
rest_config = (
catalog_uri = 'https://[locator].snowflakecomputing.com/polaris/api/catalog'
warehouse = 'snowflake_managed'
)
rest_authentication = (
type = oauth
oauth_client_id = '[client_id]'
oauth_client_secret = '[client_secret]'
oauth_allowed_scopes = ( 'PRINCIPAL_ROLE:ALL' )
)
enabled = true
;
With this I create a table in the catalog:
create or replace iceberg table iceberg.jira.roadmap (
id int
, [...]
)
external_volume = 'ev_iceberg_tables'
catalog = 'SNOWFLAKE'
base_location = 'jira/roadmap/'
catalog_sync = 'i_iceberg_catalog'
;
This creates the table in Open Catalog and I can populate it just fine. But when I try to read from the table using PyIceberg or Polars, this error is returned:
ValueError: No registered filesystem for scheme: wasbs
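As far as I can tell (this is just a sketch of the idea, not PyIceberg's actual code), the FileIO takes the scheme from the front of the file URI and looks it up in a registry of known filesystems, so an unexpected wasbs scheme simply has no match:

from urllib.parse import urlparse

# Illustrative sketch only - the registry below is hypothetical,
# not PyIceberg's real internals.
known_schemes = {"file", "s3", "abfs", "abfss"}

location = "wasbs://catalog@myaccount.blob.core.windows.net/snowflake_managed/jira/roadmap"
scheme = urlparse(location).scheme  # -> "wasbs"

if scheme not in known_schemes:
    # This is effectively what PyIceberg surfaces as the ValueError above
    print(f"No registered filesystem for scheme: {scheme}")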
So I checked the table's metadata:
from pyiceberg.catalog import load_catalog
from pyiceberg.io.fsspec import FsspecFileIO
catalog = load_catalog(
**{
"type": "rest",
"header.X-Iceberg-Access-Delegation": "vended-credentials",
"uri": f"https://[locator].snowflakecomputing.com/polaris/api/catalog",
"credential": f"[open_catalog_client_id]:[open_catalog_client_secret]",
"scope": "PRINCIPAL_ROLE:pyIceberg",
"warehouse": "snowflake_managed",
"token-refresh-enabled": "true",
"py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO",
}
)
table = catalog.load_table('ICEBERG.JIRA.ROADMAP')
table.metadata
TableMetadataV2(location='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap', table_uuid=UUID('35b[…]'), last_updated_ms=1738578925967, last_column_id=19, schemas=[Schema(NestedField(field_id=1, name='ID', […], schema_id=0, identifier_field_ids=[])], current_schema_id=0, partition_specs=[PartitionSpec(spec_id=0)], default_spec_id=0, last_partition_id=999, properties={'format-version': '2'}, current_snapshot_id=78408874928435018, snapshots=[Snapshot(snapshot_id=3032990014606473543, parent_snapshot_id=None, sequence_number=1, timestamp_ms=1738578919582, manifest_list='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/snap-1738578919582000000-5714c4a4-11e8-4c0a-b89b-cab4ea909f97.avro', summary=None, schema_id=0), Snapshot(snapshot_id=78408874928435018, parent_snapshot_id=None, sequence_number=2, timestamp_ms=1738578925967, manifest_list='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/snap-1738578925967000000-fbf8b14b-e0ba-4bf5-bfde-5c6cf88251ad.avro', summary=Summary(Operation.APPEND, **{'manifests-kept': '0', 'added-files-size': '112128', 'total-records': '708', 'manifests-created': '1', 'total-data-files': '8', 'manifests-replaced': '0', 'added-data-files': '8', 'added-records': '708', 'total-files-size': '112128'}), schema_id=0)], snapshot_log=[SnapshotLogEntry(snapshot_id=3032990014606473543, timestamp_ms=1738578919582), SnapshotLogEntry(snapshot_id=78408874928435018, timestamp_ms=1738578925967)], metadata_log=[], sort_orders=[SortOrder(order_id=0)], default_sort_order_id=0, refs={'main': SnapshotRef(snapshot_id=78408874928435018, snapshot_ref_type=SnapshotRefType.BRANCH, min_snapshots_to_keep=None, max_snapshot_age_ms=None, max_ref_age_ms=None)}, format_version=2, last_sequence_number=2)
Apparently the wasbs scheme was written into the metadata by either Open Catalog or Snowflake, even though the file is actually located at an abfss:// URI:
table.metadata_location
abfss://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/[…].metadata.json
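A quick way to see the mismatch side by side (just a diagnostic snippet, reusing the table object loaded above):

from urllib.parse import urlparse

# Scheme baked into the table metadata vs. scheme of the metadata.json
# location returned by the catalog.
print(urlparse(table.metadata.location).scheme)  # wasbs
print(urlparse(table.metadata_location).scheme)  # abfss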
There obviously is a discrepancy between table.metadata and table.metadata_location, and as a result I can't table.scan().to_arrow() the table:
File ~\AppData\Roaming\Python\Python313\site-packages\pyiceberg\io\pyarrow.py:1354, in _fs_from_file_path(file_path, io)
...
408 if scheme not in self._scheme_to_fs:
--> 409 raise ValueError(f"No registered filesystem for scheme: {scheme}")
410 return self._scheme_to_fs[scheme](self.properties)
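For what it's worth, the only client-side workaround I can think of is aliasing wasbs to whatever factory already handles abfss. This is a rough, untested sketch that pokes at the internal _scheme_to_fs mapping visible in the traceback above, so the attribute and key names are assumptions rather than public API, and adlfs may still choke on the wasbs URL further down:

from pyiceberg.io.fsspec import FsspecFileIO

class WasbsAliasFileIO(FsspecFileIO):
    """Hypothetical workaround: treat wasbs:// locations like abfss:// ones.

    Relies on the internal _scheme_to_fs mapping seen in the traceback,
    which is not public API and may change between releases.
    """

    def __init__(self, properties):
        super().__init__(properties)
        abfss_factory = self._scheme_to_fs.get("abfss")  # assumed key
        if abfss_factory is not None:
            self._scheme_to_fs["wasbs"] = abfss_factory

You would then have to point py-io-impl at that class (by its importable module path) instead of pyiceberg.io.fsspec.FsspecFileIO. Even then, the real fix seems to be writing consistent URIs into the metadata, since the manifest-list paths above carry the wasbs scheme as well.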
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time