Snowflake managed Open Catalog and Azure ADLS2 #1606

@martinseifertprojuventute

Description

Apache Iceberg version

0.8.1 (latest release)

Please describe the bug 🐞

I have an external volume in Snowflake pointing to an Azure ADLS Gen2 container:

create or replace external volume ev_iceberg_tables
storage_locations =
    ((
        name = 'iceberg_snowflake_managed'
        storage_provider = 'AZURE'
        storage_base_url = 'azure://[storage_account].blob.core.windows.net/catalog/snowflake_managed/'
        azure_tenant_id = '[tenant]'
    ))
;

So the container is called “catalog” and the Open Catalog I want to point to is called “snowflake_managed”. Then this is my catalog integration:

create or replace catalog integration i_iceberg_catalog
catalog_source = polaris
table_format = iceberg
catalog_namespace= 'default'
rest_config = (
    catalog_uri = 'https://[locator].snowflakecomputing.com/polaris/api/catalog'
    warehouse = 'snowflake_managed'
)
rest_authentication = (
    type = oauth
    oauth_client_id = '[client_id]'
    oauth_client_secret = '[client_secret]'
    oauth_allowed_scopes = ( 'PRINCIPAL_ROLE:ALL' )
)
enabled = true
;

With this I create a table in the catalog:

create or replace iceberg table iceberg.jira.roadmap (
    id int
    , [...]
)
external_volume = 'ev_iceberg_tables'
catalog = 'SNOWFLAKE'
base_location = 'jira/roadmap/'
catalog_sync = 'i_iceberg_catalog'
;

This creates the table in Open Catalog, and I can populate it without problems. But when I try to read the table using PyIceberg or Polars, I get this error:

ValueError: No registered filesystem for scheme: wasbs

So I checked the table's metadata:

from pyiceberg.catalog import load_catalog
from pyiceberg.io.fsspec import FsspecFileIO

catalog = load_catalog(
    **{
        "type": "rest",
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
        "uri": "https://[locator].snowflakecomputing.com/polaris/api/catalog",
        "credential": "[open_catalog_client_id]:[open_catalog_client_secret]",
        "scope": "PRINCIPAL_ROLE:pyIceberg",
        "warehouse": "snowflake_managed",
        "token-refresh-enabled": "true",
        "py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO",
    }
)

table = catalog.load_table('ICEBERG.JIRA.ROADMAP')

table.metadata

TableMetadataV2(location='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap', table_uuid=UUID('35b[…]'), last_updated_ms=1738578925967, last_column_id=19, schemas=[Schema(NestedField(field_id=1, name='ID', […], schema_id=0, identifier_field_ids=[])], current_schema_id=0, partition_specs=[PartitionSpec(spec_id=0)], default_spec_id=0, last_partition_id=999, properties={'format-version': '2'}, current_snapshot_id=78408874928435018, snapshots=[Snapshot(snapshot_id=3032990014606473543, parent_snapshot_id=None, sequence_number=1, timestamp_ms=1738578919582, manifest_list='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/snap-1738578919582000000-5714c4a4-11e8-4c0a-b89b-cab4ea909f97.avro', summary=None, schema_id=0), Snapshot(snapshot_id=78408874928435018, parent_snapshot_id=None, sequence_number=2, timestamp_ms=1738578925967, manifest_list='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/snap-1738578925967000000-fbf8b14b-e0ba-4bf5-bfde-5c6cf88251ad.avro', summary=Summary(Operation.APPEND, **{'manifests-kept': '0', 'added-files-size': '112128', 'total-records': '708', 'manifests-created': '1', 'total-data-files': '8', 'manifests-replaced': '0', 'added-data-files': '8', 'added-records': '708', 'total-files-size': '112128'}), schema_id=0)], snapshot_log=[SnapshotLogEntry(snapshot_id=3032990014606473543, timestamp_ms=1738578919582), SnapshotLogEntry(snapshot_id=78408874928435018, timestamp_ms=1738578925967)], metadata_log=[], sort_orders=[SortOrder(order_id=0)], default_sort_order_id=0, refs={'main': SnapshotRef(snapshot_id=78408874928435018, snapshot_ref_type=SnapshotRefType.BRANCH, min_snapshots_to_keep=None, max_snapshot_age_ms=None, max_ref_age_ms=None)}, format_version=2, last_sequence_number=2)

Apparently either Open Catalog or Snowflake wrote the wasbs scheme into the table metadata, even though the metadata file itself is addressed with the abfss scheme:

table.metadata_location

abfss://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/[…].metadata.json
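To make the mismatch easier to spot, here is a minimal sketch that counts the URI schemes across a set of locations. The helper names `uri_scheme` and `count_schemes` are my own, and the storage account name `myaccount` and the file name `v1.metadata.json` are made-up placeholders, not values from my environment.

```python
from collections import Counter
from typing import Iterable


def uri_scheme(uri: str) -> str:
    """Return the lowercased scheme of a URI, or "" if it has none."""
    scheme, sep, _ = uri.partition("://")
    return scheme.lower() if sep else ""


def count_schemes(uris: Iterable[str]) -> Counter:
    """Count how often each URI scheme occurs in the given locations."""
    return Counter(uri_scheme(u) for u in uris)


# Placeholder locations shaped like the ones in this report.
# "myaccount" and "v1.metadata.json" are hypothetical.
metadata_location = (
    "abfss://catalog@myaccount.blob.core.windows.net/"
    "snowflake_managed/jira/roadmap/metadata/v1.metadata.json"
)
table_location = (
    "wasbs://catalog@myaccount.blob.core.windows.net/"
    "snowflake_managed/jira/roadmap"
)

# More than one key in the result means the locations mix schemes.
print(count_schemes([metadata_location, table_location]))
```

With a loaded table, the same check could presumably be fed `table.metadata_location`, `table.metadata.location` and the `manifest_list` of each entry in `table.metadata.snapshots`.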

So table.metadata and table.metadata_location disagree on the URI scheme, and as a result table.scan().to_arrow() fails:

File ~\AppData\Roaming\Python\Python313\site-packages\pyiceberg\io\pyarrow.py:1354, in _fs_from_file_path(file_path, io)
...
408 if scheme not in self._scheme_to_fs:
--> 409 raise ValueError(f"No registered filesystem for scheme: {scheme}")
410 return self._scheme_to_fs[scheme](self.properties)
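As far as I can tell from the traceback, the error comes from a plain dictionary lookup keyed by URI scheme. The following is a simplified stand-in for that mechanism, not PyIceberg's actual code: the class `SchemeRegistry`, its `register_alias` method and the string `"adls-filesystem"` are hypothetical. It only illustrates why an unregistered scheme fails and how mapping wasbs to the same handler as abfss might be one possible direction for a fix.

```python
from typing import Callable, Dict


class SchemeRegistry:
    """Toy scheme-to-filesystem lookup table (hypothetical, not PyIceberg code)."""

    def __init__(self) -> None:
        # Only the abfs/abfss schemes are known initially.
        self._scheme_to_fs: Dict[str, Callable[[], str]] = {
            "abfs": lambda: "adls-filesystem",
            "abfss": lambda: "adls-filesystem",
        }

    def register_alias(self, alias: str, existing: str) -> None:
        """Make `alias` resolve to the same factory as `existing`."""
        self._scheme_to_fs[alias] = self._scheme_to_fs[existing]

    def get_fs(self, scheme: str) -> str:
        """Look up and build the filesystem for a scheme, or raise ValueError."""
        if scheme not in self._scheme_to_fs:
            raise ValueError(f"No registered filesystem for scheme: {scheme}")
        return self._scheme_to_fs[scheme]()


registry = SchemeRegistry()
try:
    registry.get_fs("wasbs")
except ValueError as exc:
    print(exc)  # No registered filesystem for scheme: wasbs

registry.register_alias("wasbs", "abfss")
print(registry.get_fs("wasbs"))  # adls-filesystem
```

Whether wasbs paths can really be served by the same filesystem implementation as abfss paths is an assumption on my side and would need confirmation.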

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
