[BUG] Data Stream Snapshot Restore fails if targeting specific INDICES and it contains Security (even when not imported) #4751

Open
alexmaurizio opened this issue Sep 20, 2024 · 1 comment
Labels
bug Something isn't working triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.

Comments


alexmaurizio commented Sep 20, 2024

Describe the bug

I am restoring snapshots between two clusters, using an S3 bucket repository as temporary storage.

The restore fails with a security exception even when .opendistro_security, the global state, and all other private indexes are excluded.
It fails for DATA STREAMS only, whenever the NAME of the data stream is specified in the indices parameter.
It does not happen if the SOURCE snapshot contains only the data stream and nothing else, and the whole snapshot is restored without using the indices key.

Related component

Storage:Snapshots

To Reproduce

Prerequisites:
If migrating, both OpenSearch clusters have the correct cross-account roles configured and the same S3 snapshot repository available. The issue is also reproducible on a single cluster by taking a snapshot and then deleting the snapshotted data streams.

NOTE: The USER making these calls has ALL_ACCESS, MANAGE_SNAPSHOTS, SECURITY, etc. Full access.

  1. On the SOURCE OpenSearch, I can add the repository, then take a full snapshot, including global state:
    PUT _snapshot/temp-repo/snapshot_20240918

  2. The snapshot contains everything: both the data streams and the dot-prefixed private indexes

{
    "snapshots": [
        {
            "snapshot": "snapshot_20240918",
            "uuid": "REDACTED",
            "version_id": 136347827,
            "version": "2.13.0",
            "remote_store_index_shallow_copy": false,
            "indices": [
                ".kibana_-[redacted]", (a lot)
                ".ds-[DATA_STREAM_NAME]-000xxx", (a lot, different streams)
                ".plugins-flow-framework-templates",
                ".opendistro-reports-instances",
                ".opendistro-ism-managed-index-history-2024.08.26-000839", (a lot)
                ".plugins-ml-model",
                "[REGULAR_INDEXES_REDACTED]",
                ".opensearch-notifications-config",
                ".plugins-flow-framework-config",
                ".opensearch-sap-pre-packaged-rules-config",
                ".plugins-flow-framework-state",
                ".ql-datasources",
                ".opendistro_security", <<< IMPORTANT
                ".tasks",
                ".plugins-ml-model-group",
                ".plugins-ml-stop-words",
                ".plugins-ml-agent",
                ".opensearch-observability",
                ".plugins-ml-config",
            ],
            "data_streams": [
                "[DATA_STREAM_1]",
                "[DATA_STREAM_2]",
                "[DATA_STREAM_3]",
                "[DATA_STREAM_4]",
                "[DATA_STREAM_5]",
                "[DATA_STREAM_6]",
            ],
            "include_global_state": true,
            "state": "SUCCESS",
            "start_time": "---REDACTED---",
            "start_time_in_millis": ---REDACTED---,
            "end_time": "---REDACTED---",
            "end_time_in_millis": 0,
            "duration_in_millis": 0,
            "failures": [],
            "shards": {
                "total": 0,
                "failed": 0,
                "successful": 0
            }
        }
    ]
}
  1. Wait for completion. The new OpenSearch cluster will then show the snapshot as completed and available.

  2. From the full snapshot, try to restore ONLY what you need, e.g. a single data stream.
    Neither the data stream nor its backing indexes exist on the new cluster (2.15).
    Exclude the global state:
    POST _snapshot/temp-repo/snapshot_20240918/_restore

{
  "indices": "DATA-STREAM-NAME",
  "include_global_state": false,
  "ignore_unavailable": true
}
  1. The restore fails immediately with a security exception with NO roles required.

  2. Additionally, EXCLUDING the security index by adding -.opendistro_security to indices has no effect: same authorization error

  3. Restoring the underlying .ds backing indexes works, BUT no data stream structure is created, so a manual recovery is needed.

  4. Take a new snapshot, but this time explicitly EXCLUDE the global state and include only the data stream.
    This must be done WHEN TAKING THE SNAPSHOT ITSELF:
    PUT _snapshot/temp-repo/snapshot_20240920_only_data_stream

{
    "snapshots": [
        {
            "snapshot": "snapshot_20240920_only_data_stream",
            "uuid": "---REDACTED---",
            "version_id": ---REDACTED---,
            "version": "2.13.0",
            "remote_store_index_shallow_copy": false,
            "indices": [
                ".ds-DATA-STREAM-NAME-000xxx", (only these appear in the indexes x 1000+)
            ],
            "data_streams": [
                "DATA-STREAM-NAME"
            ],
            "include_global_state": false,
            "state": "SUCCESS",
            "start_time": "---REDACTED---",
            "start_time_in_millis": ---REDACTED---,
            "end_time": "---REDACTED---",
            "end_time_in_millis": ---REDACTED---,
            "duration_in_millis": ---REDACTED---,
            "failures": [],
            "shards": {
                "total": ---REDACTED---,
                "failed": 0,
                "successful": ---REDACTED---
            }
        }
    ]
}
  1. Run the same restore request as before against the new snapshot, but without the indices field (with indices set, it fails with an Index Not Found error or with a security error, just like the first attempt)
    POST _snapshot/temp-repo/snapshot_20240920_only_data_stream/_restore
{
  "include_global_state": false,
  "ignore_unavailable": true
}

  1. Everything works as expected
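
To see at a glance what a full snapshot like the one listed in the steps above carries, the indices array of the snapshot listing can be grouped by name. This is only a rough sketch in Python: the function name and the three group labels are made up for illustration, and it assumes that data stream backing indexes start with .ds- and that every other dot-prefixed name is a plugin or system index, as in the listing above.

```python
def classify_snapshot_indices(snapshot):
    """Group the index names of one entry of the "snapshots" array.

    Assumption (not confirmed by the issue): ".ds-" marks a data stream
    backing index, any other leading "." marks a plugin/system index.
    """
    groups = {"backing": [], "dot_prefixed": [], "regular": []}
    for name in snapshot.get("indices", []):
        if name.startswith(".ds-"):
            groups["backing"].append(name)
        elif name.startswith("."):
            groups["dot_prefixed"].append(name)
        else:
            groups["regular"].append(name)
    return groups


# Hypothetical, shortened listing; real names are redacted in the issue.
example = {
    "indices": [
        ".ds-my-stream-000001",
        ".opendistro_security",
        ".tasks",
        "my-regular-index",
    ]
}
print(classify_snapshot_indices(example))
```

If .opendistro_security shows up in the dot_prefixed group, the snapshot is of the "full" kind that triggers the failure described here when a data stream name is passed in indices.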

Expected behavior

When restoring from a full or partial snapshot, specifying a name or index pattern in the indices field should work without security errors OR index-not-found errors.

At the moment, only restoring the full snapshot works.

TL;DR: This SHOULD WORK FOR ANY SNAPSHOT -> but it does not:
POST _snapshot/temp-repo/any-snapshot-full-or-partial/_restore

{
  "indices": "DATA-STREAM-NAME, -.opendistro_security",
  "include_global_state": false,
  "ignore_unavailable": true
}
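
For anyone scripting this request, the body can be assembled from a list of names to include and a list to exclude. The following is a minimal Python sketch, not an official client: build_restore_body is a hypothetical helper, and it only builds the JSON body shown above (it does not send anything). It joins the patterns without the space after the comma; whether that whitespace matters is not something this issue establishes.

```python
import json


def build_restore_body(include, exclude=(), include_global_state=False):
    """Build the JSON body for POST _snapshot/<repo>/<snapshot>/_restore.

    Names in `exclude` are prefixed with "-" as in the request above.
    """
    patterns = list(include) + [f"-{name}" for name in exclude]
    return {
        "indices": ",".join(patterns),
        "include_global_state": include_global_state,
        "ignore_unavailable": True,
    }


body = build_restore_body(["DATA-STREAM-NAME"], exclude=[".opendistro_security"])
print(json.dumps(body, indent=2))
```

On the affected versions this body is expected to hit the same authorization error when the snapshot also contains the security index; the sketch only makes the request easier to reproduce.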

SLOW WORKAROUND FOR NOW:
If you cannot take a new partial snapshot and are stuck with a FULL snapshot containing special indexes that must be excluded on restore (e.g. .opendistro_security), and you therefore CANNOT use the indices field with the data stream name, you can:

  1. Restore all the underlying .ds-stream-name-000001 backing indexes under a _recovered suffix, using the rename fields
    POST _snapshot/temp-repo/FULL_SNAPSHOT/_restore
{
  "indices": ".ds-[DATA-STREAM-NAME]-*",
  "include_global_state": false,
  "ignore_unavailable": true,
  "rename_pattern": "(.+)",
  "rename_replacement": "$1_recovered"
}
  1. Create a new data stream with the same name and settings as the one you need to recreate, [DATA-STREAM-NAME]
  2. Run a /_reindex from the .ds-name-000xxx_recovered indexes into the new data stream. If you rely on ISM, you need to manage rollovers manually, because if you reindex everything in one go the ISM policies are not applied until the end!
    POST _reindex
{
  "source":{
    "index": ".ds-[DATA-STREAM-NAME]-*_recovered"
  },
  "dest": {
    "index":"[DATA-STREAM-NAME]",
    "op_type": "create"
  }
}

NOTE: If you have a lot of backing indexes, reindex them ONE BY ONE (not using *), and manually _rollover the data stream!

This takes a lot of time (in my case, the ETA was 80+ hours for ~400 GB) but works nicely.
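
The one-by-one approach can be planned up front as a list of requests. Below is a small Python sketch under stated assumptions: plan_reindex is a hypothetical helper, the index and data stream names are placeholders, and rolling over after every single backing index is just one reading of the note above (you may prefer a different cadence). It only produces the method, path, and body of each call and leaves sending them to whatever client you use.

```python
def plan_reindex(data_stream, recovered_indices):
    """Return (method, path, body) tuples for a one-by-one reindex.

    Each recovered index is reindexed on its own, followed by a manual
    rollover of the target data stream (assumed cadence, see lead-in).
    """
    steps = []
    for index in sorted(recovered_indices):
        steps.append((
            "POST",
            "/_reindex",
            {
                "source": {"index": index},
                # Data streams only accept op_type "create".
                "dest": {"index": data_stream, "op_type": "create"},
            },
        ))
        steps.append(("POST", f"/{data_stream}/_rollover", None))
    return steps


# Placeholder names; in practice, list the *_recovered indexes on the cluster.
for method, path, body in plan_reindex(
    "my-stream",
    [".ds-my-stream-000002_recovered", ".ds-my-stream-000001_recovered"],
):
    print(method, path, body)
```

Keeping the plan as plain data makes it easy to resume after a failure, which matters when the whole run takes days.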

----> If you can take a PARTIAL snapshot, do that, since the recovery is much faster.

Additional Details

Plugins

Host/Environment (please complete the following information):

  • OS: AWS OpenSearch Service
  • Version 2.13 (Source) -> 2.15 (Destination)

Additional context
This is referenced in the closed issue #2583, and it is still a problem as of OpenSearch 2.15 (AWS version).

alexmaurizio added the bug and untriaged labels Sep 20, 2024
cwperks (Member) commented Sep 23, 2024

[Triage] @alexmaurizio Thank you for filing this issue. This looks like a bug. Transferring to security repo for comment.

cwperks transferred this issue from opensearch-project/OpenSearch Sep 23, 2024
cwperks added the triaged label and removed the untriaged label Sep 23, 2024