Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest/cdc: fix cdc/kafka-auth #118952

Closed
wants to merge 1 commit into from

Conversation

wenyihu6
Copy link
Contributor

@wenyihu6 wenyihu6 commented Feb 8, 2024

From kafka 2.0 onwards, host name verification of servers is enabled by default.
ssl.endpoint.identification.algorithm defaults to https which validates server
host name to match the host name in the certificate. This patch fixes the
failure by pre-pending https to the sink connection URL.

Fixes: #118525
Release note: none

From kafka 2.0 onwards, host name verification of servers is enabled by default.
ssl.endpoint.identification.algorithm defaults to `https` which validates server
host name to match the host name in the certificate. This patch fixes the
failure by pre-pending https to the sink connection URL.

Fixes: cockroachdb#118525
Release note: none
@wenyihu6 wenyihu6 requested a review from a team as a code owner February 8, 2024 16:48
@wenyihu6 wenyihu6 requested review from herkolategan and renatolabs and removed request for a team, herkolategan and renatolabs February 8, 2024 16:48
@cockroach-teamcity
Copy link
Member

This change is Reviewable

wenyihu6 added a commit to wenyihu6/cockroach that referenced this pull request Feb 8, 2024
Now that we have fixed kafka auth setup in roachtests, we can change Dial() to
fetch metadata for specific topics only.

Fixes: cockroachdb#116872
Part of: cockroachdb#118952
Release note: none
Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thanks for tracking this down!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jayshrivastava and @wenyihu6)


-- commits line 6 at r1:
Is this only relevant to the roachtests? Is there anything that needs to be updated or recommended in our kafka sink uri documentation? https://www.cockroachlabs.com/docs/v23.2/changefeed-sinks#kafka

@wenyihu6
Copy link
Contributor Author

wenyihu6 commented Feb 8, 2024

-- commits line 6 at r1:

Previously, rharding6373 (Rachael Harding) wrote…

Is this only relevant to the roachtests? Is there anything that needs to be updated or recommended in our kafka sink uri documentation? https://www.cockroachlabs.com/docs/v23.2/changefeed-sinks#kafka

I think this is only relevant to the roachtests. Customers should have kafka nodes properly set up ssl.endpoint.identification.algorithm is a kafka configuration like what we did here

kafkaOauthConfigTmpl = `
. But I'm still confused about why other kafka roachtests didn't fail and about auth in general. It was a pure luck that I got this fixed haha. Hope Steven could shed some light tomorrow.

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work digging into this so far.

This change isn't quite what we want. You are right that the underlying problem is related to host name verification.

However, what your change here is doing is changing the changefeed URL we use to create the changefeed from one that starts with kafka:// to one that starts with https://kafka://. While I haven't confirmed this, what I suspect is then happening is that when we create a changefeed the "scheme" of the URL is https:// with a host of kafka://rest_of_url. As a result, a cloudstorage sink is started. Now, that cloud storage sink will fail, but not during planning/job creation and this test assumes that any auth failure will happen via job creation. The kafka sink fails during planning because it implements the Dial method which attempts to connect to the sink. The cloud storage sink simply return nil from Dial. As a result, the test passes, but not for the right reason.

I think we have a couple of options on how to fix the underlying error. (1) change the kafka configuration used for controller to broker connections or (2) create a TLS cert with our fully qualified domain name in the DNS alternative names. We can discuss both in detail today.

@wenyihu6
Copy link
Contributor Author

Closing in favor of #119077 - this old PR appeared to fix the problem but is actually using a cloudstorage sink rather than a kafka sink.

// During the deprecation period, we need to keep parsing these as cloudstorage for backwards
// compatibility. Afterwards we'll either remove them or move them to webhook.
case changefeedbase.DeprecatedSinkSchemeHTTP, changefeedbase.DeprecatedSinkSchemeHTTPS:

@wenyihu6 wenyihu6 closed this Feb 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: cdc/kafka-auth failed
4 participants