
sasl: Enable AWS_MSK_IAM SASL mechanism (#3402) #3496

Closed
wants to merge 14 commits into from

Conversation

garrett528

AWS_MSK_IAM is a new SASL mechanism for authenticating clients to AWS MSK Kafka clusters using IAM-based controls to set Kafka ACLs and permissions. This change adds support for clients to pass AWS credentials at runtime, which are used to build the SASL payload and authenticate clients to IAM-enabled MSK clusters. It adds a new SASL mechanism, AWS_MSK_IAM, as well as configuration options for the AWS access key ID, secret access key, region, and security token.

NOTE: As of now, the changes in this PR allow librdkafka to connect for the valid duration of the credentials provided. For permanent credentials, the connection will stay open indefinitely; for STS temporary credentials, the connection will stay open until those credentials expire. We have some work slated over the next couple of weeks to enable a refresh callback through which clients provide updated credentials as needed. We will follow the example of sasl_oauthbearer and provide a similar mechanism where an event can be handled by the client to supply refreshed credentials. Please expect that in a follow-up PR.
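
As a rough illustration of how a client would use this (a sketch, not taken from the PR itself), configuration could look something like the following. The sasl.aws.* property names are placeholders for whatever keys the fork actually exposes for the four options above; check the fork's configuration docs for the exact names.

```c
/* Hedged sketch: configure a producer for AWS_MSK_IAM against the fork.
 * The sasl.aws.* keys and the broker address below are illustrative
 * placeholders, not confirmed values from this PR. */
#include <stdio.h>
#include <librdkafka/rdkafka.h>

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        if (rd_kafka_conf_set(conf, "bootstrap.servers",
                              "b-1.example.kafka.us-east-1.amazonaws.com:9098",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK ||
            rd_kafka_conf_set(conf, "security.protocol", "SASL_SSL",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK ||
            rd_kafka_conf_set(conf, "sasl.mechanism", "AWS_MSK_IAM",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK ||
            rd_kafka_conf_set(conf, "sasl.aws.access.key.id", "AKIA...",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK ||
            rd_kafka_conf_set(conf, "sasl.aws.secret.access.key", "...",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK ||
            rd_kafka_conf_set(conf, "sasl.aws.region", "us-east-1",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK) {
                fprintf(stderr, "config error: %s\n", errstr);
                rd_kafka_conf_destroy(conf);
                return 1;
        }

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                                      errstr, sizeof(errstr));
        if (!rk) {
                /* On failure the conf is still owned by the caller. */
                fprintf(stderr, "failed to create client: %s\n", errstr);
                rd_kafka_conf_destroy(conf);
                return 1;
        }
        rd_kafka_destroy(rk);
        return 0;
}
```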

@garrett528 garrett528 force-pushed the master branch 5 times, most recently from a472004 to a04046e Compare August 13, 2021 01:08
AWS_MSK_IAM is a new SASL mechanism for authenticating clients to AWS MSK Kafka clusters using IAM-based controls to set Kafka ACLs and permissions. This change adds support for clients to pass AWS credentials at runtime, which are used to build the SASL payload and authenticate clients to IAM-enabled MSK clusters. It adds a new SASL mechanism, AWS_MSK_IAM, as well as configuration options to set the following:
* AWS access key id
* AWS secret access key
* AWS region
* AWS security token
The SASL handshake requires a specific payload
that is described here:
https://github.com/aws/aws-msk-iam-auth
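
For context, the payload described there is built around AWS Signature Version 4 signing, which is also why the fork links against OpenSSL's EVP/HMAC routines (see the build errors further down this thread). Below is a minimal sketch of the standard SigV4 signing-key derivation, assuming the fork follows that scheme; the function names are illustrative, not the fork's actual rd_kafka_aws_* implementation.

```c
/* Hedged sketch of the standard AWS SigV4 signing-key derivation:
 * kSigning = HMAC(HMAC(HMAC(HMAC("AWS4" + secret, date), region), service),
 *                 "aws4_request")
 * For MSK IAM the service name is expected to be "kafka-cluster". */
#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <openssl/sha.h>

static void hmac_sha256(const void *key, int key_len, const char *data,
                        unsigned char out[SHA256_DIGEST_LENGTH]) {
        unsigned int out_len = 0;
        HMAC(EVP_sha256(), key, key_len,
             (const unsigned char *)data, strlen(data), out, &out_len);
}

static void sigv4_signing_key(const char *secret_key, const char *yyyymmdd,
                              const char *region, const char *service,
                              unsigned char out[SHA256_DIGEST_LENGTH]) {
        unsigned char k_date[SHA256_DIGEST_LENGTH];
        unsigned char k_region[SHA256_DIGEST_LENGTH];
        unsigned char k_service[SHA256_DIGEST_LENGTH];
        char k_secret[256];

        snprintf(k_secret, sizeof(k_secret), "AWS4%s", secret_key);
        hmac_sha256(k_secret, (int)strlen(k_secret), yyyymmdd, k_date);
        hmac_sha256(k_date, SHA256_DIGEST_LENGTH, region, k_region);
        hmac_sha256(k_region, SHA256_DIGEST_LENGTH, service, k_service);
        hmac_sha256(k_service, SHA256_DIGEST_LENGTH, "aws4_request", out);
}
```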
@edenhill
Contributor

I'm sorry Garrett, but before you get too far with this PR I need to remind you, as mentioned on the discussion thread already: I really don't like to see inclusion of vendor-specific functionality in librdkafka, so the likelihood of this PR getting merged is very low.

I highly recommend taking the formal open source route and submitting a KIP for your new proposed SASL mechanism, or, even better, using an existing standardized non-vendor-specific one (e.g., OAUTHBEARER).

@garrett528
Author

garrett528 commented Aug 19, 2021

> I'm sorry Garrett, but before you get too far with this PR I need to remind you, as mentioned on the discussion thread already: I really don't like to see inclusion of vendor-specific functionality in librdkafka, so the likelihood of this PR getting merged is very low.
>
> I highly recommend taking the formal open source route and submitting a KIP for your new proposed SASL mechanism, or, even better, using an existing standardized non-vendor-specific one (e.g., OAUTHBEARER).

thanks for the response. we're going to have to continue on our fork in the meantime since we need this functionality now and can't wait for the KIP process. hopefully there's a path forward to support this functionality in librdkafka since it's a crucial library for the Kafka ecosystem and there are a lot of AWS customers out there (like ourselves) that are being pushed to use IAM auth by their respective security teams.

would you be open to having some sort of SASL plugin mechanism that would allow for custom implementations? it seems like we could build around the rd_kafka_sasl_provider interface to allow callers to provide implementation details that would live outside of the librdkafka repo. if you were open to that, it would be great to collaborate on that design so that you're happy with whatever gets built.
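
To make that idea concrete, here is a purely hypothetical sketch of what such a pluggable provider interface could look like. The names and signatures below are illustrative only and are not librdkafka's actual internal rd_kafka_sasl_provider definition:

```c
/* Hypothetical plugin-facing SASL provider vtable -- illustrative only,
 * not librdkafka's real internal interface. */
#include <stddef.h>

typedef struct sasl_provider_s {
        const char *name;                       /* e.g. "AWS_MSK_IAM" */

        /* Start a new SASL session on a broker connection. */
        int  (*client_new)(void *transport, char *errstr, size_t errstr_size);

        /* Handle a SASL token received from the broker. */
        int  (*recv)(void *transport, const void *buf, size_t size,
                     char *errstr, size_t errstr_size);

        /* Tear down per-connection state. */
        void (*close)(void *transport);
} sasl_provider_t;

/* A plugin shared object would register its provider at load time,
 * e.g. via a well-known entry point the library loads and calls: */
int sasl_provider_register(const sasl_provider_t *provider);
```

This mirrors, at a high level, how the built-in mechanisms (PLAIN, SCRAM, OAUTHBEARER) are already wired up internally, so a registration hook of this shape is the kind of design a fork could build on if the maintainers were open to it.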

chrisnovakovic and others added 3 commits October 15, 2021 12:27
The test program that is used at compile-time to detect whether zlib is
available fails to compile due to `NULL` being undefined:

```
_mkltmpyos55w.c:5:20: error: use of undeclared identifier 'NULL'
     z_stream *p = NULL;
                   ^
1 error generated.
```

This means that zlib availability is only automatically detected when
using pkg-config.

Import `stddef.h` (which defines `NULL`) in the test program, allowing
zlib to be automatically detected via a compilation check.
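
For reference, the corrected probe amounts to something like this minimal sketch (assuming the generated test program is essentially the snippet from the error above):

```c
/* Sketch of the fixed zlib compile-time probe: including stddef.h
 * provides NULL, so the check no longer fails for that reason. */
#include <stddef.h>
#include <zlib.h>

int main(void) {
        z_stream *p = NULL;
        (void)p;
        return 0;
}
```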
@tooptoop4

@garrett528 have u built kafkacat that works with IAM MSK too?

@garrett528
Author

> @garrett528 have u built kafkacat that works with IAM MSK too?

yes, got that working. i'll add to the README in the fork if you're interested. unfortunately, it will involve building librdkafka and kcat from source. it doesn't look like this is going to be accepted any time soon but better the fork than nothing.

@tooptoop4

tooptoop4 commented Nov 2, 2021

@garrett528 that would be great. does it work with IAM role (like for java clients) instead of supplying aws keys/tokens that need to be rotated?

@shrinjay-mio

shrinjay-mio commented Nov 4, 2021

@garrett528 Hi Garrett, we're looking at using your fork for our .NET application. Have you already tried integrating your changes into confluent-kafka-dotnet? If so, how did you achieve that? Just swapping the DLLs that confluent-kafka-dotnet was pointing to?

Further, do you know what consequences this has for maintainability and receiving updates, specifically security updates?

@garrett528
Author

garrett528 commented Nov 10, 2021

@tooptoop4 it doesn't look like the ./configure call is finding libxml2. you need to make sure pkg-config can find the corresponding .pc file for it (for Ubuntu 20 it's usually located at /usr/lib/x86_64-linux-gnu/pkgconfig/libxml-2.0.pc), and you may need to symlink it to a name pkg-config will look for. for example in Ubuntu: ln -sf /usr/lib/x86_64-linux-gnu/pkgconfig/libxml-2.0.pc /usr/lib/x86_64-linux-gnu/pkgconfig/libxml2.pc.

if you don't know where it is, try find / -name libxml-2.0.pc. once that's correctly linked, you should see one of these switch to "ok":

checking for libxml2 (by pkg-config)... failed
checking for libxml2 (by compile)... failed (disable)

and the BUILT_WITH list (GCC GXX PKGCONFIG INSTALL GNULD LDS LIBDL PLUGINS ZLIB SSL SASL_CYRUS ZSTD CURL HDRHISTOGRAM SYSLOG SNAPPY SOCKEM SASL_SCRAM SASL_OAUTHBEARER CRC32C_HW) will include "SASL_AWS_MSK_IAM".

@tooptoop4

after getting libxml2, got further but next error below:

make[1]: Entering directory `/home/user/librdkafka-1.9.0-AWS_MSK_IAM/src'
WARNING: librdkafka-static.a: Not creating self-contained static library librdkafka-static.a: no static libraries available/enabled
Generating pkg-config file rdkafka-static.pc
Checking librdkafka integrity
librdkafka.so.1                OK
librdkafka.a                   OK
Symbol visibility              OK
make[1]: Leaving directory `/home/user/librdkafka-1.9.0-AWS_MSK_IAM/src'
make[1]: Entering directory `/home/user/librdkafka-1.9.0-AWS_MSK_IAM/src-cpp'
Generating pkg-config file rdkafka++-static.pc
Checking librdkafka++ integrity
librdkafka++.so.1              OK
librdkafka++.a                 OK
make[1]: Leaving directory `/home/user/librdkafka-1.9.0-AWS_MSK_IAM/src-cpp'
make -C examples
make[1]: Entering directory `/home/user/librdkafka-1.9.0-AWS_MSK_IAM/examples'
gcc -I/usr/include/libxml2 -g -O2 -fPIC -Wall -Wsign-compare -Wfloat-equal -Wpointer-arith -Wcast-align  -I../src rdkafka_example.c -o rdkafka_example  \
        ../src/librdkafka.a -lxml2 -lcurl -lm -lcurl   -lzstd   -lsasl2   -lssl -lcrypto   -lcrypto   -lz   -ldl -lpthread -lrt
../src/librdkafka.a(rdkafka_aws.o): In function `rd_kafka_aws_build_string_to_sign':
/home/user/librdkafka-1.9.0-AWS_MSK_IAM/src/rdkafka_aws.c:310: undefined reference to `EVP_MD_CTX_new'
/home/user/librdkafka-1.9.0-AWS_MSK_IAM/src/rdkafka_aws.c:314: undefined reference to `EVP_MD_CTX_free'
../src/librdkafka.a(rdkafka_aws.o): In function `rd_kafka_aws_build_canonical_request':
/home/user/librdkafka-1.9.0-AWS_MSK_IAM/src/rdkafka_aws.c:265: undefined reference to `EVP_MD_CTX_new'
/home/user/librdkafka-1.9.0-AWS_MSK_IAM/src/rdkafka_aws.c:269: undefined reference to `EVP_MD_CTX_free'
collect2: error: ld returned 1 exit status
make[1]: *** [rdkafka_example] Error 1
make[1]: Leaving directory `/home/user/librdkafka-1.9.0-AWS_MSK_IAM/examples'
make: *** [examples] Error 2

@garrett528
Author

That is coming from openssl. Did you make sure all of the packages laid out in the Readme are installed on your machine?

@tooptoop4

tooptoop4 commented Nov 11, 2021

@garrett528 is below openssl version too old?

openssl version
OpenSSL 1.0.2k-fips 26 Jan 2017

UPDATE: was able to build with this code change https://stackoverflow.com/a/46769674
Tested kcat with it too, works well with access/secret/session passed in but wasn't able to get it to assume role like java clients can
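
For anyone else hitting this: EVP_MD_CTX_new() and EVP_MD_CTX_free() were only introduced in OpenSSL 1.1.0, so 1.0.2 doesn't have them. The linked workaround boils down to a small compatibility shim that maps the new names onto the 1.0.x equivalents, roughly:

```c
/* Compatibility shim for building against OpenSSL < 1.1.0, where
 * EVP_MD_CTX_new()/EVP_MD_CTX_free() don't exist yet. */
#include <openssl/opensslv.h>
#include <openssl/evp.h>

#if OPENSSL_VERSION_NUMBER < 0x10100000L
#define EVP_MD_CTX_new()      EVP_MD_CTX_create()
#define EVP_MD_CTX_free(ctx)  EVP_MD_CTX_destroy((ctx))
#endif
```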

@garrett528
Author

@tooptoop4 good to hear it built. the assumeRole should be working. is the role you're assuming set up properly? usually, i have to get temp credentials from the role i want to assume to pass in. also, make sure the role can assume itself

@shrinjay-mio

> @tooptoop4 it doesn't look like the ./configure call is finding libxml2. you need to make sure pkg-config can find the corresponding .pc file for it (for Ubuntu 20 it's usually located at /usr/lib/x86_64-linux-gnu/pkgconfig/libxml-2.0.pc), and you may need to symlink it to a name pkg-config will look for. for example in Ubuntu: ln -sf /usr/lib/x86_64-linux-gnu/pkgconfig/libxml-2.0.pc /usr/lib/x86_64-linux-gnu/pkgconfig/libxml2.pc.
>
> if you don't know where it is, try find / -name libxml-2.0.pc. once that's correctly linked, you should see one of these switch to "ok":
>
> checking for libxml2 (by pkg-config)... failed
> checking for libxml2 (by compile)... failed (disable)
>
> and the BUILT_WITH list (GCC GXX PKGCONFIG INSTALL GNULD LDS LIBDL PLUGINS ZLIB SSL SASL_CYRUS ZSTD CURL HDRHISTOGRAM SYSLOG SNAPPY SOCKEM SASL_SCRAM SASL_OAUTHBEARER CRC32C_HW) will include "SASL_AWS_MSK_IAM".

I'm having the same issue trying to install in a Docker container, but it doesn't even seem like libxml2 is being installed: running the find command for the libxml2 package returns nothing. Any idea?

@garrett528
Author

@shrinjay-mio what base image are you using? we use ubuntu 20 and it can find the packages. make sure to run sudo apt-get update before trying to install on the image

@shrinjay-mio

shrinjay-mio commented Nov 11, 2021

We're using the aspnet:5.0 base image... which I just realized uses Debian 11 and not Ubuntu 20. After digging up an image that uses Ubuntu 20 (mcr.microsoft.com/dotnet/aspnet:5.0.12-focal-amd64), it now seems to work fine. Really sorry about that, I'm not super familiar with docker yet.

@garrett528
Author

@shrinjay-mio i'm sure you can figure it out for debian 11 as well. i just haven't had to yet so have no great guidance there.

this is just standard cross-OS dependency management pain, so if you figure it out, please add the information to the README so others can follow.

@shrinjay-mio

shrinjay-mio commented Nov 11, 2021

Anyone else getting:

Failed to load the librdkafka native library.
   at Confluent.Kafka.Impl.Librdkafka.TrySetDelegates(List`1 nativeMethodCandidateTypes) 
   ...

When trying to load the librdkafka .so file in a docker container from Confluent.Kafka.Dotnet?

@garrett528
Author

@shrinjay-mio not sure about Dotnet but i would make sure that the LD_LIBRARY_PATH includes the path to wherever you copied your .so file.

for example in our Dockerfile, we do COPY librdkafka.so.1 /app and then add ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/app so the linker can find it at runtime

@shrinjay-mio

Were the files in win32 ever updated to allow this fork to build on windows?

@garrett528
Author

No, I only work with Linux and OSX. I don't even have a Windows machine to test with. Feel free to open a PR with changes.

@shrinjay-mio

Got it, just working through the changes but wasn't sure if I even needed to make them. I had to start by manually adding the WITH_SASL_AWS_MSK_IAM preprocessor directive to the build. While this did work, there is now another error: "unresolved external symbol _rd_kafka_sasl_aws_msk_iam". I take it this is a build issue, not a platform one; are there any other preprocessor directives I need to add, or any further guidance?

@garrett528
Author

Are you sure you have all of the required packages installed? The only thing I can think of is that the preprocessor is skipping that definition because a package is missing (like SSL support or something).

@shrinjay-mio

I have openssl, zstd, zlib and libxml installed, is that all?

@garrett528
Author

See the comments above regarding ./configure. You also definitely need curl.

@shrinjay-mio

Forgot to mention, that's already been installed. I tried to reverse engineer the configure script but it seems all else that was needed was libxml and curl, both of which are installed.

* Fix memory leak in admin requests

Fix a memory leak introduced in ca1b30e in which the arguments to an
admin request were not being freed. Discovered by the test suite for
rust-rdkafka [0].

[0]: https://github.com/fede1024/rust-rdkafka/pull/397/checks?check_run_id=3914902373

* Fix MinGW Travis build issues by breaking test execution into a separate script

* ACL Admin Apis: CreateAcls, DescribeAcls, DeleteAcls

* Minor ACL API adjustments and some small code tweaks

* Add ACL support to CHANGELOG

* Retrieve jwt token from token provider (@jliunyu, confluentinc#3560)

* Fixed typo

* MsgSets with just aborted msgs raised a MSG_SIZE error, and fix backoff (confluentinc#2993)

This also removes fetch backoffs on underflows (truncated responses).

* test 0129: style fix

* test 0105: Fix race condition

* Idempotent producer: save state for removed partitions

.. in case they come back. To avoid silent message loss.

* Remove incorrect comment on mock API

* Fix rkbuf_rkb assert on malformed JoinGroupResponse.metadata

* clusterid() would fail if there were no topics in metadata (confluentinc#3620)

* sasl.oauthbearer.extensions should be optional

Fixes confluentinc/confluent-kafka-python#1269.

* Added AK 3.1.0 to test versions

* Changelog updates

* Bump version to v1.9.0

* sasl.oauthbearer.scope should be optional

According to section 4.4.2 of RFC 6749, the scope is optional
in the access token request in client credentials flow.

And indeed, for OIDC providers that I find in the wild such as
Amazon Cognito, the scope _is_ optional. If the scope is omitted
from the request, then the returned access token will contain any
and all scope(s) that are configured for the client.

See https://datatracker.ietf.org/doc/html/rfc6749#section-4.4.2

* Fix hang in list_groups() when cluster is unavailable (confluentinc#3705)

This was caused by holding on to an old broker state version that got outdated
and caused an infinite loop, rather than a timeout.

* Style fixes

* Integration test for OIDC (confluentinc#3646)

* Test for trivup

* integration test

* Update code style for existing code at rdkafka_sasl_oauthbearer_oidc.c

* Handle review comment

* tiny fix

* Handle review comments

* misc.c style fix

* Test fixes: OIDC requires AK 3.1, not 3.0

* Test 0113: reset security.protocol when using mock cluster

* Travis: use Py 3.8 (not 3.5) on Xenial builder

* Travis: bump integration test from AK 2.7.0 to 2.8.1

* Fix README release wording

* Improve subscribe() error documentation

* Fix linger.ms/message.timeout.ms config checking (confluentinc#3709)

* Replace deprecated zookeeper flag with bootstrap (@ladislavmacoun, confluentinc#3700)

* Replace deprecated zookeeper flag with bootstrap

Fixes: confluentinc#3699

Signed-off-by: Ladislav Macoun <ladislavmacoun@gmail.com>

* Add backwards compatibility

Signed-off-by: Ladislav Macoun <ladislavmacoun@gmail.com>

* Add assertion for cmd fitting inside buffer

Signed-off-by: Ladislav Macoun <ladislavmacoun@gmail.com>

* Increase command buffer

Signed-off-by: Ladislav Macoun <ladislavmacoun@gmail.com>

* Save one superfluous message timeout toppar scan

* Update to fedora:35 to fix the CentOS 8 build

mock epel-8-x86_64 is now broken in fedora:33:
https://bugzilla.redhat.com/show_bug.cgi?id=2049024

Update to fedora:35 with mock configs:
centos+epel-7-x86_64
centos-stream+epel-8-x86_64

* Add link to tutorial on Confluent Developer

Also fix indenting of bullet list

* Grooming (compilation warnings, potential issues)

Signed-off-by: Sergio Arroutbi <sarroutb@redhat.com>

* fix: acl binding enum checks (@emasab, confluentinc#3741)

* checking enums values when creating or reading AclBinding and AclBindingFilter

* AclBinding destroy array function

* acl binding unit tests

* warnings and fix for unknown enums, test fixes

* int sizes matching the read size

* pointer to the correct broker

* cmake: Use CMAKE_INSTALL_LIBDIR

this ensures that it is portable across platforms e.g. ppc64/linux
uses lib64 not lib

Signed-off-by: Khem Raj <raj.khem@gmail.com>

* Trigger op callbacks regardless for unhandled types in consume_batch_queue() et.al. (confluentinc#3263)

* AppVeyor: Use Visual Studio 2019 image to build since 2015 has TLS problems

The 2015 image fails to download openssl due to TLS 1.2 not being available,
or something along those lines.

* mklove: add LD_LIBRARY_PATH to libcurl builder so that runtime checks pass

* Travis: build alpine & manylinux builds with --source-deps-only

This avoids relying on distro installed packages, which isn't very robust.

* Nuget Debian build: use --source-deps-only to avoid external dependencies

* RPM test: Use ubi8 image instead of centos:8

.. since centos is no more

* Curl 7.82.0

* mklove: curl now requires CoreFoundation and SystemConfiguration frameworks on osx

* Test 0128: skip if there's no oauthbearer support

* Test 0128: make thread-safe

* Test 0077: reduce flakyness by expediting compaction

* Update to zlib 1.2.12 and OpenSSL 1.1.1n

* vcpkg: revert to zlib 1.2.11 since 1.2.12 is not yet available (as vcpkg)

* Travis: Disable mingw dynamic build for now (gcc breakage)

GCC 11 adds a new symbol that is not available in the mingw/msys2 libstdc++,
which makes it impossible to run applications that were built.

Until that's fixed we disable this worker since it will fail anyway.

* mklove: fix formatting of skipped pkg-config checks

* Fix lock order for rk_init_lock to avoid deadlock (non-released regression)

* vcpkg version bumps

* Update release instructions

* Make dynamic MinGW build copy DLLs instead of trying to manipulate PATH (@neptoess, confluentinc#3787)

* Make dynamic MinGW build copy DLLs instead of trying to manipulate PATH

* Remove tag requirement on MinGW dynamic build

Co-authored-by: Bill Rose <neptoess@gmail.com>

* Fix regression from last PR: curl_ldflags

* Reset stored offset on assign() and prevent offsets_store() for unassigned partitions

* Include broker_id in offset reset logs and corresponding consumer errors (confluentinc#3785)

* Txn: properly handle PRODUCER_FENCED in InitPid reply

* Provide reason to broker thread wakeups in debug logs

This will make troubleshooting easier

* rdkafka_performance: include broker in DR printouts

* Make SUBTESTS=.. match all of the subtest format string

* Added file io abstraction

* rdkafka_performance: cut down on the number of poll calls in full-rate mode

* Added test.mock.broker.rtt

* Log mock broker bootstrap.servers addresses when test.mock.num.brokers is set

* Mock brokers now allow compressed ProduceRequests

No decompression or validation is performed.

* Made rd_buf_read|peek_iXX() type safe

* SUB_TEST_SKIP() format verification

* Statistics: let broker.wakeups metric cover all broker wakeups, both IO and cnds

* Improved producer queue wakeups

* Broker thread: don't block on IO if there are ops available

* vcpkg: Update to zlib 1.2.12

* Fix some win32 compilation warnings

* Proper use of rd_socket_close() on Win32

Regression during v1.9.0 development

* Test 0101: missing return after Test::Skip()

* seek() doc clarification (confluentinc#3004)

* Documentation updates

* style-check* now fails on style warnings

* Automatic style fixes

* Some OIDC documentation fixes

* Fix for AWS_MSK_IAM

* Update for new method signature

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Bill Rose <neptoess@gmail.com>
Co-authored-by: Emanuele Sabellico <emasab@gmail.com>
Co-authored-by: Magnus Edenhill <magnus@edenhill.se>
Co-authored-by: Jing Liu <jl5311@nyu.edu>
Co-authored-by: Matt Clarke <matt.clarke@ess.eu>
Co-authored-by: Leo Singer <leo.singer@ligo.org>
Co-authored-by: Ladislav <ladislavmacoun@gmail.com>
Co-authored-by: Ladislav Snizek <ladislav.snizek@cdn77.com>
Co-authored-by: Lance Shelton <lance.shelton@hammerspace.com>
Co-authored-by: Robin Moffatt <robin@rmoff.net>
Co-authored-by: Sergio Arroutbi <sarroutb@redhat.com>
Co-authored-by: Khem Raj <raj.khem@gmail.com>
Co-authored-by: Bill Rose <wwriv1991@gmail.com>
@gburke-ppb

gburke-ppb commented May 10, 2022

Hey guys. Any movement with this in the last while? IAM is something that we are looking to be able to use in many apps, and many languages' libraries are reliant on this one.
Can we get this merged in soon, please?

@edenhill
Contributor

Closing this PR to avoid further confusion, see:

#3496 (comment)

@edenhill edenhill closed this May 12, 2022
@ScheerMT

@garrett528 - Are you pursuing a KIP to get this officially added?

@garrett528
Author

garrett528 commented Aug 1, 2022

@ScheerMT i have not pursued it. i don't think this makes sense to include in a KIP, tbh. this isn't an additional feature or something that should live natively in Kafka's source code. it's simply an implementation of an existing interface (SASL). because of that, it's unlikely that a KIP to just add AWS IAM auth would be accepted.

we've created a fork that implements IAM auth in this library that works. however, if you really need this for your implementation, i would suggest not using librdkafka if possible since support for this doesn't seem like it's something that will be merged to the mainline.

@JohnPreston

JohnPreston commented Nov 11, 2022

@garrett528 I think this is an opportunity for a new lib, i.e. openlibrdkafka

I cannot begin to comprehend the amount of work and time spent on this project, and I absolutely respect that. Plus, my long-gone C skills guarantee I won't be able to contribute in a code capacity, so maybe one would feel I should just keep my mouth shut and be grateful this exists to begin with.

But if you Google around for "python MSK auth" etc., you end up here.

And one could come across this, read #3496 (comment), and argue that unwillingness to merge a "proprietary" authentication method that benefits Confluent's competition is the reason for not taking this contribution on board.

So is this out of not wanting to "help" competition at the cost of letting down a huge (and thankful) user base?

And in the end AWS might simply fork and distribute their own package for this like they had to do for the Glue+Serde serializers.

PS: in the interest of fairness, I can see (as has been pointed out to me) why, with AWS IAM not being OSS among plenty of other things, this lib shouldn't make built-in functionality for that vendor specifically.

@tooptoop4

@JohnPreston agree, the same issue is happening in IBM/sarama#2192; community demand is being blocked.
