Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserialize Protobuf messages using descriptor files #472

Merged

Conversation

xakassi
Copy link
Contributor

@xakassi xakassi commented Oct 22, 2020

Fix #149

Add ability to deserialize topics containing data in Protobuf format.

To deserialize topics containing data in Protobuf format, you can put descriptor files in some folder descriptors-folder
and set topics mapping: for each topic-regex you can specify descriptor-file name (from descriptors-folder)
and corresponding message types for keys and values. If, for example, keys are not in Protobuf format,
key-message-type can be omitted, the same for value-message-type.

Example configuration:

akhq:
  topic:
    deserialization:
      protobuf:
        descriptors-folder: "/app/protobuf_desc"
        topics-mapping:
          - topic-regex: "album.*"
            descriptor-file: "album.desc"
            value-message-type: "Album"
          - topic-regex: "film.*"
            descriptor-file: "film.desc"
            value-message-type: "Film"
          - topic-regex: "test.*"
            descriptor-file: "other.desc"
            key-message-type: "Row"
            value-message-type: "Envelope"

First I tried to do without configuration for topics - try to deserialize binary data just using all descriptor files from descriptors-folder, but it turned out that messages can be deserialized incorrectly by not-matching descriptor and we cannot understand it dynamically.
So, I added ability to specify regex for topics and corresponding descriptor files and message types for key and values.

Maybe you will consider that it's better to be able to choose descriptors and message types from drop down menu on UI... but for now I did not change UI. The idea with regexes in configuration looks not bad for me.

Please, give me your opinion.

P.S. In unit tests AlbumProto and FilmProto are auto-generated by Protobuf classes, you have no need to review them. Scripts used for generation of Java classes and descriptor classes are provided in /src/test/resources/protobuf_proto folder.

@xakassi xakassi force-pushed the feature/add_protobuf_deserializer branch from 03b2bb8 to b14b90f Compare October 22, 2020 08:25
@tchiotludo
Copy link
Owner

Real good stuff here :)
Will take some time to review, since I'm a noob about protobuf.
Have you think about integration with schema registry ? now it support Protobuf also ?

@xakassi
Copy link
Contributor Author

xakassi commented Oct 22, 2020

I'm a noob about schema registry )) We do not use it in our team, so for now I can't make an integration. But maybe I will learn something about Schema Registry further.

@xakassi xakassi force-pushed the feature/add_protobuf_deserializer branch 4 times, most recently from 22c4787 to ac7009d Compare October 22, 2020 14:11
@tchiotludo
Copy link
Owner

I look at this quickly, really thanks for the hard works 👍
Here some review, excuse if it's not reliable for protobuf, I don't use it myself :

  • First thing, I would like to avoid the to directory to mount (since we have a lot docker user and I would prefer to have definition directly in configuration files, not in a folder that user will need to mount. Maybe you can change the configuration properties instead of mount directory ?

  • Second: You use the same definition for all cluster in akhq. I think it will be nice to have different definition for cluster, maybe a global definition and a potential override for a cluster (dev def and prod def can be different)

The regexp seems to be fine IMO since it allow a large use case, IMO no need to have a UI change for this, since the user will need to upload new definition, the ui select seems to be useless and the regexp is fine.

@xakassi
Copy link
Contributor Author

xakassi commented Oct 28, 2020

Regarding the first thing, do you mean to define Protobuf descriptors right in AKHQ configuration? I'm afraid it's not possible, because Protobuf descriptor is binary data. I also think that directory mount is not fine, but when you start in docker-compose it's not a problem to mount volume, and when you start in Kubernetes I really hope that we can use Config Maps to store Protobuf descriptors. As far as I know, Config Maps can be created from file and it's not a problem that the file contains binary data (not plain text). So Persistent Volumes will not be necessary. I will check is it possible to create Config Map using descriptor file.

Regarding the second thing, good point, thanks, I just did not think that there are can be several clusters. I will make changes.

@tchiotludo
Copy link
Owner

For the first point, why not just need to base64 in the files in the configuration ?

@xakassi
Copy link
Contributor Author

xakassi commented Oct 28, 2020

Hm, interesting idea, I will try with base64.

@xakassi xakassi force-pushed the feature/add_protobuf_deserializer branch from ac7009d to 0a48ec5 Compare November 2, 2020 12:47
@xakassi
Copy link
Contributor Author

xakassi commented Nov 2, 2020

Hi, @tchiotludo !
I have pushed changes - now I use base64 descriptors (it works fine) and specify deserialization config for each Kafka cluster.
Please, take a look when you have time :)

P.S. I made "ammend last commit" and force push, because it was easier for me and changes were significant.

Copy link
Owner

@tchiotludo tchiotludo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be a really good stuff !

I've comment some minor change on review to have Protobuf everywhere and to keep the whole logic on CustomDeserializerRepository.

Just few changed and merged !

src/main/java/org/akhq/models/Record.java Outdated Show resolved Hide resolved
src/main/java/org/akhq/modules/KafkaModule.java Outdated Show resolved Hide resolved
src/main/java/org/akhq/modules/KafkaModule.java Outdated Show resolved Hide resolved
@xakassi xakassi force-pushed the feature/add_protobuf_deserializer branch from 0a48ec5 to 5ff02de Compare November 5, 2020 06:39
@xakassi xakassi force-pushed the feature/add_protobuf_deserializer branch from 5ff02de to 27c6ac6 Compare November 5, 2020 06:49
@xakassi
Copy link
Contributor Author

xakassi commented Nov 5, 2020

I have made changes, removed indent changes in all files and fix other observations. Seems, now it should be ok and ready for merge :)

@tchiotludo tchiotludo merged commit 726e019 into tchiotludo:dev Nov 11, 2020
@tchiotludo
Copy link
Owner

Seems to be perfect :)
Let's merge this and wait feedback from community !

@jorgheymans may you can try the dev branch please and see if it's work like your fork ?

Thanks @xakassi for these one !

@jorgheymans
Copy link
Contributor

Interesting, i was not aware of this PR. I'll have a look how it works for our case, thanks @xakassi !

@xakassi
Copy link
Contributor Author

xakassi commented Nov 11, 2020

Great!
I've also learnt a few things about Schema Registry recently and maybe I can make an integration for Protobuf later :)

@tchiotludo
Copy link
Owner

It will be great, I think people is waiting for it !
Whatever, thanks for the long awaited features 👍

@jorgheymans
Copy link
Contributor

jorgheymans commented Nov 11, 2020 via email

@jorgheymans
Copy link
Contributor

jorgheymans commented Nov 16, 2020

Thanks for the effort @xakassi !

Can we keep the proto definition external to the akhq configuration file ?

  • Embedding multiline strings in YAML is a pain
  • We currently have more than 200kb of proto definitions . That's an enormous amount of base64 text making the config file unreadable
  • Going forward, when proto definitions change (these are not static one-off definitions, they evolve), it would be great if akhq can pick these up dynamically. Embedding them as static base64 config makes developing such tooling around this very hard.

So i would propose to keep the external file loading, or even better to support spring-like url protocols file:, classpath: , http: so that it can cater for any usecase. A simple periodical or triggered reload would then work really well.

@tchiotludo
Copy link
Owner

good catch @jorgheymans .
The default implementation from @xakassi was file based and I asked him to changed to base64.
The main reason is most of akhq users are docker & k8s based, the base64 options is simplest with this env, you don't need to afford volume.
I understand the file based mostly for jar user so, so many we can add an option to add them as file system, the old commit is here : 03b2bb8, if anyone want to make a PR !
But I don't think I will provide this option for k8s through helm charts, so maybe a warning on doc will be fine.

@jorgheymans
Copy link
Contributor

jorgheymans commented Nov 16, 2020

Sure, i think both options should be present. @xakassi would you mind reinstating the file based descriptor loading alongside the base64 ?

@xakassi
Copy link
Contributor Author

xakassi commented Nov 18, 2020

Hi, guys!
Ok, I can add an option with file based descriptor loading. I'm a bit busy now, but I think I can do it on the next week.

@igalbk
Copy link

igalbk commented Jan 28, 2021

@xakassi @tchiotludo This is a very nice enhancement. We use this UI here, in VMware. AKHQ increases our productivity. We waited for a long time for Protubuf deserialization.
Managing descriptor files in config is a bit tedious, we have a very large number of descriptors and they change from time to time.
Any plans to make the changes more dynamic?
What about Confluent Schema registry, they added Protobuf support recently...

@tchiotludo
Copy link
Owner

Thanks @igalbk for the report ;)
Glad to know VMware is using it, don't hesite to fill that 😄
The confluent schema registry will be the next step.
Maybe you can find people in VMware to contribute ?

@igalbk
Copy link

igalbk commented Jan 28, 2021

@tchiotludo sure, I will check about it. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

package custom serde classes
4 participants