Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC/WIP] Embed client in producer and consumer #430

Closed
wants to merge 1 commit into from

Conversation

eapache
Copy link
Contributor

@eapache eapache commented Apr 20, 2015

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

CI will probably fail because I didn't update the mocks.

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

This more-or-less requires we land a proper mock-client to embed in our mock-producer and mock-consumer.

@wvanbergen
Copy link
Contributor

Hmmm; it's not possible in go to make Client itself private, and only export its methods, right?

This might be overkill after all. I see legitimate use cases for exporting Topics() and Partitions() for the consumer (to figure out what topics and partitions you are going to consume), and potentially they could also be useful for the producer (e.g. validating whether a topic exists before producing a message, or to help setting message.Partition when using the ManualPartitioner). All the other Client methods I don't really see a likely use case for.

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

Hmmm; it's not possible in go to make Client itself private, and only export its methods, right?

Not automagically like this AFAIK.

All the other Client methods I don't really see a likely use case for.

I was still thinking that the basic CommitOffset helper and friends should live on the client, since they don't actually have anything to do with any state that the consumer stores...

I agree this feels kind of weird, but so does manually proxying a handful of methods. I agree the user shouldn't have to create a client in order to use the high-level consumer in the common case, but I'm not sure that's as true for the (current) low-level consumer.

@wvanbergen
Copy link
Contributor

I was still thinking that the basic CommitOffset helper and friends should live on the client, since they don't actually have anything to do with any state that the consumer stores...

On the other hand, that CommitOffset, FetchOffset, and (Refresh)Coordinator are only relevant when consuming, even though they don't have anything to do with the state Consumer stores per se. Whenever you want to use them, you will always have a Consumer instance available, but not necessarily a Client. So I think it's a good idea to have them available on the Consumer type; whether that means they should be implemented on that type or proxy to Client, is a different question.

I agree the user shouldn't have to create a client in order to use the high-level consumer in the common case, but I'm not sure that's as true for the (current) low-level consumer.

We don't have a high-level consumer yet, and even once we do, there still is a place for using the low level consumer. So I think we shouldn't neglect the API usability of that one.

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

Whenever you want to use them, you will always have a Consumer instance available

Not if you're writing your own consumer implementation. Honestly, the only reason I ever published Client in the first place was because the original producer implementation was so simple I exposed the primitives in case somebody wanted to write their own. I've always approached Client as "all the primitives you need to write your own producer/consumer".

We don't have a high-level consumer yet, and even once we do, there still is a place for using the low level consumer.

Yes, but is there a use case for the low-level consumer that involves calling Partitions or Topics? All the cases I can think of that currently involve low-level consumer in combination with Partitions/Topics are better-satisfied by the high-level consumer. The cases that remain for the low-level consumer don't involve that.

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

Yes, but is there a use case for the low-level consumer that involves calling Partitions or Topics? All the cases I can think of that currently involve low-level consumer in combination with Partitions/Topics are better-satisfied by the high-level consumer. The cases that remain for the low-level consumer don't involve that.

With our current API guarantees, it's much easier to add APIs than to remove them, so let's work on building the high-level consumer on top of the low-level consumer and the client, and see what proxies we need to add when that's done.

@eapache eapache closed this Apr 20, 2015
@eapache eapache deleted the embed-client branch April 20, 2015 15:28
@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

On the other hand, that CommitOffset, FetchOffset, and (Refresh)Coordinator are only relevant when consuming

Slightly tangential: WritablePartitions is only relevant when producing, but I don't think it belongs on the producer, because users of the producer don't care about it, only the producer itself uses it.

Similarly, the methods above are only relevant when consuming, but they should only be relevant to the consumer itself, not to the user of the consumer.

@wvanbergen
Copy link
Contributor

I still think that we should add Topics and Partitions as proxy.

  • The high level consumer is complicated to implement, as it will require adding Zookeeper as a direct dependency, or wait for Kafka 0.9.0. Adding these methods makes the API we currently have more usable, and a lot easier to test.
  • Even if the need for these methods completely disappears once we have the high-level consumer, the maintenance burden is negligible. It's just a proxy to the same method in Client, which we've committed to maintaining anyway.

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

The use cases we are addressing here are:

  • discovery of topics to consume (this should be rare I think, most of the time you know and hard-code which topics you want to consume)
  • discovery of partitions to consume (99% of the time, this is just "all of them")

So just ConsumeTopic(topic) on the low-level consumer might be a better solution?

@wvanbergen
Copy link
Contributor

ConsumeTopic(topic) won't allow you to set different offsets to start at for each partitions [1]. And, that API will definitely be useless once we have a high-level consumer, and will be a lot less trivial to maintain.

[1] unless you make the final argument a map[int32]int64, but then you would need to know how many partitions the topic has to initialize this map anyway.

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

If you have a list of offsets to start at, you know how many partitions are present and don't need to call Partitions at all

@wvanbergen
Copy link
Contributor

But then you would need a different API for when you don't know this yet.

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

It has to be a different path in the caller anyways, depending on whether you pass OffsetOldest or the saved custom value.

@wvanbergen
Copy link
Contributor

var consumers []*PartitionConsumer
partitions, _ := consumer.Partitions("topic")
for _, partition := range partitions
  offset, err := fetchOffset("topic", partition)
  if err != nil {
    offset = sarama.OffsetOldest // or OffsetNewest,
  }
  pc, _ = consumer.ConsumePartition("topic", partition, offset)
  consumers = append(consumers, pc)
}
var consumers []*PartitionConsumer
offsets, err := fetchOffsets("topic")
if err != nil {
  consumers, _ = consumer.ConsumeTopic("topic", offsets)
} else {
  consumers, _ = consumer.ConsumerTopicFromOldest("topic")
  //  consumer.ConsumerTopicFromNewest("topic")
}

I prefer the first option, because I think it's more straightforward, especially in the case when the offsets map doesn't have all partitions in it (e.g. when partitions are added to the topic)

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

I also prefer the first option, it's just the idea of exposing Partitions on the consumer really grates on my soul :P

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

var consumers []*PartitionConsumer
partitions, _ := client.Partitions("topic")
for _, partition := range partitions
  offset, err := fetchOffset("topic", partition)
  if err != nil {
    offset = sarama.OffsetOldest // or OffsetNewest,
  }
  pc, _ = consumer.ConsumePartition("topic", partition)
  consumers := append(consumers, pc)
}

would be three lines longer (to account for creating the client) and feels much cleaner to me

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

This is turning into a 🚲 🏠, let's just add them and I will swallow my pride until we get people confused and I can say I told you so :D

@wvanbergen
Copy link
Contributor

Except that testing is going to be way more complicated because you need to mock client, and is more susceptible to user error (by hardcoding partition numbers or creating a second client).

@wvanbergen
Copy link
Contributor

Anyway, I will just yolo-push. Once we're ready for v2 we can evaluate who was right :P

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

Creating a second client is not a terrible thing; it's not optimally efficient, but it's not going to cause problems.

@eapache
Copy link
Contributor Author

eapache commented Apr 20, 2015

ducks

@wvanbergen
Copy link
Contributor

See #431. This was previously discussed in #366 as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants