Zipkin host collector (agent) #1778

codefromthecrypt · 2017-11-06T01:28:55Z

Especially due to limited runtimes, like PHP, we should consider a host agent project. While we have a lot of tooling in Java (e.g. benchmarked codecs, the ability to estimate size, etc.), running another JVM as a sidecar in VMs and for containers might be a difficult sell. Anything lightweight is good. Go could be preferable, since managing dependencies is easier.

Options include rolling our own, hopefully leveraging libraries from zipkin-go, or layering on a future open sourcing of the AWS X-Ray agent. Meanwhile, we should make sure at least 3 parties sign up to help maintain this, as there's significant long-term effort. We don't want to build interest for an agent and then drop it after people are using it.

cc @openzipkin/core

basvanbeek · 2017-11-06T02:14:16Z

I would be interested in being one of those parties moving this forward using Go, optionally taking pieces from the Zipkin-Go package.

Regardless of which language / ecosystem we end up with, I think we could already start specifying our requirements, wishlists, scope, etc.

jcchavezs · 2017-11-06T04:08:40Z

I am also interested in this. Golang is a good option IMO, as it has a portability advantage and is also good performance-wise. I guess as an outcome of this issue we might come up with a list of requirements for the agent.

connectwithnara · 2017-11-07T07:02:57Z

Brave is sufficient for services implemented in Java. However, for JS and other runtimes it may become tedious to repeat implementing common aspects of the tracing client like compression, batching and posting data to the backend. These aspects can be implemented in the agent instead. The sampling decision, though, should be taken in the library.

We should also think about compatibility between the tracing library version and the agent, because there will be scenarios where the agent is not updated but the client library is, and so on. This also means that we should avoid tight coupling between them.

We should probably start listing out what we want to implement in the library and the agent.

codefromthecrypt · 2017-11-07T07:08:08Z

FWIW I don't expect everyone to be able to use an agent, so almost certainly JS will continue to have its own library-focused post mechanisms. In zipkin-js this is already implemented, albeit not as advanced as Java (it already does HTTP batching, and compression is actually quite simple).

codefromthecrypt · 2017-11-07T07:09:49Z

For example, browsers and native apps won't be able to use an agent, and neither will Android (Java) or most clients. Not suggesting we don't do an agent, just reminding that it isn't a magic bullet for all applications.

jcchavezs · 2017-11-11T19:04:42Z

I just started working on an agent. Have a look at https://github.com/jcchavezs/zipkin-agent

hexchain · 2017-11-18T15:48:32Z

I wrote an agent like this several months ago, mainly because we have lots of microservices written in Python and running in small containers, which connect to Kafka directly to send spans. And that's just too many connections for Kafka brokers.

All it does is receive spans on a UDP port and then send them in batches to Kafka.

eirslett · 2017-11-18T20:36:49Z

There's already an existing open source project you can use for this, fluentd. I used it with Scribe (before Zipkin went to kafka town), it works pretty well. The main project is written in Ruby, but parts of it have been ported to Go. We might need to write some documentation about how you can use it with Zipkin efficiently.

hexchain · 2017-11-19T06:22:17Z

@eirslett It seems that fluentd cannot combine multiple JSON documents into a list and send the list in one message. Reducing the number of Kafka messages is important to me.
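For reference, the bundling step itself is small. A minimal Go sketch, assuming each span arrives as its own encoded JSON document; `bundle` is a hypothetical helper name:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bundle combines individually encoded JSON span documents into a single
// JSON list so they can travel in one message. The documents are validated
// but not decoded into structs.
func bundle(docs [][]byte) ([]byte, error) {
	list := make([]json.RawMessage, 0, len(docs))
	for i, d := range docs {
		if !json.Valid(d) {
			return nil, fmt.Errorf("document %d is not valid JSON", i)
		}
		list = append(list, json.RawMessage(d))
	}
	return json.Marshal(list)
}

func main() {
	out, err := bundle([][]byte{
		[]byte(`{"id":"1"}`),
		[]byte(`{"id":"2"}`),
	})
	if err != nil {
		fmt.Println("bundle failed:", err)
		return
	}
	fmt.Println(string(out)) // prints [{"id":"1"},{"id":"2"}]
}
```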

codefromthecrypt · 2017-11-19T06:40:39Z

FWIW I think recent py_zipkin allows bundling multiple spans in the same message. It might require some work to do it smartly.

hexchain · 2017-11-19T06:50:20Z

We do have such a mechanism in our tracing library, but I still prefer to have this kind of batching in the agent, simply because the agent receives spans from all containers on one host, so batching can be done more efficiently.

codefromthecrypt · 2017-11-20T01:54:54Z

I think we should at least consider fluentd, as a custom agent is a long-term responsibility, and finn (where eirik used to work) is a large site. Also, it can help people migrate off scribe. Right now, we have a problem: people are pinned to scribe, and this carries long-term weight on the project. Switching to fluentd can help with this, as it could localize the thrift+scribe dep to fluentd, which needs it anyway. Fluentd could emit to zipkin on a more supportable protocol. Finally, it can make adoption easier: a custom agent is even less familiar than a plugin for an existing agent. If fluentd had (re-)bundling capability, are there any other holdbacks? @jcchavezs would you be open to porting your work to a fluentd plugin? Does anyone have some time for due diligence on this option? Personally, I am very interested in this, for scribe deprecation alone! https://www.fluentd.org/

jcchavezs · 2017-11-25T09:00:12Z

@hexchain Regarding the combination of JSON and Kafka, @stakhiv has an idea on how to make it work in a way that adds no overhead.

devinsba · 2019-05-09T14:56:52Z

I've been thinking about this one a lot. With the change to Armeria, we could get a pretty small server image with just the HTTP collector and a passthrough storage shim. I'm wondering if there is still an appetite for this here. I know we don't particularly want to go down the route of reimplementing all of this in Go, since that would duplicate work.

Thoughts?

anuraaga · 2019-05-09T15:11:11Z

Are you thinking of using Graal to remove the JVM? I think even with Armeria it would still be a fairly heavyweight sidecar unless using Graal. But then it'd be quite small (I have toyed with Graal and failed to create a server that supports TLS, but a sidecar wouldn't need it).

I wonder, though, if people can just use Envoy as a Zipkin sidecar.

yurishkuro · 2019-05-09T15:22:08Z

Have you folks considered OpenCensus agent?

jcchavezs · 2019-05-09T15:30:04Z

I love this. I think it would be way easier for everyone to have this as a sidecar if we deliver a binary. I started something related to this not so long ago, https://github.com/jcchavezs/zipkin-agent, but stopped it for a while. My idea was to cover the use case of PHP batching, but there are many possibilities here. Before starting my own work on this, I thought about using fluent and fluentbit, but configuration was a big issue. That said, are you guys also up for writing this in a non-Java language?

devinsba · 2019-05-09T17:21:56Z

So for me the answer to both "maybe try OpenCensus agent" and "write it in something other than Java" is the same: if we do that, we lose the ability to leverage the fairly extensive library of reporters that are already written in Java. In my mind the power here would be in allowing the languages that do not support them to report spans over the ubiquitous HTTP, which would end up in kafka/kinesis/sqs/(whatever the future holds).

If we do feel that another language suits this better (smaller binary, faster startup, whatever), then we would want to write a compatibility test suite that could be run against both zipkin and the separate application, to validate that both apps behave the same way for the same sets of input and follow the specifications of the API. Incidentally, this would allow third parties to also validate their implementations.
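As a sketch of what the core of such a suite could look like in Go: post the same inputs to a reference server and a candidate, then report the cases where they disagree. Everything here is a hypothetical stand-in, including the `Case` type, the `compare` and `newServer` helpers, and the two toy servers; a real suite would compare far more than status codes.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Case is one hypothetical conformance input: a request body to POST.
type Case struct {
	Name string
	Body string
}

// statusOf posts body to url and returns the HTTP status code.
func statusOf(url, body string) (int, error) {
	resp, err := http.Post(url, "application/json", bytes.NewBufferString(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// compare runs every case against both servers and returns the names of
// the cases whose status codes differ.
func compare(referenceURL, candidateURL string, cases []Case) ([]string, error) {
	var mismatches []string
	for _, c := range cases {
		want, err := statusOf(referenceURL, c.Body)
		if err != nil {
			return nil, err
		}
		got, err := statusOf(candidateURL, c.Body)
		if err != nil {
			return nil, err
		}
		if got != want {
			mismatches = append(mismatches, c.Name)
		}
	}
	return mismatches, nil
}

// newServer builds a toy collector. A strict one rejects bodies that are
// not a JSON list; a lenient one accepts anything.
func newServer(strict bool) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var spans []json.RawMessage
		if err := json.NewDecoder(r.Body).Decode(&spans); err != nil && strict {
			w.WriteHeader(http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	}))
}

func main() {
	reference := newServer(true)
	defer reference.Close()
	candidate := newServer(false)
	defer candidate.Close()

	cases := []Case{
		{Name: "empty list", Body: "[]"},
		{Name: "object instead of list", Body: `{"id":"1"}`},
	}
	mismatches, err := compare(reference.URL, candidate.URL, cases)
	if err != nil {
		fmt.Println("compare failed:", err)
		return
	}
	fmt.Println("cases with differing status:", mismatches)
}
```

Because the cases are plain data, the same list could be pointed at any third-party implementation without changing the harness.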

codefromthecrypt · 2019-05-10T00:47:32Z

I think this issue is pie in the sky and also wouldn't affect the codebase in this repo.

I think people close to the codebase here will know there's extensive work vs ticking boxes, for example how and which logs are written, metrics are emitted, what can be supported or extended, how many people can and how close are they to the project. I won't troll by citing numerous examples of projects either not prioritizing things like format parity, data size, or abandonment. Suffice to say either a 3rd party or 1st party clone isn't going to replace this server. If someone wants to (as they always could have), they can write a contrib proxy, make it popular etc, or help other proxies like pitchfork or census.

Meanwhile, we largely undersell our own server. While we are focused on a lot of things, we've updated this to literally use the same infra as those who left Twitter with the experience of the first attempt (finagle -> armeria). We also have numerous works in progress to reduce memory overhead per request and also address things like rate limiting. Duplicating all of this in a new language for the sake of it is expensive. Again, folks can, but personally I see no advantage in intentionally not improving our server, especially after all the investments we've made.

So, basically I agree with @devinsba and @anuraaga .. if there's concrete concern about which JVM should be used, we can address that in docker image. If there are overhead improvements, nothing to stop them happening here. If someone wants to experiment with another agent, there are places to do that including 3rd party repos, personal repos and contrib.

Meanwhile, this repo is in a different org now, Apache. If we did anything else, that would either not be in this org, or be a new incubator entry. Suffice to say this issue is out-of-date, even if insightful, so closing.

Thanks to all for the feedback!

codefromthecrypt mentioned this issue Nov 6, 2017

Explore possibilities for bulk reporting openzipkin/zipkin-php#22

Closed

This was referenced Nov 6, 2017

Storage support using InfluxDB interesting? #1628

Open

Possible to consider a lightweight version? openzipkin/zipkin-gcp#38

Closed

codefromthecrypt mentioned this issue Dec 5, 2017

Zipkin Collector Sampling based on Traces/End To End Transaction & not based on individual spans #1835

Open
