Zipkin host collector (agent) #1778
I would be interested in being one of those parties moving this forward using Go, optionally taking pieces from the Zipkin-Go package. Regardless of which language or ecosystem we end up with, I think we could already start specifying our requirements, wishlist, scope, etc.
I'm also interested in this. Golang is a good option IMO, both for portability and for performance. I guess the outcome of this issue could be a list of requirements for the agent.
Brave is sufficient for services implemented in Java. However, for JS and other runtimes it may become tedious to reimplement common aspects of the tracing client, like compression, batching, and posting data to the backend. These aspects can be implemented in the agent instead. The sampling decision, though, should be made in the library. We should also think about compatibility between the tracing library version and the agent, because there will be scenarios where the agent is not updated but the client library is, and so on. This also means that we should avoid tight coupling between them. We should probably start listing what we want to implement in the library and in the agent.
FWIW, I don't expect everyone to be able to use an agent, so JS will almost certainly continue to have its own library-focused posting mechanisms. In zipkin-js this is already implemented, albeit not as advanced as in Java (it already does HTTP batching, and compression is actually quite simple).

For example, browsers and native apps won't be able to use an agent, and neither will Android (Java) or most clients. I'm not suggesting we don't do an agent, just reminding everyone that it isn't a magic bullet for all applications.

I just started working on an agent. Have a look at https://github.com/jcchavezs/zipkin-agent
I wrote an agent like this several months ago, mainly because we have lots of microservices written in Python running in small containers, and each of them connected to Kafka directly to send spans. That's just too many connections for the Kafka brokers. All the agent does is receive spans on a UDP port and send them in batches to Kafka.
There's already an existing open source project you can use for this: fluentd. I used it with Scribe (before Zipkin went to Kafka town), and it works pretty well. The main project is written in Ruby, but parts of it have been ported to Go. We might need to write some documentation about how to use it with Zipkin efficiently.
@eirslett It seems that fluentd cannot combine multiple JSON documents into a list and send the list in one message. Reducing the number of Kafka messages is important to me.
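The re-bundling step being asked for here is small in isolation. The Go sketch below merges several payloads into one JSON array so they can go out as a single message. It assumes each incoming payload is already a JSON array of Zipkin v2 spans, and it treats any other shape as an error. `mergeSpanLists` is a hypothetical name, and it does not describe fluentd's or py_zipkin's behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeSpanLists combines several payloads, each assumed to be a JSON array
// of spans, into one JSON array. Spans are kept as raw JSON, so unknown
// fields from newer tracing libraries pass through unchanged.
func mergeSpanLists(docs [][]byte) ([]byte, error) {
	all := make([]json.RawMessage, 0)
	for i, doc := range docs {
		var spans []json.RawMessage
		if err := json.Unmarshal(doc, &spans); err != nil {
			return nil, fmt.Errorf("document %d: %w", i, err)
		}
		all = append(all, spans...)
	}
	return json.Marshal(all)
}

func main() {
	merged, err := mergeSpanLists([][]byte{
		[]byte(`[{"traceId":"463ac35c9f6413ad","id":"a2fb4a1d1a96d312"}]`),
		[]byte(`[{"traceId":"463ac35c9f6413ad","id":"0f28b3c6d2f1f0a1"}]`),
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(merged))
}
```

Using `json.RawMessage` avoids decoding each span into a fixed struct. That keeps the merge step independent of the span schema version.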
Fwiw, I think recent py_zipkin allows bundling multiple spans in the same message. It might require some work to do it smartly. (In reply to Haochen Tong, 19 Nov 2017.)

We do have such a mechanism in our tracing library, but I still prefer to have this kind of batching in the agent, simply because the agent receives spans from all containers on one host, so batching can be done more efficiently.
I think we should at least consider fluentd, as a custom agent is a long-term responsibility, and finn (where eirik used to work) is a large site. It can also help people migrate off Scribe. Right now we have a problem: people are pinned to Scribe, and this carries long-term weight on the project. Switching to fluentd can help with this, as it could localize the thrift+scribe dependency to fluentd, which needs it anyway. Fluentd could then emit to Zipkin over a more supportable protocol.
Finally, it can make adoption easier: custom agents are even less familiar than a plugin for an existing agent.
If fluentd had (re-)bundling capability, are there any other holdbacks? @jcchavezs, would you be open to porting your work to a fluentd plugin? Does anyone have some time for due diligence on this option? Personally, I am very interested in this, for Scribe deprecation alone!
https://www.fluentd.org/

I've been thinking about this one a lot. With the change to Armeria, we could get a pretty small server image with just the HTTP collector and a passthrough storage shim. I'm wondering if there is still an appetite for this here. I know we don't particularly want to go down the route of reimplementing all of this in Go, since that duplicates work. Thoughts?
Are you thinking of using Graal to remove the JVM? I think even with Armeria it would still be a fairly heavyweight sidecar unless using Graal, but then it'd be quite small (I have toyed with Graal and failed to create a server that supports TLS, but a sidecar wouldn't need it). I wonder, though, whether people can just use Envoy as a Zipkin sidecar.
Have you folks considered the OpenCensus agent?
I love this. I think it would be way easier for everyone to have this as a sidecar if we deliver a binary.
I started something related to this not so long ago, https://github.com/jcchavezs/zipkin-agent, but stopped working on it for a while. My idea was to cover the PHP batching use case, but there are many possibilities here.
Before starting my own work on this, I thought about using fluentd and Fluent Bit, but configuration was a big issue.
That said, are you also up for writing this in something other than Java? (In reply to Anuraag Agrawal, Thu 9 May 2019, 17:11.)

So for me, the answer to both "maybe try the OpenCensus agent" and "write it in something other than Java" is the same: if we do that, we lose the ability to leverage the fairly extensive library of reporters already written in Java. In my mind, the power here would be in allowing languages that lack those reporters to send spans over ubiquitous HTTP, with the spans ending up in Kafka/Kinesis/SQS/(whatever the future holds). If we do feel that another language suits this better (smaller binary, faster startup, whatever), then we would want to write a compatibility test suite that could be run against both Zipkin and the separate application, to validate that both behave the same way for the same inputs and follow the API specification. Incidentally, this would also allow third parties to validate their implementations.
I think this issue is pie in the sky and also wouldn't affect the codebase in this repo. People close to the codebase here will know there's extensive work beyond ticking boxes: for example, how and which logs are written, which metrics are emitted, what can be supported or extended, and how many people can help and how close they are to the project. I won't troll by citing the numerous examples of projects that didn't prioritize things like format parity or data size, or that were abandoned. Suffice it to say, neither a third-party nor a first-party clone is going to replace this server. If someone wants to (as they always could have), they can write a contrib proxy and make it popular, or help other proxies like pitchfork or census.

Meanwhile, we largely undersell our own server. While we are focused on a lot of things, we've updated it to literally use the same infrastructure as those who left Twitter with the experience of the first attempt (finagle -> armeria). We also have numerous works in progress to reduce memory overhead per request and to address things like rate limiting. Duplicating all of this in a new language for its own sake is expensive. Again, folks can, but personally I see no advantage in intentionally not improving our server, especially after all the investment we've made.

So, basically I agree with @devinsba and @anuraaga. If there's a concrete concern about which JVM should be used, we can address that in the Docker image. If there are overhead improvements, nothing stops them from happening here. If someone wants to experiment with another agent, there are places to do that, including third-party repos, personal repos, and contrib. Meanwhile, this repo is in a different org now, Apache. If we did anything else, it would either not be in this org or would be a new incubator entry. Suffice it to say, this issue is out of date, even if insightful, so I'm closing it. Thanks to all for the feedback!
Especially due to limited runtimes like PHP, we should consider a host agent project. While we have a lot of tooling in Java (e.g. benchmarked codecs, the ability to estimate size), running another JVM as a sidecar in VMs and containers might be a difficult sell. Anything lightweight is good. Go could be preferable, since managing dependencies is easier.
Options include rolling our own, hopefully leveraging libraries from zipkin-go, or building on a future open-sourcing of the AWS X-Ray agent. Meanwhile, we should make sure at least three parties sign up to help maintain this, as it involves significant long-term effort. We don't want to build interest in an agent and then drop it after people are using it.
cc @openzipkin/core