-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
0012: Event driven tinkerbell #17
Conversation
Signed-off-by: Gaurav Gahlot <gauravgahlot0107@gmail.com>
|
||
Not everything can be a Tink responsibility. | ||
Events are a scalable way to build an extendible system. | ||
This allows different components to tap-in to the event streams and leverage the extension points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how this event stream
would be different from pub-sub
message model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pub-Sub is fire and forget. This events are persisted and can be replayed (until a TTL is reached). Anyway, pub-sub is a pattern, stream is a communication technique I presume. So I can't really answer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, to understand better I was trying to create an analogy between event stream
as a part of Event-driven pattern and pub-sub model and also, its difference with later.
Reading this file and comment in the link below, i think what you are trying to implement is Event sourcing
https://martinfowler.com/eaaDev/EventSourcing.html pattern and not Event-driven
pattern?
0012/README.md
Outdated
|
||
In the current model, the tink-server is loaded with tons of responsibilities. | ||
Responsibilities like - managing workflow state, handing overs actions to workers, logging the events, and others. | ||
We would like to delegate these responsibilities to respective smaller services/components. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need some more explanation on components. Where will they live in tink architecture?
Secondly, how will these components send/receive the events from the grpc, will it be redirected by tink-server ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need some more explanation on components. Where will they live in tink architecture?
it is not part of this proposal. The separation of concerns won't change here. We are just creating a mechanism that enables third parties to implement their own logic outside of Tinkerbell.
There is no redirection, as you can see in the system graph, events are stored in Postgres and there is a couple of new gRPC request served by the grpc server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I would request some details about this delegate these responsibilities to respective smaller services/components
, may be with an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gianarb and I are working to add more examples and implementation details so that it's clear what we intend to do here.
A new `EventClient` should be developed with the primitive required for the events at least `Watch`. | ||
At the beginning, only a `Watch` function is required but we think a natural evolution will be to serve a function that can be used from other components to fire events. | ||
Or maybe we can do a `Watch` and `Create` straight away. | ||
All the resource clients will have a `Watch` function that will stream events for that particular resource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After streaming the events what this Watch
function (or the method which triggered this Watch
), intends to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns something that waits and runs your code when an event gets fired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this would be an async implementation instead of polling. I would request a brief idea on how this wait and run on event
model would be implemented, ex how are you planning to keep a thread/process open/sleep/aync until an event is received or would it be similar to RMI or leverage some go library(some guess)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Watch is going to be as simple as an infinite for
loop running in a goroutine. All the events as they occur will be stored in the events table and will be streamed over gRPC and it will be up to the consumer how they want to consume the stream.
@Cbkhare I can't replay for those inline for some reason.
The stream is a grpc stream. The grpc server is currently served by tink-server. I think the answer is tink-server. |
|
||
## Implementation Details | ||
|
||
### PostgreSQL Notifications |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really make sense at this point in time. Since PG Notify isn't persisted then there's no replay ability. Which means that needs to be handled at a higher layer. So what exactly are we gaining from using NOTIFY/LISTEN? What we'd really need if we wanted to do it in the DB is something more like auditing would be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this won't be the real implementation, yesterday evening I was thinking about the possibility to implement it differently.
We will persist events directly in a DB table from tink-server. Where they happen. With an insert. And I think we will use listeners only on the event table to get the new line and stream them out.
In the meantime we will look at the auditing feature as well. My direction at this stage is to be 70% focused on a good grpc layer, and the rest is "make it working". When it works we can figure out how to make it better. But probably the v1 will be as I wrote, at some point in the code a new event will be stored in the event table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok makes sense. The nice thing about the trigger based approach is that the DB takes care of populating the events table itself and we don't have to worry about sprinkling code in everywhere we want an event. I'd really consider doing it that way first actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Cbkhare I think this should answer your question. The events need to be persisted into a table. NOTIFYs are from there, new watchers would SELECT to go back in time and then LISTEN for new updates. Some care would be needed to make sure we don't race and drop some events between the two.
Signed-off-by: Gaurav Gahlot <gauravgahlot0107@gmail.com>
Signed-off-by: Gaurav Gahlot <gauravgahlot0107@gmail.com>
|
||
Not everything can be a Tink responsibility. | ||
Events are a scalable way to build an extendible system. | ||
This allows different components to tap-in to the event streams and leverage the extension points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, to understand better I was trying to create an analogy between event stream
as a part of Event-driven pattern and pub-sub model and also, its difference with later.
Reading this file and comment in the link below, i think what you are trying to implement is Event sourcing
https://martinfowler.com/eaaDev/EventSourcing.html pattern and not Event-driven
pattern?
A new `EventClient` should be developed with the primitive required for the events at least `Watch`. | ||
At the beginning, only a `Watch` function is required but we think a natural evolution will be to serve a function that can be used from other components to fire events. | ||
Or maybe we can do a `Watch` and `Create` straight away. | ||
All the resource clients will have a `Watch` function that will stream events for that particular resource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this would be an async implementation instead of polling. I would request a brief idea on how this wait and run on event
model would be implemented, ex how are you planning to keep a thread/process open/sleep/aync until an event is received or would it be similar to RMI or leverage some go library(some guess)?
0012/README.md
Outdated
|
||
In the current model, the tink-server is loaded with tons of responsibilities. | ||
Responsibilities like - managing workflow state, handing overs actions to workers, logging the events, and others. | ||
We would like to delegate these responsibilities to respective smaller services/components. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I would request some details about this delegate these responsibilities to respective smaller services/components
, may be with an example.
|
||
The tink-server will only be responsible for generating the events. | ||
It will not contain any business logic that needs to be executed as an event occurs. | ||
Instead it's the consumer who will have to implement the business logic in an idempotent way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are we handling the scenario where an event was triggered but consumer didn't received it ?
In that case, neither consumer is aware if event was triggered nor tink is aware if event was received ?
Also, a possible of re-delivery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a delivery/retry logic will be an overkill at this point. Every consumer when it connects to a stream of events will specify how old the events returned should be (by default 5m). That way they will receive the events missed during the connect breakdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are we handling the scenario where an event was triggered but consumer didn't received it ?
We do not have a catalog of consumers. Consumers do not register themself (consumers are normal clients like the tink-cli) if a consumer is not attached to a stream or if it didn't ask for old enough events for the tink-server that consumer does not exist. When a consumer gets implemented it knows that tink-server stores events for some amount of time, it is the responsibility of who implements the consumer to figure out how to be reliable.
The tink-server responsibility is to store and stream events reliably and fastly to whoever asks for them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my comment was deleted. Let me add it again.
Event is streamed to consumer when it occurs and also, consumer is listening onto the stream and is unaware about the occurrence of an event until it receives it.
The tink-server responsibility is to store and stream events reliably and fastly to whoever asks for them
How are we ensuring the streaming events reliably, what happens when the event is lost during the stream, may be due to network issue or service restart? Just like with a REST call we are aware about the status of the execution with status-code and resend it or take corrective actions.
Maybe this should help https://sourcegraph.com/github.com/grpc/grpc-go/-/blob/stream.go#L1331.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is answered in #17
@@ -0,0 +1,233 @@ | |||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of my comments are marked as resolved. I would request some reply on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was unintentional actually. I was marking resolved those we have responded to and accidentally marked your new comments as well. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np, i understand.
@Cbkhare I deleted a comment by mistake but I think you posted this twice: #17 (comment) |
A few of the comments are about the future direction, and maybe premature at this point. A lot of confusion/questions are coming up on it but its out of scope I think. Suggest the proposal is re-worded a bit to just mention future possibilities as lightly as possible. |
There is no logic to keep track of which events are fired or sent to a consumer. | ||
Every consumer when it connects to a stream of events will specify how old the events returned should be (by default 5m). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is going to be a lot of work on the clients that would make them more complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be a bit more precise? It is just a parameter, if they don't want to get old events they can ask for 0s
. I think by default we should avoid the possibility for a client to lose events during reconnection (for example during a consumer's deploy)
Signed-off-by: Gianluca Arbezzano <gianarb92@gmail.com>
I am gonna merge this as it is. It is open for future discussions if something has to change or if it will diverge from the actual implementation. But for now, I think it is good enough |
## Description This PR provides the implementation for event-driven Tinkerbell base model proposed [here](tinkerbell/proposals#17). **Note** We would like to keep `Watch` as the function used to stream events for the new event-driven tink. Therefore, the existing `hardware.Watch` has been renamed to `hardware.DeprecatedWatch`. This will also require some changes in Hegel, so that it can use the `hardware.DeprecatedWatch`. ## Why is this needed Fixes: #294 ## How Has This Been Tested? Unit tests and manual testing. ## How are existing users impacted? What migration steps/scripts do we need? This contains a bc-break. Watch has been renamed to DeprecatedWatch, clients will need to be regenerated and upgraded. ## Checklist: I have: - [ ] updated the documentation and/or roadmap (if required) - [x] added unit or e2e tests - [x] provided instructions on how to upgrade ## Known issues in progress (fixed) - [x] an event being received twice on the same stream - [x] `tink hardware watch <id>` fails to correctly print the hardware data for new events
No description provided.