Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0012: Event driven tinkerbell #17

Merged
merged 5 commits into from
Sep 21, 2020
Merged

0012: Event driven tinkerbell #17

merged 5 commits into from
Sep 21, 2020

Conversation

gauravgahlot
Copy link
Contributor

No description provided.

Signed-off-by: Gaurav Gahlot <gauravgahlot0107@gmail.com>
0012/README.md Outdated Show resolved Hide resolved
0012/README.md Show resolved Hide resolved
0012/README.md Outdated Show resolved Hide resolved

Not everything can be a Tink responsibility.
Events are a scalable way to build an extendible system.
This allows different components to tap-in to the event streams and leverage the extension points.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how this event stream would be different from pub-sub message model.

Copy link
Contributor

@gianarb gianarb Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pub-Sub is fire and forget. This events are persisted and can be replayed (until a TTL is reached). Anyway, pub-sub is a pattern, stream is a communication technique I presume. So I can't really answer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, to understand better I was trying to create an analogy between event stream as a part of Event-driven pattern and pub-sub model and also, its difference with later.

Reading this file and comment in the link below, i think what you are trying to implement is Event sourcing https://martinfowler.com/eaaDev/EventSourcing.html pattern and not Event-driven pattern?

https://github.com/tinkerbell/proposals/pull/17/files/b71317a6ddca4227ec1b0394d99afcfcb7cd3083#r488554642

0012/README.md Outdated Show resolved Hide resolved
0012/README.md Outdated

In the current model, the tink-server is loaded with tons of responsibilities.
Responsibilities like - managing workflow state, handing overs actions to workers, logging the events, and others.
We would like to delegate these responsibilities to respective smaller services/components.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need some more explanation on components. Where will they live in tink architecture?
Secondly, how will these components send/receive the events from the grpc, will it be redirected by tink-server ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need some more explanation on components. Where will they live in tink architecture?

it is not part of this proposal. The separation of concerns won't change here. We are just creating a mechanism that enables third parties to implement their own logic outside of Tinkerbell.

There is no redirection, as you can see in the system graph, events are stored in Postgres and there is a couple of new gRPC request served by the grpc server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I would request some details about this delegate these responsibilities to respective smaller services/components , may be with an example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianarb and I are working to add more examples and implementation details so that it's clear what we intend to do here.

0012/README.md Show resolved Hide resolved
A new `EventClient` should be developed with the primitive required for the events at least `Watch`.
At the beginning, only a `Watch` function is required but we think a natural evolution will be to serve a function that can be used from other components to fire events.
Or maybe we can do a `Watch` and `Create` straight away.
All the resource clients will have a `Watch` function that will stream events for that particular resource.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After streaming the events what this Watch function (or the method which triggered this Watch), intends to do?

Copy link
Contributor

@gianarb gianarb Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns something that waits and runs your code when an event gets fired.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this would be an async implementation instead of polling. I would request a brief idea on how this wait and run on event model would be implemented, ex how are you planning to keep a thread/process open/sleep/aync until an event is received or would it be similar to RMI or leverage some go library(some guess)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Watch is going to be as simple as an infinite for loop running in a goroutine. All the events as they occur will be stored in the events table and will be streamed over gRPC and it will be up to the consumer how they want to consume the stream.

0012/README.md Outdated Show resolved Hide resolved
0012/README.md Show resolved Hide resolved
@gianarb
Copy link
Contributor

gianarb commented Sep 15, 2020

@Cbkhare I can't replay for those inline for some reason.

Which component of tink will manage the trigger of streams?

The stream is a grpc stream. The grpc server is currently served by tink-server. I think the answer is tink-server.

0012/README.md Outdated Show resolved Hide resolved
0012/README.md Outdated Show resolved Hide resolved
0012/README.md Outdated Show resolved Hide resolved
0012/listener.go Outdated Show resolved Hide resolved
0012/README.md Show resolved Hide resolved

## Implementation Details

### PostgreSQL Notifications
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really make sense at this point in time. Since PG Notify isn't persisted then there's no replay ability. Which means that needs to be handled at a higher layer. So what exactly are we gaining from using NOTIFY/LISTEN? What we'd really need if we wanted to do it in the DB is something more like auditing would be needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this won't be the real implementation, yesterday evening I was thinking about the possibility to implement it differently.

We will persist events directly in a DB table from tink-server. Where they happen. With an insert. And I think we will use listeners only on the event table to get the new line and stream them out.

In the meantime we will look at the auditing feature as well. My direction at this stage is to be 70% focused on a good grpc layer, and the rest is "make it working". When it works we can figure out how to make it better. But probably the v1 will be as I wrote, at some point in the code a new event will be stored in the event table.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok makes sense. The nice thing about the trigger based approach is that the DB takes care of populating the events table itself and we don't have to worry about sprinkling code in everywhere we want an event. I'd really consider doing it that way first actually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cbkhare I think this should answer your question. The events need to be persisted into a table. NOTIFYs are from there, new watchers would SELECT to go back in time and then LISTEN for new updates. Some care would be needed to make sure we don't race and drop some events between the two.

Signed-off-by: Gaurav Gahlot <gauravgahlot0107@gmail.com>
Signed-off-by: Gaurav Gahlot <gauravgahlot0107@gmail.com>
0012/README.md Show resolved Hide resolved

Not everything can be a Tink responsibility.
Events are a scalable way to build an extendible system.
This allows different components to tap-in to the event streams and leverage the extension points.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, to understand better I was trying to create an analogy between event stream as a part of Event-driven pattern and pub-sub model and also, its difference with later.

Reading this file and comment in the link below, i think what you are trying to implement is Event sourcing https://martinfowler.com/eaaDev/EventSourcing.html pattern and not Event-driven pattern?

https://github.com/tinkerbell/proposals/pull/17/files/b71317a6ddca4227ec1b0394d99afcfcb7cd3083#r488554642

A new `EventClient` should be developed with the primitive required for the events at least `Watch`.
At the beginning, only a `Watch` function is required but we think a natural evolution will be to serve a function that can be used from other components to fire events.
Or maybe we can do a `Watch` and `Create` straight away.
All the resource clients will have a `Watch` function that will stream events for that particular resource.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this would be an async implementation instead of polling. I would request a brief idea on how this wait and run on event model would be implemented, ex how are you planning to keep a thread/process open/sleep/aync until an event is received or would it be similar to RMI or leverage some go library(some guess)?

0012/README.md Outdated

In the current model, the tink-server is loaded with tons of responsibilities.
Responsibilities like - managing workflow state, handing overs actions to workers, logging the events, and others.
We would like to delegate these responsibilities to respective smaller services/components.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I would request some details about this delegate these responsibilities to respective smaller services/components , may be with an example.


The tink-server will only be responsible for generating the events.
It will not contain any business logic that needs to be executed as an event occurs.
Instead it's the consumer who will have to implement the business logic in an idempotent way.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are we handling the scenario where an event was triggered but consumer didn't received it ?
In that case, neither consumer is aware if event was triggered nor tink is aware if event was received ?
Also, a possible of re-delivery.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a delivery/retry logic will be an overkill at this point. Every consumer when it connects to a stream of events will specify how old the events returned should be (by default 5m). That way they will receive the events missed during the connect breakdown.

Copy link
Contributor

@gianarb gianarb Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are we handling the scenario where an event was triggered but consumer didn't received it ?

We do not have a catalog of consumers. Consumers do not register themself (consumers are normal clients like the tink-cli) if a consumer is not attached to a stream or if it didn't ask for old enough events for the tink-server that consumer does not exist. When a consumer gets implemented it knows that tink-server stores events for some amount of time, it is the responsibility of who implements the consumer to figure out how to be reliable.

The tink-server responsibility is to store and stream events reliably and fastly to whoever asks for them

Copy link
Contributor

@Cbkhare Cbkhare Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my comment was deleted. Let me add it again.

Event is streamed to consumer when it occurs and also, consumer is listening onto the stream and is unaware about the occurrence of an event until it receives it.

The tink-server responsibility is to store and stream events reliably and fastly to whoever asks for them

How are we ensuring the streaming events reliably, what happens when the event is lost during the stream, may be due to network issue or service restart? Just like with a REST call we are aware about the status of the execution with status-code and resend it or take corrective actions.

Maybe this should help https://sourcegraph.com/github.com/grpc/grpc-go/-/blob/stream.go#L1331.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is answered in #17

@@ -0,0 +1,233 @@
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of my comments are marked as resolved. I would request some reply on them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was unintentional actually. I was marking resolved those we have responded to and accidentally marked your new comments as well. 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np, i understand.

@tinkerbell tinkerbell deleted a comment from Cbkhare Sep 16, 2020
@gianarb
Copy link
Contributor

gianarb commented Sep 16, 2020

@Cbkhare I deleted a comment by mistake but I think you posted this twice: #17 (comment)

@mmlb
Copy link
Contributor

mmlb commented Sep 16, 2020

A few of the comments are about the future direction, and maybe premature at this point. A lot of confusion/questions are coming up on it but its out of scope I think. Suggest the proposal is re-worded a bit to just mention future possibilities as lightly as possible.

Comment on lines +198 to +199
There is no logic to keep track of which events are fired or sent to a consumer.
Every consumer when it connects to a stream of events will specify how old the events returned should be (by default 5m).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is going to be a lot of work on the clients that would make them more complex.

Copy link
Contributor

@gianarb gianarb Sep 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be a bit more precise? It is just a parameter, if they don't want to get old events they can ask for 0s. I think by default we should avoid the possibility for a client to lose events during reconnection (for example during a consumer's deploy)

@gianarb gianarb requested a review from mmlb September 20, 2020 19:48
Gianluca Arbezzano and others added 2 commits September 21, 2020 13:32
Signed-off-by: Gianluca Arbezzano <gianarb92@gmail.com>
Signed-off-by: Gaurav Gahlot <gauravgahlot0107@gmail.com>
@gianarb
Copy link
Contributor

gianarb commented Sep 21, 2020

I am gonna merge this as it is. It is open for future discussions if something has to change or if it will diverge from the actual implementation. But for now, I think it is good enough

@gianarb gianarb closed this Sep 21, 2020
@gianarb gianarb reopened this Sep 21, 2020
@gianarb gianarb merged commit 44fa89f into tinkerbell:master Sep 21, 2020
mergify bot added a commit to tinkerbell/tink that referenced this pull request Dec 7, 2020
## Description

This PR provides the implementation for event-driven Tinkerbell base model proposed [here](tinkerbell/proposals#17).

**Note**
We would like to keep `Watch` as the function used to stream events for the new event-driven tink. Therefore, the existing `hardware.Watch` has been renamed to `hardware.DeprecatedWatch`.

This will also require some changes in Hegel, so that it can use the `hardware.DeprecatedWatch`.

## Why is this needed

Fixes: #294 

## How Has This Been Tested?

Unit tests and manual testing.

## How are existing users impacted? What migration steps/scripts do we need?

This contains a bc-break. Watch has been renamed to DeprecatedWatch, clients will need to be regenerated and upgraded.


## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [x] added unit or e2e tests
- [x] provided instructions on how to upgrade

## Known issues in progress (fixed)
- [x] an event being received twice on the same stream
- [x] `tink hardware watch <id>` fails to correctly print the hardware data for new events
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants