
[Draft] Add bayesian training #2430

Open
seemanne wants to merge 15 commits into master

Conversation

seemanne
Member

No description provided.

@github-actions

@seemanne: There is no 'kind' label on this PR. A 'kind' label is needed to generate the release automatically.

  • /kind feature
  • /kind enhancement
  • /kind fix
  • /kind chore
  • /kind dependencies
Details

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

@github-actions

@seemanne: There are no area labels on this PR. You can add as many areas as you see fit.

  • /area agent
  • /area local-api
  • /area cscli
  • /area security
  • /area configuration
Details


@seemanne seemanne changed the title add basic trainer class Add bayesian training Aug 16, 2023
@seemanne seemanne changed the title Add bayesian training [Draft] Add bayesian training Aug 16, 2023
@seemanne
Member Author

/kind feature
/area configuration

@codecov

codecov bot commented Aug 16, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (92f923c) 56.94% compared to head (ba7a5a3) 37.79%.

Files                         Patch %   Lines
pkg/leakybucket/bayesian.go   0.00%     4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2430       +/-   ##
===========================================
- Coverage   56.94%   37.79%   -19.16%     
===========================================
  Files         195      191        -4     
  Lines       26675    26302      -373     
===========================================
- Hits        15191     9940     -5251     
- Misses       9901    15018     +5117     
+ Partials     1583     1344      -239     
Flag           Coverage Δ
bats           37.79% <0.00%> (ø)
unit-linux     ?
unit-windows   ?

Flags with carried forward coverage won't be shown. Click here to find out more.


Comment on lines 122 to 264

for _, v := range s.ParsedIpEvents {
go evaluateProgramOnBucket(&v, compiled, inputChan)
}

go controllerRoutine(inputChan, outputChan, s.total)
Contributor

@sbs2001 sbs2001 Aug 17, 2023

I’m not aware of the larger context of where this fits in. It looks like you’re adding a new function in scenarios/parsers. That said, I think using goroutines here won't add performance and only makes the code more complex. The goroutined function evaluateProgramOnBucket doesn't do any I/O work, so it doesn't make sense to run it in goroutines. Something like

var result evalHypothesisResult
for _, v := range s.ParsedIpEvents {
    r := evaluateProgramOnBucket(&v)
    // update result with r
}

would have equivalent if not better performance while eliminating the overhead of the controller routine, channels, etc.

Member Author

Ok, maybe I should explain. This is the main training loop for the bayesian buckets. The construction works the following way:

  • The logs are loaded into the LogEventStorage into a map keyed by IP, and all the events for that IP are added to its fakeBucket.
  • The user can then test different hypothesis exprs to see if any of them would make good conditions in the bucket, using TestHypothesis.
  • To speed up the hypothesis testing, the idea is to run it in parallel threads for each IP (as it's basically counting some stuff per IP).
  • The goal of the channel/goroutine design in TestHypothesis is to enable this parallelism by spawning an independent routine for each IP (fake bucket) and then collecting all the results using the controller.

Does this make more sense now?

Contributor

Understood, the goroutines would definitely increase throughput. If the training is a CPU-intensive task, then goroutines make sense.

Member Author

Nice, thank you
