Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clearly define all parameters used in a bot #757

Closed
sykaeh opened this issue Oct 27, 2016 · 15 comments · Fixed by #1751
Closed

Clearly define all parameters used in a bot #757

sykaeh opened this issue Oct 27, 2016 · 15 comments · Fixed by #1751

Comments

@sykaeh
Copy link
Contributor

sykaeh commented Oct 27, 2016

Recently I have been creating, modifying and using a lot of bots to fit our setup. And one of the things that has been bothering me is how inconsistently the bot's parameters are defined/documented. Most of the parameters for a BOT are listed in the BOTS file, however I have noticed that the list of parameters is not always complete (especially for parameters that are processed elsewhere like code or accuracy in collector bots). Also see #668, for a related discussion about the BOTS file.

Additionally, there is no way to explain what the parameter is, whether it is required or optional and what type the parameter is (needed for certtools/intelmq-manager#81). So ideally, you should be able to specify the following items for each parameter:

  • variable name
  • human readable name
  • description
  • required/optional
  • type (string, boolean, etc.)
  • default value (if appropriate)

Other requirements:

  • It should be possible to inherit common parameters (e.g. feed parameters (name, code, accuracy, provider), collector parameters (rate_limit), HTTP request parameters (http_username, http_password, http_proxy), logging defaults, etc.)
  • The information is specific to the implementation of the bot and the code relies on it, so it should be in a file close the the code and not in some huge configuration file (like the BOTS file currently)
  • Possibility to add a description of the bot so we can remove the BOTS file entirely.
  • (optional) Possibility to group parameters together by name (i.e. feed parameters, mail config parameters, etc.)

As far as I can tell, the BOTS file is only used by the intelmq-manager (and possibly as a reference guide of what to put in runtime.conf if you do not use the intelmq-manager).

My current suggestion is to put all of this information in a python file (either directly in the bot, in __init__.py or in a separate file in the bots module). This allows easy implementation of inheritance and I personally find it just as easy to write Python code as JSON files). The intelmq-manager would then read the information through the intemqctl (or a something similar) thus allowing to get a list of all of the currently installed bots and their configuration settings without having to list all of them in a single file.

@sykaeh
Copy link
Contributor Author

sykaeh commented Oct 27, 2016

I also just noticed that issue #644 is related.

Also related: #267, #552, #368

@sykaeh
Copy link
Contributor Author

sykaeh commented Oct 27, 2016

I'm not entirely sure what this would mean for the defaults.conf file as I do not quite understand how it relates to the runtime.conf and BOTS file.

@sebix sebix modified the milestones: v1.0.1 First Bugfix release, v1.1 Feature release Nov 3, 2016
@sebix
Copy link
Member

sebix commented Nov 8, 2016

defaults and runtime hold the same parameters. runtime overrides defaults.
BOTS is only for the manager, it's not used anywhere else.

@SYNchroACK
Copy link
Contributor

@sykaeh as we discuss, if you have chance, present us an example (proposal) to possible fix the issue that you raise. Indeed, is not an easy task but as I mentioned to you, the main principle is to provide to new users and new developers a quick way to start without need to learn about python.

@sykaeh
Copy link
Contributor Author

sykaeh commented Jan 4, 2017

I have pushed an initial, partial possible implementation to sykaeh/intelmq (replace-bot branch). Here I briefly describe the main changes:

1. Param (in intelmq/lib/bot.py)

This class is used to specify an individual parameter with the value name, a description, whether it is optional or required, what type it is and whether there is a default value. Optionally it is also possibly to assign this parameter to a group of parameters.

class Param(object):

    def __init__(self, name, description, required, validation_type,
                 group='specific', default=None):
        self.name = name
        self.description = description
        self.group = group
        self.required = required
        self.validation_type = validation_type
        self.default = default

    def validate(self, value):
        return self.validation_type.is_valid(value)

2. ParameterDefinitions (in intelmq/lib/bot.py)

This is a class to specify all of the parameters that a bot needs. I defined all of the default parameters (from defaults.conf) and some of the frequently used parameters in separate groups (such as http parameters, mail parameters, redis parameters). The constructor can then be used to easily define which parameters are used by listing which predefined groups are needed and then which additional parameters only specific to that bot.

3. Adjustments in all of the Bots

Using those two classes, I have added all of the information from the BOTS file (plus some additional info) in each of the bot files.

For example the MailURLCollectorBot:

class MailURLCollectorBot(CollectorBot):

    NAME = 'Generic Mail URL Fetcher'
    DESCRIPTION = 'Monitor IMAP mailboxes and fetch files from URLs contained in mail bodies'
    PARAMETERS = ParameterDefinitions('mail feed collector http', [
        Param('folder', 'The folder in which to search for messages', True, String),
        Param('subject_regex', 'The regular expression to search for in the subject', True, String),
        Param('from_address', 'The sender\'s e-mail address', False, String),
        Param('url_regex', 'The regular expression for the URL', True, String)
    ])

    def init(self):
        pass

This fulfills all of the requirements listed above and also allows for the automatic generation of documentation. It can also be used to automatically check whether a bot has all necessary parameters defined and if they are valid values.

As an example of how to iterate through all of the Bots and retrieve the information I created intelmq_gen_bots_file.py which generates the current BOTS file automatically.

Open issues/not yet done:

  • The parser bots have not yet been adjusted for the new format
  • There are a lot of descriptions for parameters missing and the defaults, descriptions, etc. should be checked
  • The class Parameters is defined in at least four different files (intelmqctl.py, bot.py, utils.py, test.py) and should probably be integrated into the new structure.
  • Automatic generation of documentation
  • Validation of parameters
  • Adjustments to IntelMQ-Manager

I would really appreciate your feedback on this approach before proceeding any further. As I have said, this is just an initial suggestion on how to solve the problem of the BOTS file.

@SYNchroACK
Copy link
Contributor

@sykaeh Awesome work! From a quick look, it seems really interesting and clean approach. ASAP I will give you a more detailed feedback to push this one further.

Thank you very much @sykaeh !!!

@dmth
Copy link
Contributor

dmth commented Jan 6, 2017

I like this approach, Thank you @sykaeh. I see "internationalisation" coming up here :- )

@sykaeh do you have time to advertise your work on the Mailing-List? This approach will be a mayor change and should be communicated properly.

@SYNchroACK
Copy link
Contributor

@dmth yeahhh! :D really awesome!

I think there is a need for review before advertise in ML.
I'm currently doing it and already have some minimal things that need to discuss with @sykaeh.

@dmth If you have some time, check the code and let us know your technical feedback. Will be important to also have the feedback from @sebix and @aaronkaplan.

....After that, @sykaeh ML is yours! 🥇

@dmth
Copy link
Contributor

dmth commented Jan 6, 2017

Without having looked into the code yet:
We should integrate the ML into the parts of the reviewing process. At least announce that the Idea is currently under review, but looking $YOUR_TENDENCY so far, for transparency reasons.

@ghost
Copy link

ghost commented Jan 10, 2017

Before going into detail two general questions:

  • Why did you base your class on object and not on dict?
  • Why do you use harmonization types intended for events for configuration purpose?

@sykaeh
Copy link
Contributor Author

sykaeh commented Jan 11, 2017

  • To be honest, I had just not thought of basing the class on dict since I have never seen that done before. What would be the advantages? (Also I am assuming you mean the Param class).
  • We want to be able to specify the type of the parameter and since we already had the types defined in harmonization I thought we could re-use them. Is there a reason not to do so?

@ghost ghost mentioned this issue Jan 11, 2017
@ghost
Copy link

ghost commented Jan 11, 2017

First: Thanks for your work! We had this in mind for some time now but yet noone hat time for it. Your proposal is very clean and sophisticated and was a lot of work!

To be honest, I had just not thought of basing the class on dict since I have never seen that done before.

We are doing it for the message class(es) too.

We want to be able to specify the type of the parameter and since we already had the types defined in harmonization I thought we could re-use them. Is there a reason not to do so?

Those aren't real types (at least currently). Python types are fine I think (str, int, float, list, enum etc).

I can work on this next week a show my proposal (based on yours).

@sebix sebix assigned ghost and unassigned sebix Jan 16, 2017
@ghost
Copy link

ghost commented Mar 29, 2017

Concerning the types of parameters: I think we can reuse the typing module too, this would provide proper definitions for Mappings and Sequences.

@ghost ghost modified the milestones: 1.1.0, 2.0.0 Jun 28, 2018
@ghost
Copy link

ghost commented Aug 6, 2018

This would allow adding help/manpages to all bots.

E.g. intelmq.bots.collecots.file.collector -h gives a list of parameters

@ghost ghost mentioned this issue Aug 7, 2018
@ghost ghost modified the milestones: 2.0.0, 2.1.0 Apr 10, 2019
@ghost ghost mentioned this issue Sep 10, 2019
@ghost ghost modified the milestones: 2.1.0, 2.2.0 Oct 25, 2019
@bernhardreiter
Copy link
Contributor

An important use case as I see it is:
Developers want one single place as source for the documentation to easily maintain it.

When looking at some of the MISP bots, I've again noticed that the location of documentation about a bot is a problem. So, to add to the importance of the issue, I'll document this as an example:

- misp_verify: true or false, check the validity of the certificate
still has a deprecated parameter, while the duplicated documentation at https://github.com/certtools/intelmq/blob/bb205e9855c474eb8d0642557336d4b038dcdf46/docs/Bots.md#misp-generic already has the new behaviour.

The problem here that there are two places for the information and it is a bit of random which is remembered by developers if they make a small change in the code. In this case the docstring was not updated, while Bots.md was.

(After noticing this, I've proposed a fix as part of #1486 )

@ghost ghost modified the milestones: 2.2.0, 2.3.0 Jun 17, 2020
@ghost ghost removed their assignment Jul 13, 2020
@ghost ghost removed this from the 2.3.0 milestone Feb 4, 2021
@ghost ghost closed this as completed in #1751 Mar 15, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants