Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cleaner] Separate process of preparing obfuscations #3262

Merged
merged 6 commits into from
Jun 22, 2023

Conversation

TurboTurtle
Copy link
Member

This patchset is the first that should set us on a path to providing multi-process --clean support in a way that is not fragile or particularly disadvantageous for single-archive cleaning.

In order to facilitate multi-process down the road, we need to separate some tight bindings between parsers and mappings that exist today. The first of which, is how the mappings get prepared from archives before we do the actual obfuscation work.

This set introduces the use of SoSPreppers, which will be used to pull select data out of archives and use it to prepare the parsers and mappings. This was previously done via prep_files dicts for archive abstractions and via additional methods in parsers and/or mappings to handle specific data from specific prep_file contents.

By separating this out, we gain flexibility to allow the individual Prepper entities to grok data as needed, rather than always relying on the existing regex parsing or special-case handling in mappings. This also allows us to provide a better framework for specific use cases in the future - e.g. we could create cluster-specific Preppers that ensure that we obfuscate cluster hosts together, so that their obfuscation numbering is sequential.

The next steps will include moving compiled regex handling directly to parsers, and not binding the mappings directly to the parsers.

Related: RH: SUPDEV-135


Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname email@example.com?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?

@packit-as-a-service
Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3262
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

Up until now, the archive abstractions have defined what files `sos
clean` will use to prepare the mappings for obfuscation before entering
the normal obfuscation loop over every file in every archive.

While this is straight forward enough, it is not particularly flexible,
and prevents us from easily using other approaches for preparing the
mappings beyond what is directly obtained via the parsers (which in some
cases need special handling to be prepared at all).

Change this by introducing `SoSPrepper`s which will be used to determine
now only what files to pass to which parsers on an archive-by-archive
basis, but will also allow for manually retrieving items from disaparate
sources within the archive(s) and handing those directly to the
mappings, without the need for those items to first pass the parser
check.

Related: RH: SUPDEV-135

Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
Adds a new prepper for IP network address sourcing.

Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
Adds a new Prepper for handling hostname determination for preparing the
mapping and parser.

As part of this new prepper, pass the CLI options to each prepper for use.

Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
Adds a new Prepper to handle feeding relevant files to the mac parser
for initial preparation of the mappings.

Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
Adds a new Prepper for usernames, and removes the bits from the parser
and mapping that otherwise handled the initial preparation of the
mapping. The prepper will source from the same initial files, as well as
from the `--usernames` command line option.

Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
Adds a new Prepper to handle keyword preparation. This is slightly
inefficient since we will only realistically need this once, but baking
it into the archive loop does not pose any other problems, and it would
be more fragile to break out a special flow just for keywords.

Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
@TurboTurtle TurboTurtle added the RH QE RH QE has been requested to review label Jun 2, 2023
Comment on lines +27 to +32
Preppers may specify their own priority in order to influence the order in
which mappings are prepped. Further, Preppers have two ways to prepare
the maps - either by generating a list of filenames or via directly pulling
content out of select files without the assistance of a parser. A lower
priority value means the prepper should run sooner than those with higher
values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick comment: swap 2nd and 3rd sentence (as priority levels follows the 1st sentence, while "two ways to prepare the maps" is followe by the next paragraph.

@pmoravec
Copy link
Contributor

Nice separation (and also unification of "prepping" methods like s/load_hostname_into_map/mapping.add).

@pmoravec
Copy link
Contributor

Another similar nitpick: reviewing the patch set and reading comments of methods here and there, I spot two typos in https://github.com/sosreport/sos/blob/main/sos/cleaner/__init__.py#L710-L713 - could they be added to the patchset or shall I fix them directly?

("Obfuscate and individual file" -> "Obfuscate an individual file", "written to a temp file without our own tmpdir" -> ".. within our own tmpdir")

Comment on lines +302 to +303
def test_mapping_method_translation(self):
self.assertEqual([], self.prepper.get_items_for_map('foobar', None))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment like to test_parser_method_translation.

@pmoravec
Copy link
Contributor

pmoravec commented Jun 12, 2023

Just the few minor issues / nitpicks found. But some testing is blocked by #3271 (well, until one uninstalls magic library :) ).

@TurboTurtle
Copy link
Member Author

Another similar nitpick: reviewing the patch set and reading comments of methods here and there, I spot two typos in https://github.com/sosreport/sos/blob/main/sos/cleaner/__init__.py#L710-L713 - could they be added to the patchset or shall I fix them directly?

("Obfuscate and individual file" -> "Obfuscate an individual file", "written to a temp file without our own tmpdir" -> ".. within our own tmpdir")

Since that is from an earlier commit, let's tackle that later. I have a strong feeling that any work that follows this on the path to multi-process handling of files will re-write that method anyways, so we can address it then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RH QE RH QE has been requested to review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants