Big Query Anonymization Test Tool

PUBLIC VERSION: Testing solution for BQ GDPR anonymization use case.

📝 Table of Contents

About
Getting Started
Running the tests
Authors
Acknowledgments

🧐 About

IMPORTANT: This is a public version of the project. Feature files and SQL templates were anonymized, and an API connection to BigQuery is not possible. The rest of the codebase is intact.

This project implements a testing solution, built on the python-behave framework, that tests whether ID fields in the tables of BQ datasets were anonymized successfully.

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You need the following to install and run the software:

- Python 3.6+ with these external packages:
  - behave
  - allure-behave
  - pandas
  - openpyxl
  - tqdm
  - pyhamcrest
  - google
  - google-cloud-bigquery
  - protobuf
- Linux (Ubuntu) or Windows 10
- the allure reporting tool
  - on Win10, install it using scoop
  - on Ubuntu/Linux, install it using linuxbrew
- access to the tested BQ data project
- access to the BQ API, set up with the proper roles
- access to this repository

Get familiar with the documentation of the external tools used, so that you really understand what is going on.

The google and protobuf packages had to be added to the setup.py file to ensure that the BQ API library package works properly.

Installing

- Install Python (refer to the documentation for how to do that on your OS).
- Open your command line tool of choice and go to the directory where you want to clone the project from GitHub.
- Clone this repo.
- Run "python3 setup.py install" on Ubuntu, or "py setup.py install" on Win10. On Win10, the "pandas" package will not be installed; you have to install it manually. See the comment in the setup.py file for the link. Download the package and run "pip install [path to package]/packagefile".
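After installing, a quick check that the required packages can be imported may save a failed first test run. The sketch below is not part of the repository: the helper name `missing_modules` is hypothetical, and the import names are assumed to correspond to the packages listed in the prerequisites (pyhamcrest is imported as `hamcrest`, google-cloud-bigquery as `google.cloud.bigquery`, protobuf as `google.protobuf`). The "google" package is left out because its import name is not stated here.

```python
import importlib.util

# Import names assumed to correspond to the packages listed in Prerequisites.
REQUIRED_MODULES = [
    "behave",
    "allure_behave",
    "pandas",
    "openpyxl",
    "tqdm",
    "hamcrest",               # installed by the pyhamcrest package
    "google.cloud.bigquery",  # installed by google-cloud-bigquery
    "google.protobuf",        # installed by protobuf
]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has no spec.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(absent) if absent else "none")
```

If the script reports only pandas as missing on Win10, that matches the manual installation step described above.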

🔧 Running the tests

- In the console, go to the root folder of the project.
- Run "behave -f allure_behave.formatter:AllureFormatter -f pretty -o allure-results ./test/features" on Ubuntu, or "behave -f allure_behave.formatter:AllureFormatter -f pretty -o allure-results .\test\features" on Win10.
- Wait until the tests are finished.
- Failed tests have their BQ data saved in an XLSX file with a timestamped name in the ./reports folder.
- You can also display an interactive HTML report. To do this, run the "allure serve" command in your console, and the report will open in your default browser, which should be Firefox or Chrome.
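Because the only difference between the two commands is the path separator, the call can also be assembled in Python so that it works on both systems. This is a minimal sketch and not the project's own code: the function names `build_behave_command` and `run_behave` are hypothetical, and the formatter and output arguments are copied from the command above.

```python
import subprocess
from pathlib import Path


def build_behave_command(features_dir=Path("test") / "features",
                         results_dir="allure-results"):
    """Build the behave command line as a list of arguments."""
    return [
        "behave",
        "-f", "allure_behave.formatter:AllureFormatter",
        "-f", "pretty",
        "-o", results_dir,
        str(features_dir),  # pathlib renders the separator of the current OS
    ]


def run_behave():
    """Run behave from the current directory and return its exit code."""
    return subprocess.run(build_behave_command()).returncode
```

Calling `run_behave()` from the root folder of the project should behave like typing the command by hand, assuming behave is available on the PATH.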

Pseudo-random feature file test running

With a few exceptions, all datasets are divided into 5 feature files. You can run them all as described above or, if needed, run a single pseudo-randomly selected feature file.

To do that, run "python3 manage.py -r" (or "py manage.py -r" on Windows) in the console.

This picks one of the tags stored in the list in the "functions.py" file and then runs the behave test framework as usual, but only the feature file carrying that tag is actually run.

You can repeat this as long as some tags have not been picked yet. Once all tags are "exhausted", a ValueError exception is caught and you have to clear the "config.json" file manually.

To do that, use the utility "py manage.py -c".

You can also pass both parameters at once, so that the next time the pseudo-random function can choose from the full set of tags again. In this case, run "py manage.py -r -c".
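The pick-and-remember logic described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from "functions.py": the tag names, the `used_tags` key, and the function names `pick_tag` and `clear_used` are all hypothetical, and the real layout of "config.json" may differ.

```python
import json
import random
from pathlib import Path

# Hypothetical tags; the real list lives in functions.py.
ALL_TAGS = ["@set1", "@set2", "@set3", "@set4", "@set5"]


def load_used(config_path):
    """Read the already used tags, treating a missing or empty file as none."""
    path = Path(config_path)
    if not path.exists():
        return []
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return []
    return json.loads(text).get("used_tags", [])


def pick_tag(config_path, all_tags=ALL_TAGS):
    """Pick a tag that was not used yet and record it in the config file."""
    used = load_used(config_path)
    remaining = [tag for tag in all_tags if tag not in used]
    if not remaining:
        # Mirrors the exhausted state described in the text.
        raise ValueError("all tags were already picked; clear the config file")
    tag = random.choice(remaining)
    Path(config_path).write_text(
        json.dumps({"used_tags": used + [tag]}), encoding="utf-8")
    return tag


def clear_used(config_path):
    """Forget all used tags so the full set is available again."""
    Path(config_path).write_text(
        json.dumps({"used_tags": []}), encoding="utf-8")
```

Persisting the used tags to a file is what lets the selection survive between separate runs of the utility; keeping them only in memory would reset the choice on every call.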

Manage.py utility

The console command for running behave together with the allure reporting tool can be quite long. To make this work easier and faster, you can use the manage.py utility, which covers these scenarios:

- py manage.py -r runs one randomly picked feature file from all tagged feature files. This feature file will not be run again until config.json is cleared.
- py manage.py -c clears the config.json file, which stores the tags of the feature files that were already randomly run.
- py manage.py -b runs all feature files, like the command "behave -f allure_behave.formatter:AllureFormatter -f pretty -o allure-results .\test\features" would.
- py manage.py -t "@tag1" -t "@tag2" etc. runs all feature files, or just some of their scenarios, tagged with the provided tags. Make sure to wrap each tag in double quotes.
- py manage.py -h is always available by default and displays all available commands with short descriptions.
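A command line interface with these switches could be declared with argparse along the following lines. This is a sketch under assumptions, not the actual manage.py: the attribute names (`random`, `clear`, `behave`, `tags`) and the help texts are hypothetical, and only the flag letters are taken from the list above.

```python
import argparse


def build_parser():
    """Declare the -r, -c, -b and -t switches; -h is added by argparse."""
    parser = argparse.ArgumentParser(
        prog="manage.py",
        description="Shortcuts for running behave with allure reporting.",
    )
    parser.add_argument("-r", dest="random", action="store_true",
                        help="run one randomly picked tagged feature file")
    parser.add_argument("-c", dest="clear", action="store_true",
                        help="clear the tags stored in config.json")
    parser.add_argument("-b", dest="behave", action="store_true",
                        help="run all feature files")
    # action="append" collects every repeated -t into one list.
    parser.add_argument("-t", dest="tags", action="append", default=[],
                        metavar="TAG", help="run only this tag; repeatable")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this declaration, `-t "@tag1" -t "@tag2"` yields the list `["@tag1", "@tag2"]` in `args.tags`, and `-r -c` sets both boolean switches in a single call.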

✍️ Authors

@bednaJedna - Idea & work

