Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more checks in Travis CI #120

Open
1 of 4 tasks
say-yawn opened this issue Feb 12, 2020 · 16 comments · Fixed by #154
Open
1 of 4 tasks

Add more checks in Travis CI #120

say-yawn opened this issue Feb 12, 2020 · 16 comments · Fixed by #154
Assignees
Labels
good first bug travis-ci Bugs and features for Tracis CI

Comments

@say-yawn
Copy link
Contributor

say-yawn commented Feb 12, 2020

About this Issue

To make the PR tests be more robust and comprehensive we should add additional checks/tests in Travis CI to check for the following:

Acceptance Criteria

  • Check the diff between shavar-prod-lists and disconnect-tracking-protection and have Travis CI fail if it ever finds domains in our list that aren't in Disconnect's
  • Check the list files updated (run shavar-list-creation) from the changes in the shavar-prod-lists, see Get diff between branches to indicate list changes #171 for more details
  • Check that the size of the files updated does not exceed the max file size (4Mb), see Add File Size Checks from the Latest Changes  #170 for more details.
  • Remove the "tag" counts in the json_verify.py test as it is an outdated format
@dtlight
Copy link
Contributor

dtlight commented Mar 5, 2020

hello @skim1102, I'd love to work on this! May I?

@say-yawn
Copy link
Contributor Author

say-yawn commented Mar 5, 2020

@dave-light yes! Please let me know if you have any questions.

@dtlight
Copy link
Contributor

dtlight commented Mar 16, 2020

Hi @skim1102, the 4th acceptance criteria has been satisfied.

I'd just like to clarify that in the 1st acceptance criteria, where you mention disconnect-tracking-protection, are you referring to disconnect-blacklist.json (are we talking about disconnect-entitylist.json too)?
I'm considering adding another argument in to the run() method in json_verify.py so it takes disconnect-blacklist.json and compares it to the shavar-prod-lists. I'd love to know your thoughts on whether or not I'm going about this the right way.

@say-yawn
Copy link
Contributor Author

say-yawn commented Mar 17, 2020

@dave-light I linked a work-in-progress (WIP) PR in shavar-list-creation repo that will do most of what the first acceptance criteria:

Check the diff between shavar-prod-lists and disconnect-tracking-protection and have Travis CI fail if it ever finds domains in our list that aren't in Disconnect's

is asking for. You are welcome to take this code and refactor to suit the need of the first criterion. Does this answer your question?

@dtlight
Copy link
Contributor

dtlight commented Mar 17, 2020

@skim1102 it does indeed. Thanks 😃

@dtlight
Copy link
Contributor

dtlight commented Mar 18, 2020

Hi @skim1102, this is more of a sanity check and to ensure I've understood your code correctly.

Looking at category_domains.py, I see on line 44 a txt file called domain-diff-category is created. I also see a comment printed to the txt (such as line 53) beginning with exclamation marks.

Is it ok for me to assume that all comments like this will begin with an exclamation mark? The reason I ask is that anything else in the txt should only be there because a difference has been detected and so I can cause Travis CI to fail if it finds this.

Please correct me if I've stated something that isn't quite right, or doesn't make sense.

@say-yawn
Copy link
Contributor Author

☝️ #154 fulfills the a/c:

Remove the "tag" counts in the json_verify.py test as it is an outdated format

@say-yawn say-yawn reopened this Mar 19, 2020
@say-yawn
Copy link
Contributor Author

@dave-light, rather than using the .txt file as is to check for the differences, I suggest using the code to check for the differences between the Shavar and Disconnect lists and assert that there are no differences. And if there is a difference between the two, the assertion should fail and indicate what the differences are as it was done here and here for the block lists. Same should be applied to the entity lists.

Did you also look at where you will implement this? To follow the infrastructure seen in this Travis build, I suggest moving this script to scripts folder in shavar-prod-lists. Wdyt?

@dtlight
Copy link
Contributor

dtlight commented Mar 20, 2020

@skim1102 I was a little unsure about where to implement it. But I agree with moving category_domains.py to scripts in shavar-prod-lists.

I'm happy to add the code to shavar-prod-lists in my next commit, please let me know if you'd rather move it across yourself.

@dtlight
Copy link
Contributor

dtlight commented Mar 23, 2020

@skim1102 I noticed a handful of unused variables in category_domains.py (e.g. content_domains, fingerprinting_domains, sorted_data), shall I tidy the code up by removing them? Or are they supposed to be used somewhere?

@dtlight
Copy link
Contributor

dtlight commented Mar 26, 2020

@skim1102 category_domains requires imports from list2safebrowsing, it also requires disconnect_mapping.json, both of which are in shavar-list-creation. Initially, I thought I had figured out a way to 'import' or at least reference them by reading https://stackoverflow.com/questions/9357442/github-linking-with-other-repos but I'm a little reserved about following any of the suggestions there. Do you have any suggestions? I don't want to duplicate the required files but would it be feasible to move them to shavar-prod-lists?

@shreyagupta30
Copy link

@dave-light @skim1102 will it be okay if I take up one of the to-do of this issue?
I am interested to take "Check the diff between shavar-prod-lists and disconnect-tracking-protection and have Travis CI fail if it ever finds domains in our list that aren't in Disconnect's"

@dtlight
Copy link
Contributor

dtlight commented Mar 29, 2020

@dave-light @skim1102 will it be okay if I take up one of the to-do of this issue?
I am interested to take "Check the diff between shavar-prod-lists and disconnect-tracking-protection and have Travis CI fail if it ever finds domains in our list that aren't in Disconnect's"

Hey @shreyagupta30, thanks for your interest. I've been writing code to address that piece of acceptance criteria. I have a few questions regarding my approach, which I'm waiting for @skim1102 to get back to me on.

@shreyagupta30
Copy link

All right, thank you so much for letting me know. :)

@svensevenslow
Copy link

@skim1102 This task was listed as internship task in the outreachy project description. Will this still be a part of the outreachy project or not? Some of the subtasks seem to be done

@say-yawn
Copy link
Contributor Author

say-yawn commented Apr 3, 2020

@svensevenslow yes there are additional Travis CI check work needed and will be reasonable to assume that it will still be part of the Outreachy project. I have created #170 and #171 to indicate those work will be additional work that need to be done.

@say-yawn say-yawn added the travis-ci Bugs and features for Tracis CI label Jul 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first bug travis-ci Bugs and features for Tracis CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants