Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIU-402: Improving regression testing in anticipation of schema refactor #278

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

myxie
Copy link
Collaborator

@myxie myxie commented Aug 27, 2024

Note: Merge after #280. This features some of those changes, which is why this branch is targeting LIU-404. Update this branch to merge to master once #280 has been merged.

Issue

Moving towards the new JSON schema, we want to have a comprehensive coverage of the existing logical and physical graph translator code, so we comfortably make changes without fear of introducing regressions that go unnoticed (See https://icrar.atlassian.net/browse/LIU-402 for more information).

There are also some frustrating examples of updating class-variables using proxy local variables in the translator code, which would benefit from being removed as they make the code much harder to read and navigate.

Solution

The following has been achieved in this PR:

  • Setup more tests for LG/LGNodes
  • Provide example refactor in the form of a non-recursive lgn_to_pgn
  • Remove unnecessary local-class variables from PGT/Scheduler code that makes navigating it much harder.

The refactor example demonstrates the value of the new test cases.

Summary by Sourcery

Set up regression testing for the schema refactor by adding extensive tests for logical and physical graph translators. Introduce a non-recursive implementation of the lgn_to_pgn method and refactor code to improve readability by removing unnecessary variables and using a new utility module for file path handling.

New Features:

  • Introduce a non-recursive implementation of the lgn_to_pgn method in the logical graph translator.

Enhancements:

  • Remove unnecessary local-class variables from the PGT/Scheduler code to improve code readability and navigation.
  • Refactor the code to use a new utility module path_utils for handling file paths, replacing previous inline implementations.

Tests:

  • Add comprehensive regression tests for logical and physical graph translator code to ensure backward compatibility and prevent unnoticed regressions.
  • Implement tests for the non-recursive lgn_to_pgn method to demonstrate its functionality and validate its output.

- Setup more tests for LG/LGNodes
- Provide example refactor in the form of a non-recursive lgn_to_pgn
- Remove unnecessary local-class variables from PGT/Scheduler code that makes navigating it much harder.
@myxie myxie changed the base branch from master to LIU-404 August 30, 2024 07:59
@myxie myxie marked this pull request as ready for review August 30, 2024 08:06
Copy link
Contributor

sourcery-ai bot commented Aug 30, 2024

Reviewer's Guide by Sourcery

This pull request implements initial work on setting up regression testing for a schema refactor in the DALiuGE project. The changes focus on improving test coverage for logical and physical graph translator code, refactoring some parts of the code, and removing unnecessary local-class variables to enhance code readability.

File-Level Changes

Change Details Files
Added new test cases for LG/LGNodes
  • Created TestLGInit class to test LG object construction
  • Implemented TestLGNToPGN class to verify LGN to PGN conversion
  • Added TestLGUnroll class to test LG unrolling process
  • Introduced TestLGNodeLoading class to test SubGraph data node storage
daliuge-translator/test/dropmake/test_lg.py
Refactored lgn_to_pgn method to include a non-recursive option
  • Added 'recursive' parameter to lgn_to_pgn method
  • Implemented non-recursive logic for traversing child nodes
  • Updated related code to support both recursive and non-recursive approaches
daliuge-translator/dlg/dropmake/lg.py
Removed unnecessary local-class variables from PGT/Scheduler code
  • Eliminated redundant variable assignments
  • Simplified code by directly using class attributes instead of local variables
  • Removed unused variables and imports
daliuge-translator/dlg/dropmake/scheduler.py
daliuge-translator/dlg/dropmake/pgt.py
Improved code organization and modularity
  • Created new path_utils.py file for handling file paths
  • Moved test-related utility functions to separate files
  • Refactored import statements to use new utility modules
daliuge-translator/dlg/dropmake/path_utils.py
daliuge-translator/test/dropmake/test_pg_gen.py
Enhanced error handling and type safety
  • Added type hints and docstrings to improve code clarity
  • Implemented more robust error checking in various functions
  • Updated exception handling to provide more informative error messages
daliuge-translator/dlg/dropmake/lg_node.py
daliuge-translator/dlg/dropmake/dm_utils.py

Tips
  • Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
  • Continue your discussion with Sourcery by replying directly to review comments.
  • You can change your review settings at any time by accessing your dashboard:
    • Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
    • Change the review language;
  • You can always contact us if you have any questions or feedback.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @myxie - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 3 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

# print("N (makespan) is ", N, "M is ", M)
ma = np.zeros((M, N), dtype=int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Consider impact on readability of using self._max_dop directly

While using self._max_dop directly is more consistent, consider if a local variable M = self._max_dop might improve readability in this method.

else:
for child in lgn.children:
# Approach next 'set' of children
c_copy = copy.deepcopy(child)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (performance): Consider performance impact of using deepcopy

Using copy.deepcopy can be expensive for large objects. Consider if a shallow copy would suffice, or if this operation is necessary at all.

c_copy = copy.copy(child)
if hasattr(child, '__dict__'):
    c_copy.__dict__ = copy.copy(child.__dict__)

Comment on lines -306 to -307
lm = self._oid_gid_map
lm2 = self._gid_island_id_map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Consider keeping descriptive variable names

While removing these variables simplifies the code, the descriptive names 'lm' and 'lm2' might have aided in understanding. Consider adding a comment explaining what these maps represent.

# Map of object IDs to group IDs
oid_gid_map = self._oid_gid_map
# Map of group IDs to island IDs
gid_island_id_map = self._gid_island_id_map
# when #partitions < #nodes the oid_gid_map values are spread around range(#nodes)
# which leads to index out of range errors (TODO: find how _oid_gid_map is

daliuge-translator/dlg/dropmake/lg.py Show resolved Hide resolved
Comment on lines +50 to +56
for lg_name, num_keys in self.lg_names.items():
fp = path_utils.get_lg_fpath("logical_graphs", lg_name)
lg = LG(fp, ssid=TEST_SSID)
self.assertEqual(num_keys,
len(lg._done_dict.keys()),
f"Incorrect number of elements when constructing LG "
f"object using: {lg_name}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Avoid loops in tests. (no-loop-in-tests)

ExplanationAvoid complex code, like loops, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:

  • loops
  • conditionals

Some ways to fix this:

  • Use parametrized tests to get rid of the loop.
  • Move the complex logic into helpers.
  • Move the complex part into pytest fixtures.

Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / Don't Put Logic in Tests

Comment on lines +1021 to 1026
sk = "".join([str(int(round(xi))) for xi in x[0: self._topk]])
stuff = self._sspace_dict.get(sk, None)
if stuff is None:
if not stuff:
G = self._lite_dag.copy()
stuff = self._partition_G(G, x)
self._sspace_dict[sk] = stuff[0:2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Replace a[0:x] with a[:x] and a[x:len(a)] with a[x:] [×2] (remove-redundant-slice-index)

@@ -1046,11 +1033,11 @@
indices of x is identical to the indices in G.edges().sort(key='weight')
"""
# first check if the solution is already available in the search space
sk = "".join([str(int(round(xi))) for xi in x[0 : self._topk]])
sk = "".join([str(int(round(xi))) for xi in x[0: self._topk]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): We've found these issues:

@@ -1264,7 +1251,7 @@

@staticmethod
def build_dag_from_drops(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): We've found these issues:


Explanation

The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

  • Reduce the function length by extracting pieces of functionality out into
    their own functions. This is the most important thing you can do - ideally a
    function should be less than 10 lines.
  • Reduce nesting, perhaps by introducing guard clauses to return early.
  • Ensure that variables are tightly scoped, so that code using related concepts
    sits together within the function rather than being scattered.

iterate through them and get the number of Physical Graph Nodes for every one of
the Logical Graph nodes.
"""
return sum([len(drop_list) for drop_list in drop_values])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Replace unneeded comprehension with generator (comprehension-to-generator)

Suggested change
return sum([len(drop_list) for drop_list in drop_values])
return sum(len(drop_list) for drop_list in drop_values)

for lgn in lg_non_recursive._start_list:
lg_non_recursive.lgn_to_pgn(lgn, recursive=False)

expected_test_loop_drops = 11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Extract duplicate code into method (extract-duplicate-method)

@myxie myxie changed the title LIU-402: Initial work setting regression testing for schema refactor LIU-402: Improving regression testing in anticipation of schema refactor Aug 30, 2024
Base automatically changed from LIU-404 to master September 25, 2024 04:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant