Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v15 regression: temporary confusion with an inout (validating too hard) #1455

Closed
FlorianDeconinck opened this issue Nov 27, 2023 · 7 comments · Fixed by #1463
Closed

v15 regression: temporary confusion with an inout (validating too hard) #1455

FlorianDeconinck opened this issue Nov 27, 2023 · 7 comments · Fixed by #1463

Comments

@FlorianDeconinck
Copy link
Contributor

Previously working code doesn't pass validation anymore. It seems the code gets confused between a temporary it created and the original temporary field

E               dace.sdfg.validation.InvalidSDFGNodeError: Node validation failed: Inout connector lev is connected to different input ({'__tmp26'}) and output ({'__g_self__lev'}) arrays (at state call_frozen_set_interpolation_coefficients_683, node lagrangian_contributions_computation_140211407004672)
E               Invalid SDFG saved for inspection in /home/fgdeconi/work/git/pace/model_output/_dacegraphs/invalid.sdfg

Minimal reproducer is to run the Remapping numerical regression test on either cpu or gpu backend. Below a script to run on CPU (DaCe version is post #1446)

# Repo is to run the FiniteVolumeTransport regression test
# Original code: fv3core/pace/fv3core/stencils/fvtp2d.py
# DaCe is applied on the FiniteVolumeTransport.__call__ function
# The failing DaCe is in "pace/external/dace"

HOME=$PWD

# Get Pace repository
git clone git@github.com:GEOS-ESM/pace
cd pace
git checkout 911368
git submodule update --recursive --init

# Setup the venv
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install external/gt4py/
pip install external/dace/
pip install -r requirements_dev.txt -c constraints.txt
cd external/dace/
git checkout 7ea43c
cd $HOME/pace

# Download data
mkdir -p test_data
cd test_data
wget https://portal.nccs.nasa.gov/datashare/astg/smt/pace-regression-data/8.1.3_c12_6_ranks_standard.Remapping.tar.gz
tar -xzvf 8.1.3_c12_6_ranks_standard.Remapping.tar.gz
cd $HOME/pace

# Run test of Remapping
export FV3_DACEMODE=BuildAndRun
export PACE_CONSTANTS=GFS
pytest -v -s --data_path=./test_data/8.1.3/c12_6ranks_standard/dycore \
       --backend=dace:cpu --which_modules=Remapping --which_rank=0 \
       --threshold_overrides_file=./fv3core/tests/savepoint/translate/overrides/standard.yaml \
       ./fv3core/tests/savepoint

Attached is the invalid.sdfgz (renamed in .zip because github)

@FlorianDeconinck
Copy link
Contributor Author

Plot twists, if all validation is deactivated the code compiles, runs and validates. Seems like overvalidation. Changing the issue title.

@FlorianDeconinck FlorianDeconinck changed the title v15 regression: temporary confusion with an inout v15 regression: temporary confusion with an inout (validating too hard) Nov 27, 2023
@tbennun
Copy link
Collaborator

tbennun commented Nov 27, 2023

Looks like an input (tmp35) is not added to the nested SDFG as an input, but is used as one. So validation fails. @alexnick83 any ideas on why something would be output-only?

@alexnick83
Copy link
Contributor

alexnick83 commented Nov 28, 2023

I had a quick look at the SDFG and I cannot find the issue reported by the validation routine. The error concerns inout connector a4_1 being connected to different input (__tmp35) and output (__g_self__q4_1). As far as I can see, all a4_1 connectors are connected to __tmp35. What may explain the issue is the fact that, in the outermost nested SDFG, __tmp35 is a connector connected to __g_self__q4_1. I am not sure, though, why the validation would look at the outermost data container. It needs to be further investigated.

@tbennun __tmp35 is indeed output only because, even though it is read in the (outermost) nested SDFG, it is first written. Therefore, whatever value it holds before execution of the nested SDFG is irrelevant and, based on the original design of the nested SDFGs, it must only generate an output connector. Maybe this concept needs to be revisited.

@tbennun
Copy link
Collaborator

tbennun commented Nov 28, 2023

That makes sense. I propose we change validation, not behavior

As for validation, there are two methods (utils.get_global_memlet_path_{src,dst}) that traverse arrays upwards to the top level SDFG. Since __tmp35 is not an input connector, the _src version of it stops traversal. I think we should change those validation methods to prefer the connector if exists, but use the other one if it doesn't.

@tbennun
Copy link
Collaborator

tbennun commented Nov 28, 2023

This is the commit that added them: 4b14a73

@alexnick83
Copy link
Contributor

I see the problem. I haven't considered that case. I will make a PR.

@alexnick83
Copy link
Contributor

PR is up. I tested the SDFG and it passes validation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants