Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DependencyAnalysis misses dependency on non-exhaustive write-write #2727

Open
sergisiso opened this issue Oct 1, 2024 · 3 comments
Open

Comments

@sergisiso
Copy link
Collaborator

@hiker This is a simplified version of an issue we hit on NEMOv5, here my_val sometimes keeps its value for the next iteration, so it is essentially serialising the loop. Could we do something about this?

def test_new(fortran_reader):
    '''Tests can_loop_be_parallelised.'''
    source = '''program test
                integer :: i, my_val
                integer, dimension(10) :: array

                do i = 1, 10
                  if (array(i) > 3) then
                    my_val = array(i)
                    array(i) = my_val
                  else
                    array(i) = my_val
                  endif
                end do

                end program test'''

    psyir = fortran_reader.psyir_from_source(source)
    loop = psyir.children[0].children[0]
    dep_tools = DependencyTools()
    parallel = dep_tools.can_loop_be_parallelised(loop)
    assert parallel is False
@hiker
Copy link
Collaborator

hiker commented Oct 1, 2024

I don't see an easy way :( I tried to rewrite it as something like:

do i=1, 10
    if (array(i)<=3) then
      inner: do j= i, 1, -1
          if (array(j)>3) then
              array(i) = array(j)
              break  inner
          endif
     enddo inner    ! not sure what to do if there is no element >3 :)
enddo

I believe that should roughly be the same logic - for each element <=3 find the previous element >3

But I assume the dependency analysis will still see read and write accesses to the same index in different iterations (i and j), and there actually is a race condition (consider 4, 1, 1, in array... when the loop index 2 and 3 ... the 1s ... are executed in parallel, loop index 3 might read the value from loop 2, while loop 2 is writing ... ) except of course it doesn't matter, since the same value will be written in loop index 3 (the 4, either from array index 2 or array index 1).

But I think this rewrite should allow you to execute this in parallel (by disabling the DA). And of course, handling the problem what happens if the first element is <=3 ... my_val would be uninitialised, and the search loop won't find a value.

@sergisiso
Copy link
Collaborator Author

sergisiso commented Oct 1, 2024

I would like to try to identify the dependency in the original code as this leads to compliable but code with different semantics (psyclone adds a private(my_var) since it finds the write-write), and it's very difficult to find out where the difference in results come from.

What make it hard? Is it the fact that it is a "scalar"? If I do something like:

    def test_new2(fortran_reader):
        '''Tests can_loop_be_parallelised.'''
        source = '''program test
                    integer :: i
                    integer, dimension(10) :: array, my_val
                    do i = 1, 10
                        if (array(i) > 3) then
                          my_val(1) = 1
                          array(i) = my_val(1)
                        else
                          array(i) = my_val(1)
                        endif
                    end do
                    end program test'''

        psyir = fortran_reader.psyir_from_source(source)
        loop = psyir.children[0].children[0]
        dep_tools = DependencyTools()

        parallel = dep_tools.can_loop_be_parallelised(loop)
        assert parallel is False

It properly detects the write-write dependency on my_val in the outer loop

@sergisiso
Copy link
Collaborator Author

sergisiso commented Oct 1, 2024

In both cases, if in both condition bodies there was a my_val( = write (or my_val(1) =), they would be parallelisable with the [first]private(my_val).

Ignoring the scalar version for now, the problem seems to be that the write-write is not exhaustive to all branches, and therefore some are carried to the next iteration. Would it be possible to differentiate between:

  • exhaustive write-write -> can be parallelised with a [first]private
  • non-exhaustive write-write -> cannot be parallelised with a [first]private

@sergisiso sergisiso changed the title DependencyAnalysis misses dependency on scalar DependencyAnalysis misses dependency on non-exhaustive write-write Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants