Neo4j parametrized from existing SCollection #4719

Merged
RustedBones merged 5 commits into spotify:main from neo4j_read_with_params-20230223
Mar 1, 2023

Conversation

sumitsu
Contributor

@sumitsu sumitsu commented Feb 23, 2023

Expands the Scio Neo4j API to expose the ability to execute parameterized Cypher queries in parallel based upon inputs from an existing SCollection, as supported by the Beam Neo4jIO implementation.

In the original Scio-Neo4j PR (#4488), there appears to have been some discussion of a prospective general approach to sharding / parallel reading. This PR does not attempt to make sharding transparent to the user or to calculate shard sizes automatically; it merely exposes the functionality already present in Beam. The client is responsible for:

  1. determining how to map / group an existing SCollection into reasonably-scoped querying shards
  2. incorporating WHERE clause parameters which limit query scope into the Cypher String
  3. providing the function which maps input SCollection elements to values for those parameters
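The three responsibilities above can be sketched in plain Scala, without any Scio or Beam calls. This is only an illustration under stated assumptions: the `Movie` label, the `year` property, the `YearRange` case class, and the `ShardSketch` helpers are all hypothetical and are not part of this PR.

```scala
// Hypothetical shard descriptor: an inclusive range of release years.
// Its field names double as the Cypher parameter names.
final case class YearRange(fromYear: Int, toYear: Int)

object ShardSketch {
  // Step 2: a Cypher query whose WHERE clause limits scope via parameters.
  // The Movie label and year property are assumed for illustration.
  val cypher: String =
    """MATCH (m:Movie)
      |WHERE m.year >= $fromYear AND m.year <= $toYear
      |RETURN m.title AS title, m.year AS year""".stripMargin

  // Step 1: split [first, last] into shards of at most `width` years each.
  def shards(first: Int, last: Int, width: Int): List[YearRange] =
    (first to last by width)
      .map(start => YearRange(start, math.min(start + width - 1, last)))
      .toList

  // Step 3: map one shard element to the parameter values for the query.
  def toParameters(range: YearRange): Map[String, Any] =
    Map("fromYear" -> range.fromYear, "toYear" -> range.toYear)
}
```

In a pipeline, the list returned by `shards` would play the role of the input SCollection, with one query execution per element.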

@codecov

codecov bot commented Feb 24, 2023

Codecov Report

Merging #4719 (b7bbaed) into main (00533d3) will decrease coverage by 0.01%.
The diff coverage is 0.00%.

❗ Current head b7bbaed differs from pull request most recent head 0f64847. Consider uploading reports for the commit 0f64847 to get more accurate results

@@            Coverage Diff             @@
##             main    #4719      +/-   ##
==========================================
- Coverage   60.90%   60.89%   -0.01%     
==========================================
  Files         286      286              
  Lines       10484    10485       +1     
  Branches      761      761              
==========================================
  Hits         6385     6385              
- Misses       4099     4100       +1     
Impacted Files Coverage Δ
...rc/main/scala/com/spotify/scio/neo4j/Neo4jIO.scala 15.38% <0.00%> (ø)
.../spotify/scio/neo4j/syntax/SCollectionSyntax.scala 66.66% <0.00%> (-33.34%) ⬇️


@RustedBones
Contributor

Thanks a lot for the PR. I took the liberty of updating your branch directly to address some comments.

.withSessionConfig(neo4jConf.sessionConfig)
.withTransactionConfig(neo4jConf.transactionConfig)
.withCypher(cypher)
.withParametersFunction(neo4jInType.to(_).asMap())
Contributor Author

Ah nice! Yes, agree that this looks much better. 🙂

Just to confirm that my understanding is correct: it would be incumbent upon the user to write the Cypher query String such that the parameter names match the field names used by the (implicit) ValueType[T] implementation? (Of course, in most cases, it would probably be easiest to use a case class (with the existing CaseMapper-powered ValueType) and match the query to the case class field names.)

Contributor

Yes, that's correct.
The write API works the same way: if the Cypher query refers to a field that is not in the model (after the case mapper transform), there will be a runtime exception.
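Since a mismatch only surfaces at runtime, a cheap pre-flight check can help. The following is a minimal sketch in plain Scala (2.13 or later, for `productElementNames`); `ParamCheck` and `YearParam` are hypothetical names, not part of Scio. It assumes the parameter map is keyed by the case class field names unchanged, and the regex is deliberately naive: it also matches `$name` inside Cypher string literals and ignores backtick-quoted parameter names.

```scala
// Hypothetical parameter model; the field name must match $year in the query.
final case class YearParam(year: Int)

object ParamCheck {
  // Naive matcher for $name style Cypher parameters.
  private val ParamPattern = """\$([A-Za-z_][A-Za-z0-9_]*)""".r

  // All parameter names referenced by the query string.
  def cypherParams(cypher: String): Set[String] =
    ParamPattern.findAllMatchIn(cypher).map(_.group(1)).toSet

  // Parameters the query references that the case class does not provide.
  def missingFields(cypher: String, sample: Product): Set[String] =
    cypherParams(cypher) -- sample.productElementNames.toSet
}
```

For example, `ParamCheck.missingFields(query, YearParam(1994))` returns an empty set when every `$`-parameter in `query` has a matching field, so it could be asserted in a unit test before the pipeline runs.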

@RustedBones RustedBones changed the title from "Neo4j parallel-query from existing SCollection" to "Neo4j parametrized from existing SCollection" Mar 1, 2023
@RustedBones RustedBones merged commit e2b92c7 into spotify:main Mar 1, 2023
@sumitsu sumitsu deleted the neo4j_read_with_params-20230223 branch March 1, 2023 15:30
farzad-sedghi pushed a commit to farzad-sedghi/scio that referenced this pull request Mar 6, 2023
* Neo4j parallel-query from existing SCollection

* Apply review comments

* Simplify tests

* revise scaladoc to be consistent with refactor

* Remove useless interpolation

---------

Co-authored-by: Michel Davit <micheld@spotify.com>