Add simple rule indexing implementation #319

tsandall · 2017-05-09T20:06:21Z

These changes add a simple rule indexing implementation to OPA. In the future, we can add more sophisticated strategies and perform analysis on queries to determine which indices ought to be used.

The first indexing strategy supports expressions of the form eq(ref, term) (or vice versa), where ref is a ground, non-nested reference to a base document (input or data) and term is a scalar, a non-nested array, or a var. The implementation builds a trie from the expressions in rule sets that match this pattern; the trie shrinks the search space at evaluation time.
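The pattern check can be sketched in Go. The `Term` type, the `Kind` constants, and the `Indexable` function below are hypothetical stand-ins, not OPA's actual AST API. The sketch assumes that every path element after the `input`/`data` head must be a scalar (a var would make the ref non-ground, and a composite would make it nested), and that a "non-nested array" is one whose elements are all scalars or vars.

```go
package main

import "fmt"

// Kind distinguishes the term shapes this sketch cares about (hypothetical).
type Kind int

const (
	KindScalar Kind = iota // string, number, boolean, or null literal
	KindVar
	KindArray
	KindRef
)

// Term is a simplified, hypothetical stand-in for an AST term.
type Term struct {
	Kind  Kind
	Value string // literal text for scalars, name for vars
	Elems []Term // elements for arrays; head followed by path for refs
}

func scalar(v string) Term      { return Term{Kind: KindScalar, Value: v} }
func variable(name string) Term { return Term{Kind: KindVar, Value: name} }
func array(elems ...Term) Term  { return Term{Kind: KindArray, Elems: elems} }
func ref(head string, path ...Term) Term {
	return Term{Kind: KindRef, Elems: append([]Term{variable(head)}, path...)}
}

// isBaseDocRef reports whether t is a ground, non-nested ref rooted at input or data.
func isBaseDocRef(t Term) bool {
	if t.Kind != KindRef || len(t.Elems) == 0 {
		return false
	}
	head := t.Elems[0]
	if head.Kind != KindVar || (head.Value != "input" && head.Value != "data") {
		return false
	}
	for _, e := range t.Elems[1:] {
		if e.Kind != KindScalar { // var => not ground; array/ref => nested
			return false
		}
	}
	return true
}

// isIndexableValue reports whether t is a scalar, a var, or an array of scalars/vars.
func isIndexableValue(t Term) bool {
	switch t.Kind {
	case KindScalar, KindVar:
		return true
	case KindArray:
		for _, e := range t.Elems {
			if e.Kind != KindScalar && e.Kind != KindVar {
				return false
			}
		}
		return true
	}
	return false
}

// Indexable checks eq(a, b) in both operand orders and returns the ref and value sides.
func Indexable(a, b Term) (refTerm, value Term, ok bool) {
	if isBaseDocRef(a) && isIndexableValue(b) {
		return a, b, true
	}
	if isBaseDocRef(b) && isIndexableValue(a) {
		return b, a, true
	}
	return Term{}, Term{}, false
}

func main() {
	cases := []struct {
		expr string
		a, b Term
	}{
		{`input.method = "GET"`, ref("input", scalar("method")), scalar("GET")},
		{`"GET" = input.method`, scalar("GET"), ref("input", scalar("method"))},
		{`input.user = x`, ref("input", scalar("user")), variable("x")},
		{`data.roles = ["a", "b"]`, ref("data", scalar("roles")), array(scalar("a"), scalar("b"))},
		{`input.path = [["a"]]`, ref("input", scalar("path")), array(array(scalar("a")))},
		{`input[x] = "GET"`, ref("input", variable("x")), scalar("GET")},
		{`foo.bar = "GET"`, ref("foo", scalar("bar")), scalar("GET")},
	}
	for _, c := range cases {
		_, _, ok := Indexable(c.a, c.b)
		fmt.Printf("%-24s indexable=%v\n", c.expr, ok)
	}
}
```

The first four expressions qualify; the nested array, the non-ground `input[x]`, and the ref rooted at a local var `foo` do not.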

Previously, the query compiler checked whether input was defined in order to catch two cases: 1) Input was required by the query or its transitive dependencies, i.e., the input document was referenced at all. The motivation was to produce "correct" results when negation is used; in practice, this never proved very helpful. 2) The input document was specified multiple times, causing a conflict. Again, in practice, this never proved very helpful. In both cases, if the policy decision is *incorrect*, someone has to look at (i) the input, (ii) the data, and (iii) the policy to understand why. Ultimately, we expect users to push schema information into OPA so that we can validate that inputs and data conform to those schemas. In that case, if an input is missing (or conflicting), type checking should catch it.

These changes add basic indexing support to the compiler: rules can be indexed based on their contents. The first indexing strategy we support is based on expressions of the form eq(term, base_doc_ref), where the term does not require evaluation (vars excepted). For these expressions, we construct a trie that allows us to efficiently search for the rules that must be evaluated; rules that do not have to be evaluated are excluded by the search. As part of these changes, the virtual doc benchmark has been updated to include rules that are hit and missed by the rule indexer.

timothyhinrichs

I agree with removing this restriction. In practice, I think most policies that use input will make it clear that they need input. Hidden dependencies on input seem unlikely. Eventually I could see this kind of analysis being provided upon request.

timothyhinrichs

This change gives asymptotic improvement in performance, assuming the policy uses the fragment that we're optimizing. Cool stuff!

tsandall added 2 commits May 3, 2017 13:21

Refactor Value.Find to use Ref instead of []string

db4df6a

tsandall requested a review from timothyhinrichs May 9, 2017 20:06

tsandall force-pushed the http-api-optimization branch from e305736 to fec318b May 9, 2017 23:27

timothyhinrichs reviewed May 9, 2017

timothyhinrichs approved these changes May 9, 2017

tsandall merged commit 0267568 into open-policy-agent:master May 10, 2017
