Add simple rule indexing implementation #319

Merged May 10, 2017 (3 commits)

@tsandall tsandall commented May 9, 2017

These changes add a simple rule indexing implementation to OPA. In the future, we can add more sophisticated strategies and perform analysis on queries to determine which indices ought to be used.

The first indexing strategy supports expressions of the form eq(ref, term) (or vice versa), where ref is a ground, non-nested reference to a base document (input or data) and term is a scalar, a non-nested array, or a var. The implementation builds a trie from the expressions in rule sets that match this pattern. At evaluation time, the trie narrows the search to the rules that could match.
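As a rough illustration of the pattern described above, the following Go sketch checks whether an equality expression would qualify for indexing. The `Term` and `Kind` types and the `IsIndexableEq` function are hypothetical simplifications written for this example; they are not OPA's actual AST or indexer API. The sketch also assumes that a "non-nested" array is one whose elements are only scalars or vars.

```go
package main

// Kind tags a simplified, hypothetical term model (not OPA's real AST).
type Kind int

const (
	Scalar Kind = iota // string, number, boolean, or null literal
	Var                // variable
	Array              // array literal
	Ref                // reference such as input.method
)

// Term is a hypothetical term. For a Ref, Value holds the head ("input" or
// "data") and Elems holds the path operands. For an Array, Elems holds the
// elements. For a Scalar or Var, Value holds the literal or variable name.
type Term struct {
	Kind  Kind
	Value string
	Elems []Term
}

// isIndexableRef reports whether t is a ground, non-nested reference to a
// base document. Assumption: every path operand must be a scalar, so a var
// operand (not ground) or a ref operand (nested) disqualifies the term.
func isIndexableRef(t Term) bool {
	if t.Kind != Ref || (t.Value != "input" && t.Value != "data") {
		return false
	}
	for _, op := range t.Elems {
		if op.Kind != Scalar {
			return false
		}
	}
	return true
}

// isIndexableValue reports whether t is a scalar, a var, or an array whose
// elements are all scalars or vars (the assumed meaning of "non-nested").
func isIndexableValue(t Term) bool {
	switch t.Kind {
	case Scalar, Var:
		return true
	case Array:
		for _, e := range t.Elems {
			if e.Kind != Scalar && e.Kind != Var {
				return false
			}
		}
		return true
	}
	return false
}

// IsIndexableEq reports whether eq(a, b) fits the pattern in either order.
func IsIndexableEq(a, b Term) bool {
	return (isIndexableRef(a) && isIndexableValue(b)) ||
		(isIndexableRef(b) && isIndexableValue(a))
}
```

Under these assumptions, an expression such as `input.method = "GET"` qualifies in either operand order, while `input.users[i] = "bob"` does not because the reference is not ground.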

Previously the query compiler checked whether input was defined to catch
two cases:

1) Input was required by query or transitive dependencies of query.
Input was required if the input document was referenced at all.
Motivation for this was to produce "correct" results when negation is
used. In practice, this never proved to be very helpful.

2) Input document was specified multiple times, causing a conflict.
Again, in practice, this never proved to be very helpful.

In both cases, if the policy decision is *incorrect* someone has to look
at (i) the input (ii) the data and (iii) the policy to understand why.

Ultimately we expect users to push schema information into OPA so that
we can validate that inputs and data conform to those schemas. In that
case, if an input is not specified (or conflicts), type checking should
catch it.

These changes add basic indexing support to the compiler. Rules can be
indexed based on their contents. In this case, the first indexing
strategy we support is based on expressions of the form: eq(term,
base_doc_ref) where the term doesn't require evaluation (except vars).
In these cases, we construct a trie that allows us to efficiently search
for rules that must be evaluated (rules that do not have to be evaluated
are excluded by the search.)
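To make the trie idea above concrete, here is a minimal Go sketch of one way such an index could work. It is not the code in this PR: `Rule`, `Index`, `Build`, and `Lookup` are hypothetical names, refs are modeled as plain strings like "input.method", indexed values are restricted to strings, and the order in which refs are tested is passed in by the caller. A rule that does not constrain a given ref (or compares it to a var) follows the `any` edge at that level.

```go
package main

// Rule is a hypothetical indexed rule: Conds maps a base document ref
// (for example "input.method") to the scalar value the rule requires.
type Rule struct {
	Name  string
	Conds map[string]string
}

// node is one trie level. Each level tests a single ref.
type node struct {
	byValue map[string]*node // rules requiring ref == value
	any     *node            // rules that do not constrain this ref
	rules   []string         // rules that end at this node (leaf level only)
}

func newNode() *node { return &node{byValue: map[string]*node{}} }

// Index is a fixed-depth trie with one level per ref in order.
type Index struct {
	order []string
	root  *node
}

// Build inserts every rule into the trie. Conditions on refs that are not
// listed in order are ignored, which only makes the index more conservative.
func Build(order []string, rules []Rule) *Index {
	idx := &Index{order: order, root: newNode()}
	for _, r := range rules {
		n := idx.root
		for _, ref := range order {
			if v, ok := r.Conds[ref]; ok {
				child, found := n.byValue[v]
				if !found {
					child = newNode()
					n.byValue[v] = child
				}
				n = child
			} else {
				if n.any == nil {
					n.any = newNode()
				}
				n = n.any
			}
		}
		n.rules = append(n.rules, r.Name)
	}
	return idx
}

// Lookup returns the names of rules that still have to be evaluated for the
// given input. Rules whose indexed conditions cannot hold are never visited.
func (idx *Index) Lookup(input map[string]string) []string {
	var out []string
	var walk func(n *node, depth int)
	walk = func(n *node, depth int) {
		if n == nil {
			return
		}
		if depth == len(idx.order) {
			out = append(out, n.rules...)
			return
		}
		if v, ok := input[idx.order[depth]]; ok {
			walk(n.byValue[v], depth+1) // nil if no rule requires this value
		}
		walk(n.any, depth+1)
	}
	walk(idx.root, 0)
	return out
}
```

In this sketch, a lookup visits only the edges that match the input plus the `any` edges, so rules that require a different value for an indexed ref are skipped without being evaluated. The result is a set of candidates: each returned rule must still be evaluated in full.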

As part of these changes the virtual doc benchmark has been updated to
include rules that are hit and missed by the rule indexer.

@timothyhinrichs timothyhinrichs left a comment

I agree with removing this restriction. In practice I think most policies that use input will make it clear that they need input. Hidden dependencies on input seem unlikely. Eventually I could see this kind of analysis being provided upon request.

@timothyhinrichs timothyhinrichs left a comment

This change gives an asymptotic improvement in performance, assuming the policy uses the fragment that we're optimizing. Cool stuff!

@tsandall tsandall merged commit 0267568 into open-policy-agent:master May 10, 2017