
Refactor COMPRESS-INSTRUCTIONS #830

Merged

Conversation

@markasoftware (Contributor) commented Jun 30, 2022

Passes all tests, and produces output just as good as the original implementation on some basic benchmarks I ran.

(commits should probably be squashed)

Mark Polyakov added 4 commits June 27, 2022 13:51
I believe it's sound, but it does not always optimize as well -- it does not yet implement some of the
repeated passes that the old COMPRESS-INSTRUCTIONS did.
I believe the rewrite is more or less up to the same performance level as the original (passes
tests, but manual testing in REPL indicates its sequences are often a few gates longer). However,
there is some bug wrt state prep that is causing some tests still to fail.
It's not 100% functionally equivalent to the original, so I still need to run some manual tests to
evaluate its performance.
In the last commit, I flushed all existing queues. I had misread the original implementation, which in fact
only flushed the sub-queue that was in "passing" mode, i.e. the queue involving the greatest
number of qubits.

As of this commit, the output of the new COMPRESS-INSTRUCTIONS seems just as good as the original
implementation.
@markasoftware (Contributor, Author):

While the output is just as short as the output from the original COMPRESS-INSTRUCTIONS, my implementation is not exactly functionally identical. The main difference is that the original requires all queues to correspond to actual hardware objects, and has one "global" queue which does not correspond to a hardware object. My implementation allows each queue to use an arbitrary set of resources, not necessarily corresponding to a real hardware object, and does not have a concept of a "global" queue. This means my implementation can have multiple 3-qubit queues simultaneously, for example.
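The distinction between the two queue models can be sketched as follows. This is a hypothetical Python model, not the actual quilc Lisp code; the helper names (`original_queue_for`, `new_queue_for`) and the example qubit sets are invented for illustration:

```python
# Hypothetical sketch of the two queue-selection strategies; not the actual
# quilc implementation. Names and qubit sets are invented for illustration.

# Original scheme: one queue per hardware object, plus a single "global" queue
# that does not correspond to any hardware object.
hardware_objects = [frozenset({0}), frozenset({1}), frozenset({0, 1})]
original_queues = {obj: [] for obj in hardware_objects}
original_queues["global"] = []  # catch-all for instructions on wider resource sets

def original_queue_for(qubits):
    """Route an instruction to its hardware object's queue, else the global queue."""
    return original_queues.get(frozenset(qubits), original_queues["global"])

# New scheme: queues are keyed by arbitrary qubit sets and created on demand,
# so e.g. two distinct 3-qubit queues can coexist at the same time.
new_queues = {}

def new_queue_for(qubits):
    return new_queues.setdefault(frozenset(qubits), [])

new_queue_for([0, 1, 2]).append("CCNOT 0 1 2")
new_queue_for([3, 4, 5]).append("CCNOT 3 4 5")
# Both 3-qubit queues now exist simultaneously, which the original scheme
# could not represent without routing one of them to the global queue.
```

In the original scheme, any instruction whose resource set is not a hardware object falls into the single global queue; in the on-demand scheme it simply gets its own queue.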

If these details of the original implementation are important for some reason (maybe speed in certain situations?), adding them to my branch will be easy.

@braised-babbage (Contributor) commented Jun 30, 2022

This is well written code, and notably shorter than what it replaces, so to the extent that it behaves similarly (as a functional component of quilc) to what we had before, I'm happy to see it merged.

However, I am a bit hesitant to encourage piecemeal changes at this stage, in the sense that I don't think we have an entirely clear perspective on the compressor, i.e. what we want from it, what we expect, what sort of compromises we are making, and so forth. I have some limited perspective, of course: I know that we want peephole optimization to clean up after the translation to native code, I know that we additionally support rather generic "decompilation" for certain arity (e.g. take a 1Q sequence to ZXZ), and so forth. But to the extent that the compressor represents a kind of heuristic solution to some well posed but intractable problem, it might be valuable to articulate that problem a bit more explicitly. In other words, I think this PR looks good, but (to me) it's most exciting when it is in the context of a larger renewal of thinking on how quilc does its job.

Along similar lines, I wonder about the extent to which we can or should think of compression as simply being a one-size-fits-all mechanism "parameterized" by the chip specification and defined compilers. As you note, the original implementation makes a conscious decision to associate queues to hardware objects. I don't take this by itself as a good or bad decision, but I do take it as indicative that the original design envisioned a certain relationship between our understanding of "compression" and the device that we are compiling to.

One can imagine quite disparate architectures:

  • the traditionally supported CZ/CNOT + 1Q rotations
  • ion trap devices with native high-arity gates
  • exchange-only quantum dots
  • "discrete" gate sets (e.g. the Clifford + T work that has recently gone into quilc)

I suspect (but might be wrong) that most of the comparisons of this PR against the old compressor are with respect to the first item in that list, because that is mostly what we have tested & benchmarked quilc on. It is in the spirit of quilc that many of the differences between the aforementioned architectures can come down simply to the available compilers on the chip and how they are exposed via hardware objects. On the other hand, it is not obvious to me that compressing well is quite as simple as that.

@markasoftware (Contributor, Author):

The purpose of this PR is not to change the paradigm of compression in quilc from chip-specific to generic. In fact, my goal is not to change the behavior of COMPRESS-INSTRUCTIONS at all. I only changed the behavior when I believed the original behavior to be more a quirk of the implementation rather than an intentional, well-thought-out choice. (The old implementation creates all the state machines/queues that it needs right at the start. If your implementation is going to precreate all its queues, then hardware objects are a reasonable way to pick those queues. But in an implementation like my own where the queues are created on-demand, there is no reason to limit yourself to hardware objects afaik).

A more philosophical discussion about the purpose of the compressor is certainly a good idea before I start really messing with it in later PRs.

@braised-babbage (Contributor):

> The purpose of this PR is not to change the paradigm of compression in quilc from chip-specific to generic. In fact, my goal is not to change the behavior of COMPRESS-INSTRUCTIONS at all. I only changed the behavior when I believed the original behavior to be more a quirk of the implementation rather than an intentional, well-thought-out choice. (The old implementation creates all the state machines/queues that it needs right at the start. If your implementation is going to precreate all its queues, then hardware objects are a reasonable way to pick those queues. But in an implementation like my own where the queues are created on-demand, there is no reason to limit yourself to hardware objects afaik).
>
> A more philosophical discussion about the purpose of the compressor is certainly a good idea before I start really messing with it in later PRs.

That's very fair, and I didn't intend to impute any additional meaning to these changes than what you had written in your earlier comment. The second half of my remark was more just me running down a tangent. Regardless, I think this PR is good!

Review thread on src/compressor/compressor.lisp (outdated; resolved)
@ecpeterson (Contributor) previously approved these changes Jul 3, 2022 and left a comment:


I’m glad to see this. I agree that the compressor was not designed by pure principle, particularly around the “global” queue, and I think simplifying it is a good idea. Even if it were to underperform the old implementation, the new simplicity is appealing enough to me that I’d rather build atop it than hold on to the old one.

I also agree with Erik that the coming moments are a good opportunity to try to capture what the principle of a compressor should be and to pin further modifications against that. For that matter, while we’re all here, I’d like to draw everyone’s attention back to #486. Our compressor does some things that their rewriter cannot, and vice versa, which should lead both to an interesting integration problem and to fruitful food for thought about the definition of a compressor.

My only PR comment is superficial: there's an if in flush-queue that I wish were a cond.

@markasoftware (Contributor, Author):

Thanks for the feedback @ecpeterson.

I've been looking a little bit more into the speed of my implementation. Without a global queue, it does tend to create a greater number of queues than the original implementation, and is typically about 10% slower on circuits involving more than 2 qubits. I'd assume the extra flexibility could lead to better compression in some cases, so this isn't a problem IMO.

However, when increasing *global-queue-tolerance-threshold* to be greater than 3, the speed gets worse. On some circuits I tried, the performance was 50% worse than the original when *global-queue-tolerance-threshold* is set to 4. That's because if we start with size 4 queues, then my implementation creates size 3 queues next, then size 2, and finally size 1. On the other hand, the original implementation would create a size 4 global queue, but then immediately drop down to size 2 queues, then size 1.

I could use some guidance on whether there's any reason to compress with an "intermediate"-sized queue (i.e., a queue whose size is strictly between 2 and *global-queue-tolerance-threshold*). For example, I'm skeptical that compression on a size-3 queue will make additional progress on a sequence of instructions that was already compressed as part of a size-4 queue, since no new compilers are introduced until we go down to size 2.
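The two recursion schedules over queue sizes can be illustrated with a small sketch. This is hypothetical Python, not the actual quilc code; `stepwise_schedule` and `jumping_schedule` are invented helper names, and the threshold value mirrors *global-queue-tolerance-threshold*:

```python
# Hypothetical sketch of the two recursion schedules over queue sizes; not the
# actual quilc implementation. Helper names are invented for illustration.

def stepwise_schedule(start):
    """New implementation's behavior: visit every intermediate queue size."""
    return list(range(start, 0, -1))

def jumping_schedule(start):
    """Original behavior: after the widest queue, drop straight to size 2."""
    return [start, 2, 1] if start > 2 else list(range(start, 0, -1))

# With a threshold of 4, the stepwise schedule compresses at sizes 4, 3, 2, 1,
# even though no new compilers become available between sizes 4 and 2; the
# jumping schedule skips the size-3 pass entirely.
print(stepwise_schedule(4))  # [4, 3, 2, 1]
print(jumping_schedule(4))   # [4, 2, 1]
```

The extra size-3 pass in the stepwise schedule is the source of the slowdown described above when the threshold exceeds 3.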

In the examples I've been playing around with, my implementation doesn't compress noticeably better than the original. But maybe there are some situations where it could.

It's very easy for me to replicate the original behavior here if desired and jump straight down to size 2 queues during recursion. In fact, I have already been playing around with this, and it performs almost identically to the original in terms of both output quality and speed.

@markasoftware (Contributor, Author):

I just pushed a change to restrict the recursive queue sizes to 2 as discussed in my previous comment.

IMO the PR is now ready to merge.

@ecpeterson (Contributor) commented Sep 1, 2022

Sorry for forgetting to reply to your query! I'm not surprised about the speed degradation for "wide" queues, but I also don't think it's worth fretting about. You can jump to 2 or not bother. I think it's exceedingly rare that traversing all the intermediate queue sizes will yield an improvement — even if quilc were capable of optimal nQ compilation, which it is not. On the other hand, jumping to 2 is kind of a weird quirk that may spook some future maintainer. 🤷‍♀️

Merge it either way imo.

@stylewarning (Member):

> Sorry for forgetting to reply to your query! I'm not surprised about the speed degradation for "wide" queues, but I also don't think it's worth fretting about. You can jump to 2 or not bother. I think it's exceedingly rare that traversing all the intermediate queue sizes will yield an improvement — even if quilc were capable of optimal nQ compilation, which it is not. On the other hand, jumping to 2 is kind of a weird quirk that may spook some future maintainer. 🤷‍♀️
>
> Merge it either way imo.

I'm going to merge this, but @markasoftware could you add a comment to the source code with some of these sentiments so as to not spook future maintainers, and make a separate PR?

@stylewarning stylewarning merged commit c3c142f into quil-lang:master Sep 7, 2022