Remove intermediate casting in product #490
Conversation
I understand the motivation to return the product at full precision, but I think the solution needs more work. There are examples where this solution will produce C++ that doesn't compile; unfortunately they're not captured by the current tests. You can see an example changing
Thanks. I was able to reproduce a failure. I will try to come up with a solution.
I added
Is this the direction we want to go? One could argue for moving more of this logic into the Python code and not into the templates, but other than for the exponential this is pretty straightforward, so I am still leaning towards going in this direction. Nevertheless, this fixes the failure mentioned above by @thesps.
Cool! I'm wondering now whether, instead of keeping both a "with cast" and a "no cast" version of all the products, we can keep only the "no cast" versions (i.e. drop the
And then the logic for both the XNOR and Exponent products would anyway need to be derived. But perhaps the
Edit: pushed to local branch for CI
About whether we should drop the
Yeah I think these functions are so fundamental that it could be very confusing to have different behaviour in different places. So let's go for consistency and simplicity and make them all behave the same (-> return the full available precision). I've been meaning to check how to construct the lossless return type for the special operators but didn't get a chance to yet.
I went ahead and made all the nnet::product terms not cast the output. It would be useful to look at all the product terms to make sure I am returning in the appropriate precision.
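For readers following along, here is a minimal sketch of the idea. It uses the nnet::product-style static interface seen elsewhere in this thread; the name mult_nocast matches the one this PR mentions, but the body is illustrative rather than the exact diff:

```cpp
// Sketch only: a product that returns the full-precision result of a*w
// instead of casting it to a preset return type. The lossless type is taken
// from the multiplication itself via a trailing decltype return type.
template <class x_T, class w_T>
class mult_nocast {
  public:
    static auto product(x_T a, w_T w) -> decltype(a * w) {
        #pragma HLS INLINE
        return a * w;
    }
};
```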
I think this is the right direction. How about this for the
? I'm thinking, for completeness, some of those binary/ternary products may need a wider type, since for ints/fixed point
->
I think we need that for all those that can return
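To make the "wider type" point concrete, here is a hedged sketch assuming Vivado HLS ap_fixed/ap_uint types; the class name and the 0 -> -1, 1 -> +1 weight encoding are illustrative assumptions, not the code in this PR. A binary weight only flips the sign, but negating the most negative value of an ap_fixed needs one extra integer bit, so a lossless return type has to be one bit wider than the input:

```cpp
#include "ap_fixed.h"

// Sketch only: a binary (+1/-1) product whose return type is one bit wider
// than the input, so that -x is representable even when x is the most
// negative value of its type (e.g. -8 for ap_fixed<4,4>).
template <int W, int I>
class weight_binary_nocast_sketch {
  public:
    static ap_fixed<W + 1, I + 1> product(ap_fixed<W, I> a, ap_uint<1> w) {
        #pragma HLS INLINE
        // Assumption: the binary weight encodes 0 -> -1 and 1 -> +1.
        ap_fixed<W + 1, I + 1> res = a;
        if (w == 0) res = -res;
        return res;
    }
};
```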
I was struggling a bit with the exponential weight. I think I agree with all your suggestions. I will implement them and try the tests.
I updated the code. I changed some
I did not check the
That's very strange. It's from
So the error comes from:
What fixes the timing is doing an explicit cast:

```cpp
ReuseLoop:
for (int ir = 0; ir < rufactor; ir++) {
    #pragma HLS PIPELINE II=1 rewind

    int w_index = ir;
    int in_index = ir;
    int out_index = 0;
    int acc_step = 0;

MultLoop:
    for (int im = 0; im < block_factor; im++) {
        #pragma HLS UNROLL

        acc[out_index] += static_cast<typename CONFIG_T::accum_t>(
            CONFIG_T::template product<data_T, typename CONFIG_T::weight_t>::product(data[in_index], weights[w_index]));

        // Increment w_index
        w_index += rufactor;

        // Increment in_index
        in_index += rufactor;
        if (in_index >= nin) {
            in_index = ir;
        }

        // Increment out_index
        if (acc_step + 1 >= multscale) {
            acc_step = 0;
            out_index++;
        } else {
            acc_step++;
        }
    }
}
```

This effectively puts back the intermediate cast that we removed. What is your recommendation?
Does the casting help because the sizes otherwise become too big, or does this meet timing more as an artifact of the cast forcing the addition tree not to be attempted in one cycle?
I saw that the generated HDL for that is now using the accumulator in the DSP, where previously it was using LUTs, I think. Anyway, I think it's okay to add the cast in, which should make the dense layer behave exactly as it used to. I think we should do this for all uses of product in dense layers for consistency.
I added casts to break up the accumulates, basically wherever the product was preceded by a
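In case it helps future readers, the pattern being described is roughly the following. This is a sketch only, using the usual hls4ml template conventions (a CONFIG_T providing accum_t, weight_t, and the product class quoted in the loop above); the helper name is hypothetical and this is not the literal diff:

```cpp
// Sketch only: cast the full-precision product back to accum_t before
// accumulating, so the adder tree is built on accum_t rather than on the
// wider lossless product type.
template <class data_T, typename CONFIG_T>
void accumulate_with_cast(data_T d, typename CONFIG_T::weight_t w,
                          typename CONFIG_T::accum_t &acc) {
    #pragma HLS INLINE
    acc += static_cast<typename CONFIG_T::accum_t>(
        CONFIG_T::template product<data_T, typename CONFIG_T::weight_t>::product(d, w));
}
```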
I think this is in good shape to merge now. But perhaps we should hold off until the backends PR is merged? I think this one should be easy to adapt; some files will just need to move to different places.
Sounds good. I'll try to update it in the next few days.
The Quartus code also needs to be updated and tested. I will try to do that and add tests for it.
I updated the Quartus product to also not have the intermediate cast. The only remaining issue is the
I've addressed it in #499. I did it as a standalone PR since it's already an issue directly on master, but I included your test. Hopefully we merge that quickly and you can pull master here again.
I think this passes all the pytests now.
This removes the intermediate casting of scale*x to a res_T type in a BatchNormalization, so the cast to res_T is only applied to the final scale*x + bias. I did this by introducing Product_nocast and mult_nocast, which multiply and return in the full precision. That seemed easier and less error-prone than manually determining the resulting type in Python, and I was hesitant to always make mult and Product return the full precision, though that would be straightforward.

I did confirm that the simple test model is synthesizable with Vivado_HLS 2019.2.
This patch does fix some issues I saw working with low bit values on QONNX.
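For readers who want to see what this means in the BatchNormalization kernel, here is a hedged sketch of the normalize loop after the change. The signature and type names follow the usual nnet conventions but are illustrative assumptions, not the literal code in this PR:

```cpp
// Sketch only: a normalize kernel in the shape of the nnet one.
template <class data_T, class res_T, typename CONFIG_T>
void normalize_sketch(data_T data[CONFIG_T::n_in], res_T res[CONFIG_T::n_in],
                      typename CONFIG_T::scale_t scale[CONFIG_T::n_in],
                      typename CONFIG_T::bias_t bias[CONFIG_T::n_in]) {
    for (int i = 0; i < CONFIG_T::n_in; i++) {
        #pragma HLS UNROLL
        // scale*x keeps its full precision; the only conversion to res_T
        // happens implicitly when the final scale*x + bias is assigned.
        res[i] = CONFIG_T::template product<data_T, typename CONFIG_T::scale_t>::product(data[i], scale[i]) + bias[i];
        // Previously the product itself was cast to res_T before the bias was
        // added, which could drop low-order bits of scale*x.
    }
}
```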