
To what part in the equations does each part of the loss correspond? #18

Closed

JappaB opened this issue Aug 28, 2019 · 21 comments

JappaB commented Aug 28, 2019

Hi @yihui-he,
I found this issue: #7, where it is mentioned that there are 3 parts to the KL-Loss:

  • The normal bbox regression loss: loss_bbox (basically the mean of the bbox coordinate prediction)

  • bbox_pred_std_abs_logw_loss

  • bbox_pred_std_abs_mulw_loss

I have a couple of questions. First, to which part of which formula in the paper does each of the above correspond?

Similarly, what do bbox_inside_weights, bbox_outside_weights, and 'val' (mentioned in the comments, e.g. on line 120) correspond to?

Secondly, I wondered how you backpropagate the gradients from the loss function, given that you use the 'StopGradient' function. Do you backpropagate the gradient from all three components through the whole network, or only from the normal bbox regression loss?

I've never used caffe2 before, so it has taken quite a bit of work to get a feel for the code. As I am trying to implement your work in a (PyTorch) SSD, I want to be sure I'm doing things correctly.


@EternityZY,
I saw you attempted to implement the KL-Loss in YOLOv3. Did you succeed?
As I'm trying to implement the KL-Loss in an SSD (a PyTorch version), your YOLOv3 implementation might have some overlap / give some intuition. Would you be willing to share your code?

ethanhe42 (Owner) commented Aug 28, 2019

  • In Eq. 9, loss_bbox (bbox_pred_std_abs_mulw_loss) is the first term and bbox_pred_std_abs_logw_loss is the second term.
  • bbox_inside_weights and bbox_outside_weights are alpha_in and alpha_out in smooth_l1_loss_op. Check Detectron for details. You can ignore val; I just used it to denote the value before Abs.
  • There's nothing fancy about the loss function backpropagation. Sorry for any confusion! The loss values of loss_bbox and bbox_pred_std_abs_mulw_loss are the same, as you can see during training. In caffe2, loss_bbox (SmoothL1Loss) does not backprop through the outside weight (bbox_pred_std_nexp, which contains the std). To backprop the std, I use StopGradient in bbox_pred_std_abs_mulw_loss so that it only backprops the std, not the coordinates. I implemented it this way just as a sanity check; actually, bbox_pred_std_abs_mulw_loss alone can get the job done without StopGradient (see the sketch below).

I'd love to see KL-Loss implemented in PyTorch. Let me know when it's done. :)
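For reference, a minimal PyTorch sketch of how those terms might fit together, following Eqs. 8-9 of the paper (the function and variable names here are illustrative, not the repo's actual ops):

import torch

def kl_loss(x_e, x_g, alpha):
    """Sketch of the KL-Loss regression term, assuming the network
    predicts alpha = log(sigma^2) per box coordinate."""
    diff = torch.abs(x_g - x_e)
    # First term, with a smooth-L1-style switch at |x_g - x_e| = 1;
    # this is the loss_bbox / bbox_pred_std_abs_mulw_loss part.
    first = torch.where(
        diff > 1.0,
        torch.exp(-alpha) * (diff - 0.5),     # |diff| > 1 (Eq. 9)
        torch.exp(-alpha) * 0.5 * diff ** 2,  # |diff| <= 1 (Eq. 8)
    )
    # Second term, +0.5 * alpha; this is bbox_pred_std_abs_logw_loss.
    # To replicate the StopGradient sanity check, use diff.detach()
    # inside `first`, so that branch only trains alpha.
    return (first + 0.5 * alpha).mean()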

EternityZY commented:

I did try to reproduce the KL-Loss in YOLOv3 with TensorFlow, but it failed. During training, bbox_pred_std_abs_logw_loss becomes a very large negative number, resulting in a final loss of NaN.

JappaB (Author) commented Aug 29, 2019

@yihui-he,
Thanks for your swift response. I'm currently working on the PyTorch implementation. When I'm done I'll let you know; perhaps you can go through it as a sanity check. I'll open-source the implementation.

I have another question. You basically have two different losses: one for when |xg - xe| > 1 and another otherwise. I was wondering in what range the predictions xe live. Or, more precisely: are the images resized to have a height and width between 0 and 1, so that |xg - xe| < 1 for almost all predictions?


@EternityZY, that's unfortunate. If I get it to work in PyTorch, I'll let you know.

ethanhe42 (Owner) commented Aug 29, 2019

@JappaB you are right. Bounding boxes are rescaled so that the image's height and width are 1x1. It's just for robustness; it resembles smooth L1 loss.
I heard from a reader that the KL Loss gave an improvement on YOLO, but his implementation is not open source yet.

JappaB (Author) commented Aug 29, 2019

Thanks again for the fast response. I'll close this issue for now, but perhaps I'll comment with other questions later down the line.

@JappaB JappaB closed this as completed Aug 29, 2019
EternityZY commented:

> @EternityZY, that's unfortunate. If I get it to work in PyTorch, I'll let you know.

OK! Waiting for your good news!

JappaB (Author) commented Sep 2, 2019

> OK! Waiting for your good news!

I'm currently training my PyTorch SSD with it. So far the loss goes down the way I would expect. I'll let you know, once training has finished, whether it learned something interesting. I can't share the complete code of the SSD (yet), but I'll make the KL-Loss function public this week if it really works.

EDIT: never mind, I also get NaNs during training after some time... I'll try to figure out why that happens.
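(For anyone hitting the same NaNs: a common stabilization in heteroscedastic-loss implementations, not something from this repo, is to clamp the predicted log-variance before exponentiating. A sketch:)

import torch

def stable_log_variance(alpha, lo=-10.0, hi=10.0):
    # Clamp alpha = log(sigma^2) so exp(-alpha) cannot blow up and
    # the +0.5 * alpha term cannot run off to a huge negative value.
    return torch.clamp(alpha, min=lo, max=hi)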

JappaB (Author) commented Sep 4, 2019

Alright, there was still a bug, but I'm able to train an SSD with it now, and the results look reasonable so far. I've only tested it with a PyTorch SSD, but I'm fairly certain it should work with any detection framework. I don't have results comparing it with the normal loss function yet; I'm still doing some hyperparameter tuning (learning rate, learning-rate schedule, etc.).

@yihui-he, how would you like to do this:

  • I'll make a repo containing only the loss function and you can link to it
  • I'll open a pull request and you can add it to this repo

I don't have a preference.

To be clear, for now it will only be a single file with the KL-Loss implemented in PyTorch. Later I can look at sharing the SSD as well.

ethanhe42 (Owner) commented:

@JappaB I guess the second way is better, since this repo is based on caffe2.

JappaB (Author) commented Sep 5, 2019

@yihui-he, the SSD with the KL-Loss performs (quite a bit) worse than the SSD with the normal loss. Do you know whether the person who improved YOLO with the KL-Loss did it with YOLOv3, YOLOv2/YOLO9000, or the original YOLO?

  • I think the number of reference/default boxes might be the cause (it is very high in SSD and YOLOv3, but relatively low in Faster R-CNN after ROI pooling and in YOLOv2). Because there are so many alphas, this might introduce a lot of noise during learning, decreasing performance.

  • Also, do the alphas all become positive when you train your implementation? Sometimes I still get a negative loss, which is due to the -0.5 alpha part. (I'm sorry, but I can't seem to get caffe2 installed on the server I work on...)

If it is not that, then I still might be doing something wrong in the implementation...

@EternityZY, once I'm more certain that I didn't screw up, I'll release the code.

ethanhe42 (Owner) commented Sep 5, 2019

@JappaB YOLO-Lite (mAP 79.4%): https://github.com/Stinky-Tofu/Stronger-yolo. He told me mAP70, mAP75, and mAP90 improved by 4%, 8%, and 8% respectively on the VOC2007 test set, though mAP50 drops by 1%.

  • Makes sense. This might be a problem.
  • The loss can be negative, which is normal; alpha can be either positive or negative. By the way, it is +0.5 alpha (Eq. 9). See the quick numeric check below.
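A quick numeric check of that negative-loss behaviour, using the quadratic branch of the loss (the numbers here are made up): a confident, accurate prediction can push the loss below zero.

import math

diff, alpha = 0.01, -4.0  # small error, very confident (low log-variance)
loss = math.exp(-alpha) * 0.5 * diff ** 2 + 0.5 * alpha
print(loss)  # ~0.003 - 2.0 = -1.997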

JappaB (Author) commented Sep 6, 2019

@yihui-he,

  • Sorry, I meant +0.5 alpha (I have that in the code as well).

  • I see YOLO-Lite is a YOLOv3-type network, so the performance dip is probably not due to the number of bounding boxes.

  • Interestingly, I ran an experiment where I put an absolute-value operator (torch.abs(...)) around the output of the alphas. I understand the network then can no longer produce variances in the range between 0 and 1, but performance increased greatly with the exact same hyperparameters. It now performs only a bit worse than the 'normal' SSD (about 2 percentage points at all mAP levels).

  • The last thing I can think of being a problem is the way I transform the bounding boxes to x1y1x2y2 format. If I understand your answer on the other issue I posted (What to do with bounding box regression variance typically used when normalizing bounding box targets #19) correctly, you use the bounding-box variance of cx, cy (the original MS COCO 0.1, 0.1, or in your case the inverse, 10, 10) for x1, y1, x2, y2. I didn't do that, as it seemed strange to use the variance calculated for cx, cy for x1, y1, x2, y2, since they might have different variances (perhaps I misunderstood how they are calculated).
    Therefore, I made the prior boxes in cx, cy, w, h format. Then, when encoding the targets, I first calculated the encoding the way I normally would (see the sketch below) and then transformed it to x1, y1, x2, y2 format. The network thus does bounding-box regression in x1, y1, x2, y2 format, while I can still use the variances from the cx, cy, w, h encoding. Perhaps there is a mistake in my reasoning, but otherwise I think the KL-Loss just doesn't improve the SSD.
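For context, the standard SSD "variance" encoding referred to above looks roughly like this (as in the common ssd.pytorch implementation; the x1y1x2y2 transformation described above would be layered on top of it):

import torch

def encode(matched, priors, variances=(0.1, 0.2)):
    """Standard SSD target encoding. `matched` (ground truth) and
    `priors` are both in (cx, cy, w, h) form; `variances` are the
    usual SSD scaling factors for the center and size offsets."""
    g_cxcy = (matched[:, :2] - priors[:, :2]) / (variances[0] * priors[:, 2:])
    g_wh = torch.log(matched[:, 2:] / priors[:, 2:]) / variances[1]
    return torch.cat([g_cxcy, g_wh], dim=1)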

ethanhe42 (Owner) commented Sep 6, 2019

@JappaB there's no way to debug this without looking at the code.
But I guess this could be a bug. In my repo, the transformation between xyxy and xywh is done at the pixel level: https://github.com/yihui-he/KL-Loss/blob/1c67310c9f5a79cfa985fea241791ccedbdb7dcf/detectron/utils/boxes.py#L78-L109
Pay attention to the -1.
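For readers who don't click through, the linked helpers are approximately the following (paraphrased from Detectron's boxes.py; the +1/-1 appears because widths and heights count both endpoint pixels):

import numpy as np

def xyxy_to_xywh(xyxy):
    """(x1, y1, x2, y2) -> (x1, y1, w, h), Detectron pixel convention."""
    w = xyxy[:, 2] - xyxy[:, 0] + 1
    h = xyxy[:, 3] - xyxy[:, 1] + 1
    return np.hstack((xyxy[:, 0:2], w[:, None], h[:, None]))

def xywh_to_xyxy(xywh):
    """(x1, y1, w, h) -> (x1, y1, x2, y2), the inverse of the above."""
    x2 = xywh[:, 0] + xywh[:, 2] - 1
    y2 = xywh[:, 1] + xywh[:, 3] - 1
    return np.hstack((xywh[:, 0:2], x2[:, None], y2[:, None]))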

JappaB (Author) commented Sep 7, 2019

@yihui-he, by "pixel level", do you mean that this transformation takes place before any resizing to a fixed input size, and before the image and the bounding boxes are scaled to be between 0 and 1?

I do use a transformation that is a bit different, without the -1. But as Ross Girshick mentions in the comment at the top of boxes.py: "in practice, as long as a model is trained and tested with a consistent convention either decision seems to be ok (at least in our experience on COCO)".

Btw, thanks for all your replies and thinking along.

import torch

def point_form(boxes):
    """Convert prior_boxes to (xmin, ymin, xmax, ymax)
    representation for comparison to point-form ground truth data.
    Args:
        boxes: (tensor) center-size default boxes from priorbox layers.
    Return:
        boxes: (tensor) converted (xmin, ymin, xmax, ymax) form of boxes.
    """
    return torch.cat((boxes[:, :2] - boxes[:, 2:]/2,     # xmin, ymin
                     boxes[:, :2] + boxes[:, 2:]/2), 1)  # xmax, ymax

def center_size(boxes):
    """Convert point-form boxes to (cx, cy, w, h)
    representation for comparison to center-size form ground truth data.
    Args:
        boxes: (tensor) point-form boxes.
    Return:
        boxes: (tensor) converted (cx, cy, w, h) form of boxes.
    """
    return torch.cat(((boxes[:, 2:] + boxes[:, :2])/2,  # cx, cy
                     boxes[:, 2:] - boxes[:, :2]), 1)  # w, h

ethanhe42 (Owner) commented:

@JappaB OK, maybe there are some other issues we don't know about yet.

JappaB (Author) commented Sep 8, 2019

@yihui-he, if I take the time to break the SSD with KL-Loss out of my current repo, make a new repo with it (including a training script and an evaluation script), and share it with you, do you think you'd have time to go through it?

ethanhe42 (Owner) commented:

@JappaB sure. Point me to the critical parts where you made changes.

ethanhe42 (Owner) commented:

@JappaB @EternityZY FYI, YOLO with KL-Loss has been released: https://github.com/wlguan/Stronger-yolo-pytorch

xixiobba commented Sep 4, 2020

> @JappaB @EternityZY FYI, YOLO with KL-Loss has been released: https://github.com/wlguan/Stronger-yolo-pytorch

This link is 404 now. Is there another address?


devdut1999 commented:

@JappaB could you please share the KL-Loss with SSD, if you have completed the implementation?
