
To what part in the equations does each part of the loss correspond? #18

Closed

JappaB opened this issue Aug 28, 2019 · 21 comments

JappaB commented Aug 28, 2019

Hi @yihui-he,
I found this issue: #7, where it is mentioned that there are 3 parts to the KL-Loss:

  • The normal bbox regression loss: loss_bbox (basically the mean of the bbox coordinate prediction)

  • bbox_pred_std_abs_logw_loss

  • bbox_pred_std_abs_mulw_loss

I have a couple of questions. First, to which part of which formula in the paper does each of the above correspond?

Similarly, what do bbox_inside_weights, bbox_outside_weights, and 'val' (mentioned in the comments, e.g. on line 120) correspond to?

Secondly, I wondered how you backpropagate the gradients from the loss function, given that you use the 'StopGradient' function. Do you backpropagate the gradient from all three components through the whole network, or only from the normal bbox regression loss?

I've never used caffe2 before, so it has taken quite a bit of work to get a feel for the code. As I am trying to implement your work in a (PyTorch) SSD, I want to be sure I'm doing things correctly.


@EternityZY,
I saw you attempted to implement the KL-Loss in YOLOv3. Did you succeed?
As I'm trying to implement the KL-Loss in an SSD (a PyTorch version), your YOLOv3 implementation might have some overlap / give some intuition. Would you be willing to share your code?

ethanhe42 (Owner) commented Aug 28, 2019

  • In Eq. 9, loss_bbox (bbox_pred_std_abs_mulw_loss) is the first term and bbox_pred_std_abs_logw_loss is the second term.
  • bbox_inside_weights and bbox_outside_weights are alpha_in and alpha_out in smooth_l1_loss_op. Check Detectron for details. You can ignore val; I just used it to denote the value before Abs.
  • There's nothing fancy about the loss function backpropagation. Sorry for any confusion! The loss values of loss_bbox and bbox_pred_std_abs_mulw_loss are the same, as you can see during training. In caffe2, loss_bbox (SmoothL1Loss) does not backprop through the outside weight (bbox_pred_std_nexp, which contains the std). To backprop the std, I use StopGradient in bbox_pred_std_abs_mulw_loss so that it only backprops the std, not the coordinates. I implemented it this way just as a sanity check; actually, bbox_pred_std_abs_mulw_loss alone can get the job done without StopGradient (see the sketch below).

I'd love to see KL-Loss implemented in PyTorch. Let me know when it's done. :)
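For reference, a minimal PyTorch sketch of how those terms might fit together, following Eqs. 8-9 of the paper (the function and variable names here are illustrative, not the repo's actual ops):

import torch

def kl_loss(x_e, x_g, alpha):
    """Sketch of the KL-Loss regression term, assuming the network
    predicts alpha = log(sigma^2) per box coordinate."""
    diff = torch.abs(x_g - x_e)
    # First term, with a smooth-L1-style switch at |x_g - x_e| = 1;
    # this is the loss_bbox / bbox_pred_std_abs_mulw_loss part.
    first = torch.where(
        diff > 1.0,
        torch.exp(-alpha) * (diff - 0.5),     # |diff| > 1 (Eq. 9)
        torch.exp(-alpha) * 0.5 * diff ** 2,  # |diff| <= 1 (Eq. 8)
    )
    # Second term, +0.5 * alpha; this is bbox_pred_std_abs_logw_loss.
    # To replicate the StopGradient sanity check, use diff.detach()
    # inside `first`, so that branch only trains alpha.
    return (first + 0.5 * alpha).mean()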

EternityZY commented:

I did try to reproduce the KL-Loss in YOLOv3 with TensorFlow, but it failed. During training, bbox_pred_std_abs_logw_loss becomes a very large negative number, resulting in a final loss of NaN.

JappaB (Author) commented Aug 29, 2019

@yihui-he,
Thanks for your swift response. I'm currently working on the PyTorch implementation. When I'm done I'll let you know; perhaps you can go through it as a sanity check. I'll open-source the implementation.

I have another question. You basically have two different losses: one for when |xg - xe| > 1 and another otherwise. I was wondering in what range the predictions xe live. Or, more precisely: are the images resized to have a height and width between 0 and 1, so that |xg - xe| < 1 for almost all predictions?


@EternityZY, that's unfortunate. If I get it to work in PyTorch, I'll let you know.

ethanhe42 (Owner) commented Aug 29, 2019

@JappaB you are right. Bounding boxes are rescaled so that the image's height and width are 1x1. It's just for robustness; it resembles smooth L1 loss.
I heard from a reader that the KL Loss gave an improvement on YOLO, but his implementation is not open source yet.

JappaB (Author) commented Aug 29, 2019

Thanks again for the fast response. I'll close this issue for now, but perhaps I'll comment with other questions later down the line.

@JappaB JappaB closed this as completed Aug 29, 2019
EternityZY commented:

> @EternityZY, that's unfortunate. If I get it to work in PyTorch, I'll let you know.

OK! Waiting for your good news!

JappaB (Author) commented Sep 2, 2019

> OK! Waiting for your good news!

I'm currently training my PyTorch SSD with it. So far the loss goes down the way I would expect. I'll let you know, once training has finished, whether it learned something interesting. I can't share the complete code of the SSD (yet), but I'll make the KL-Loss function public this week if it really works.

EDIT: never mind, I also get NaNs during training after some time... I'll try to figure out why that happens.
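(For anyone hitting the same NaNs: a common stabilization in heteroscedastic-loss implementations, not something from this repo, is to clamp the predicted log-variance before exponentiating. A sketch:)

import torch

def stable_log_variance(alpha, lo=-10.0, hi=10.0):
    # Clamp alpha = log(sigma^2) so exp(-alpha) cannot blow up and
    # the +0.5 * alpha term cannot run off to a huge negative value.
    return torch.clamp(alpha, min=lo, max=hi)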

JappaB (Author) commented Sep 4, 2019

Alright, there was still a bug, but I'm able to train an SSD with it now, and the results look reasonable so far. I've only tested it with a PyTorch SSD, but I'm fairly certain it should work with any detection framework. I don't have results comparing it with the normal loss function yet; I'm still doing some hyperparameter tuning (learning rate, learning-rate schedule, etc.).

@yihui-he, how would you like to do this:

  • I'll make a repo containing only the loss function and you can link to it
  • I'll open a pull request and you can add it to this repo

I don't have a preference.

To be clear, for now it will only be a single file with the KL-Loss implemented in PyTorch. Later I can look at sharing the SSD as well.

ethanhe42 (Owner) commented:

@JappaB I guess the second way is better, since this repo is based on caffe2.

JappaB (Author) commented Sep 5, 2019

@yihui-he, the SSD with the KL-Loss performs (quite a bit) worse than the SSD with the normal loss. Do you know whether the person who improved YOLO with the KL-Loss did it with YOLOv3, YOLOv2/YOLO9000, or the original YOLO?

  • I think the number of reference/default boxes might be the cause (it is very high in SSD and YOLOv3, but relatively low in Faster R-CNN after ROI pooling and in YOLOv2). Because there are so many alphas, this might introduce a lot of noise during learning, decreasing performance.

  • Also, do the alphas all become positive when you train your implementation? Sometimes I still get a negative loss, which is due to the -0.5 alpha part. (I'm sorry, but I can't seem to get caffe2 installed on the server I work on...)

If it is not that, then I still might be doing something wrong in the implementation...

@EternityZY, once I'm more certain that I didn't screw up, I'll release the code.

ethanhe42 (Owner) commented Sep 5, 2019

@JappaB YOLO-Lite (mAP 79.4%): https://github.com/Stinky-Tofu/Stronger-yolo. He told me mAP70, mAP75, and mAP90 improved by 4%, 8%, and 8% respectively on the VOC2007 test set, though mAP50 drops by 1%.

  • Makes sense. This might be a problem.
  • The loss can be negative, which is normal; alpha can be either positive or negative. By the way, it is +0.5 alpha (Eq. 9). See the quick numeric check below.
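A quick numeric check of that negative-loss behaviour, using the quadratic branch of the loss (the numbers here are made up): a confident, accurate prediction can push the loss below zero.

import math

diff, alpha = 0.01, -4.0  # small error, very confident (low log-variance)
loss = math.exp(-alpha) * 0.5 * diff ** 2 + 0.5 * alpha
print(loss)  # ~0.003 - 2.0 = -1.997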

JappaB (Author) commented Sep 6, 2019

@yihui-he,

  • Sorry, I meant +0.5 alpha (I have that in the code as well).

  • I see YOLO-Lite is a YOLOv3-type network, so the performance dip is probably not due to the number of bounding boxes.

  • Interestingly, I ran an experiment where I put an absolute-value operator (torch.abs(...)) around the output of the alphas. I understand the network then can no longer produce variances in the range between 0 and 1, but performance increased greatly with the exact same hyperparameters. It now performs only a bit worse than the 'normal' SSD (about 2 percentage points at all mAP levels).

  • The last thing I can think of being a problem is the way I transform the bounding boxes to x1y1x2y2 format. If I understand your answer on the other issue I posted (What to do with bounding box regression variance typically used when normalizing bounding box targets #19) correctly, you use the bounding-box variance of cx, cy (the original MS COCO 0.1, 0.1, or in your case the inverse, 10, 10) for x1, y1, x2, y2. I didn't do that, as it seemed strange to use the variance calculated for cx, cy for x1, y1, x2, y2, since they might have different variances (perhaps I misunderstood how they are calculated).
    Therefore, I made the prior boxes in cx, cy, w, h format. Then, when encoding the targets, I first calculated the encoding the way I normally would (see the sketch below) and then transformed it to x1, y1, x2, y2 format. The network thus does bounding-box regression in x1, y1, x2, y2 format, while I can still use the variances from the cx, cy, w, h encoding. Perhaps there is a mistake in my reasoning, but otherwise I think the KL-Loss just doesn't improve the SSD.
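For context, the standard SSD "variance" encoding referred to above looks roughly like this (as in the common ssd.pytorch implementation; the x1y1x2y2 transformation described above would be layered on top of it):

import torch

def encode(matched, priors, variances=(0.1, 0.2)):
    """Standard SSD target encoding. `matched` (ground truth) and
    `priors` are both in (cx, cy, w, h) form; `variances` are the
    usual SSD scaling factors for the center and size offsets."""
    g_cxcy = (matched[:, :2] - priors[:, :2]) / (variances[0] * priors[:, 2:])
    g_wh = torch.log(matched[:, 2:] / priors[:, 2:]) / variances[1]
    return torch.cat([g_cxcy, g_wh], dim=1)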

ethanhe42 (Owner) commented Sep 6, 2019

@JappaB there's no way to debug this without looking at the code.
But I guess this could be a bug. In my repo, the transformation between xyxy and xywh is done at the pixel level: https://github.com/yihui-he/KL-Loss/blob/1c67310c9f5a79cfa985fea241791ccedbdb7dcf/detectron/utils/boxes.py#L78-L109
Pay attention to the -1.
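For readers who don't click through, the linked helpers are approximately the following (paraphrased from Detectron's boxes.py; the +1/-1 appears because widths and heights count both endpoint pixels):

import numpy as np

def xyxy_to_xywh(xyxy):
    """(x1, y1, x2, y2) -> (x1, y1, w, h), Detectron pixel convention."""
    w = xyxy[:, 2] - xyxy[:, 0] + 1
    h = xyxy[:, 3] - xyxy[:, 1] + 1
    return np.hstack((xyxy[:, 0:2], w[:, None], h[:, None]))

def xywh_to_xyxy(xywh):
    """(x1, y1, w, h) -> (x1, y1, x2, y2), the inverse of the above."""
    x2 = xywh[:, 0] + xywh[:, 2] - 1
    y2 = xywh[:, 1] + xywh[:, 3] - 1
    return np.hstack((xywh[:, 0:2], x2[:, None], y2[:, None]))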

JappaB (Author) commented Sep 7, 2019

@yihui-he, by "pixel level", do you mean that this transformation takes place before any resizing to a fixed input size, and before the image and the bounding boxes are scaled to be between 0 and 1?

I do use a transformation that is a bit different, without the -1. But as Ross Girshick mentions in the comment at the top of boxes.py: "in practice, as long as a model is trained and tested with a consistent convention either decision seems to be ok (at least in our experience on COCO)".

Btw, thanks for all your replies and thinking along.

import torch

def point_form(boxes):
    """Convert prior_boxes to (xmin, ymin, xmax, ymax)
    representation for comparison to point-form ground truth data.
    Args:
        boxes: (tensor) center-size default boxes from priorbox layers.
    Return:
        boxes: (tensor) converted (xmin, ymin, xmax, ymax) form of boxes.
    """
    return torch.cat((boxes[:, :2] - boxes[:, 2:]/2,     # xmin, ymin
                     boxes[:, :2] + boxes[:, 2:]/2), 1)  # xmax, ymax

def center_size(boxes):
    """Convert point-form boxes to (cx, cy, w, h)
    representation for comparison to center-size form ground truth data.
    Args:
        boxes: (tensor) point-form boxes.
    Return:
        boxes: (tensor) converted (cx, cy, w, h) form of boxes.
    """
    return torch.cat(((boxes[:, 2:] + boxes[:, :2])/2,  # cx, cy
                     boxes[:, 2:] - boxes[:, :2]), 1)  # w, h

ethanhe42 (Owner) commented:

@JappaB OK, maybe there are some other issues we don't know about yet.

JappaB (Author) commented Sep 8, 2019

@yihui-he, if I take the time to break the SSD with KL-Loss out of my current repo, make a new repo with it (including a training script and an evaluation script), and share it with you, do you think you'd have time to go through it?

ethanhe42 (Owner) commented:

@JappaB sure. Point me to the critical parts where you made changes.

ethanhe42 (Owner) commented:

@JappaB @EternityZY FYI, YOLO with KL-Loss has been released: https://github.com/wlguan/Stronger-yolo-pytorch

xixiobba commented Sep 4, 2020

> @JappaB @EternityZY FYI, YOLO with KL-Loss has been released: https://github.com/wlguan/Stronger-yolo-pytorch

This link is 404 now. Is there another address?


devdut1999 commented:

@JappaB could you please share the KL-Loss with SSD, if you have completed the implementation?
