Commit 04007ff

Update ArmoRM blog
1 parent 51778ee commit 04007ff

File tree

1 file changed (+5, -5 lines)

  • content/posts/2024-05-29-multi-objective-reward-modeling

content/posts/2024-05-29-multi-objective-reward-modeling/index.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -19,7 +19,7 @@ This work is authored by [Haoxiang Wang*](https://haoxiang-wang.github.io/), [We
 
 - **Code:** [https://github.com/RLHFlow/RLHF-Reward-Modeling](https://github.com/RLHFlow/RLHF-Reward-Modeling)
 - **Model:** [https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1)
-- **Technical Report:** To be released in June, 2024
+- **Technical Report:** [Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts](https://arxiv.org/abs/2406.12845)
 - **Contact:** Haoxiang Wang ([hwang264@illinois.edu](mailto:wx13@illinois.edu))
 ---
 # Abstract
@@ -245,10 +245,10 @@ print(helpsteer_rewards_pred)
 If you find this work useful for your research, please consider citing:
 
 ```
-@article{wang2024interpretable,
-  title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
-  author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
-  year={2024}
+@article{ArmoRM,
+  title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
+  author={Haoxiang Wang and Wei Xiong and Tengyang Xie and Han Zhao and Tong Zhang},
+  journal={arXiv preprint arXiv:2406.12845},
 }
 
 @inproceedings{wang2024arithmetic,
````
