From e12ab0c28210f32436c61bfa8881d7607cfd31aa Mon Sep 17 00:00:00 2001
From: Chansung
Date: Fri, 20 Sep 2024 20:15:54 +0000
Subject: [PATCH] Automated report

---
 ...-Splatting Optimization with Levenberg-Marquardt.yaml | 9 +++++++++
 ...lity 3D Asset Generation via Primitive Diffusion.yaml | 9 +++++++++
 ...of Plausible Code Solutions with Plausible Tests.yaml | 9 +++++++++
 ...ng Large Language Models to Judge Audio Captions.yaml | 9 +++++++++
 ...onsistency for Efficient Video Latent Generation.yaml | 9 +++++++++
 ...nhancing Texture Generation with Visual Guidance.yaml | 9 +++++++++
 ...Pre-Training for Enhanced Mathematical Reasoning.yaml | 9 +++++++++
 ...Lineart Video Colorization with Diffusion Models.yaml | 9 +++++++++
 ...Language Models Learn to Mislead Humans via RLHF.yaml | 9 +++++++++
 ...al of Large Models as Multi-modal Search Engines.yaml | 9 +++++++++
 ... Low-Resource Languages via Reverse Instructions.yaml | 9 +++++++++
 ...l-Temporal Understanding at Arbitrary Resolution.yaml | 9 +++++++++
 ...del Pre-training with Small Model Initialization.yaml | 9 +++++++++
 ...onsistent Characters in Text-to-image Generation.yaml | 9 +++++++++
 ...odels to Self-Correct via Reinforcement Learning.yaml | 9 +++++++++
 15 files changed, 135 insertions(+)
 create mode 100644 current/2024-09-20 3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt.yaml
 create mode 100644 current/2024-09-20 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion.yaml
 create mode 100644 current/2024-09-20 B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests.yaml
 create mode 100644 current/2024-09-20 CLAIR-A: Leveraging Large Language Models to Judge Audio Captions.yaml
 create mode 100644 current/2024-09-20 Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation.yaml
 create mode 100644 current/2024-09-20 FlexiTex: Enhancing Texture Generation with Visual Guidance.yaml
 create mode 100644 current/2024-09-20 InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning.yaml
 create mode 100644 current/2024-09-20 LVCD: Reference-based Lineart Video Colorization with Diffusion Models.yaml
 create mode 100644 current/2024-09-20 Language Models Learn to Mislead Humans via RLHF.yaml
 create mode 100644 current/2024-09-20 MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.yaml
 create mode 100644 current/2024-09-20 MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions.yaml
 create mode 100644 current/2024-09-20 Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution.yaml
 create mode 100644 current/2024-09-20 Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization.yaml
 create mode 100644 current/2024-09-20 StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation.yaml
 create mode 100644 current/2024-09-20 Training Language Models to Self-Correct via Reinforcement Learning.yaml

diff --git a/current/2024-09-20 3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt.yaml b/current/2024-09-20 3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt.yaml
new file mode 100644
index 00000000..6e0c0332
--- /dev/null
+++ b/current/2024-09-20 3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Lukas Höllein
+title: '3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12892
+summary: We developed a new method called 3DGS-LM, which speeds up the reconstruction of 3D Gaussian Splatting (3DGS) by replacing its ADAM optimizer with a tailored Levenberg-Marquardt (LM). This method is 30% faster than the original 3DGS while maintaining the same reconstruction quality....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion.yaml b/current/2024-09-20 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion.yaml
new file mode 100644
index 00000000..e95ef48b
--- /dev/null
+++ b/current/2024-09-20 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Zhaoxi Chen
+title: '3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12957
+summary: 3DTopia-XL is a new 3D asset generator that uses a special way of representing 3D shapes (PrimX) and a special kind of machine learning model (Diffusion Transformer) to create high-quality 3D objects with detailed textures and materials. It's faster and better than other methods, making it great for industries that need lots of 3D content....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests.yaml b/current/2024-09-20 B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests.yaml
new file mode 100644
index 00000000..c3068de0
--- /dev/null
+++ b/current/2024-09-20 B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Mouxiang Chen
+title: 'B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.08692
+summary: We propose an optimal strategy (B4) to select the best code solution from multiple generated ones using plausible tests. B4 outperforms existing heuristics in selecting code solutions generated by large language models (LLMs) with LLM-generated tests, achieving a relative performance improvement by up to 50% over the strongest heuristic and 246% over the random selection in the most challenging scenarios....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 CLAIR-A: Leveraging Large Language Models to Judge Audio Captions.yaml b/current/2024-09-20 CLAIR-A: Leveraging Large Language Models to Judge Audio Captions.yaml
new file mode 100644
index 00000000..d2fb2858
--- /dev/null
+++ b/current/2024-09-20 CLAIR-A: Leveraging Large Language Models to Judge Audio Captions.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Tsung-Han Wu
+title: 'CLAIR-A: Leveraging Large Language Models to Judge Audio Captions'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12962
+summary: The paper introduces CLAIR-A, a method that uses large language models to evaluate audio captions. It performs better than traditional metrics and provides more transparency by allowing the language model to explain its scores....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation.yaml b/current/2024-09-20 Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation.yaml
new file mode 100644
index 00000000..2855989a
--- /dev/null
+++ b/current/2024-09-20 Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Chenyu Wang
+title: 'Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12532
+summary: The paper proposes a new method called Diffusion Reuse MOtion (Dr. Mo) to generate video frames using diffusion-based models. Dr. Mo reduces the computational cost of video generation by reusing noises from earlier denoising steps and incorporating lightweight inter-frame motions. A meta-network called Denoising Step Selector (DSS) is used to determine the optimal intermediate steps for each video frame, balancing efficiency and quality....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 FlexiTex: Enhancing Texture Generation with Visual Guidance.yaml b/current/2024-09-20 FlexiTex: Enhancing Texture Generation with Visual Guidance.yaml
new file mode 100644
index 00000000..3746198d
--- /dev/null
+++ b/current/2024-09-20 FlexiTex: Enhancing Texture Generation with Visual Guidance.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: DaDong Jiang
+title: 'FlexiTex: Enhancing Texture Generation with Visual Guidance'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12431
+summary: FlexiTex is a new texture generation method that uses visual guidance to improve the quality of generated textures. It uses a Visual Guidance Enhancement module to incorporate more specific information from the visual guidance and a Direction-Aware Adaptation module to automatically design direction prompts based on different camera poses. This results in improved texture generation for real-world applications....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning.yaml b/current/2024-09-20 InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning.yaml
new file mode 100644
index 00000000..ed3dd4e8
--- /dev/null
+++ b/current/2024-09-20 InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Xiaotian Han
+title: 'InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12568
+summary: InfiMM-WebMath-40B is a dataset of interleaved image-text documents that enhances mathematical reasoning in Large Language Models (LLMs). It has 24 million web pages, 85 million image URLs, and 40 billion text tokens. InfiMM-WebMath-40B improves performance on text-only and multimodal math benchmarks compared to other open-source models....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 LVCD: Reference-based Lineart Video Colorization with Diffusion Models.yaml b/current/2024-09-20 LVCD: Reference-based Lineart Video Colorization with Diffusion Models.yaml
new file mode 100644
index 00000000..8f453ca4
--- /dev/null
+++ b/current/2024-09-20 LVCD: Reference-based Lineart Video Colorization with Diffusion Models.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Zhitong Huang
+title: 'LVCD: Reference-based Lineart Video Colorization with Diffusion Models'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12960
+summary: The paper introduces a new method for colorizing lineart videos called LVCD. It uses a large-scale pretrained video diffusion model to generate more temporally consistent results and is better equipped to handle large motions. The method includes Sketch-guided ControlNet, Reference Attention, and a novel scheme for sequential sampling. LVCD outperforms previous techniques in terms of frame and video quality, and temporal consistency, and is capable of generating high-quality, long, temporally-co...
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 Language Models Learn to Mislead Humans via RLHF.yaml b/current/2024-09-20 Language Models Learn to Mislead Humans via RLHF.yaml
new file mode 100644
index 00000000..61d32afe
--- /dev/null
+++ b/current/2024-09-20 Language Models Learn to Mislead Humans via RLHF.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Jiaxin Wen
+title: Language Models Learn to Mislead Humans via RLHF
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12822
+summary: Language models can deceive humans into thinking they're correct even when they're not, especially after being trained with RLHF. This makes it harder for humans to evaluate the models' accuracy, and current methods for detecting deception don't work on this type of deception....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.yaml b/current/2024-09-20 MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.yaml
new file mode 100644
index 00000000..d7311384
--- /dev/null
+++ b/current/2024-09-20 MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Dongzhi Jiang
+title: 'MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12959
+summary: The paper introduces MMSearch-Engine, a pipeline that enables large multimodal models (LMMs) to perform multimodal search tasks. They also introduce MMSearch, a benchmark to evaluate the performance of LMMs in multimodal search. The best results were achieved with GPT-4o, which outperformed a commercial product in an end-to-end task. Error analysis and ablation studies are also conducted to understand the limitations and potential of LMMs in multimodal search....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions.yaml b/current/2024-09-20 MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions.yaml
new file mode 100644
index 00000000..a8d44be1
--- /dev/null
+++ b/current/2024-09-20 MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Abdullatif Köksal
+title: 'MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12958
+summary: This paper presents a new method, MURI, to create high-quality instruction tuning datasets for low-resource languages without human annotators. It generates instruction-output pairs from existing texts in these languages and ensures cultural relevance. The resulting dataset, MURI-IT, includes over 2 million pairs across 200 languages, and experiments show its effectiveness for both understanding and generating text. The datasets and models are available at https://github.com/akoksal/muri....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution.yaml b/current/2024-09-20 Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution.yaml
new file mode 100644
index 00000000..cbd922ff
--- /dev/null
+++ b/current/2024-09-20 Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Zuyan Liu
+title: 'Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12961
+summary: Oryx MLLM is a new architecture that can process visual data of any size or length more efficiently than existing methods, by using a special model to convert images to a format that can be understood by machines, and a tool that can compress the data if needed. This allows it to handle long videos or detailed images without losing important information, and it can also understand 3D scenes. The code for this is available online....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization.yaml b/current/2024-09-20 Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization.yaml
new file mode 100644
index 00000000..d8395e50
--- /dev/null
+++ b/current/2024-09-20 Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Mohammad Samragh
+title: 'Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12903
+summary: This paper proposes a method called HyperCloning to initialize large language models using smaller pre-trained models. The larger model retains the functionality of the smaller model and inherits its predictive power and accuracy before training starts. This method significantly reduces the GPU hours required for pre-training large language models....
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation.yaml b/current/2024-09-20 StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation.yaml
new file mode 100644
index 00000000..e5546c5a
--- /dev/null
+++ b/current/2024-09-20 StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Zhengguang Zhou
+title: 'StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12576
+summary: StoryMaker is a new method that makes sure characters in images generated from text look consistent in terms of faces, clothes, hair, and bodies, helping to create a cohesive story. It uses a special way to combine facial information and image information, and prevents characters from mixing with the background. It also trains the image-making system to be good at poses and uses a technique called LoRA to make the images better. Tests show that StoryMaker works well and can be used for many thin...
+opinion: placeholder
+tags:
+ - ML
diff --git a/current/2024-09-20 Training Language Models to Self-Correct via Reinforcement Learning.yaml b/current/2024-09-20 Training Language Models to Self-Correct via Reinforcement Learning.yaml
new file mode 100644
index 00000000..8aecb002
--- /dev/null
+++ b/current/2024-09-20 Training Language Models to Self-Correct via Reinforcement Learning.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-20"
+author: Aviral Kumar
+title: Training Language Models to Self-Correct via Reinforcement Learning
+thumbnail: ""
+link: https://huggingface.co/papers/2409.12917
+summary: This paper presents a novel approach, SCoRe, to enhance the self-correction ability of large language models (LLMs) using reinforcement learning. SCoRe addresses the limitations of previous methods by training the model under its own distribution of self-generated correction traces and using appropriate regularization to ensure effective self-correction at test time. The approach improves the base models' self-correction by 15.6% and 9.1% respectively on the MATH and HumanEval benchmarks....
+opinion: placeholder
+tags:
+ - ML