Automated report

deep-diver · Sep 13, 2024 · 0177d62
1 parent e892bbb
commit 0177d62

Showing 10 changed files with 90 additions and 0 deletions.
diff --git a/...s Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers.yaml b/...s Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-13"
+author: Chenglei Si
+title: Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
+thumbnail: ""
+link: https://huggingface.co/papers/2409.04109
+summary: This paper presents a large-scale human study evaluating the ability of large language models (LLMs) to generate novel research ideas in natural language processing (NLP). The study compares LLM-generated ideas with those of human experts and finds that LLM-generated ideas are judged as more novel but slightly weaker on feasibility. The study also identifies open problems in building and evaluating research agents and proposes an end-to-end study design to further evaluate the research outcome o...
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-09-13 Can OOD Object Detectors Learn from Foundation Models?.yaml b/current/2024-09-13 Can OOD Object Detectors Learn from Foundation Models?.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-13"
+author: Jiahui Liu
+title: Can OOD Object Detectors Learn from Foundation Models?
+thumbnail: ""
+link: https://huggingface.co/papers/2409.05162
+summary: The paper explores the use of generative models trained on large-scale open-set data to synthesize out-of-distribution (OOD) samples and enhance OOD object detection. The researchers introduce SyncOOD, a data curation method that leverages the capabilities of large foundation models to extract meaningful OOD data from text-to-image generative models. The synthetic OOD samples are then used to augment the training of a lightweight OOD detector, resulting in improved ID/OOD decision boundaries. Ex...
+opinion: placeholder
+tags:
+    - ML
diff --git a/...024-09-13 DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?.yaml b/...024-09-13 DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-13"
+author: Liqiang Jing
+title: 'DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.07703
+summary: DSBench is a benchmark for evaluating data science agents' performance in realistic tasks. It includes 466 data analysis tasks and 74 data modeling tasks, and state-of-the-art models struggle with most tasks, underscoring the need for further advancements in developing more practical, intelligent, and autonomous data science agents....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...HOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors.yaml b/...HOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-13"
+author: Thomas Hanwen Zhu
+title: 'DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.08278
+summary: DreamHOI is a new way to make 3D people interact with any object described in text, using a computer model that learns from lots of pictures and their descriptions. It uses a special kind of computer model called a diffusion model, and a new way to move a 3D person's body parts, to make the interaction look more realistic....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...ent/2024-09-13 FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally.yaml b/...ent/2024-09-13 FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-13"
+author: Qiuhong Shen
+title: 'FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.08270
+summary: This study introduces a method for accurately segmenting 3D Gaussian Splatting from 2D masks, which is faster and more accurate than existing methods. The key insight is that the rendering of 2D masks is a linear function with respect to the labels of each Gaussian, allowing for a globally optimal solution to be found via linear programming in closed form. This method is robust against noise and completes within 30 seconds, about 50 times faster than the best existing methods....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...2024-09-13 IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation.yaml b/...2024-09-13 IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-13"
+author: Yinwei Wu
+title: 'IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.08240
+summary: This paper introduces the IFAdapter, a method to improve the accuracy and fidelity of generated instances in Text-to-Image diffusion models. The IFAdapter incorporates additional appearance tokens and an Instance Semantic Map to guide the diffusion process and improve feature depiction. The paper also introduces a new benchmark and verification pipeline for evaluating the performance of models on the Instance Feature Generation task....
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-09-13 PiTe: Pixel-Temporal Alignment for Large Video-Language Model.yaml b/current/2024-09-13 PiTe: Pixel-Temporal Alignment for Large Video-Language Model.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-13"
+author: Yang Liu
+title: 'PiTe: Pixel-Temporal Alignment for Large Video-Language Model'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.07239
+summary: Fueled by the Large Language Models (LLMs) wave, Large Visual-Language Models (LVLMs) have emerged as a pivotal advancement, bridging the gap between image and text. However, video makes it challenging for LVLMs to perform adequately due to the complexity of the relationship between language and spatial-temporal data structure. Recent Large Video-Language Models (LVidLMs) align features of static visual data like images into the latent space of language features, by general multi-modal tasks to levera...
+opinion: placeholder
+tags:
+    - ML
diff --git a/...3 Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources.yaml b/...3 Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-13"
+author: Alisia Lupidi
+title: 'Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.08239
+summary: Source2Synth is a method that generates synthetic data points with real-world sources to improve dataset quality and teach LLMs new skills without relying on costly human annotations. This method improves performance in multi-hop question answering and tabular question answering tasks....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...owards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder.yaml b/...owards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-13"
+author: NaHyeon Park
+title: 'TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.08248
+summary: This paper proposes a method to improve the performance of text-to-image models by fine-tuning the text encoder and introducing techniques to enhance personalization, such as augmentation tokens, knowledge-preservation loss, and SNR-weighted sampling. The goal is to generate high-quality, diverse images using only a single reference image while reducing memory and storage requirements....
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-09-13 Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale.yaml b/current/2024-09-13 Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale.yaml
@@ -0,0 +1,9 @@
+date: "2024-09-13"
+author: Rogerio Bonatti
+title: 'Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.08264
+summary: The paper introduces Windows Agent Arena, a benchmark for evaluating multi-modal OS agents in a realistic Windows environment. It includes 150+ diverse tasks and can be evaluated in 20 minutes. The paper also introduces a new agent, Navi, which achieves a success rate of 19.5% in the Windows domain....
+opinion: placeholder
+tags:
+    - ML
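
Review note: all ten added files share the same nine-line shape (`date`, `author`, `title`, `thumbnail`, `link`, `summary`, `opinion`, `tags` plus one tag entry), and every one still has `thumbnail: ""` and `opinion: placeholder`. Below is a minimal sketch of a check that could flag such records. The required-key list and the `https://huggingface.co/papers/` link prefix are assumptions inferred from the ten files in this diff, not a documented schema, and the function names (`added_lines`, `top_level_fields`, `check_record`) are hypothetical. It uses only the standard library and reads top-level `key: value` lines rather than parsing YAML in full.

```python
# Assumed schema, inferred from the ten files in this commit (not documented anywhere).
REQUIRED_KEYS = ["date", "author", "title", "thumbnail",
                 "link", "summary", "opinion", "tags"]
LINK_PREFIX = "https://huggingface.co/papers/"  # assumption: all links share this prefix


def added_lines(hunk: str) -> list[str]:
    """Return the content of the '+' lines of a hunk, without the leading '+'."""
    return [line[1:] for line in hunk.splitlines()
            if line.startswith("+") and not line.startswith("+++")]


def top_level_fields(lines: list[str]) -> dict[str, str]:
    """Map each unindented 'key: value' line to its raw (still quoted) value."""
    fields = {}
    for line in lines:
        if line and not line[0].isspace() and ":" in line:
            key, value = line.split(":", 1)  # split on the first colon only
            fields[key] = value.strip()
    return fields


def check_record(lines: list[str]) -> list[str]:
    """Return a list of human-readable problems found in one record."""
    fields = top_level_fields(lines)
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in fields]
    if fields.get("thumbnail") in ('""', "''", ""):
        problems.append("thumbnail is empty")
    if fields.get("opinion") == "placeholder":
        problems.append("opinion is still a placeholder")
    if "link" in fields and not fields["link"].startswith(LINK_PREFIX):
        problems.append("link does not point at a papers page")
    return problems


# The last hunk of this diff, copied verbatim.
HUNK = """\
+date: "2024-09-13"
+author: Rogerio Bonatti
+title: 'Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale'
+thumbnail: ""
+link: https://huggingface.co/papers/2409.08264
+summary: The paper introduces Windows Agent Arena, a benchmark for evaluating multi-modal OS agents in a realistic Windows environment. It includes 150+ diverse tasks and can be evaluated in 20 minutes. The paper also introduces a new agent, Navi, which achieves a success rate of 19.5% in the Windows domain....
+opinion: placeholder
+tags:
+    - ML
"""

if __name__ == "__main__":
    for problem in check_record(added_lines(HUNK)):
        print(problem)
```

Run against the last hunk, the sketch reports the empty thumbnail and the placeholder opinion and nothing else. A real YAML parser would be the safer choice if the records ever gain nested or multi-line values.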