2026-05-13

20件

← アーカイブ一覧

論文 深掘り Hugging Face 2026-05-11 HF ↑12

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

Tool-using LLM agents fail through trajectories rather than only final responses, as they may execute unsafe tool calls, follow injected instructions, comply with harmful requests, or over-refuse benign tasks despite producing a seemingly safe answer. Existing safety-alignment signals are largely re...

#agent#alignment#llm
論文 深掘り Hugging Face 2026-05-11 HF ↑81

δ-mem: Efficient Online Memory for Large Language Models

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose δ-mem, a lightweight memory mechanism that augments a fr...

#llm#agent#fine-tuning#benchmark
論文 深掘り Hugging Face 2026-05-11 HF ↑116

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Recent large vision-language models (VLMs) remain fundamentally constrained by a persistent dichotomy: understanding and generation are treated as distinct problems, leading to fragmented architectures, cascaded pipelines, and misaligned representation spaces. We argue that this divide is not merely...

#multimodal#agent
論文 Hugging Face 2026-05-11 HF ↑43

World Action Models: The Next Frontier in Embodied AI

Vision-Language-Action (VLA) models have achieved strong semantic generalization for embodied policy learning, yet they learn reactive observation-to-action mappings without explicitly modeling how the physical world evolves under intervention. A growing body of work addresses this limitation by int...

#coding#robotics#benchmark
論文 Hugging Face 2026-05-11 HF ↑21

L2P: Unlocking Latent Potential for Pixel Generation

Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directl...

#diffusion#benchmark
論文 Hugging Face 2026-05-11 HF ↑6

MEME: Multi-entity & Evolving Memory Evaluation

LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, ...

#llm#agent#benchmark
論文 Hugging Face 2026-05-11 HF ↑22

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

Computer Use Agents (CUAs) can act through both atomic GUI actions, such as click and type, and high-level tool calls, such as API-based file operations, but this hybrid action space often leaves them uncertain about when to continue with GUI actions or switch to tools, leading to suboptimal executi...

#agent#rl
企業動向 Microsoft Research 2026-05-13

mimalloc: A new, high-performance, scalable memory allocator for the modern era

mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projects. It provides bounded worst-case allocation times (up to OS...

論文 深掘り arXiv 2026-05-12

Model-based Bootstrap of Controlled Markov Chains

We propose and analyze a model-based bootstrap for transition kernels in finite controlled Markov chains (CMCs) with possibly nonstationary or history-dependent control policies, a setting that arises naturally in offline reinforcement learning (RL) when the behavior policy generating the data is un...

#llm#rl#benchmark
モデル Microsoft Research 2026-05-13

GridSFM: A new, small foundation model for the electric grid

Introducing GridSFM, a small foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings. Learn how GridSFM gives grid operators direct visibility into congestion, stability, and system health. The post GridSFM: A new, small foundation mode...

論文 arXiv 2026-05-12

Learning, Fast and Slow: Towards LLMs That Adapt Continually

Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed LLM...

#llm#rl
論文 arXiv 2026-05-12

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

Recent advances in joint audio-video generation have been remarkable, yet real-world applications demand strong per-modality fidelity, cross-modal alignment, and fine-grained synchronization. Reinforcement Learning (RL) offers a promising paradigm, but its extension to multi-objective and multi-moda...

#alignment#diffusion#rl#multimodal#fine-tuning