2026-05-13

20 items

Paper Deep Dive Hugging Face 2026-05-11 HF ↑12

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

Tool-using LLM agents fail through trajectories rather than only final responses, as they may execute unsafe tool calls, follow injected instructions, comply with harmful requests, or over-refuse benign tasks despite producing a seemingly safe answer. Existing safety-alignment signals are largely re...

#agent #alignment #llm

Paper Deep Dive Hugging Face 2026-05-11 HF ↑81

δ-mem: Efficient Online Memory for Large Language Models

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose δ-mem, a lightweight memory mechanism that augments a fr...

#llm #agent #fine-tuning #benchmark

Paper Deep Dive Hugging Face 2026-05-11 HF ↑116

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Recent large vision-language models (VLMs) remain fundamentally constrained by a persistent dichotomy: understanding and generation are treated as distinct problems, leading to fragmented architectures, cascaded pipelines, and misaligned representation spaces. We argue that this divide is not merely...

#multimodal #agent

Paper Hugging Face 2026-05-11 HF ↑11

Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

Asynchronous reinforcement learning improves rollout throughput for large language model agents by decoupling sample generation from policy optimization, but it also introduces a critical failure mode for PPO-style off-policy correction. In heterogeneous training systems, the total importance ratio ...

#agent #llm #rl

Paper Hugging Face 2026-05-11 HF ↑43

World Action Models: The Next Frontier in Embodied AI

Vision-Language-Action (VLA) models have achieved strong semantic generalization for embodied policy learning, yet they learn reactive observation-to-action mappings without explicitly modeling how the physical world evolves under intervention. A growing body of work addresses this limitation by int...

#coding #robotics #benchmark

Paper Hugging Face 2026-05-11 HF ↑23

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilities without an additional cold-start stage. Our approach unlocks the model's intrinsic potential to...

#multimodal #llm #diffusion #agent #alignment

Paper Hugging Face 2026-05-11 HF ↑21

L2P: Unlocking Latent Potential for Pixel Generation

Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directl...

#diffusion #benchmark

Paper Hugging Face 2026-05-11 HF ↑4

Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

The continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding or computer use applications. However, the core of these systems has not changed much since early instruction-tuned models like ChatGPT. Even advanced AI ...

#agent #llm #coding

Paper Hugging Face 2026-05-11 HF ↑6

MEME: Multi-entity & Evolving Memory Evaluation

LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, ...

#llm #agent #benchmark

Paper Hugging Face 2026-05-11 HF ↑22

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

Computer Use Agents (CUAs) can act through both atomic GUI actions, such as click and type, and high-level tool calls, such as API-based file operations, but this hybrid action space often leaves them uncertain about when to continue with GUI actions or switch to tools, leading to suboptimal executi...

#agent #rl

Company News OpenAI 2026-05-13

Building a safe, effective sandbox to enable Codex on Windows

Learn how OpenAI built a secure sandbox for Codex on Windows, enabling safe, efficient coding agents with controlled file access and network restrictions....

#agent #coding

Company News Microsoft Research 2026-05-13

mimalloc: A new, high-performance, scalable memory allocator for the modern era

mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projects. It provides bounded worst-case allocation times (up to OS...

Paper Deep Dive arXiv 2026-05-12

Model-based Bootstrap of Controlled Markov Chains

We propose and analyze a model-based bootstrap for transition kernels in finite controlled Markov chains (CMCs) with possibly nonstationary or history-dependent control policies, a setting that arises naturally in offline reinforcement learning (RL) when the behavior policy generating the data is un...

#llm #rl #benchmark

Company News NVIDIA 2026-05-13

NVIDIA, Ineffable Intelligence Team Up to Build the Future of Reinforcement Learning Infrastructure

Reinforcement-learning agents — AI systems that learn by trial and error — can convert computation into new knowledge. That’s the focus of a new engineering-level collaboration between NVIDIA and Ineffable Intelligence, the London-based AI lab founded by AlphaGo architect David Silver in the wake of...

#rl #agent

Model Microsoft Research 2026-05-13

GridSFM: A new, small foundation model for the electric grid

Introducing GridSFM, a small foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings. Learn how GridSFM gives grid operators direct visibility into congestion, stability, and system health.

Company News Microsoft Research 2026-05-12

Advancing AI for materials with MatterSim: experimental synthesis, faster simulation, and multi-task models

MatterSim is expanding what AI can do for materials science, from faster large-scale simulations to MatterSim-MT, a new multi-task model for simulating properties beyond potential energy surfaces alone.

Tool NVIDIA 2026-05-13

Hermes Unlocks Self-Improving AI Agents, Powered by NVIDIA RTX PCs and DGX Spark

Agentic AI is changing the way users get work done. Following the success of OpenClaw, the community is embracing new open source agentic frameworks. The latest is Hermes Agent, which crossed 140,000 GitHub stars in under three months....

#agent

Paper arXiv 2026-05-12

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

#multimodal #llm #diffusion #agent #alignment

Paper arXiv 2026-05-12

Learning, Fast and Slow: Towards LLMs That Adapt Continually

Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed LLM...

#llm #rl

Paper arXiv 2026-05-12

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

Recent advances in joint audio-video generation have been remarkable, yet real-world applications demand strong per-modality fidelity, cross-modal alignment, and fine-grained synchronization. Reinforcement Learning (RL) offers a promising paradigm, but its extension to multi-objective and multi-moda...

#alignment #diffusion #rl #multimodal #fine-tuning