2026-05-28

20件

論文 Hugging Face 2026-05-26 HF ↑34

ResearchMath-14K: Scaling Research-Level Mathematics via Agents

The frontier of mathematics is defined by problems whose solutions are not yet known, yet it remains unclear whether language models can meaningfully engage with such problems without human intervention. A major obstacle is the lack of large-scale research-level math datasets. To this end, we introd...

#agent#fine-tuning

論文深掘り Hugging Face 2026-05-26 HF ↑32

DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

Reinforcement learning has become a central paradigm for advancing reasoning in large language models, yet most existing methods still depend on stronger teacher models or heavily curated difficult datasets, limiting scalable capability improvement. In this paper, we introduce DenoiseRL, a reinforce...

#llm#rl#benchmark

論文深掘り Hugging Face 2026-05-26 HF ↑32

MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems

Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and difficult to debug. Tracing memory's dynamic evolution is crucial to understand how information is synthesized, propagated, or corrupted over time. In this work...

#llm#alignment#benchmark

論文深掘り Hugging Face 2026-05-26 HF ↑63

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

Vision-language models with extended reasoning succeed on complex problems, but many real-world problems require external tools that internal reasoning alone often cannot resolve. Agentic reasoning therefore interleaves two behaviors with a structural asymmetry: thinking (the self-contained default)...

#agent#multimodal#rl#benchmark

論文 Hugging Face 2026-05-26 HF ↑97

Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

World models for interactive video generation have largely focused on single-agent settings, where future observations are generated from a single control signal. However, many generated environments require multi-agent interaction: multiple players, robots, or embodied agents act simultaneously wit...

#agent#diffusion#coding#robotics

論文 Hugging Face 2026-05-26 HF ↑52

From Pixels to Words -- Towards Native One-Vision Models at Scale

Current vision-language models (VLMs) typically stitch together separate image encoders and language decoders via multi-stage alignment, a modular framework that inevitably fragments pixel-level signals across frames and scatters early pixel-word interactions. In parallel, native VLMs, despite impre...

#multimodal#alignment

論文 Hugging Face 2026-05-26 HF ↑4

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

Chain-of-thought (CoT) monitoring has been proposed as a promising safety mechanism for detecting misaligned behavior in large language models. However, its reliability remains largely unexplored beyond English and across diverse model families. We present the first large-scale evaluation of CoT mon...

#alignment#benchmark#llm

論文 Hugging Face 2026-05-26 HF ↑72

ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation

Proactive Recommender Systems (PRSs) aim to guide user preference shift toward target items by generating paths of intermediate recommendations. Reinforcement learning (RL) provides a principled framework for optimizing such sequential decision tasks, as path rewards can naturally capture both short...

#rl

論文 Hugging Face 2026-05-26 HF ↑11

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

Are LLM-based search agents genuinely searching, or using the web to verify what they already know? We study this question on BrowseComp with three diagnostics. Our analysis reveals Intrinsic Knowledge Dependence (IKD): even with tool access, agents often rely on intrinsic knowledge -- information e...

#agent#benchmark#llm

論文 Hugging Face 2026-05-26 HF ↑38

Self-Improving Language Models with Bidirectional Evolutionary Search

Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse veri...

#agent#robotics#benchmark

ツール OpenAI 2026-05-28

OpenAI’s Frontier Governance Framework

Explore OpenAI’s Frontier Governance Framework and how our AI safety, security, and risk practices align with emerging EU and California regulations....

#alignment

モデル OpenAI 2026-05-27

Warp’s big bet on building open source with GPT-5.5

Warp uses GPT-5.5 and OpenAI models to coordinate coding agents across local, cloud, and open-source development workflows....

#agent#coding

企業動向 Microsoft Research 2026-05-28

Data Formulator 0.7: AI-powered data analytics for enterprise data

Data Formulator introduces AI-powered analytics for enterprise data workflows. Data teams can easily bring enterprise data into an AI-ready workspace where users can explore, analyze, and visualize data with AI agents to turn raw data into actionable insights. The post Data Formulator 0.7: AI-powere...

#agent

企業動向 Google Research 2026-05-28

A New Era of Innovation: Google Research at I/O 2026

General Science...

企業動向 OpenAI 2026-05-27

Cisco and OpenAI redefine enterprise engineering with Codex

Cisco and OpenAI are redefining enterprise engineering with Codex, helping Cisco scale AI-native development, accelerate AI Defense work, and automate defect remediation....

企業動向 OpenAI 2026-05-27

Building self-improving tax agents with Codex

See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating workflows....

#agent

論文深掘り arXiv 2026-05-27

Principled Algorithms for Optimizing Generalized Metrics in Multi-Label Learning

Many real-world classification tasks require predicting multiple labels per instance, necessitating the optimization of complex evaluation metrics such as the $F$-measure and Jaccard index. While the Empirical Utility Maximization (EUM) framework is natural for these population-level metrics, existi...

#benchmark

論文深掘り arXiv 2026-05-27

SwarmHarness: Skill-Based Task Routing via Decentralized Incentive-Aligned AI Agent Networks

Vast quantities of compute (GPU cycles on personal workstations, idle inference servers, and edge devices between jobs) go unused because no incentive-aligned protocol exists for their owners to share them safely and profitably. Existing approaches either require a trusted central coordinator (cloud...

#agent

論文深掘り arXiv 2026-05-27

Stance Detection in Prediction Markets: Addressing Imbalanced Trader Commentary via Counterfactual Augmentation and Market Context

Prediction markets such as Polymarket aggregate crowd beliefs into real-time probability estimates, and the comments traders post beneath each market contain rich directional stance signals that prices alone cannot capture. This work introduces the first stance detection study applied to prediction ...

#llm#fine-tuning

企業動向 Microsoft Research 2026-05-27

Extending Human Intelligence Through AI

Understanding AI as an extension of human intelligence—not a replacement for it—offers a more grounded path for building trustworthy AI systems. The post Extending Human Intelligence Through AI appeared first on Microsoft Research ....