2026-05-27

20 items

Paper deep dive Hugging Face 2026-05-25 HF ↑48

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

While spatial foundation models have demonstrated impressive performance on standard datasets, a critical question remains: are they truly all-round players capable of generalizing robustly across diverse downstream tasks, arbitrary viewpoints, shifting scene domains, varying input densities, and sp...

#alignment#robotics#benchmark

Model deep dive OpenAI 2026-05-27

Warp’s big bet on building open source with GPT-5.5

Warp uses GPT-5.5 and OpenAI models to coordinate coding agents across local, cloud, and open-source development workflows....

#agent#coding

Paper Hugging Face 2026-05-25 HF ↑4

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Large Language Models (LLMs) are increasingly deployed as autonomous agents that reason, use tools, and act over multiple steps. Yet most hallucination benchmarks still evaluate only the final output, missing failures that originate in intermediate Thought-Action-Observation steps. We present Trajel...

#agent#benchmark#llm

Paper Hugging Face 2026-05-25 HF ↑17

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

We introduce the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 contains 229.9B total parameters with only 9.8B activated per token. Designed end-to-end for agentic deploy...

#agent#coding#rl#benchmark

Paper Hugging Face 2026-05-25 HF ↑13

Recursive Flow Matching

Generative models have emerged as a powerful paradigm for solving physics systems and modeling complex spatiotemporal dynamics. However, achieving high physical accuracy without incurring high computational cost remains a fundamental challenge, as existing approaches face a critical speed-fidelity t...

#diffusion#benchmark

Paper Hugging Face 2026-05-25 HF ↑8

Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement

Agentic reinforcement learning (RL) has proven effective for training LLM-based agents with external tool-use capabilities. However, we identify that agentic RL training induces increasing redundant tool calls and blurs the model's intrinsic knowledge boundary, where the model fails to distinguish w...

#agent#rl#llm#benchmark

Paper Hugging Face 2026-05-25 HF ↑18

Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling

Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain branch-private and can...

#speech#llm#benchmark

Paper deep dive Hugging Face 2026-05-25 HF ↑7

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration in such settings increasingly depends on understanding the user beyond what is explicitly stated, as user intent is often reflected in fragmented daily interacti...

#agent#benchmark#llm

Paper deep dive Hugging Face 2026-05-25 HF ↑3

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini

We introduce Gemini Embedding 2, a native multimodal embedding model that allows embedding video, audio, image, and text modalities in a unified representation space. We leverage the multimodal capabilities of Gemini to produce embeddings for arbitrary combinations of interleaved inputs across all t...

#multimodal#benchmark

Paper Hugging Face 2026-05-25 HF ↑65

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry ...

#coding#multimodal#benchmark

Industry news OpenAI 2026-05-27

Cisco and OpenAI redefine enterprise engineering with Codex

Cisco and OpenAI are redefining enterprise engineering with Codex, helping Cisco scale AI-native development, accelerate AI Defense work, and automate defect remediation....

Industry news OpenAI 2026-05-27

Building self-improving tax agents with Codex

See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating workflows....

#agent

Paper Hugging Face 2026-05-25 HF ↑6

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evo...

#agent#benchmark#llm

Industry news Microsoft Research 2026-05-27

Extending Human Intelligence Through AI

Understanding AI as an extension of human intelligence, not a replacement for it, offers a more grounded path for building trustworthy AI systems.

Industry news OpenAI 2026-05-27

Election information and safeguards in 2026

Ahead of global elections, we’re helping people access information, supporting cyber defenders, and increasing AI transparency...

Industry news Hugging Face 2026-05-27

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

#agent#benchmark

Industry news Hugging Face 2026-05-27

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

Paper deep dive arXiv 2026-05-26

Deep-layer limit and stability analysis of the basic forward-backward-splitting induced network (II): learning problems

Deep unfolding neural networks derived from iterative optimization schemes and numerical ordinary/partial differential equations (ODEs/PDEs) have attracted much attention in data science over the last decade. Therein, numerous important network architectures were constructed from the basic forward-b...

Industry news NVIDIA 2026-05-27

AI Factories: The New Infrastructure of Intelligence

AI factories are token factories, converting power into intelligence in real time. And as agentic AI scales and autonomous, always-on special agents are deployed in the enterprise, performance per watt and cost per token become the economics that matter....

#agent

Industry news Google Research 2026-05-27

Private analytics via zero-trust aggregation

Security, Privacy and Abuse Prevention...