2026-05-07

17 items

Paper deep dive Hugging Face 2026-05-05 HF ↑18

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the a...

#agent #multimodal #rl #benchmark

Paper deep dive Hugging Face 2026-05-05 HF ↑18

D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

The landscape of high-performance image generation models is currently shifting from inefficient multi-step models to efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for direct continuous supervised fine-tuning. For ...

#diffusion #fine-tuning #llm #multimodal #vision

Industry news deep dive OpenAI 2026-05-07

Parloa builds service agents customers want to talk to

Parloa leverages OpenAI models to power scalable, voice-driven AI customer service agents, enabling enterprises to design, simulate, and deploy reliable, real-time interactions....

#agent#speech

Paper Hugging Face 2026-05-05 HF ↑28

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World

Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in fu...

#robotics #diffusion #multimodal #agent

Paper Hugging Face 2026-05-05 HF ↑86

Stream-T1: Test-Time Scaling for Streaming Video Generation

While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate exploration costs and lack temporal guidance. To address these structura...

#speech #diffusion #benchmark

Model OpenAI 2026-05-07

Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber

OpenAI expands Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber, helping verified defenders accelerate vulnerability research and protect critical infrastructure....

Model OpenAI 2026-05-07

Advancing voice intelligence with new models in the API

Explore new realtime voice models in the OpenAI API that can reason, translate, and transcribe speech, enabling more natural and intelligent voice experiences....

#speech

Industry news OpenAI 2026-05-07

Testing ads in ChatGPT

OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control....

Paper Hugging Face 2026-05-05 HF ↑8

StableI2I: Spotting Unintended Changes in Image-to-Image Transition

In most real-world image-to-image (I2I) scenarios, existing evaluations primarily focus on instruction following and the perceptual quality or aesthetics of the generated images. However, they largely fail to assess whether the output image preserves the semantic correspondence and spatial structure...

#benchmark #llm

Paper Hugging Face 2026-05-05 HF ↑1

When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the first task-relevant content, while naive early streaming risks premature commitments th...

#llm #rl #benchmark

Paper Hugging Face 2026-05-05 HF ↑7

Lightning Unified Video Editing via In-Context Sparse Attention

Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing....

Model OpenAI 2026-05-07

Introducing Trusted Contact in ChatGPT

Introducing Trusted Contact in ChatGPT, an optional safety feature that notifies someone you trust if serious self-harm concerns are detected....

#alignment

Industry news OpenAI 2026-05-06

Uber uses OpenAI to help people earn smarter and book faster

Uber uses OpenAI to power AI assistants and voice features that help drivers earn smarter and riders book faster across a global real-time marketplace....

#speech

Industry news OpenAI 2026-05-06

How frontier firms are pulling ahead

OpenAI’s B2B Signals research shows how frontier enterprises deepen AI adoption, scale Codex-powered agentic workflows, and build durable competitive advantage....

#agent

Model OpenAI 2026-05-06

Introducing ChatGPT Futures: Class of 2026

Meet the ChatGPT Futures Class of 2026—26 student innovators using AI to build, research, and drive real-world impact. Discover how this generation is redefining learning, creativity, and opportunity with ChatGPT....

Paper deep dive arXiv 2026-05-06

Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every si...

#benchmark #coding

Paper deep dive arXiv 2026-05-06

Transformed Latent Variable Multi-Output Gaussian Processes

Multi-Output Gaussian Processes (MOGPs) provide a principled probabilistic framework for modelling correlated outputs but face scalability bottlenecks when applied to datasets with high-dimensional output spaces. To maintain tractability, existing methods typically resort to restrictive assumptions,...

#benchmark