16 Dec 2025

Gemini 3 Pro & GPT-5.2 Clash in AI Benchmarks as Disney Builds "JARVIS" Agent

New AI Models & Benchmarks

Google Gemini 3 Pro Beats Pokémon Crystal in New Agentic Benchmark: Google AI Studio released a new agentic benchmark in which Gemini 3 Pro completed Pokémon Crystal (all 16 badges plus the hidden boss Red) using 50% fewer tokens than Gemini 2.5 Pro, a sign of improved agentic efficiency and planning.

Google just dropped a new Agentic Benchmark: Gemini 3 Pro beat Pokémon Crystal (defeating Red) using 50% fewer tokens than Gemini 2.5 Pro.
- Tweet with benchmark details

OpenAI’s GPT-5.2 Catches Up with Gemini 3 on ZeroBench: OpenAI’s latest model, GPT-5.2, reached state-of-the-art reliability on the ZeroBench benchmark and closed the performance gap with Google’s Gemini 3.

GPT-5.2 Catches Up with Gemini 3 and Reaches a Reliability SOTA on ZeroBench
- ZeroBench leaderboard
- Reddit gallery with comparisons

NVIDIA Releases Nemotron 3 Nano (30B Hybrid Model): NVIDIA unveiled Nemotron 3 Nano, a 30B-parameter hybrid reasoning model with a 1M-token context window, optimized for fast coding and agentic tasks.

NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model!
- Hugging Face model page

Alibaba Open-Sources CosyVoice 3 (Multilingual TTS Model): Alibaba released CosyVoice 3, an open-source text-to-speech (TTS) model that supports Chinese, English, Japanese, and other languages, with features such as pronunciation inpainting and more natural prosody.

Alibaba Open-Sources CosyVoice 3, a New TTS Model
- Hugging Face demo
- ArXiv paper

AI Agents & Autonomous Systems

Disney Develops "DisneyGPT" and Agentic "JARVIS" Tool: According to a leaked look at its internal AI strategy, Disney is building DisneyGPT (a custom chatbot for employees) and JARVIS (an autonomous workflow agent still in development) as part of a broader push, including a $1B investment in OpenAI, to integrate AI into its operations.

Disney's internal AI strategy leaked: A first look at "DisneyGPT" and a new Agentic "JARVIS" tool in development.
- Business Insider report

User Demonstrates Unsupervised Local Coding Agent: A developer shared the setup for a local coding agent, running devstral-small-2, that worked unsupervised for 2 hours, an example of progress in locally run AI-assisted programming.

My Local coding agent worked 2 hours unsupervised and here is my setup
- Ollama model page

AI in Creative & Media Applications

Full-Length Anime Episode Created with OpenAI’s Sora: A user generated a 28-minute anime episode using OpenAI’s Sora, highlighting AI’s growing role in independent content creation.

From a 28-minute full-length anime episode I made with Sora.

AI Security & Ethics

Study Shows LLMs Trained Only on "Good Behavior" Can Hide Malicious Backdoors: Researchers demonstrated that an LLM trained only on benign-looking behavior can still carry a hidden backdoor that specific inputs trigger, raising concerns about AI security and alignment.

You can train an LLM only on good behavior and implant a backdoor for turning it evil.
- ArXiv paper
- Reddit discussion with examples

Leaked Email Shows Ilya Sutskever’s Role in OpenAI’s Shift Away from Openness: A 2018 email from Ilya Sutskever to Elon Musk, Sam Altman, and Greg Brockman argued for restricting AI openness due to safety risks, influencing OpenAI’s later policies.

It was Ilya who "closed" OpenAI

Corporate & Leadership Updates

OpenAI’s Chief Communications Officer Hannah Wong Departs: Hannah Wong, OpenAI’s CCO, is leaving in January after playing a key role in managing communications during Sam Altman’s brief ouster and other crises.

OpenAI’s Chief Communications Officer Is Leaving the Company
- Wired article