AI’s Next Phase Is Systems, Not Just Smarter Chatbots

Want to learn how to USE AI technology to make money and/or your life easier? Join our FREE AI community here: https://www.skool.com/ai-with-apex/about

Today’s AI news points to a broader shift: the industry is moving beyond one-shot answers and toward full systems that can reason, retrieve, act across software, and run long workflows with real infrastructure behind them.

The common thread is simple: AI is being judged less by how clever a prompt response sounds, and more by whether it can do useful work reliably, efficiently, and at scale.

TL;DR

OpenAI says agentic work is expanding from coding into legal, finance, recruiting, and other internal functions, with more users delegating tasks estimated to take 30 minutes to 8+ hours of human work.
Google Research argues that reasoning can improve factual recall, not just problem-solving, by giving models more compute time and better retrieval cues.
Google DeepMind has folded computer use directly into Gemini 3.5 Flash, making browser, mobile, and desktop actions a built-in model capability.
MIT and Microsoft’s Murakkab shows that agent workflows can be dramatically cheaper and more energy-efficient when models, tools, and execution are optimized together.
NVIDIA’s NeMo Automodel reflects a quieter infrastructure trend: fine-tuning and training stacks are becoming more modular and more tightly integrated with Hugging Face-style workflows.

OpenAI says agents are becoming real work infrastructure

What happened
OpenAI published a new look at how agents are changing work, arguing that the key shift is from short prompt-response exchanges to delegated tasks that can run for minutes or hours. The company frames this around Codex usage and says agentic workflows are spreading well beyond engineering.

Why it matters
This is one of the clearest signals yet that AI labs are measuring success in terms of task delegation, not just chat quality. It also suggests the center of gravity is moving from technical copilots toward cross-functional workplace systems.

Key details

OpenAI published “How agents are transforming work” on June 25, 2026. https://openai.com/index/how-agents-are-transforming-work
OpenAI says agentic AI changes knowledge work from short interactions to delegated, long-horizon tasks that can run for minutes or hours. https://openai.com/index/how-agents-are-transforming-work
By May 2026, OpenAI reports that 80.6% of sampled individual users had made at least one Codex request estimated to exceed 30 minutes of human work, 70.2% at least one exceeding one hour, and 25.6% at least one exceeding eight hours. https://openai.com/index/how-agents-are-transforming-work
OpenAI says Codex became the primary AI tool for every department internally, including Legal, Finance, and Recruiting, accounting for more than 85% of output tokens for the average OpenAI worker and 99.8% of weekly output tokens across OpenAI overall. https://openai.com/index/how-agents-are-transforming-work
Since August 2025, OpenAI says non-developer usage grew 137x for individual users, 189x for organizational users, and 12x within OpenAI. https://openai.com/index/how-agents-are-transforming-work
OpenAI notes that some human-time thresholds are estimated using an LLM-as-judge and should be treated as directional rather than definitive labor-market proof. https://openai.com/index/how-agents-are-transforming-work
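
Because the three thresholds are nested (a request that exceeds eight hours also exceeds one hour), the reported shares can be split into non-overlapping bands by simple subtraction. The sketch below assumes the figures describe each sampled user's longest estimated request; the band labels and the function name are ours, not OpenAI's.

```python
# Shares of sampled individual users with at least one Codex request
# estimated above each human-time threshold (as reported by OpenAI).
share_over = {"30 min": 80.6, "1 h": 70.2, "8 h": 25.6}


def exclusive_bands(over):
    """Convert nested 'at least one request above X' shares into disjoint bands."""
    return {
        "never above 30 min": round(100.0 - over["30 min"], 1),
        "30 min to 1 h": round(over["30 min"] - over["1 h"], 1),
        "1 h to 8 h": round(over["1 h"] - over["8 h"], 1),
        "above 8 h": round(over["8 h"], 1),
    }


bands = exclusive_bands(share_over)
for label, pct in bands.items():
    print(f"{label}: {pct}%")
```

Read this way, the largest single group is users whose longest request fell between one and eight hours. The same caveat applies as in the source: the underlying time estimates are directional.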

Source links
https://openai.com/index/how-agents-are-transforming-work
https://openai.com/index/openai-to-acquire-ona/

Google Research says reasoning may help models recall facts

What happened
Google Research published new work arguing that reasoning traces can help language models retrieve correct facts that are already stored in their parameters. The point is subtle but important: reasoning may act as a recall mechanism, not just a logic engine.

Why it matters
This expands the usual story around chain-of-thought. Instead of treating reasoning only as a tool for math or step-by-step logic, Google suggests it can also improve factual access inside the model itself.

Key details

The post was published on June 24, 2026 by Google Research scientists Zorik Gekhman and Jonathan Herzig. https://research.google/blog/thinking-to-recall-how-reasoning-unlocks-parametric-knowledge-in-llms/
Google’s core claim is that allowing a model to generate a reasoning trace can unlock correct answers even for simple, single-hop factual questions. https://research.google/blog/thinking-to-recall-how-reasoning-unlocks-parametric-knowledge-in-llms/
Google identifies two mechanisms: a computational buffer effect from generating extra tokens, and factual priming, where related facts help retrieve the target fact. https://research.google/blog/thinking-to-recall-how-reasoning-unlocks-parametric-knowledge-in-llms/
In one experiment, replacing meaningful reasoning with repeated dummy text like “Let me think” still improved recall over reasoning-off baselines, supporting the compute-buffer hypothesis. https://research.google/blog/thinking-to-recall-how-reasoning-unlocks-parametric-knowledge-in-llms/
Google also found that if a reasoning trace includes even one hallucinated intermediate fact, the model becomes significantly less likely to produce the correct final answer. https://research.google/blog/thinking-to-recall-how-reasoning-unlocks-parametric-knowledge-in-llms/
The research used Gemini-2.5 Flash, Gemini-2.5 Pro, and Qwen3-32B, evaluated on SimpleQA Verified and EntityQuestions. https://research.google/blog/thinking-to-recall-how-reasoning-unlocks-parametric-knowledge-in-llms/
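
To make the two mechanisms concrete, here is a minimal sketch of how the compared conditions could be set up as prompts. It is not Google's evaluation harness: the `Condition` names, `build_trace`, `build_prompt`, the prompt layout, and the repeat count are all hypothetical, and no model is called.

```python
from enum import Enum


class Condition(Enum):
    REASONING_OFF = "off"        # answer directly, no trace
    DUMMY_BUFFER = "dummy"       # extra tokens that carry no content
    FACTUAL_PRIMING = "priming"  # related facts recalled before answering


def build_trace(condition, related_facts=(), filler="Let me think.", repeats=8):
    """Return the text placed between the question and the final answer."""
    if condition is Condition.REASONING_OFF:
        return ""
    if condition is Condition.DUMMY_BUFFER:
        # Tests the compute-buffer idea: more tokens, no new information.
        return " ".join([filler] * repeats)
    if condition is Condition.FACTUAL_PRIMING:
        # Tests factual priming: related facts act as retrieval cues.
        return " ".join(related_facts)
    raise ValueError(f"unknown condition: {condition}")


def build_prompt(question, condition, related_facts=()):
    """Assemble a hypothetical prompt for one experimental condition."""
    parts = [f"Question: {question}"]
    trace = build_trace(condition, related_facts)
    if trace:
        parts.append(f"Thinking: {trace}")
    parts.append("Answer:")
    return "\n".join(parts)


question = "Who wrote Pride and Prejudice?"
facts = ["Pride and Prejudice was published in 1813."]
for condition in Condition:
    print(build_prompt(question, condition, facts), end="\n\n")
```

Comparing answer accuracy across the three prompts is the kind of test that separates the effect of extra tokens from the effect of useful intermediate facts.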

Source links
https://research.google/blog/thinking-to-recall-how-reasoning-unlocks-parametric-knowledge-in-llms/
https://research.google/blog/

Google DeepMind builds computer use directly into Gemini 3.5 Flash

What happened
Google DeepMind announced that computer use is now a built-in tool inside Gemini 3.5 Flash. Instead of keeping computer control as a separate specialty model, Google is integrating it into a mainstream fast model for developers.

Why it matters
This is a product-level sign that agent capabilities are becoming standard, not experimental. As computer use becomes native, developers get a simpler path to building systems that can operate software instead of only generating text.

Key details

The announcement was published on June 24, 2026. https://deepmind.google/blog/introducing-computer-use-in-gemini-3-5-flash/
Google says computer use is now a built-in tool in Gemini 3.5 Flash. https://deepmind.google/blog/introducing-computer-use-in-gemini-3-5-flash/
The capability had previously been available as a standalone Gemini 2.5 computer use model. https://deepmind.google/blog/introducing-computer-use-in-gemini-3-5-flash/
Google says developers can build agents that can see, reason, and take action across browser, mobile, and desktop environments. https://deepmind.google/blog/introducing-computer-use-in-gemini-3-5-flash/
Google positions the feature for long-horizon and enterprise automation tasks, including continuous software testing and knowledge work across professional applications. https://deepmind.google/blog/introducing-computer-use-in-gemini-3-5-flash/
Access is available through the Gemini API and Gemini Enterprise Agent Platform, with options for explicit user confirmation on sensitive actions and automatic stopping if indirect prompt injection is identified. https://deepmind.google/blog/introducing-computer-use-in-gemini-3-5-flash/
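
The two safeguards mentioned above, user confirmation for sensitive actions and stopping when prompt injection is suspected, follow a common control-loop pattern. The sketch below is a generic illustration of that pattern, not the Gemini API: `Action`, `run_actions`, and the `sensitive` and `injection_suspected` flags are hypothetical names, and a real system would get those signals from the model or platform.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Action:
    kind: str                          # e.g. "click", "type", "submit_payment"
    target: str                        # UI element the action applies to
    sensitive: bool = False            # hypothetical flag: needs user approval
    injection_suspected: bool = False  # hypothetical flag from a safety check


def run_actions(
    actions: Iterable[Action],
    confirm: Callable[[Action], bool],
    execute: Callable[[Action], None],
) -> List[str]:
    """Execute actions in order, gating sensitive ones and halting on suspected injection."""
    log: List[str] = []
    for action in actions:
        if action.injection_suspected:
            log.append(f"stopped: suspected injection before {action.kind}")
            break  # halt the whole run, not just this step
        if action.sensitive and not confirm(action):
            log.append(f"skipped: {action.kind} (user declined)")
            continue
        execute(action)
        log.append(f"executed: {action.kind}")
    return log


executed: List[str] = []
plan = [
    Action("click", "checkout button"),
    Action("submit_payment", "pay now", sensitive=True),
    Action("type", "search box", injection_suspected=True),
    Action("click", "never reached"),
]
log = run_actions(plan, confirm=lambda a: False, execute=lambda a: executed.append(a.kind))
print("\n".join(log))
```

Keeping the gate outside the model means the policy for what counts as sensitive can be changed without retraining or re-prompting.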

Source links
https://deepmind.google/blog/introducing-computer-use-in-gemini-3-5-flash/

MIT and Microsoft target the cost problem in AI agents with Murakkab

What happened
MIT highlighted a joint MIT-Microsoft system called Murakkab that helps optimize the design and deployment of agentic workflows. The idea is to let developers specify goals at a high level while the system chooses models, tools, execution order, and deployment setup.

Why it matters
As agent applications become more complex, orchestration itself is turning into a major performance and cost bottleneck. Murakkab is notable because it treats agent workflows as an optimization problem spanning accuracy, latency, energy use, and compute cost.

Key details

MIT News published the story on June 25, 2026. https://news.mit.edu/2026/improving-ai-agent-speed-and-energy-efficiency-0625
Murakkab was developed by researchers from MIT and Microsoft to optimize agentic workflow design and deployment. https://news.mit.edu/2026/improving-ai-agent-speed-and-energy-efficiency-0625
The system lets developers describe workflow intent without manually hard-coding the full stack of models, tools, execution order, and hardware choices. https://news.mit.edu/2026/improving-ai-agent-speed-and-energy-efficiency-0625
MIT says Murakkab can automatically choose models and tools, decide what runs in parallel versus sequence, and adapt deployment to constraints like latency and accuracy. https://news.mit.edu/2026/improving-ai-agent-speed-and-energy-efficiency-0625
In reported tests on workflows including video Q&A and code generation, Murakkab met the workflows' requirements while using about 35% of the computation, about 27% of the energy, and less than 25% of the cost of other methods. https://news.mit.edu/2026/improving-ai-agent-speed-and-energy-efficiency-0625
In one example, it cut energy use by more than an order of magnitude with only about a 2% drop in accuracy. https://news.mit.edu/2026/improving-ai-agent-speed-and-energy-efficiency-0625
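
The core idea of treating deployment as constrained optimization can be shown with a toy example: filter candidate configurations by the accuracy and latency requirements, then pick the one that uses the least energy. This is a brute-force sketch and not Murakkab's actual algorithm; the `Config` fields, the configuration names, and every number below are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    name: str
    accuracy: float   # fraction of tasks solved, 0..1 (illustrative)
    latency_s: float  # end-to-end latency in seconds (illustrative)
    energy_wh: float  # energy per workflow run in watt-hours (illustrative)


def pick_config(
    candidates: List[Config], min_accuracy: float, max_latency_s: float
) -> Optional[Config]:
    """Return the lowest-energy candidate that meets both constraints, or None."""
    feasible = [
        c for c in candidates
        if c.accuracy >= min_accuracy and c.latency_s <= max_latency_s
    ]
    return min(feasible, key=lambda c: c.energy_wh, default=None)


# Made-up candidates; real systems would profile or estimate these values.
candidates = [
    Config("large-model-sequential", accuracy=0.92, latency_s=40.0, energy_wh=120.0),
    Config("large-model-parallel", accuracy=0.92, latency_s=15.0, energy_wh=130.0),
    Config("small-model-parallel", accuracy=0.90, latency_s=8.0, energy_wh=10.0),
    Config("tiny-model", accuracy=0.70, latency_s=3.0, energy_wh=2.0),
]

best = pick_config(candidates, min_accuracy=0.88, max_latency_s=20.0)
print(best.name if best else "no feasible configuration")
```

In this toy setup the chosen configuration gives up two points of accuracy for a large energy saving, which mirrors the shape of the trade-off MIT describes, though not its measured values.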

Source links
https://news.mit.edu/2026/improving-ai-agent-speed-and-energy-efficiency-0625

NVIDIA and Hugging Face point toward more modular fine-tuning infrastructure

What happened
NVIDIA’s NeMo Automodel documentation shows a continued push toward more standardized large-scale training and fine-tuning workflows that plug directly into familiar Hugging Face patterns. This is not the loudest story of the day, but it fits the larger move toward reusable AI infrastructure.

Why it matters
As teams move from experimentation to deployment, the stack around models matters more: loading, parallelism, scaling, and workflow portability all become practical differentiators. The quieter competition is increasingly about usable systems, not just bigger checkpoints.

Key details

NeMo Automodel is described as an open-source, PyTorch DTensor-native training library from NVIDIA. https://huggingface.co/docs/transformers/community_integrations/nemo_automodel_finetuning?utm_source=openai
It supports pretraining and fine-tuning for LLMs and VLMs with parallelism strategies including FSDP2, tensor, pipeline, expert, and context parallelism. https://huggingface.co/docs/transformers/community_integrations/nemo_automodel_finetuning?utm_source=openai
It is built around Hugging Face loading patterns such as AutoModel.from_pretrained(), with high-performance layer swaps and refined parallelism support. https://huggingface.co/docs/transformers/community_integrations/nemo_automodel_finetuning?utm_source=openai
On the Diffusers side, NeMo Automodel is also presented as Hugging Face native, with direct Hub loading and YAML-driven workflows from single-GPU to multi-node environments. https://huggingface.co/docs/diffusers/main/training/nemo_automodel?utm_source=openai

Source links
https://huggingface.co/docs/transformers/community_integrations/nemo_automodel_finetuning?utm_source=openai
https://huggingface.co/docs/diffusers/main/training/nemo_automodel?utm_source=openai

The throughline across all of these updates is that AI is becoming a coordinated stack: reasoning as retrieval, models as software operators, agents as workplace tools, and infrastructure as the discipline that makes the whole system usable. The next phase looks less like a better chatbot and more like a durable operating layer for digital work.
