Recent LLM tool updates
OpenAI released GPT-5, which appears to offer incremental improvements rather than a fundamental architectural breakthrough. This suggests we may be nearing the top of the S-curve for transformer-based models. OpenAI also released two open-weight models with 20B and 120B parameters.
Cursor released a CLI tool to complement their IDE, reflecting the popularity of the command line as a way to interact with LLM agents.
Google launched Gemini CLI GitHub Actions in beta, allowing you to tag @gemini-cli in issues and PRs for automated triage, code reviews, and task delegation.
Ollama launched Turbo, a cloud service running open-source models on datacenter hardware while maintaining their familiar API. This marks Ollama’s shift from purely local inference to hybrid cloud offerings.
I came across an interesting benchmark by METR that tracks the ability of agents to complete long-running software engineering tasks, measured by the time a human would need to complete the same task. METR has found that this task length has doubled roughly every 7 months over the past 6 years. Most recently, GPT-5 was estimated to have an 80% chance of succeeding at a task that would take a human 26 minutes. This indicates that LLMs are becoming more effective within agent systems, and that the range of tasks they can complete is broadening.
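To make the doubling arithmetic concrete, here is a minimal Python sketch. It assumes the trend continues as a clean exponential with a fixed 7-month doubling period, which is my simplification and not a forecast from METR. The function names and the 8-hour target are hypothetical, chosen only for illustration.

```python
import math

DOUBLING_MONTHS = 7.0    # doubling period quoted above
GPT5_HORIZON_MIN = 26.0  # GPT-5's 80%-success task length in minutes, quoted above


def projected_horizon(start_minutes: float, months_ahead: float,
                      doubling_months: float = DOUBLING_MONTHS) -> float:
    """Task length after `months_ahead` months, ASSUMING steady exponential growth."""
    return start_minutes * 2 ** (months_ahead / doubling_months)


def months_until(start_minutes: float, target_minutes: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed to grow from `start_minutes` to `target_minutes` under the same assumption."""
    return doubling_months * math.log2(target_minutes / start_minutes)


if __name__ == "__main__":
    # One doubling period takes 26 minutes to 52, two periods to 104.
    for months in (0, 7, 14, 21):
        minutes = projected_horizon(GPT5_HORIZON_MIN, months)
        print(f"+{months:2d} months: {minutes:6.1f} minutes")

    # Hypothetical target: a task that takes a human a full 8-hour working day.
    wait = months_until(GPT5_HORIZON_MIN, 8 * 60)
    print(f"Months until an 8-hour task at 80% success: {wait:.1f}")
```

The extrapolation is only as good as the assumption behind it: if the doubling period lengthens or shortens, the projected dates shift accordingly.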