How to use
The main insights for me at this point are:
- Things have moved on from a chat interface towards a more powerful and effective agent model of using LLMs.
- Agents are more effective than bare LLMs because they incorporate a feedback mechanism (e.g. compiling the code or running the tests) that pushes the LLM to iterate on a solution, although this can come at the cost of higher LLM usage; see the sketch after this list.
- Things have also moved on from crafting just the right prompt towards the more general idea of managing the context window.
- Managing the context window is important for getting better results, whether by providing rules, adding MCP integrations, or writing precise prompts.
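To make that feedback loop concrete, here is a minimal sketch; the three callables are hypothetical stand-ins for "ask the LLM for a patch", "apply it to the working tree" and "run the tests", not the API of any particular agent:

```python
from typing import Callable, Tuple

def agent_loop(
    task: str,
    propose_patch: Callable[[str, str], str],   # (task, feedback) -> patch text
    apply_patch: Callable[[str], None],         # apply the patch to the working tree
    run_tests: Callable[[], Tuple[bool, str]],  # () -> (passed, test output)
    max_iterations: int = 5,
) -> bool:
    feedback = ""
    for _ in range(max_iterations):
        patch = propose_patch(task, feedback)   # the LLM proposes a change
        apply_patch(patch)                      # the agent applies it
        passed, output = run_tests()            # feedback mechanism: run the tests
        if passed:
            return True                         # done once the tests pass
        feedback = output                       # otherwise feed the failures back in
    return False                                # budget exhausted: more LLM usage, no fix
```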
ThoughtWorks experimented with using Claude Code to add support for new languages to their CodeConcise tool. They identified a few success factors:
- Code quality. What we have always considered important for humans in terms of good-quality code also seems to hold for agents. When an agent works with well-written, modular, clean code that has been designed with separation of concerns and comes with some documentation, we maximise the chance that it will produce good-quality output.
- Library ecosystem. Was building the Python parser into CodeConcise easier because Python (the language CodeConcise is written in) already provides a standard module for converting Python code into ASTs (the ast module; see the short example after this list)? Our hypothesis is that while agents require a good set of tools to act upon the decisions they take, the quality of their output also depends on whether they can solve the problem using well-designed, widely established libraries.
- Training data. LLMs, and by extension AI agents, are known to produce better code in programming languages for which plenty of data was available in the model’s original training dataset.
- Large language models. As agents are built on top of large language models, their performance is directly affected by the underlying performance of the models they use.
- The agent itself. This includes prompt engineering, workflow design and the tools at its disposal.
- Human pairs. When people and agents work together we often get the best of both worlds. Developers experienced with agents also know tricks that can guide an agent through the work and ultimately produce better outcomes.
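For reference, the standard module mentioned under "Library ecosystem" above is Python's ast; a minimal example of turning source code into a syntax tree:

```python
import ast

# Parse a small snippet of Python source into an abstract syntax tree
# using the standard-library ast module, then list the functions it defines.
source = "def greet(name):\n    return f'hello {name}'\n"
tree = ast.parse(source)

for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print(node.name, [arg.arg for arg in node.args.args])  # greet ['name']
```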
Addy Osmani catalogued high-level ideas for integrating AI into your workflow. He also provided a fairly thorough (but unfortunately rather verbose) list of techniques for context management.
Pete Hodgson also provides some useful high-level tips on managing context.
How Anthropic teams use Claude Code is a good list of potential LLM use cases (I would disregard the Team Impact parts, given that this comes from Anthropic itself; too much likelihood of rose-tinted glasses there).
There are some good nuggets in Armin Ronacher’s Agentic Coding Recommendations, except I would emphatically recommend against letting the agent operate unattended by means of `--dangerously-skip-permissions` or similar. As expected, there are stories online of agents running `rm -rf /` and so on.
One specific technique recommended by Aider is using two separate models for a coding task: one in an architect role that describes how to solve the problem, and another to actually generate code based on the first model’s output. This is based on the assumption that some models are better than others at particular tasks.
Some more reflections from developers who have been incorporating agents into their workflow:
If you use Claude Code, Mastering Claude Code is a concise guide to setting up and using it effectively.
Backlog.md aims to facilitate collaboration with agents via a Markdown-based Kanban project board. The author suggests that having a Markdown task breakdown leads to good results with e.g. Claude Code.
Model Context Protocol (MCP)
MCP has emerged as the main mechanism for connecting LLMs to different data sources and tools. Agents are MCP clients which connect to external MCP servers. A server may provide functionality such as interacting with the filesystem, or it may provide browser automation capabilities (e.g. Playwright MCP).
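To make the client/server split concrete, here is a minimal sketch of an MCP server exposing a single filesystem tool, using the FastMCP helper from the official Python MCP SDK; the tool itself is just an illustration, and you should check the SDK docs for current details:

```python
# pip install "mcp[cli]"  (the official Python MCP SDK)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-filesystem")

@mcp.tool()
def read_text_file(path: str) -> str:
    """Return the contents of a text file at the given path."""
    with open(path, encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    # Serve over stdio so an MCP client (e.g. a coding agent) can connect to it.
    mcp.run()
```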
Unfortunately, the security story for MCP is very much evolving (e.g. see The Lethal Trifecta by Simon Willison), so use it with care.
MCP server directories are mostly vibecoded slop so you’re probably better off with a web search when you need to find a server.
Prompting
Prompt Engineering Guide details various prompting techniques with references to relevant papers.
Addy Osmani has written a prompt engineering playbook.
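As a small illustration of the kind of technique those resources cover, a few-shot prompt seeds the context with worked examples so the model picks up the task and the expected output format (the sentiment task here is an arbitrary example of my own):

```python
# Build a few-shot prompt: a couple of worked examples establish the task
# and the expected output format before the real input is appended.
EXAMPLES = [
    ("The battery died after two days.", "negative"),
    ("Setup took five minutes and it just works.", "positive"),
]

def build_prompt(review: str) -> str:
    parts = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in EXAMPLES:
        parts += [f"Review: {text}", f"Sentiment: {label}", ""]
    parts += [f"Review: {review}", "Sentiment:"]
    return "\n".join(parts)

print(build_prompt("The screen flickers constantly."))
```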
More specific examples
It’s useful to observe how people actually use AI.
Mario Zechner describes a very structured approach to interacting with Claude Code, detailing an example of porting changes to 2D skeletal animation software from Java to C++.
Nicholas Carlini describes using AI for a variety of programming tasks and includes interaction logs.
The Cloudflare Workers OAuth provider was implemented with extensive LLM assistance and is perhaps a good cautionary tale: immediately upon release it generated a CVE and then another CVE due to fairly basic vulnerabilities. The commit history contains the prompts that were used.
Going deeper

Context Engineering for Agents describes some approaches to context management that are implemented within agents.
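As a rough sketch of one such approach, an agent can keep its context within budget by always retaining the system prompt plus only as many of the most recent messages as fit. Real agents use smarter compaction and summarisation; word count stands in for token count here purely for illustration:

```python
def approx_tokens(message: dict) -> int:
    # Crude stand-in for a real tokenizer.
    return len(message["content"].split())

def trim_context(messages: list[dict], budget: int = 2000) -> list[dict]:
    # Assumes messages[0] is the system prompt and must always be kept.
    system, rest = messages[0], messages[1:]
    kept: list[dict] = []
    used = approx_tokens(system)
    for message in reversed(rest):        # walk backwards from the newest message
        cost = approx_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return [system] + list(reversed(kept))
```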
If you are not satisfied with off-the-shelf agents, you can build your own. There are frameworks such as LangChain to facilitate implementation. Anthropic has a guide on building effective agents.
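As a sketch of what building your own might look like with the Anthropic Python SDK, the loop below lets the model call a single run_tests tool and feeds the result back until it stops asking for tools; the model name and the tool are illustrative assumptions, and Anthropic's guide covers the full picture:

```python
import subprocess
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "run_tests",
    "description": "Run the project's test suite and return its output.",
    "input_schema": {"type": "object", "properties": {}, "required": []},
}]

def run_tests() -> str:
    # The feedback mechanism: run pytest locally and capture its output.
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.stdout + result.stderr

messages = [{"role": "user", "content": "Fix the failing tests in this project."}]
while True:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=2048,
        tools=tools,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # the model is done; its final text is in response.content
    messages.append({"role": "assistant", "content": response.content})
    results = [
        {"type": "tool_result", "tool_use_id": block.id, "content": run_tests()}
        for block in response.content
        if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})
```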
If you are not satisfied with off-the-shelf LLMs, fine-tuning is an option. This is primarily applicable to LLMs embedded in a product and probably only viable at an org level rather than individual level. The makers of Kiln, a tool for customising LLMs, explain When Fine-Tuning Actually Makes Sense: A Developer’s Guide.