
The Short History of LLMs

TL;DR: LLMs have evolved rapidly since 2023. From GPT-3.5 Turbo's accessible launch, through RAG and vector databases, to today's agentic coding tools like Claude Code, the way we write software has fundamentally changed in just three years.


I've been working with ChatGPT since GPT-3.5 Turbo was released in March 2023. Honestly, I can't believe it's only been about three years. The models have changed a lot, the tooling provided by various vendors has changed a lot, and every year feels faster than the last.

GPT-3.5 Turbo (March 2023)

Turbo's release was big because it was accessible and cheap enough to actually roll out in production. People didn't fully understand what it was capable of, and there was a lot of rapid experimentation. It came out while I was working at a business-intelligence startup focused on deep research. We had a whole machine learning department using Hugging Face models to parse large amounts of text for meaningful signals, and ChatGPT just kind of started taking over all the product features. It was surreal. This was all off the back of interest rates shooting up after COVID, which basically killed the startup I was at. I can only wonder what the market would look like if the AI hype had hit during the low-interest-rate, easy-money era of software development.

Vector Databases, Baby AGI, Google Slow to Market

At the time, the consensus was that a vector database was the best way to augment your AI's memory, and Pinecone was uniquely positioned to fill that need. RAG (Retrieval-Augmented Generation) started to emerge as a term. Every news article coming out was basically saying that Google had dropped the ball big time, considering AI had basically been their thing for the previous ten years. OpenAI also deepened its partnership with Microsoft, and everyone was scratching their heads about why a non-profit seemed to be operating like this.

GPT-4, RAG Takes Over, Early Agentic Workflows (Late 2023)

At this point, developer tooling was basically just Copilot and asking ChatGPT for code through the chat interface. It was rough, but still groundbreaking at the time. This was also when a lot of engineers formed an opinion about AI that hasn't changed since: it hallucinated a lot, and it didn't have access to up-to-date data. Some people were experimenting with giving it the ability to search Google on its own, extending the data it had available. At the time, it felt like the only way for AI to know about something topical was for it to have been trained on it. This is where RAG came in: if you scraped the data yourself (or had embeddings generated), you could get some reliable results.
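The RAG idea above can be sketched in a few lines. This is a toy illustration, not a real system: the hand-made 3-d vectors stand in for embeddings that would normally come from an embedding model, and the dictionary stands in for a vector database like Pinecone.

```python
import math

# Hypothetical documents with hand-made "embeddings" (real ones would
# come from an embedding model and live in a vector database).
DOCS = {
    "pricing page": [0.9, 0.1, 0.0],
    "release notes": [0.1, 0.8, 0.1],
    "api reference": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

def build_prompt(question, query_vec):
    """Stuff the retrieved text into the prompt so the model sees fresh data."""
    context = ", ".join(retrieve(query_vec))
    return f"Context: {context}\nQuestion: {question}"
```

The point is that the model never needs to have been trained on your data; you just look up the most similar documents and paste them into the context at query time.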

I joined a new company to work on data scraping. I was scraping content that was a bit more difficult to get: pages that needed a headless browser to load the JavaScript and maybe interact with an SPA. Then I'd feed everything into a parsing pipeline. Agents were basically just prompts on a loop until an exit clause was hit. I was messing about with the web scraping side and with recovering from failures. It felt like so much had changed from the previous year!
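That "prompts on a loop until an exit clause" pattern is simpler than it sounds. A minimal sketch, where `call_model` is a stand-in for whatever chat-completion API you happen to be using (not a real function):

```python
def run_agent(call_model, task, max_steps=10):
    """Call the model repeatedly, stopping when it emits the exit clause."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        reply = call_model(history)
        history.append(reply)
        if reply.startswith("DONE"):  # the exit clause
            return reply.removeprefix("DONE").strip()
    return None  # step budget exhausted without finishing
```

The step budget is the crude failure-recovery piece: without it, a model that never says "DONE" loops forever.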

Model Improvements and Price Wars (2024)

2024 was a year of refinement rather than revolution. Models got cheaper, context windows expanded, and quality improved across the board. Claude, GPT-4, and Gemini were all competing on benchmarks and pricing. For me personally, not much changed in the day-to-day. I was still using the same workflows—chatting with AI, copying code back and forth, dealing with the occasional hallucination. The tools were better, but the fundamental experience remained the same.

MCP, Cursor, Windsurf, Claude Code (2025)

It was 2025. This year I went deep on learning Neovim, and I'm really glad I did. It might be the most enjoyable way to code I've ever known. Then everything changed.

I found a cool way to get a Cursor-like experience in Neovim: a panel I could open, prompt to make changes to the code, and then review the diffs and choose to accept or decline. It was pretty awesome, but still error prone. By this point a lot of people had tried AI, seen the hallucinations, and hit bugs that became a massive time sink. Basically, 70% of the time you were moving quickly, and the other 30% you got stuck on something and probably ended up going in and fixing it manually. Honestly, it was cool, but it still just wasn't that good.

Then MCP came out, and everyone and their dog was rolling out their own MCP server. Honestly, it made sense to have a standardized format for annotating endpoints, and being able to call endpoints from a conversation was super convenient. It was also the next logical step in building more complex workflows. There was a lot of experimentation at this point, and "context rot" really started to take hold as a term. RAG seemed to be getting less popular in favour of just loading in more context, likely because the base models were getting more efficient and had larger context windows for a similar price.
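This isn't the actual MCP wire protocol (which is JSON-RPC based), but the core idea, a registry of annotated tools that the model can call by name with structured arguments, can be sketched like this. Every name here is hypothetical:

```python
TOOLS = {}

def tool(name, description):
    """Decorator that registers a function as a callable, annotated tool."""
    def register(fn):
        TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return register

@tool("search", "Search the web for a query string")
def search(query):
    return f"results for {query!r}"  # placeholder implementation

def dispatch(call):
    """Route a model-issued tool call like {'name': ..., 'args': ...}."""
    entry = TOOLS[call["name"]]
    return entry["fn"](**call["args"])
```

The descriptions are what get surfaced to the model, so it can decide which tool to invoke mid-conversation; the dispatcher just routes its structured reply to the matching function.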

Cursor emerged as a fork of VS Code and just kind of took market share. To be honest, as somebody who stopped using VS Code, I think Cursor is overrated, and I don't really see why anyone would stay on it. What did change was the pricing model for AI coding. Claude Code came out, which was terminal native and fit really well into my existing workflow, and it charged a fixed monthly cost. That's a pretty good deal: I can just code with it and not worry about getting a huge bill at the end.

As of writing this, it's really only been six months since my summer of Claude. The discussion around vibe coding, what it is and when to allow AI-generated code into your production codebase, became polarizing. I feel like engineers are slowly warming to AI as they get over their initial worries about hallucinations. It took me some time too, but realistically I got a subscription within a week of Claude Code coming out, so I've been using it for nearly as long as it has existed.

The sad thing is that most of my workflow is now just talking to ChatGPT and writing markdown documents. I don't get to use as many of the really satisfying Neovim keybinds I set up. It genuinely feels like coding through the chat interface is the best way to work. I still review critical pieces of code, but the amount of code that can be written in a short amount of time is staggering.

Looking Ahead

It's hard to predict where this goes next. If the last three years are any indication, whatever I write here will look naive in six months. But a few things feel inevitable: AI will get better at maintaining context across longer sessions, the line between "writing code" and "describing what you want" will continue to blur, and the developers who adapt will outpace those who don't.

What I'm most curious about is whether we'll look back at this period as the awkward adolescence of AI-assisted development, or whether the fundamental interaction model—chatting with an AI, reviewing its output—will persist. Either way, I'm glad I've been along for the ride.
