Didn’t really need Next.js.

Next.js

Next.js is a very popular web framework, and usually LLMs (large language models) default to it when you ask for a website.

But it comes with a lot of concepts (Edge Runtime? API routes?) that I couldn’t really wrap my head around, so I have decided to finally ditch it in favor of something simpler.

Several times I got pinged by Vercel to update my old projects, and then I was faced with long changelogs, multiple migration guides, quite a few codemods (lots of respect for people who write code to modify code), and generally an unpleasant experience that ended with me giving up.

The other reason is that Next.js is tightly integrated with Vercel. Everything is awesome when you use Vercel; try deploying it on Cloudflare though.

And no, this isn’t because of this tweet (nitter, archived) from the Vercel CEO.

A third reason is that I have some GitHub Copilot credits to burn; as you’ll see below, GPT-5-Codex is actually quite capable now.

Cloudflare

I had only used Cloudflare Pages to host static content, such as this historical map of Washington D.C. because I got curious about its historical, square borders. (So I was pleasantly surprised when a less capable AI was able to generate this easily, and now I know about Andrew Ellicott Park and the South Corner Boundary Stone.)

Workers vs. Pages

I am totally at fault for this, but I accidentally deployed several “Workers” as opposed to “Pages”, because the latter was greyed out and hard to notice.

I wondered for a while why Netlify worked out-of-the-box but Cloudflare did not, until I finally noticed the “Pages” tab.

This Blog

My blog isn’t even Next.js; it’s Hugo, so I was able to use Cloudflare’s Hugo preset directly.

I did have to set my HUGO_VERSION to a lower one and update my theme, PaperMod, but that was very straightforward.
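For anyone doing the same: Cloudflare Pages picks the Hugo version up from a build-time environment variable, so pinning it is a one-liner. (The version number below is purely illustrative; use whatever your theme supports.)

```
# Cloudflare Pages → Settings → Environment variables (build)
# Illustrative version; pin whatever your theme supports.
HUGO_VERSION=0.111.3
```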

Benu

Benu is a side project I wrote back in college, because the school menu website was not very user-friendly.

Its main feature was one-click comparison between dining halls, but it has since acquired more functionality. (And the school menu website has sadly become less user-friendly.)

For Benu, I used Cloudflare’s Next.js preset. Everything worked until I clicked into it, and there was a nodejs_compat error; so I added that to my runtime configuration, and it worked.
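For anyone hitting the same error: the fix is enabling the Node.js compatibility flag in the Workers runtime. I set it through the dashboard, but in a wrangler.toml it would look roughly like this (a sketch, not my exact config):

```toml
# wrangler.toml (sketch) — enable Node.js built-in APIs in the Workers runtime
compatibility_flags = ["nodejs_compat"]
```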

I did use Vercel’s automatic caching feature, though, and I don’t think I can do that without messing around with OpenNext, so that’s a refactoring topic on my TODO list.

GPT-5-Codex

It is the last day of September, and I need to burn through my 300 premium request credits from GitHub Copilot.

So I asked GPT-5-Codex to rewrite my code, and it mostly worked.

CUE!

CUE! is a color checker that I wrote for fun. I rarely use it myself.

The functionality is extremely simple, so I asked GPT-5-Codex to rewrite it with Astro; I had to click “Allow” (because I don’t want YOLO mode), but it worked in one try without any intervention.

Here is the pull request.

You can see that I tried using the “Edge Runtime”, whatever that is, before giving up and rewriting the entire thing.

Secret Project

This secret project is more of a SPA, with some dynamic routing and data loading involved.

I tried using Astro, and while there were some bugs (buttons not working, etc.), pretty much everything worked after pointing them out.

That was true until I hit deploy and got a 404 without any build errors. It turned out Astro had generated client/ and server/ for me, so I asked GPT-5-Codex for help. I decided to render my 100+ pages ahead of time, but that took 6 minutes on Cloudflare and forever on my laptop!
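Rendering ahead of time in Astro boils down to marking a dynamic route as prerendered and enumerating its paths; a rough sketch of what that looked like (the route and loader names here are made up for illustration):

```astro
---
// src/pages/items/[id].astro (illustrative route name)
// Opt this dynamic route into build-time rendering.
export const prerender = true;

// Tell Astro which pages to generate at build time;
// with 100+ entries, this is what made builds slow.
export async function getStaticPaths() {
  const items = await loadItems(); // hypothetical data loader
  return items.map((item) => ({ params: { id: item.id } }));
}
---
```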

Then I researched concurrent builds, but the documentation said it was not recommended.
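The knob in question is Astro's build.concurrency option; the docs caution that it rarely helps unless rendering is I/O-bound, but setting it is a one-liner (sketch):

```js
// astro.config.mjs (sketch; build.concurrency exists in recent Astro versions)
import { defineConfig } from 'astro/config';

export default defineConfig({
  build: {
    concurrency: 4, // pages rendered in parallel; docs advise caution
  },
});
```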

I also found out about Svelte, which is why, on another branch, GPT-5-Codex is rewriting my secret project in SvelteKit as of writing.

Thoughts

I learned about Astro, which is good for rich content, and SvelteKit, which is good for rich interactivity.

There is also Remix, but I haven’t needed to use it quite yet.

Speed

This is the bulk of my discoveries today, and also what prompted me to write this blog post.

Some time ago, this article on how LLMs feel like dialup came up on Hacker News, and today I really felt that. I was using GitHub Copilot’s GPT-4.1 before, occasionally Claude Sonnet 4; both felt relatively responsive. But GPT-5-Codex is very slow.

Task completion

Or that’s my perception: my mental model assigned the older models smaller tasks, such as test coverage, and their runs were short, often coming back with failures. GPT-5-Codex, on the other hand, would carefully lint, double-check, and in general take longer, but with a better degree of task completion.

I am also ultimately impressed with the refactoring. I think longer tasks like rewriting in Rust, TDD, and test coverage can now be delegated to today’s frontier models, and that truly changes how I think about them.

Time

Let me emphasize this: GPT-5-Codex takes way longer.

Yes, I did give it a large task (to rewrite the entire thing), but it still takes a long time, even between approvals.

And yet it can achieve so much during a session.

Cost

Another article that I vaguely remember said that while AI models are performing better, the need for tokens is growing as well, so the cost is actually the same, if not higher.

I don’t know much about economies of scale in model providers, but from Cursor to Claude Code, I still have some observations:

  1. Cost: because cost is projected to remain high, a subscription model is not sustainable. I think a lot of providers might be loss leaders at this point. See the recent usage limits on Claude Code.
  2. Maintenance: VS Code is not entirely open-source. There is Code OSS, but there are extensions like C/C++ that require VS Code and not a fork (especially not Cursor). Combine that with the inherent difficulty of maintaining an editor, and I think Cursor is not the way to go.
  3. Competition: some say Anthropic’s Claude models are the best for programming; I think it is highly unlikely for a single provider to have a “moat” and remain the best provider, so tying oneself to a single provider is unwise. This includes OpenAI’s Codex and Claude Code.

Observation 1 means that this “golden age” of subscription-based pricing is probably short-lived; I should enjoy my $100/yr GitHub Copilot subscription while supplies last.

Observations 2 and 3 mean that for now, I am sticking with GitHub Copilot, at the benevolence of Microsoft, I guess.

(I am also hesitant to pay Anthropic $200 a month…)

Furthermore, while observation 3 means I prefer multi-provider platforms, observation 1 means that these are often expensive. OpenRouter, for example, is one I could not happily afford at Claude Code levels of token usage.

GitHub Copilot’s pricing model is also extremely generous:

  • Students get the $100/yr tier for free
  • You get 300 premium requests each month
  • Each @Copilot session on GitHub costs 1 request
    • (model unknown; maybe GPT-4.1?)
    • Some can run for almost 30 minutes
  • Each session (user input) costs 1 request
    • Even for long GPT-5-Codex sessions

Notes

Context: I use whatever GitHub Copilot in VS Code defaults to; my only MCP (Model Context Protocol) extension is Context7, which I asked the LLM to refer to once or twice.

LLMs: While this blog post mentions LLMs, none were used to write it; I typed it on my laptop’s keyboard. I did ask GPT-5 to proofread and suggest minor edits (think one character per line), though.

Task length: I remember trying to give GPT-4.1 and Claude Sonnet 4 long tasks, but they tended to return much earlier (and fail much more often) than GPT-5-Codex.