Just today I had my first real success with Claude (and with coding agents generally). I’ve played with Cursor in the past but am now trying Claude and others.
As mentioned in the article, the big trick is having clear specs. In my case I sat down for 2 hours and wrote a 12-step document on how I would implement this (along with background information). Claude went through it step by step and wrote the code. I imagine this saved me probably 6-10 hours. I’m now reviewing and am going to test etc. and start adjusting and adding future functionality.
Its success was rooted in the fact I knew exactly how to do what it needed to do. I wrote out all the steps and it just followed my lead.
It makes it clear to me that mid and senior developers aren’t going anywhere.
That said, it was amazing to just see it go through the requirements and implement modules full of organised documented code that I didn’t have to write.
I get excellent results and don’t do anything like that. Basically I ask Claude to write code the way I would: a small step at a time. I literally prompt it to do the next step I’d do, and so on and so forth. I accept all changes immediately, commit after every change, and then review the diff. If Claude did something bad, I ask it to fix that. I typically also give references to existing code that I want it to model or functions to use.
This gives me excellent results with far less typing and time.
Exactly this. In my view people over-think the use of Claude in many cases.
Because it's marketed as AI, and it takes a while to figure out that it's really quite limited. In my opinion there's not a lot of intelligence going on. It's great at translating a requirement and giving you an approximation of what you asked for, but there isn't really any "thinking" happening.
I think when they advertise "thinking" it just does a few more iterations of guessing the closest "number in your head" from the clues you've given it (the requirements).
I saw someone once say that LLMs are a kind of "word calculator" and I feel that's quite a good description.
It is much better at making todo lists than it is at completing them successfully. But I appreciate having “someone to chat with” when I’m puzzling over code or what to do next. Especially when I’m getting started on something new I don’t know a lot about.
To be fair, if you turn on thinking mode on LLMs and look at the output, there is some thinking/reasoning.
A simple example:
Prompt: Make it yellow
Think: The user wants something yellow but hasn't said what it is. Previously the user talked about creating a button, so it must be the button, but I should clarify by asking.
Response: Is it the button that you want yellow?
I guess it becomes philosophical: what is thinking? What I’ve noticed is that it does all that, but then misses the absolute most obvious things or gets incredibly basic things wrong.
This part of LLMs is the pre-programmed human logic. The LLMs aren't actually thinking; they're just going through a defined IF/THEN loop based on their weights. If there is some ambiguity in a prompt, the LLM is just programmed to ask for more info. It's not actually thinking that it needs to ask anything; it's just coming back with a low-precision probability.
All of the recent improvements in the LLM's "thinking" have just been layers of programming on top of its statistical models. Which is why it's becoming clear that LLMs really aren't advancing that much.
I don't understand why the diff isn't part and parcel of the AI dialog.
In VS code it shows me a diff with every contribution.
Same with other IDEs like cursor.
But Claude Code is a command line tool.
You can use the Claude Code extension and it shows diffs in VS Code, and you can highlight code and ask questions. But it still works in the VS Code terminal like the Claude Code CLI. It’s quite nice actually, since you can flip back and forth.
This is how I do it. Instead of writing a large amount, I just make it generate one function at a time.
I don’t want it to replace me; I use it to replace reading the docs, googling, and repetitive tasks.
It’s hit or miss sometimes, but I get to review every snippet.
If I generated a lot of code at once, like for a full project, my brain would melt reviewing it.
I keep my normal development flow and iterate; no waterfall.
Sometimes I do OP’s approach, sometimes yours, but in all cases, writing down what you need done in detailed English gets me to a better understanding of what the hell I’m even doing.
Even if I wrote the same prompts and specs and then typed everything myself, it would have already been an improvement.
I've come to the conclusion that the best use for "AI" is typing faster than me. I work at a place with a very well defined architecture, so implementation is usually very straightforward. Claude can follow it because, as you said, it's following a spec of sorts.
On the other hand, there have been quite a few moments in the last week where I'm actually starting to question if it's really faster. Some of the random mistakes can be major, depending on how quickly it gets something wrong. I feel like I'm in a computer game: I need to save every time I make progress (commit my work).
I'm still on the fence about it honestly.
Wouldn’t it be faster to write the code yourself at that point?
What’s the advantage here for you with a process like this?
In your flow you also have multiple review steps and corrections, which adds even more friction.
I can see the advantage in what the parent is describing, however.
Sometimes it’s faster if I write it, sometimes it’s not.
I prompt like: give me go structs for this json "pasted json" and write 2 functions to save it and load it from "nosql db I use"
That basically speeds up writing glue code.
The business logic I write myself.
It’s faster to do the business logic myself than to review what the AI did.
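For a sense of what that glue looks like, a prompt like that yields roughly the following. A sketch only: the JSON shape here is invented, and a plain JSON file stands in for the NoSQL store I didn't name:

    package glue

    import (
        "encoding/json"
        "os"
    )

    // Order mirrors a pasted JSON payload like:
    // {"id": "o-123", "customer": "acme", "items": ["sku-1", "sku-2"]}
    type Order struct {
        ID       string   `json:"id"`
        Customer string   `json:"customer"`
        Items    []string `json:"items"`
    }

    // SaveOrder persists an order; a file stands in for the real DB here.
    func SaveOrder(path string, o *Order) error {
        data, err := json.MarshalIndent(o, "", "  ")
        if err != nil {
            return err
        }
        return os.WriteFile(path, data, 0o644)
    }

    // LoadOrder reads an order back from the same stand-in store.
    func LoadOrder(path string) (*Order, error) {
        data, err := os.ReadFile(path)
        if err != nil {
            return nil, err
        }
        var o Order
        if err := json.Unmarshal(data, &o); err != nil {
            return nil, err
        }
        return &o, nil
    }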
> give me go structs for this json "pasted json" and write 2 functions to save it and load it from "nosql db I use"
Those tools existed before LLMs and were local, fast, and most importantly free.
Why people continue to re-invent tools and workflows we already have, I don't know. Perhaps they just like being able to say "but this uses AI!"
It was an example
I only use what I need, plain and simple, and it’s free, yup, and works locally too.
LLMs suck at large projects, so I break things down into small, simple tasks; what I wrote is an example of a small one. I keep the tasks that small.
I can write my own code. I just use LLMs to speed up. I use whatever workflow suits me the most and saves me from typing much.
Code generation with LLMs is a reinvention of local tools that used to generate code, yes. It’s an advanced autocomplete.
But give me another tool that can generate code from natural language full of typos. I don’t think your claim that this existed before LLMs is correct at all. Every code-generation task could be hand-written to be reusable, but that takes developer hours.
> give me another tool that can generate code from natural language full of typos
Give me a tool that doesn't generate code full of errors.
Yeah. Read “Programming as Theory Building” by Naur [1] to understand why you still need to develop a theory of the problem and how to model it yourself, lest the LLM concoct (an incorrect) one for you.
[1] https://gwern.net/doc/cs/algorithm/1985-naur.pdf
I don't know how many times now I've seen these things claim to have run the code, show me the hallucinated output, and then go on to develop an incorrect theory based on that hallucinated output.
I've never seen the CLI coding tools do anything like that. They're designed to integrate with the tools. If you're just using a chat interface then yes, you're likely to get some inconsistent behavior.
This was Gemini CLI in kilocode. It does this often. Sometimes it even imagines that it's done a build when it hasn't: it imagines build errors and then sets out to fix them. I have it set so that it asks permission prior to running command-line tools, so I know it hasn't actually run make.
I use Gemini CLI daily (work is a Google shop), directly (no kilocode). I've never seen anything like that.
I wonder if it could be something to do with the kilocode integration.
But I do more commonly run with permission required for many operations, because I find it works much better if I help it every now and then. It can get stuck on some pretty simple stuff.
This article has been shared a dozen times on HN; if you want to see some discussion, check out: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
We need to ask LLMs to generate documentation, diagrams, and FAQs in addition to code, then. We all know what this means: keeping them up to date.
Has anyone managed to set up a "reactive" way to interact with LLMs in a codebase, so that when an LLM extends or updates some part of the territory, it also extends or updates the map?
Amazing, the article is 40 years old and still totally relevant today. And even more amazing is that many of today's IT managers seem unaware of its points.
This could be taken as praise.
I take it as a manifestation of the temporal bigotry in computer science: that anything not new is bad, which is absolutely untrue. Old is not bad; new is not good. Where something exists in time has almost no bearing on its quality. Most knowledge and good ideas do not survive.
Thanks for sharing this article.
> As mentioned in the article, the big trick is having clear specs
I've been building a programming language using Claude, and this has been my finding, too.
Which, after discovering this, makes sense. There are a LOT of small decisions that go into programming. Without detailed guidance, LLMs will end up making educated guesses for many of these decisions, and a number of those guesses will be incorrect. These errors compound, and the net effect is a wrong solution.
Can you (or anyone) share an example of such a specification document? As an amateur programmer experimenting with CC, it would be very helpful to understand the nature and depth of the information that is helpful.
I have multiple system prompts that I use before getting to the actual specification.
1. I use the Socratic Coder[1] system prompt to have a back and forth conversation about the idea, which helps me hone the idea and improve it. This conversation forces me to think about several aspects of the idea and how to implement it.
2. I use the Brainstorm Specification[2] user prompt to turn that conversation into a specification.
3. I use the Brainstorm Critique[3] user prompt to critique that specification and find flaws in it which I might have missed.
4. I use a modified version of the Brainstorm Specification user prompt to refine the specification based on the critique and have a final version of the document, which I can either use on my own or feed to something like Claude Code for context.
Doing those things improved the quality of the code and work spit out by the LLMs I use by a significant amount, but more importantly, it helped me write much better code on my own, because I now have something to guide me, where before I used to go in blind.
As a bonus, it also helped me decide whether an idea was worth it or not; there are times I'm talking with the LLM and it asks me questions I don't feel like answering, which tells me I'm probably not into that idea as much as I initially thought; it was just my ADHD hyper-focusing on something.
[1]: https://github.com/jamesponddotco/llm-prompts/blob/trunk/dat...
[2]: https://github.com/jamesponddotco/llm-prompts/blob/trunk/dat...
[3]: https://github.com/jamesponddotco/llm-prompts/blob/trunk/dat...
Good stuff. A minor observation:
> I use the Socratic Coder[1] system prompt to have a back and forth conversation about the idea. (prompt starts with: 1. Ask only one question at a time)
Why only 1? IMHO it's better to write a long prompt explaining yourself as much as possible (it exercises your brain and you figure things out), and to request as many clarifying questions, reviews, and suggestions as possible, all at once.
I guess if you use a fast-response conversational system like the ChatGPT app it would make more sense. But I don't think you can have deep conversations that way unless you have a stellar working memory. I don't, so it's better for me to write and read, and re-write, and re-read... I do one question at a time so I don't feel overwhelmed and can answer questions with more detail.
I start with an idea between <idea> tags, write as much as I possibly can between those tags, and then go one question at a time, answering the questions with as much detail as I possibly can.
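So a first message might look like this (contents invented for illustration):

    <idea>
    A small CLI that watches a folder of markdown notes and publishes
    changed files to a static site. Must handle renames. Out of scope:
    auth and collaboration.
    </idea>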
Sometimes I'll feed the idea to yet another prompt, Computer Science PhD[1], and use the response as the basis for my conversation with the Socratic Coder, as the new basis might fill in gaps that I forgot to include initially.
[1]: https://github.com/jamesponddotco/llm-prompts/blob/trunk/dat...
[2]: Something like "Based on my idea, can you provide your thoughts on how the service should be built, please? Technologies to use, database schema, user roles, permissions, architectural ideas, and so on."
Is this true? I'm getting the feeling that most of this is adding external structure when coding agents already provide a framework for it.
I've had moderate success throwing a braindump at the LLM, asking it to write a .md with a plan, and then going ahead with the implementation. Specialized thinking prompts seem like overkill (or dumbo-level coding skills are enough for me).
Love these, thanks.
What's the benefit of putting the original idea between <idea> tags when it seems to be the main body of the prompt anyway? Or are you supplying the Socratic Coder prompt and the idea in the same prompt?
Mostly because the system prompt states "The user will provide the idea you will be working with as the first message between <idea> tags." and Claude loves XML tags, but if you're not including anything else in the message, it probably doesn't matter.
Thank you for sharing these prompts. These are excellent.
Thanks for sharing these prompts. Will certainly help with improving my LLM coding workflow.
Heh. I tried your socratic coder prompt as a Claude project and when sending it the Brainstorm Spec message after some back and forth, Claude responded with “Chat ended due to a prompt injection risk”.
You may want to turn these good prompts into slash commands! :)
They are subagents and slash commands, depending on the project. Eventually, I need to come up with a “dotclaude” repository with these and a few others I use.
Edit: Sorry, I had a brain fart for a second, thought you were talking about other prompts. I prefer to keep those as chats with the API, not Claude Code, but yeah, they might work as slash commands too.
Wish we could star files in addition to repos
You mean like adding a bookmark, or downloading the files? Yeah, wish that was possible on the web.
Well, I use GitHub stars kind of like coding/cool project/idea/whatever bookmarks, so yeah, it would be neat to be able to star any file within a repo in addition to the repo itself.
Here’s a write-up of an experiment I did, with idea, spec, prompt, and Claude Code commits.
http://pchristensen.com/blog/articles/first-impressions-of-v...
https://medium.com/machine-words/writing-technical-design-do... and https://medium.com/free-code-camp/how-to-write-a-good-softwa...
Search for Claude Code's planning mode. You can use Claude to help you write specs. There are many YouTube videos as well. I think spec docs are pretty personal and project-specific...
I do a multistep process:
Step 1: back and forth chat about the functionality we want. What do we want it to do? What are the inputs and outputs? Then generate a spec/requirements sheet.
Step 2: identify what language, technologies, frameworks to use to accomplish the goal. Generate a technical spec.
Step 3: architecture. Get a layout of the different files that need to be created and a general outline of what each will do.
Step 4: combine your docs and tell it to write the code.
Another useful approach is to "cheat" and point the LLM at an existing code base that implements the algorithms or patterns you want. Something like:
"Review <codebase> and create a spec for <algorithm/pattern/etc.>"
It gives you a good starting point to jump off from.
If you are an amateur, it's better to put in your time and not use AI tools for a while.
It will make you much better at development to learn the way today's senior devs did.
Many mid and senior developers cannot write specs. I agree with the intent of your statement.
> That said, it was amazing to just see it go through the requirements and implement modules full of organised documented code that I didn’t have to write
Small side remark, but what is the value added of AI-generated documentation for AI-generated code? It's just a burden that increases context size whenever the AI needs to re-analyse or change the existing code. It's not like any human is ever going to read the code docs when they can just ask the AI what it is about.
Leaving aside the value for humans, it's actually very valuable for the AI to provide indexed summary documents of what code goes where, what it does, what patterns it uses, and what its entry points and API conventions are.
This is useful because if you just have Claude Code read all the code every time, it'll run out of context very quickly, whereas if you have a dozen 50-line files that summarize the 200-2000 lines of code they represent, they can always be fresh in context. Context management is king.
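As a sketch, one of those summary files might look something like this (all names invented):

    services/billing: summary
    Purpose: invoice generation and payment retries.
    Entry points: cmd/billingd/main.go, internal/billing/api.go
    Conventions: handlers return wrapped errors; DB access only
    through internal/billing/store.
    Key types: Invoice, PaymentAttempt, RetryPolicy.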
The way I use Claude: first I ask it to research what I want to do, what's new, any little-known stuff, etc. As soon as I get something I like, I ask Claude to make a specification with everything it needs to make this happen (this is for me to understand how things will be done). Then I ask for an analysis of the specification and ways it can be shortened, but in a way that it still understands the specification and can use it to make things happen. When the specification is ready, I just upload it and ask: start building phase 2. And that's it. It just generates the code, I move it to the IDE and start reading and changing whatever I want. If I find something different from the specification, or a new development, I just update the specification.
This is sort of like asking “why do pilots still perform manual takeoffs and landing even though full autopilot is possible?” It’s because autopilot is intended to help pilots, not replace them. Too much could go wrong in the real world. Having some skills that you practice daily is crucial to remaining a good pilot. Similarly, it’s probably good to write some code daily to keep skills sharp.
1) When your cloud LLM has an outage, your manager probably still expects you to be able to do your work for the most part, not to go home because OpenAI is down lol. You being productive as an engineer should not depend on the cloud working.
2) You may want to manually write code for certain parts of the project. Important functions, classes, modules, etc. Having good auto-generated docs is still useful when using a traditional IDE like IntelliJ, WebStorm, etc.
3) Code review. I’m assuming your team does code review as part of your SDLC??? Documentation can be helpful when reviewing code.
> You being productive as an engineer should not depend on the cloud working.
lol, where do you work? This obviously isn't true for the entire industry. If GitHub or AWS or your WiFi/ISP is down, productivity is greatly reduced. Many SaaS companies don't have local dev, so they rely on the cloud broadly being up. "Should" hasn't been the reality in the industry for years.
Well, the only thing I need to write code is to be alive. No GitHub or AWS? No problem, I have local copies of everything. No Claude? OK, I have a local LLM to give some help. So the internet is not really needed to write code. No IDEs, just a CLI? Sure, all I need is a text editor and a compiler/linker. No computer or electricity? Get a pen and paper and start writing code on paper; I'll get to the computer when possible. I do not depend on the cloud working to be productive.
No pen and paper? Compile it on fleshware.
It's not a question of what you can do, but where the comfort level reduction outweighs the project importance/pay.
My company deploys everything in the cloud, but we can still do some meaningful work locally for a few hours if needed in a pinch. Our GitLab is self-hosted because that's extra critical.
I continue writing code and unit tests while I wait for the cloud to work again. If the outage lasts a long time, I may even spin up “DynamoDB Local” via Docker for some of our simpler services that only interact with DynamoDB. Our Apache Flink services that read from Kafka are a lost cause, obviously, lol.
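(If memory serves, spinning that up is a one-liner:

    docker run -p 8000:8000 amazon/dynamodb-local

with the service's DynamoDB endpoint pointed at localhost:8000.)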
It’s also a good opportunity to tackle any minor refactoring that you’ve been hoping to do. Also possible without the cloud.
You can also work on _designing_ new features (whiteboarding, creating a design document, etc). Often when doing so you need to look at the source to see how the current implementation works. That’s much easier with code comments.
You said productivity is greatly reduced. He said productivity should not stop.
You are making different points.
Examples. Examples are helpful for both humans and LLMs, especially if you have a custom framework or are using an unusual language. And I find I can generate ~10 good examples with LLMs in the time it would take me to generate ~3 good examples manually.
It's entirely possible that the parameters that get activated by comments in code are highly correlated with the parameters involved in producing good code.
Claude's proclivity for writing detailed inline comments and very nearly perfect commit messages is one of the best things about it.
I’m not sure I agree that I’ll never look at the code. I think it’s still important to know how the code is working for your own mental model of the app. So in this case I’ll be testing and reviewing everything to see how it’s implemented. With that in mind it’s useful for me as well as serving as context for the AI. That said, you may be right.
Frequently your session/context may drop (e.g. Claude crashes, or your internet dies, or your computer restarts). Claude does best when it can recover the context and understand the current situation from clear documentation, rather than trying to reverse-engineer intent and structure from an existing code base. Also, the human frequently does read the code docs, as there may be places where Claude gets stuck or doesn't do what you want, but a human can reason their way to success and clear the obstacle.
I promise you that token context rot is worse than the gains from added natural language explanations
This hasn't been my experience.
Keep in mind each Claude subagent gets its own context.
With claude -r you can resume any conversation at any previous point, so there isn't a way to lose context like that. As opposed to compact, which I find makes it act brain-dead for a while afterwards.
Oh God yes, I wish there were better tools to help one curate and condense a context when one finds that sweet spot where it's writing great code.
Often someone will have to maintain the code. Whether the maintainer is a human or an AI, an explanation of the intent of the code could be helpful.
written once, looked at 100 times.
I try to prompt-enforce no line-by-line documentation, but encourage function/class/module-level documentation that will help future developers and AI coding agents. Humans are generally better at this, but AI sometimes needs the help; otherwise it fails to understand a piece of code's context and just writes its own new function that does the same thing.
But comments like that are most likely to be about WHAT the code does, which is rarely useful (naming your identifiers better can help with that). When I need a comment, I'm actually looking for WHYs, aka design decisions. And in this case your prompts are better than whatever comments the agent may add. (Maybe add a summary of the prompts in the commit message?)
This.
Documentation should answer WHYs not HOWs.*
* = Unless the how is a complex, opaque, or obscure algorithm. Then sometimes a written explanation can help. Sometimes that can be as simple as the name and variant, e.g. Dijkstra with k-shortest-path routing.
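A toy Go illustration of the difference (details invented):

    package retry

    import "time"

    // WHAT (redundant; the code already says it): sleep for the backoff.
    //
    // WHY (useful; a decision the code can't express): the upstream API
    // rate-limits in bursts, so we back off exponentially, and the cap
    // keeps worst-case latency bounded.
    func wait(attempt int) {
        d := time.Duration(1<<attempt) * 100 * time.Millisecond
        if d > 5*time.Second {
            d = 5 * time.Second
        }
        time.Sleep(d)
    }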
Doc strings within the code can be helpful for both humans and AI. Sometimes intent in plain words is easier to digest than code, and helps identify side effects for both human and AI.
After someone mentioned that recently, I've started to write really detailed specs with the help of ChatGPT Deep Research, editing them myself. Then exporting this as a Markdown document and passing it to Cursor worked really well.
It puts you in a different mind space to sit down and think about it, instead of iterating too much and in the end feeling productive while actually not achieving much and mostly going in circles.
The test-and-review cycle is what determines the time saved, in my view. Since you were satisfied overall, I take it that cycle was not too cumbersome?
The parent wrote:
>I imagine this saved me probably 6-10 hours. I’m now reviewing and am going to test etc.
Guessing the time saved prior to reviewing and testing seems premature from my end.
Frankly, even if you ignore Claude entirely, being able to write a good spec for yourself is a worthwhile endeavour.
After 30+ years of engineering: writing lots of specs is mostly a waste of time. The problem is, more or less, that you don't know enough. The trick is to write the smallest, simplest version of whatever you are trying to achieve and then iterate on that. Be prepared to throw it out. The nice thing with Claude (or Gemini) is that it lets you do this really, really quickly.
Completely agree. It’s a core skill of a good developer. What’s interesting is that in the past I’d have started this process but then jumped into coding prematurely. Now that you know you are using an agent, the more you write, the better the results.
Yes, but let's not forget the lessons of waterfall planning. You can't anticipate everything, so the detail level of the implementation plan should be within a Goldilocks zone of detailed but not too detailed, and after each implementation and test phase one should feel comfortable adjusting the spec/plan to the current state of things.
Another good point. I noticed this happening while writing my document.
A few times while writing the doc I had to go back and update the previous steps to add missing features.
Also, I knew when to stop. It’s not fully finished yet; there are additional stages I need to implement. But as an experienced developer, I knew when I had enough for the “core functionality” that was well defined.
What worries me is how do you become a good developer if AI is writing it all?
One of my strengths as a developer is understanding the problem and breaking it down into steps, creating requirements documents like I’ve discussed.
But that’s a hard-earned skill from years of client work where I wrote the code. I have a huge step up in getting the most from these agents now.
Agents raise the floor for all, but they raise the ceiling for those of us with sufficient priors.
The downside of waterfall was not overly detailed specs. In fact, the best software development is universally waterfall following a good, ideally formal spec.
The downside that Agile sought to remedy was inflexibility, which is an issue greatly ameliorated by coding agents.
Maybe if you know the entire possibility space beforehand, in which case that's a great position to be in. In other cases and if the spec doesn't align with reality after implementation has begun or unforeseen issues pop up, the spec needs revision, does it not?
> In other cases and if the spec doesn't align with reality after implementation has begun or unforeseen issues pop up, the spec needs revision, does it not?
Yes, and then it gets pumped back to the top of the waterfall and goes through the entire process. Many organizations became so rigid that this was a problem. It is what Tom Smykowski in Office Space is a parody of. It's why you get much of the early web having things like "feature creep" and "if web designers were architects".
Waterfall failed because of politics mingled into the process; it was the worst sort of design by committee. If you want a sense of how this plays out, you simply have to look at Wayland development. The fact that it has been as successful as it is, is a testament to the will and patience of those involved.
I too just yesterday had my first positive experience with Claude writing code in my project. I used plan mode for the first time and gave it the "think harder" shove. It was a straightforward improvement but not trivial. The spec wasn't even very detailed: I mentioned a couple of specific classes and the behaviour to change, and it wrote the code I would have expected to write, with even a bit more safety checking than I would have done.
I write out a document that explains what I want. Then I write stubs for the functions and classes or whatever. For every stub I write a docstring for what it’s supposed to do. Then I have Claude write unit tests for each stub, one at a time. Then I have it write the functions, one at a time. At some point I should just start writing the code itself again. Haha.
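A sketch of one step of that flow, in Go for concreteness (function and numbers invented):

    package ledger

    import "testing"

    // SumDebits returns the total of all positive (debit) amounts, in
    // cents. Negative amounts are credits and are ignored.
    func SumDebits(amounts []int64) int64 {
        // Body left as panic("not implemented") at the stub stage;
        // Claude fills it in after the test below exists.
        var total int64
        for _, a := range amounts {
            if a > 0 {
                total += a
            }
        }
        return total
    }

    // The unit test Claude writes against the stub's doc comment
    // (normally in its own _test.go file).
    func TestSumDebits(t *testing.T) {
        if got := SumDebits([]int64{100, -50, 250}); got != 350 {
            t.Errorf("SumDebits = %d, want 350", got)
        }
    }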
> It makes it clear to me that mid and senior developers aren’t going anywhere.
I kinda feel like this is a self-placating statement that is not going to stay true for that long. We are so early in the process of developing AI good enough to do any of these things. Yes, right now you need senior level design skills and programming knowledge, but that doesn't mean that will stay true.
>I kinda feel like this is a self-placating statement that is not going to stay true for that long. We are so early in the process of developing AI good enough to do any of these things. Yes, right now you need senior level design skills and programming knowledge, but that doesn't mean that will stay true.
So you really think that in a few years some guy with no coding experience will ask the AI "Make me a GTA 6 clone that happens in Europe" and the AI will actually make it, the code will just work, and the performance will be excellent?
LLMs can't do that; they are attracted to solutions they've seen in their training. This means they sometimes over-complicate things, they do not see clever solutions or apply theory, and sometimes they are just stupid and hallucinate variable names and functions. Like, say, 50% of the time it would use speed and 50% of the time it would use velocity, and the code would fail because of undefined stuff.
I am not afraid of LLMs taking my job; I am afraid of bullshit marketing that convinces the CEO/management that if they buy me Claude then I must work 10x faster.
The GTA example is not a good one because the vast majority of the work in making a videogame is not coding...
And, somewhat beside the point, generative AI is getting better at a lot of those things as well. Maybe I want to believe this will happen because it's probably the only way to get a sequel to Sleeping Dogs.
>The GTA example is not a good one because the vast majority of the work in making a videogame is not coding...
What is outside coding?
- Writing? An AI that can replace hardcore developers who write optimized game engines should also be able to generate quests.
- Art? Same; AI should be able to take already-created models and change them here and there to make them not look stolen.
- Marketing? Why can't AI replace those people?
So?
Art, design, testing, producer work, composing, sound design, writing. Then product and creative managers, art and technical directors, and all the rest. And at the end the game can still be no fun at all. Also, GTA still has thousands of bugs; I can't imagine how many an AI would make, or whether you could get it to actually complete a massive project like that.
> So you really think that in a few years some guy with no coding experience will ask the AI "Make me a GTA 6 clone that happens in Europe" and the AI will actually make it, the code will just work, and the performance will be excellent?
There is definitely a path from here to a future where the most senior engineer in your org/dept/team decides he can do some big project without some subset of more-junior employees because he has Claude. The managers or PMs won’t be coding without engineers, but it’s definitely possible for engineers to code with fewer teammates, especially if the very experienced ones are the ones planning and guiding the effort.
> LLMs can't do that; they are attracted to solutions they've seen in their training. This means…
None of the things you’ve said this means match my experience using LLMs to write real, usable, viable code. It might not be the most performant or perfect code, but it’s certainly usable, and most software isn’t written at Google or wherever and doesn’t need to support hundreds of millions of customers at scale. If it took a day instead of a month, then “the business” might decide that’s a worthy tradeoff.
>None of the things you’ve said this means match my experience using LLMs to write real, usable, viable code. It might not be the most performant or perfect code, but it’s certainly usable, and most software isn’t written at Google or wherever and doesn’t need to support hundreds of millions of customers at scale. If it took a day instead of a month, then “the business” might decide that’s a worthy tradeoff.
It depends on your project. I've seen a lot of stupidity from the AI, like in a Lua project where arrays were 1-indexed it would 0-index them; somehow the C-like behaviour was too strong a force dragging the model in that direction.
For example, when I test an image generator, I ask it to create a photo of the front of a book store and to include no brands, labels, or text (because they always include English text, and most of the time there are spelling errors). But the AIs can't make a shop without the branding/text above the door; they are just so over-trained on this concept that explicit commands can't fix it.
So the same with LLMs: they are attracted to the average, most popular stuff they've seen in the training data. Without instructions from you (or maybe from the provider behind the hidden prompts) they will output outdated JavaScript using "var", they will output unoptimized algorithms, and even if you used a specific variable name they will be strongly pushed to rename it to whatever is most popular in the training data.
Yes, I can make the LLMs write some good code, but only if I babysit them: tell them exactly what files to read as inspiration, what features to use, and what to do. I certainly can't just paste the text of a ticket and let them run free.
I also use it to review my code for bugs. It can find maybe 50% of the bugs, and it hallucinates others that are not possible (like it would suggest that if $x is null then something would crash and I should check for that, but the type system already ensures $x can't be null). It really needs more training to do simple stuff; to be original, and not just regurgitate the most popular things it was trained on, it would need to be something not based on the LLM architecture.
What if it took 1 day, and 1 month later prod is on fire and no one knows why? Speed does not equate to quality. And in my experience, after 1.0, discussions about features take more time than coding them. Especially with paying customers.
> So you really think that in a few years some guy with no coding experience will ask the AI "Make me a GTA 6 clone that happens in Europe" and the AI will make actually make it, the code will just work and the performance will be excellent ?
I don't know the answer any more than anyone else does, and obviously I'm skeptical that it'll happen.
But then if I think back to 2018, and imagine what I would have thought if I saw even GPT-OSS-20b back then, it would have been close to magic and absolutely not something I would have expected. I felt the same about GPT-2 when it first launched, when LLMs started to show small bits of promise. GPT-3 was insane even when it launched.
So I guess I wouldn't base "what could happen in the future" on what I personally believe is possible, because LLMs definitely fell into that camp just a few years ago; so why not with larger coding tasks too, which I see as unlikely today?
I think it can already replace mid-level engineers, based on my experience. Also, you really don't need meticulously crafted specs for this - I've completed multiple projects with Claude with loose specs, iterating in case the direction is not looking good. You can always esc-out in case you see it doing something you didn't wish for.
Was it frontend, and/or did you deploy them to prod to be used by 5 million daily users, or did you just have fun at home?
That's the way I've used it: I built a document with all the requirements and then gave it to CC. But it was not a final document; I had to go back and make some changes after experimenting with the code CC built.
You could have saved yourself another 2 hours by getting Claude 4 Opus to write out the specs for you first. Why did you need to write them at all?
Can’t you send the same spec through Cursor? Am I missing something there?
Yes certainly. I’m sure Cursor would do a good job.
That said, I think that the differing UIs of Cursor (in the IDE) and Claude (in the CLI) fundamentally change how you approach problems with them.
Cursor is “too available”. It’s right there and you can be lazy and just ask it anything.
Claude nudges you to think more deeply and construct longer prompts before engaging with it.
That’s my experience, anyway.
Fun fact: there is a Cursor CLI now
You can use Claude to write the spec next time.
This is the way.