I must be the dumbest "prompt engineer" ever. Each time I ask an AI to fix something or, even worse, create something from scratch, it rarely returns the right answer, and when asked for modifications it struggles even more.
All the incredible performance and success stories always come from these Twitter posts. I do find value in asking for simple but tedious tasks like a small refactor or generating commands, but this "AI takes the wheel" level does not feel real.
I think it's probably the difference between "code" and "programming". An LLM can produce code, and if you're willing to surrender to the LLM's version of whatever it is you ask for, then you can have a great and productive time. If you're opinionated about programming, LLMs fall short. Most people (software engineers, developers, whatever) are not "programmers", they're "coders", which is why they have a positive impression of LLMs: they produce code, LLMs produce code... so LLMs can do a lot of their work for them.
Coders used to be more productive by using libraries (e.g., don't write your own function for finding the intersection of arrays, use intersection from Lodash), whereas now libraries have been replaced by LLMs. Programmers laughed at the absurdity of left-pad[1] ("why use a dependency for 16 lines of code?") whereas coders thought left-pad was great ("why write 16 lines of code myself?").
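To make the contrast concrete, a minimal sketch (TypeScript, purely illustrative):

    // The "coder" move: pull in a dependency and move on.
    import { intersection } from 'lodash'

    const shared = intersection([1, 2, 3], [2, 3, 4]) // [2, 3]

    // The "programmer" move: write the few lines yourself so you own the behavior
    // (simplified: unlike Lodash, this doesn't deduplicate the result).
    function intersect<T>(a: T[], b: T[]): T[] {
      const inB = new Set(b)
      return a.filter((x) => inB.has(x))
    }

    const sharedToo = intersect([1, 2, 3], [2, 3, 4]) // [2, 3]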
If you think about code as a means to an end, and focus on the end, you'll get much closer to the magical experience you see spoken about on Twitter, because their acceptance criterion is "good enough", not "right". Of course, if you're a programmer who cares about the artistry of programming, that feels like a betrayal.
[1] https://en.wikipedia.org/wiki/Npm_left-pad_incident
Oh, this captures my experience perfectly.
I've been using Claude Code a lot recently, and it's doing amazing work, but it's not exactly what I want it to do.
I had to push it hard to refactor and simplify, as the code it generated was often far more complicated than it needed to be.
To be honest though, most of the code it generated I would accept if I was reviewing another developer's work.
I think that's the way we need to look at it. It's a junior developer that will complete our tasks, not always in our preferred way, but at 10x the speed, and frequently makes mistakes that we need to point out in CR. It's not a tool which will do exactly what we would do.
My experience so far on Claude 3.7 has been over-engineered solutions that are brittle. Sometimes they work, but usually not precisely the way I prompted it to, and often attempts to modify them require more refactoring due to the unnecessary complexity.
This has been the case so far in both js for web (svelte, react) and python automation.
I feel like 3.5 generally came up "short" more often than 3.7, but in practical usage that meant I could more easily modify it and build on top of it. 3.7 has led to a lot of deconstructing, reprompting, and starting over.
All I really care about is the end result and, so far, LLMs are nice for code completion, but basically useless for anything else.
They write as much code as you want, and it often sorta works, but it's a bug-filled mess. It's painstaking work to fix everything, on par with writing it yourself. Now, you can just leave it as-is, but what's the use of releasing software that crappy?
I suppose it's a revolution for that in-house crapware company IT groups create and foist on everyone who works there. But the software isn't better, it just takes a day rather than 6 months (or 2 years or 5 years) to create. Come to think of it, it may not be useful for that either… I think the end purpose is probably some kind of brag for the IT manager/exec, and once people realize how little effort is involved it won't serve that purpose.
Right on point. The same principle applies when deciding whether to use a framework or not. Coders often marvel at the speed with which they can build something using a framework they don’t fully understand. However, a true programmer seeks to know and comprehend what’s happening under the hood.
This aligns with my experience. I've seen LLMs produce "code" that the person requesting is unable to understand or debug. It usually almost works. It's possible the person writing the prompt didn't actually understand the problem, so they got a half baked solution as a result. Either way, they need to go to a human with more experience to figure it out.
I'm waiting for artisan programming to become a thing.
> If you think about code as a means to an end, and focus on the end
The problem with this is that you will never be able to modify the code in a meaningful way after it crosses a threshold, so either you'll have a prompt only modification ability, or you will just have to rewrite things from scratch.
I wrote my first application ever (equivalent to an education CMS today) in the very early 2000s with barely any notion of programming fundamentals. It was probably a couple hundred thousand lines of code by the time I abandoned it.
I wrote most of it in HTML, JS, ASP and SQL. I was in high school. I didn't know what common data structures were. In late high school, I once asked a professor why arrays were necessary in loops.
We called this cookbook coding back in the day.
I was pretty much laughed at when I finally showed people my code, even though it was a completely functional application. I would say an LLM probably can do better, but it really doesn't seem like something we should be chasing.
I tried LLMs for my postgraduate "programming" tasks to create lower-level data structures and algorithms for which it's possible to write detailed requirements - they failed miserably. When I pushed in certain directions, I got student-level replies like "collision probability is so low we can just ignore it", while the same LLM accurately estimated that there would be collisions in my dataset.
And I won't believe it until I see an LLM use a real debugger to figure out the root cause of a sophisticated, cascading bug.
This surrendering to the LLM has been going around a lot lately. I can only guess it is from people that haven't tried it very much themselves but love to repeat experiences from other people.
I'm a software developer by trade but also program art creation tools as a hobby. Funny thing is, at work, code is definitely a means to an end. But when I'm doing it for an art project, I think of the code as part of the art :) the process of programming and coming up with the code is ultimately a part of the holistic artistic output. The two are not separate, just as the artist's paint and brushes are also a part of the final work of a painting.
> LLMs version of whatever it is you ask for, then you can have a great and productive time
Sure, but man are there bugs.
This is untrue.
You can over-specify in your prompts and say exactly what types and algorithms you want if you're opinionated.
I often write giant page long specs to get exactly the code I want.
It’s only 2x as fast as coding, but thinking in English is way better than coding.
Also, if you cannot tell the difference between code written by an LLM or a human, what is the difference? This whole post is starting to feel like people with very strong (gatekeeper-ish) views on hi-fi stereo equipment, coffee, wine, ... and programming. Or should I say "code-as-craft" <cringe>?
Thank you for eloquently saying what I've been trying hard to express.
Interesting, but it seems ridiculous to disambiguate “Programmer” vs “Coder”.
They’re synonymous words and mean the same thing right?
Person who writes logic for machines
Some hints for people stuck like this:
Consider using Aider. It's a great tool and cheaper to use than Code.
Look at Aider's LLM leaderboard to figure out which LLMs to use.
Use its architect mode (although you can get quite fast without it - I personally haven't needed it).
Work incrementally.
I use at least 3 branches. My main one, a dev one and a debug one. I develop on dev. When I encounter a bug I switch to debug. The reason is it can produce a lot of code to fix a bug. It will write some code to fix it. That won't work. It will try again and write even more code. Repeat until fixed. But in the end I only needed a small subset of the new code. So you then revert all the changes and have it fix it again telling it the correct fix.
Don't debug on your dev branch.
Aider's auto committing is scary but really handy.
Limit your context to 25k.
Only add files that you think are necessary.
Combining the two: Don't have large files.
Add a Readme.md file. It will then update the file as it makes code changes. This can give you a glimpse of what it's trying to do and if it writes something problematic you know it's not properly understanding your goal.
Accept that it is not you and will write code differently from you. Think of it as a moderately experienced coder who is modifying the codebase. It's not going to follow all your conventions.
https://aider.chat/
https://aider.chat/docs/leaderboards/
> I use at least 3 branches. My main one, a dev one and a debug one. I develop on dev. When I encounter a bug I switch to debug. The reason is it can produce a lot of code to fix a bug. It will write some code to fix it. That won't work. It will try again and write even more code. Repeat until fixed. But in the end I only needed a small subset of the new code. So you then revert all the changes and have it fix it again telling it the correct fix.
how big/complex does the codebase have to be for this to actually save you time compared to just using a debugger and fixing it yourself directly? (I'm assuming here that bugs in smaller codebases are that much easier for a human to identify quickly)
Thanks, that's a helpful set of hints!
Can you provide a ballpark of what kind of $ costs we are talking here for using Aider with, say, Claude? (or any other provider that you think is better at the moment).
Say a run-of-the-mill bug-fixing session from your experience vs the most expensive one off the top of your head?
The three-branch thing is so smart.
do you have a special prompt to instruct aider to log file changes in the repo's README? I've used aider in repos with a README.md but it has not done this update. (granted, i've never /add the readme into aider's context window before either...)
I have the same experience.
Where AI shines for me is as a form of a semantic search engine or even a tutor of sorts. I can ask for the information that I need in a relatively complex way, and more often than not it will give me a decent summary and a list of "directions" to follow-up on. If anything, it'll give me proper technical terms, that I can feed into a traditional search engine for more info. But that's never the end of my investigation and I always try to confirm the information that it gives me by consulting other sources.
Exactly the same experience: since the early-access GPT-3 days, I played out various scenarios, and the most useful case has always been to use generative AI as semantic search. Its generative features are just lacking in quality (for anything other than a toy project), and the main issues since the early GPT days remain; even though it gets better, it's still too unreliable for serious work on mid-complex systems. Also, if you don't pay attention, it messes up other parts of the code.
Yeah, I have had some "magic" moments where I knew "what" I needed and had an idea of "how it would look", but no idea how to do it, and AI helped me understand how I should do it instead of the hacky, very stupid way I would have done it.
Same here. In some cases, brainstorming even kinda works – I mean, it usually gives very bad responses, but it serves as a good duck.
Code? Nope.
I've done code interviews with hundreds of candidates recently. The difference between those who are using LLMs effectively and those who are not is stark. I honestly think engineers who think like OP are going to get left behind. Take a weekend to work on getting your head around this by building a personal project (or learning a new language).
A few things to note:
a) Use the "Projects" feature in Claude web. The context makes a significant amount of difference in the output. Curate what it has in the context; prune out old versions of files and replace them. This is annoying UX, yes, but it'll give you results.
b) Use the project prompt to customize the response. E.g. I usually tell it not to give me redundant code that I already have. (Claude can otherwise be overly helpful and go on long riffs spitting out related code, quickly burning through your usage credits).
c) If the initial result doesn't work, give it feedback and tell it what's broken (build messages, descriptions of behavior, etc).
d) It's not perfect. Don't give up if you don't get perfection.
Hundreds of candidates? That's significant if not an exaggeration. What are the stark differences you have seen? Did you inquire about the candidate's use of language models?
I'd add to that that the best results are with clear spec sheets, which you can create using Claude (web) or another model like ChatGPT or Grok. Telling them what you want and what tech you're using helps them create a technical description with clear segments and objectives, and in my experience works wonders in getting Claude Code on the right track, where it has full access to the entire context of your code base.
> The difference between those who are using LLMs effectively and those who are not is stark.
Same here. Most candidates I interviewed said they did not use AI for development work. And it showed. These guys were not well informed on modern tooling and frameworks. Many of them seemed stuck in/comfortable with their old way of doing things and resistant to learning anything new.
I even hired a couple of them, thinking that they could probably pick up these skills. That did not happen. I learned my lesson.
My workflow for that kind of thing goes something like this (I use Sonnet 3.7 Thinking in Cursor):
1. 1st prompt is me describing what I want to build, what I know I want and any requirements or restrictions I'm aware of. Based on these requirements, ask a series of questions to produce a complete specification document.
2. Workshop the specification back and forward until I feel it's complete enough.
3. Ask the agent to implement the specification we came up with.
4. Tell the agent to implement Cursor Rules based on the specifications to ensure consistent implementation details in future LLM sessions.
I'd say it's pretty good 80% of the time. You definitely still need to understand the problem domain and be able to validate the work that's been produced but assuming you had some architectural guidelines you should be able to follow the code easily.
The Cursor Rules step makes all the difference in my experience. I picked most of this workflow up from here: https://ghuntley.com/stdlib/
Edit: A very helpful rule is to tell Cursor to always checkout a new branch based on the latest HEAD of master/main for all of its work.
I need to steal the specification idea.
Cursor w/ Claude has a habit of running away on tangents instead of solving just the one problem, and then I need to reject its changes and even roll back to a previous version.
With a proper specification as guideline it might stay on track a bit better.
I decided to seriously try Sonnet 3.7. I started with a simple prompt on claude.ai ("Do you know claude code ? Can you do a simple implementation for me ?"). After minimal tweaking from me, it gave me this : https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016...
After interacting with this tool, I decided it would be nice if the tool could edit itself, so I asked (him ? it ?) to create its next version. It came up with a non-working version of this https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016.... I fixed the bug manually, and that started an interactive loop: I could now describe what I wanted, describe the bugs, and the tool would add the features/fix the bugs itself.
I decided to rewrite it in Typescript (by that I mean: can you rewrite yourself in typescript). And then add other tools (by that: create tools and unit tests for the tools). https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... and https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... have been created by the tool itself, without any manual fix from me. Setting up the testing/mock framework ? Done by the tool itself too.
In one day (and $20), I essentially had recreated claude-code. That I could improve just by asking "Please add feature XXX". $2 a feature, with unit tests, on average.
So you’re telling me you spent 20 dollars and an entire day for 200 lines of JavaScript and 75 lines of python and this to you constitutes a working re-creation of Claude Code?
This is why expectations are all out of whack.
Thanks for writing up your experience and sharing the real code. It is fascinating to see how close these tools can now get to producing useful, working software by themselves.
That said - I'm wary of reading too much into results at this scale. There isn't enough code in such a simple application to need anything more sophisticated than churning out a few lines of boilerplate that produce the correct result.
It probably won't be practical for the current state of the art in code generators to write large-scale production applications for a while anyway just because of the amount of CPU time and RAM they'd need. But assuming we solve the performance issues one way or another eventually it will be interesting to see whether the same kind of code generators can cope with managing projects at larger scales where usually the hard problems have little to do with efficiently churning out boilerplate code.
LLMs are replacing Google for me when coding. When I want to get something implemented, let's say make a REST request in Java using a specific client library, I previously used Google to find an example of using that library.
Google has gotten worse (or the internet has more garbage), so finding a code example is more difficult than it used to be. Now I ask an LLM for an example. Sometimes I have to ask for a refinement, and usually something is broken in the example, but it takes less time to get the LLM-produced example to work than it does to find a functional example using Google.
But the LLM has only replaced my previous Google usage, I didn't expect Google to develop my applications and I don't with LLMs.
This has been my experience of successful usage as well. It's not writing code for me, but pulling together the equivalent of a Stack Overflow example and some explaining sentences that I can follow up on. Not perfect and I don't blindly copy paste it, same as Stack Overflow ever was, but faster and more interactive. It's helpful for wayfinding, but not producing the end result.
I used the Kagi free trial when I was doing Advent of Code in a somewhat unfamiliar language (Swift) last year, as well as ChatGPT occasionally.
The LLM was obviously much faster and the information was much higher density, but it had quite literally about a 20% rate of just making up APIs from my limited experiment. But I was very impressed with Kagi’s results and ended up signing up, now using it as my primary search engine.
In order to use a library, I need to (this is my opinion) be able to reason about the library’s behavior, based on a specification of its interface contract. The LLM may help with coming up with suitable code, but verifying that the application logic is correct with respect to the library’s documented interface contract is still necessary. It’s therefore still a requirement to read and understand the library’s documentation. For example, for the case of a REST client, you need to understand how the possible failure modes of the HTTP protocol and REST API are translated by the library.
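As a concrete illustration (purely a sketch, with a made-up endpoint): even a fetch-based REST call has several distinct failure modes, and knowing which ones the library surfaces where is exactly the contract you have to read the docs for.

    // Sketch only: distinguishing the failure modes a REST client exposes.
    // fetch() rejects on network-level failures (DNS, connection reset, ...),
    // but resolves normally on HTTP errors like 404 or 500 -- that is part of
    // the interface contract you only know by reading the documentation.
    async function getUser(id: string): Promise<unknown> {
      let response: Response
      try {
        response = await fetch(`https://api.example.com/users/${id}`)
      } catch (err) {
        // Failure mode 1: the request never completed (network/protocol error).
        throw new Error(`network failure: ${String(err)}`)
      }
      if (!response.ok) {
        // Failure mode 2: the server answered, but with an error status.
        throw new Error(`API error: HTTP ${response.status}`)
      }
      // Failure mode 3: the body may still not be the JSON shape you expect.
      return response.json()
    }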
I wonder how good Google could be if they had a charge per query model that these LLMs do. AI or not, dropping the ad incentive would be nice.
I've found AI to be useful on precisely-scoped tasks I might assign to a junior programmer to take a day to do, like "convert this exact bash script to a Powershell script".
But in my own work, those tasks are pretty rare, like 3 times a month? Often I start working on something, and the scope and definition of success changes while I'm in the midst of it. Or it turns out to be harder than expected and it makes sense to timebox it and do a quick search for workarounds.
As much as we joke about StackOverflow commenters sometimes telling a question-asker they shouldn't be doing what they're trying to do, you do actually want that (soft) pushback some of the time. Most modern LLMs will gleefully come up with a complete plan for how we're gonna get the square peg in the round hole.
> you do actually want that (soft) pushback some of the time. Most modern LLMs will gleefully come up with a complete plan for how we're gonna get the square peg in the round hole.
I once asked a local DeepSeek distilled model to do the wrong thing by accidentally copy-pasting the wrong variable name.
It told me how to do it, and then asked me if I was sure.
My local DeepSeek R1 model (deepseek-r1:7b) saw me trying to do something stupid (I was working with the wrong variable). It told me how to do what I asked and then asked:
> _Is this modification part of a larger change you're making to the code? I'd like to make sure we're not modifying something important that might have side effects._
Looking at its thought process:
> _The user wants to modify this string by replacing "-input" with "-kb". But looking at the ARN structure, I recall that AWS S3 ARNs are usually in the form arn:aws:s3:::bucket_name RegionalPart path. The part after the bucket name is typically s3:// followed by the object key._

> _Wait, maybe the user has a specific reason to make this change. Perhaps they're formatting or structuring something for a different purpose. They might not realize that directly modifying ARNs can cause issues elsewhere if not done correctly._
I am a slow adopter of new tech but plan to spend a significant amount of time in 2025 using AI tools when coding. I am net negative on AI simply replacing programmers, but I think the practice of development is undergoing a seismic shift at this point.
My recent usage is oriented towards using pseudocode descriptions that closely map to Python to produce Python functions. I am very impressed with Claude 3.7's syntactic correctness when given a chunk of pseudocode that looks "python-y" to begin with.
My one concern is that many of my recent code requirements lack novelty. So there is a somewhat reasonable chance that the tool is just spitting out code it slurped somewhere on GitHub or elsewhere on the larger Internet. Just this week, I gave Claude a relatively "anonymous" function in pseudocode, meaning variable names were not particularly descriptive, with one tiny exception. However, Claude generated a situationally appropriate comment as part of the function definition. This was . . . surprising to me, unless the model somehow had in its training set some very close match to my pseudocode description that included enough context to add the comment.
At this point very little code is "novel". Everyone is simply rewriting code that has already been written in a similar form. The LLM isn't slurping up and restating code verbatim. It is taking code that it has seen thousands of times and generating a customized version for your needs. It's hubris to think that anyone here is generating "novel" code.
Yeah, it's so bad now I only trust my eyes. Everyone is faking posts, tweets and benchmarks to the point that the truth no longer exists.
I'm using Claude 3.7 now and while it improved on certain areas, it degraded on others (ie: it randomly removes/changes things more now).
It's clear to anyone paying attention that LLMs hit a wall a while back. RAG is just expert systems with extra steps. 'Reasoning' is just burning more tokens in hopes it somehow makes the results better. And lately we've seen that a small blanket is being pulled one way or another.
LLMs are cool, machine learning is cooler. Still no 'AI' in sight.
I initially had the same experience. My codebase is super opinionated with a specific way to handle things. Initially it kept on wanting to do things its way. I then changed my approach and documented the way the codebase is structured, how things should be done, and all the conventions used, and on every prompt I make sure to tell it to use these documents as reference. I also have a central document that keeps track of the dependencies of modules and the global data model. Since I made these reference documents, developing new features has been a breeze. I created the architecture, documented it, and now it uses it.
The way I prompt it is: first I write the documentation of the module I want, following the format I detailed in the master documents, and then ask it to follow the documentation and specs.
I use cursor as well, but more as an assistant when I work on the architecture pieces.
But I would never give an AI the driver's seat for building the architecture and making tech decisions.
What I've noticed from my extensive use over the past couple weeks has been Claude Code really sucks at thinking things through enough to understand the second and third order consequences of the code that it's writing. That said, it's easy enough to work around its deficiencies by using a model with extended thinking (Grok, GPT4.5, Sonnet 3.7 in thinking mode) to write prompts for it and use Claude Code as basically a dumb code-spewing minion. My workflow has been: give Grok enough context on the problem with specific code examples, ask it to develop an implementation plan that a junior developer can follow, and paste the result into Claude Code, asking it to diligently follow the implementation plan and nothing else.
"Claude Code really sucks at thinking things through enough to understand the second and third order consequences of the code that it's writing"
Yup, that's our job as software engineers.
In all of these posts I fail to see how this is engineering anymore. It seems like we are one step away from taking ourselves out of the picture completely.
This has been my experience as well. Breaking problems into smaller problems where you can easily verify correctness works much better than having it solve the whole problem on its own.
Hey, I've been hearing about this issue that programmers have on HN a lot.
But I'm in the more 'bad programmer/hacker' camp and think that LLMs are amazing and really helpful.
I know that one can post a link to the chat history. Can you do that for an example that you are comfortable sharing? I know that it may not be possible though or very time consuming.
What I'm trying to get at is: I suck at programming, I know that. And you probably suck a lot less. And if you say that LLMs are garbage, and I say they are great, I want to know where I'm getting the disconnect.
I'm sincerely, not trying to be a troll here, and I really do want to learn more.
Others are welcome to post examples and walk through them too.
Thanks for any help here.
>and think that LLMs are amazing and really helpful
Respectfully, do you understand what it produces, or do you think it's amazing because it produces something that 'maybe' works?
Here's an example I was just futzing with. I did a refactor of my code (TypeScript) and my test code broke (vitest), and for some reason it said 'mockResolvedValue()' is not a function. I've done this a gazillion times.
I let it try to fix the error over 3-4 iterations (I was being lazy and wanted my error to go away), and the amount of crap it was producing (rewriting tests and the referenced code) was beyond ridiculous. (I was using GitHub Copilot.)
Eventually I said "f.that for a game of soldiers" and used my brain. I forgot to uncomment a vi.mock() during the refactor.
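For anyone curious, the failure mode looks roughly like this (a reconstructed sketch, not my actual code; the file and function names are made up):

    import { vi, describe, it, expect } from 'vitest'
    import { fetchUser } from './api'

    // The line that got commented out during the refactor. With it, vitest replaces
    // every export of './api' with a vi.fn(); without it, fetchUser is the real
    // function and .mockResolvedValue() is "not a function" -- the error above.
    vi.mock('./api')

    describe('user loading', () => {
      it('returns the mocked user', async () => {
        vi.mocked(fetchUser).mockResolvedValue({ id: 1, name: 'Ada' })
        await expect(fetchUser()).resolves.toEqual({ id: 1, name: 'Ada' })
      })
    })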
I DO use it to fix stupid TypeScript errors (the error blob it dumps on you can be a real pita to process) and appreciate it when it gives me a simple solution.
So I agree with quite a few comments here. I'm not ready to bend the knee to our AI Overlords.
I can give you an example here. We had to do some basic local VAT validation for EU countries, and as the API that you can use for that has issues for some countries (it checks against the national databases), we also wanted a basic local one. So using Claude 3.7 I wanted to get some basic VAT validation. In general the answer and solution were good, you would be impressed, but here comes the fun part. The basic solution was just some regular expressions, and then it went further on its own and created specific validations for certain countries. These validations were something like credit card number validations: sums, check digits, quite nice you would say. But the thing is, in a lot of these countries these numbers are basically assigned randomly and have no algorithm, so it went on to hallucinate validations that don't exist, providing a solution that looks nice but basically doesn't work in most cases.
Then I went on GitHub and found that it had used some code written by someone in JS 7 years ago and just converted and extended it for my language, but that code was wrong and simply useless. We'll end up with people publishing exploits and various other security flaws on GitHub, these LLMs will get trained on that, and people who have no clue what they are doing will push out code based on it. We're in for fun times ahead.
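For what it's worth, a format-only check is about as far as a local validator can safely go. A rough sketch (the patterns here are illustrative and should be verified against the official formats):

    // Format check only: many countries assign VAT numbers with no public
    // check-digit algorithm, so anything beyond this (sums, mod checks, ...)
    // risks exactly the kind of hallucinated validation described above.
    const VAT_PATTERNS: Record<string, RegExp> = {
      DE: /^DE\d{9}$/,            // Germany: 9 digits
      FR: /^FR[A-Z0-9]{2}\d{9}$/, // France: 2 check characters + 9 digits
      NL: /^NL\d{9}B\d{2}$/,      // Netherlands: 9 digits, 'B', 2 digits
    }

    function looksLikeValidVat(vat: string): boolean {
      const normalized = vat.replace(/\s/g, '').toUpperCase()
      const pattern = VAT_PATTERNS[normalized.slice(0, 2)]
      return pattern ? pattern.test(normalized) : false
    }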
Here is one solution I am helping out with which was very, very easy to create using Claude: https://www.youtube.com/watch?v=R72TvoXCimg&t=2s
Maybe it's the shot-up plane effect; we only see the winners but rarely see the failures. It leads us to incorrect conclusions.
Finding the right prompt to have current generation AI create the magic depicted in twitter posts may be a harder problem than most anticipate.
"wild", "insane" keywords usually are a good filter for marketing spam.
Influencer would be another term...
I don't believe in those either, and I never see compelling YouTube videos showing that in action.
For small stuff LLMs are actually great and often a lifesaver on legacy codebases, but that's more or less where it stops.
I'm in the same boat. I've found it useful in micro contexts but in larger programs, it's like a "yes man" that just agrees with what I suggest and creates an implementation without considering the larger ramifications. I don't know if it's just me.
I have a challenging, repetitive developer task that I need to do ~200 times. It’s for scraping a site and getting similar pieces of data.
I wrote a worksheet for Cursor and gave it specific notes for how to accomplish the task in a particular case. Then I let it run, and it's fairly successful.
Keep in mind…it’s never truly “hands off” for me. I still need to clean things up after it’s done. But it’s very good at figuring out how to filter the HTML down and parse out the data I need. Plus it writes good tests.
So my success story is that it takes 75% of the energy out of a task I find particularly tedious.
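For context, the "filter the HTML down and parse out the data" step boils down to something like this (a rough sketch assuming cheerio for parsing; the selectors and field names are placeholders):

    import * as cheerio from 'cheerio'

    interface Listing {
      title: string
      price: string
    }

    // Filter the page down to the repeated blocks we care about and pull out fields.
    // '.result-card', '.title' and '.price' are placeholder selectors -- each of the
    // ~200 cases needs its own notes for which selectors actually apply.
    export function parseListings(html: string): Listing[] {
      const $ = cheerio.load(html)
      return $('.result-card')
        .toArray()
        .map((el) => ({
          title: $(el).find('.title').text().trim(),
          price: $(el).find('.price').text().trim(),
        }))
    }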
I haven't found LLM code gen to be very good except in cases like you mention here, when you need to produce large boilerplatey code with a lot of hardcoded values or parameters. The kind of thing you could probably write a code generator yourself for if you cared enough to do it. Thankfully LLMs can save us from some of that.
> it rarely returns the right answer
One of the biggest difficulties AI will face is getting developers to unlearn the idea that there's a right answer, and that of the many thousands of possible right answers, 'the code I would have written myself' is just one (or a few if you're one of the few great devs who don't stop thinking about approaches after your first attempt.)
I spent a few hours trying cursor. I was impressed at first, I liked the feel of it and I tried to vibe code, whatever that means.
I tried to get it to build a very simple version of an app I’ve been working on. But the basics didn’t work, and as I got it to fix some functionality other stuff broke. It repeatedly nuked the entire web app, then rolled back again and again. It tried quick and dirty solutions that would lead to dead ends in just a few more features. No sense of elegance or foundational abstractions.
The code it produced was actually OK, and I could have fixed the bugs given enough time, but overall the results were far inferior to every programmer I’ve ever worked with.
On the design side, the app was ugly as hell and I couldn’t get it to fix that at all.
Autocomplete on a local level seems far more useful.
Gene and I would like to invite you to review our book, if you're up for it. It should be ready for early review in about 7-10 days.
It seems like you would be the perfect audience for it. We're hoping the book can teach you what you need in order to have all those success stories yourself.
How can I follow this book? I’m interested too.
I can definitely save time, but I find I need to be very precise about the exact behaviour, a skill I learned as… a regular programmer. The speedup is higher in languages I'm not familiar with, where I know what needs doing but not necessarily the standard way to do it.
I had Claude prototype a few things and for that it's really enjoyable.
Like a single-page HTML/JS page which does a few things and saves its state in local storage with a JSON backup feature (download the JSON).
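That kind of page boils down to a pattern like this (a minimal sketch; the storage key and state shape are made up):

    // Persist the page's state in localStorage under a single key.
    const STORAGE_KEY = 'app-state'

    function saveState(state: unknown): void {
      localStorage.setItem(STORAGE_KEY, JSON.stringify(state))
    }

    function loadState<T>(fallback: T): T {
      const raw = localStorage.getItem(STORAGE_KEY)
      return raw ? (JSON.parse(raw) as T) : fallback
    }

    // "Download the JSON": serialize the saved state into a Blob and trigger a download.
    function downloadBackup(): void {
      const blob = new Blob([localStorage.getItem(STORAGE_KEY) ?? '{}'], {
        type: 'application/json',
      })
      const a = document.createElement('a')
      a.href = URL.createObjectURL(blob)
      a.download = 'backup.json'
      a.click()
      URL.revokeObjectURL(a.href)
    }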
I also enjoy it for doing things I don't care much about but that make the result more polished. Like, I hate my basically empty readme with two commands. It looks ugly, and when I come back to stuff like this a few days/weeks later I always hate it.
Claude just generates really good readmes.
I'm trying out Claude code right now and like it so far.
Funny, because I have the same feeling toward the "I never get it to work" comments. You don't need any special prompt engineering so that's definitely not it.
Yeah, I gave Claude Code a try at about 5 different things, with miserable results on all of them (insult to injury -- each time it charged me about a buck!). I wonder if it's because it was C# with Unity code, maybe not so heavily represented in the training set?
I still find lots of use for LLMs authoring stuff at more like the function level. "I know I need exactly this."
Edit: I did however find it amazing for asking questions about sections of the code I did not write.
I’ve dug into this a few times.
Every single time they were doing something simple.
Just because someone has decades of experience or is a SME in some niche doesn’t mean they’re actually good… engineers.
> I do find value in asking simple but tedious task like a small refactor or generate commands,
This is already a productivity boost. I'm more and more impressed about what I can get out of these tools (as you said, simple but tedious things). ChatGPT4o (provided by company) does pretty complex things for me, and I use it more and more.
Actually, I noticed that when I can't use it (e.g. internal tools/languages), I'm pretty frustrated.
Are you concerned that these tools will soon replace the need for engineers?
I am willing to say I am a good prompt engineer, and "AI takes the wheel" is only ever my experience when my task is a very easy one. AI is fantastic for a few elements of the coding process--building unit tests, error checking, deciphering compile errors, and autocompleting trivially repetitive sections. But I have not been able to get it to "take the wheel".
This space is moving really fast, so I suggest that before forming a definitive opinion you try the best tool, such as the latest Claude model, and use "agentic" mode or the equivalent in your client. For example, on Copilot this mode is brand new and only available in VS Code Insiders. Cursor and other tools have had it for a little longer.
People have been saying it writes amazing code that works for far longer than that setup has been available though. Your comment makes me think the product is still trying to catch up to these expectations people are setting.
That being said I appreciate your suggestion and will consider giving that a shot.
You have to learn and figure out how to prompt it. My experience with Claude Code is this: one time it produces an incredible result; another time it's an utter failure. There are prompt tips and tricks which have enormous influence on the end result.
Can you give us some of these tips?
I agree it feels very different from my experience.
I'm curious when we'll start seeing verifiable results like live code streams with impressive results or companies dominating the competition with AI built products.
Are you actually using claude? There's an enormous difference between claude code and copilot, with the latter being a bigger burden these days than a help.
Can you clarify what tools and programming language you use? I find that the issue is often wrong tooling, or exotic programming languages or frameworks.
I would consider frontend tasks using Typescript and React quite standard.
From the creators of static open-source marketing benchmarks: twitter PR posts.
Is this true even for Claude 3.7 Sonnet/3.7 Sonnet Thinking ?
Oh. That's because he's clearly lying.
It's not.
A lot -- and I mean a lot -- of people who hype it up are hobby or aspirational coders.
If you drill down on what exactly they use it for, they invariably don't write code in professional settings that will be maintained and which other humans have to read.
Everyone who does goes "eh, it's good for throwaway code or one offs and it's decent at code completion".
Then there's the "AGI will doom us all" cult weirdos, but we don't talk about them.
I have the same experience
What model are you using
You've got to do piecemeal validation steps yourself, especially for models like Sonnet 3.7 that tend to over-generate code and bury themselves in complexity. Windsurf seems to be onto something. Running Sonnet 3.7 in thinking mode will sometimes reveal bits and pieces about the prompts they're injecting when it mentions "ephemeral messages" reminding it about what files it recently visited. That's all external scaffolding and context built around the model to keep it on track.