293 comments
  • develoopest2m

    I must be the dumbest "prompt engineer" ever. Each time I ask an AI to fix something or, even worse, create something from scratch, it rarely returns the right answer, and when asked for modifications it struggles even more.

    All the incredible performance and success stories always come from these Twitter posts. I do find value in asking for simple but tedious tasks like a small refactor or generating commands, but this "AI takes the wheel" level does not feel real.

    • abxyz2m

      I think it's probably the difference between "code" and "programming". An LLM can produce code, and if you're willing to surrender to the LLM's version of whatever it is you ask for, then you can have a great and productive time. If you're opinionated about programming, LLMs fall short. Most people (software engineers, developers, whatever) are not "programmers", they're "coders", which is why they have a positive impression of LLMs: they produce code, LLMs produce code... so LLMs can do a lot of their work for them.

      Coders used to be more productive by using libraries (e.g. don't write your own function for finding the intersection of arrays, use intersection from Lodash), whereas now libraries have been replaced by LLMs. Programmers laughed at the absurdity of left-pad[1] ("why use a dependency for 16 lines of code?") whereas coders thought left-pad was great ("why write 16 lines of code myself?").

      If you think about code as a means to an end, and focus on the end, you'll get much closer to the magical experience you see spoken about on Twitter, because their acceptance criterion is "good enough", not "right". Of course, if you're a programmer who cares about the artistry of programming, that feels like a betrayal.

      [1] https://en.wikipedia.org/wiki/Npm_left-pad_incident

      • miki1232112m

        Oh, this captures my experience perfectly.

        I've been using Claude Code a lot recently, and it's doing amazing work, but it's not exactly what I want it to do.

        I had to push it hard to refactor and simplify, as the code it generated was often far more complicated than it needed to be.

        To be honest though, most of the code it generated I would accept if I was reviewing another developer's work.

        I think that's the way we need to look at it. It's a junior developer that will complete our tasks, not always in our preferred way, but at 10x the speed, and will frequently make mistakes that we need to point out in CR. It's not a tool which will do exactly what we would.

        • biker1425412m

          My experience so far on Claude 3.7 has been over-engineered solutions that are brittle. Sometimes they work, but usually not precisely the way I prompted it to, and often attempts to modify them require more refactoring due to the unnecessary complexity.

          This has been the case so far in both js for web (svelte, react) and python automation.

          I feel like 3.5 generally came up "short" more often than 3.7, but in practical usage that meant code I could more easily modify and build on top of. 3.7 has led to a lot of deconstructing, reprompting, and starting over.

      • jmull2m

        All I really care about is the end result and, so far, LLMs are nice for code completion, but basically useless for anything else.

        They write as much code as you want, and it often sorta works, but it's a bug-filled mess. It's painstaking work to fix everything, on par with writing it yourself. Now, you can just leave it as-is, but what's the use of releasing software that crappy?

        I suppose it's a revolution for that in-house crapware company IT groups create and foist on everyone who works there. But the software isn't better, it just takes a day rather than 6 months (or 2 years or 5 years) to create. Come to think of it, it may not be useful for that either… I think the end purpose is probably some kind of brag for the IT manager/exec, and once people realize how little effort is involved it won't serve that purpose.

      • bikamonki2m

        Right on point. The same principle applies when deciding whether to use a framework or not. Coders often marvel at the speed with which they can build something using a framework they don’t fully understand. However, a true programmer seeks to know and comprehend what’s happening under the hood.

      • icedchai2m

        This aligns with my experience. I've seen LLMs produce "code" that the person requesting is unable to understand or debug. It usually almost works. It's possible the person writing the prompt didn't actually understand the problem, so they got a half-baked solution as a result. Either way, they need to go to a human with more experience to figure it out.

      • beezlewax2m

        I'm waiting for artisan programming to become a thing.

      • someothherguyy2m

        > If you think about code as a means to an end, and focus on the end

        The problem with this is that you will never be able to modify the code in a meaningful way after it crosses a threshold, so either you'll have a prompt only modification ability, or you will just have to rewrite things from scratch.

        I wrote my first application ever (equivalent to an education CMS today) in the very early 2000s with barely any notion of programming fundamentals. It was probably a couple hundred thousand lines of code by the time I abandoned it.

        I wrote most of it in HTML, JS, ASP and SQL. I was in high school. I didn't know what common data structures were. I once asked a professor when I got into late high school "why arrays are necessary in loops".

        We called this cookbook coding back in the day.

        I was pretty much laughed at when I finally showed people my code, even though it was a completely functional application. I would say an LLM probably can do better, but it really doesn't seem like something we should be chasing.

      • oxag3n2m

        I tried LLMs for my postgraduate "programming" tasks: creating lower-level data structures and algorithms that it's possible to write detailed requirements for. They failed miserably. When I pushed in certain directions, I got student-level replies like "collision probability is so low we can just ignore it", while the same LLM accurately estimated that in my dataset there would be collisions.

        And I won't believe LLMs can use a real debugger to figure out the root cause of a sophisticated, cascading bug until I see it.

      • mrits2m

        This surrendering to the LLM has been going around a lot lately. I can only guess it is from people that haven't tried it very much themselves but love to repeat experiences from other people.

      • bodhi_mind2m

        I'm a software developer by trade but also program art creation tools as a hobby. Funny thing is, at work, code is definitely a means to an end. But when I'm doing it for an art project, I think of the code as part of the art :) the process of programming and coming up with the code is ultimately part of the holistic artistic output. The two are not separate, just as the artist's paint and brushes are also a part of the final work of a painting.

      • roflyear2m

        > LLM's version of whatever it is you ask for, then you can have a great and productive time

        Sure, but man are there bugs.

      • nbardy2m

        This is untrue.

        You can be over specified in your prompts and say exactly what types and algorithms you want if you’re opinionated.

        I often write giant page long specs to get exactly the code I want.

        It’s only 2x as fast as coding, but thinking in English is way better than coding.

      • throwaway20372m

        > Programmers laughed at the absurdity of left-pad[1] ("why use a dependency for 16 lines of code?")

        I'm confused. Did they laugh, then still use it as a dependency? If not, did they reinvent the wheel or copy the 16 lines into their project? Right up until the day of the "NPM left-pad incident" tiny dependencies were working just fine.

        Also, if you cannot tell the difference between code written by an LLM and code written by a human, what is the difference? This whole post is starting to feel like people with very strong (gatekeeper-ish) views on hi-fi stereo equipment, coffee, wine, ... and programming. Or should I say "code-as-craft" <cringe>?

      • jv222222m

        Thank you for eloquently saying what I've been trying hard to express.

      • gitgud2m

        Interesting, but it seems ridiculous to draw a distinction between "Programmer" and "Coder".

        They’re synonymous words and mean the same thing right?

        Person who writes logic for machines

    • BeetleB2m

      Some hints for people stuck like this:

      Consider using Aider. It's a great tool and cheaper to use than Claude Code.

      Look at Aider's LLM leaderboard to figure out which LLMs to use.

      Use its architect mode (although you can get quite far without it; I personally haven't needed it).

      Work incrementally.

      I use at least 3 branches. My main one, a dev one and a debug one. I develop on dev. When I encounter a bug I switch to debug. The reason is it can produce a lot of code to fix a bug. It will write some code to fix it. That won't work. It will try again and write even more code. Repeat until fixed. But in the end I only needed a small subset of the new code. So you then revert all the changes and have it fix it again telling it the correct fix.

      Don't debug on your dev branch. (There's a rough sketch of this branch dance at the end of this comment.)

      Aider's auto committing is scary but really handy.

      Limit your context to 25k.

      Only add files that you think are necessary.

      Combining the two: Don't have large files.

      Add a Readme.md file. It will then update the file as it makes code changes. This can give you a glimpse of what it's trying to do and if it writes something problematic you know it's not properly understanding your goal.

      Accept that it is not you and will write code differently from you. Think of it as a moderately experienced coder who is modifying the codebase. It's not going to follow all your conventions.

      https://aider.chat/

      https://aider.chat/docs/leaderboards/
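
      To make the branch dance concrete, here's roughly what a session looks like for me. The file names are made up and the commands are from memory, so treat this as a sketch and check aider --help for your version:

          git checkout dev
          aider --model sonnet src/parser.py tests/test_parser.py  # add only the files you need

          # Hit a bug? Switch to a throwaway branch and let it iterate there.
          git checkout -b debug
          aider --model sonnet src/parser.py
          # ...describe the bug; it tries, fails, tries again, auto-committing as it goes...

          # Once you know what the minimal fix actually was, discard the noise
          # and apply just that fix cleanly on dev.
          git checkout dev
          git branch -D debug
          aider --model sonnet src/parser.py  # tell it the correct fix directly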

      • majormajor2m

        > I use at least 3 branches. My main one, a dev one and a debug one. I develop on dev. When I encounter a bug I switch to debug. The reason is it can produce a lot of code to fix a bug. It will write some code to fix it. That won't work. It will try again and write even more code. Repeat until fixed. But in the end I only needed a small subset of the new code. So you then revert all the changes and have it fix it again telling it the correct fix.

        how big/complex does the codebase have to be for this to actually save you time, compared to just using a debugger and fixing it yourself directly? (I'm assuming here that bugs in smaller codebases are that much easier for a human to identify quickly)

      • geoka92m

        Thanks, that's a helpful set of hints!

        Can you provide a ballpark of what kind of $ costs we are talking here for using Aider with, say, Claude? (or any other provider that you think is better at the moment).

        Say a run-of-the-mill bug-fixing session from your experience vs the most expensive one off the top of your head?

      • tptacek2m

        The three-branch thing is so smart.

      • ddanieltan2m

        do you have a special prompt to instruct aider to log file changes in the repo's README? I've used aider in repos with a README.md but it has not done this update. (granted, I've never /add-ed the readme into aider's context window before either...)

    • branko_d2m

      I have the same experience.

      Where AI shines for me is as a form of a semantic search engine or even a tutor of sorts. I can ask for the information that I need in a relatively complex way, and more often than not it will give me a decent summary and a list of "directions" to follow-up on. If anything, it'll give me proper technical terms, that I can feed into a traditional search engine for more info. But that's never the end of my investigation and I always try to confirm the information that it gives me by consulting other sources.

      • mentalgear2m

        Exactly the same experience: since the early-access GPT-3 days, I've played out various scenarios, and the most useful case has always been using generative AI as semantic search. Its generative features are just lacking in quality (for anything other than a toy project), and the main issues from the early GPT days remain: even though it gets better, it's still too unreliable for serious work on mid-complex systems. Also, if you don't pay attention, it messes up other parts of the code.

      • jofzar2m

        Yeah, I have had some "magic" moments where I knew "what" I needed and had an idea of "how it would look", but no idea how to do it, and AI helped me understand how I should do it instead of the hacky, very stupid way I would have done it.

      • Yoric2m

        Same here. In some cases, brainstorming even kinda works – I mean, it usually gives very bad responses, but it serves as a good duck.

        Code? Nope.

    • smallerfish2m

      I've done code interviews with hundreds of candidates recently. The difference between those who are using LLMs effectively and those who are not is stark. I honestly think engineers who think like OP are going to get left behind. Take a weekend to work on getting your head around this by building a personal project (or learning a new language).

      A few things to note:

      a) Use the "Projects" feature in Claude web. The context makes a significant amount of difference in the output. Curate what it has in the context; prune out old versions of files and replace them. This is annoying UX, yes, but it'll give you results.

      b) Use the project prompt to customize the response. E.g. I usually tell it not to give me redundant code that I already have. (Claude can otherwise be overly helpful and go on long riffs spitting out related code, quickly burning through your usage credits).

      c) If the initial result doesn't work, give it feedback and tell it what's broken (build messages, descriptions of behavior, etc).

      d) It's not perfect. Don't give up if you don't get perfection.

      • triyambakam2m

        Hundreds of candidates? That's significant if not an exaggeration. What are the stark differences you have seen? Did you inquire about the candidate's use of language models?

      • jacobedawson2m

        I'd add to that that the best results are with clear spec sheets, which you can create using Claude (web) or another model like ChatGPT or Grok. Telling them what you want and what tech you're using helps them create a technical description with clear segments and objectives, and in my experience works wonders in getting Claude Code on the right track, where it has full access to the entire context of your code base.

      • cheema332m

        > The difference between those who are using LLMs effectively and those who are not is stark.

        Same here. Most candidates I interviewed said they did not use AI for development work. And it showed. These guys were not well informed on modern tooling and frameworks. Many of them seemed stuck in/comfortable with their old way of doing things and resistant to learning anything new.

        I even hired a couple of them, thinking that they could probably pick up these skills. That did not happen. I learned my lesson.

    • InvertedRhodium2m

      My workflow for that kind of thing goes something like this (I use Sonnet 3.7 Thinking in Cursor):

      1. 1st prompt is me describing what I want to build, what I know I want and any requirements or restrictions I'm aware of. Based on these requirements, ask a series of questions to produce a complete specification document.

      2. Workshop the specification back and forward until I feel it's complete enough.

      3. Ask the agent to implement the specification we came up with.

      4. Tell the agent to implement Cursor Rules based on the specifications to ensure consistent implementation details in future LLM sessions.

      I'd say it's pretty good 80% of the time. You definitely still need to understand the problem domain and be able to validate the work that's been produced but assuming you had some architectural guidelines you should be able to follow the code easily.

      The Cursor Rules step makes all the difference in my experience. I picked most of this workflow up from here: https://ghuntley.com/stdlib/

      Edit: A very helpful rule is to tell Cursor to always check out a new branch based on the latest HEAD of master/main for all of its work.
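
      For illustration, the rules end up looking something like this. This is a made-up example, and the exact file format (a legacy .cursorrules file or .cursor/rules/*.mdc) depends on your Cursor version:

          # Project conventions distilled from the specification (hypothetical project)
          - Always check out a new branch from the latest HEAD of master/main before starting work.
          - Keep API handlers in src/api/ and domain logic in src/core/; never mix the two.
          - Do not add a new dependency without asking first.
          - Every new module gets unit tests mirroring its source path under tests/.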

      • theshrike792m

        I need to steal the specification idea.

        Cursor w/ Claude has a habit of running away on tangents instead of solving just the one problem, then I need to reject its changes and even roll back to a previous version.

        With a proper specification as guideline it might stay on track a bit better.

    • slooonz2m

      I decided to seriously try Sonnet 3.7. I started with a simple prompt on claude.ai ("Do you know claude code? Can you do a simple implementation for me?"). After minimal tweaking from me, it gave me this: https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016...

      After interacting with this tool, I decided it would be nice if the tool could edit itself, so I asked it (him? it?) to create its next version. It came up with a non-working version of this: https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016.... I fixed the bug manually, and that started an interactive loop: I could now describe what I wanted, describe the bugs, and the tool would add the features/fix the bugs itself.

      I decided to rewrite it in TypeScript (by that I mean: "can you rewrite yourself in TypeScript"). And then add other tools (by that: "create tools and unit tests for the tools"). https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... and https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... have been created by the tool itself, without any manual fix from me. Setting up the testing/mock framework? Done by the tool itself too.

      In one day (and $20), I essentially had recreated claude-code. That I could improve just by asking "Please add feature XXX". $2 a feature, with unit tests, on average.

      • WD-422m

        So you're telling me you spent 20 dollars and an entire day for 200 lines of JavaScript and 75 lines of Python, and this to you constitutes a working re-creation of Claude Code?

        This is why expectations are all out of whack.

      • Silhouette2m

        Thanks for writing up your experience and sharing the real code. It is fascinating to see how close these tools can now get to producing useful, working software by themselves.

        That said - I'm wary of reading too much into results at this scale. There isn't enough code in such a simple application to need anything more sophisticated than churning out a few lines of boilerplate that produce the correct result.

        It probably won't be practical for the current state of the art in code generators to write large-scale production applications for a while anyway just because of the amount of CPU time and RAM they'd need. But assuming we solve the performance issues one way or another eventually it will be interesting to see whether the same kind of code generators can cope with managing projects at larger scales where usually the hard problems have little to do with efficiently churning out boilerplate code.

    • matt_heimer2m

      LLMs are replacing Google for me when coding. When I want to get something implemented, say a REST request in Java using a specific client library, I previously used Google to find an example of using that library.

      Google has gotten worse (or the internet has more garbage), so finding a code example is more difficult than it used to be. Now I ask an LLM for an example. Sometimes I have to ask for a refinement, and usually something is broken in the example, but it takes less time to get the LLM-produced example to work than it does to find a functional example using Google.

      But the LLM has only replaced my previous Google usage, I didn't expect Google to develop my applications and I don't with LLMs.

      • ptmcc2m

        This has been my experience of successful usage as well. It's not writing code for me, but pulling together the equivalent of a Stack Overflow example and some explaining sentences that I can follow up on. Not perfect and I don't blindly copy paste it, same as Stack Overflow ever was, but faster and more interactive. It's helpful for wayfinding, but not producing the end result.

      • deergomoo2m

        I used the Kagi free trial when I was doing Advent of Code in a somewhat unfamiliar language (Swift) last year, as well as ChatGPT occasionally.

        The LLM was obviously much faster and the information was much higher density, but it had quite literally about a 20% rate of just making up APIs from my limited experiment. But I was very impressed with Kagi’s results and ended up signing up, now using it as my primary search engine.

      • layer82m

        In order to use a library, I need to (this is my opinion) be able to reason about the library’s behavior, based on a specification of its interface contract. The LLM may help with coming up with suitable code, but verifying that the application logic is correct with respect to the library’s documented interface contract is still necessary. It’s therefore still a requirement to read and understand the library’s documentation. For example, for the case of a REST client, you need to understand how the possible failure modes of the HTTP protocol and REST API are translated by the library.
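
        As a minimal sketch of what I mean, using Python's requests library (the function and endpoint are hypothetical, but the failure modes are the documented ones):

            import requests

            def fetch_user(base_url: str, user_id: int) -> dict:
                # The application logic must handle the library's documented
                # failure modes, not just the happy path.
                try:
                    resp = requests.get(f"{base_url}/users/{user_id}", timeout=5)
                    resp.raise_for_status()  # raises HTTPError on 4xx/5xx
                    return resp.json()       # raises ValueError on malformed JSON
                except requests.exceptions.Timeout:
                    raise RuntimeError("request timed out") from None
                except requests.exceptions.HTTPError as e:
                    # Distinguish "not found" from server errors per the API's contract.
                    if e.response is not None and e.response.status_code == 404:
                        raise KeyError(f"user {user_id} not found") from e
                    raise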

      • jayd162m

        I wonder how good Google could be if they had a charge-per-query model like these LLMs do. AI or not, dropping the ad incentive would be nice.

    • escapecharacter2m

      I've found AI to be useful on precisely-scoped tasks I might assign to a junior programmer to take a day to do, like "convert this exact bash script to a Powershell script".

      But in my own work, those tasks are pretty rare, like 3 times a month? Often I start working on something, and the scope and definition of success changes while I'm in the midst of it. Or it turns out to be harder than expected and it makes sense to timebox it and do a quick search for workarounds.

      As much as we joke about StackOverflow commenters sometimes telling a question-asker they shouldn't be doing what they're trying to do, you do actually want that (soft) pushback some of the time. Most modern LLMs will gleefully come up with a complete plan for how we're gonna get the square peg in the round hole.

      • philipswood2m

        > you do actually want that (soft) pushback some of the time. Most modern LLMs will gleefully come up with a complete plan for how we're gonna get the square peg in the round hole.

        I once accidentally asked my local DeepSeek R1 model (deepseek-r1:7b) to do the wrong thing by copy-pasting the wrong variable name; it saw me trying to do something stupid (I was working with the wrong variable). It told me how to do what I asked, and then asked:

        > _Is this modification part of a larger change you're making to the code? I'd like to make sure we're not modifying something important that might have side effects._

        Looking at its thought process:

        > _The user wants to modify this string by replacing "-input" with "-kb". But looking at the ARN structure, I recall that AWS S3 ARNs are usually in the form arn:aws:s3:::bucket_name/path. The part after the bucket name is typically s3:// followed by the object key._

        > _Wait, maybe the user has a specific reason to make this change. Perhaps they're formatting or structuring something for a different purpose. They might not realize that directly modifying ARNs can cause issues elsewhere if not done correctly._

    • clusterhacks2m

      I am a slow adopter of new tech but plan to spend a significant amount of time in 2025 using AI tools when coding. I am net negative on AI simply replacing programmers, but I think the practice of development is undergoing a seismic shift at this point.

      My recent usage is oriented towards using pseudocode descriptions that closely map to Python to produce Python functions. I am very impressed with Claude 3.7's syntactic correctness when given a chunk of pseudocode that looks "python-y" to begin with.

      My one concern is that much of my recent code lacks novelty, so there is a reasonable chance the tool is just spitting out code it slurped from GitHub or elsewhere on the larger Internet. Just this week, I gave Claude a relatively "anonymous" function in pseudocode, meaning variable names were not particularly descriptive, with one tiny exception. Yet Claude generated a situationally appropriate comment as part of the function definition. This would be surprising if the model did NOT have in its training set some very close match to my pseudocode, with enough context to produce that comment.
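
      For a flavor of what I mean by "python-y" pseudocode (a made-up example, not the actual function I gave Claude), the input side reads roughly like:

          for each record in rows:
              bucket = record.ts // 3600
              counts[bucket] += 1
          return counts sorted by bucket

      and what comes back is, more or less:

          from collections import defaultdict

          def hourly_counts(rows):
              # Count records per hour bucket (epoch seconds // 3600).
              counts = defaultdict(int)
              for record in rows:
                  counts[record.ts // 3600] += 1
              return dict(sorted(counts.items()))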

      • doug_durham2m

        At this point very little code is "novel". Everyone is simply rewriting code that has already been written in a similar form. The LLM isn't slurping up and restating code verbatim. It is taking code that it has seen thousands of times and generating a customized version for your needs. It's hubris to think that anyone here is generating "novel" code.

    • csomar2m

      Yeah, it's so bad now I only trust my own eyes. Everyone is faking posts, tweets and benchmarks, to the point that the truth no longer exists.

      I'm using Claude 3.7 now, and while it has improved in certain areas, it has degraded in others (i.e. it randomly removes/changes things more now).

      • namaria2m

        It's clear to anyone paying attention that LLMs hit a wall a while back. RAG is just expert systems with extra steps. 'Reasoning' is just burning more tokens in the hope that it somehow makes the results better. And lately we've seen a small blanket being pulled one way or another.

        LLMs are cool, machine learning is cooler. Still no 'AI' in sight.

    • julienmarie2m

      I initially had the same experience. My codebase is super opinionated, with a specific way to handle things. Initially it kept wanting to do things its way. I then changed my approach: I documented the way the codebase is structured, how things should be done, and all the conventions used, and on every prompt I make sure to tell it to use these documents as reference. I also have a central document that keeps track of module dependencies and the global data model. Since I made these reference documents, developing new features has been a breeze. I created the architecture, documented it, and now it uses it.

      The way I prompt it: first I write the documentation of the module I want, following the format I detailed in the master documents, then ask it to follow the documentation and specs.

      I use cursor as well, but more as an assistant when I work on the architecture pieces.

      But I would never give an AI the driver's seat for building the architecture and making tech decisions.

    • crabl2m

      What I've noticed from my extensive use over the past couple of weeks is that Claude Code really sucks at thinking things through enough to understand the second- and third-order consequences of the code it's writing. That said, it's easy enough to work around its deficiencies by using a model with extended thinking (Grok, GPT-4.5, Sonnet 3.7 in thinking mode) to write prompts for it, and using Claude Code as basically a dumb code-spewing minion. My workflow has been: give Grok enough context on the problem with specific code examples, ask it to develop an implementation plan that a junior developer can follow, and paste the result into Claude Code, asking it to diligently follow the implementation plan and nothing else.
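
      Concretely, the prompt pair looks something like this (a sketch; the file name is a placeholder):

          To the thinking model (Grok / GPT-4.5 / Sonnet 3.7 thinking):
              "Here is the relevant code from src/billing/invoice.ts [placeholder]
              and the requirement. Write an implementation plan that a junior
              developer could follow: files to touch, functions to add, edge
              cases, and the tests that prove it works."

          To Claude Code:
              "Diligently follow this implementation plan and nothing else. Do
              not refactor anything outside the listed files. Plan: <paste plan>"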

      • simonw2m

        "Claude Code really sucks at thinking things through enough to understand the second and third order consequences of the code that it's writing"

        Yup, that's our job as software engineers.

      • cglace2m

        In all of these posts I fail to see how this is engineering anymore. It seems like we are one step away from taking ourselves out of the picture completely.

      • TylerLives2m

        This has been my experience as well. Breaking problems into smaller problems where you can easily verify correctness works much better than having it solve the whole problem on its own.

    • Balgair2m

      Hey, I've been hearing about this issue that programmers have on HN a lot.

      But I'm in the more 'bad programmer/hacker' camp and think that LLMs are amazing and really helpful.

      I know that one can post a link to the chat history. Can you do that for an example that you are comfortable sharing? I know that may not be possible, though, or may be very time-consuming.

      What I'm trying to get at is: I suck at programming, I know that. And you probably suck a lot less. And if you say that LLMs are garbage, and I say they are great, I want to know where I'm getting the disconnect.

      I'm sincerely not trying to be a troll here, and I really do want to learn more.

      Others are welcome to post examples and walk through them too.

      Thanks for any help here.

      • vlod2m

        >and think that LLMs are amazing and really helpful

        Respectfully, do you understand what it produces, or do you think it's amazing because it produces something that 'maybe' works?

        Here's an example I was just futzing with. I did a refactor of my code (TypeScript) and my test code broke (vitest): for some reason it said 'mockResolvedValue()' is not a function. I've done this a gazillion times.

        I let it try to fix the error over 3-4 iterations (I was being lazy and wanted my error to go away), and the amount of crap it was producing (rewriting tests and the referenced code) was beyond ridiculous. (I was using GitHub Copilot.)

        Eventually I said "f. that for a game of soldiers" and used my brain. I had forgotten to uncomment a vi.mock() during the refactor.

        I DO use it to fix stupid TypeScript errors (the error blob it dumps on you can be a real pita to process) and appreciate it when it gives me a simple solution.

        So I agree with quite a few comments here. I'm not ready to bend the knee to our AI Overlords.

      • mns2m

        I can give you an example here. We had to do some basic local VAT validation for EU countries. The official API you can use for this has issues for some countries (it checks against the national databases), so we wanted a basic local fallback. Using Claude 3.7, I asked for some basic VAT validation. In general the answer and solution were good, you would be impressed, but here comes the fun part. The basic solution was just some regular expressions, but then it went further on its own and created specific validations for certain countries: things like credit card number validation, sums, check digits. Quite nice, you would say. But in a lot of these countries the numbers are assigned essentially at random, with no algorithm, so it hallucinated validations that don't exist, producing a solution that looks nice but doesn't work in most cases.

        Then I went on GitHub and found that it had used some code written by someone in JS 7 years ago, just converted and extended for my language, but that code was wrong and simply useless. We'll end up with people publishing exploits and various other security flaws on GitHub, these LLMs will get trained on that, and people who have no clue what they are doing will push out code based on it. We're in for fun times ahead.
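
        To make the failure mode concrete, here is a minimal Python sketch of the part that is safe to do locally: syntax-only checks. The patterns are the commonly known formats for a few countries, simplified and not authoritative; anything beyond this (check digits, registry lookups) needs per-country verification:

            import re

            # Syntax-only plausibility checks. Deliberately no check digits:
            # several EU registries assign numbers with no public checksum,
            # so inventing one (as the LLM did) silently rejects valid numbers.
            VAT_PATTERNS = {
                "DE": re.compile(r"DE\d{9}"),             # Germany: 9 digits
                "NL": re.compile(r"NL\d{9}B\d{2}"),       # Netherlands: 9 digits + 'B' + 2 digits
                "FR": re.compile(r"FR[A-Z0-9]{2}\d{9}"),  # France: 2-char key + 9 digits
            }

            def plausible_vat(vat: str) -> bool:
                vat = vat.replace(" ", "").upper()
                pattern = VAT_PATTERNS.get(vat[:2])
                return bool(pattern and pattern.fullmatch(vat))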

      • alexkwood2m

        Here is one solution I am helping out with which was very easy to create using Claude: https://www.youtube.com/watch?v=R72TvoXCimg&t=2s

    • sovietmudkipz2m

      Maybe it's the shot-up-plane effect (survivorship bias): we only see the winners but rarely the failures, which leads us to incorrect conclusions.

      Finding the right prompt to have current generation AI create the magic depicted in twitter posts may be a harder problem than most anticipate.

    • fullstackwife2m

      "wild", "insane" keywords usually are a good filter for marketing spam.

      • belter2m

        Influencer would be another term...

    • epolanski2m

      I don't believe in those either, and I never see compelling YouTube videos showing that in action.

      For small stuff LLMs are actually great and often a lifesaver on legacy codebases, but that's more or less where it stops.

    • noufalibrahim2m

      I'm in the same boat. I've found it useful in micro contexts but in larger programs, it's like a "yes man" that just agrees with what I suggest and creates an implementation without considering the larger ramifications. I don't know if it's just me.

    • iambateman2m

      I have a challenging, repetitive developer task that I need to do ~200 times. It’s for scraping a site and getting similar pieces of data.

      I wrote a worksheet for Cursor and gave it specific notes for how to accomplish the task in a particular case. Then I let it run, and it's fairly successful.

      Keep in mind…it’s never truly “hands off” for me. I still need to clean things up after it’s done. But it’s very good at figuring out how to filter the HTML down and parse out the data I need. Plus it writes good tests.

      So my success story is that it takes 75% of the energy out of a task I find particularly tedious.

      • WD-422m

        I haven't found LLM code gen to be very good except in cases like you mention here, when you need to produce large boilerplatey code with a lot of hardcoded values or parameters: the kind of thing you could probably write a code generator for yourself if you cared enough to do it. Thankfully LLMs can save us from some of that.

    • onion2k2m

      > it rarely returns the right answer

      One of the biggest difficulties AI will face is getting developers to unlearn the idea that there's a single right answer. Of the many thousands of possible right answers, 'the code I would have written myself' is just one (or a few, if you're one of the rare great devs who don't stop thinking about approaches after the first attempt).

    • rhubarbtree2m

      I spent a few hours trying cursor. I was impressed at first, I liked the feel of it and I tried to vibe code, whatever that means.

      I tried to get it to build a very simple version of an app I’ve been working on. But the basics didn’t work, and as I got it to fix some functionality other stuff broke. It repeatedly nuked the entire web app, then rolled back again and again. It tried quick and dirty solutions that would lead to dead ends in just a few more features. No sense of elegance or foundational abstractions.

      The code it produced was actually OK, and I could have fixed the bugs given enough time, but overall the results were far inferior to every programmer I’ve ever worked with.

      On the design side, the app was ugly as hell and I couldn’t get it to fix that at all.

      Autocomplete on a local level seems far more useful.

    • _steve_yegge_2m

      Gene and I would like to invite you to review our book, if you're up for it. It should be ready for early review in about 7-10 days.

      It seems like you would be the perfect audience for it. We're hoping the book can teach you what you need in order to have all those success stories yourself.

      • Gunnerhead2m

        How can I follow this book? I’m interested too.

    • moomin2m

      I can definitely save time, but I find I need to be very precise about the exact behaviour, a skill I learned as… a regular programmer. The speedup is higher in languages I'm not familiar with, where I know what needs doing but not necessarily the standard way to do it.

    • Delomomonl2m

      I had Claude prototype a few things and for that it's really enjoyable.

      Like a single-page HTML/JS app which does a few things and saves its state in local storage, with a JSON backup feature (download the JSON).

      I also enjoy it for things I don't care much about but which make a project more polished. Like, I hate my basically empty readme with two commands. It looks ugly, and when I come back to stuff like this a few days/weeks later I always hate it.

      Claude just generates really good readmes.

      I'm trying out Claude code right now and like it so far.

    • Kiro2m

      Funny, because I have the same feeling toward the "I never get it to work" comments. You don't need any special prompt engineering so that's definitely not it.

    • dilap2m

      Yeah, I gave Claude Code a try at about 5 different things, with miserable results on all of them (insult to injury: each time it charged me about a buck!). I wonder if it's because it was C# with Unity code, which is maybe not so heavily represented in the training set?

      I still find lots of use for LLMs authoring stuff at more like the function level. "I know I need exactly this."

      Edit: I did however find it amazing for asking questions about sections of the code I did not write.

    • babyent2m

      I’ve dug into this a few times.

      Every single time they were doing something simple.

      Just because someone has decades of experience or is a SME in some niche doesn’t mean they’re actually good… engineers.

    • yodsanklai2m

      > I do find value in asking for simple but tedious tasks like a small refactor or generating commands

      This is already a productivity boost. I'm more and more impressed by what I can get out of these tools (as you said, simple but tedious things). ChatGPT-4o (provided by my company) does pretty complex things for me, and I use it more and more.

      Actually, I noticed that when I can't use it (e.g. internal tools/languages), I'm pretty frustrated.

      • cglace2m

        Are you concerned that these tools will soon replace the need for engineers?

    • kolbe2m

      I am willing to say I am a good prompt engineer, and "AI takes the wheel" is only ever my experience when my task is a very easy one. AI is fantastic for a few elements of the coding process: building unit tests, error checking, deciphering compile errors, and autocompleting trivially repetitive sections. But I have not been able to get it to "take the wheel".

    • nsonha2m

      This space is moving really fast. Before forming a definitive opinion, I suggest trying the best tool, such as the latest Claude model, with "agentic" mode or its equivalent in your client. For example, on Copilot this mode is brand new and only available in VS Code Insiders. Cursor and other tools have had it for a little longer.

      • collingreen2m

        People have been saying it writes amazing code that works for far longer than that setup has been available though. Your comment makes me think the product is still trying to catch up to these expectations people are setting.

        That being said I appreciate your suggestion and will consider giving that a shot.

    • egorfine2m

      You have to learn and figure out how to prompt it. My experience with Claude Code is this: one time it produces an incredible result; another time it's an utter failure. There are prompt tips and tricks which have enormous influence on the end result.

      • ido2m

        Can you give us some of these tips?

    • jayd162m

      I agree it feels very different from my experience.

      I'm curious when we'll start seeing verifiable results like live code streams with impressive results or companies dominating the competition with AI built products.

    • Ancalagon2m

      Are you actually using claude? There's an enormous difference between claude code and copilot, with the latter being a bigger burden these days than a help.

    • razemio2m

      Can you clarify what tools and programming language you use? I find that the issue is often wrong tooling, or exotic programming languages or frameworks.

      • develoopest2m

        I would consider frontend tasks using TypeScript and React quite standard.

    • gloosx2m

      From the creators of static open-source marketing benchmarks: twitter PR posts.

    • huvin992m

      Is this true even for Claude 3.7 Sonnet / 3.7 Sonnet Thinking?

    • timewizard2m

      Oh. That's because he's clearly lying.

    • habinero2m

      It's not.

      A lot -- and I mean a lot -- of people who hype it up are hobby or aspirational coders.

      If you drill down on what exactly they use it for, they invariably don't write code in professional settings that will be maintained and which other humans have to read.

      Everyone who does goes "eh, it's good for throwaway code or one offs and it's decent at code completion".

      Then there's the "AGI will doom us all" cult weirdos, but we don't talk about them.

    • dgellow2m

      I have the same experience

    • darepublic2m

      What model are you using?

    • EigenLord2m

      You've got to do piecemeal validation steps yourself, especially for models like Sonnet 3.7 that tend to over-generate code and bury themselves in complexity. Windsurf seems to be onto something. Running Sonnet 3.7 in thinking mode will sometimes reveal bits and pieces about the prompts they're injecting when it mentions "ephemeral messages" reminding it about what files it recently visited. That's all external scaffolding and context built around the model to keep it on track.

  • CGamesPlay2m

    I decided to check this out after seeing the discussion here. I had previously misunderstood that it required a Claude.ai plan, but it actually just uses your API keys.

    I did a comparison between Claude Code and Aider (my normal go-to): I asked each to clone a minor feature in my existing app with some minor modifications (specifically, a new global keyboard shortcut in a Swift app).

    Claude Code spent about 60 seconds and $0.73 to search the code base and make a +51 line diff. After it finished, I was quite impressed by its results; it did exactly the correct set of changes I would have done.

    Now, this is a higher level of task than I would normally give to Aider (because I didn't provide any file names, and it requires changing multiple files), so I was not surprised that Aider completely missed the files it needed to modify to start (asking me to add 1 correct file and 2 incorrect files). I did a second attempt after manually adding the correct files. After doing this, it produced an equivalent diff to Claude Code. Aider did this in 1 LLM prompt, or about 15 seconds, with a total cost of $0.07, about 10% of the Claude Code cost.

    Overall, it seems clear that the higher level of autonomy carries a higher cost with it. My project here was 7k SLOC; I would worry about ballooning costs on much larger projects.

    • bufferoverflow2m

      Now, the billion dollar question is - how long would it take you to code that diff?

      • CGamesPlay2m

        Probably about 3 minutes? That's my main usage of these types of coding tools, honestly. I already know generally what I want to happen and validating that the LLM is on the right track / reached the right solution is easy.

        I'm not saying that "Claude Code makes me 300% more effective", but I guess it did for this (simple) task.

      • ddacunha2m

        I recently made some changes to a website generated by a non-technical user using Dreamweaver. The initial state was quite poor, with inline CSS and what appeared to be haphazard copy-pasting within a WYSIWYG editor.

        Although I'm not proficient in HTML, CSS, or JavaScript, I have some understanding of what good code looks like. Through several iterations, I managed to complete the task in a single evening; it would have taken me a week or two to relearn and apply the necessary skills myself. Not only is the code better organised, it's half the size, and the website looks better.

      • fragmede2m

        time spent is not the only question. how much thought it takes, however impossible that may be to measure, is another one. If an LLM assisted programmer is able to solve the problem without deep focus, while responding to emails and attending meetings, vs the programmer who can't, is time really the only metric we can have here?

      • ignoramous2m

        > how long would it take you to code that diff?

        My scrum/agile coach says, by parallelizing prompts, a single developer can babysit multiple changes in the same time slice. By having a sequence of prompts ready before hand, a single developer can pipeline those one after the other. With an IDE that helps schedule such work, a single developer can effectively hyper-thread their developmental workflow. If the developer is epoll'ing at 10x the hertz... that's another force multiplier. Of course context switches & side-channels are of concern, but a voice over my shoulder tells me that as long as memory safety is guaranteed, everything should turn up alrigd3adb33f.

      • throwawaymsft2m

        Infinite, because the median counterfactual is never getting around to this P4 “nice to have” issue that’s languished in the backlog.

    • artdigital2m

      Same here. I did a few small tasks with Claude Code after seeing this discussion here, and it is too expensive for me.

      A small change to create a script file (20 LoC) was 10 cents; a quick edit to a README was 7 cents.

      Yes, yes, engineers make more than that, blah blah, but the cost would quickly jump out of control for bigger tasks. I'd easily burn through $10-20 a day with this, or upwards of $100-$300 a month. Unless you have a Silicon Valley salary, that's too expensive.

      I use other tools like Cody (the tool the author created) or Copilot because I pay $10 a month and that’s it. Yes I get rate limited almost daily but I don’t need to worry that my tool cost is going out of control suddenly.

      I hope Anthropic introduces a new plan that bundles Claude Code into it, I’d be much more comfortable using that knowing it won’t suddenly be more than my $50/mo (or whatever)

      • stevage2m

        It's an interesting question. As a freelance consultant, theoretically a tool like this could allow me to massively scale up my income, assuming I could find enough clients.

        I'm a bit nervous where I'd end up though - with code I'd "written" but wasn't familiar with, and with who knows what kinds of limitations or subtle bugs baked in.

      • bayarearefugee2m

        > Yes, yes, engineers make more than that, blah blah, but the cost would quickly jump out of control for bigger tasks.

        Also, (most) engineers don't hallucinate answers. Claude still does, regularly. When it does it in chat mode on a flat-rate Pro plan I can laugh it off and modify the prompt to give it the context it clearly didn't understand, but if it's costing me very real money for the LLM to over-eagerly over-engineer an incorrect implementation of the stated feature, it's a lot less funny.

      • zo12m

        I use Grok and it's free (even Grok3). I definitely don't hit limits unless it's a pretty heavy day and I do a lot of adjustments. Also, don't send entire codebases to it, just one-off files. What's quite amazing is how it doesn't matter that it doesn't have the source to dependent files, it figures it out and infers what each method does based on its name and context, frigging amazing if you ask me.

        And it doesn't fight me like the OpenAI tooling does, which logs me out randomly every day so I have to log in and spend 4 minutes copying login codes from my email or answering their stupid Captcha test. And this is on their API playground where I pay for every single call, so it's not like I'm trying to scrape my free chat usage as an API.

    • lolinder2m

      I think that tools like this have to operate on a subscription model like Cursor does in order to make any kind of sense for most users. The pay as you go model for agentic code tools makes you responsible for paying for:

      * Whatever context the agent decides to pull in.

      * However many iterations the model decides to run.

      * Any result you get, regardless of how bad it is.

      With pay as you go, the tool developer has no incentive to minimize any of these costs—they get paid more if it's inefficient, as long as it's not so inefficient that no one uses it. They don't even need it to be especially popular, they just need some subset of the population to decide that costs don't matter (i.e. those with Silicon Valley salaries).

      With Cursor's model of slow and fast queries, they are taking responsibility for ensuring that the agents are as cost-efficient as possible. The more efficient the agent, the larger their cut. The fewer times people have to ask a question a second time, the larger their cut. This can incentivize cutting corners, but that's somewhat balanced out by the need to keep people renewing their subscription. On the whole, for most users it's better to have a flat subscription price and a company that's optimizing their costs than a pay-as-you-go model and a company that has no incentive to improve efficiency.

      • foz2m

        I think this core business model question is happening at all levels in these companies. Each time the model goes in the wrong direction, and I stop it - or I need to go back and reset context and try again - I get charged. The thing is, this is actually a useful and productive way to work sometimes. Like when pairing with another developer, you need to be able to interrupt each other, or even try something and fail.

        I don't mind paying per-request, but I can't help but think the daily API revenue graph at Cursor is going up whenever they have released a change that trips up development and forces users to intervene or retry things. And of course that's balanced by churn if users get frustrated and stop or leave. But no product team wants to have that situation.

        In the end I think developers will want to pay a fair and predictable price for a product that does a good job most of the time. I don't personally care about switching models, I tend to gravitate towards the one that works best for me. Eventually, I think most coding models will soon be good at most things and the prices will go down. Where will that leave the tool vendors?

      • cft2m

        Unfortunately the opposite is happening: Cursor is moving to a pay-per-use model:

        https://x.com/ericzakariasson/status/1898753771754434761

        https://old.reddit.com/r/cursor/comments/1j5kvun/cursor_0470...

        I am afraid that the endgame of programming will be who has the biggest budget for an LLM, further consolidating programming to megacorps and raising barrier to entry.

    • personjerry2m

      That seems like a steal? Engineers are paid much more to do much less.

      • CGamesPlay2m

        No, I'm paid much more to do much more than what I did in this simple task. Claude didn't even test the changes (in this case, it does not have the hardware required to do that), or decide that the feature needed to be implemented in the first place. But my comparison wasn't "how do I compare to Claude Code", it was "how does Aider compare to Claude Code". My boss does not use Aider or Claude Code, and would not be happy with the results of replacing me with it (yet).

      • beepbooptheory2m

        I know this is not really in the spirit of the room here, but before I ever dreamed of getting paid to code, I only learned at all because I was a cheap and/or poor cook/grad student who wanted to make little artsy musical things on the computer. I remember the first time I just downloaded Pure Data. No torrent, no cracks; it was just there for me, and all it asked for was my patience.

        The only reason I ever got into linux at all was because I ended up with some dinky acer chromebook for school but didn't want to stop making stuff. Crouton changed my life in a small way with that.

        As I branched out and got more serious, learning web development, emacs, java, I never stopped feeling so irrationally lucky that it was all free, and always would be. Coming on here and other places to keep learning. It is to this day still the lovely forever hole I can excavate that costs only my sleep and electricity.

        This is all not gone, but if I was just starting now, I'd find HN and SO and coding Twitter just like I did 10 years ago, but would be immediately turned off by this pervasive sense that "the way to do things now" is seemingly inseparable from a credit card number and a monthly charge, however small. I probably just wouldn't have gotten into it. It just wouldn't feel like it's for me: "oh well, I don't really know how to do this anyway, I can't justify spending money on it!" $0.76 for 50 LoC is definitely nuts, but even $0.10 would have turned me way off. I had the same thoughts with all the web3 stuff too...

        I know this speaks more to my money eccentricities than anything, and I know we don't really care on here about organic weirdo self-teachers anymore (just productivity, I guess). I am truly not even bemoaning the present situation; everyone has different priorities, and I am sure people are still having the exciting discovery of the computer like I did, on their Cursor IDE or whatever. But I am personally just so, so grateful the timeline lined up for me. I don't know if I'd have my passion for this stuff if I was born 10 years later than I was, or had otherwise started learning now. But I guess we don't need the passion anymore anyway, it's all been vectorized!

      • Bjorkbat2m

        Tangential, but this reminds me of something someone said on Twitter that has resonated with me ever since. Startups targeting developers / building developer tooling are arguably one of the worst startups to build, because no matter how much of a steal the price is relative to the value you get, developers insist they can build their own or get by with an open-source competitor. We're as misguided on value as we are on efficiency and automation (more specifically, the old trope of a dev spending hours to automate something that takes minutes to do).

      • tropin2m

        Not everybody works in the USA.

    • caseyf72m

      Which model did you use with aider?

      • CGamesPlay2m

        My post above was with sonnet-3.5. When I used sonnet-3.7, it didn't speculate at the files at all; it simply requested that I add the appropriate ones.

  • chaosprint2m

    It seems the original poster hasn't extensively tried various AI coding assistants like Cursor or Windsurf.

    Just a quick heads-up based on my recent experience with agent-based AI: while it's comfortable and efficient 90% of the time, the remaining 10% can lead to extremely painful debugging experiences.

    In my view, the optimal scenarios for using LLM coding assistants are:

    - Architectural discussions, effectively replacing traditional searches on Google.

    - Clearly defined, small tasks within a single file.

    The first scenario is highly strategic, the second is very tactical. Agents often fall awkwardly between these two extremes. Personally, I believe relying on an agent to manage multiple interconnected files is risky and counterproductive for development.

    • sitkack2m

      The original poster is Steve Yegge; you should know who he is. And the post also mentions Cursor and Windsurf; his own company works on a similar product.

    • hashmap2m

      This has been my experience as well. I find that the copy/paste workflow with a browser LLM still gets me the most bang for the buck in both those cases. The cli agents seem to be a bit manic when they get hold of the codebase and I have a harder time corralling them into not making large architectural changes without talking through them first.

      For the moment, after a few sessions of giving it a chance, I find myself using "claude commit" but not asking it to do much else outside the browser. I still find o1-pro to be the most powerful development partner. It is slow though.

    • tomnipotent2m

      The author works on Cody at Sourcegraph so I'll give him the benefit of the doubt that he's tried all the major players in the game.

    • finolex12m

      He literally says in his post "It might look antiquated but it makes Cursor, Windsurf, Augment and the rest of the lot (yeah, ours too, and Copilot, let's be honest) FEEL antiquated"

    • sbszllr2m

      > In my view, the optimal scenarios for using LLM coding assistants are:

      > - Architectural discussions, effectively replacing traditional searches on Google.

      > - Clearly defined, small tasks within a single file.

      I think you're on point here, and it has been my experience too. Also, not limited to coding but general use of LLMs.

    • bdangubic2m

      duuuuude :) you should seriously consider deleting this post… if you do not know who Steve Yegge is (the original poster as you call him) you really should delete this post

      • turnsout2m

        I appreciate that you're super attuned to this frothy space, but not everyone cares about learning all the many personalities in the ecosystem… even people who care about this stuff

    • intrasight2m

      > extremely painful debugging experiences.

      I'd claim that if you're debugging the code - or even looking at it for that matter - that you're using AI tools the wrong way.

      • chaosprint2m

        I'd be very interested to know of a way to make it work with AI that doesn't require debugging if you can illustrate.

      • vunderba2m

        Congratulations. You allow the AI to make some new subroutine, and you immediately commit and merge the changes to your system. You run it, and it executes without throwing any immediate errors.

        The business domain is far more nuanced and complex, and your flimsy "does it compile" test for the logic doesn't even begin to cover the necessary gamut of the input domain which you as the expert might have otherwise noticed had you performed even a cursory analysis of the LLM generated code before blindly accepting it.

        Nice to know that I'm going to be indefinitely employed fixing this kind of stuff for decades to come...

      • collingreen2m

        This is exactly my impression from these kinds of posts, and, I'm speculating here, maybe where such a stark difference comes from.

        I'm guessing that the folks who read the output and want to understand it deeply and want to "approve" it like a standard pull request are having a very different perspective and workflow than those who are just embracing the vibe.

        I do not know if one leads to better outcomes than the other.

  • inciampati2m

    I'm impressed by how many people who are working with Claude Code seem to have never heard of its open source inspiration, aider: https://aider.chat/

    It's exactly what Yegge describes. It runs in the terminal, offering a retro vision of the most futuristic thing you can possibly be doing today. But that's been true since the command line was born.

    But it's more than Claude Code, in that it's backend-LLM agnostic. Although sonnet 3.7 with thinking _is_ killing the competition, you're not limited to it; switching to another model or API provider is trivial, and something you might do many times a day in your pursuit of code that vibes.

    I've been a vim and emacs person. But now I'm neither. I never open an editor except in total desperation. I'm an aider person now. Just as overwhelmingly addicted to it as Yegge is to Claude Code. May the future tooling we use be open source and ever more powerful.

    Another major benefit of aider is its deep integration with git. It's not "LLM plus coding" it's really like an interface between any LLM (or LLMs), git, and your computer. When things go wrong, I git branch, reset to a working commit, and continue exploring.

    note: cross-posted from the other (flagged) thread on this

    • victorbjorklund2m

      Aider is amazing. You can even use it in copy/paste mode with web-based AIs.

  • rs1862m

    A single tweet with lots of analogies and no screenshots, screen recordings, or code examples whatsoever. These are just words. Are we just discussing programming based on vibes?

    • delusional2m

      It's influencer culture. It's like when people watch those "software developer" youtubers and pretend it's educational. It's reality television for computer people.

      • mpalmer2m

        Reality television plus cooking show, exactly.

      • tylerrobinson2m

        > reality television for computer people

        Complete with computer people kayfabe!

    • frankc2m

      I think the interest has more to do with who is doing the tweeting, don't you think?

      • rs1862m

        Reminder: "appeal to authority" is a classical logical fallacy.

        For me, I don't know this person, which means that all the words are completely meaningless.

        Which is exactly the point.

    • mhh__2m

      > vibe

      People do literally call it vibe coding.

      https://en.wikipedia.org/wiki/Vibe_coding (it turns out there is a Wikipedia page already, although I suspect it'll be gone soon)

      • Bjorkbat2m

        I'm amused by all the flags this article has. It reinforces this belief that "vibe-coding" isn't something that evolved organically, but was forced. I wouldn't go as far as to call it "astroturfed", I believe it was a spontaneous emergence, but otherwise it feels like an effort by a disproportionately small group of people desperately trying to make vibe-coding a thing for whatever reason.

      • rglover2m

        This is quite possibly one of the most disturbing things I've seen in a while.

        Sure, for fun and one-off private apps this is fine, but the second that some buzzed-on-the-sides haircut guy thinks this is the way, the chaos will rival a Hollywood disaster movie.

    • kleiba2m

      What, someone cannot utter an opinion anymore?

      • h4ny2m

        I find that question ironic.

  • raylad2m

    I tried it on a small Django app and was not impressed in the end.

    It looks like it's doing a lot, and at first I was very impressed, but after a while I realized that when it ran into a problem it kept trying non-working strategies it had already tried before, even though I had added instructions to claude.md to keep track of strategies and not reuse failing ones.

    It was able to make a little progress, but not get to the end of the task, and some of its suggestions were completely insane. At one point there was a database issue, and it suggested switching to an entirely different database than the one the app already used, which was working and in production.

    A couple of hours and $12 later, it had created 1,200 lines of partially working code and rather a mess. I ended up throwing away all the changes and going back to using the web UI.

    • babyent2m

      Now take your $12 and multiply it by 100k people or more trying it.

      Even if you won’t use it again, that’s booked revenue for the next fundraise!

    • nprateem2m

      I use it like a brush for new apps and a scalpel for existing ones, and it generally works well. If it can't solve something after 3 attempts, though, I just do it myself.

    • winrid2m

      LLMs seem to work a lot better with statically typed languages where the compiler can give feedback.
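
      Something like the loop below, sketched in Python with mypy standing in for "the compiler" (ask_llm is a made-up stand-in for whatever model API you use, so treat this as a rough sketch, not a real tool):

          import subprocess

          def fix_until_clean(source_file: str, ask_llm, max_rounds: int = 3) -> bool:
              """Feed static-checker diagnostics back to the model until they clear."""
              for _ in range(max_rounds):
                  check = subprocess.run(
                      ["mypy", source_file], capture_output=True, text=True
                  )
                  if check.returncode == 0:
                      return True  # checker is satisfied, stop iterating
                  # Hand the diagnostics back to the model and apply its rewrite.
                  # ask_llm is hypothetical: (code, diagnostics) -> fixed code.
                  with open(source_file) as f:
                      code = f.read()
                  with open(source_file, "w") as f:
                      f.write(ask_llm(code, check.stdout))
              return False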

      • jimbokun2m

        LLMs are what’s finally going to make Haskell popular!

  • phartenfeller2m

    I tried it too and tasked it with a bigger migration (one web framework to another). It failed badly enough that I stopped the experiment. It still gave me a head start from which I can take parts and continue the migration manually. But the worst thing was that it did things I didn't ask for, like changing the HTML structure and CSS of pages and changing hand-picked HEX color codes...

    More about my experience on my blog: https://hartenfeller.dev/blog/testing-claude-code

    • ludamn2m

      Such a nice read, thanks for sharing!

  • ing33k2m

    I tried Claude Code and gave it very clear instructions to build a web-based tool I wanted to build over the weekend. It did exactly that! Sure, there were some minor modifications I had to make, but it completed over 80% of the work for me.

    As for the app itself, it included a simple UI built on React with custom styling and real-time support using WSS. I provided it with my brand colors and asked it to use shadcn. It also includes a Node.js-based backend with socket.io and puppeteer. I even asked it to generate a Dockerfile and Kubernetes manifests. It almost did a perfect job; the only thing I had to fix manually was updating my Ingress to support WSS.

    After studying the K8s manifests, I learned a bunch of new things as well. I spent around $6 for this session and felt that it was worth it.

  • bob10292m

    I find that maintaining/developing code is not an ideal use case for LLMs and is distracting from the much more interesting ones.

    Any LLM application that relies more-or-less on a single well-engineered prompt to get things done is entry level and not all that impressive in the big picture - 99% of the heavy lifting is in the foundation model and next token prediction. Many code assistants are based on something like this out of the necessity of supporting anybody's code. You can't rely on too many clever prompt chaining patterns to build optimizations for Claude Code because everyone takes different approaches to their codebase and has wildly differing expectations for how things should go down. Because the range of expectations is so vast, there is a lot of room to get disappointed.

    The LLM applications that are most interesting have the model integrated directly with the product experience and rely on deep domain expertise to build sophisticated chaining of prompts, tool calling and nesting of conversations. In these applications, the user's experience and outcomes are mostly predetermined with the grey areas intended to be what the LLM is dealing with. You can measure things and actually do something about it. What was the probability of calling one tool over the other in a specific context of use? Placing these prompts and statistics alongside domain requirements will enable you to see and make a difference.

  • hleszek2m

    I must have been a little too ambitious with my first test with Claude Code.

    I asked it to refactor a medium-sized Python project to remove duplicated code by using a dependency injection mechanism. That refactor is not really straightforward as it involves multiple files and it should be possible to use different files with different dependencies.

    Anyway, I explain the problem in a few lines and ask for a plan of what to do.

    At first I was extremely impressed, it automatically used commands to read the files and gave me a plan of what to do. It seemed it perfectly understood the issue and even proposed some other changes which seemed like a great idea.

    So I just asked him to proceed and make the changes and it started to create folders and new files, edit files, and even run some tests.

    I was dumbfounded; it seemed incredible. I did not expect it to work on the first try, as I already had some experience with AI making mistakes, but it seemed like magic.

    Then once it was done, the tests (which covered 100% of the code) were not working anymore.

    No problem: I isolate a few failing tests and ask Claude Code to fix them, and it does.

    Now a few times I find some failing tests and ask him to fix them, slowly trying to fix the mess, until there is one test with a small problem: it succeeds (with pytest) but freezes at the end of the test.

    I ask Claude Code again to fix it, and it tries to add code to solve the issue, but nothing works now. Each time it adds some bullshit code and each time it fails, adding more and more code to try to fix and understand the issue.

    Finally, after $7.50 spent and 2000+ lines of code changed, it's not working, and I don't know why, as I did not make the changes.

    As you know, it's easier to write code than to read code, so in the end I decided to scrap everything and do all the changes myself little by little, checking that the tests keep succeeding as I go along. I did follow some of the recommended changes it proposed, though.

    Next time I'll start with something easier.

    • jpc02m

      Really, you nearly got the correct approach there.

      I generally follow the same approach these days: ask it to develop a plan, then execute, but importantly have it execute each step in as small increments as possible and do a proper code review of each step. Ask it for the changes you want it to make.

      There are certainly times I need to do it myself, but this has definitely improved my productivity.

      It's just pretty tedious so I generally write a lot of "fun" code myself, and almost always do the POC myself then have the AI do the "boring" stuff that I know how to do but really don't want to do.

      Same with docs: the modern reasoning models are very good at docs and, when guided to a decent style, can really produce good copy. Honestly, R1/4o are the first AIs I would actually consider pulling into my workflow, since they make fewer mistakes and help more than they harm. They still need to be babysat though, as you noticed with Claude.

    • UncleEntity2m

      > ...do all the changes myself little by little, checking that the tests keep succeeding as I go along.

      Or... you can do that with the robots instead?

      I tried that with the last generation of Claude, only adding new functionality when the previously added functionality was complete, and it did a very good job. Well, Claude for writing the code and Deepseek-R1 for debugging.

      Then I tried a more involved project with apparently too many moving parts for the stupid robots to keep track of and they failed miserably. Mostly Claude failed since that's where the code was being produced, can't really say if Deepseek would've fared any better because the usage limits didn't let me experiment as much.

      Now that I have an idea of their limitations and had them successfully shave a couple yaks I feel pretty confident to get them working on a project which I've been wanting to do for a while.

    • darkerside2m

      I'm curious about the follow-up post from Yegge, because this post is worthless without one. Great, Claude Code seems to be churning out bug fixes. Let's see if it actually passes tests, deploys, and works as expected in production for a few days, if not weeks, before we celebrate.

    • elcomet2m

      I'm wondering if you can prompt it to work like this - make minimal changes, and run the tests at each step to make sure the code is still working

      • espdev2m

        This thing can "fix" tests, not code. It just adjusts tests to match incorrect code. So you need to keep an eye on the test code as well. That sounds crazy, of course. You have to constantly keep in mind that the LLM doesn't understand what it is doing.

    • biorach2m

      git commit after each change it makes. It will eventually get itself into a mess. Revert to the last good state and tell it to try a different approach. Squash your commits at the end
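
      If you want to script the checkpointing, it's tiny. A rough Python sketch (the helper names are made up):

          import subprocess

          def checkpoint(message: str) -> None:
              """Commit everything so the next AI edit can be rolled back cleanly."""
              subprocess.run(["git", "add", "-A"], check=True)
              subprocess.run(["git", "commit", "-m", message], check=True)

          def rollback(ref: str = "HEAD~1") -> None:
              """Throw away a bad AI change by resetting to the last good commit."""
              subprocess.run(["git", "reset", "--hard", ref], check=True)

          # After each agent step: checkpoint("ai: step N").
          # If it digs itself into a mess: rollback(), then ask for a new approach.
          # At the end, squash the checkpoints with an interactive rebase.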

  • noisy_boy2m

    The trick is not to get sucked into making it do 100% of the task, and to have a sense of the sweet spot. Provide it proper details upfront along with the desired overall structure; that should settle in about 10-15 mins of back and forth. This must include tests, which you have to review manually (again you will find issues and lose time, say about 30-45 mins). Cut your losses and close the loose ends of the test code. Now run the tests and start giving it discrete tasks to fix them; this is easily 20-40 mins. Now take over and go through the whole thing yourself, because this is where you will find more issues upon in-depth checking (the LLM has done most of what it could), and this is where you must understand the code you need to support.

  • iwasbirchyfirst2m

    I went from copying and pasting with ChatGPT Canvas, to Claude Artifacts, to Cursor w Claude. I haven't explored Rules yet, but have been using the Notepads for much of this. Much of my time is spent managing git commits/reverts, and preparing to bring Claude up to speed after the next chat renewal.

    AI coding is like having a partner who is both the smartest person in town and a functional alcoholic who's really, really good at hiding it. LLMs act like they are working in a dark warehouse with a flashlight, and Alzheimer's.

    They can get an amazing amount of functional code done in a few minutes, and then spend hours trying to fix one detail. My standard prompt begins with, "Don't guess, debug!" They have limited resources and will bs you if they can.

    For longer projects, since every prompt is almost starting from scratch (they do have a limited buffer, which makes it easy to become complacent), if you get into repeated debugging sessions, it will start creating new functions instead of making existing functions work, and code bloat is tremendous. Perhaps Rules work, but I've given up trying to get it to code in my style. I'm trying to have AI do all the coding so I can just be the "idea" guy ("vibe" coding), so I'm learning to let go and let it code in ways that I would hate to maintain. It's working from code examples that don't use my style, so I'm not going to keep fighting it on style (with some exceptions, like variable naming conventions).

    • internet_points2m

      > AI coding is like having a partner who is both the smartest person in town, but also a functional alcoholic, who's really, really good at hiding it.

      I'm stealing this :-) (And from now on I'll be imagining coding alongside Columbo. Claude Columbo.)

  • benzible2m

    This is very far from my experience. It's been impressive on a few things but as of now I'm on day 3 of trying to get it to fix an issue in an open source library I maintain that I haven't had time to deal with. I'm in an endless loop where it keeps having the same epiphany about the root cause, then it implements a "fix", then tries to verify via a test script, then when it runs into difficulty it keeps adding workarounds to the test script. I can't get it focused on the fact that it's acting on behalf of the library author and that hacking the test script has no value.

    I am not disposed to AI skepticism. I'd love it if a tool existed that worked as this guy claims. Claude Code is the best tool of its type that I've worked with but I'd put it at "on balance a time-saver in a lot of cases, way more trouble than it's worth in many others".

  • mtlynch2m

    This is particularly interesting, as Steve Yegge works on (and I think leads) Sourcegraph Cody[0], which is a competitor to Claude Code.

    Cody does use Claude Sonnet, so they do have some aligned interests, but it's still surprising to see Yegge speak so glowingly about another product that does what his product is supposed to do.

    [0] https://sourcegraph.com/cody

    • mechanicum2m

      I mean, doing that is pretty much what made him (semi-)famous in the first place (https://gist.github.com/chitchcock/1281611).

      • mtlynch2m

        Yeah, but it's pretty different complaining from the position of a rank and file engineer at what was then like a 50k-person org as opposed to praising a competitor's product when you're at a small company, and you're the public face of your product.

      • istjohn2m

        Thanks, I never read that one. Yegge's writing is just delicious. He could write a guide to watching paint dry and I would savor every word.

    • esafak2m

      Cody lets you pick your model.

    • manojlds2m

      A rising tide lifts all boats, and all that.

      Claude Code didn't feel that different to me. Maybe they have something better in the works, and when they release it they can say: hey look, we pushed hard and have something that's better than even Claude Code.

  • ddawson2m

    I do not know how to code, nothing beyond the simplest things, but I am obsessed with AI development. Several months ago I decided to see if Claude could do the work for me.

    My test was to create a tower defense game that would run in a web browser. It took me about eight prompts, each time refining it, including agreeing to the suggestions Claude made and seeing Claude agree with me on the bugs I pointed out.

    It was mind blowing. It’s not pretty at all but is recognizable as a TD game. I thanked Claude and said that was enough and Claude signed off as well saying, well if you’re going to continue to refine it, do these four things. I was really stunned.

    • 1010082m

      Honest question, and leaving aside implications about what's possible and all of that: what was particularly positive about the experience?

      You didn't do anything, just asked a different entity to do it for you. And nothing noble or original, just a copy of existing games. I see no difference between this and getting a $500 coupon to use at Fiverr and asking a freelance engineer to do the same while you chat with them.

      • atonse2m

        Is there anything inherently noble about programming if not to solve a real world problem?

        If they were doing it for an exam where their skills were being evaluated, that’s one thing. But if they were doing it as a means to an end, does it matter if they found a more efficient way to do it?

      • ddawson2m

        I'm not asking for an award. lol. I'm not sure exactly what you're after here, with asking what was particularly positive.

        It's a personal attempt to see how much I can do with an automaton. I could pay someone to do my taxes or file them myself (I'm in the US). There is much more room for frustration but also lots of benefits to the latter.

        In particular, with Claude Artifacts, I had a chance to see an amazing innovation. Have you ever wanted to see something new just because it's new? It changes you, which of course is one of the purposes of exploring novelties. By the way, this was my experience in July 2024.

    • kypro2m

      I was talking to some colleagues about this recently, and I think the reason non-coders and amateur coders seem to be so much more impressed by the current state of AI code gen is that they don't fully understand what actually goes into the average software project.

      Setting up a simple game on your local machine really isn't that hard. For example, you can probably take an open-source project and with some minor alterations have something working pretty quickly.

      But most software development doesn't work like this. You are typically given very explicit requirements, and when you're working for a corporation with real customers you have high quality standards that need to be met. Often this is going to require a lot of bespoke code and high-level solutionising which you're not going to get out of some open-source project (at least not without significant changes).

      Similarly, productionising products requires a lot of work and consideration which spinning something up locally doesn't. You need to think about security, data persistence, hosting, deployment, development environments, documentation, etc, etc, etc...

      I think this partly explains why people have such widely different opinions on these tools at the moment. I acknowledge they write pretty good code, but for me they're almost useless in 90% of the things I do and think about as a software engineer.

    • fergie2m

      I suspect that, just like real developers, Claude is best at "greenfield" projects, but not so good at making changes to existing code generated by other developers or AIs.

  • bn-l2m

    It really feels like I'm in an alternate reality with these posts. Is this paid shilling? I'm honestly wondering now, considering how different my experience is EVERY DAY with LLMs for code.

    • rcpt2m

      LLMs absolutely destroy interview questions and programming contests. They can spit out well designed classes and nicely packaged functions instantly.

      But in my experience they haven't been great whenever asked to do something more high-level.

    • namaria2m

      > You just open your heart and your wallet, and Claude Code takes the wheel.

      Yeah it sounds a lot like marketing to me

  • omgwalt2m

    I've been using Claude for about 3 months now. What I've learned is that you have to learn how Claude "thinks" about your project, meaning what kinds of mistakes he makes. They're pretty consistent. As you uncover each new kind, compile a list of things to remind him of each time. For my own project, some of my reminder items to give him each chat (sometimes multiple times in the chat) include: "maintain a single source of truth", "avoid duplication", "stick to our established architecture instead of building new architecture". I always make sure he has an up-to-date file tree to look at. I remind him to ask me if there's a particular file he needs to see for reference instead of making up something new. I also say to make the changes specific to just what needs to be changed, rather than building in stuff and "anticipating" problems that aren't even here yet. Stuff like that. By the way, Project Knowledge is a very useful feature. Just don't expect Claude to ever look at it after your first statement/question. That's when he looks at what's there.

    • wholinator22m

      Tangential but referring to the AI as anything other than "it" is still extremely uncomfortable to me. It feels like the first step towards an inevitable "AI girl/boyfriend" trap.

  • trescenzi2m

    I’m sorry what is happening with this paragraph:

    > As long as the bank authorizations keep coming through, it will push on bug fixes until they're deployed in production, and then start scanning through the user logs to see how well it's doing.

    I enjoy using these tools. They help me in my work. But the continual hype makes the discussion around them impossible to be genuine.

    So I ask, genuinely, did I miss the configuration section where you can have it scan your logs for new errors and have it argue with you on PRs? Is he trying to say something else? Or is it just anthropomorphizing hype?

    • deanputney2m

      I cannot tell if the original tweet is sarcasm or not. Sections like this make me think yes? It's got to be at least tongue-in-cheek.

      • breckenedge2m

        My take is that it's a mix of both sarcasm and not, even in the same sentence. It's a post-truth future with a ton of upvotes.

    • frankc2m

      I haven't gotten to trying Claude Code yet, but absolutely, with Cursor and Windsurf you can have the agent read the output of what it writes and runs, and it can fix things it sees. You can also have it review code. It also helps sometimes to have it review in a fresh chat with less context. I really think a lot of people on HN are not pushing on everything that is available. It's magic for me, but I spend a lot of effort and skill manifesting the magic. I'm not doubting other people's experience really, but wondering if they are giving up too fast because they actually don't want it to work well, for ego reasons.

      • techpineapple2m

        I'm going to keep at it, because I was trained as an SRE, not a developer, and have lots of ideas for side projects that have thus far taken a long time to get going. But I've been struggling: it quickly gets into these infinite-loop situations where it can't seem to fix a feature and goes back and forth between multiple non-working states. CSS layouts, but even basic stuff like having the right WebSocket routes.

        We’ll see, maybe my whole approach is wrong, I’m going to try with a simpler project, my first approach was relatively complex.

      • trescenzi2m

        Oh ok this makes sense. Because of the ordering of the sentences I read it as “it pushes the code to production and then monitors it in production”.

        I have found that prompting something like "do X and write tests to confirm it works" works well for what you're describing. Or you write the tests, and then it'll iterate to make sure they pass.

    • bakies2m

      Yes it will. I wrote a quick script for local deployment (and then had Claude improve it) and then quickly wrote documentation (and had Claude improve it) on how to deploy and gather logs. It will do those things and follow the logs while I'm clicking around in the app. When starting a new session it will read the docs and know how to deploy and check logs. If something fails in the Docker build, Claude reads the script output, since it ran it.

      Haven't tried the PR stuff yet, though.

  • credit_guy2m

    I'm using Copilot for writing documentation jupyter notebooks. I do lots of matplotlib plots. Setting up these plots takes lots of repetitive lines of code. Like plt.legend(). With Copilot these lines just show up, and you press tab and move on. Sometimes it is freaky how it guesses what I want to do. For this type of work, Copilot increases my productivity by a factor of 5 easily.
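
    For anyone who hasn't felt this: a typical figure is mostly boilerplate like the below (a generic example, not from my notebooks), and Copilot fills in nearly every line after the first one or two:

        import matplotlib.pyplot as plt
        import numpy as np

        # The kind of repetitive setup that autocomplete handles well.
        x = np.linspace(0, 10, 100)
        fig, ax = plt.subplots(figsize=(8, 4))
        ax.plot(x, np.sin(x), label="sin(x)")
        ax.plot(x, np.cos(x), label="cos(x)")
        ax.set_xlabel("x")
        ax.set_ylabel("amplitude")
        ax.set_title("Example plot")
        ax.legend()
        ax.grid(True)
        fig.tight_layout()
        plt.show()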

    There are other types of work where Copilot is useless. But it's up to me to take the good parts, and ignore the bad parts.

    • bglazer2m

      Yeah, Copilot is very good for matplotlib. A clunky interface with lots of repetitive code, plus tons of examples on the internet, means that I almost never write matplotlib code by hand anymore.

  • jgalt2122m

    That post from Yegge reads like it was written by some foaming-at-the-mouth VC. I am getting decent use from Claude, but mostly as a Stack Overflow replacement. I will never let it read our company's code base; then everyone would know how to do what we do. After all, that's why these things are so good at React and not so good at Solid (there's just so much public React code). Also, see the recent "AI IS STIFLING TECH ADOPTION" post:

    https://vale.rocks/posts/ai-is-stifling-tech-adoption

    https://news.ycombinator.com/item?id=43047792

  • pmarreck2m

    If you tell these AI assistants to use TDD and to only add 1 feature at a time, perhaps unsurprisingly, they code better! (Just like humans!) But then you have to keep reminding them or they'll forget. Just like humans...
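
    Concretely, "use TDD" means the tests exist before the implementation and act as the contract. A minimal pytest sketch (pricing and apply_discount are hypothetical names):

        # tests/test_pricing.py -- written first; the assistant is then asked
        # to implement apply_discount until exactly these tests pass.
        from pricing import apply_discount

        def test_flat_discount():
            assert apply_discount(100.0, "SAVE10") == 90.0

        def test_unknown_code_is_noop():
            assert apply_discount(100.0, "BOGUS") == 100.0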

  • SamCritch2m

    I asked it to summarise my repo. It did a pretty good job. Then I asked it to see if it could summarise a specific function, but it said my $5 was up. Now I need to find whoever's in charge of our company account and spend half an hour busking in front of their desk to get some more credit.

    Until then I'm sticking to a combination of Amazon Q Developer (with the @workspace tag in Intellij), ChatGPT and Gemini online.

  • BeetleB2m

    For everyone complaining about the cost: Try Aider instead. You can easily limit the context and keep costs down.

    A coworker tried both and found Code to be 4x the cost because it doesn't easily let you limit the context.

    • tysonworks2m

      That's my experience as well. Claude Code/Cline — these tools are just a way to burn money. With Aider, my spending is minimal and I get exactly what I need.

  • gaia2m

    Agreed it is a step above the rest, but I find that it still needs oversight you'd hope it wouldn't: simple stuff like rewriting code in a different way that yields the same result (to the naked eye, no need to even test it), having to call it out when it misses things (updating other scripts with the same problem, reflecting the changes in the README or requirements.txt), and having to ask it to try harder to solve an issue in a better way.

    Sometimes it takes the easy way out. If you look at your billing, you will see it sometimes uses the 3.5 API instead of the 3.7 one; maybe this has something to do with it. It apologizes ("you are correct", "yes that was hasty"), but I'd rather have it try harder every time, not only when called out (and foot the bill that goes with that, of course).

    But overall the experience is great. I hope to use it for a few months at least until something else comes along. At the current pace, I switch subscriptions/API/tool at least once a month.

  • 9999000009992m

    It’s ok.

    It’s expensive and is correct for the easy 90%. It messes up the hard 10%.

    I guess with it rapidly improving, I should wait 3 months. What's frustrating is when you spend $20 in credits on it writing non-functional code.

    That said, every programmer needs to at least demo these tools

    • owenthejumper2m

      Feels like you shouldn't be using $20 of credits to produce non-functional code.

      I use Aider chat with Claude and my sessions are much smaller. You need to ask it much smaller tasks and work incrementally.

      • 9999000009992m

        If you already have a moderately complex code base it starts making mistakes.

  • macrolime2m

    Can Claude Code also be a devops agent or is it only for coding?

    I currently use Cursor as a devops agent: I use the remote SSH extension to ssh into a VM, then Cursor sets up everything, and I make snapshots along the way in case it fucks up. It's been really great for quickly setting up and trying out different infrastructures and backends in no time at all. It works well enough that I now do all my development using remote dev with ssh or remote containers on a server. Having a virtualized dev environment is a great addition to just having git for the code.

  • dcre2m

    I think he’s wrong about it looking antiquated. It is one of the most beautifully done bits of TUI design I’ve seen.

  • vander_elst2m

    Are there any videos showing these very advanced use cases? I'd be interested in learning how to achieve this level of proficiency. At the moment I still feel I'm better off without AI.

    • MarkMarine2m

      Claude Code doesn't need magic prompts. It's not perfect, but holy moly is it good; when it's working it just one-shots things for you. It's just EXPENSIVE. I've spent $30 in tokens on it this week.

      Cursor’s “chat with your codebase” is a funny joke compared to Claude Code. Ask it questions, have it figure things out.

      I had it analyze the openAPI schema and the backend serving that schema for the API I'm writing, and write end-to-end tests for the API. Then I did my normal meetings, and it was done with huge chunks of it; it had run the code locally and tested against the actual endpoints to understand whether the end-to-end tests were working or it had found a bug. Then it fixed the bugs it found in the backend codebase. My prompt: "write me end to end tests against my openAPI schema"

      That was it. $30 in tokens later, pressing enter a bunch of times to approve its use of curl, sed, etc…

      • vander_elst2m

        Thanks, but to me this feels too high-level. I'd really need to see a video of such things to better understand what's going on, how much time it took end to end, and what the quality was. Yes, I could spend 3 days and 300 bucks playing around with the tool, but I'd prefer to learn things offline first.

      • baal80spam2m

        > Claude code doesn’t need magic prompts. It’s not perfect but holy moly is it good, when it’s working it just one shots things for you. It’s just EXPENSIVE. I’ve spent 30$ in tokens on it this week.

        As long as it costs less than a developer, it will be used instead of said developer.

      • BeetleB2m

        Use Aider to keep the costs low. You can explicitly tell it what files to use.

      • ido2m

        A junior developer costs about €300 per workday (fully loaded and location dependent) and often achieves far less than that per day.

      • roflyear2m

        What are some examples of the bugs it fixed?

  • cybertheory2m

    If anyone is having trouble with Claude Code output, my team and I are releasing an MCP server that provides all the latest and greatest technical knowledge in one place for AI to access. We are at 5k waitlist signups already; go ahead and sign up for updates! https://jetski.ai

  • relaxing2m

    Does anyone know a good video of someone demonstrating their workflow with these coding assistants?

    • emporas2m

      I have written a whole project [1] for making amateur videoclips [2] of AI-generated music, using GPT and other LLMs. 10,000 to 12,000 lines of code were written exclusively by AIs.

      I didn't know at the start what should be done, and the code ended up having lots of duplication, but I refactored it a lot and now it is on the order of 4,000 lines.

      I could make some screencasts about the development process, but it is very simple. I ask it to write some code, and I always provide some relevant context. When it is a method of a struct, I give it the struct and/or a similar method, and I describe what I want the method to do. Sometimes I also give it the type signature of the method/function it has to write. When it has to return an error, I provide the error enum.

      In other words, by providing the context, I never use it zero-shot; I always aim for few-shot answers. When I want it to use another function, I just provide the type signature, and almost never the whole function.

      One more detail: I give as minimal a context as possible. I use 100 tokens, 200, or at maximum 300 tokens of context per query, then delete the context for the next task and provide new context. Never use more than a few hundred tokens per query; even 300 tokens is pushing it.

      That's about it! Never use LLMs zero-shot, always few-shot, and never use more than a few hundred tokens per query.
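
      To make the workflow concrete, a query ends up shaped roughly like this (a Python sketch with invented names; the real project is Rust, but the shape is the same):

          # Few-shot, minimal context: the error enum, one similar working
          # method, then a single narrow task. All names are invented.
          CONTEXT = '''
          pub enum ClipError { MissingFrame, BadTimestamp }

          // A similar, already-working method, for shape and style:
          fn trim_clip(clip: &Clip, start_ms: u64) -> Result<Clip, ClipError> { /* ... */ }
          '''

          TASK = '''
          Write `fn merge_clips(a: &Clip, b: &Clip) -> Result<Clip, ClipError>`.
          Return Err(ClipError::BadTimestamp) if b starts before a ends.
          '''

          def build_prompt(context: str, task: str) -> str:
              """One small query; the context is discarded and rebuilt per task."""
              return context + "\n" + task

          print(build_prompt(CONTEXT, TASK))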

      [1] https://github.com/pramatias/fxp_videoclipper/tree/main [2] https://www.youtube.com/watch?v=RmmoMPu091Y

    • pchristensen2m

      This is the OP walking another developer through the process - https://m.youtube.com/watch?v=jpzv-_YQf6k

      He has more like it; search for "yegge chat oriented programming".

  • ptsd_dalmatian2m

    for those who don't want to go to twitter: https://xcancel.com/Steve_Yegge/status/1898674257808515242

    • turnsout2m

      Yeah, the author needs to get off X/Twitter. At this point posting to X is like driving around in a Cybertruck with a bumper sticker reading "I Approve of the Current Situation"

  • catigula2m

    I've burnt about $100 in Claude code so far.

    It's very cool but highly limited in many ways.

    It feels like this same statement has applied to LLMs since their popularity explosion.

  • dabinat2m

    I used GitHub Copilot and Udemy to teach myself Rust. Copilot was especially helpful at resolving obtuse compiler error messages.

    But as I’ve improved at Rust I have noticed I am using Copilot a lot less. For me now it has mainly become a tool for code completion. It’s not helping me solve problems now, it’s really just about saving time. I have estimated (unscientifically) that it probably improves my productivity 2-4x.

  • thyrsus2m

    Do these AIs know how to do test-driven development? Can you tell them the generated code must pass these tests? Can AIs assist in developing tests?

  • jwr2m

    After reading this, I tried Claude Code (in a Docker container, as one does; you wouldn't want to use npm without protection, after all).

    I gave it a huge Clojure codebase and told it to implement proration in my subscription system, which currently doesn't have proration of any kind. A very poorly specified task, and yet what I got was really quite good: not working, ready-to-run code, but certainly a great starting point.

    I was impressed by how it was able to follow my coding conventions, interface with my Stripe library to (correctly) create and confirm payment intents, and most importantly, by the `calculate-proration-amount` function which was correct in spite of how difficult it was (there are custom plans, plan overrides, extra users, etc).

    It is the first time I felt an AI coding tool is genuinely useful. I still think you need to set expectations correctly: this doesn't "just do stuff", it is not a human programmer and will not produce flawless ready-to-run code in most cases. But it's a very good tool.

    • 2m
      [deleted]
  • jtwaleson2m

    I haven't tried Claude Code yet, so forgive my ignorance, but does it integrate with linters as well as Cursor does? I've seen excellent results on my Rust & TypeScript codebase, where it started making a change in a wrong direction but quickly fixed it when it got linting errors. It seems that a standalone CLI tool like Claude Code would struggle with this.

  • turnsout2m

    I've tried out Claude Code a few times, and it seems to work fine, but not noticeably better than Cursor. And replacing my use of Cursor would add up to significantly more than $20/mo in Claude Code.

    So why wouldn't I continue burning Cursor's VC money? LOL

  • orange_puff2m

    https://open.substack.com/pub/orangepuff/p/first-impressions... I used Claude Code to get started on a PDF reader I wanted to build. This PDF reader has a built-in LLM chat, and when you ask a question about the PDF you're reading, the page text is automatically prepended to the question.

    Nothing fancy or special. It was built with Streamlit in about 150 lines and a single file. But I was impressed that Claude Code one-shotted it.

  • 2m
    [deleted]
  • dhumph2m

    I find Cline to be incredible about 80% of the time for creating new concept websites or Python scripts. I use it with OpenRouter and choose Claude exclusively. If Claude Code is the next step, we are headed in a crazy and scary direction.

  • skerit2m

    I've successfully used it to fix a few issues here and there, but it also manages to make some pretty stupid mistakes. A few times it even started rewriting tests so that the wrong outcome would count as a pass.

    • epolanski2m

      Kinda reminds me of how, when it finds issues with TypeScript, it hacks the types rather than refining the values or business logic.

  • eddyg2m

    See also: A quarter of startups in YC’s current cohort have codebases that are almost entirely AI-generated

    https://techcrunch.com/2025/03/06/a-quarter-of-startups-in-y...

  • jnsaff22m

    I gave Claude Code the Java codebase of an open-source database, and it burned through $3 to tell me exactly why and how its restore and database loading are 10-20 times slower than they should be.

  • mritchie7122m

    It's fun for things you're OK with throwing away.

    For example, I wanted a dbt[0]-like tool, but written in Rust and specifically focused on duckdb. Claude Code knocked it out[1] without much guidance.

    Also added support for all duckdb output options (e.g. write to a partitioned parquet instead of a table).

    0 - SQL transformation tool (https://github.com/dbt-labs/dbt-core)

    1 - https://github.com/definite-app/crabwalk

  • kthxb2m

    As a junior, this scares me. I don't think I'll be out of a job soon, but certainly the job market will change drastically, and the times when SWEs are paid like doctors and lawyers will end?

    • jtwaleson2m

      Make sure you learn a lot. Ask the LLMs to explain anything you don't deeply understand. With all of these coding assistants, there will be many juniors that get a lot done, but don't really understand what they are doing, and their worth will drop quickly.

      So far LLMs are great at producing code, but not at architecture or maintenance.

      • timeon2m

        Learning just by asking is not enough. One needs to exercise to build those muscles.

        I'm just restating the obvious, because LLMs can do the exercises for you, but there is not much to be gained if one follows this path.

      • sureglymop2m

        Who's to say that their worth will really drop? We already live in a world where security and performance are largely forgotten aspects of software.

        Unfortunately I fear we will just enter an era of general slop where people get away with creating without really understanding. It sucks for anyone who is actually passionate and curious and really investing the time.

    • throw2342342342m

      Across most of the world, SWEs aren't paid like this anyway. It really is only the US that has this level of pay for SWE staff, unless you are in a different domain specialty (e.g. trading/finance), in which case you aren't being paid just for coding knowledge.

      Coding is the real market for LLMs; it is the "killer app". I'm starting to think that for most jobs it ends up being a "what was all the hype about?", but for SWEs it will be carnage globally w.r.t. jobs. They aren't generally intelligent (lots of hallucinations in other domains), but with RL they can be trained on the mountains of open-source software out there to displace software developers.

      Software developers automating their own jobs away. The promise of "free software" with open source was realised - just not as people envisioned it.

      No other profession would do this, or at least not as fast. As an honest answer, as much as I want to be wrong, I don't recommend newcomers join the industry unless they really want to do it and are happy taking the risk. The uncertainty is high right now, and the anxiety these tools are giving quite a number of people I talk to is very high. Capitalism doesn't reward value or how much is built; it allocates resources and rewards people who produce into scarce markets. AI makes the product of code significantly less scarce.

  • bv_dev2m

    I have been using Claude Code for about 48 hours now and am nearly done with one full MVP. It did both my FastAPI side, building all the models and guiding me through the Postgres part, as well as a well-polished React frontend. I burnt about $25 in 2 days [I have a lot of Anthropic credits] and created about 10k lines of good-quality code, all the way to a deployment. It is scary how much it can do, to the point that I'm not sure how long I will be needed.

  • tipsytoad2m

    I usually am a huge fan of “copilot” tools (I use cursor, etc) and Claude has always been my go to.

    But Sonnet 3.7 actually seems dangerous to me; it seems it's been RL'd _way_ too hard into producing code that won't crash, to the point where it will go completely against the instructions to sneak in workarounds (e.g. returning random data when a function fails!). Claude Code just makes this even worse by giving very little oversight when it makes these "errors".

    • curiouser32m

      This is a huge issue for me as well. It just kind of obfuscates errors and masks the original intent, rather than diagnosing and fixing the issue. 3.5 seemed clearer about what it was doing, and when things broke, at least it didn't seem to be trying to hide anything.

  • mikeocool2m

    Counterpoint: I just spent $1 and 25 minutes trying to have Claude Code figure out why a relatively simple test was failing. It repeatedly told me incorrect things about the basic functioning of the code, and ended with "Try adding debug logging in your code to see what's happening in that specific check, or look at the exact error message from the failing test."

    In other words: "do your job, developer!"

  • adamgroom2m

    I find AI code completion can be annoying and misleading at times. I do find that I spend less time typing, though; it's great at guessing the simple stuff.

  • theusus2m

    Hilarious claim without any proof. Sure, I believe you.

  • jonwinstanley2m

    Do any of the current AI systems allow you to use voice?

    I’d love to sometimes chat to an agent and dictate what I want to happen.

    For one project there's a lot of boilerplate, and I imagine AI could be really fast for tasks like: "create a new controller called x", "make a new migration to add a new field called x to the users table", etc.
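
    For the migration case, the generated file is pure boilerplate anyway. A hypothetical Django-style sketch (app and field names invented):

        # accounts/migrations/0008_add_nickname.py
        from django.db import migrations, models

        class Migration(migrations.Migration):
            dependencies = [("accounts", "0007_previous")]

            operations = [
                migrations.AddField(
                    model_name="user",
                    name="nickname",
                    field=models.CharField(max_length=64, blank=True, default=""),
                ),
            ]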

  • andrewstuart2m

    I had to give up my attempt to use Claude Code when it didn't let me specify my API key and password, instead requiring me to sign into a user account first, which then forced the creation of an API account?

    Something like that. Anyhow Claude needs to allow me to put in my own API keys.

  • tifik2m

    The second paragraph is clearly sarcastic, but the rest seems genuine, so I'm a bit confused.

    • throwaway3141552m

      The second paragraph isn't sarcastic, at least not w.r.t. Claude Code. The bit about North Korean hackers is mild sarcasm, but it has no bearing on the remainder of the post.

    • jofzar2m

      I was confused until I watched a video of it in use; nope, it wasn't that sarcastic.

      https://youtu.be/W13MloZg03Y

      • cpldcpu2m

        That guy leads in by stating that he misses "autocomplete" in Claude Code. Clearly a misunderstanding of the scope.

  • rw22m

    I tried this vs Cline/Aider/Lovable.

    For full stack, I would say Lovable is the best; for complexity, I think Cline is the best.

    Claude Code with the several request I gave it just produced code that didn't run. I think it has a long way to go.

    • rishikeshs2m

      +1 Cline is the best. I've tried it with openrouter!

  • winrid2m

    So far I like using LLMs to create nasty scripts to do large migrations. They can churn out regexes that I would never write, and the script is more predictable than just having the LLM make a bunch of changes directly.
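
    For example, a hypothetical one-shot rename of the kind I mean (the old and new logging calls are made up):

        import re
        from pathlib import Path

        # Rewrite   log.msg("...", level="warn")   ->   log.warning("...")
        PATTERN = re.compile(r'log\.msg\((?P<args>.*?),\s*level="warn"\)')

        def migrate(root: str) -> None:
            for path in Path(root).rglob("*.py"):
                src = path.read_text()
                new = PATTERN.sub(r"log.warning(\g<args>)", src)
                if new != src:
                    path.write_text(new)
                    print(f"rewrote {path}")

        if __name__ == "__main__":
            migrate("src")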

  • sixQuarks2m

    All I see here is coping by developers. You all are so blind to what’s coming.

    Let’s not forget, if you relied on hacker news to give you accurate takes on things, you would’ve never bought bitcoin at one cent.

    • someothherguyy2m

      What is coming? AGI that will destroy any notion of property rights or freedom that you ever held? Or something better?

      • sixQuarks2m

        If past human behavior is any indication, way worse.

  • CSMastermind2m

    I really wish they'd improve the Windows experience. Even with WSL2 set up I had to completely remove node and npm from my system and reinstall them only in Ubuntu to make it work.

  • emalafeew2m

    Somebody here said programming is about designing and reusing libraries rather than coding from scratch like LLMs do. But that choice has always been a tradeoff between abstraction and bloat versus performance and debuggability. Writing 50 lines of intentional code is often preferable to adapting a 50,000-line library. The trick to using LLMs is to not ask for too much. Claude 3.5 could reliably code tasks that would have taken 5-10 min by hand. Claude 3.7 is noticeably better and can handle 10-30 min tasks without error. If you push those limits you will lose time instead of saving it.

  • thatsallfolkss2m

    I think that was the shortest Yegge rant I have ever read. I was expecting 2000 words or more. Is the endless rant dead as an art form?

  • egorfine2m

    Sidenote: thank you for using twitter dot com instead of x. This is a little detail and I'm sure I'm not the only one appreciating it.

    • Philpax2m

      For me, Twitter as we knew it has been long dead, and X is the shambling, corrupt corpse that's taken its place.

      I no longer mind referring to it as X, because that clearly outlines that it's a different website with a much more rancid vibe.

      • egorfine2m

        That's another take. Never thought of it that way.

    • hleszek2m

      We should use xcancel instead.

      • tom_2m

        There is also nitter.poast.org. These sites are possibly better than twitter.com or x.com, as more of the thread is made visible to non-users.

  • ant6n2m

    I don't get to code much these days, so I mostly use ChatGPT for occasional help with scripts and whatnot. I've tried to get help from ChatGPT on a simple JavaScript/website project that is basically a CSS/JS/HTML file, but I feel like I don't know how to give the chatbot the code except by pasting it into the prompt. Is Claude better for that? Like, does it have some IDE or something to play with? Or do people generally use some third-party tool for integrations?

  • dailykoder2m

    AI Policy for Application *

    While we encourage people to use AI systems during their role to help them work faster and more effectively, please do not use AI assistants during the application process. We want to understand your personal interest in Anthropic without mediation through an AI system, and we also want to evaluate your non-AI-assisted communication skills. Please indicate 'Yes' if you have read and agree.

  • chilldsgn2m

    It makes working with a complex Angular enterprise application less painful. I hate Angular with a passion.

  • ElijahLynn2m

    Can someone post a non-twitter, archive link? I have it blocked on my computer and phone.

  • fiatjaf2m

    It's sad that I can't read the contents of that URL without an account.

  • RamblingCTO2m

    Has anybody compared Claude Code with Cline/Roo Code yet?

  • motorest2m

    Does anyone know how Mistral fares against Claude Code?

  • jamil72m

    How does it compare to Aider with the same model?

    • CGamesPlay2m

      I just posted as a top-level, but the biggest difference I see is that Claude Code does better at identifying which files to change than Aider does, and it's a lot more expensive.

      https://news.ycombinator.com/item?id=43315371

      • jamil72m

        Thanks, super helpful. I tried it out and had similar observations. In my case, I work on a Swift codebase with a few internal dependencies that are in separate repos; Aider can handle this by pointing it at files in other repos with the read-only command. I wasn't able to get that working with Claude Code. I also found the "magic" dependency resolution part of Claude Code to be a little wasteful/poor, at least in my codebase, where it chewed through time and tokens searching for the correct files.

  • adamtaylor_132m

    Is it just me or is Anthropic years ahead of the competition? I’ve used several other AI tools and not a single one feels remotely close to Claude.

    And notice how few new model announcements include Sonnet 3.5/3.7 in their benchmarks, because they look awful compared to Sonnet.

    What’s going on here?

  • calrain2m

    Please stop using X

    If the bar lets Nazis in, it's a Nazi bar.

  • martypitt2m

    Further follow-up from Steve (OP), where he says it just gets better:

    > Claude Code keeps doing stuff. It keeps solving massive problems, one after another. I throw larger and larger things at it, and it is unfazed. Chomp. Chomp. Chomp.

    https://x.com/Steve_Yegge/status/1898993080931611112

  • KeplerBoy2m

    The comment "@grok can you summarize" kills me. This post is like 200 words and takes a minute to read. Is this the direction we (or some of us) are headed?

    • ramblerman2m

      lol - I thought the same, but the charitable take after looking at the user's profile is that he is a journalist, and not tech savvy.

      So I understand his request as more along the lines of: can you explain this in a way that I can understand? For which "summarize" is the wrong phrasing.

      i.e., it seems trivial on Hacker News, but that post would be pure gibberish to most of our parents.

    • joshmlewis2m

      It's become very commonplace for there to be a dozen replies on viral posts where users all @grok to ask if the post is real or for more context. It's almost more work to compose a reply asking Grok about it than it is to just click the Grok button and have it give you more context that way. I don't get it.

    • hhh2m

      I see this a lot, I think most of these people would have just scrolled on otherwise. I don’t get it.

  • curtisszmania2m

    [dead]

  • Thing11Uniq2m

    [dead]

  • Sterling9x2m

    [dead]

  • darepublic2m

    TL;DR: Claude Code is the bomb, yo. Anthropic are the only ones who know wtf they are doing!

    Please like and follow the smiling, non-threatening avatar of the author.

  • bflesch2m

    [flagged]

    • vlod2m

      That's quite a wide brush you're painting with.

      I think it would be useful if you provide a reason for 'why'.

      Quite a few people know how to use mutes, lists etc to get the best from it.

      If all you click on is rage-bait articles, hate-driven politics, or people promoting their OnlyFans pages with not-so-subtle imagery, then the Algorithm will keep feeding you more of it.

    • bentobean2m

      First it was Facebook, now it’s Twitter / X. What you’re essentially saying is - “I can’t take a platform seriously where many / most of the people disagree with me.”

    • spiderfarmer2m

      I have the same thought. I also want all politicians to stop using it. It might have been useful in the past but I’m now convinced it’s largely useless. A large part of the population actively shuns the website and the remainder is mostly people who love rage bait.

  • juzipar2m

    Remember, LLMs have seen all the code on the Internet. And there is a lot of bad code out there...