41 comments
  • shruubi 15h

    I realize that the example is contrived, but what is the point of writing a test of a fibonacci function if your test harness is designed to just take whatever it tells you and updates the assert to verify that what it told you is indeed what it just told you.

    This assumes the code you wrote is already correct and giving the correct answer, so why bother writing tests? If, however you accept that you may have got it wrong, figure out the expected outcome through some reliable means (in this case, dig out your old TI-89), get the result and write your test to assert against a known correct value.

    I wouldn't trust any tests that are written this way.

    • pickleRick243 14h

      Oftentimes, the main purpose of writing the tests is to prevent future regressions. This is pretty common, for instance, in ML (PyTorch) code. You put in some tests with various random inputs and assert against whatever your network says the output is. That way you don't accidentally change how the network works in future refactors.
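The pattern described here can be sketched in plain Python (a toy function stands in for a real network; all names are hypothetical): freeze the outputs for a fixed, seeded set of inputs, and assert that future runs still match.

```python
import random

def model(x: float) -> float:
    """Toy stand-in for the network under test."""
    return 3.0 * x * x - 2.0 * x + 1.0

def sample_inputs(n: int = 5) -> list[float]:
    """Seeded random inputs, so every run exercises the same points."""
    rng = random.Random(1234)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Golden outputs: captured once from a run that was inspected and trusted,
# then frozen (in practice they would live on disk or in the test file).
GOLDEN = [model(x) for x in sample_inputs()]

def test_no_regression() -> None:
    for x, expected in zip(sample_inputs(), GOLDEN):
        assert abs(model(x) - expected) < 1e-9

test_no_regression()  # passes until a refactor changes model's behaviour
```

The seed matters: without it the inputs drift between runs and the golden values are meaningless.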

    • debugnik 11h

      These are great to test code that has no formal specification or oracle, so you're writing a reference implementation.

      First, the test fails because there's no expected output, and you get to check the existing behaviour pretty-printed. Then, if it's correct, you approve it by promoting the diff into the source code, and it becomes a regression test.
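That fail-then-promote loop can be sketched as a tiny golden-file harness in Python (a hypothetical helper, not the actual Jane Street tooling): on first run there is no expected output, so the check fails and records the actual behaviour for review; "promoting" it turns it into a regression test.

```python
from pathlib import Path
import tempfile

def check_golden(snap: Path, actual: str, promote: bool = False) -> bool:
    """Return True if `actual` matches the stored snapshot.

    First run: no snapshot exists, so the actual output is written out as a
    candidate for human review and the check fails. promote=True accepts
    the candidate as the new expected output (the "approve the diff" step).
    """
    candidate = snap.with_suffix(".actual")
    if promote and candidate.exists():
        candidate.rename(snap)        # promote candidate -> golden
    if snap.exists() and snap.read_text() == actual:
        return True                   # behaviour matches: regression test passes
    candidate.write_text(actual)      # record current behaviour for review
    return False

with tempfile.TemporaryDirectory() as d:
    snap = Path(d) / "fib.golden"
    assert check_golden(snap, "610") is False        # first run: nothing expected yet
    assert check_golden(snap, "610", promote=True)   # reviewed and approved
    assert check_golden(snap, "610") is True         # now a regression test
    assert check_golden(snap, "611") is False        # a change in behaviour is caught
```

Real tools (ppx_expect, insta, swift-snapshot-testing) layer diffing and editor integration on top of essentially this loop.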

    • warpspin 10h

      > This assumes the code you wrote is already correct and giving the correct answer, so why bother writing tests?

      It catches regressions. Which is the one thing where such semi-automated testing is most useful in my eyes.

      No clue though why they gave it that weird "expect" name. Basically, it's semi-automated regression testing.

    • prophesi 11h

      I believe it's called test-driven development, but often I write tests hoping that what I tell myself the application code will do is what it actually does. It's also self-describing of the changes made, and it's what people new to the codebase should reference if they actually want to learn what's going on.

    • YetAnotherNick 9h

      And that's why the example is good. You first get fib(3), see the result, verify it, and freeze it; then you do the same for fib(20). That's how software development works: you can spot errors more easily than a test could.

  • lihaoyi 14h

    I was inspired by the Jane Street post and implemented exactly this in my Scala unit testing library uTest (http://www.lihaoyi.com/post/GoldenLiteralTestinginuTest090.h...). Can confirm that auto-updating golden test assertions makes working with a test suite much more joyful than struggling with each assertion by hand.

    • 9rx 5h

      > make working with a test suite much more joyful than struggling with each assertion by hand

      There is a time and place for golden tests, but this reads more like a case for property-based testing.

  • deathanatos 19h

    > You start writing assert fibonacci(15) == ... and already you’re forced to think. What does fibonacci(15) equal? If you already know, terrific—but what are you meant to do if you don’t?

    Um …duh? Get out a calculator. Consult a reference, etc. Otherwise compute the result, and ensure you've done that correctly, ideally as independent of the code under test as possible. A lot of even mathematical stuff has "test vectors"; e.g., the SHA algorithms.

    > Here’s how you’d do it with an expect test:

      printf "%d" (fibonacci 15);
      [%expect {||}]
    
    > The %expect block starts out blank precisely because you don’t know what to expect. You let the computer figure it out for you. In our setup, you don’t just get a build failure telling you that you want 610 instead of a blank string. You get a diff showing you the exact change you’d need to make to your file to make this test pass; and with a keybinding you can “accept” that diff. The Emacs buffer you’re in will literally be overwritten in place with the new contents:

    …you're kidding me. This is "fix the current state of the function — whether correct or not — as the expected output."

    Yeah… no kidding that's easier.

    We gloss over errors — "some things just looked incorrect" — but how do you know that any more reliably than you'd know fib(10)?

    • Storment33 18h

      It is called snapshot testing, very valid technique. Maybe not best suited to a mathematical function like they have here, but I have found it useful for stuff like compilers asserting on the AST, where it would be a pain to write out and assert on the output and may also change shape.
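A quick illustration of why snapshots suit ASTs, using Python's own `ast` module (nothing to do with the article's OCaml setup): `ast.dump` produces a printable tree that would be tedious to type out by hand but is easy to freeze and diff.

```python
import ast

# Parse a tiny expression and pretty-print its AST.
tree = ast.parse("1 + 2", mode="eval")
actual = ast.dump(tree.body, indent=2)

# Hand-maintaining strings like this for every test case is painful when the
# tree shape changes; a snapshot harness fills them in and you review the diff.
expected = """\
BinOp(
  left=Constant(value=1),
  op=Add(),
  right=Constant(value=2))"""

assert actual == expected
```

When a node type gains a field, every snapshot updates in one approval pass instead of dozens of hand edits.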

      • xenophonf 16h

        TIL. That looks like a nice way to add tests to legacy code without having to re-create what TDD would have produced had the developers started that way.

        • globular-toast 13h

          It is indeed a good way to add regression testing to code with no tests. But it's no substitute for TDD. It can't tell you why something is the way it is, nor can it distinguish between intentional and incidental (although maybe some would argue you shouldn't, given Hyrum's law and all). But it will at least guide you as you try to figure that out and stop you breaking stuff constantly.

          • Storment33 5h

            The problem is that some stuff is a real pain to test with static assertions, such as the compilers I was talking about. It would be a real pain to maintain an expected AST in a unit test; then you'd have to go rework it all if you change the shape and/or add/remove nodes, etc.

            You can mix the approaches: have some static assertions (as sanity checks) but make most of them snapshot tests. Like I said, I wouldn't use snapshot testing for a fibonacci method, but there are problems out there that are a real pain to test via static assertions.

    • nippoo 18h

      A lot of tests are designed as regression prevention. You know the system is working as designed, but what if somebody comes along and changes the Fibonacci function to compute much more efficiently (and, in the process, makes some arithmetic errors)?

    • mikrl 17h

      I think “test the function does what it does” is not necessarily the intent here, it’s being able to write tests that fill themselves in and assuming you’ll double check afterwards.

      That said, I don’t see how it’s much different to TDD (write the test to fail, write the code to pass the test) aside from automating adding the expected test output.

      So I guess it’s TDD that centres the code, not the test…

    • lelandfe 15h

      > Um …duh? Get out a calculator. Consult a reference, etc. Otherwise compute the result

      Article:

      > This is a perfectly lovely test. But think: everything in those describe blocks had to be written by hand. The programmer first had to decide what properties they cared about... then also had to say explicitly what state they expected each field to be in. Then they had to type it all out.

      The article is about not getting out the calculator.

      • twic 13h

        > The programmer first had to decide what properties they cared about... then also had to say explicitly what state they expected each field to be in.

        Yes, this is the point of testing. You have to think about what you're about to write! Before you write it! The technique in the article completely discards this. It's a terrible way to write tests.

    • imtringued 11h

      Yeah this is one of the weirdest takes ever.

      "If you already know, terrific—but what are you meant to do if you don’t?"

      You're supposed to look at the first gif that visualizes a waveform diagram. How are HDL designs tested? Testbenches (akin to unit tests) and model checking. With model checking you define the property you want to test, and the model checker tries to find a counterexample.

      Said property is so obvious for fibonacci that it's staring you right in the face, and you're consciously trying to avoid looking it in the eye. Fibonacci is defined as fib(n) = fib(n-1) + fib(n-2), so that's what you need to test. This means you can simply test fib(1) = 1, fib(2) = 1, fib(3) = 2 for a fixed set of n to cover the edge cases, then choose a fixed set of random n and make sure that fib(n) = fib(n-1) + fib(n-2) holds. Obviously the only way to be 100% sure is to use a model checker and write code that is bounded in its runtime.
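That recurrence check is straightforward to sketch in Python (a naive iterative fib, purely for illustration; this is plain seeded random property testing, not a model checker):

```python
import random

def fib(n: int) -> int:
    """Iterative Fibonacci with fib(1) = fib(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Fixed cases pin down the base of the recurrence.
assert fib(1) == 1 and fib(2) == 1 and fib(3) == 2

# Property: fib(n) == fib(n-1) + fib(n-2), checked on a seeded random sample.
rng = random.Random(0)
for n in (rng.randrange(3, 500) for _ in range(100)):
    assert fib(n) == fib(n - 1) + fib(n - 2)
```

Unlike a golden value, the property survives implementation changes; the base cases are still needed, since any function satisfying only the recurrence (e.g. all zeros) would pass.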

  • BiraIgnacio 5h

    It is joyful. I think of it like writing a proof that is confirmed to be correct (most of the time) in the end.

  • mlmonkey 18h

    Recently, I have given up on writing unit tests, instead prompting an LLM to write them for me. I just sit back and keep prompting it until it gets it right. Sometimes it goes a little haywire in our monorepo, but I don't have to accept its changes.

    It feels ... strangely empowering.

    • tasuki 7h

      I do it the other way around: I write the specs and unleash an agent to turn my test suite from red to green.

      Each of us does half the work, the other half being done by an LLM. The difference is that I specify the desired behavior, while you leave the specification up to the LLM. A little strange if you ask me!

    • elevation 15h

      When I build unit tests around the right routines, I feel like all is right with the world. But some employers consider this gilding the lily.

      But with LLMs in hand, I can generate entire suites of tests where they're most useful before management has the time to complain. All the little nice-to-have-but-hard-to-google environment tweaks are seconds away for the asking. It's finally cost effective to do things right.

    • Cthulhu_ 9h

      Same, and to avoid it going haywire I wrote an agents.md file with some prompts, like how to run a test for a single file and what to do before saying "I am done".

    • krater23 9h

      I think that's the best use you can get from LLMs in programming: doing the boring, simple test code that doesn't have to meet any quality requirements. Normally in a code review I ignore unit tests, whether written by humans or LLMs, for this reason.

  • mayoff 19h

    If you’re a Swift programmer, the swift-snapshot-testing package is a great implementation of these ideas.

    https://github.com/pointfreeco/swift-snapshot-testing

  • tomhow 18h

    Discussed at the time:

    What if writing tests was a joyful experience? - https://news.ycombinator.com/item?id=34350749 - Jan 2023 (122 comments)

  • benrutter 13h

    There's some cool ideas about unit testing here, and I know I'm kind of missing the point, but am I the only one who finds unit tests and documentation sort of, soothing?

    Of course I love solving the initial problem / building the feature etc, but I always find unit tests a calming easy going exercise. They are sometimes interesting to think about writing, but normally fairly simple. Either way, once you're testing, you're normally on the home straight with whatever it is you're developing.

    • shafyy 12h

      Yes, same feeling here =)

  • TacticalCoder 18h

    Amazing to see Jane Street uses Emacs. And property-based testing too.

    > you don’t just get a build failure telling you that you want 610 instead of a blank string

    So I had to scratch my head a bit because I was thinking: "Wait, the whole point is that you don't know whether what you're testing is correct or not, so how can you rely on that as input to your tests!?".

    But even though I didn't understand everything they do yet, I do see at least a big case where it makes lots of sense. And it happens to be a case where a lot of people see the benefits of test: before refactoring.

    > What does fibonacci(15) equal? If you already know, terrific—but what are you meant to do if you don’t?

    Yeah, a common one is to reuse a function in the same language which you believe is correct (you probably haven't proven it to be correct). Another typical one is to reuse a similar function from another language (once again, it probably hasn't been proven correct). But if the two implementations differ, you know you have an issue.
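That cross-checking idea is classic differential testing; a minimal sketch with two independently written (illustrative) implementations:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_recursive(n: int) -> int:
    """Straight-from-the-definition version, treated as the reference."""
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n: int) -> int:
    """Faster rewrite whose correctness we want to check."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# If the two implementations ever disagree, at least one of them is wrong.
for n in range(200):
    assert fib_iterative(n) == fib_recursive(n), f"disagreement at n={n}"
```

Neither implementation is proven correct, but agreeing on 200 inputs makes a shared bug much less likely than a bug in either one alone.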

    > let d = create_marketdata_processor () in
    > ( Do some preprocessing to define the symbol with id=1 as "APPL" )

    Typo. It's AAPL, not APPL. It's correctly used as AAPL later on.

    FWIW, writing tests better become a joyful experience, because we're going to need a *lot* of these with all our AI-generated code.

    • zem 18h

      > And it happens to be a case where a lot of people see the benefits of test: before refactoring.

      it's also very nice if you have a test-last working style, that is, develop the code first using some sort of ad hoc testing method, then when you're convinced it's working you add tests both as a final check that the output is what you expect across a lot of different corner cases, and to prevent regressions as you continue development.

  • breatheoften 19h

    I really like this style of testing -- code that can be tested this way is also the most fun kind of code to work with and the most likely to behave predictably.

    I love determinism and plain old data.

  • o_nate 22h

    This is a cool idea. I wish something like this existed for C#.

    • Smaug123 13h

      The thing that most surprises me is that IDEs don't have a standard protocol for this, so you basically need a custom test runner if you want one-click "this snapshot failed; update it" self-modifying tests.

      I wrote WoofWare.Expect for F#, which has an "update my snapshots on disk" mode, but you can't go straight from test failure to snapshot update without a fresh test run, even though I'm literally outputting a patience diff that an IDE could apply if it knew how.

      Worse, e.g. Rider is really bad at knowing when files have changed underneath it, so you have to manually tell it to reload the files after running the update or else you clobber them in the editor.

      • RhysU 9h

        > ...if you want one-click "this snapshot failed; update it" self-modifying tests.

        I am envisioning the PR arguments now when the first instinct of the junior developer is to clobber the prior gold standard outputs. Especially lovely when testing floating point functionality using tests with tolerances.

        Some things should be hatefully slow so one's brain has sufficient chance to subconsciously mull over "what if I am wrong?"

    • PretzelPirate 19h

      An agentic coding tool like GitHub Copilot will do this for you.

  • 9rx 5h

    Since when has writing tests not been a joyful experience?

    I do see a lot of useless tests out in the wild. I can see writing those not bringing any joy. That is true of any useless activity. Is that what we're thinking of here?

  • i_don_t_know 7h

    mdx[1] is another variation on this, also in the OCaml ecosystem. It’s OCaml’s version of documentation tests as in Elixir and Rust.

    But it’s not limited to that. You can write tests in markdown files independently from your documentation. Use “dune test” to run the tests and review failures with “git diff”. Accept the changes if they are correct (changed behavior) with “dune promote”. Very nice workflow.

    [1] https://github.com/realworldocaml/mdx

  • 3vidence 17h

    In my experience, the lack of joy (or the difficulty) with tests almost always comes from the test environment being different enough from the real environment that you end up stretching your code to fit the test env instead of actually testing what you're interested in.

    This doesn't apply to very simple functions, but tests on simple functions are the least interesting/valuable.