How does it feel to test a compiler? (medium.com/zakharenko)
107 points by thunderbong on Aug 15, 2024 | 84 comments


Compilers are one of the easiest and most fun pieces of software to test, because you can test very specific behavior without touching the internals at all.

E.g. if I want to test that `*` has higher precedence than `+`, I would write something like this:

    assert_ast_equals(parse_expr("(a*b)+c"), parse_expr("a*b+c"))
    assert_ast_equals(parse_expr("a+(b*c)"), parse_expr("a+b*c"))
You can rewrite the whole compiler if you want, but as long as you have some notion of a "parser", an "AST" and "two AST nodes being the same" this test will keep working.

This is much more powerful than going into the parser internals and comparing get_operator_precedence('+') with get_operator_precedence('*'), which is the default thing you would do if you're told to test every function after writing it.
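
To make the first example concrete, here's a minimal runnable sketch of the same black-box test, using Python's own `ast` module as a stand-in parser (the `parse_expr`/`assert_ast_equals` names above are pseudocode):

    import ast

    def parse_expr(source):
        # Parentheses never appear in the AST, only the structure
        # that operator precedence actually produces.
        return ast.parse(source, mode="eval")

    def assert_ast_equals(a, b):
        assert ast.dump(a) == ast.dump(b), f"{ast.dump(a)} != {ast.dump(b)}"

    # '*' binds tighter than '+', so the explicit parentheses are redundant.
    assert_ast_equals(parse_expr("(a*b)+c"), parse_expr("a*b+c"))
    assert_ast_equals(parse_expr("a+(b*c)"), parse_expr("a+b*c"))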


Well, low skill floor, sure, but there's a very high skill ceiling.

John Regehr and his students have done impressively deep work in finding compiler bugs.

https://blog.regehr.org/archives/category/compilers

https://blog.regehr.org/ (some of his compiler-related posts aren't properly tagged)

• [PDF] https://users.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf Finding and Understanding Bugs in C Compilers (edit: I see someone else already posted this paper)

• A (very) little discussion: https://news.ycombinator.com/item?id=7728035


This is good advice not just for compilers, but for any kind of software. Test the behavior of the public interface, not of the internals. Doing the latter too much will just unnecessarily lock you into one implementation of your software, without much gain.

As examples: If you are creating a web API, only test against the exposed public routes of that API, instead of writing tests for internal helpers. If you are creating a GUI application, programmatically exercise the GUI instead of starting your tests halfway into the innards that respond to the GUI buttons.

Tests lock behavior into place, so you should only write tests for behavior that needs to be locked.


For the most reliable software, don't choose between different testing approaches, use them all. Testing is not like choosing an architecture or programming language for your system; the more approaches the merrier. As each testing approach hits diminishing returns, that is a cue to add a different approach that starts near the top of its own curve.

Unit testing enables hidden functionality to be tested. This can prevent future bugs when changes in the system suddenly uncloak those functions, exposing them to higher-level tests.


I don't fully agree. Yes, employ as many testing approaches as possible (e2e tests, property-based testing, golden tests, whatever), but only test for behavior that you expose and want to guarantee will stay as-is. Otherwise you will get into the situation where every refactor will require a test change.

Unit testing is fine, if you do it for your public interface. If you are writing a math library then sure, unit test that `add(1, 2) == 3`. But if you just have an internal helper function for that, then think about if you really want to lock its existence and behavior into place, or if that would just hinder future architectural changes.

You can always test the exposed functionality that uses the helper and achieve full coverage of it that way. If you can't, then you have dead code.
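
A tiny sketch of what that looks like in practice (hypothetical names): the helper gets full coverage purely through the public function that uses it.

    # Internal helper: no dedicated tests, free to change or disappear.
    def _round_cents(amount):
        return round(amount * 100)

    # Public interface: this is the behavior worth locking in place.
    def invoice_total_cents(prices, tax_rate):
        return sum(_round_cents(p * (1 + tax_rate)) for p in prices)

    # The helper is fully exercised through the public function.
    assert invoice_total_cents([10.00, 2.50], tax_rate=0.1) == 1375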

Of course this is all a bit more nuanced. Past a certain size it might make sense to e.g. consider one module's interface to be public for the rest of your application and test it. But you can definitely overdo it, and testing every single function you write (as I've seen people unironically suggest) is very likely detrimental.


While that is a good rule for the general case, there are exceptional circumstances where testing an internal helper function can help improve development velocity. One should not shy away from using what is useful. What is important to remember is that "public" tests are your documentation that remain for the lifetime of your application. "private" tests are throwaway.

A good language will provide clear boundaries such that it is obvious which is which.


Alternatively, if an internal function is important enough to need good coverage, it should be pulled out into an internal "library" that exposes the interface explicitly (even if this is just a separate file or folder with limited visibility to the rest of the codebase).

Testing internals is almost always a code organization smell IMO.


If you seek good coverage, you would undoubtedly be better off testing against the public interface.

"private" tests are more for like when you're having trouble figuring out an edge case failure and want to narrow it down to a specific helper function to aid debugging or if you need help coming up with the right design for an internal function. As before, we're talking exceptional circumstances. Rarely would you need such a thing. But if it helps, no need to fear it.

Either way, I'm not sure you would be looking for good coverage, only the bare necessities to reach the goal. Once settled, the tests are disposable. Organizing your project into internal libraries in case you encounter a debugging problem in need of assistance, for example, is extreme overkill.


Hmm, I guess I don't really consider throwaway assertions in pursuit of debugging "tests" in the typical sense.

I certainly wouldn't advocate for code reorganization for that case, but if there is a property that is important to maintain over time that isn't easily expressed by exercising the public API, it does suggest that reorganization is probably in order.


> Hmm, I guess I don't really consider throwaway assertions in pursuit of debugging "tests" in the typical sense.

I agree it is not "tests" in the documentation sense, and I did mention that, but, regardless, you do seem to align on "throwaway". Now you have me curious, what kind of ephemeral "tests" were you imagining if not something similar to what I described in more detail later?


I suppose I was imagining tests that are created to validate an internal piece of code when it's first built or refactored, not necessarily in pursuit of a specific observed bug.


> But if you just have an internal helper function for that, then think about if you really want to lock its existence and behavior into place, or if that would just hinder future architectural changes.

I agree with everything you’ve said up until this point. The add function should be locked down, whether or not it’s internal code. I’ve written some physics libraries for games I’ve coded in the past, and you can bet that I wanted to lock in the functionality of that complex physics code by working some examples by hand and then codifying them into unit tests!

I think you’re right at the end, everything is nuanced. Using the right tool for the right job is easier said than done :)


How do you propose testing defensive programming? In a good program, most of those checks are not coverable from the public interfaces.

The other problem is that finding tests that cover everything from the public interfaces can be extremely difficult. Most test suites don't achieve full coverage even of theoretically reachable points in the code.


> In a good program, most of those checks are not coverable from the public interfaces.

If they aren't then you are defending against situations that can never occur. In that case the check seems unnecessary, and I at least wouldn't make it a priority to cover it. Also, those checks are usually simple enough that one can reason about them easily, again making it less of a priority to have tests cover them.

> The other problem is that finding tests that cover everything from the public interfaces can be extremely difficult. Most test suites don't achieve full coverage even of theoretically reachable points in the code.

Yes, testing is hard. That is not a reason to write more detrimental tests though.


Yeah, let's only do defensive programming in the places where it will actually end up being needed. Brilliant idea! In the same vein, let's only write tests for the places where bugs will actually be.

This insight of yours cannot help but save on costs. Be sure to suggest it to management.


> In the same vein, let's only write tests for the places where bugs will actually be.

Sounds like a good idea, actually. These places being the entire surface of your code that is exposed to the real world. So not unlike what I have been suggesting.


I mean, if you really want to start every function with multiple `assert true == true` statements, feel free to do so. But I will question its usefulness.

Do you have any particular example in mind that is not superfluous and really can not be exercised by the public API? I have a hard time thinking of any.


Testing a compiler is very hard because the unit tests which validate that some code transformation is happening do not prove that the code transformation will be correct in all conceivable situations in which it occurs.

If you pin down too much in test cases, on the other hand, it will be hard to make changes to the compiler without breaking tons of tests.

I think the best way to test a compiler is to have a large standard library or other body of code, including the compiler written in itself. Recompile the whole thing, then recompile it again with the compiled compiler, and again. By the third iteration you should have reached a fixed point. That doesn't prove things are correct, but it gives a lot of confidence, especially if the code base is large and uses a lot of the language. The second piece of confidence is that all that compiled code passes its tests.
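
A rough sketch of that fixed-point check in Python (the compiler and file names are hypothetical; real bootstraps such as GCC's three-stage build also have to account for embedded timestamps and paths):

    import hashlib
    import subprocess
    import sys

    def build(compiler, output):
        # Compile the compiler's own source with the given compiler binary.
        subprocess.run([compiler, "compiler_source", "-o", output], check=True)

    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    build("system-compiler", "stage1")  # built by a trusted existing compiler
    build("./stage1", "stage2")         # built by the freshly built compiler
    build("./stage2", "stage3")         # built by the compiler that built itself

    if digest("stage2") != digest("stage3"):
        sys.exit("no fixed point: stage2 and stage3 binaries differ")
    print("fixed point reached: stage2 == stage3")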


For parsing this might be the case, but writing tests becomes a lot more tricky the further down the compilation pipeline you go, as the amount of boilerplate code needed to set things up increases.


Just be careful to not return null from all calls to parse_expr() :)


That's what mutation tests are for :)


Jetbrains have a really solid testing team.

Lots of places have a two tier system where the real developers write the code and those who don't make the cut test the code, with pay delta and an aspiration of being promoted out of testing.

Other places have a mandatory stint in testing for new developers as a way to get some headcount on the task.

Jetbrains don't do that. Or at least they didn't sometime before covid when I met a bunch of their test devs at a conference. The developers mostly doing testing were equals to those mostly doing product work. Possibly with a more extreme bias towards case analysis.

I don't think it's a coincidence that jetbrains treat their test team as peers to the others and that their software seems to mostly not fall over in the field.


I feel like developers who don't write tests while writing their production code aren't doing a real job in 2024. Sure, you might have a QA team or something, but that should be a vast minority of the tests being written. Test-driven development and Agile and all that had some people upset and whiny 20 years ago, but if I see code without tests, I assume it's broken. Tests are the only way anyone knows anything works, both now AND when the next 10, 100, 1000 commits roll in; by then your manually-tested edge cases have long been forgotten, and I doubt you even got them right in the first place. Let's keep normalizing writing tests and thinking, up front, about how to ensure our code works. That's the best time to think about all those pesky edge cases, and it's the only practical way to keep code working. Don't foist this onto some poor chum because you are lazy or don't know how.

(Spelling)


Most tests I encounter in the real world are terrible, fundamentally broken, and a waste of build times. I have almost never had a potential bug caught by a test written by others. I have been a software developer for 25 years across multiple languages and ecosystems.

Everyone jumped on the testing bandwagon but writing code that is testable is a learned skill that nobody bothers to learn. Instead we end up with overly-mocked tests that—in practice—test that “the code is written like it currently is” rather than that it actually behaves correctly.

These kinds of tests actually provide negative value. Besides taking up time during every PR and build, they fire off on any attempt to clean up or refactor. Every time you edit the code you have to edit the tests which entirely defeats the point.

The solution ends up being, unsurprisingly, lessons learned from the functional world: don’t access or manipulate external state, operate only on direct inputs and only manipulate your outputs. As a result code ends up being dumb, short, and obvious. But nobody learns to write code this way, so two hundred line spaghetti functions are the norm.
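
A tiny illustration of that style (hypothetical function): everything the function needs arrives through its arguments, so the test needs no mocks and no setup.

    from datetime import date

    # Instead of reaching for date.today() and a global config inside the
    # function, the clock and the config value are passed in directly.
    def days_until_renewal(end, today, grace_days):
        return (end - today).days - grace_days

    assert days_until_renewal(date(2024, 9, 1), date(2024, 8, 15), grace_days=3) == 14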


It sounds like you’re specifically talking about unit tests and people that suck at writing those.

I have seen functional and e2e tests prevent regressions far more times than I can count. The nice thing about those is that they force testing outcomes and don’t require “testable” code like unit tests do.

Don’t throw the baby out with the bathwater


It's really useful to ask the simple question "what tests could have caught the most costly bugs we've had?"

At one job, our code had a lot of math, and the worst bugs were when our data pipelines ran without crashing but gave the wrong numbers, sometimes due to weird stuff like "a bug in our vendor's code caused them to send us numbers denominated in pounds instead of dollars". This is pretty hard to catch with unit tests, but we ended up applying a layer of statistical checks that ran every hour or so and raised an alert if something was anomalous, and those alerts probably saved us more money than all other tests combined.
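
A minimal sketch of that kind of statistical guard (the function name and threshold are made up; the real checks were per-metric and ran on a schedule):

    import statistics

    def looks_anomalous(todays_total, recent_totals, z_threshold=4.0):
        # Flag today's pipeline output if it is wildly out of line with history.
        mean = statistics.mean(recent_totals)
        stdev = statistics.stdev(recent_totals) or 1e-9  # avoid division by zero
        return abs(todays_total - mean) / stdev > z_threshold

    # A sudden pounds-instead-of-dollars swap shows up as a large, persistent shift.
    assert looks_anomalous(1_250_000, [1_000_000, 1_010_000, 995_000, 1_005_000])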


This reminds me of a fun thing I hacked up at a game shop. We were implementing some new feature involving a probability distribution and were somewhat stumped on how to make sure it worked right.

Eventually I set up something simple to exercise most of these functions, as well as most other balancing code, plot their results, and send the results to the balancing team every week or so. For example, using D2 terms, it would run 1M item drops at different magic find levels and plot the resulting item level distribution, graph the implemented level -> hp/mana/... curves, and such.

That little thing caught so many implementation issues at first and we'd regularly have balancing poke us because someone "optimized" some code in there.
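
Something in the spirit of that harness, heavily simplified (the drop function is a made-up stand-in for the game's real logic):

    import random
    from collections import Counter

    def drop_item_level(magic_find):
        # Hypothetical stand-in for the game's real drop logic.
        base = random.randint(1, 30)
        return min(30, base + magic_find // 100)

    def drop_report(magic_find, runs=1_000_000):
        # Simulate many drops and tally item levels for the balancing report.
        return Counter(drop_item_level(magic_find) for _ in range(runs))

    for mf in (0, 100, 300):
        counts = drop_report(mf, runs=100_000)
        print(mf, [counts[level] for level in range(1, 31)])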


Integration tests can (and often do) suffer from the same issues since being composed of poorly-testable pieces doesn’t make things any more testable.

I’ve come to the conclusion that mocks are evil for anything except external services not under your control. And even then, what you should be making is a trivial fake implementation of the other end, not just checking that specific methods were called.
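
A sketch of what such a trivial fake might look like (all names hypothetical). Unlike a mock that only records which methods were called, the fake has real, if simplistic, behavior for the caller's logic to run against:

    class FakePaymentGateway:
        """In-memory stand-in for an external payment API."""

        def __init__(self):
            self.charges = {}
            self._next_id = 1

        def charge(self, customer_id, amount_cents):
            # Behaves like the real thing, minus the network.
            if amount_cents <= 0:
                raise ValueError("amount must be positive")
            charge_id = f"ch_{self._next_id}"
            self._next_id += 1
            self.charges[charge_id] = (customer_id, amount_cents)
            return charge_id

        def refund(self, charge_id):
            del self.charges[charge_id]

    # Tests then assert on outcomes (what ended up charged or refunded),
    # not on which methods happened to be called.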


When I was doing safety critical software at the beginning of my career, we had a whole team in charge of building simulators replicating the API provided by the pieces of equipment we were integrating, and their full behaviour. These simulators were themselves fully tested using both mock network traffic and recordings of the real piece of equipment. This was used for integration testing and then repurposed for training.

Every testing environment I have seen since has looked like a joke, but to be fair the testing budget of this place was bigger than what some companies spend on a whole product.


I mean, anything can be done poorly. But consider that the team that builds the vehicles you drive is not the same as the team that tested said vehicle for safety concerns.

That is, you can have a solid testing team for pretty much anything. You have to empower them, though. Largely, a lot of what you get to empower them with is ability to put constraints onto the engineering team.

These could be crappy, far-reaching constraints on the product, but there is no need for that. The only constraint you have to have is stability of the product interface. Which.... yeah, our industry doesn't do that so well. (Note: not stability as in "doesn't crash," but stability as in "the same inputs that worked last year work today.")


Yeah. I work for a database company, if we didn’t have copious amounts of integration, regression and stress tests we would absolutely be having a very bad time.

And of course we still have manual exploratory QA, it’s difficult to replace that safety net.


To be clear, tests when done well are incredible.

Very little in this industry is done well. Tests are one of those things.


All this talk about whether we would prefer no tests to bad tests is pretty pointless. It is a bit like asking whether you would prefer to be beheaded or hanged. The correct answer is neither. If one wants some semblance of quality, one needs BOTH tests and good-quality tests. Let me sum up some attributes of good tests.

(1) They do not test at a level that is too low. If you test individual classes that do not have much logic in them, it is probably pointless. You are probably testing exciting facts like 'is the container type in my favorite language still capable of storing elements'. Mostly, it is best to test the interplay between a small number of classes, at a level where the tested behavior is something the customer could recognize as valuable.

(2) Don't test at a level that is too high. Integration tests often take a lot of effort to set up, they are brittle, and they are slow. Have a small number of them but not too many. I.e., the concept of the testing pyramid.

In this thread one can read various assertions that sound like nonsense to me.

(1) Only write functional code: nice if you can get away with it, but some code has the explicit purpose of manipulating things in the real world, or is explicitly there to maintain some state. One might also watch https://www.youtube.com/watch?v=j71n33A0CkI&t=314s . The video is right. There actually is not much difference.

(2) Don't mock. Well, if your program is big enough to contain many classes and/or functions, so that testing it all at once would be too big according to criteria (1) and (2) above, it becomes impractical to test everything together, so you will need mocks. What can just be passed in as function arguments can be passed in as function arguments, but what comes out as behavior may need to be mocked.

(3) Create test doubles instead of mocks. Use whatever is most convenient. Mocks record series of function calls. Test doubles maintain some state and one can see how that changes. Both can be good or bad depending on the situation. It seems pointless to have a preference separate from what you are trying to do.


I understand this very much, but I have one big difference in experience: Once your coverage reaches about 60%, tests expose crashes simply by running the code. Even if there are no assertions.

I've tamed a few untested code bases by writing huge integration test-ish unit tests, and comparing the output with pre-recorded answers. After that, you start adding real unit tests for fresh code, knowing the existing stuff is quite safe.
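
A minimal sketch of such a pre-recorded-answer ("golden") test, with hypothetical module and file names:

    import json
    from pathlib import Path

    from legacy_module import generate_report  # assumed existing entry point

    def test_report_matches_golden():
        # Run the legacy code on a fixed input and compare against a previously
        # recorded answer. Any crash or behavior change fails loudly.
        fixture = json.loads(Path("tests/fixtures/input.json").read_text())
        expected = json.loads(Path("tests/golden/report.json").read_text())
        assert generate_report(fixture) == expected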


> a waste of build times

There's an assumption here that tests need to be run for each build. This rules out more laborious testing that could be exposing bugs not found by time-limited tests. For compiler testing, one can set up property-based tests that run for unlimited lengths of time. These can be set up to be always running in the background. Literally billions of tests can be constructed and run.
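
As an illustration of how cheap such tests are to mass-produce, here is a toy property-based check against Python's own parser (requires Python 3.9+ for ast.unparse; real compiler fuzzers use far more sophisticated program generators):

    import ast
    import random

    def random_expr(depth=3):
        # Tiny random generator of arithmetic expressions.
        if depth == 0 or random.random() < 0.3:
            return str(random.randint(0, 9))
        op = random.choice(["+", "-", "*", "//"])
        return f"({random_expr(depth - 1)} {op} {random_expr(depth - 1)})"

    # Round-trip property: pretty-printing a parse tree and re-parsing it must
    # yield the same tree. Loop for as long as you like; every iteration is a
    # freshly generated test case.
    for _ in range(1_000_000):
        src = random_expr()
        tree = ast.parse(src, mode="eval")
        assert ast.dump(ast.parse(ast.unparse(tree), mode="eval")) == ast.dump(tree), src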


> As a result code ends up being dumb, short, and obvious.

I did a presentation on that last year:

https://dconf.org/2023/#walterb


what would you recommend to read to be better at writing tests?


I fully agree. For most of my career, I hated the idea of writing (unit) tests. They were almost always a waste of time.

At one point, I ended up on a team that had virtually no QA and only did automated end-to-end tests on a complex product. After working through a lot of tech debt on the testing (took a few months), it was solid and caught many defects introduced by junior/new developers (easily missed in code review).

I now have the fortune to be in a different group with a similar mindset. I’m motivated to write tests for my own code just to prove to myself that it works properly in odd error cases too. My team prefers testing instead of “run fast and break things.” (And with multiple decades of experience, I humbly can state I cause significantly fewer bugs than in the past.) This also makes me write code in an easily testable manner. Trying to test already written code is usually a lengthy and painful exercise in frustration.

It’s also too easy to write error handlers that are NEVER actually exercised once, even manually — and surprise, they may not even work! People who never write tests do this quite often.


I'm shocked that there are places in 2024 that have a separate set of devs writing test code. What decade is this?


Having one set of developers write the code and another set of developers write tests from the same specification, without reading the code, is probably one of the most effective ways to use tests for high software quality, and is typically required by certification bodies, e.g. for avionics.


That’s a good point for things with certain kinds of specs, but it seems a hell of a lot more straightforward for the person who is writing an implementation to also be in a testing mindset (if only to be writing testable code)

Perhaps people in these environments are made of different stuff but touching both sides feels almost essential to me.


Having a separate test team is pretty much mandatory in safety critical development. You really want people who haven’t written the code go through the specification independently and write test cases for every requirement.


The (unit) tests developers write themselves check the developer did what they intended to, and then that things don't change. Whatever case analysis the developer missed in the implementation tends to be missing from their tests as well, as they didn't think of it.

Testing your own work also does nothing to protect against correctly implementing the wrong thing.


This is a fair point. I imagine in a stricter environment you could (for example) have tests before implementations even to avoid huge feedback loops


It is indeed shocking what happens when a company doesn’t treat its (offshore) QA team as third class peons relative to the developers.

I’ve seen both and have actually seen much higher product quality when the QA team is smaller (or even almost non existent).

Turns out white-box testing usually fails to catch things only SMEs would know.


At my previous job, prior to new management (we got bought out), my team worked closely with QA during the development process---they would get daily, oftentimes more than daily, builds of our project, and so by the time it got to the staging environment, it pretty much worked [1]. New management comes in, forces a hard wall between us and QA, and pretty much we went from "favored vendor" of our customer [2] to "least favorite vendor" in less than a year. One deployment took four attempts to get right. Bugs found left and right. All because the new management didn't want to "pollute" QA's testing mindset with implementation details. Sigh.

[1] During the 11 years I was there, there were only two deployments that were reversed, prior to new management. And the bugs were found during the immediate testing after deployment and we were able to reverse quickly.

[2] The Oligarchic Cell Phone Company.


I frankly think my company's offshore devs and QA (largely India-based) are a net negative. The existence of a decent-sized team to do QA manually has made adoption of automated testing (and writing code that is easily testable) almost impossible, because no one wants to change.


I've had "quality-team-as-peer" situations only a few times in my career, and it's been great every time. Getting meaningful feedback on specs and early builds from people who knew what they were doing (and were happy doing it) was like the first time I had a real editor (the human kind) marking up my writing, just a wonderful feeling that someone had your back and that you all had the same goal: Happy customers.


A class system with developers writing production code looking down on developers writing tests doesn't seem healthy to me.


It never is!


I've worked in the hardware industry; the architecture team created specs, and dev was developing the product and its own tests.

And the val team was creating a fake product which was used to help write tests without the real product (so tests were available faster, so deployment was faster), or just tests - all of this based on the spec.

It worked well when it came to catching bugs, issues, and inconsistencies.


The article touches on basic testing strategies that apply to general software as well as compilers. I have been involved with compiler testing projects on production LLVM and GCC in my past life. One thing that makes compiler testing specifically more difficult than general software is the Oracle Problem: how do you verify the output is in fact correct? Crashes are relatively easy to find by random fuzzing, but in the general case, proving that the output program is not miscompiled is non-trivial.

There are a couple effective techniques in the literature that might be useful here:

- Differential testing[1]: generate a bunch of random, correct, deterministic programs; run them under different compilers or under different compilation flags and check if the output of the program is identical

- Equivalence Modulo Inputs[2]: a class of techniques that can be used to transform a program into a distinct program that is supposed to be equivalent to the original for a specific input. (shameless plug)

[1]: https://users.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf

[2]: https://web.cs.ucdavis.edu/~su/publications/emi.pdf
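
For reference, a stripped-down sketch of the differential-testing loop (compiler names and flags are just examples; the hard part, generating programs that are correct and deterministic, is what tools like Csmith handle):

    import subprocess
    import sys

    def run(cmd):
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60, check=True)
        return result.stdout

    def differential_check(source):
        # Compile one known-correct, deterministic program several ways and
        # compare what the resulting binaries print. Any disagreement means at
        # least one compiler/flag combination miscompiled it.
        outputs = set()
        for cc in ("gcc", "clang"):
            for opt in ("-O0", "-O2"):
                run([cc, opt, source, "-o", "./a.out"])
                outputs.add(run(["./a.out"]))
        return len(outputs) == 1

    if __name__ == "__main__":
        if not differential_check(sys.argv[1]):
            print("compilers disagree on", sys.argv[1])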


What about using multiple open source projects with already existing tests? You check whether the results of execution are the same between compiler versions. This should also provide you with better coverage of various features.


Sure, that counts as differential testing. The issue is there is a limited number of them compared to the amount of code you can generate mechanically, and, especially in the case of C, it is not straightforward to write a standardized script to build and run a bunch of random projects.


I am making a simple Lisp interpreter, and this is my whole testing stage:

    $ valgrind --leak-check=full --track-origins=yes ./lisp < test/test.lisp
    $ cat test/expected.txt
What can I say, works for me.


From my years on GCC I can remember that 90% of regular users' bug reports were user error, not a compiler bug.

But as with any code, compilers have bugs too and sometimes they can be quite surprising.


to be fair, most of that is GCC/C's fault. C has tons of UB that is very difficult to reason about, and GCC chooses to do weird things by default that are at best ambiguously standard compliant (e.g. evaluating floating point expressions at higher precision than specified by the user). Compilers of languages with more intuitive semantics tend to get fewer user-error bug reports.


> that are at best ambiguously standard compliant (e.g. evaluating floating point expressions at higher precision than specified by the user).

That's not ambiguously standards-compliant, that is literally something that the standard has explicit mechanisms to indicate that's what the compiler is doing (look up FLT_EVAL_METHOD).


the way GCC does it doesn't quite follow the meaning of any of the options of FLT_EVAL_METHOD, at least as I read it. it's close to 2 (i.e. all operations and constants evaluate in the range and precision of long double), but its actual semantics are that some random subset of its math is done in 80 bit (which isn't the size of its long double), except when that doesn't happen and it uses 64 bit instead. that said, I guess you could call this standard compliant, since the standard says that negative options other than -1 are implementation defined behavior, so maybe the GCC default is to set the option to one of the values allowed by, but not documented in, the standard. https://stackoverflow.com/a/20870215/5141328 documents the insanity pretty well


it may be helpful to have the context that gumby's years on gcc were something like 01987–01997, before most of the ub fuckery


Honestly, mostly they were vanilla misunderstanding of the language being compiled.

Some people would rather blame the compiler than their own code.

Others were shocked that a compiler could have a bug at all. Because most code is not corner cases, this isn’t a bad default assumption, really, but bugs do happen.


Can you imagine being one of the five people in the universe that has confident knowledge of c++ undefined behavior?


We recently posted about fuzzing a compiler with success [1]. The article contains the details. There is an error on the Zest link that should point to [2]. The key is how to craft the generator.

[1] https://www.coinfabrik.com/blog/why-the-fuzz-about-fuzzing-c...

[2] https://dl.acm.org/doi/10.1145/3293882.3330576


> Rust’s implementation of LibFuzzer,

Would that be the LLVM one wrapped in some Rust API? Having accidentally gone down a fuzz-testing rabbit hole over the last month or so, I am really liking the LLVM one, but have a vague sense that I should probably try AFL++ as well.

The paper suggests it was a Java implementation distinct from those two. The approach of turning the bytestring into a program is good; worth looking at the IR fuzzer in the LLVM suite - this looks like a fair reference for it: https://arxiv.org/abs/2402.05256


It seems difficult to test. Being difficult to test is usually a sign of inconvenient APIs and modules. For instance, the hiding of elements happens at multiple levels - the front-end needs a test to make sure the right intermediate representation is being generated. The backend, in turn, needs to be tested to make sure the intermediate representation with a hidden element generates the proper linker information, and the linker itself needs a test to make sure the correctly annotated object code results in the correct artefact lacking external information about the hidden symbol. Testing end-to-end seems very laborious and error prone. Root-cause analysis should have pinpointed the place the issue originates and one or more possible paths where it propagates downstream, to inform future tests.

It also disturbs me that the author mentioned that the order in which sources are compiled matters for the final result. It should never matter.

When we build software, we should always make it in such a way that it's trivial to write tests for it. If writing a test is easy, it indicates that using the tool you wrote is also easy.


I did this for a class on compilers and interpreters where we wrote our own, each week expanding on functionality but keeping the two in feature parity. I wrote Python to auto-generate test cases. I vaguely recall that the test was whether the two operated the same (given the simplicity of the interpreter I viewed it as a "gold standard"). See [1] for how it worked. Among the bugs in my code, I found 2 things through that exercise:

1. Functional programmers often write slow code. It turned out that my compiler was spending most of its time in my professor's code, which, while I'm sure it was very mathematically pure, was a large consumer of immutable, short-lifetime objects. Meaning, under the hood, mallocs. I should've valgrinded it but I'm certain it would've overflowed the counters (jokes)

2. If a comment spanned multiple lines in the resulting assembly, I could escape the comment and operate outside the bounds the professor set up, letting me use more assembly directives to solve the problems far more easily. Ultimately we worked to fix that, because usually it just means the student will try to compile part of a comment as assembly, and that can be very confusing for the less assembler-error inclined. I used it for having a constant before a variable for type tagging. A 1-line solution. I believe the class's preferred way was putting the tag in a register and yadda yadda something that took a lot more finagling and effort. I did that maybe once before using my knowledge of the comment escape to do the arbitrary code injection.

[1] The interpreter was written in the language we were interpreting, so as long as there were no typos or logic errors, the functionality was a perfect match for running the code in the host language. The compiler would return a series of objects that wrapped assembly, for example Add(R2, R2, R3). Usually pretty transparent. The framework we were given would then write out the .s file, I believe it would call gcc or some other thing, and we'd run the binary to make sure it worked.


Somehow I came to think of 'Like a Rolling Stone'... Dylan, I believe.


It partially inspired me, right :)


Contrary to the Jetbrains praise in here, I despise Kotlin and the evidence is the tailrec keyword.

“tailrec” is what you put on a function definition to indicate that the function is tail recursive, but it only actually worked when the function called itself in the return, and not any other tailrec function. The idiotic part was that this would only manifest as a StackOverflowException at run time. (I found this as my language evaluation involved implementing a state machine idiomatically.) If you are going to make tailrec work only as a while loop, then have the compiler alert the programmer at build time, but this got all the way through their design and QA. Not exactly a well thought out process.

They have probably fixed this now, but instead I went off to golang, where the features are few but when they exist they are done properly.


the jvm is to blame here, not kotlin. clojure has the same problem


The point is the kotlin compiler not handling tail recursive calls of other functions. If it cannot handle it that is a build time error, not a runtime one. It shouldn’t even get to the jvm. Clojure is way more dynamic so such errors would be expected.

The whole thing is attempting to hack together something that superficially competes with c# but just doesn’t have the substance.


Tail calls are something that a language compiler targeting VM bytecode cannot easily address by itself; it ideally needs cooperation from the VM.

I'm not aware of implementation details of either Clojure or Kotlin but if it is anything like F#, then I would really not hold tailcalls against either of them.

In order to effectively support FP languages, CIL specification defines tail. prefix for call, calli and callvirt opcodes, and CoreCLR fully supports it except scenarios outlined by the spec.

Does JVM bytecode specification have something like this?

Edit: Turns out F# compiler does additional heavy lifting besides emitting the prefix https://github.com/dotnet/fsharp/blob/main/docs/large-inputs...


interesting, thanks! any idea what the situation is like on wasm?


WASM does appear to have significant tailcall consideration: https://github.com/WebAssembly/tail-call/blob/main/proposals...


the high bit here seems to be 'Currently, the Wasm design explicitly forbids tail call optimisations'?


I'm not keeping up closely with WASM development, but doesn't it say this was the case at the time the proposal was written? It seems it goes quite extensively into the subject which is a good look for WASM (which I'm otherwise on the fence about when it comes to using it outside of browser scenarios).


`tailrec` in kotlin is a modifier to the function, not to the function call site. let's stipulate that that's a sensible decision (i don't think it is, but the rest of the thread is moot otherwise). are you suggesting that a `tailrec` function shouldn't be allowed to call any function other than in tail position? i think that would make it so restrictive as to be useless for most tail-recursive loops

the clojure approach (as i understand it; i've only written a few dozen lines of clojure) is to use an explicit `recur` operator, at the call site, for a tail-recursive call. implicitly that operator invokes the same function, not a different one, because that's the best you can do on the jvm. it's not a runtime error


I am suggesting that calling another tailrec function in the return statement and failing to error out when this isn’t actually going to work is an error. (I believe they have since adopted this position).

i.e. someone familiar with tail recursion but not the specifics of kotlin would not reasonably expect any such restrictions. It is better to not have such concepts at all than half done in this way, as it destroys confidence in the rest of the language constructs.


if you're programming in kotlin you're probably going to have to get pretty familiar with the jvm


Once again, it's not the JVM that's the problem, I am very familiar with the JVM. It is the Kotlin developers not caring to alert people to the idiosyncrasies of their features.

You don't get to say "it supports tail recursion" and then only do so in the most narrow sense imaginable - i.e. the case that is trivial to turn into a while loop. If they can't make actual uses work on the JVM properly then don't claim to have the feature in the first place.


often the more general concept is called 'tail-call optimization' to distinguish it from the narrower sense kotlin can handle. sometimes it's even called 'tail-call elimination' because 'optimizations' aren't supposed to change semantics

i agree that 'Kotlin supports a style of functional programming known as tail recursion.' is somewhat exaggerating kotlin's capabilities here


Probably better than paying money to test an IDE :P


I was disappointed they apparently don't do high volume differential testing with randomly generated programs, a kind of property based testing. Each individual program has very little testing value, but when you can crank out millions or even billions of them they can find all sorts of bugs.


nahh I don't buy it. You don't have friends OFTEN asking you how it feels to test a compiler bro stop the cap



