Rust took out cloudflare

dmitry-kazakov · November 21, 2025, 5:37pm

This juggling words makes no sense. Recovery is possible = unbreakable.

A system fails when recovery is not possible = not foreseen.

Talking trivialities adds nothing. Either the programmer is an idiot who does not know that reading file may fail, or he anticipates this legal program state by handling it in any reasonable way the language and program logic offer.

Mechanical systems are governed by stochastic laws of physics and chemistry. There is nothing stochastic in software system faults. You either have a faulty state reachable through legal inputs or not. End of story. Hardware faults is a different case.

Yes, it is called fault tolerant software architecture.

Ada lacks only exception contracts.

P.S. Things like finally and last chance handlers simply introduce new bugs. When the program is in an undefined state no recovery is possible. Restarting tasks and partitions in presence of persistent data might turn a very bad idea. The first engineering rule is - know what are you doing!

OneWingedShark · November 21, 2025, 7:09pm

There’s two big problems with status-codes:

First, since they’re typically realized as an System.Integer, each place they’re handled is a sea of overwhelming non-valid values (e.g. “this function returns 0 on success, and -1 on failure” leave something like (2**32)-2 values) — this could be slightly mitigated with enumerations, assuming (a) case-coverage over the enumeration with, (b) non-valid throwing something like a Unexpected_Value, Program_Error, or some such exception.
Second, the handling of a status-code almost always forces you to handle exceptional cases “up front” (at the point they’re encountered) rather than allowing something like exception-handling structures. (i.e. this is [at least one reason] why Linux kernel code uses status-codes [+goto] rather than exceptions: the status-code forces it.)

OneWingedShark · November 21, 2025, 7:14pm

Oh, where would I find those talks?
There was a time a while back, where I wanted to write an Erlang (including a BEAM replacement) in Ada, but I couldn’t find the specs for anything but a VERY old version of Erlang and BEAM seemed to fall into the old C programmer’s “the source is the documentation” rather than anything high level.

pmnw · November 21, 2025, 7:25pm

Blaming the programming language for software failure is an old story.
Clearly, it wasn’t Rust that brought Cloudflare down…
Nothing can prevent all failure, although careful engineering can mitigate it (I wager that’s what was missing in this case).

There’s a big collection of those to be found on YouTube [1]
Armstrong’s thesis is a valuable document (although it’s basically erlang marketing) [2]

[1] For example: https://www.youtube.com/playlist?list=PLvL2NEhYV4ZsIjT55t-kxylCU0BRlQjpl
[2] https://erlang.org/download/armstrong_thesis_2003.pdf

dmitry-kazakov · November 21, 2025, 7:45pm

No, it was programmers using Rust and other software developing tools. It is a legitimate question how much a given tool helps.

Granted, facing madness of modern software developing processes and general incompetence on each and every level I doubt that even SPARK Ada would help much.

P.S. I willingly admit having Schadenfreude when stones start flying into the Rust’s garden!

sbenitezb · November 22, 2025, 9:01am

I was under the impression that handling of exceptions is forbidden in SPARK mode and why I don’t use exceptions in SPARK compliant code. Has that changed lately?

sbenitezb · November 22, 2025, 9:07am

Supervisor concept and watchdogs are what all reliable software need to implement to survive hangs and crashes. The OS provides that typically, but with less granularity than in process implementation and less control.

sbenitezb · November 22, 2025, 9:16am

Since each process in Erlang is isolated in memory, the a failure in one does not affect directly the others, except lack of communication or interaction with that process. If you build the hierarchy correctly, then your system as a whole is more reliable because supervisors will terminate misbehaving processes and finally themselves if all children have failed too many times.

This is very powerful but of course you need to design for it and for every possible interaction where one process is not responding correctly or failing. I suggest reading on OTP to understand it.

dmitry-kazakov · November 22, 2025, 9:38am

I never saw a bug when a task corrupted memory of another task. Not even in C++.

Yep, “except” lack of whole functionality. Tasks are assuming to do something that manipulates shared state of the system. Otherwise they are useless. Corruption of the shared state (= data, expectations e.g. state transitions, missing deadlines etc) is what constitutes a bug.

Decomposing programs into asynchronous talking tasks is of many orders of magnitude more complex and error prone than synchronous decomposition into procedural calls. We do that only when forced to.

kevlar700 · November 22, 2025, 11:50am

OneWingedShark:

There’s two big problems with status-codes:

First, since they’re typically realized as an System.Integer, each place they’re handled is a sea of overwhelming non-valid values (e.g. “this function returns 0 on success, and -1 on failure” leave something like (2**32)-2 values) — this could be slightly mitigated with enumerations, assuming (a) case-coverage over the enumeration with, (b) non-valid throwing something like a Unexpected_Value, Program_Error, or some such exception.

Second, the handling of a status-code almost always forces you to handle exceptional cases “up front” (at the point they’re encountered) rather than allowing something like exception-handling structures. (i.e. this is [at least one reason] why Linux kernel code uses status-codes [+goto] rather than exceptions: the status-code forces it.)

1./ True but not true for the example of what I use in Ada. Which I use as exception propagation support would need to be written for my targets and exceptions were only prove they couldn’t happen in SPARK until quite recently which is hard to consider as being possible in lower hardware layers.

2./ Yes always handling can be a pain and a tiny performance hit but it can just be pass it up to the next layer but visibly and optionally logging the path but an exception does this automatically (on all targets?) which is more reliable. There is also an issue of allocating and translating status enums from different packages and predicates. The benefit is you can see exactly what status and exceptions each package and procedure/function provides which is akin to exception visibility that atleast some users are hoping exceptions can provide in the next Ada standard.

My concern is that I’m not sure it’s practical to write exception runtime support for every target like clock issues cause problems for tasking. The late Simon Wright (bless him) said it got pretty hairy when he worked on it so I’m reluctant to sink time into that. Apparently the Dos Ada version shown in the meetup recently had a form of tasking that didn’t require microchip target dependent clock speed code but I expect it’s tasking wasn’t very useful but maybe it was. I think clock independent alternatives to protected objects might be under consideration but not sure really.

dmitry-kazakov · November 22, 2025, 12:23pm

True, but a more general rule says - never fix hardware issues with software.

Maybe such targets are simply not worth the efforts? Surely not in long run.

OneWingedShark · November 23, 2025, 11:52pm

One of the great things about Ada is what it puts onto the runtime, Task is an excellent example: it [in-general] allows you to write programs w/o platform-dependency. Take this example. DOS on single-core CPUs, all that would be needed is to [re]implement the runtime with multicore awareness and you could simply relink its executables and instantly get use of all the CPU cores. — I’m actually rather surprised that there aren’t full runtimes for many of the now microcontrollers, given that very compiler and how it ran on DOS.