2023 Day 5: If You Give A Seed A Fertilizer

@JeremyGrosser hasn’t posted on today’s puzzle yet; I hope he won’t mind if I do.

There’s always at least one puzzle with ginormous numbers, though I don’t recall their appearance in as early as Day 5!

The first speed bump I encountered is that gnat defaults (?) to 32-bit integers for Natural, so my machine choked, though nearly 12 hours later I can’t remember whether it choked on the input proper or during the tracing. Very well, then: Ada’s Long_Long_Integer it is.

Is 32-bits a fixed standard for Ada’s Natural? Since 64-bit numbers are not uncommon in these puzzles (at least 2 or 3 puzzles will go there), should I by default define a numeric type and use that? (Isn’t that arguably the Ada way anyway? I’ve always wished the puzzle master would be more specific about the data we might encounter in these puzzles.)

The second speed bump is that solving Part 2 the natural, “brute-force” way takes the computer while, and I mean a while. It is doable! It took my machine “only” about 15 minutes. (I’ve paid for those gigahertz; might as well use them, right?)

So, while I was able to solve Part 2 via brute force – not usually possible when we start to see 6-digit numbers, let alone 9-digit ones! – I decided to employ interval operations to improve the solution. I dedicated a type Interval, then traced its endpoints’ behavior through the almanac. When an interval in the almanac has an endpoint lying between an interval I have, I split the interval I have, in order to treat each entry properly. At the end, you only need the check the final intervals’ left endpoints..

My solution is here. I’d be interested to know if anyone took a different approach.

I have suspected that brute force would be a no-go given the numbers involved, and intervals would lead to something too complex and a kind of combinatorial explosion. So I have made a conjecture that turns the search into n^2 iterations where n=number of rules. The solution works but the proof of the conjecture had a hole, so perhaps I was lucky with the data, unless I find a correct proof.
Also the method implies that the maps are injective (two different source numbers always have two different destination numbers).

2 Likes

Is the big numbers thing part of part 2? I missed any mention of large numbers in the original.

What precisely is the conjecture?

I could be wrong, but I don’t see a way around the potential combinatorial explosion, but the data isn’t bad enough for that to happen. IIRC I ended up with roughly 150 intervals.

My sketch of a proof of combinatorial explosion:

  1. You have to know both the left and right endpoints of each interval, either explicitly or implicitly (via length).
  2. The worst case occurs when an old “incoming” interval splits into 3 new “outgoing” intervals. This occurs when it completely contains a mapping interval.
  3. It is not hard to construct (I might be wrong here, but it’s certainly not hard to imagine) an initial set of seed intervals I(0), and a sequence of mappings M(1), … M(k), with M(i) describing N(i,1), …, N(i,l1) intervals, when you apply M(i) to I(i-1) to obtain I(i), each interval of I(i-1) splits into 3.
  4. When the intervals of I(i-1) split, still in the worst case we end up with 3 times as many intervals.
  5. To maximize splitting, you need an equal number of incoming intervals as mapping intervals. (again, I might be wrong here, but if so then the explosion would only be worse)
  6. Through each M(i) we can say that each interval of I(i-1) corresponds to an interval of N(1,1), …, N(1,l1), so we have 3*(1,l1).
  7. For each interval to split at the next step, each of those intervals must correspond to an interval of the next mapping, so we have 33(1,l1).
  8. And so forth, leading to O(3^k) potential intervals.

The solution can come from any of those intervals’ left endpoints, so I don’t see how you evade combinatorial explosion in the worst case.

My input has the number 2_956_956_868, and gnat chokes when trying to read that in as a Natural:

raised ADA.IO_EXCEPTIONS.DATA_ERROR : a-tiinau.adb:79 instantiated at a-tiinio.adb:48 instantiated at day5.adb:23

I interpret that to mean that gnat is using 32-bit integers as its universal integer type, which kind of sucks in 2023, though I suppose there are good reasons for it.

Edited: I copied in the error message, which reminded me that I’m using Ada.Text_IO.Integer_IO to read the numbers, so if my conjecture is wrong then I guess it has something to do with how that package works. Which still sucks. :grin:

This guy claims to have brute forced Part 2 in 800ms… using CUDA

:astonished:

Yeah it looks that way. I feel like gnat had a command line option that changes it to 64. I feel like the Gnoga library had to change up the makefile to switch to 64 bit.

I was more asking is that an input from part 2? I didn’t see it in part 1, but I have reading challenges so I was making sure I didn’t miss something in the part 1 section.

2 Likes

The conjecture is “for the optimal solution, the left bound of at least one explicit interval is touched”.
But it seems now a silly idea. But it worked in my data’s case (link to solution here)… I think now that searching for such cases will give good candidates but not the solution for anybody’s data. Unless I reformulate the conjecture…

If you add the implicit intervals, the conjecture seems OK: if the path from seed s to the lowest location number, L(s), doesn’t touch any left bound, you can subtract 1 to the seed number and the location number is also 1 less (L(s-1) = L(s)-1), therefore s is not optimal.

1 Like

Sorry, what do you mean by explicit interval?

When reading up on other people’s solutions, I’ve read that some people are reversing the mapping process, starting from the known outputs and going backwards. Once that’s complete, they compare the corresponding inputs to the seed inputs, and from that determine which left seeds they have to trace, thereby – if I’ve understood this correctly – reducing from exponential order to linear :astonished:.

But I’m not sure I actually understood it, so…

ARM 3.5.4(21) says, “the range of Integer shall include the range –2**15+1 .. +2**15–1.” There are still compilers (Janus/Ada for one, IIRC) that use 16 bits for integer, so portable code should not expect a greater range.

Personally, I think it would be better for Ada to define Integer as that range, and omitted all the Long_* and Short_* optional types. Then people would have to use user-defined numeric types more often. I used

type Thing_ID is range 0 .. 2 ** 63 - 1;

Note that recent GNATs for 64-bit targets use 128 bits for Long_Long_Integer, so your brute-force version might be a little faster if you used an explicit 64-bit type. Not likely to be significant, though. However, this might be a place where tasking could be useful.

1 Like

I solved part 2 with parallel brute force. In my case, I have 10 seeds intervals and a 16-cores CPU … so yes, I created an array of tasks to parallelize the force.
It didn’t took too much time, just around 5 minutes (~22.5 min user time).

Check it here: advent-of-code/2023/05 at main · rocher/advent-of-code · GitHub

Notes:

I agree, yes!! that’s the solution. This morning I was thinking to implement it, but was not sure if the @zertovitch’s conjecture was true, necessary for this solution to work. Thus, I ended the parallel brute force (well, nice to have).

As @zertovitch pointed out:

If I understood it correctly, you mean the left bound of a location interval. And that’s right, not a conjecture anymore. With a detailed analysis I realized that the explanation is this: when you start the backwards search, starting from the lower bound of a location interval and go backwards, you can reach an implicit interval only in intermediate maps. When you reach the Seeds, it must be in an explicit interval, otherwise the path is invalid, because you can only start in explicit Seeds intervals to reach the final location.. Well, hope my explanation is clear!

I meant a broader set of intervals (to simplify, we take the “source” side of each transition step).
Explicit intervals: all ranges described in the Almanach.
Implicit intervals: the set of intervals that the explicit intervals do not cover in the transitions (seed_to_soil, soil_to_fertilizer, …).
When a path from seed to location doesn’t touch any left bound of any interval of the set defined above, that path cannot be optimal (i.e. with lowest location number) because you can shift the seed number by -1 and the location at the end will be also shifted by -1. Ergo, the optimum touches a left bound on its way. So you check each left bound at every level; from that point you go up to see if you are in a seed range; if so, you go from the same starting point down to the location and do a result := Min (result, location). In the end you will have caught the optimal solution because you will have checked all pathes touching a left bound.
Maybe there is a possibility to restrict the search further, but only the humidity_to_location explicit ranges from the Almanach are not enough; I miss the optimum on my data.
NB:

  • the walk-up assumes injectivity in the Almanach’s rules. Otherwise, multiple seed numbers may lead to the minimum location number, but the seed number found by the walk-up might be not in a seed range, so we would miss the optimum.
  • on your data, the minimum location (10834440) is one of the left destination bound of humidity-to-location map, but it is a special case. In the example, the left bound hit by the optimal path is in the light-to-temperature map; in my data it is in the soil-to-fertilizer map.

I’m (still) not convinced: what’s the problem if multiple seed numbers lead to the minimum location number? If there are several paths from a lower bound of a location interval to different seeds (walk-up), if one of these paths ends in an explicit seed interval, then that location lower bound is the minimum.

On my input data the minimum location is 10834440, but there is one location interval starting at zero. Zero is not the minimum precisely because (I’m pretty sure) the walk-up path ends in an implicit seed interval.

Mathematically speaking, the lack of injectivity is not an issue at all. But when you walk up (depending on the algorithm) you may not find all the possible seeds for a given soil, fertilizer, etc., but only one seed. For instance, my algorithm is lazy and finds only one originator per level. In that context, if you are unlucky, the seed you find is not in a seed range, but another seed you don’t find, is.

1 Like

Understood!

The algorithm using the walk-up strategy should look like this:

  1. Start with the minimum lower bound of a location interval
  2. Trace the path to a seed (walk-up)
    2.1. if the seed reached is inside an explicit interval, you’re done!
    2.2. otherwise, check if there are additional paths (diverging at some intermediate map, which suggest some kind of memoization) and evaluate according to step 2.1.
  3. If there are no more paths from that lower bound, then select the next lowest lower bound and jump to step 2.

It looks on the way to perfection :slight_smile: . But you also need to start from lower bounds at other levels.
On your data it is ok, but in the example, the lower bound hit by the optimal path is in the light-to-temperature map; in my data, it is in the soil-to-fertilizer map.