A Less Trivial Trie Library

Hello. I’ve recently finished a new trie library:
http://verisimilitudes.net/2024-04-30

Unlike the predecessor library, I use Controlled types to ensure node storage is reclaimed, and protected types to ensure safe access. Some of this was a little tricky, but I managed.

Now that I have this piece, I can make progress on work that needs a trie as a subordinate data structure. I intend to release a Bencoding library soon enough, as an example.

Interesting: trying to open the body file in Emacs causes Emacs to pause seemingly indefinitely, with the following message in the echo area, until i C-g.

waiting for wisi parser Ada start in buffer less_trivial_trie.adb

After the C-g, i can switch to that buffer without issues.

What version of ada-mode are you using? I don’t reproduce that with ada-mode v8.1.0 compiled with GNAT v12. How did you compile the Ada executables? If you are using Linux, try with these binaries or building yourself in the same way as it is done in its GitHub action.

This is the setup i got working in this thread: ada-mode 8.1.0, installed via package-install in Emacs 29.3, ./build.sh using alr 2.0 to build all dependencies, including GNAT.

$ alr toolchain
CRATE       VERSION STATUS  NOTES                
gprbuild    22.0.1  Default      
gnat_native 13.2.1  Default

i’ll try to check out the AppImages when i can.

This is interesting to me. I also use ada-mode, and yet have none of these problems. I just checked, and I’m using ada-mode version four.

Interesting - the earliest version of ada-mode available here is 5.1.9, released in August 2016. i imagine there have been quite a few changes since then! But i’d also be interested to know what platform you’re on, and what version of Emacs you’re running.

(Emacs is basically a platform-within-a-platform for me - i not only use it for programming, but also for writing documentation and documents more generally; as my email client (mu4e); as my IRC client (ERC); to manage my Web bookmarks (Ebuku); to browse Geminispace (elpher); and so on. So i keep it pretty up-to-date, regularly updating not only Emacs itself, but also the ~180 packages i’ve installed via Emacs’ built-in package manager.)

i just tried the AppImage. i moved the existing files:

  • ada_annex_p_lr1_re2c_parse_table.txt
  • ada_mode_wisi_lalr_parse
  • ada_mode_wisi_lr1_parse

out of ~/.local/bin/, then did an --appimage-extract and mv squashfs-root/usr/bin/* ~/.local/bin.

Opening less_trivial_trie.adb now opens succesfully, but with an error in the echo area:

File local-variables error: (error parser process died)

i’m not sure i can easily replicate the process of the GitHub action: the version of dev-lang/gnat-gpl in the main Gentoo repo is 10, not 12, and the version of dev-ada/gprbuild is 23, not 18.

I’m starting to realize that getting the executables from the AppImage is not going to be portable. I thought the binaries did only link with the C library, but they are linking also against libgnat and libgnarl for version 12. This is probably the problem you’re having. You can confirm executing ldd ~/.local/bin/ada_mode_wisi_lalr_parse or running the binary directly.

You can get the needed libraries from the AppImage and make sure they’re found by the processes (e.g. through LD_LIBRARY_PATH):

squashfs-root/usr/lib/libgnarl-12.so
squashfs-root/usr/lib/libgnat-12.so

But there should be a better way to run the executables as proper AppImages. I’ll think something.

It’s probably irrelevant, but with macOS 14.4.1/Emacs 29.3/ada-mode 8.0.5, built with Alire 2.0.1, GCC 13.3 candidate/GCC 14.1 candidate I don’t see the executables linking with the gnat/gnarl shareable images.

On investigating, it looks as though gnatbind assumes -static by default.

I tried building in this way (adding -- -bargs -static to alr build):

That works, but later, alr install is called to generate the AppImage, with the undesired consequence of a second build of the executables, this time with default arguments. Is that a problem from Alire or from ada-mode?

Yes, ada_mode_wisi_lalr_parse is looking for libgnat-12.so, but not (apparently) for libgnarl-12.so:

$ lddtree ada_mode_wisi_lalr_parse 
ada_mode_wisi_lalr_parse => ./ada_mode_wisi_lalr_parse (interpreter => /lib64/ld-linux-x86-64.so.2)
    libgnat-12.so => not found
    libm.so.6 => /usr/lib64/libm.so.6
        ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
    libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/13/libgcc_s.so.1
    libc.so.6 => /usr/lib64/libc.so.6

And indeed, copying libgnat-12.so to /usr/lib64/ and running ldconfig resolves the issue, thanks!

On a broader topic: Has anyone done any work on creating a tree-sitter parser for Ada? That would do an end-run around a lot of the complexity and issues here. Nowadays Emacs can be built with the tree-sitter library, and there’s an increasing number of tree-sitter-based modes being created, which as of Emacs 30 will be sub-modes of their non-ts-based modes, which will be an additional parent. For example, Emacs 30 will include such modes for HTML, Elixir and Lua.

Of course, it doesn’t help that there are three different package management ecosystems involved here: Gentoo’s, which is of course source-based; Emacs’ system; and Alire. Not needing to build or obtain specific binaries for parsing would at least remove the need for Alire or AppImages to be involved.

Is that a problem from Alire or from ada-mode?

Sorry, i don’t follow / understand what you’re asking. Could you please restate, or expand on, this? E.g. the build.sh that comes with the ada-mode package i installed via Emacs’ package manager has this:

if type alr; then
    # alr can be installed from https://alire.ada.dev/
    echo "building ada-mode executables via Alire"

    # alr get --build builds dependencies with release, but top with development.
    alr get emacs_ada_mode~8.1.0
    cd emacs_ada_mode_*; alr build --release

which i presume you’re already aware of?

Yes, there is one mode based on tree-sitter, see this: Re: Still problems with ada-mode-8.1.0

I haven’t tried it, and I don’t know if you have to uninstall ada-mode to use ada-ts-mode. If you try it, please share your thoughts.

Sorry, I was too telegraphic. I wonder whether the rebuilding caused by alr install after being built on alr build is an issue of ada-mode or Alire. In general, this doesn’t happen for a crate, gprbuild is called again, but nothing is rebuilt (it says gprbuild: "crate" up to date). Thinking more about it, if gprbuild is called again, it should be called with the same options as a previous alr build command, otherwise it should support also the same build options.

Yes, there is one mode based on tree-sitter, see this: Re: Still problems with ada-mode-8.1.0

I haven’t tried it, and I don’t know if you have to uninstall ada-mode to use ada-ts-mode. If you try it, please share your thoughts.

Ah! Thank you for bringing this to my attention. i’ve just tried it; more on that below. But to add some of that post into this discussion, the author of ada-ts-mode wrote:

I had the same issues previously mentioned where I wanted the ada-mode package to be installed and work immediately, not have to then build from source as a follow-on activity. I often hesitated updating ada-mode because I didn’t want to rebuild it. I had thought this
would have gotten better with the Alire integration, but by that time I had moved on. However, I’ve still seen a number of complaints on the mailing list even with Alire integration. The issue of building from source becomes compounded when you are running on multiple Operating Systems.

My main goal with ada-ts-mode was to remove the friction I felt in ada-mode. In addition to installation issues, I didn’t like the opinionated defaults and some of them were more involved to disable than others. I’ve mentioned some of these defaults that I didn’t like
in past posts. I also had other problems such as ada-mode’s trouble handling files with mixed LF/CRLF end-of-line markers. Working on legacy code bases where this occurs is troublesome with ada-mode.

So, what i did just now was install ada-ts-mode via Emacs’ package manager, without uninstalling ada-mode. It installed without issues. i then opened less_trivial_trie.adb - in doing so, i was informed that there was no tree-sitter grammar installed, and would i like to install one from <url>? i answered yes, it installed without issues, and the file was syntax-highlighted. :slight_smile: A very quick and painless process.

That said, indentation is very obviously not yet working (as of the 20240426.256 version), as the mode’s author wrote:

I believe the only thing missing from ada-ts-mode (at least for me) is indentation support (which I have a significant portion of this implemented, just not publicly available yet as I wanted to have a complete implementation before releasing it). If desired, you can use
the Ada Language Server indentation support.

i haven’t tried using the ALS yet. i’ll try to test it out and report back.

Looking at the source of ada-ts-mode.el, it doesn’t depend on ada-mode. It does, however, basically make itself the provider of ada-mode.

All in all, i’d say ada-ts-mode feels very promising indeed - here’s the repo README. But of course, i say that as someone new to Ada who isn’t currently actively using it for any projects; i’d be interested to hear the opinions of experienced Ada devs who use Emacs.

I wonder whether the rebuilding caused by alr install after being built on alr build is an issue of ada-mode or Alire. In general, this doesn’t happen for a crate, gprbuild is called again, but nothing is rebuilt (it says gprbuild: "crate" up to date). Thinking more about it, if gprbuild is called again, it should be called with the same options as a previous alr build command, otherwise it should support also the same build options.

Hmm, well, grepping for instances of gprbuild in the ada-mode directory makes me realise that the mode and its build process are very complex; i suspect it would take me quite some time to try to get my head around how all the pieces fit together. One thing of note though: the use of Alire by build.sh results in a second copy of ada-mode being created inside the original ada-mode directory, and that might be related to the rebuilding you mention.

Usually container types in Ada are not protected, because concurrent access to the same object is considered as an error. For instance Ada.Containers.Vectors or Maps. You can’t create such an object and use it in two tasks without some sort of protections. So, if you don’t have some special use case for your data structure, then you shouldn’t have protected type inside to be inline with other Ada container types.

While I appreciate the advice, I don’t care how Ada 2012 does things. I want multiple tasks to be able to safely use this generic data structure, and designed it with this in mind.

i haven’t tried using the ALS yet. i’ll try to test it out and report back.

With assistance from the ada-ts-mode dev, i’m now using ALS for indentation support. Here’s the relevant discussion in the ada-ts-mode repo.

2 Likes

Say, I’d like some advice. So, I may want to use this trie library to map strings to strings, and I figured I could simply use an access to String type to achieve this, but I was thinking about it today and realized I can’t.

The problem is complex. No linked structure can be used properly with this library, and use of a Controlled type won’t work either. Any access type storage will be reclaimed, but the storage which it accesses will be left alone. A Controlled type won’t work because of assignment, and that’s also just not a pleasant solution. I looked at Unbounded_String, including its implementation in GNAT, but it has the same issue.

Still, it seems unreasonable for each node to store an access type instead, and to both allocate and deallocate for every assignment.

I don’t see a good solution to this problem that doesn’t require a slightly different trie library, which is annoying.

Controlled types support assignment through the Adjust procedure. Unbounded_Strings make copies of the data when you do assignment. If you don’t want copies but want like shared references, then there are some reference counted types out there you could use instead if interested. Then assignment would only make shared references and reclaim memory only once all references disappear.

1 Like

Yes. The issue is Adjust can’t deallocate anything, because the access value is overwritten by the time it’s called.

I looked at GNAT’s Unbounded_String implementation, and it looks to me like memory leaks aren’t hard to make, which would happen through blind assignment.