Libadalang question

Because the upcoming GNATformat doesn’t do casing, I’m working on a program ada_caser to adjust case in Ada texts.

At the moment, ada_caser will do

  • identifiers converted to title case, only changing the first character
  • casing dictionaries for exceptions
  • keywords in lower case
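As a rough illustration of those three rules (not ada_caser’s actual code, which is written in Ada on top of Libadalang), here is a minimal Python sketch. The names KEYWORDS, case_identifier and case_token are hypothetical, the keyword set is deliberately incomplete, and the exceptions dictionary is assumed to be keyed by the lower-cased identifier.

```python
# Hypothetical sketch of the three casing rules; not ada_caser's real code.
# Only a handful of Ada reserved words are listed here for brevity.
KEYWORDS = {"begin", "end", "is", "package", "procedure", "use", "with"}


def case_identifier(name, exceptions=None):
    """Title-case an identifier, touching only the first character of
    each underscore-separated word; exceptions take priority."""
    exceptions = exceptions or {}
    override = exceptions.get(name.lower())
    if override is not None:
        return override
    return "_".join(word[:1].upper() + word[1:] for word in name.split("_"))


def case_token(text, exceptions=None):
    """Lower-case keywords; treat everything else as an identifier."""
    if text.lower() in KEYWORDS:
        return text.lower()
    return case_identifier(text, exceptions)
```

With this sketch, case_token("BEGIN") gives "begin", case_token("hello_world") gives "Hello_World", and an exceptions entry such as {"gnat": "GNAT"} overrides the title-case default. A real tool would take the keyword/identifier distinction from the lexer rather than from a word list.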

I’ve recently been using Ada TS Mode in Emacs, which gets GNATpp [libraries] to do its formatting. GNATpp’s default MO is to adjust the case of each name to match its declaration; so you write ada.text_io, and when the document gets formatted (e.g. you select Format, or you hit ;) it’s turned into Ada.Text_IO.

ada_caser uses Libadalang to find identifier tokens, but to do the same as GNATpp I need to find nodes. Does anyone know how to convert from a Libadalang.Common.Token_Reference to the corresponding Libadalang.Analysis.Node?

Hi Simon, I just wanted to mention a couple things in case that happens to influence your efforts here. ada-ts-mode doesn’t have to use GNATpp as the formatter, it uses whatever engine you’ve configured for ALS. Unfortunately, in the most recently released version of the ALS, the formatting engine was unintentionally defaulted to GNATpp, but that will be fixed in the next release. Until then, you can explicitly set GNATformat as your formatter (i.e., via the ALS useGnatformat setting).

Additionally, I recently created a feature branch with casing support for ada-ts-mode (see here) which I intend to formally release very soon. This provides support for word lists and/or dictionary files in addition to formatting commands and an auto-casing minor mode. I’d love to get some more eyes on it before releasing in case there are any problems.

On a side note, I’m also getting ready to release tree-sitter based indentation support, which I know will likely not be perfect, but which I need to just get out there so people can start using it and improve it. I was planning to release this along with the casing support in the next release, along with a few other minor improvements.

Hi Troy, there are 2 reasons why I actually prefer gnatpp to gnatformat:

  • building ALS with GCC 14.2.0 (the current compiler) for aarch64 results in crashes in gnatformat
  • I really really like its casing support.

I’ll be happy to have a look at your casing feature branch. I have to say, I think it’d be hard to replicate GNATpp’s ‘adjust case to match declaration’ without using libadalang, though I lived for a long long time with dictionary files.

Tree-sitter based indentation sounds interesting!

Ah okay, my misunderstanding, thanks for the clarification.

Agreed. If you’re looking for this, then building a tool sounds like the right solution.

The indentation for gpr-ts-mode uses tree-sitter based indentation rules (ALS doesn’t support indentation for GPR files); this would just be on a larger scale. Since Ada is such a large language, I’m sure I’m missing some areas for the indentation rules, but using it will help find those areas.

As a neovim user, I would be interested in the tree-sitter based indentation as well. Is this something you would consider making a pull request for in the tree-sitter Ada grammar repository? It might be useful for others as well. I had tried implementing that, but with little success (and little time spent)…

I’d love to have something like an “indents.scm” that could be used across editors. Unfortunately, the built-in Emacs tree-sitter support does not use “indents.scm” files; it uses its own format, which I think is a double-edged sword. On one hand, it seems like re-inventing the wheel if an indentation rules file already exists, but on the other hand it allows the Emacs modes to add in their own processing, which isn’t constrained by the limitations of the “indents.scm” format. I haven’t looked at the “indents.scm” format in-depth to know what those limitations are, since it was never available for Emacs, so maybe it’s very powerful, I’m not sure. Maybe you can share what you’ve observed.

One of the big issues I’d encountered, and which I experimented with for a long time while working on indentation for gpr-ts-mode, was handling invalid syntax. The tree-sitter based indentation rules are fine when you have valid syntax, but when there is a syntax error near the indentation point, it becomes much more problematic. The tree-sitter library will choose to create ERROR nodes, or the parents will be ERROR nodes, or a sibling will be an ERROR node, and then the syntax tree with the ERROR nodes will not look like you’d expect and your rules will no longer match. Trying to write rules including ERROR nodes can be very brittle. Furthermore, I ran into issues when the tree-sitter library was inserting missing nodes into the syntax tree as part of its recovery algorithm, but it wouldn’t let me access the next sibling after a node it had inserted itself. That was probably a bug in the library, but all of this just led me to give up on trying to use the syntax tree at all when there was any kind of ERROR node around the desired indentation location.

Instead, when I detect ERROR nodes in the vicinity of the indentation point (node itself, parent node, sibling node, etc.) I use a heuristic approach, looking for keywords, punctuation (e.g., comma, semi-colon), etc. in the buffer near the node to help guide the indentation. There was a lot of trial and error to get something that was robust with gpr-ts-mode, but what I have now seems to work quite well. When indentation is requested, the mode will seamlessly determine whether to use rules-based indentation or heuristic-based indentation, depending on the presence of ERROR nodes near the indentation location. Furthermore, when a line indentation is attempted and there are no ERROR nodes around it (i.e., when I can use rules-based indentation), I expand the indentation to the nearest declaration (although this can be disabled if desired), so that it will correct any indentation in the declaration caused by the heuristic approach.
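To make the switch between the two strategies concrete, here is a minimal, editor-independent sketch in Python. It is not the gpr-ts-mode implementation (which is Emacs Lisp): the Node class is a hypothetical stand-in for a tree-sitter node, and “vicinity” is simplified to the node itself, its parent and its siblings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a tree-sitter syntax node."""
    type: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child and return it, keeping the parent link in sync."""
        child.parent = self
        self.children.append(child)
        return child


def error_nearby(node: Node) -> bool:
    """True if the node, its parent, or any sibling is an ERROR node."""
    candidates = [node]
    if node.parent is not None:
        candidates.append(node.parent)
        candidates.extend(node.parent.children)  # siblings (and node itself)
    return any(n.type == "ERROR" for n in candidates)


def choose_strategy(node: Node) -> str:
    """Pick heuristic indentation near syntax errors, rules otherwise."""
    return "heuristic" if error_nearby(node) else "rules"
```

For a node whose parent and siblings are all well-formed, choose_strategy returns "rules"; as soon as an ERROR sibling or parent appears, it returns "heuristic". The heuristic itself (scanning the buffer for keywords and punctuation) is left out, since it depends on the editor’s buffer API.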

At this point, I’m pretty happy with the gpr-ts-mode indentation (and am attempting to use this approach with the ada-ts-mode indentation). I don’t know how you could describe that in an editor-independent fashion that could be shared. I’d love it if there were a way, but so far I have not seen anyone do this. Very seldom do I even see indentation rules that account for ERROR nodes at all. Without that, indentation can annoyingly jump around while you’re typing, which is what I originally had, and it was not an enjoyable experience at all.

I’ve decided to pause my attempt to get ada_caser to find the declaration for an identifier.

Remembering that the default is to convert identifiers to title case, a default overriding dictionary could be very simple: just the subwords IO and GNAT. You could probably have a default dictionary for Interfaces: char, int, unsigned etc.
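A minimal Python sketch of that idea, under some assumptions: the names DEFAULT_SUBWORDS, DEFAULT_WORDS and case_name are made up for illustration, whole-word entries are assumed to win over subword entries, and both dictionaries are keyed by lower-cased text.

```python
# Hypothetical default dictionaries; the entries come from the post above.
DEFAULT_SUBWORDS = {"io": "IO", "gnat": "GNAT"}
DEFAULT_WORDS = {"char": "char", "int": "int", "unsigned": "unsigned"}


def case_name(name, words=DEFAULT_WORDS, subwords=DEFAULT_SUBWORDS):
    """Whole-word exceptions first, then per-subword exceptions,
    falling back to title case (first character only)."""
    whole = words.get(name.lower())
    if whole is not None:
        return whole
    parts = []
    for word in name.split("_"):
        parts.append(subwords.get(word.lower(), word[:1].upper() + word[1:]))
    return "_".join(parts)
```

With these defaults, case_name("text_io") gives "Text_IO" and case_name("gnat") gives "GNAT", while case_name("int") stays "int" and case_name("int_value") becomes "Int_Value", because the whole-word entry only matches the complete identifier.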