Check for Ada code in comment

I want to check whether comments contain Ada code, such as:
-- if X = 3 then
I looked for existing tools.
I considered AdaControl, but I couldn’t find anything relevant in my version (2019, the last public ASIS version).
Any idea?

First, why do you want this?
Second, this is an exceptionally hard problem because commented-out code is often only a partial statement or expression. Consider:

if 
-- X + 1 > 3
X in Positive then

should this be picked up?


That said, in Byron I actually have a token for precisely this, ID of Comment_Code in Lexington.Aux. (Due to the intent of the design being that things in the IDE/system are oriented toward meaningful data/structures rather than text, such a node would obviously be useful.)

What do you consider “Ada code”?

One way to do such things is to write a pattern. Here is an example:

with Ada.Text_IO;
with Ada.Command_Line;

with Parsers.Multiline_Patterns;  use Parsers.Multiline_Patterns;

with Parsers.Multiline_Source.Text_IO;

procedure Ada_Comments is
   File     : aliased Ada.Text_IO.File_Type;
   Ada_Code : constant Pattern_Type :=
              Blank_Or_Empty &
              (  abs "begin"
              or abs "declare"
              or abs "case"
              or abs "if"
              or abs "then"
              or abs "else"
              or abs "end"
              or abs "procedure"
              or abs "function"
              or abs "type"
              or abs "with"
              or abs "use"
              or abs "raise"
              or abs "abort"
              );
   Pattern  : constant Pattern_Type :=
           + (  """" & Anything & """"
             or "--" &
                (  Put_Line (Ada_Code & Anything & End_Of_Line) & NL_or_EOF & Success
                or NL
                )
             or Any
             or NL
             );
   use Ada.Text_IO;
begin
   Open (File, In_File, Ada.Command_Line.Argument (1));
   declare
      Source : aliased Parsers.Multiline_Source.Text_IO.Source (File'Access);
      State  : aliased Match_State (10000);
   begin
      while Match (Pattern, Source'Access, State'Access) = Matched loop
         null;
      end loop;
   end;
   Close (File);
end Ada_Comments;

The pattern repeatedly matches one of the following:

"""" & Anything & """" is an Ada string literal or part of one: anything between " and ".
"--" followed by Ada_Code, or by anything up to the line end.
Any single character.

Ada_Code defines what “Ada code” is. I simply stuffed it with case-insensitive Ada keywords.

Put_Line is used to print the matched Ada_Code and whatever else up to the line end. Alternatively you can stuff it into a variable or define your own action.

It is very crude and does not recognize naked expressions, but how would you know that they are Ada code?

AdaControl can check any pattern in comments (rule “comments”), so yes, you can give it a list of Ada keywords as a clue.

Thanks @dmitry-kazakov for the proposed solution, implemented as always so easily with the SC.

In fact the reverse, searching for true comments, might be easier for humans, since those should be natural language. I’m curious to know whether AI could help effectively.

I was just curious to know whether such a tool exists. I guess it exists in industry, as avoiding code in comments is a common rule, but unfortunately they either don’t share their tools or use commercial ones…

Has adding this to AdaControl as a basic feature been considered?

AI is a confusing term. It means roughly anything we do not yet know how to do. Pattern matching was considered AI at some point. If you mean the current hype of NI and LLM, then I think an LLM has a good chance if trained on a massive corpus of Ada sources, provided anybody would do that (unlikely). From a practical point of view, direct parsing is the straightforward way. AdaControl, mentioned here, could do that more or less out of the box.

In general, it cannot be done. How do you recognize code in comments? For example, checking whether the comment contains the word “if” would certainly give too many false positives. Checking for “if .* then” would be closer, but would still give false positives as well as false negatives (some people like to put the “then” keyword on another line). Trying to compile the code after “--” would be hopeless, since most of the time a single line does not constitute a compilable entity.

Therefore, I think giving some patterns (tailored to your own coding style) is all that can be done.
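
To make the trade-off concrete, here is a minimal sketch of the “if .* then” check using only the standard library. It is not how AdaControl works. The names Flag_Code_Comments and Looks_Like_Code are made up for this illustration, and the sketch assumes that the first “--” on a line starts a comment, so it will be fooled by “--” inside a string literal.

```ada
with Ada.Text_IO;
with Ada.Command_Line;
with Ada.Strings.Fixed;
with Ada.Characters.Handling;

procedure Flag_Code_Comments is
   use Ada.Text_IO;

   --  Hypothetical heuristic: the comment text contains "if " and,
   --  somewhere after it, "then" (case-insensitive).
   function Looks_Like_Code (Comment : String) return Boolean is
      Text  : constant String :=
        Ada.Characters.Handling.To_Lower (Comment);
      If_At : constant Natural := Ada.Strings.Fixed.Index (Text, "if ");
   begin
      return If_At > 0
        and then Ada.Strings.Fixed.Index (Text, "then", If_At) > 0;
   end Looks_Like_Code;

begin
   if Ada.Command_Line.Argument_Count /= 1 then
      Put_Line ("usage: flag_code_comments FILE");
      return;
   end if;

   declare
      File    : File_Type;
      Line_No : Natural := 0;
   begin
      Open (File, In_File, Ada.Command_Line.Argument (1));
      while not End_Of_File (File) loop
         declare
            Current : constant String := Get_Line (File);
            --  Assumption: the first "--" starts the comment.
            Start   : constant Natural :=
              Ada.Strings.Fixed.Index (Current, "--");
         begin
            Line_No := Line_No + 1;
            if Start > 0
              and then Looks_Like_Code (Current (Start + 2 .. Current'Last))
            then
               Put_Line (Natural'Image (Line_No) & ": " & Current);
            end if;
         end;
      end loop;
      Close (File);
   end;
end Flag_Code_Comments;
```

Both failure modes show up immediately: “-- if the buffer is full then wait” is reported although it is plain English, while “-- if X = 3” with the “then” on the following line is missed.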
