RegPat bug or handling error?

The following program

with Ada.Strings.Unbounded;  use Ada.Strings.Unbounded;
with Ada.Text_IO;            use Ada.Text_IO;
with GNAT.RegPat;            use GNAT.RegPat;

procedure test is
   RE: constant Pattern_Matcher := Compile("^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})$");

   ip4:     Unbounded_String;
   matches: Match_Array(1..5);

begin
   ip4 := To_Unbounded_String("10.41.129.37/129");

   Match(RE, To_String(ip4), matches);
   Dump(RE);
   Put_Line(matches(5).First'Image);
   Put_Line(matches(5).Last'Image);
   Put_Line(To_String(Unbounded_Slice(ip4, matches(5).First, matches(5).Last)));

   if matches(matches'Last) /= No_Match then
      Put_Line("OMG");
   end if;
end;

gives me:

Must start with (Self.First) = NUL
   1:BOL (4)
   4:OPEN1 (8)
   8:  CURLY (18) {1,3}
  15:    DIGIT (-)
  18:  CLOSE1 (22)
  22:EXACT (27) (1 chars) <.>
  27:OPEN2 (31)
  31:  CURLY (41) {1,3}
  38:    DIGIT (-)
  41:  CLOSE2 (45)
  45:EXACT (50) (1 chars) <.>
  50:OPEN3 (54)
  54:  CURLY (64) {1,3}
  61:    DIGIT (-)
  64:  CLOSE3 (68)
  68:EXACT (73) (1 chars) <.>
  73:OPEN4 (77)
  77:  CURLY (87) {1,3}
  84:    DIGIT (-)
  87:  CLOSE4 (91)
  91:EXACT (96) (1 chars) </>
  96:OPEN5 (100)
 100:  CURLY (110) {1,2}
 107:    DIGIT (-)
 110:  CLOSE5 (114)
 114:EOL (117)
 117:EOP (-)
 14
 14
1    -- The "1" from "129"
OMG

The exact same expression behaves as expected in C++, and FreePascal (etc).
Is this a bug, or my mistake?

Looks like the non-greedy version does what you expect. According to the documentation, /(\d{1,2})$ is the greedy version, but when I tested with /(\d{1,2}?)$ (the non-greedy version according to the docs) there was no 5th match. So possibly there is a mistake in the documentation (at least in the file s-regpat.ads).

With that change the output is:

:
0
0

raised CONSTRAINT_ERROR : regpat.adb:18 range check failed

The point is that neither version should match (because of the “$” at the end)…

I think that the last component of the RE should take up to 3 digits (or even 6 if it’s a port number?).

I think matches should be Match_Array(0..5).

And, isn’t the test against No_Match the wrong way round? (and the test should be on matches(0)).

That said, I don’t see why the match of 129 vs (\d{1,2}) returns “1” & not “2”. But matches(0) is No_Match, anyway.

As @simonjwright mentioned, Match (i) when i > 0 refers to the groups in the regular expression, whilst Match (0) refers to the whole string. If Match (0) is No_Match, then the result of the groups is inconsistent, no matter if you use greedy or non-greedy expressions (I confused some terms in my previous greedy/non-greedy comment).

Try this version with and without the final 9 in the ip4 constant:

with Ada.Text_IO; use Ada.Text_IO;
with GNAT.RegPat; use GNAT.RegPat;
                                                                                                    
procedure test is
   RE: constant Pattern_Matcher := Compile("^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})$");

   ip4 : constant String := "10.41.129.37/129";
   matches: Match_Array(0..5);

begin
   Match(RE, ip4, matches);
   for m of matches loop
      if m = No_Match then
         Put_Line ("no match");
      else
         Put_Line (ip4 (m.First .. m.Last));
      end if;
   end loop;
end;

Author of g-regpat here (though that was close to 25 years ago, amazing…).

I think the last answer is correct: you need to test whether Matches(0) /= No_Match before you look at any of the other match groups.
Though I would actually qualify this as a bug in g-regpat.adb, in that it should likely reset everything to No_Match to avoid such ambiguities, as is done in multiple other cases.
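
A minimal sketch of that guard, for illustration only. Safe_Match and Show are hypothetical helpers, not part of GNAT.RegPat; the wrapper checks Matches (0) first and clears every group when the overall match failed, which is the caller-side equivalent of the reset described above.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with GNAT.RegPat; use GNAT.RegPat;

procedure Guarded_Match is
   RE : constant Pattern_Matcher :=
     Compile ("^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})$");

   --  Hypothetical helper, not part of GNAT.RegPat: if the overall match
   --  (index 0) failed, clear every group so no stale location leaks out.
   --  Assumes the caller's array starts at index 0.
   procedure Safe_Match (Input : String; Matches : out Match_Array) is
   begin
      Match (RE, Input, Matches);
      if Matches (0) = No_Match then
         Matches := (others => No_Match);
      end if;
   end Safe_Match;

   --  Hypothetical helper: prints group 5 only when the whole string matched.
   procedure Show (Input : String) is
      Groups : Match_Array (0 .. 5);
   begin
      Safe_Match (Input, Groups);
      if Groups (0) = No_Match then
         Put_Line (Input & ": no match");
      else
         Put_Line (Input & ": prefix length "
                   & Input (Groups (5).First .. Groups (5).Last));
      end if;
   end Show;
begin
   Show ("10.41.129.37/12");
   Show ("10.41.129.37/129");
end Guarded_Match;
```

With this in place, callers never slice the input using a group location unless index 0 reported a match.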

Though in practice I think they should obsolete g-regpat.adb and replace it with a binding to libpcre2. The latter is, I believe, used by gcc itself, so that would not be an extra dependency, and its regexp engine is far more advanced and efficient (while having compatible syntax, so existing Ada code would mostly not be impacted). I always meant to do that, but did not have time.

You should likely report that to AdaCore.
