Trouble interfacing with C libraries; Making thick bindings

Background

While I’ve used Ada for work projects, those projects did not interface with C code and did not use access types. So, I’m lacking experience needed for my personal projects that interface with C libraries.

When trying to learn how to interface with C code, the various tutorials seemed like they would be helpful. But once I faced real-world code, I realized the tutorials did not cover examples of what I needed to do.

While writing this post, I solved the problems that prevented compiling, which I had originally intended to include as questions. But this still stands as an example of some of the difficulties new Ada users face. The Ada ecosystem is not as comprehensive as, for example, Python with its “batteries included” philosophy. So, new Ada users are likely to need to interface with C libraries.

The C library function that I needed to interface with uses the common C practice of taking a “buffer” argument to write its output to. I could not find any example of interfacing with this kind of function, so I was left to decipher section B.3.1 of the Ada Reference Manual, “The Package Interfaces.C.Strings”, which didn’t really explain how to use it.

Although my original intent was to get help making this code compile, I am still posting it because it may help others who are facing a similar problem. I also do have other (less important) questions about the code I wrote.


Problem

For a Linux utility, I need to read the target of symbolic links (symlinks). GNAT.OS_Lib has a function Is_Symbolic_Link that I can use to verify a file is (or is not) a symbolic link, but it doesn’t have a function to read the target of the link (what it links to). I’ve tried to find an existing Ada package that has this function, but I have not found one.

So, I need to create a binding to the POSIX function. The function readlink in unistd.h has the following signature:

ssize_t  readlink(const char*  restrict  path,
                        char*  restrict  buf,
                        size_t           bufsize);

Seems like it would be straightforward to use…

Auto Generating the Thin Binding

I used g++ to generate thin bindings to the unistd.h library:

g++ -c -fdump-ada-spec -C /usr/include/unistd.h

This produced bindings to everything in that file and all its dependencies, creating the following files (in no particular order):

  • unistd_h.ads
  • stddef_h.ads
  • bits_unistd_ext_h.ads
  • bits_types_h.ads
  • bits_getopt_core_h.ads
  • bits_confname_h.ads

Most of these I won’t need, but it doesn’t hurt to leave them.

The thin binding generated for readlink is as follows:

  -- Read the contents of the symbolic link PATH into no more than
  --   LEN bytes of BUF.  The contents are not null-terminated.
  --   Returns the number of characters read, or -1 for errors.   

   function readlink
     (uu_path : Interfaces.C.Strings.chars_ptr;
      uu_buf  : Interfaces.C.Strings.chars_ptr;
      uu_len  : stddef_h.size_t) return ssize_t  -- /usr/include/unistd.h:838
   with Import        => True, 
        Convention    => C, 
        External_Name => "readlink";

Creating a Thick Binding

The trouble with creating a thick binding is the same as the trouble with using the thin binding. So, even if I were to use this thin binding directly, I would have the same problem(s).

Concept of use:

  • A string containing the path/name of the symlink file is passed in.
  • A pre-allocated string buffer is used to pass out the target path/name.
    • This is where I was struggling.
  • An integer is used to pass in the size of the buffer.
  • An integer is returned as the size of the string put in the buffer.

An Attempt at a Thick Binding:

function Read_Link(filename: String) return String is

   path   : Interfaces.C.Strings.chars_ptr := Interfaces.C.Strings.New_String(filename);
   -- path/filename of the symlink to read

   buf    : Interfaces.C.Strings.chars_ptr := Interfaces.C.Strings.New_String("How do I allocate a large size without a very long literal?");
   -- Pre-allocated string (buffer) to receive the target path/filename

   Len    : stddef_h.size_t                := buf'size;
   -- Size (length) of the buffer; Maximum length that can be used for the target path/filename

   Used   : unistd_h.ssize_t;
   -- How many characters of "buf" were used, or an error code if negative

begin

   -- Make the call to the system library:
   Used := unistd_h.readlink(path, buf, Len);

   declare
      Target : String(1..Integer(Used)); -- Return variable
   begin

      -- If "used" is within the range of "buf":
      if Used > 0 and then Used <= buf'size then -- can't use buf'range

         -- Copy just the valid part of "buf" into the return variable
         Target := Interfaces.C.Strings.Value(buf, Interfaces.C.size_t(Used)); 
      else

         -- The value of "Used" is out of bounds. This signals an error code was returned.
         Target := ""; -- Just eat the errors for now; TODO: raise exceptions for the errors.
      end if;

      -- "New_String" allocated memory. We are done with them.
      Interfaces.C.Strings.Free(path);
      Interfaces.C.Strings.Free(buf);

      return Target;
   end;
end Read_Link;

This compiles and seems to run successfully.


Questions:

  1. I think New_String is allocating memory and making a copy of the string. But in this use case, the string, filename, on the stack will live for the life of the call to readlink so I would like to pass that existing string. Can I do that cleanly?
  2. buf is being created using a literal string. I would prefer to just specify the bounds or size. Can I do that? And how?
  3. Bonus question: Is there anything else wrong with this code?

Thank you for taking the time to read this post and consider my questions.

I started a binding project to autogenerate them ages ago, but it stalled.

Don’t use chars_ptr for in parameters; use char_array. Convert the Ada string with To_C, which appends the nul terminator, and pass the result directly.

with Interfaces.C;

procedure Link is
   package C renames Interfaces.C;

   Buffer_Size : constant C.size_t := 100;

   type buffer is new C.char_array (0 .. Buffer_Size);

   function readlink
     (uu_path : C.char_array;
      uu_buf  : out buffer;
      uu_len  : C.size_t) return C.ptrdiff_t
   with Import        => True,
        Convention    => C,
        External_Name => "readlink";

   buf    : buffer;
   -- To_C appends the nul terminator that readlink expects on the path.
   -- The result is signed (ptrdiff_t), so -1 can signal an error.
   result : C.ptrdiff_t := readlink (C.To_C ("somefile"), buf, buf'Length);
begin
   null;
end Link;

New_String will allocate memory and then copy over 1 char at a time. The To_Ada/To_C functions don’t allocate heap memory, but they still do a slow conversion. So, if you are dealing with C strings in performance-critical code, it is best not to do conversions every frame, for example.
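As a minimal sketch of what To_C produces (the procedure name To_C_Demo and the sample string are just for illustration), the following shows that the result is a char_array one element longer than the Ada string, with the nul terminator in the last position, and that nothing has to be freed afterwards:

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Interfaces.C; use Interfaces.C;

procedure To_C_Demo is
   Name   : constant String     := "somefile";
   --  To_C copies the characters and, by default, appends nul.
   --  The result lives in a local object, so there is nothing to Free.
   C_Name : constant char_array := To_C (Name);
begin
   Put_Line ("Ada length:" & Integer'Image (Name'Length));    --  8
   Put_Line ("C length:  " & size_t'Image (C_Name'Length));   --  9
   Put_Line ("Terminated: "
             & Boolean'Image (C_Name (C_Name'Last) = nul));
   Put_Line ("Round trip: " & To_Ada (C_Name));
end To_C_Demo;
```

The round trip through To_Ada strips the terminator again, because Trim_Nul defaults to True.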

Well, chars_ptr is used when you deal with dynamically allocated C “strings” or with ones returned from C. When arrays are managed by you, you should use char_array.
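One practical consequence for the buffer-size argument, sketched below under the assumption of a typical GNAT target: 'Size reports bits, and on a chars_ptr it reports the size of the pointer itself, not of whatever it points to. With a char_array, 'Length gives the element count that a C function expects as its size argument. The names Size_Demo, Buf and Ptr are hypothetical.

```ada
with Ada.Text_IO;          use Ada.Text_IO;
with Interfaces.C;         use Interfaces.C;
with Interfaces.C.Strings;

procedure Size_Demo is
   Buf : constant char_array (0 .. 99) := (others => nul);
   Ptr : constant Interfaces.C.Strings.chars_ptr :=
     Interfaces.C.Strings.Null_Ptr;
begin
   --  Number of elements: this is what belongs in a "bufsize" argument.
   Put_Line ("Buf'Length =" & size_t'Image (Buf'Length));
   --  Size of the array object in bits, not bytes.
   Put_Line ("Buf'Size   =" & Integer'Image (Buf'Size));
   --  Size of the pointer in bits; unrelated to any buffer length.
   Put_Line ("Ptr'Size   =" & Integer'Image (Ptr'Size));
end Size_Demo;
```

So an expression like buf'size on a chars_ptr does not describe the allocated buffer at all, which is one more reason to prefer char_array for buffers that Ada code owns.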

Here is a more or less complete implementation of a thick binding to readlink. I did not test it, but you should get the idea of how such things are done and why no bindings generator can help. The semantics of C functions are fundamentally ill-defined, and do not tell us fairy tales about Python! :unamused:

   function Read_Link (Path : String) return String is
      use Interfaces.C;
      use GNAT.OS_Lib;
      function Internal
               (  Path    : char_array;
                  Buf     : in out char_array;
                  BufSize : size_t
               )  return ptrdiff_t; -- A substitute for ssize_t
      pragma Import (C, Internal, "readlink");
      C_Path : constant char_array := To_C (Path);
      Size   : size_t := 256;
   begin
      loop        -- We do not know how large it might be, there should be
         declare  -- some define somewhere. PATH_MAX? Who knows...
            Buffer : char_array (1..Size);
            Length : ptrdiff_t := Internal (C_Path, Buffer, Size);
         begin
            if Length = -1 then
               raise Data_Error with Errno_Message (Errno);
            elsif Length < ptrdiff_t (Size) then -- Not truncated
               return To_Ada (Buffer);
            end if;
            Size := Size + 256; -- Try a bigger buffer
         end;
      end loop;
   end Read_Link;

Binding generators can help IF they have the meta information for each parameter, but they don’t.

i.e. this pointer parameter, is it:

  1. const?
  2. an in or an out?
  3. an array?
  4. returning a pointer to something (** and ***)?
  5. is it filling an array?
  6. etc.

I think you can instead use the switch -fdump-ada-spec-slim to only generate bindings for the specified file (and not all the others).

I presume that the result of this is a status code, used to report errors such as

  • Path is not a valid path name
  • Path does not exist
  • Path is not a link

You’ll have to decide how you want to deal with those. What you want is

function Link_Target (Link : in String) return String;

which should probably be

function Link_Target (Link : in String) return String is
   function Readlink (Path    : in     Interfaces.C.char_array;
                      Buf     :    out Interfaces.C.char_array;
                      Bufsize : in     Interfaces.C.size_t)
   return Interfaces.C.size_t with
      Import, Convention => C, External_Name => "readlink";

   Bufsize : constant Interfaces.C.size_t := 1000;

   Buf    : Interfaces.C.char_array (0 .. Bufsize);
   Result : Interfaces.C.size_t;
begin -- Link_Target
   Result := Readlink (Interfaces.C.To_C (Link), Buf, Bufsize);

   -- Error processing based on Result

   return Interfaces.C.To_Ada (Buf);
end Link_Target;

ssize_t is a signed type; it is not size_t. The POSIX standard, in its infinite wisdom, leaves it to the implementation what happens when ssize_t is too small for the byte count. Anyway, the result is -1 on error, or else the number of bytes. When the result equals the buffer size, the output may have been truncated, and your implementation is incorrect.

See readlink(2) - Linux manual page

A correct substitute for ssize_t is likely ptrdiff_t. I provided an implementation that handles truncation.
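To illustrate why the signed return type matters, here is a small sketch that reinterprets the error value -1 as a size_t. It assumes ptrdiff_t and size_t have the same width, as on mainstream Linux targets; the names Ssize_Demo and As_Unsigned are made up for the example.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Unchecked_Conversion;
with Interfaces.C; use Interfaces.C;

procedure Ssize_Demo is
   --  Assumption: both types occupy the same number of bits.
   function As_Unsigned is
     new Ada.Unchecked_Conversion (Source => ptrdiff_t, Target => size_t);

   --  What readlink returns on failure.
   Error_Result : constant ptrdiff_t := -1;
begin
   Put_Line ("as ptrdiff_t:" & ptrdiff_t'Image (Error_Result));
   --  Declaring the import as returning size_t would hand back this
   --  bit pattern instead: a huge "length" rather than an error flag.
   Put_Line ("as size_t:   " & size_t'Image (As_Unsigned (Error_Result)));
   Put_Line ("equals size_t'Last: "
             & Boolean'Image (As_Unsigned (Error_Result) = size_t'Last));
end Ssize_Demo;
```

With the import declared as returning ptrdiff_t, the check for -1 is an ordinary signed comparison.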

Ah, I didn’t notice the extra s on the return type; modified mine now.

I misread that. An appropriate convention-C type should be used, then.

I don’t think I’ve ever had to deal with a path that is 200 characters, so I’m not too worried about truncation with a 1001-character buffer.

Thank you everyone for your responses! This has been very helpful.

What I’m getting from you all is that (maybe oversimplified here):

  1. The auto generated bindings are not the best.
  2. chars_ptr is for dynamically allocated C-Strings originating from C code.
  3. char_array is for C-Strings originating from Ada code.

:rofl:
I couldn’t help but make the comparison because this project is taking a Python script I wrote a while back and converting it to Ada. The Python library had this function and Ada didn’t.

I originally threw this script together for fast results. It worked for the expected case, but wasn’t designed to handle anything outside the expected case. A few years later, I try to use it again and the environment changed somewhere, shifting the norm and breaking my script.

Looking at my script, years later, I find the code to be an unintelligible mess, as would be expected for the lack of effort. So, I decided it’s time to do it right. Hence Ada.

I did fix the original Python script, but it’s time to do it right.


Thanks @jere. I wasn’t too concerned about all the files that were generated, I was just trying to be complete in my documentation for future readers. But this will be handy for the next time I use the auto generate bindings feature.


C/C++ VS. Ada Rant

One of the aspects of Ada that I love is how it can create complete abstractions, where the specification is sufficient to correctly use the interface without needing to rely on the documentation or peek behind the curtain at the implementation.

In contrast, C and its related languages are, as @dmitry-kazakov said, ill-defined. The “specification” (header) leaves out necessary information that must then be supplemented with documentation (but often is not).

Some of the things left to be identified and explained in the documentation include:

  • is a pointer for
    • a null-terminated string
    • a non-terminated string
    • an array
    • something else
  • how is the length/size specified for an array or non-terminated string
  • Is a parameter an “in”, “out”, or “in out” parameter

It seems that inevitably something is missing in the documentation and you must peek behind the curtain at the implementation to see how it works in order to know how to interface with it.

Unfortunately, the badness of C bleeds into Ada when interfacing with C. The Interfaces.C package gives the tools for the job, but how to use them relies on knowledge not captured in the interface itself, and the existing documentation is not sufficient.


Improving documentation

At some point in the near-ish future, I would like to add this info to the documentation at (Tutorial | ada-lang.io, an Ada community site), but I don’t understand this topic well enough to do it right by myself.

I could rough out a draft, if others could correct the content and suggest missing parts. I could then handle polishing up the final draft.

If we are going to grow the Ada ecosystem, we need more developers who are familiar with and comfortable working with C interfacing tasks. So, I think it is important to give this topic comprehensive documentation.

Would anyone be willing to work with me on this? :raising_hand_man: :raising_hand_woman:


You can look at SDLAda and its associated libs; as they are mostly written by hand, they should give you an idea of how it should be done.

Yes, the easiest way is to do something like this:

function Read_Link(filename: String) return String is
  MAXIMUM_SIZE : Constant := 2048;
  Use Interfaces.C;
  Pragma Assert( Char'Size = Character'Size, "Characters must be the same size." );
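  -- char_array is indexed by size_t, so the String bounds must be converted.
  Path : char_array(size_t(Filename'First) .. size_t(Filename'Last))
    with Import, Convention => C, Address => Filename'Address;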
  Buffer : char_array(1..MAXIMUM_SIZE);

BUT, be warned: this is likely not nul-terminated, as Ada-strings “know their length”.

See above.

I’d have to get back into one of my binding projects; I haven’t had to really touch C in ages, but it’s a pain because “simple” really means offloading onto the programmer rather than deal with it.

I think the best things to have in your mind are (1) C’s arrays will degenerate into an address at the slightest sneeze, and (2) C-Strings are terrible.

I played with the suggestions here for a bit and learned some interesting things.

For example, To_Ada expects a nul-terminated string, but the C function does not nul-terminate the output buffer (as the code comments state). Memory freshly received from Linux is already zeroed, but Ada expects the program to initialize memory before use; it does not automatically clear it. So, the buffer string happens to be nul-terminated only if both:

  1. the buffer is not full
  2. the target name is longer than or equal to the length of all previous target names returned by the C function.

When I called the Read_Link Ada function on files with long target names and then ones with short target names, the remains of the previous target name(s) would be returned with the new target name:

put_line("""" & Read_Link("/dev/log")    & """"); -- Should be "/run/systemd/journal/dev-log"
put_line("""" & Read_Link("/dev/fd")     & """"); -- Should be "/proc/self/fd"
put_line("""" & Read_Link("./shortlink") & """"); -- Should be "."

Output:

"/run/systemd/journal/dev-log"
"/proc/self/fdjournal/dev-log"
".proc/self/fdjournal/dev-log"

If I explicitly initialize the buffer, then the problem goes away:

Buffer : C.char_array (1..Size) := (others => C.nul);

Output:

"/run/systemd/journal/dev-log"
"/proc/self/fd"
"."

Since the C function returns the size of the string in the buffer, I think I should be able to use that to directly return the slice of the buffer that is valid. (Internally, Bounded_String is doing something similar; allocated space and a size.)

Something like this:

return String( Buffer( C.size_t(Buffer'First) .. C.size_t(Length) ) );

But that doesn’t work.
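The likely reason is that String and char_array are unrelated array types: their component types (Character and char) differ, so a direct array type conversion is not legal. A sketch of an alternative, using a stand-in buffer instead of a real readlink call, is to pass the valid slice to To_Ada with Trim_Nul => False. The names Slice_Demo, Buffer, Length and Target are only for illustration.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Interfaces.C; use Interfaces.C;

procedure Slice_Demo is
   --  Stand-in for a buffer that readlink filled: 13 valid bytes,
   --  then leftovers from an earlier call, and no nul terminator.
   Buffer : constant char_array (1 .. 28) :=
     To_C ("/proc/self/fdjournal/dev-log", Append_Nul => False);

   --  Pretend return value of readlink.
   Length : constant ptrdiff_t := 13;

   --  Convert only the valid slice. Trim_Nul => False tells To_Ada
   --  not to look for a terminator that is not there.
   Target : constant String :=
     To_Ada (Buffer (1 .. size_t (Length)), Trim_Nul => False);
begin
   Put_Line ("""" & Target & """");  --  "/proc/self/fd"
end Slice_Demo;
```

Because the buffer starts at index 1, a Length of 0 gives the null slice 1 .. 0 and an empty result, with no special case needed.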


I like the feedback I’ve been getting to my questions. I hope to play with them some more this weekend. If I have the time, I’ll start compiling a pull request for the ada-lang.io/docs/learn/ repo. If I get that far, I hope I can get good feedback on that too.

Huh, the documentation is silent about not adding NUL.

The standard C (and so Ada) solution is simple:

   function Read_Link (Path : String) return String is
      use Interfaces.C;
      use GNAT.OS_Lib;
      function Internal
               (  Path    : char_array;
                  Buf     : in out char_array;
                  BufSize : size_t
               )  return ptrdiff_t; -- A substitute for ssize_t
      pragma Import (C, Internal, "readlink");
      C_Path : constant char_array := To_C (Path);
      Size   : size_t := 256;
   begin
      loop        -- We do not know how large it might be, there should be
         declare  -- some define somewhere. PATH_MAX? Who knows...
            Buffer : char_array (1..Size);
            Length : ptrdiff_t := Internal (C_Path, Buffer, Size);
         begin
            if Length = -1 then
               raise Data_Error with Errno_Message (Errno);
            elsif Length < ptrdiff_t (Size) then -- Not truncated
               Buffer (size_t (Length + 1)) := Nul; -- Add terminator
               return To_Ada (Buffer);
            end if;
            Size := Size + 256; -- Try a bigger buffer
         end;
      end loop;
   end Read_Link;

This is an unrelated question: how do I implement To_Ada?

That is quite simple. In the code I posted, replace:

Buffer (size_t (Length + 1)) := Nul;
return To_Ada (Buffer);

with

declare
   Result : String (1..Natural (Length));
begin
   for Index in Result'Range loop
      Result (Index) := Character'Val (char'Pos (Buffer (size_t (Index)))); -- Could be To_Ada (Buffer (size_t (Index)))
   end loop;
   return Result;
end;

To complete the picture: I do not recommend this, but since the char_array and String representations are the same, you can simply map a String onto the char_array and be done with it. In the same place:

declare
   Result : String (1..Natural (Length));
   pragma Import (Ada, Result); -- Do not initialize
   for Result'Address use Buffer'Address; -- Map Result onto Buffer
begin
   return Result; -- We are done
end;

Just wanted to clarify that this isn’t necessarily correct. To_Ada expects either a nul-terminated string or an Item that is exactly as long as the data you want, so if you know the size of the data you can use it for strings that aren’t nul-terminated. Usually, when a C function deals with non-terminated strings, it has a length parameter (or returns a length); you can use that to bound the Item you pass to To_Ada. You’ll also want the Trim_Nul parameter to be False when working with strings that aren’t nul-terminated.

REF: Interfacing with C and C++

Good point. One can pass a slice to To_Ada.
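Putting the suggestions from this thread together, a thick binding might look like the sketch below. It is untested beyond the obvious case and makes several assumptions: a Linux target where /proc/self/exe is a symbolic link, ptrdiff_t as a stand-in for ssize_t, Data_Error as the exception (as in the earlier post), and a buffer that doubles on possible truncation, which is my own choice rather than anything the thread settled on. The name Read_Link_Demo is hypothetical.

```ada
with Ada.IO_Exceptions;
with Ada.Text_IO;
with GNAT.OS_Lib;
with Interfaces.C; use Interfaces.C;

procedure Read_Link_Demo is

   function Read_Link (Path : String) return String is
      function Internal
        (Path    : char_array;
         Buf     : out char_array;
         BufSize : size_t) return ptrdiff_t  --  stand-in for ssize_t
        with Import, Convention => C, External_Name => "readlink";

      --  Nul-terminated copy of the path for the C side.
      C_Path : constant char_array := To_C (Path);
      Size   : size_t := 256;
   begin
      loop
         declare
            Buffer : char_array (1 .. Size);
            Length : constant ptrdiff_t := Internal (C_Path, Buffer, Size);
         begin
            if Length < 0 then
               raise Ada.IO_Exceptions.Data_Error
                 with GNAT.OS_Lib.Errno_Message (GNAT.OS_Lib.Errno);
            elsif size_t (Length) < Size then
               --  Not truncated: convert only the bytes readlink wrote.
               return To_Ada (Buffer (1 .. size_t (Length)),
                              Trim_Nul => False);
            end if;
         end;
         Size := Size * 2;  --  possibly truncated, retry with more room
      end loop;
   end Read_Link;

begin
   Ada.Text_IO.Put_Line (Read_Link ("/proc/self/exe"));
end Read_Link_Demo;
```

Since only the slice that readlink filled is converted, the buffer does not need to be initialized and no terminator has to be written into it.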

FYI: I got started writing a section on bindings, but I didn’t get far enough to make a pull request.