I’ve written a lot of Python code that reads and manipulates DataFrames with the pandas library, but I’m not sure what the best way is to work with CSV or table data in Ada. I’ve started by creating an array of records, where each entry in the array contains the table header information (e.g. name, age, height, account number). To clarify, by table header information I mean the row of values that labels the columns of a CSV file.
It’ll depend on exactly what you want to do with it, but in the past I have used the Index function from Ada.Strings.Fixed to parse a line, looking for commas and splitting out the parts in between. I start by declaring a vector to hold the split parts:
with Ada.Containers.Indefinite_Vectors;

package Vectors is new Ada.Containers.Indefinite_Vectors
  (Index_Type   => Positive,
   Element_Type => String);
subtype String_List is Vectors.Vector;
Then I iterate through the line read from the file and split the string based on commas:
function Split (Line : String) return String_List is
   Result : String_List;
   First  : Positive := Line'First;  --  Start of the current field
   Last   : Natural;
begin
   loop
      --  Look for the next comma in the remaining slice.
      --  If a comma is found, its index is stored in Last;
      --  if no comma is found, Last is 0.
      --  First is updated for each iteration.
      Last := Ada.Strings.Fixed.Index (Line (First .. Line'Last), ",");
      exit when Last = 0;  --  Leave when no comma is found
      Result.Append (Line (First .. Last - 1));  --  Append the item
      First := Last + 1;  --  Resume after the comma for the next search
   end loop;
   Result.Append (Line (First .. Line'Last));  --  Append the last item
   return Result;
end Split;
After that you have a list of all the cell strings from the line and can start converting them. Some strings may be empty, so check whether their length is 0 to see if there was anything between the commas. You can iterate through the string list or index it like an array:
Items : constant String_List := Split (Get_Line);
begin
   --  Iterating through them
   for Item of Items loop
      Put_Line (Item);
   end loop;

   --  Or indexing directly
   Put_Line ("2nd item is " & Items (2));
Full example (online compiler): PltCRI - Online Ada Compiler & Debugging Tool - Ideone.com
You can:
- create an enumerated type with the topics of the header row:
  type Topic is (Name, Age, ...);
- parse the header line to map table columns to topics:
  Mapping : array (Topic) of Positive;
- if you need it, store the values in an array:
  type Row_Type is array (Topic) of Real;
- if you need it, store the entire table in a Vector of Row_Type.
This approach is practical if you only need some of the columns and the column order changes sometimes.
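A rough sketch of that mapping step might look like this. The column names, the sample header, and the idea of matching upper-cased cells against Topic'Image are all assumptions, not a definitive implementation:

```ada
with Ada.Containers.Indefinite_Vectors;
with Ada.Characters.Handling;
with Ada.Text_IO;

procedure Header_Demo is
   package Vectors is new Ada.Containers.Indefinite_Vectors
     (Index_Type => Positive, Element_Type => String);

   type Topic is (Name, Age, Height);   --  Hypothetical header topics
   Mapping : array (Topic) of Positive;

   Header : Vectors.Vector;
begin
   --  Pretend the header line was already split into cells
   --  (the columns arrive in an arbitrary order).
   Header.Append ("age");
   Header.Append ("height");
   Header.Append ("name");

   --  Map each topic to the column where its name appears,
   --  comparing case-insensitively via Topic'Image (upper case).
   for T in Topic loop
      for Column in Header.First_Index .. Header.Last_Index loop
         if Ada.Characters.Handling.To_Upper (Header (Column))
           = Topic'Image (T)
         then
            Mapping (T) := Column;
         end if;
      end loop;
   end loop;

   --  A data line split the same way can now be read as
   --  Cells (Mapping (Name)), Cells (Mapping (Age)), ...
   Ada.Text_IO.Put_Line ("name is column" & Positive'Image (Mapping (Name)));
end Header_Demo;
```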
I use a CSV package (copy here) that allows for reading the items in a random order.
I typically process CSV files by reading them line by line and parsing each line with PragmARC.Line_Fields. This also works for other common separators, such as semicolons and spaces.
You do not need any data structures to work with CSV; that's the advantage of the format.
The strings editing library (part of Simple Components) was designed for exactly that. You just read a line and then get the columns one by one, using the appropriate target data type, advancing the line index and skipping blanks and separators.
As an example, you can take a look at how UnicodeData.txt is handled (it is CSV with a semicolon as the separator). The file is read in Strings_Edit.UTF8.Categorization_Generator to generate a Unicode categorization map.
Yes, that is the way.
This is incorrect. It will not work on values like "Smith, John". To handle a CSV file properly, you need to parse it.
You have to be careful; this doesn’t work with text fields like:

"Oh say can you see,
by the dawn's early light,"
Not really. You are right, though, that Ada.Strings.Fixed.Index should never be used as a tokenizer.
Why? Space is a separator, comma is a blank.
The algorithm is:
loop
   read line
   for I in 1 .. N loop
      skip blanks
      get field I
      skip blanks
      if I /= N then
         get separator
      end if;
   end loop;
   skip blanks
   check line end
end loop;
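That loop might be sketched in Ada roughly like this. The sample line, the column count N, and the comma separator are all assumptions, and quoting is deliberately not handled here:

```ada
with Ada.Text_IO;

procedure Scan_Line is
   Line    : constant String := " 1.5 , 3.1415 , 42 ";  --  Assumed input
   N       : constant := 3;                             --  Assumed column count
   Pointer : Positive := Line'First;

   --  Advance Pointer past any spaces.
   procedure Skip_Blanks is
   begin
      while Pointer <= Line'Last and then Line (Pointer) = ' ' loop
         Pointer := Pointer + 1;
      end loop;
   end Skip_Blanks;
begin
   for I in 1 .. N loop
      Skip_Blanks;
      declare
         Start : constant Positive := Pointer;
      begin
         --  Get field I: collect up to the next blank or separator.
         while Pointer <= Line'Last
           and then Line (Pointer) /= ','
           and then Line (Pointer) /= ' '
         loop
            Pointer := Pointer + 1;
         end loop;
         Ada.Text_IO.Put_Line ("Field: " & Line (Start .. Pointer - 1));
      end;
      Skip_Blanks;
      if I /= N then
         --  Get separator: a comma must follow every field but the last.
         if Pointer > Line'Last or else Line (Pointer) /= ',' then
            raise Constraint_Error with "separator expected";
         end if;
         Pointer := Pointer + 1;
      end if;
   end loop;
   Skip_Blanks;
   --  Check line end: nothing but blanks may remain.
   if Pointer <= Line'Last then
      raise Constraint_Error with "unexpected trailing text";
   end if;
end Scan_Line;
```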
Nah, that’s pretty easy to work around. If you are parsing known data, you just merge the strings back together ("Smith and John" with a comma between them, for example). If you want a more general solution, you do a second pass merging all cells between the one starting with an unescaped quote and the one ending with an unescaped quote. I’ve never had any trouble with it.
This is awful, and a good example of why not to split into fields first.
"Smith\,\ John"
1,5, 3,1415 -- means 1.5 and 3.1415 in Europe
You should get a quoted string just like you would get a number. The syntax diagrams Wirth used in the Pascal User Manual are the way to describe how to parse such things: all in a single pass, no backtracking.
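A minimal single-pass sketch of getting a possibly quoted field, assuming the RFC 4180 convention that a doubled quote inside a quoted field stands for one literal quote (the sample line is an assumption):

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Get_Quoted is
   --  The line 'Smith, John',42 with CSV-style double quotes
   Line    : constant String := """Smith, John"",42";
   Pointer : Positive := Line'First;

   --  Read one field, quoted or bare, and consume the separator.
   function Get_Field return String is
      Result : Unbounded_String;
   begin
      if Pointer <= Line'Last and then Line (Pointer) = '"' then
         Pointer := Pointer + 1;  --  Opening quote
         loop
            exit when Pointer > Line'Last;
            if Line (Pointer) = '"' then
               Pointer := Pointer + 1;
               --  A lone quote closes the field; "" means one literal "
               exit when Pointer > Line'Last or else Line (Pointer) /= '"';
            end if;
            Append (Result, Line (Pointer));
            Pointer := Pointer + 1;
         end loop;
      else
         while Pointer <= Line'Last and then Line (Pointer) /= ',' loop
            Append (Result, Line (Pointer));
            Pointer := Pointer + 1;
         end loop;
      end if;
      if Pointer <= Line'Last and then Line (Pointer) = ',' then
         Pointer := Pointer + 1;  --  Consume the separator
      end if;
      return To_String (Result);
   end Get_Field;
begin
   Ada.Text_IO.Put_Line (Get_Field);  --  Smith, John
   Ada.Text_IO.Put_Line (Get_Field);  --  42
end Get_Quoted;
```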
PragmARC.Line_Fields
handles quoted fields properly.
For your second example, I presume you are referring to fields with embedded line terminators. This is true, but since in 50 years I have never encountered such fields, I don’t consider it a problem.

For your second example, I presume you are referring to fields with embedded line terminators. This is true, but since in 50 years I have never encountered such fields, I don’t consider it a problem.
You are correct about embedded line-terminators.
I have encountered them in CSV, and even generated them. The first programming project I was put on after graduating & getting a job was a program that processed medical/insurance records… using PHP.
This particular problem came up when I had to implement a CSV import/export function; I’d used the built-in parse function, but it wouldn’t work on the production machine. I went through everything I could think of, and found nothing. So I wrote a CSV parser, tested and debugged it, and deployed it in place of that PHP function. (It turned out the version of PHP on the other machine was different, and the CSV-parsing function had been added between those minor versions.)
PragmARC.Line_Fields doesn’t seem to handle UTF-8 encoding. When I tried to change the arguments from “String” to “Wide_String” I got errors.
Will Ada.Strings.Fixed work with wide strings? I need to parse strings containing UTF-8 characters.
For Wide_String you want to use the wide version: Ada.Strings.Wide_Fixed.
I don’t know much about how it interacts with UTF-8, though. I assume it doesn’t handle it out of the box.
I figured it out. I can share my solution if you think that would be helpful. It was kind of a pain, but I learned a lot about Ada in the process.
I figured this out. Is there a way I can push my corrections for PragmARC.Line_Fields to github?
You can try raising a PR at Jeffrey’s GitHub: GitHub - jrcarter/PragmARC: The PragmAda Reusable Components
If you’re trying to parse UTF-8 encoded data, then you’re on your own. Encoded data should be decoded before processing. Wide_String is not a good choice for representing UTF-8 encoded data.
Modifying Line_Fields to work with (unencoded) [Wide_]Wide_String should be trivial. It could even be generic.

If you’re trying to parse UTF-8 encoded data, then you’re on your own. Encoded data should be decoded before processing.
UTF-8 was specifically designed so that parsing algorithms written for ASCII delimiters work unchanged on UTF-8 encoded text.
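For example, scanning a String for the ASCII comma is safe on UTF-8 input, because every byte of a multi-byte UTF-8 sequence has its high bit set and so can never equal an ASCII separator. A small sketch (the sample data is an assumption):

```ada
--  No byte of a multi-byte UTF-8 sequence can collide with the
--  ASCII comma, so a plain byte-wise search never splits a character.
with Ada.Text_IO;
with Ada.Strings.Fixed;

procedure Utf8_Split is
   --  "Müller,42" as a UTF-8 encoded String (ü is C3 BC)
   Line  : constant String := "M" & Character'Val (16#C3#)
                                  & Character'Val (16#BC#) & "ller,42";
   Comma : constant Natural := Ada.Strings.Fixed.Index (Line, ",");
begin
   Ada.Text_IO.Put_Line (Line (Line'First .. Comma - 1));  --  Müller, still valid UTF-8
   Ada.Text_IO.Put_Line (Line (Comma + 1 .. Line'Last));   --  42
end Utf8_Split;
```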
One could remove Wide_String and Wide_Wide_String from the language and notice no difference.