VSS.JSON vs GNATCOLL.JSON

tiberius · April 22, 2023, 8:15pm

Hello, I’ve got a REST API client set up for a project that currently pulls data from the API via AWS as a Stream_Element_Array, parses the JSON via VSS.JSON’s pull reader, and then serialises it in a record for fast access and caches the data to disk.

I’m wondering if anyone who has used both VSS’s and GNATCOLL’s JSON packages has an opinion about which of the two is better. I end up working with up to 400 MB of JSON at a time (~200 MB when condensed), and as it stands VSS takes about 45 seconds to parse the full JSON. Loading the cached data is quite quick and not an issue.

Looking through the spec, I’m just a bit worried about whether GNATCOLL can handle reading hundreds of megabytes from an unbounded string, and what the consequences of trying to do that would be for my memory usage.

This isn’t on an embedded system or anything, so ~1-2 GB of RAM used would be okay, but it is part of a larger application that uses GtkAda and draws a lot of images to the screen, so minimising RAM usage and maximising performance would be great.

I’m also hopeful that this project, when released, will raise awareness of how nice GtkAda is compared to a number of other GUI libraries.

Thanks,
“tiberius”

AJ-Ianozi · April 23, 2023, 2:26am

I wrote a benchmark for GNATCOLL that you can try. It uses a ~450 MB JSON file containing a JSON array, loads it all in, picks a random spot near the end of the array, and iterates over that array.

Unfortunately, I don’t actually know how to use VSS, and didn’t find much documentation beyond the API references, but you probably won’t have an issue adding VSS to this. I have the content saved as an Unbounded_String due to stack overflows, but you can probably get away with adding a Content : constant String := To_String (Contents); or doing something with streams (unless VSS can parse JSON from a file, in which case even better!).

There’s also json-ada, which I’ve used before; it’s a pretty light library, and I included it in the benchmark against GNATCOLL.

The easiest way to run it is probably to clone the GitHub repo, run alr update && alr build, unzip large-file.zip to large-file.json, and then run the program with the JSON file as an argument.

Here’s what happened when I ran it on my MacBook Air with defaults:

aj@macbook json_benchmark % bin/json_benchmark large-file.json
Testing json-ada's JSON
json-ada read is  3.295961000 seconds
json-ada iterate is  0.009062000 seconds
Testing GNATCOLL's JSON
GNATCOLL.JSON read is  28.289018000 seconds
GNATCOLL.JSON iterate is  0.014264000 seconds

And when compiled with -O0:

aj@macbook json_benchmark % bin/json_benchmark large-file.json
Testing json-ada's JSON
json-ada read is  7.085358000 seconds
json-ada iterate is  0.018148000 seconds
Testing GNATCOLL's JSON
GNATCOLL.JSON read is  22.921301000 seconds
GNATCOLL.JSON iterate is  0.010944000 seconds

If you want to compile it with -O0, you’ll want to go into alire.toml and uncomment the build switches.
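For readers who want to collect comparable numbers in their own code, here is a minimal sketch of how a timing line in this format might be produced. It is not taken from the benchmark itself: the procedure name and the summing loop are hypothetical stand-ins for a real parse step. The leading space that Duration'Image emits for non-negative values would account for the double space seen in the output above.

```ada
with Ada.Real_Time; use Ada.Real_Time;
with Ada.Text_IO;   use Ada.Text_IO;

procedure Timing_Sketch is
   Start   : Time;
   Elapsed : Duration;
   Total   : Long_Long_Integer := 0;
begin
   Start := Clock;

   --  Hypothetical stand-in workload (an assumption for illustration);
   --  a real benchmark would parse the JSON document here.
   for I in 1 .. 1_000_000 loop
      Total := Total + Long_Long_Integer (I);
   end loop;

   --  Ada.Real_Time.Clock is monotonic, so the difference is a safe
   --  measure of elapsed time even if the wall clock is adjusted.
   Elapsed := To_Duration (Clock - Start);

   Put_Line ("work is " & Duration'Image (Elapsed) & " seconds");

   --  Printing the result keeps the loop from being optimized away.
   Put_Line ("checksum:" & Long_Long_Integer'Image (Total));
end Timing_Sketch;
```

Measuring each phase (read, seek, iterate) separately in this way makes it easier to see which phase a library change actually affects.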

Good luck!

godunko · April 23, 2023, 3:16pm

The main difference between JSON support in GNATCOLL and in VSS is the level of abstraction. GNATCOLL processes JSON as a document, while VSS processes it as a stream of events, so it is quite hard to compare them.

GNATCOLL constructs the JSON document in memory, which takes time and space, and it is not optimized for large documents. In return, it allows access to arbitrary elements of the document.

VSS doesn’t construct a document in memory; it only provides access to the current event: a value, an object’s key, or the start/end of an object/array. The application has to track events to extract the information it needs.

In your case I suggest taking a look at VSS first. GNATCOLL is known to be slow on huge documents, and that was one of the reasons for developing the JSON parser in VSS.
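To make "tracking events" concrete, here is a small, library-free sketch. The Event_Kind and Event types and the hard-coded event list are hypothetical and are NOT the VSS API; they only mimic the sort of events a pull reader might report for the document [{"id":"a1"},{"id":"b2"}]. The application keeps a little state (nesting depth and the last key seen) to pick out the values it cares about.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Event_Tracking_Sketch is

   --  Hypothetical event model for illustration; these are NOT VSS types.
   type Event_Kind is
     (Start_Object, End_Object, Start_Array, End_Array,
      Key_Name, String_Value);

   type Event is record
      Kind : Event_Kind;
      Text : String (1 .. 2) := "  ";  --  payload for keys and strings
   end record;

   --  Events a pull parser might report for [{"id":"a1"},{"id":"b2"}]
   Events : constant array (Positive range <>) of Event :=
     ((Start_Array, "  "),
      (Start_Object, "  "), (Key_Name, "id"), (String_Value, "a1"),
      (End_Object, "  "),
      (Start_Object, "  "), (Key_Name, "id"), (String_Value, "b2"),
      (End_Object, "  "),
      (End_Array, "  "));

   Depth   : Natural := 0;      --  current object nesting depth
   Want_Id : Boolean := False;  --  True right after the key "id"

begin
   for E of Events loop
      case E.Kind is
         when Start_Object =>
            Depth := Depth + 1;
         when End_Object =>
            Depth := Depth - 1;
         when Key_Name =>
            --  Only react to "id" keys of the top-level array's objects.
            Want_Id := Depth = 1 and then E.Text = "id";
         when String_Value =>
            if Want_Id then
               Put_Line ("id = " & E.Text);
            end if;
            Want_Id := False;
         when Start_Array | End_Array =>
            Want_Id := False;
      end case;
   end loop;
end Event_Tracking_Sketch;
```

Nothing is stored beyond the two state variables, which is why this style uses little memory on huge inputs, but it also means that reaching an element deep in the document requires walking every event before it.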

Max · April 23, 2023, 3:27pm

In my opinion, you shouldn’t parse/“serialize” 400 MB of JSON in one call in a GtkAda application, because it will freeze the GUI. It would be better to refactor the code to do this incrementally. AFAIK, this is doable with VSS, but not with GNATCOLL.

tiberius · April 23, 2023, 9:53pm

It’s running in a separate task, so the actual GUI responsiveness is not impacted. I rewrote part of my VSS code using GNATCOLL and the results were sort of as expected - much lower latency for requesting individual objects, but a high initial startup cost due to constructing a tree for the whole JSON document.

I think given this information I’ll use GNATCOLL for small JSON documents and things with no latency requirements, but prefer VSS for reading in the largest files since I have more control over the performance and there is no initial latency.

AJ-Ianozi · June 8, 2025, 7:43am

Remember this?

I finally added VSS to the mix, and I’m kind of confused by the results:

Testing json-ada's JSON
json-ada read is  2.274622000 seconds
json-ada seek is  0.001402000 seconds
json-ada iter is  0.008116000 seconds
Testing GNATCOLL's JSON
GNATCOLL.JSON read is  5.622553000 seconds
GNATCOLL.JSON seek is  0.005264000 seconds
GNATCOLL.JSON iter is  0.004949000 seconds
Testing VSS's JSON
VSS.JSON read is  11.717198000 seconds
VSS.JSON seek is  31.041688000 seconds
VSS.JSON iter is  2.125321000 seconds

I may have just done the parsing in a terrible way, but with GNATCOLL and json-ada I can simply do something like:

declare
   Random_Node : constant JSON_Array := Result.Get ("sixteen");
begin
   for X of Random_Node loop
      declare
         Random_String : constant String       := X.Get ("id");
         Random_Int    : constant Long_Integer := X.Get ("actor").Get ("id");
         Another_Int   : Long_Integer;
      begin
         Another_Int := Random_Int + 1;
      end;
   end loop;
end;

Is there any way to do that efficiently in VSS? I was following VSS/examples/blogs/json_1/input.adb (AdaCore/VSS on GitHub, at commit c370d7b1155a9d8d9c30e6d960acc0612ce7bb60) as an example. Is there any good documentation on parsing JSON in VSS?

godunko · June 25, 2025, 10:12am

On my laptop, initial results:

Testing json-ada's JSON
json-ada read is  2.660047000 seconds
json-ada seek is  0.000004000 seconds
json-ada iter is  0.005390000 seconds
Testing GNATCOLL's JSON
GNATCOLL.JSON read is  7.613259000 seconds
GNATCOLL.JSON seek is  0.000268000 seconds
GNATCOLL.JSON iter is  0.004833000 seconds
Testing VSS's JSON
VSS.JSON read is  17.445335000 seconds
VSS.JSON seek is  43.277821000 seconds
VSS.JSON iter is  2.935902000 seconds

After changing to a release build of VSS:

Testing json-ada's JSON
json-ada read is  2.748486000 seconds
json-ada seek is  0.000004000 seconds
json-ada iter is  0.005203000 seconds
Testing GNATCOLL's JSON
GNATCOLL.JSON read is  7.837111000 seconds
GNATCOLL.JSON seek is  0.000256000 seconds
GNATCOLL.JSON iter is  0.004981000 seconds
Testing VSS's JSON
VSS.JSON read is  6.547536000 seconds
VSS.JSON seek is  13.593096000 seconds
VSS.JSON iter is  0.943084000 seconds

After slightly changing how the data stream is prepared for VSS:

Testing json-ada's JSON
json-ada read is  2.615639000 seconds
json-ada seek is  0.000003000 seconds
json-ada iter is  0.005351000 seconds
Testing GNATCOLL's JSON
GNATCOLL.JSON read is  7.514052000 seconds
GNATCOLL.JSON seek is  0.000272000 seconds
GNATCOLL.JSON iter is  0.004837000 seconds
Testing VSS's JSON
VSS.JSON read is  0.164207000 seconds
VSS.JSON seek is  13.320057000 seconds
VSS.JSON iter is  0.921637000 seconds

And on the current version of VSS:

Testing json-ada's JSON
json-ada read is  2.673578000 seconds
json-ada seek is  0.000003000 seconds
json-ada iter is  0.005180000 seconds
Testing GNATCOLL's JSON
GNATCOLL.JSON read is  7.274794000 seconds
GNATCOLL.JSON seek is  0.000257000 seconds
GNATCOLL.JSON iter is  0.004798000 seconds
Testing VSS's JSON
VSS.JSON read is  0.158989000 seconds
VSS.JSON seek is  9.643288000 seconds
VSS.JSON iter is  0.675951000 seconds

AJ-Ianozi · June 25, 2025, 12:40pm

Thanks for the PR!

Replacing

Items : constant Stream_Element_Vector := Encoder.Encode (JSON_VSS);

with

Overlay : Ada.Streams.Stream_Element_Array (1 .. JSON_String'Length)
   with Import, Address => JSON_String'Address;
Items  : constant Stream_Element_Vector :=
   VSS.Stream_Element_Vectors.Conversions.To_Stream_Element_Vector
      (Overlay);

did indeed speed up the load significantly.
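For anyone unfamiliar with the overlay idiom used in that replacement, here is a minimal standalone sketch using only the standard library. The short JSON_String literal is a hypothetical stand-in for the real file contents, and the sketch assumes that Character and Stream_Element are both 8 bits wide (as they are with GNAT) and that the string already holds UTF-8 bytes, since no transcoding takes place.

```ada
with Ada.Streams; use Ada.Streams;
with Ada.Text_IO; use Ada.Text_IO;

procedure Overlay_Sketch is
   --  Hypothetical stand-in for the large JSON text read from disk.
   JSON_String : aliased String := "{""id"":1}";

   --  View the same memory as stream elements instead of converting it.
   --  Import suppresses default initialization, and Address places the
   --  array on top of the string's storage.
   --  Assumption: Character'Size = Stream_Element'Size = 8.
   Overlay : Stream_Element_Array (1 .. JSON_String'Length)
     with Import, Address => JSON_String'Address;
begin
   Put_Line ("Bytes:" & Stream_Element_Offset'Image (Overlay'Length));
   Put_Line ("First:" & Stream_Element'Image (Overlay (Overlay'First)));
end Overlay_Sketch;
```

The overlay itself costs nothing at run time because it only reinterprets existing bytes, which is consistent with the drop in VSS read time reported above. Whether the subsequent conversion to a Stream_Element_Vector copies the data depends on the VSS implementation.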

This is what I’m getting now:

Testing json-ada's JSON
json-ada read is  1.868145000 seconds
json-ada seek is  0.000051000 seconds
json-ada iter is  0.007163000 seconds
Testing GNATCOLL's JSON
GNATCOLL.JSON read is  5.373627000 seconds
GNATCOLL.JSON seek is  0.000313000 seconds
GNATCOLL.JSON iter is  0.008158000 seconds
Testing VSS's JSON
VSS.JSON read is  0.174583000 seconds
VSS.JSON seek is  31.209026000 seconds
VSS.JSON iter is  3.106500000 seconds

I think the only issue I’m fighting with now is dealing with that long seek time, but again, it could be how I implemented it?

godunko · June 25, 2025, 1:07pm

Can you please check the content of VSS’s config/vss_config.gpr file? Is it replaced by Alire on your machine?

jere · June 25, 2025, 2:26pm

I may misunderstand your question, but Alire regenerates the config gpr file whenever you change between build types (release, development, etc.)

godunko · June 25, 2025, 3:12pm

It should, but on my setup it doesn’t, for some unknown reason, so VSS is built in development mode with all preconditions/postconditions/assertions enabled and optimization disabled. Rebuilding VSS in release mode makes it about two times faster.

jere · June 25, 2025, 6:45pm

Can you clarify a bit? Two of your statements are in conflict. The first:

Sounds like both development and release build the same, but then the next statement

Then sounds like they do not build the same, which is the expected behaviour for Alire, since it regenerates the config gpr for release mode differently than it does for development mode.

So I think I may be misunderstanding you. Sorry for the confusion.

godunko · June 25, 2025, 7:11pm

VSS provides the config/vss_config.gpr file so that it can be built without Alire. The build mode is controlled by the VSS_BUILD_PROFILE scenario variable, which should be set to release.

Alire should overwrite this file, setting the switches needed for whichever build mode was requested.

On my setup, Alire doesn’t overwrite this file, so when the scenario variable is not set explicitly it falls back to development mode.

jere · June 26, 2025, 2:51pm

So when you do alr build --release it builds it in development mode instead? That’s really weird. You may need to open a GitHub issue with them. Maybe it’s a platform-specific bug.