VSS.JSON vs GNATCOLL.JSON

Hello, I’ve got a REST API client set up for a project that currently pulls data from the API via AWS as a Stream_Element_Array, parses the JSON via VSS.JSON’s pull reader, and then serialises it in a record for fast access and caches the data to disk.

I’m wondering if anyone who has used both VSS’s and GNATCOLL’s JSON packages has an opinion about which of the two is better. I end up working with up to 400MB of JSON at a time (~200MB when condensed), and as it stands VSS takes about 45 seconds to parse the full JSON. Loading the cached data is quite quick and not an issue.

Looking through the spec I’m just a bit worried whether or not GNATCOLL can handle reading hundreds of megabytes from an unbounded string, and what the consequences of trying to do that would be for my memory usage.

This isn’t on an embedded system or anything, so ~1-2GB of RAM used would be okay, but it is part of a larger application that uses GtkAda and draws a lot of images to the screen, so minimising RAM usage and maximising performance would be great.

I’m also hopeful that this project, when released, will raise awareness of how nice GtkAda is compared to a number of other GUI libraries :slight_smile:

Thanks,
“tiberius”

I wrote a benchmark for GNATCOLL that you can try. It uses a ~450MB JSON file containing a JSON array, loads it all in, picks a random spot near the end of the array, and iterates over that array.

Unfortunately, I don’t actually know how to use VSS and didn’t find much documentation beyond the API reference, but you probably won’t have trouble adding VSS to this. I keep the content in an Unbounded_String because a plain String caused stack overflows, but you can probably get away with adding a Content : constant String := To_String(Contents); or do something with streams. (Unless VSS can parse JSON from a file, in which case even better!)
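As a rough sketch of the Unbounded_String route (this is not the benchmark's actual code; the procedure name Read_Large_JSON and the helper Load are made up for illustration), the file can be read in small chunks so that only a 64 KiB buffer sits on the stack, and the result handed to GNATCOLL.JSON.Read. It assumes the top-level JSON value is an array, as in the benchmark file. Keep in mind that the raw text and the parsed tree are both in memory at the same time here.

```ada
with Ada.Command_Line;
with Ada.Streams.Stream_IO;
with Ada.Strings.Unbounded;
with Ada.Text_IO;
with GNATCOLL.JSON;

procedure Read_Large_JSON is
   use Ada.Streams;
   use Ada.Strings.Unbounded;

   --  Hypothetical helper: read the file in fixed-size chunks so that only
   --  a small buffer lives on the stack; the accumulated text is kept in
   --  a heap-allocated Unbounded_String.
   function Load (Path : String) return Unbounded_String is
      File   : Stream_IO.File_Type;
      Buffer : Stream_Element_Array (1 .. 65_536);
      Last   : Stream_Element_Offset;
      Result : Unbounded_String;
   begin
      Stream_IO.Open (File, Stream_IO.In_File, Path);
      while not Stream_IO.End_Of_File (File) loop
         Stream_IO.Read (File, Buffer, Last);
         declare
            Chunk : String (1 .. Natural (Last));
         begin
            --  Copy the raw bytes into a String, one character per byte.
            for I in 1 .. Last loop
               Chunk (Natural (I)) := Character'Val (Buffer (I));
            end loop;
            Append (Result, Chunk);
         end;
      end loop;
      Stream_IO.Close (File);
      return Result;
   end Load;

   Contents : constant Unbounded_String :=
     Load (Ada.Command_Line.Argument (1));

   --  Builds the whole document tree in memory.
   Root  : constant GNATCOLL.JSON.JSON_Value :=
     GNATCOLL.JSON.Read (Contents);

   --  Assumes the top-level value is a JSON array.
   Items : constant GNATCOLL.JSON.JSON_Array := GNATCOLL.JSON.Get (Root);
begin
   Ada.Text_IO.Put_Line
     ("Elements:" & Natural'Image (GNATCOLL.JSON.Length (Items)));
end Read_Large_JSON;
```

Run it with the path to the JSON file as the only argument.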

There’s also Json-Ada, which I’ve used before. It’s a pretty light library, and I included it in the benchmark alongside GNATCOLL.

The easiest way to run it is probably to clone the GitHub repo, run alr update && alr build, unzip large-file.zip to large-file.json, and then run the program with the JSON file as an argument.

Here’s what happened when I ran it on my MacBook Air with defaults:

aj@macbook json_benchmark % bin/json_benchmark large-file.json
Testing json-ada's JSON
json-ada read is  3.295961000 seconds
json-ada iterate is  0.009062000 seconds
Testing GNATCOLL's JSON
GNATCOLL.JSON read is  28.289018000 seconds
GNATCOLL.JSON iterate is  0.014264000 seconds

And when compiled with -O0:

aj@macbook json_benchmark % bin/json_benchmark large-file.json
Testing json-ada's JSON
json-ada read is  7.085358000 seconds
json-ada iterate is  0.018148000 seconds
Testing GNATCOLL's JSON
GNATCOLL.JSON read is  22.921301000 seconds
GNATCOLL.JSON iterate is  0.010944000 seconds

If you want to compile it with -O0, you’ll want to go into alire.toml and uncomment the build switches.

Good luck!

The main difference between JSON support in GNATCOLL and in VSS is the level of abstraction. GNATCOLL processes JSON as a document, while VSS processes it as a stream of events, so it is quite hard to compare them.

GNATCOLL constructs the JSON document in memory, which takes time and space, and it is not optimized for large documents. In return, it allows access to an arbitrary element of the document.

VSS doesn’t construct a document in memory. It only provides access to the current event: a value, an object’s key, or the start/end of an object or array. The application has to track these events itself to extract the information it needs.
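To make the "track events" idea concrete, here is a minimal sketch in plain Ada. The Event_Kind type and the hard-coded event list are hypothetical stand-ins for what a pull reader would report; they are not the VSS types or literal names. The code counts the objects directly inside the outermost array by tracking nesting depth, without ever building a document.

```ada
with Ada.Text_IO;

procedure Count_Top_Level_Items is
   --  Hypothetical event kinds, loosely modelled on what a pull reader
   --  reports; these are NOT the VSS type or literal names.
   type Event_Kind is
     (Start_Array, End_Array, Start_Object, End_Object, Key_Name, Value);

   type Event_List is array (Positive range <>) of Event_Kind;

   --  Events a reader would produce for:
   --    [{"id": 1}, {"id": 2, "tags": [3, 4]}]
   Events : constant Event_List :=
     (Start_Array,
      Start_Object, Key_Name, Value, End_Object,
      Start_Object, Key_Name, Value, Key_Name,
      Start_Array, Value, Value, End_Array,
      End_Object,
      End_Array);

   Depth : Natural := 0;  --  current nesting level
   Items : Natural := 0;  --  objects directly inside the outer array
begin
   for E of Events loop
      case E is
         when Start_Array | Start_Object =>
            --  An object opened at depth 1 is an element of the outer array.
            if E = Start_Object and then Depth = 1 then
               Items := Items + 1;
            end if;
            Depth := Depth + 1;
         when End_Array | End_Object =>
            Depth := Depth - 1;
         when Key_Name | Value =>
            null;  --  a real application would pick out fields here
      end case;
   end loop;

   Ada.Text_IO.Put_Line ("Top-level objects:" & Natural'Image (Items));
end Count_Top_Level_Items;
```

With a real pull reader the loop would instead advance the reader and inspect its current event, but the bookkeeping (depth, current key, and so on) stays on the application side in the same way.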

In your case I suggest taking a look at VSS first. GNATCOLL is known to be slow on huge documents, and that was one of the reasons for developing a JSON parser for VSS.

In my opinion, you shouldn’t parse/“serialize” 400MB of JSON in one call in a GtkAda application, because it will freeze the GUI. It would be better to refactor the code to do this incrementally. AFAIK, this is doable with VSS, but not with GNATCOLL.

It’s running in a separate task, so the actual GUI responsiveness is not impacted :slight_smile: I rewrote part of my VSS code using GNATCOLL and the results were more or less as expected: much lower latency when requesting individual objects, but a high initial startup cost due to constructing a tree for the whole JSON document.

Given this information, I think I’ll use GNATCOLL for small JSON documents and for things with no latency requirements, but prefer VSS for reading the largest files, since I have more control over the performance and there is no initial latency.