IEEE 32bit float representation

evanescente-ondine · December 19, 2024, 12:30am

Hi,
I’m trying representation clauses for the first time. The exercise asks to convert a Float into a record.
But… I see that GNAT’s Float uses the IEEE 32- or 64-bit representation.
The little code below fails:

with System, ada.text_io, Ada.Unchecked_Conversion;
use ada.text_io;
procedure Main is
	type Byte is mod 2**8 with Size => 8 * System.Storage_Unit;
	type FractionType is mod 2**23 with Size => 23 * System.Storage_Unit;
	type Floatrecord is record
		Sign: Boolean;
		Exponent: Byte;
		Fraction: FractionType;
	end record with Pack, Storage_Size => 32*System.Storage_Unit;
	-- for Floatrecord use record
	-- 	Sign at 0 range 31..31;
	-- 	Exponent at 0 range 30..23;
	-- 	Fraction at range 22..0;
	-- end record;
	function Convert is new Ada.Unchecked_Conversion (Float, Floatrecord);
	function Convert is new Ada.Unchecked_Conversion (Floatrecord, Float);
	B: Float := 2.9;
	A: Floatrecord := Convert (B);
begin
	null;
end main;

I can’t impose FractionType’s size, nor Floatrecord’s Storage_Size, and when I do without either, I am warned that 64 bits of FractionType are unused…
I then looked into record representation clauses, but I’m in a pickle: as I see here, the IEEE format does not put the Sign bit next to the Fraction part, so Exponent would straddle two bytes/words… What am I missing here?

jere · December 19, 2024, 3:14am

Size is specified in bits, so the sizes should just be 8 and 23, not multiplied by System.Storage_Unit.

Here’s a working example:

with Ada.Text_IO; use Ada.Text_IO;
with Ada.Unchecked_Conversion;

procedure jdoodle is

    type Sign_Bit is (Positive, Negative) with Size => 1;
    type Exponent_Bits is mod 2**8  with Size => 8;
    type Fraction_Bits is mod 2**23 with Size => 23;

    type Float_Record is record
        Sign     : Sign_Bit;
        Exponent : Exponent_Bits;
        Fraction : Fraction_Bits;
    end record with Size => 32;
    
    for Float_Record use record
        Fraction at 0 range  0 .. 22;
        Exponent at 0 range 23 .. 30;
        Sign     at 0 range 31 .. 31;
    end record;
    
    function To_Record is new Ada.Unchecked_Conversion(Float, Float_Record);
    
    v : Float := 0.15625;
    r : Float_Record := To_Record(v);
    
    package Fraction_IO is new Ada.Text_IO.Modular_IO(Fraction_Bits);
        
begin
    Put_Line("Sign is " & r.Sign'Image);
    Put_Line("Exponent is" & r.Exponent'Image);
    Put("Fraction is "); 
    Fraction_IO.Put
        (Item  => r.Fraction, 
         Width => 26, 
         Base  => 2); New_Line;
end jdoodle;

Output:

Sign is POSITIVE
Exponent is 124
Fraction is  2#1000000000000000000000#

Matches the output from this example: Single-precision floating-point format - Wikipedia

Hopefully that clears some things up?

That said, I don’t know the RM well enough to say if it guarantees the IEEE specification for floats or if that is just a GNAT choice. You might want to make sure whatever floating point type you are using has some specification that you are aware of.
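One way to check what you actually have is to ask the compiler through the standard floating-point attributes. This is only a minimal sketch: the procedure name Check_Float_Layout is made up for illustration, and matching these three values is a necessary condition for the binary32 layout, not a proof of full IEEE conformance. As far as I know, GNAT's Interfaces package also declares IEEE_Float_32 if you want a type whose name states the intent.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Check_Float_Layout is
   --  IEEE 754 binary32 uses radix 2, a 24-bit significand
   --  (23 stored bits plus the hidden one) and occupies 32 bits.
   Looks_Like_Binary32 : constant Boolean :=
     Float'Size = 32
     and then Float'Machine_Radix = 2
     and then Float'Machine_Mantissa = 24;
begin
   Put_Line ("Float'Size             =" & Integer'Image (Float'Size));
   Put_Line ("Float'Machine_Radix    =" & Integer'Image (Float'Machine_Radix));
   Put_Line ("Float'Machine_Mantissa =" & Integer'Image (Float'Machine_Mantissa));
   Put_Line ("Looks like binary32    = " & Boolean'Image (Looks_Like_Binary32));
end Check_Float_Layout;
```

If the last line reports FALSE, the record layout from the example above should not be assumed to apply.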

evanescente-ondine · December 19, 2024, 8:55am

So indicating the octet is just for alignment, it’s not mandatory? It does work… But given the values of the three fields, I see I had no real idea how floating-point types actually work.

Why can’t Floatrecord32b have 32 bits like Floatrecord ?

with System, ada.text_io, Ada.Unchecked_Conversion;
use ada.text_io;
procedure Main is
  	type Byte is mod 2**8 with Size => 8;
	type FractionType is mod 2**23 with Size => 23;
  	type Byte64 is mod 2**11 with Size => 11;
	type FractionType64 is mod 2**53 with Size => 53;
	type Floatrecord (Nbits32: Sign_bit) is record
		case Nbits32 is
			when True => Exponent: Byte;
							Fraction: FractionType;
			when false => Exponent64: Byte64;
							Fraction64: FractionType64;
		end case;
	end record with Pack, Unchecked_Union;
	type Floatrecord32b is record
		Exponent: Byte;
		Fraction: FractionType;
	end record with Pack, Size => 32;
	type Floatrecord32 is new Floatrecord(True) with Pack, Size => 32;
begin
	null;
end main;

I can’t use Unchecked_Union because there’s no variant part, but I can’t get rid of the discriminant either? The size is stuck at 39 bits. I wanted a discriminated type to serve as a prototype, with different sizes for different values of the discriminant (once a type is derived from it, as in Floatrecord32 above).

dmitry-kazakov · December 19, 2024, 10:15am

Floating-point endianness may differ from the machine’s. This matters especially when you get a floating-point number from a file or the network: you must first rearrange the octets to the endianness of the machine.
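A rough sketch of such a rearrangement, assuming Float'Size = 32 and that the data is known to arrive in the opposite byte order. The names Swap_Float and Swap_Octets are invented for this illustration; Shift_Left and Shift_Right come from the standard Interfaces package.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Unchecked_Conversion;
with Interfaces; use Interfaces;

procedure Swap_Float is
   --  Assumption: Float'Size = 32, as with GNAT on common targets.
   function To_Bits  is new Ada.Unchecked_Conversion (Float, Unsigned_32);
   function To_Float is new Ada.Unchecked_Conversion (Unsigned_32, Float);

   --  Reverse the order of the four octets in a 32-bit word.
   function Swap_Octets (W : Unsigned_32) return Unsigned_32 is
     (Shift_Left (W and 16#0000_00FF#, 24)
      or Shift_Left (W and 16#0000_FF00#, 8)
      or Shift_Right (W and 16#00FF_0000#, 8)
      or Shift_Right (W and 16#FF00_0000#, 24));

   Native  : constant Float       := 0.15625;
   --  Simulate a value received in the other byte order.
   Foreign : constant Unsigned_32 := Swap_Octets (To_Bits (Native));
   --  Swap again to get back to the machine's own order.
   Back    : constant Float       := To_Float (Swap_Octets (Foreign));
begin
   Put_Line ("Round trip preserved the value: " & Boolean'Image (Back = Native));
end Swap_Float;
```

Swapping on the integer image rather than on the Float itself avoids ever holding a byte-swapped bit pattern in a floating-point object.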

As for the fields, note that the exponent is shifted (stored with a bias) and the mantissa is normalized, with its most significant bit hidden.
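To make that concrete, here is a small sketch that rebuilds the value from the three fields printed earlier in the thread for 0.15625 (sign positive, exponent 124, fraction 2#0100…0#). It assumes the binary32 bias of 127 and only covers normal numbers (stored exponent between 1 and 254); the procedure name Decode_Fields is made up.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Decode_Fields is
   --  Field values printed earlier in the thread for 0.15625.
   Negative : constant Boolean := False;
   Exponent : constant Integer := 124;      --  stored (biased) exponent
   Fraction : constant Integer := 2 ** 21;  --  only bit 21 of the 23 is set

   Bias : constant := 127;  --  binary32 exponent bias

   --  Restore the hidden leading 1 and scale the fraction into [0, 1).
   Significand : constant Float := 1.0 + Float (Fraction) / 2.0 ** 23;
   Magnitude   : constant Float := Significand * 2.0 ** (Exponent - Bias);
   Value       : constant Float := (if Negative then -Magnitude else Magnitude);
begin
   Put_Line ("Decoded value:" & Float'Image (Value));
   Put_Line ("Equals 0.15625: " & Boolean'Image (Value = 0.15625));
end Decode_Fields;
```

Here the significand is 1.25 and the unbiased exponent is 124 - 127 = -3, so the result is 1.25 * 2**(-3) = 0.15625.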

jere · December 19, 2024, 1:04pm

It’s because you aren’t using a record representation clause like in the example I posted. When I added one, the compiler then started complaining about Floatrecord32, which is derived from Floatrecord.

kevlar700 · December 19, 2024, 4:31pm

Size usually works, especially on the types as you have done, but if you want guaranteed size enforcement then you should use Object_Size on the record.

Nordic_Dogsledding · December 19, 2024, 4:42pm

The 'Size attribute, if not specified by the type declaration, denotes the minimal size needed to denote all values. This can be different from the size a stand-alone object occupies.
GNAT therefore introduced the 'Object_Size attribute, which was later adopted into the standard. For example:
Natural'Size = Integer'Size - 1
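A quick way to see the difference is to print both attributes. This is a sketch that assumes a compiler supporting 'Object_Size on subtypes (GNAT, or an Ada 2022 compiler); Show_Sizes and the local Fraction_Bits type are only there for illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Show_Sizes is
   --  Deliberately declared without a Size clause.
   type Fraction_Bits is mod 2**23;
begin
   --  'Size: minimal number of bits needed for the values of the subtype.
   Put_Line ("Integer'Size              =" & Integer'Image (Integer'Size));
   Put_Line ("Natural'Size              =" & Integer'Image (Natural'Size));
   Put_Line ("Fraction_Bits'Size        =" & Integer'Image (Fraction_Bits'Size));

   --  'Object_Size: bits used by default for a stand-alone object.
   Put_Line ("Natural'Object_Size       =" & Integer'Image (Natural'Object_Size));
   Put_Line ("Fraction_Bits'Object_Size ="
             & Integer'Image (Fraction_Bits'Object_Size));
end Show_Sizes;
```

On a typical GNAT target I would expect the 'Size values to be smaller than the 'Object_Size values for Natural and Fraction_Bits, which is exactly the gap described above.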

kevlar700 · December 19, 2024, 5:07pm

It is explained in some detail here.

evanescente-ondine · December 19, 2024, 11:06pm

system.ads:1:01: error: size clause not allowed for variable length type

OK, so what I wanted to do is not possible: you’re not allowed to parametrize Object_Size or Size on the discriminant.

with Pack, Unchecked_Union, 
    Object_Size => (if Floatrecord.Nbits32 then 32 else 64);

The intro is big. Isn’t it preferable to read about computer architecture and assembly before?

evanescente-ondine · December 21, 2024, 2:18pm

This intro is excellent pedagogic material.
But while I surprisingly get 95% of it all (save for the C-centric stuff which I can’t and won’t read), I have no experience of things like address arithmetic, so please enlighten me here:

with Ada.Text_IO; use Ada.Text_IO;
with System.Storage_Elements; use System.Storage_Elements;
with System.Address_To_Access_Conversions;
procedure main is
	type R is record
		X : Integer;
		Y : Integer;
	end record;
	R_Size : constant Storage_Offset := R'Object_Size / System.Storage_Unit;
	Objects : aliased array (1 .. 10) of aliased R;
	Objects_Base : constant System.Address := Objects'Address;
	Offset : Storage_Offset;
	procedure Display_R (Location : in System.Address) is
		package R_Pointers is new System.Address_To_Access_Conversions (R);
		use R_Pointers;
		Value : R renames To_Pointer (Location).all;
	begin	
		Put_line (Value.X'Image & ' ' & value.y'Image);
	end Display_R;
	begin
		Objects := ((0,0), (1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9));
		Offset := 0;
		for K in Objects'Range loop
			Display_R (Objects_Base + Offset);
			Offset := Offset + R_Size;
		end loop;
end main;

Objects’Component_Size, R’Object_Size and R’Size are all equal to 64. I do not understand why R_Size, the offset increment, is set to one byte (64 / 8). Obviously it’s correct, because R_Size = R’Object_Size fails, but if the purpose is to iterate through the Objects array, why does an increment of 8 work, as it evidently does?

kevlar700 · December 21, 2024, 3:18pm

If your code is dependent on the record’s size then I would always specify the record’s size in any case.
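On the increment itself: 'Size, 'Object_Size and 'Component_Size are all expressed in bits, whereas arithmetic on System.Address counts storage elements, which are System.Storage_Unit bits wide (8 on common targets). So 64 / 8 is not one byte but 8 storage elements, the distance from one component to the next. A small sketch to check this on your own machine, with Show_Stride as a made-up name:

```ada
with Ada.Text_IO; use Ada.Text_IO;
with System;
with System.Storage_Elements; use System.Storage_Elements;

procedure Show_Stride is
   type R is record
      X : Integer;
      Y : Integer;
   end record;

   Objects : array (1 .. 10) of aliased R := (others => (0, 0));

   --  Distance between two consecutive components, in storage elements.
   --  "-" on two addresses is declared in System.Storage_Elements.
   Stride : constant Storage_Offset :=
     Objects (2)'Address - Objects (1)'Address;
begin
   Put_Line ("Storage_Unit (bits)       =" & Integer'Image (System.Storage_Unit));
   Put_Line ("R'Size (bits)             =" & Integer'Image (R'Size));
   Put_Line ("Component_Size (bits)     ="
             & Integer'Image (Objects'Component_Size));
   Put_Line ("Stride (storage elements) =" & Storage_Offset'Image (Stride));
end Show_Stride;
```

If Component_Size is 64 and Storage_Unit is 8, the stride should come out as 8, which is why adding R_Size to the address lands on the next record.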