недеља, 22. август 2021.

Floating-point implementation in GCC for my FPGA computer

This is a followup of my original post.

My FPGA computer supports floating-point instructions. GCC port doesn't. If we try to make the following code:

float d = 0.5;
float e = 0.2;
float f;

f = d - e;

The compiler will generate the following assembly code:

# small.c:10:   float d = 0.5;
    mov.w   r01056964608  # tmp43,
    st.w    [r13 + (-12)], r0   # d, tmp43
# small.c:11:   float e = 0.2;
    mov.w   r01045220557  # tmp44,
    st.w    [r13 + (-16)], r0   # e, tmp44
# small.c:14:   f = d - e;
    ld.w    r0, [r13 + (-16)]   # tmp45, e
    st.w    [sp + (4)], r0  #, tmp45
    ld.w    r0, [r13 + (-12)]   # tmp46, d
    st.w    [sp], r0    #, tmp46
    call    __subsf3        #
    mov.w   r1r0  # tmp47,
    mov.w   r0r1  # tmp48, tmp47
    st.w    [r13 + (-20)], r0   # f, tmp48

We can see that when the C code has floating-point subtraction, the generated assembly code places two operands on the stack and then calls the __subsf3 function. What is __subsf3? It is a software implementation of the floating point operations. You can find the list of all soft-float functions here:

https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html

You can see that there are not only the arithmetic functions. There are also conversion functions, comparison functions, and other functions. For example, if we write this in C:

printf("%f\n"f);

the generated assembly code would be:

ld.w   r0, [r13 + (-20)]   # tmp51, f
st.w    [sp], r0    #, tmp51
call    __extendsfdf2       #
...

The __extendsfdf2 function extends float to double, in order to pass it as an argument to the printf function. 

Next, if we multiply float by the int like this:

int x = 5;
f = d * x

we will get this:

ld.w   r0, [r13 + (-8)]    # tmp61, x
st.w    [sp], r0    #, tmp61
call    __floatsisf     #
mov.w   r1r0  # _3,
# small.c:20:   f = d * x; // 0.5 * 5 -> 2.5
mov.w   r0r1  # tmp62, _3
st.w    [sp + (4)], r0  #, tmp62
ld.w    r0, [r13 + (-12)]   # tmp63, d
st.w    [sp], r0    #, tmp63
call    __mulsf3        #
mov.w   r1r0  # tmp64,
mov.w   r0r1  # tmp65, tmp64
st.w    [r13 + (-20)], r0   # f, tmp65

The __floatsisf function converts int to float in order to multiply x (the integer value) with the d (float value). The result of the conversion is stored as a temporary variable tmp62 and then floating-point multiplication of d and tmp62 is done and the result is stored in the f variable.

Writing my own soft-float implementation

Now the obvious step was to write the implementation of all the functions that compiler calls. Arithmetic functions were by far the most simple ones. For example, here is the _subsf3 function:

float __subsf3(float afloat b
{
    asm(
        "ld.w r0, [r13 + (8)]\n \
push r1\n \
ld.w r1, [r13 + (12)]\n \
fsub r0, r1\n \
pop r1\n"
    );
}
 
As you can see, all I had to do was to get the arguments and then to call the fsub r0, r1 instruction which does floating point subtraction.

However, the conversion and comparison functions were not that simple. Conversion from float to int goes like this:

int __fixsfsi(float a)
{
    union { fp_t frep_t u; } fb;
    fb.f = a;

    int e = ((fb.u & 0x7F800000) >> 23) - 127;
    if (e < 0)
        return 0;

    rep_t r = (fb.u & 0x007FFFFF) | 0x00800000;
    if (e > 23)
        r <<= (e - 23);
    else
        r >>= (23 - e);
    if (fb.u & 0x80000000)
        return -r;
    return r;
}

And, in the other direction (from int to float):

fp_t __floatsisf(int a)
{
    const int aWidth = sizeof a * 8;

    // Handle zero as a special case to protect clz
    if (a == 0)
        return fromRep(0);

    // All other cases begin by extracting the sign and absolute value of a
    rep_t sign = 0;
    if (a < 0) {
        sign = signBit;
        a = -a;
    }

    // Exponent of (fp_t)a is the width of abs(a).
    const int exponent = (aWidth - 1) - __clzsi2(a);
    rep_t result;

    // Shift a into the significand field, rounding if it is a right-shift
    if (exponent <= significandBits) {
        const int shift = significandBits - exponent;
        result = (rep_t)a << shift ^ implicitBit;
    } else {
        const int shift = exponent - significandBits;
        result = (rep_t)a >> shift ^ implicitBit;
        rep_t round = (rep_t)a << (typeWidth - shift);
        if (round > signBitresult++;
        if (round == signBitresult += result & 1;
    }

    // Insert the exponent
    result += (rep_t)(exponent + exponentBias) << significandBits;
    // Insert the sign bit and return
    return fromRep(result | sign);

Conclusion

As you can see, my modified GCC compiler does not generate native assembly code for floating-point operations. Instead, it generates soft-float function calls which I had to write on my own (or to borrow from the Internet).