[z88dk-dev] floats and sccz80

Bridge to the z88dk-developers mailing list
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

I finished specifying the asm interface to floating point math functions that is intended to be implemented by all math libraries.

The list of functions is here:

http://z88dk.cvs.sourceforge.net/viewvc ... 48/lm/z80/

These names are defined when you link the math library you are using as in "-lm" or "-lmz". If you write asm code using those names it will work with any floating point library linked on the compile line.

Those functions are not documented in those files but the math48 implementation has some interface information in them:

http://z88dk.cvs.sourceforge.net/viewvc ... ath48/z80/

As mentioned the idea is the primary floating point accumulator is in the z80's EXX set. What information is there depends on the library -- it could be an entire floating point number (as math48 with BCDEHL = float) or it could be a pointer to memory, eg. Results are returned in the primary accumulator and FASTCALL functions pass their parameter in the primary accumulator.

The secondary accumulator can be in the main register set or on the stack (only a few functions support a second operand on the stack). Functions that take two parameters use the primary fp accumulator and either the secondary or the stack.

Then there are operations to push, pop, load and store floats and a few other stack operations.

With math48 I've found that having the primary fp accumulator in the EXX set has led to some compact fp code.

You can see some examples as in the implementation of atan2:

http://z88dk.cvs.sourceforge.net/viewvc ... iew=markup

or acosh:

http://z88dk.cvs.sourceforge.net/viewvc ... iew=markup


Let me know what you think whether anything is missing or should be done better.


The other thing I did was get rid of the float/double dichotomy in the code. Everything is now double with aliases for float functions pointing at the double implementations. Since both compilers only support one float type, this is fine for the time being but the library is ready to support independent float/double if that would ever make sense on a z80 target. That's doubtful but maybe it might make sense on a z80 clone with lots more memory and cpu grunt.


Still remaining is connecting floats to scanf and getting sdcc plugged in.



------------------------------------------------------------------------------
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

I found a very curious result. I tried using stefano's 3d plot example from the wiki http://www.z88dk.org/wiki/doku.php?id=e ... ts:coswave
.......
Now here is the surprise: the drawn graphs are not the same! You can have a look at some emulator snapshots I took after running each program.

https://drive.google.com/file/d/0B6XhJJ ... sp=sharing
This was really worrying to me so I've been working on this this morning. A slightly different test program and new results:

https://drive.google.com/file/d/0B6XhJJ ... sp=sharing


#include <stdio.h>
#include <math.h>
#include <arch/spectrum.h>
//#include <graphics.h>

double x, y, z, buf;
unsigned int /*z, buf,*/ px, py;

main()
{
for (x = -3.0; x < 3.0; x += 0.06)
{
buf = 100;

for (y = -3.0; y < 3.0; y += 0.2)
{
z = 70.0 - (10.0 * (y + 3.0) + (10.0 * cos(x*x + y*y)));

if (buf > z)
{
buf = z;

px = 32.0 * (x + 3.0) + 0.5;
py = z * 2.0 + 0.5;

*(unsigned char *)(zx_pxy2saddr(px, py)) |= zx_px2bitmask(px);
//plot(px, py);
}
}
}
}


clib=new, math48 : 2m25s
clib=classic, genmath : 2m34s
clib=classic, zxmath : 3m31s
zx basic : 4m19s


Everything looks good now... the graphs are almost identical except the zxmath version looks a little bit different. I am not sure why that should be given zxmath is using the same float code as zx basic.


Most early fp implementations in z80 systems were microsoft derived but Sinclair's implementation was a totally fresh implementation designed to be compact (it had to fit in a system with 8k rom and 1k ram that was more than just a calculator; incidentally there is a connection between Sinclair's implementation and Alan Turing, something I wasn't aware of until just the other day). There are a lot of things in favour of Sinclair's implementation and one of them is that there is only one number type with the system automatically selecting integer or fp as appropriate. Most other basics had you decide up front whether a number was either integer or not and this was sometimes done by variable name (initial letters less than "i" are int, for example).

Anyway this auto-selection of int or fp was what caused the nice graph by zx basic in the first test of the 3d graph program posted earlier with the disappointing results for math48 and genmath. The problem was actually in computing the z coordinate for the z-buffer algorithm :- in the C programs the z coordinate was being computed as an integer whereas in zxbasic the z coordinate was kept float. So I changed the z coordinate in the C program to double and added rounding by adding 0.5 (I haven't implemented the rounding primitives in the C standard yet).

Now the graphs look ok, phew :)



------------------------------------------------------------------------------
stefano
Well known member
Posts: 2137
Joined: Mon Jul 16, 2007 7:39 pm

Post by stefano »

Most early fp implementations in z80 systems were microsoft derived
This is something I didn't know, though the CP/M version of the MS BASIC (and its compiler) designed by Monte had very powerful options, a relevant one being the choice between the 4bit or 8bit mantissa.
There are two more fascinating implementations I'd like to look at in the future, the TI calculator's one, able to preserve the right rounding on a surprising number of situations, and the Erico Maria Giordano's recent experiments with the variable size mantissa:

http://www.emagsoftware.it/spectrum/math.txt
http://www.emagsoftware.it/spectrum/math.zip

I can get the source if you think we can use it somehow..

but Sinclair's implementation was a totally fresh implementation designed to be compact (it had to fit in a system with 8k rom and 1k ram that was more than just a calculator; incidentally there is a connection between Sinclair's implementation and Alan Turing, something I wasn't aware of until just the other day).
I think the Sinclair team sharpened their teeth with the Sinclair calculator project.. it is a very interesting story, indeed.
There are a lot of things in favour of Sinclair's implementation and one of them is that there is only one number type with the system automatically selecting integer or fp as appropriate.
This shouldn't be true on the ZX81 variant, which, if I remember correctly just converted everything into FP; the transparently embedded integer format was introduced with the Spectrum.
The famous slowness of the ZX BASIC implementations mostly a consequence of the numbers Handling.. sadly few people noticed the many extra functions put in the ROM due to the saved space :(
Anyway this auto-selection of int or fp was what caused the nice graph by zx basic in the first test of the 3d graph program posted earlier with the disappointing results for math48 and genmath. The problem was actually in computing the z coordinate for the z-buffer algorithm :- in the C programs the z coordinate was being computed as an integer whereas in zxbasic the z coordinate was kept float. So I changed the z coordinate in the C program to double and added rounding by adding 0.5 (I haven't implemented the rounding primitives in the C standard yet).
The intention of that demo was simply to show how small a math+graphics program could be.. by the way I find all your work very interesting and precise, please go ahead !



------------------------------------------------------------------------------
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

the CP/M version of the MS BASIC (and its compiler) designed by Monte had very powerful options, a relevant one being the choice between the 4bit or 8bit mantissa.
4-byte and 8-byte?
There are two more fascinating implementations I'd like to look at in the future, the TI calculator's one, able to preserve the right rounding on a surprising number of situations, and the Erico Maria Giordano's recent experiments with the variable size mantissa:

http://www.emagsoftware.it/spectrum/math.txt
http://www.emagsoftware.it/spectrum/math.zip

I can get the source if you think we can use it somehow..
96-bit mantissa? He is out of his mind :D But why not, we do this for kicks and this is especially amusing. If we can get it to work I will have to replace my windows desktop calculator with spectrum emulator.
The intention of that demo was simply to show how small a math+graphics program could be.. by the way I find all your work very interesting and precise, please go ahead !
Yes and small it is! A change of type for z and buf will get you a much nicer graph though :)

I did find two printing errors in the output routine. One is with the hex format output (0xh.hhhhp+-ddd): if a really small magnitude number is printed the negative base two exponent rolls past -128 when printed and appears instead as a large positive exponent. This is happening because there is 8-bit overflow and the C standard requires output of the first hex digit of the mantissa to be to the left of the decimal point. That means the exponent has to be decreased by 4 when printed. If they had stuck to a "0x0.hhhhp+-ddd" format you could just output the binary representation without doing any arithmetic on it.

The other is either printing or reading a large magnitude decimal number. There is an infinity error someplace so numbers less than the largest representation are being printed as infinity.

I also have to check if math errors and signs are correctly handled. This has all been added to math48 which simply returned carry flag set if some sort of error occurred. I made an effort to make out of domain errors align with functions' asymptotic values and to make sure +-inf results had proper sign. I have no idea if I was successful yet; that's the difficulty of adapting someone else's code as you're never quite sure exactly how everything works.

Anyway I will see if I can do sdcc today. In my mind, I don't think it's too difficult but sdcc does have a bunch of float primitives so it's still more than a small amount of work. scanf I am putting off a little but because a state machine has to be written to accept all the weird float formats ("nan", "nan(...)", "inf", "infinity", "2432.2324e-10", "0.000000000000000000000000000000001e20", "0xabcd.p+2", etc). I was hoping I could use more of the strtod code but because of possibly degenerate input I probably have to write a complete parser without cheating.



------------------------------------------------------------------------------
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

sdcc is still coming along.. it might take another day or two until it's ready. For the moment, I've decided to do all the float operations with the math library. That means even simple things like float<->int conversion or comparison involves a conversion from sdcc_float_format to the math_lib_float_format, the math lib does the operation and (if a float is returned) the result is converted from math_lib_format to sdcc_float_format. So there's some overhead there but the code size is being kept fairly small. In the future maybe these primitive operations should be done completely in sdcc_float_format. This issue is not as apparent with sccz80 because its native format is almost the same as math48's and it has a different floating point model.

While checking code generation:


extern float __CALLEE__ two(float a, float b);// __z88dk_callee;
extern float __FASTCALL__ one(float a);// __z88dk_fastcall;
extern float normal(float a, float b);

float a, b, c;

main()
{
c = two(a,b);
c += a - b;
a = one(c);
c = a + normal(a,b) - two(b,a) + one(c);
}


I noticed one problem with sccz80. In the last statement, after the call to "normal" the compiler clears the stack like so:

...
call _normal
exx
ld hl,12 ;const
add hl,sp
ld sp,hl
exx

All other return types (char, int, long, void*, etc) have their result in the main register set so the exx manages to save the return value in those cases. However with float/double type the return value is not in the main register set. With the new clib, it's in the exx set so this actually wipes out the returned float. With the classic lib, the return value is in memory in a static fp accumulator. So I'd like to change sccz80 to use the main set rather than the exx set to repair the stack if the return type is float. If you guys have time before me maybe you can have a look? I can make the change but I am not as familiar with the compiler code.


Philip, sdcc is generating some contorted code with static floats. This is probably more of the same problem sdcc currently has with statics.

For the C code above sdcc is creating a temporary float on the stack:

_main:
push ix
ld ix,0
add ix,sp
push af ;; space for temporary float used by compiler
push af
...


The following call to the callee float function "two" works as expected:

ld hl,(_b + 2)
push hl
ld hl,(_b)
push hl
ld hl,(_a + 2)
push hl
ld hl,(_a)
push hl
call _two
ld (ix-1),d
ld (ix-2),e
ld (ix-3),h
ld (ix-4),l
ld de,_c
ld hl,0
add hl, sp
ld bc,4
ldir


However the result in dehl is first copied into that temporary spot on the stack (the stores indexed by ix) and then copied from there into the destination variable with an ldir. The code is correct but it's really poor and again it is missing the really simple way to do this by writing dehl directly to memory:

ld (_c),hl
ld (_c+2),de

This is the same thing seen with statics in general, particularly longs so I think it's all the same missed code generation trick. Statics are quite important as usually one of the first steps taken to reduce z80 code size is to start changing locals to static locals.



------------------------------------------------------------------------------
Monitor 25 network devices or servers for free with OpManager!
OpManager is web-based network management software that monitors
network devices and physical & virtual servers, alerts via email & sms
for fault. Monitor 25 devices for free with no restriction. Download now
http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

Good news, I think I have floats integrated into sdcc.

Some test programs (use cvs to get latest or wait for tonight's build)

************************************************************
** Add, Subtract, Multiply, Divide *********************************
************************************************************

// zcc +zx -vn -clib=new asmd.c -o asmd -lm_sccz80
// zcc +zx -vn -clib=sdcc_ix --max-allocs-per-node200000 --reserve-regs-iy asmd.c -o asmd -lm_sdcc_ix

#include <stdio.h>
#include <stdlib.h>

char buf[100];
double a, b;

main()
{
while(1)
{
printf("\n\nEnter number (a): ");

fflush(stdin);
scanf(" %50[^\n]", buf);

a = strtod(buf, NULL);

printf("Enter number (b): ");

scanf(" %50[^\n]", buf);

b = strtod(buf, NULL);

printf("\n");

printf("%g + %g = %g\n", a, b, a + b);
printf("%g - %g = %g\n", a, b, a - b);
printf("%g * %g = %g\n", a, b, a * b);
printf("%g / %g = %g\n", a, b, a / b);

printf("\n");

printf("%g > %g : %s\n", a, b, (a > b) ? "true" : "false");
printf("%g >= %g : %s\n", a, b, (a >= b) ? "true" : "false");
printf("%g < %g : %s\n", a, b, (a < b) ? "true" : "false");
printf("%g <= %g : %s\n", a, b, (a <= b) ? "true" : "false");
printf("%g == %g : %s\n", a, b, (a == b) ? "true" : "false");
printf("%g != %g : %s\n", a, b, (a != b) ? "true" : "false");
}
}


************************************************************
** Math10 **************************************************
************************************************************

// zcc +zx -vn -clib=new math10.c -o math10 -lm_sccz80
// zcc +zx -vn -clib=sdcc_ix --max-allocs-per-node200000 --reserve-regs-iy math10.c -o math10 -lm_sdcc_ix

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

double a;

char in[128];
char *in_end;

char out[128];

main()
{
while (1)
{
printf("\nEnter float: ");

fflush(stdin);
scanf("%[^\n]", in);

errno = 0;

a = strtod(in, &in_end);

printf("%s\n", strerror(errno));

printf("Subject : \"%.*s\"\n", in_end - in, in);
printf("Trailing: \"%s\"\n", in_end);

ftog(a, out, 11, FTOA_FLAG_HASH);
printf("Float was: %s\n", out);

ftoh(a, out, -1, FTOA_FLAG_HASH);
printf("Float was: %s\n", out);
}
}


************************************************************
** Graph **************************************************
************************************************************

// zcc +zx -vn -clib=new graph.c -o graph -lm_sccz80
// zcc +zx -vn -clib=sdcc_ix --max-allocs-per-node200000 --reserve-regs-iy graph.c -o graph -lm_sdcc_ix

#include <stdio.h>
#include <math.h>
#include <arch/spectrum.h>
//#include <graphics.h>

double x, y, z, buf;
unsigned int px, py;

main()
{
for (x = -3.0; x < 3.0; x += 0.06)
{
buf = 100;

for (y = -3.0; y < 3.0; y += 0.2)
{
z = 70.0 - (10.0 * (y + 3.0) + (10.0 * cos(x*x + y*y)));

if (buf > z)
{
buf = z;

px = 32.0 * (x + 3.0) + 0.5;
py = z * 2.0 + 0.5;

*(unsigned char *)(zx_pxy2saddr(px, py)) |= zx_px2bitmask(px);
//plot(px, py);
}
}
}
}


clib=sdccix, math48 : 2m20s
clib=new, math48 : 2m25s
clib=classic, genmath : 2m34s
clib=classic, zxmath : 3m31s
zx basic : 4m19s


I was surprised by the faster time for the sdcc compile as the code generated by sccz80 is better but I think it comes down to all those float constants being 32 bit in sdcc while 48 bit in sccz80. So the float loads of constants is faster on sdcc and there are a lot of them in this program.


One other thing that has been affecting sdcc's code gen size is that its calls to its primitives are done (mostly) using C calling convention. That means, eg, performing a float add generates code like this:

... push params on stack ...

call ___fsadd
pop af
pop af
pop af
pop af

Every single operation has 4 pops after the call to clean up the stack. I can't change these to callee as I've done with the clib as these functions are implicit in the compiler (ie there's no header file). But it occurred to me we can apply peephole rules to change all these things. We could transform the above to:

...
call ___fsadd_callee

and save those four bytes right there. We can also come up with rules to try to clear up some of the weird stuff sdcc is currently doing with statics. The code will be much cleaner and smaller.



------------------------------------------------------------------------------
Monitor 25 network devices or servers for free with OpManager!
OpManager is web-based network management software that monitors
network devices and physical & virtual servers, alerts via email & sms
for fault. Monitor 25 devices for free with no restriction. Download now
http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

I forgot to mention that I'm committing pre-built libraries without %aefg enable for printf. So if you want to printf floats you will have to edit the config file to enable %aefg and rebuild the library.

For the zx target, the config file is in:
libsrc/_DEVELOPMENT/target/zx/clib_cfg.asm
From windows to rebuild the zx library enter:
Winmake zx

from libsrc/DEVELOPMENT



There is a line between the compiler side code and the math library. When floats pass across that line, there is a conversion applied between the math lib's float format and the compiler's float format. This is especially visible with sdcc as sdcc's float format is 32 bit with 24 bit mantissa. You can see this in the math10 test program in the hex representation of the float: under sdcc the rightmost bits of the math lib's 40 bit mantissa will always be zero.


Math functions (cos, sin, etc) cannot be assigned to function pointers yet. There is no problem for sdcc but sccz80 does not generate proper code for function pointer calls with floats in the parameter list (at least for the one param case). A bugfix will have to be applied there. I'm also going to come up with the macro method to create prototypes in header files before doing this as that will save a great deal of typing in math.h



------------------------------------------------------------------------------
Monitor 25 network devices or servers for free with OpManager!
OpManager is web-based network management software that monitors
network devices and physical & virtual servers, alerts via email & sms
for fault. Monitor 25 devices for free with no restriction. Download now
http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
stefano
Well known member
Posts: 2137
Joined: Mon Jul 16, 2007 7:39 pm

Post by stefano »

Good news, I think I have floats integrated into sdcc.
Yeah!



------------------------------------------------------------------------------
Monitor 25 network devices or servers for free with OpManager!
OpManager is web-based network management software that monitors
network devices and physical & virtual servers, alerts via email & sms
for fault. Monitor 25 devices for free with no restriction. Download now
http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
stefano
Well known member
Posts: 2137
Joined: Mon Jul 16, 2007 7:39 pm

Post by stefano »

One more sin/cos based toy to have fun with:
http://z88dk.org/wiki/doku.php?id=examp ... pets:globe



------------------------------------------------------------------------------
Monitor 25 network devices or servers for free with OpManager!
OpManager is web-based network management software that monitors
network devices and physical & virtual servers, alerts via email & sms
for fault. Monitor 25 devices for free with no restriction. Download now
http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

One more sin/cos based toy to have fun with:
http://z88dk.org/wiki/doku.php?id=examp ... pets:globe
Very nice. Another idea is to allow the user to enter longitude and latitude and have the globe centered on that. There's a line draw being used there so I may not be able to easily adapt it.

I wrote a simple program for tau day:


// zcc +zx -vn -startup=8 -clib=new tau.c -o tau -lm_sccz80
// zcc +zx -vn -startup=8 -clib=sdcc_ix --max-allocs-per-node200000 --reserve-regs-iy tau.c -o tau -lm_sdcc_ix

// printf %g must be enabled in config file

#include <stdio.h>
#include <math.h>
#include <font/fzx.h>
#include <stropts.h>
#include <arch/spectrum.h>
#include <input.h>
#include <stdlib.h>

double_t tau;

main()
{
zx_border(INK_MAGENTA);

ioctl(1, IOCTL_OTERM_FCOLOR, INK_MAGENTA | PAPER_BLACK);
ioctl(1, IOCTL_OTERM_BCOLOR, INK_BLACK | PAPER_MAGENTA);
ioctl(1, IOCTL_OTERM_FONT, &ff_ao_OldEnglish);

ioctl(1, IOCTL_OTERM_CLS);

printf("June 28 is Tau Day\n\n");

tau = 8.0 * atan(1.0);

printf("tau=%.11g\n\n", tau);
printf("sin(tau)=%.11g\n", sin(tau));
printf("sin(6.28)=%.11g\n", sin(strtod("6.28", NULL)));

printf("\n");

in_wait_nokey();
in_wait_key();
}


Link with snapshots and screenshots:

https://drive.google.com/file/d/0B6XhJJ ... sp=sharing


In comparing the sdcc and sccz80 compiles it looks like the sdcc compile's error on sin(tau) is way too large. A 24-bit mantissa gives about 4-5 decimal digits vs a 40-bit mantissa which is about 11, so the computed values of tau and sin(6.28) seem to agree under sdcc and sccz80.

The printed value of tau in the sdcc compile is after truncation to 24 bits of mantissa. Even if you enter that in a calculator, sin of that is -3x10^-7. Getting a result of 2x10^-2 is wrong. I am not sure why that is yet.


I'm also going to try to introduce fastcall and callee versions of sdcc's primitives. As an example of the potential impact on code size, in that 3d graph program:

for (x = -3.0; x < 3.0; x += 0.06)
{
buf = 100;

for (y = -3.0; y < 3.0; y += 0.2)
{
z = 70.0 - (10.0 * (y + 3.0) + (10.0 * cos(x*x + y*y)));

if (buf > z)
{
buf = z;

px = 32.0 * (x + 3.0) + 0.5;
py = z * 2.0 + 0.5;

*(unsigned char *)(zx_pxy2saddr(px, py)) |= zx_px2bitmask(px);
//plot(px, py);
}
}
}

everywhere there is a float operation (add, sub, mul, div, int<->float, etc) there are on average 2 pops after each call to the sdcc primitive. I count 19 such operations in the fragment above and getting rid of those two pops would save 38 bytes in this small fragment. I am hoping to see similar savings from the integer primitives that sdcc calls.



------------------------------------------------------------------------------
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that
you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/
Post Reply