[z88dk-dev] How does fastcall work in z88dk?

Philipp Klaus Krause

Post by Philipp Klaus Krause »

I only found
http://www.z88dk.org/wiki/doku.php?id=usage:stackframe
for documentation. That tells me what fastcall does for functions that
take a single 16-bit parameter.

What happens for parameters of other size? 8-bit in l? in a? 32 bit in dehl?
What happens when there is more than one parameter? First parameter in
registers, rest on stack?

Philipp
alvin

Post by alvin »

> I only found
> http://www.z88dk.org/wiki/doku.php?id=usage:stackframe
> for documentation. That tells me what fastcall does for functions that
> take a single 16-bit parameter.
>
> What happens for parameters of other size? 8-bit in l? in a? 32 bit in dehl?
> What happens when there is more than one parameter? First parameter in
> registers, rest on stack?

Fastcall only supports one parameter in DEHL

L = 8 bit
HL = 16 bit
DEHL = 32 bit

In z88dk floats are 48-bit so they cannot be fastcalled although they can use CALLEE linkage.
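
To make the register assignment above concrete, here is a minimal host-side sketch in C. It only models which byte ends up in which register; the struct and function names are hypothetical and are not part of z88dk. It assumes the usual reading of DEHL, with DE as the high word and HL as the low word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the four Z80 registers that carry a fastcall argument. */
typedef struct {
    uint8_t d, e, h, l;
} dehl_regs;

/* 8-bit argument: only L is meaningful (the rest is zeroed here for clarity). */
dehl_regs fastcall_arg8(uint8_t v)
{
    dehl_regs r = {0, 0, 0, v};
    return r;
}

/* 16-bit argument: HL, with H holding the high byte. */
dehl_regs fastcall_arg16(uint16_t v)
{
    dehl_regs r = {0, 0, (uint8_t)(v >> 8), (uint8_t)(v & 0xFF)};
    return r;
}

/* 32-bit argument: DE is the high word, HL the low word. */
dehl_regs fastcall_arg32(uint32_t v)
{
    dehl_regs r = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16),
        (uint8_t)(v >> 8),  (uint8_t)v
    };
    return r;
}

/* Small self-check; call it from main() when experimenting. */
void fastcall_model_check(void)
{
    dehl_regs r = fastcall_arg32(0x12345678UL);
    assert(r.d == 0x12 && r.e == 0x34 && r.h == 0x56 && r.l == 0x78);
}
```

A 48-bit float has no place in this model, which is the reason given above for floats not being fastcallable.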

More than one parameter is handled with CALLEE linkage where the compiler pushes params on the stack as per usual but they are popped off the stack inside the called function. Because the stack is cleaned up in the called function, there is no stack repair done by the compiler following the call.
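
The difference between the two clean-up schemes can be sketched with a toy stack in C. This is an illustration only, not what sccz80 emits: the stack array, the helper names and the three-parameter function are all hypothetical, and return addresses are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy stack that grows downward like the Z80 stack (return addresses omitted). */
static uint16_t stack_mem[STACK_WORDS];
static size_t sp = STACK_WORDS;

static void push16(uint16_t v) { stack_mem[--sp] = v; }
static uint16_t pop16(void)    { return stack_mem[sp++]; }

/* Standard C linkage: read the three params in place and leave them on the stack. */
static uint16_t sum3_standard(void)
{
    return (uint16_t)(stack_mem[sp] + stack_mem[sp + 1] + stack_mem[sp + 2]);
}

/* Callee linkage: the function pops its own params before returning. */
static uint16_t sum3_callee(void)
{
    uint16_t c = pop16();
    uint16_t b = pop16();
    uint16_t a = pop16();
    return (uint16_t)(a + b + c);
}

/* Words the caller still has to remove after a standard-linkage call. */
size_t words_left_standard(void)
{
    size_t left;
    push16(1); push16(2); push16(3);
    (void)sum3_standard();
    left = STACK_WORDS - sp;   /* the params are still on the stack */
    sp += left;                /* this is the caller's stack repair */
    return left;
}

/* Words the caller still has to remove after a callee-linkage call. */
size_t words_left_callee(void)
{
    push16(1); push16(2); push16(3);
    (void)sum3_callee();
    return STACK_WORDS - sp;   /* nothing left to repair */
}

/* Small self-check; call it from main() when experimenting. */
void callee_model_check(void)
{
    assert(words_left_standard() == 3);
    assert(words_left_callee() == 0);
}
```

The repair step in words_left_standard() is the code the compiler no longer has to emit after every call when callee linkage is used.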

Having multiple linkages affects function pointers. Calls through function pointers are always made using standard C linkage. To accommodate fastcall functions, the first parameter is held in DEHL as well as pushed on the stack when function pointers are invoked (this could be special case code for calls with one param). Functions with multiple parameters have two C entry points: one with standard C linkage and one with callee linkage. Some macro magic ensures function pointers can only be assigned the C linkage entry point. Doing things this way means function pointers don't have to be typed based on linkage.
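
One way such "macro magic" can work in plain C is a function-like macro, which is only expanded when the name is followed by an opening parenthesis. The sketch below is an assumption about the general technique, not a copy of the z88dk headers; fill_bytes, fill_bytes_callee and the counter are hypothetical names, and on a host compiler both entry points naturally use the same linkage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter so the demo can see which entry point ran. */
static int callee_entry_calls = 0;

/* Entry point with standard C linkage (the safe target for function pointers). */
void *fill_bytes(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

/* Stand-in for the callee-linkage entry point; on the host it just forwards. */
void *fill_bytes_callee(void *dst, int c, size_t n)
{
    callee_entry_calls++;
    return fill_bytes(dst, c, n);
}

/* Function-like macro: only a direct call, "fill_bytes(", is redirected. */
#define fill_bytes(dst, c, n) fill_bytes_callee((dst), (c), (n))

typedef void *(*fill_fn)(void *, int, size_t);

/* The bare name is not followed by "(", so this yields the standard entry. */
fill_fn fill_bytes_pointer(void)
{
    return fill_bytes;
}

/* Small self-check; call it from main() when experimenting. */
void fill_bytes_check(void)
{
    char buf[4] = {0, 0, 0, 0};
    int before = callee_entry_calls;
    fill_fn f = fill_bytes_pointer();

    fill_bytes(buf, 'x', 3);                  /* direct call: callee entry */
    assert(callee_entry_calls == before + 1);

    f(buf, 'y', 2);                           /* pointer call: standard entry */
    assert(callee_entry_calls == before + 1);
    assert(buf[0] == 'y' && buf[2] == 'x' && buf[3] == 0);
}
```

Direct calls get the cheaper entry point while anything that takes the address of the function sees only the standard one, so the pointer type never needs to carry linkage information.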


There is no mixing of fastcall and other linkage so, eg, the first param is not held in register with the rest on stack. I don't think that's even desirable as that may interfere with parameter gathering in the target function.


I can show you some generated code below:

FASTCALL (16-bit)

   zx_border(INK_BLACK);
   zx_cls(INK_GREEN | PAPER_GREEN);

=

        ld        hl,0        ;const
        call        zx_border
        ld        hl,36        ;const
        call         zx_cls

CALLEE:

   memset(buffer, 0, 100);
   for (i = rand() % 100; .....)

=

        ;;; memset(...)
        ld        hl,_st_main_buffer
        push        hl
        ld        hl,0        ;const
        push        hl
        ld        hl,100        ;const
        push        hl
        call        memset_callee

        ;;; rand() % 100
        call        rand
        ld        de,100        ;const
        ex        de,hl
        call        l_div
        ex        de,hl

        ;;; (static) i = ...;
        ld        (_st_main_i),hl

The callee linkage means no cleanup after memset(). The result of rand() in HL is passed directly to l_div to compute %100. The z88dk primitives (l_div for division here) are a special case in that they are assembly routines that take parameters in registers so the compiler does not treat them like C calls.

Because the result of a function call is also in DEHL, a call to a FASTCALL function that follows can use that result as parameter without any intermediate set up code. So the use of DEHL as the single parameter is particularly suitable.



Philipp Klaus Krause

Post by Philipp Klaus Krause »

Would it be useful to have support for your fastcall convention in sdcc?
I'm considering introducing a __z88dk_fastcall keyword for this
convention. It would work similar to __smallc. I'd implement support for
them in the caller first, then in the callee.
I don't think we would support your mechanism for
calling-convention-independent calls through function pointers though
(as it would make calls through function pointers more expensive).

Philipp
alvin

Post by alvin »

> Would it be useful to have support for your fastcall convention in sdcc?
> I'm considering introducing a __z88dk_fastcall keyword for this
> convention. It would work similar to __smallc. I'd implement support for
> them in the caller first, then in the callee.

It will make some difference but the bigger impact on executable size comes from callee linkage simply because there are only so many functions that take a single parameter.

If you head over to:
http://z88dk.cvs.sourceforge.net/viewvc/z88dk/z88dk/include/_DEVELOPMENT/

you can check out some common headers (string.h, stdlib.h, etc) to see how prevalent fastcall is in the library. The headers are divided into two with an "#ifdef __SDCC" used to select sdcc prototypes and the "#else" part containing sccz80 prototypes. In the latter, you'll see fastcall qualifiers where applicable.

It's fairly easy for me to change the library for fastcall if you'd like to try it and I'm certainly willing to do it. I do think it will make a big difference to the float functions in particular (once I've put them in) since most of those are single parameter and 32-bit under sdcc (given that float is not used in most programs).

> I don't think we would support your mechanism for
> calling-convention-independent calls through function pointers though
> (as it would make calls through function pointers more expensive).

You'll end up in a situation where function pointers have to be typed by linkage. I personally don't like that. I think some other compilers have gone that route (eg, cc65 for the 6502 I believe).

Here's a completely contrived example:

#include <stdio.h>
#include <string.h>

char *s = "Test String";

/* compiler generated, standard C linkage */
unsigned int weight(char *s)
{
   static unsigned int i, w;

   i = w = 0;

   while (i != strlen(s))
      w += s[i++] - 'A';

   return w;
}

main()
{
   static void *f;  /** sccz80 **/
//   static unsigned int (*f)(char *);  /** sdcc **/

   printf("\"%s\"\n\n", s);

   f = strlen;   /* library function, fastcall linkage */
   printf("strlen = %u\n", (f)(s));

   f = weight;
   printf("weight = %u\n", (f)(s));
}

strlen() is a library fastcall function whereas weight() is compiler generated and uses standard C linkage. strlen() expects a pointer to "s" in HL and weight() expects a pointer to "s" on the stack. A single function pointer cannot be used to call both functions, even though both have the same C signature "unsigned int (*)(char *)", because information on linkage is lost. So you need to add a non-standard attribute to the function pointer's type information that indicates the linkage. Now code like the above is not possible because pointers to strlen() and weight() are not the same type, although under the C standard they should be. People will end up writing subroutines that can only accept FASTCALL linked function pointers or C linked function pointers, or supplying multiple versions of functions that support both at the cost of code size. I think this is what is done in cc65 although I should say I don't know for sure.

Given the typing issues I think I would personally prefer the accommodation to pointer typing considering calls through function pointers aren't too common. We can have a look at what the cost difference is.

For the code above, this is what sdcc is doing for the first "(f)(s)" invocation:

        ld        hl,_s
        ld        c, (hl)
        inc        hl
        ld        b, (hl)
        push        bc
        ld        hl,_main_f_1_154
        ld        a,(hl)
        inc        hl
        ld        h,(hl)
        ld        l,a
        call        ___sdcc_call_hl
        pop        af

This is something else I wanted to bring to your attention re: the loading of 16-bit static variables :) Anyway this code can be written like this:

        ld hl,(_s)
        push hl
        ld hl,(_main_f_1_154)
        call ___sdcc_call_hl   ;; call to "jp (hl)"
        pop af

Accommodating FASTCALL, sccz80 is doing this:

        ld        de,(_st_main_f)
        ld        hl,(_s)
        ld        bc,i_6
        push        hl
        push        bc
        push        de
        ld        a,1   ;; ignore this as it's a requirement for vararg functions when param order is L->R
        ret
.i_6
        pop        bc

HL holds the only parameter which is pushed on the stack, followed by the return address (BC) and the call is made by pushing the function address (DE) followed by a "ret" to invoke.


sdcc is taking 11 bytes and 74 cycles to make the call (I don't count the single byte for the jp(hl) as that is always going to be present in an executable). sccz80 is taking 14 bytes and 97 cycles to make the call.
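
For anyone who wants to redo the arithmetic, here is a small tally in C. The byte counts and T-states are the standard documented Z80 figures as I understand them, entered by hand (an assumption worth double-checking against an instruction table); the struct and function names are hypothetical. The sdcc rows are the shortened five-instruction sequence, with "jp (hl)" contributing its 4 T-states but no bytes, and the sccz80 rows leave out "ld a,1" as suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* One row per instruction: mnemonic, size in bytes, duration in T-states. */
typedef struct {
    const char *text;
    unsigned bytes;
    unsigned tstates;
} insn;

/* Shortened sdcc call sequence; the shared "jp (hl)" costs time but no bytes here. */
static const insn sdcc_seq[] = {
    {"ld hl,(_s)",            3, 16},
    {"push hl",               1, 11},
    {"ld hl,(_main_f_1_154)", 3, 16},
    {"call ___sdcc_call_hl",  3, 17},
    {"jp (hl)",               0,  4},
    {"pop af",                1, 10},
};

/* sccz80 call sequence with "ld a,1" left out. */
static const insn sccz80_seq[] = {
    {"ld de,(_st_main_f)", 4, 20},   /* ED-prefixed form */
    {"ld hl,(_s)",         3, 16},
    {"ld bc,i_6",          3, 10},
    {"push hl",            1, 11},
    {"push bc",            1, 11},
    {"push de",            1, 11},
    {"ret",                1, 10},
    {"pop bc",             1, 10},
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

static unsigned total_bytes(const insn *seq, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += seq[i].bytes;
    return sum;
}

static unsigned total_tstates(const insn *seq, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += seq[i].tstates;
    return sum;
}

unsigned sdcc_bytes(void)     { return total_bytes(sdcc_seq, COUNT(sdcc_seq)); }
unsigned sdcc_tstates(void)   { return total_tstates(sdcc_seq, COUNT(sdcc_seq)); }
unsigned sccz80_bytes(void)   { return total_bytes(sccz80_seq, COUNT(sccz80_seq)); }
unsigned sccz80_tstates(void) { return total_tstates(sccz80_seq, COUNT(sccz80_seq)); }

/* Small self-check; call it from main() when experimenting. */
void call_cost_check(void)
{
    assert(sdcc_bytes() == 11 && sdcc_tstates() == 74);
}
```

With these inputs the sdcc sequence comes out at the quoted 11 bytes and 74 cycles. The sccz80 rows add up to 15 bytes and 99 T-states, slightly above the 14 and 97 quoted, so one of the per-instruction figures used here probably differs from what was counted in the post; the gap between the two compilers is the same order of magnitude either way.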



Philipp Klaus Krause

Post by Philipp Klaus Krause »

On 07.04.2015 20:29, alvin (alvin_albrecht@...) wrote:
>> I don't think we would support your mechanism for
>> calling-convention-independent calls through function pointers
>> though (as it would make calls through function pointers more
>> expensive).
>
> You'll end up in a situation where function pointers have to be typed
> by linkage.

Yes, and IMO, that's the only clean solution.

> Given the typing issues I think I would personally prefer the
> accommodation to pointer typing considering calls through function
> pointers aren't too common. We can have a look at what the cost
> difference is.

The cost that I'm not willing to pay here would be tying sdcc to a
particular calling convention. While I'm willing to add support for
__z88dk_fastcall, I don't want to make it the default. Instead, in the
long term, I want to investigate different calling conventions
empirically, and then use the best one in terms of code size for sdcc.
And I have no idea which calling convention will win. But it might well
be incompatible with __z88dk_fastcall.

Philipp
alvin

Post by alvin »

> The cost that I'm not willing to pay here would be tying sdcc to a
> particular calling convention. While I'm willing to add support for
> __z88dk_fastcall, I don't want to make it the default. Instead, in the
> long term, I want to investigate different calling conventions
> empirically, and then use the best one in terms of code size for sdcc.
> And I have no idea which calling convention will win. But it might well
> be incompatible with __z88dk_fastcall.

Ok. Yeah if it's easy to do, let's try it. I can solve the function pointer issue with the same sort of cpp trick used for callee linkage. As mentioned, the callee linkage is the one that saves quite a few bytes (maybe a couple hundred in a reasonably sized program) but it will be interesting to see how much is saved with the float pack using fastcall.



Philipp Klaus Krause

Post by Philipp Klaus Krause »

We have support for __z88dk_fastcall in sdcc now. Both on the caller and
the callee side.

Philipp
Philipp Klaus Krause

Post by Philipp Klaus Krause »

On 09.04.2015 22:40, Philipp Klaus Krause wrote:
> We have support for __z88dk_fastcall in sdcc now. Both on the caller and
> the callee side.
>
> Philipp

But there is a peephole optimizer issue. Code calling __z88dk_fastcall
functions needs to be compiled with --no-peep until bug #2371 is fixed.

Philipp