[z88dk-dev] sdcc --opt-code-size and long longs

Bridge to the z88dk-developers mailing list
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

[z88dk-dev] sdcc --opt-code-size and long longs

Post by alvin »

I figured out how to reduce code size for sdcc compiles of 64-bit integer code. The results are already committed to cvs, so they will be available in the next build, but zsdcc has to be rebuilt from the new patch. The updated Windows binary is already in the patch zip: http://z88dk.cvs.sourceforge.net/viewvc ... _patch.zip

sdcc has been inlining a lot of code that is expensive in size when adding, subtracting and copying 8-byte longlongs. An example is adding two longlongs together:

ld a,(_x)
ld hl,_y
add a, (hl)
ld (_z),a
ld a,(_x + 1)
ld hl,_y + 1
adc a, (hl)
ld (_z + 1),a
ld a,(_x + 2)
ld hl,_y + 2
adc a, (hl)
ld (_z + 2),a
ld a,(_x + 3)
ld hl,_y + 3
adc a, (hl)
ld (_z + 3),a
ld a,(_x + 4)
ld hl,_y + 4
adc a, (hl)
ld (_z + 4),a
ld a,(_x + 5)
ld hl,_y + 5
adc a, (hl)
ld (_z + 5),a
ld a,(_x + 6)
ld hl,_y + 6
adc a, (hl)
ld (_z + 6),a
ld a,(_x + 7)
ld hl,_y + 7
adc a, (hl)
ld hl,_z + 7
ld (hl),a

This sort of code adds up in size quickly. So what I've done is turn these things into subroutine calls. The above example is turned into this:

ld bc,_x
ld de,_z
ld hl,_y
call ____sdcc_ll_add_de_bc_hl

which is a heck of a lot shorter: 12 bytes instead of 81.
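For anyone wanting to see what such a helper has to compute, here is a rough C model of the byte-wise add with carry. The names ll_add_model and ll_add_check are made up for illustration; the real "____sdcc_ll_add_de_bc_hl" is a library asm routine whose source is not shown in this thread. Judging by the call above, DE points at the result and BC and HL at the two operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: dst = a + b for 8-byte little-endian operands,
   walking from offset 0 to offset 7 like the inlined code does. */
void ll_add_model(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    unsigned carry = 0;

    assert(dst != NULL && a != NULL && b != NULL);
    for (int i = 0; i < 8; i++) {
        unsigned sum = (unsigned)a[i] + b[i] + carry; /* add / adc a,(hl) */
        dst[i] = (uint8_t)sum;                        /* store result byte */
        carry = sum >> 8;                             /* carry into next byte */
    }
}

/* Helper for comparing the model against native 64-bit arithmetic. */
uint64_t ll_add_check(uint64_t x, uint64_t y)
{
    uint8_t a[8], b[8], d[8];
    uint64_t z = 0;

    for (int i = 0; i < 8; i++) {
        a[i] = (uint8_t)(x >> (8 * i));
        b[i] = (uint8_t)(y >> (8 * i));
    }
    ll_add_model(d, a, b);
    for (int i = 0; i < 8; i++)
        z |= (uint64_t)d[i] << (8 * i);
    return z;
}
```

The point of the subroutine form is that this loop body exists once in the binary, and every longlong add only pays for three pointer loads and a call.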

I've made substitutions for a lot of operations including add, subtract, push for function call, various copies, and some shifting. But I am not done yet -- there are more shift code patterns and I haven't looked at logical operators yet.

The changes are not in the compiler itself but in sdcc's z80 peepholer and our peephole rules.

In any case, what I've done is add these code size optimizations to compiles when "--opt-code-size" appears. No "--opt-code-size" and you get the regular code sdcc generates.

An example program zip:
https://drive.google.com/file/d/0B6XhJJ ... sp=sharing

Compiling to binaries with this (the target has to be configured to allow the longlong and float printf/scanf converters; in my test compiles I just turned everything on):

zcc +zx -vn -SO3 -startup=4 -clib=sdcc_ix --reserve-regs-iy --max-allocs-per-node200000 lg.c -o lg -lm
zcc +zx -vn -SO3 -startup=4 -clib=sdcc_ix --reserve-regs-iy --max-allocs-per-node200000 lg.c -o lg -lm --opt-code-size

has the "--opt-code-size" version 497 bytes shorter.

The generated asm for each version is included in the zip which you can compare side by side to see where the savings are. These were produced with:

zcc +zx -vn -a -SO3 -clib=sdcc_iy --max-allocs-per-node200000 lg.c
zcc +zx -vn -a -SO3 -clib=sdcc_iy --max-allocs-per-node200000 lg.c --opt-code-size

(clib sdcc_ix or sdcc_iy doesn't matter when producing asm; the "--reserve-regs-iy" flag does matter and sdcc_iy sets that for us)



alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

I am still missing one ingredient in the solution, which I will try to add today before the build if I can find the time. Dom, if you can rebuild the zsdcc binary for OS X that would be good too. Without that, code generated when --opt-code-size is used may be incorrect. As mentioned, the updated Windows zsdcc binary is available with the patch.

To sketch out the problem with a simple example, in the opt-code-size peephole rules, there is a bit of code to replace this:

xor a,a
ld (hl),a
inc hl
ld (hl),a
inc hl
ld (hl),a
inc hl
ld (hl),a
inc hl

with this:

xor a
call ____sdcc_lib_setmem_hl - 16


"____sdcc_lib_setmem_hl" maps to the library function "l_setmem_hl" which is used by the library to quickly initialize data structures without affecting registers other than A and HL. Since "l_setmem_hl" is very likely to be in the output binary, this is a win for code size.
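The "- 16" offset makes more sense with a model of the routine in hand. The sketch below assumes, and this is an assumption rather than something the library source in this thread confirms, that the routine is a straight run of "ld (hl),a / inc hl" pairs of two bytes each that falls through to the label, so that entering `back` bytes before the label performs back / 2 stores. The name setmem_model is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a fall-through fill routine.
   Assumption: each "ld (hl),a / inc hl" pair is two bytes, so a call to
   (label - back) runs back / 2 stores before reaching the label. */
uint8_t *setmem_model(uint8_t *hl, uint8_t a, unsigned back)
{
    assert(hl != NULL);
    assert(back % 2 == 0);      /* entry must land on a pair boundary */

    for (unsigned n = back / 2; n > 0; n--) {
        *hl = a;                /* ld (hl),a */
        hl++;                   /* inc hl    */
    }
    return hl;                  /* HL ends up pointing past the filled bytes */
}
```

Under that assumption an offset of 16 corresponds to eight stores, which is the size of one longlong. Only A and HL are involved, which matches the description of the routine above.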


However, since this substitution is made at the peephole stage, the peepholer will see a function call after the substitution and decide that A and HL are dead before it. This means it may eliminate a load into HL or A before the call, which would be disastrous. So the last change I made to zsdcc was to add z88dk special functions to the peepholer. The special functions are a list of known functions together with the registers they read. So "____sdcc_lib_setmem_hl" will have an entry that says A and HL are read by the call, so that the peepholer will not eliminate loads into those registers prior to the call.
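A toy model of the dead-register reasoning may help here. This is not the sdcc peepholer's code; it is a minimal backward liveness scan, with made-up names, under the assumption that a call is treated as clobbering every register and keeping alive only the registers on its read list.

```c
#include <assert.h>
#include <stddef.h>

enum { REG_A = 1, REG_BC = 2, REG_DE = 4, REG_HL = 8 };
enum kind { WRITE, READ, CALL };

struct insn {
    enum kind kind;
    unsigned regs;  /* WRITE: register written; READ/CALL: registers read */
    int dead;       /* set when a WRITE is found to be removable */
};

/* Backward scan: a write is dead if nothing reads the register before
   it is overwritten or clobbered. A CALL is assumed to clobber all
   registers, so only its read list is live before it. */
void mark_dead_writes(struct insn *code, size_t n)
{
    unsigned live = 0;

    assert(code != NULL || n == 0);
    for (size_t i = n; i-- > 0; ) {
        struct insn *in = &code[i];
        switch (in->kind) {
        case WRITE:
            in->dead = (live & in->regs) == 0;
            live &= ~in->regs;
            break;
        case READ:
            live |= in->regs;
            break;
        case CALL:
            live = in->regs;
            break;
        }
    }
}

/* Scans "ld hl,.. ; xor a ; call f" and returns the mask of registers
   whose writes would be removed for the given read list of f. */
unsigned dead_before_call(unsigned call_reads)
{
    struct insn code[] = {
        { WRITE, REG_HL, 0 },       /* ld hl,...  */
        { WRITE, REG_A, 0 },        /* xor a      */
        { CALL, call_reads, 0 },    /* call f     */
    };
    unsigned dead = 0;

    mark_dead_writes(code, 3);
    for (size_t i = 0; i < 3; i++)
        if (code[i].kind == WRITE && code[i].dead)
            dead |= code[i].regs;
    return dead;
}
```

With an empty read list both writes are reported dead, which is the disastrous case; with A and HL on the read list neither is.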

However, that's not where the difficulty ends. Because the compiler has already generated the output and we're doing text substitutions, the compiler may have other registers live through the substituted function call. For example, "BC" might be live at the "call ____sdcc_lib_setmem_hl" above. However, because the entry for "____sdcc_lib_setmem_hl" only says it reads A and HL, the peepholer will consider BC dead and may eliminate a load into it even though BC could be used after the call. One solution is to add "BC", or rather all registers, to the might-read list of "____sdcc_lib_setmem_hl", but of course that would just needlessly prevent optimizations from occurring. Another method is to put pushes around the call in the peephole substitutions and then try to remove them.

Instead of substituting this:

xor a
call ____sdcc_lib_setmem_hl - 16

we substitute this:

xor a
push bc
push de
call ____sdcc_lib_setmem_hl - 16
pop de
pop bc

and then try to peel off the pushes and pops in the peepholer so that if BC and DE are not used after the call, then they can be optimized away before the call.
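The effect of the wrapping can be sketched with the same kind of toy liveness scan. Again this is a model with hypothetical names, not the peepholer's implementation: a push counts as a read of the register, a pop as a write, and a pop whose register is not used afterwards is peeled off together with its matching push.

```c
#include <assert.h>
#include <stddef.h>

enum { REG_A = 1, REG_BC = 2, REG_DE = 4, REG_HL = 8 };
enum kind { WRITE, READ, CALL, PUSH, POP };

struct insn { enum kind kind; unsigned regs; int dead; };

/* Backward liveness scan. CALL is assumed to clobber everything except
   what it reads. A dead POP is peeled along with its matching PUSH. */
void scan(struct insn *code, size_t n)
{
    unsigned live = 0, peeled = 0;

    for (size_t i = n; i-- > 0; ) {
        struct insn *in = &code[i];
        switch (in->kind) {
        case WRITE:
            in->dead = (live & in->regs) == 0;
            live &= ~in->regs;
            break;
        case POP:
            in->dead = (live & in->regs) == 0;
            if (in->dead)
                peeled |= in->regs;     /* remember to drop the push too */
            live &= ~in->regs;
            break;
        case PUSH:
            if (peeled & in->regs) {
                in->dead = 1;
                peeled &= ~in->regs;
            } else {
                live |= in->regs;       /* the push keeps the register alive */
            }
            break;
        case READ:
            live |= in->regs;
            break;
        case CALL:
            live = in->regs;
            break;
        }
    }
}

/* Builds "ld bc,.. ; [push bc] ; call setmem ; [pop bc] ; [use bc]" and
   reports whether the initial load into BC survives the scan. */
int bc_load_kept(int wrap, int used_after)
{
    struct insn code[5];
    size_t n = 0;

    code[n++] = (struct insn){ WRITE, REG_BC, 0 };
    if (wrap)
        code[n++] = (struct insn){ PUSH, REG_BC, 0 };
    code[n++] = (struct insn){ CALL, REG_A | REG_HL, 0 };
    if (wrap)
        code[n++] = (struct insn){ POP, REG_BC, 0 };
    if (used_after)
        code[n++] = (struct insn){ READ, REG_BC, 0 };
    scan(code, n);
    return !code[0].dead;
}
```

Without the wrapping, the load into BC is dropped even when BC is used after the call. With it, the load is kept when BC is needed and still optimized away when it is not.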

Those pushes and pops are not actually necessary for "____sdcc_lib_setmem_hl", because that routine does not change those registers, so a further post-processing step using copt can remove any that remain around the call.
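As a stand-in for that post-processing step, the following C sketch strips matching push/pop lines that directly surround a call to a named routine. copt itself works from pattern rules, so this only illustrates the transformation; the function name strip_saves is made up, and the lines are assumed to be already trimmed of leading whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes "push rr" / "pop rr" pairs that directly surround a call to
   `callee`, working from the innermost pair outwards. The array is
   compacted in place and the new line count is returned. */
size_t strip_saves(const char **lines, size_t n, const char *callee)
{
    assert(lines != NULL && callee != NULL);

    for (size_t i = 0; i < n; i++) {
        size_t lo = i, hi = i;

        if (strncmp(lines[i], "call ", 5) != 0 || strstr(lines[i], callee) == NULL)
            continue;

        /* Grow [lo, hi] while the neighbours are a matching push/pop. */
        while (lo > 0 && hi + 1 < n &&
               strncmp(lines[lo - 1], "push ", 5) == 0 &&
               strncmp(lines[hi + 1], "pop ", 4) == 0 &&
               strcmp(lines[lo - 1] + 5, lines[hi + 1] + 4) == 0) {
            lo--;
            hi++;
        }

        if (lo != i) {
            /* Keep only the call out of lines[lo..hi]. */
            lines[lo] = lines[i];
            memmove(&lines[lo + 1], &lines[hi + 1],
                    (n - hi - 1) * sizeof *lines);
            n -= hi - lo;
            i = lo;
        }
    }
    return n;
}
```

This is only safe for routines known to preserve the saved registers, which is why it is tied to a specific callee name rather than applied to every call.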

So this strategy will solve the issues, except that the call to "____sdcc_lib_setmem_hl" is built recursively (appending a "-2" if more "ld (hl),a;inc hl" appear after it) and I think the pushes and pops may prevent that. I haven't tried it yet so I am not sure.


This issue also affects our bugfix of sdcc's __critical code. sdcc has a language extension that allows code enclosed in "__critical { .. }" to be run with interrupts disabled. __critical does not simply di and ei: it determines the interrupt enable state, disables interrupts, and then restores the original interrupt enable state afterwards. This is great, except the code only works on CMOS Z80 processors due to the NMOS "ld a,i" bug. So what we are doing now is replacing the __critical code pattern with calls to our library functions "asm_z80_push_di" and "asm_z80_pop_ei", which do exactly the same thing, but the library is aware of whether the target is a CMOS or an NMOS Z80, so the code generated will always be correct. However, I had not noticed before that the substitutions may be bugged because of the issues mentioned above. So these will have to be fixed in the same way.
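The save, disable, restore behaviour described above can be modelled in a few lines of C. This simulates the interrupt enable flag in a variable, so it says nothing about how "asm_z80_push_di" and "asm_z80_pop_ei" are actually implemented; all names in the sketch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated interrupt enable flag and a small stack of saved states. */
static bool iff = true;
static bool saved[8];
static int depth = 0;

void model_set_interrupts(bool on) { iff = on; }
bool model_interrupts_enabled(void) { return iff; }

/* Model of "remember the current state, then disable interrupts". */
void model_push_di(void)
{
    assert(depth < 8);
    saved[depth++] = iff;
    iff = false;
}

/* Model of "restore the state remembered by the matching push". */
void model_pop_ei(void)
{
    assert(depth > 0);
    iff = saved[--depth];
}
```

The difference from a plain di/ei pair shows up when the section is entered with interrupts already disabled, or when sections nest: restoring the saved state leaves interrupts off in both cases, whereas an unconditional ei would turn them on.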

Anyway, I will see if I can get this done before tonight's build.



dom
Well known member
Posts: 2091
Joined: Sun Jul 15, 2007 10:01 pm

Post by dom »

> I am still missing one ingredient in the solution which I will try to add today before the build if I can find the time. Dom, if you can rebuild the zsdcc binary for osx that would be good too. Without doing that code generated when --opt-code-size is used may be incorrect. As mentioned, the updated windows zsdcc binary is available with the patch.

Quick reply: tonight's builds should have the latest binaries.



alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

Cheers dom. I believe all the issues are worked out for tonight's build.


