Optimize for speed? Unefficient code

siggi
Well known member
Posts: 343
Joined: Thu Jul 26, 2007 9:06 am


Post by siggi »

While compiling my ZX81 MIDI player I also had a look at the generated code, because some parts of the player are time critical.
I found places where inefficient code is generated.
This always happens when a variable is incremented or decremented as a statement on its own, e.g.
"ev++;".

The generated code increments the value in memory via registers, but then decrements the register again. That is useful for comparisons based on the old value of the variable, e.g.
"if (ev++) ..."
but it wastes time (and space) if only the increment is needed. The generated code is, e.g.:


;                debug_delay_calls++;
        C_LINE        211,"MidiPlayer.c"
        C_LINE        211,"MidiPlayer.c"
        ld        hl,(_debug_delay_calls)
        inc        hl
        ld        (_debug_delay_calls),hl
        dec        hl

The last "dec hl" is not necessary here. For bigger data types (e.g. long), the call to a library routine (for the long decrement) is likewise unnecessary.

So I did some optimizations by putting the generated assembler code into my C program (which is getting more and more ugly) and optimizing it by hand.
Is there a compiler option to avoid this?

Siggi
dom
Well known member
Posts: 1188
Joined: Sun Jul 15, 2007 10:01 pm

Post by dom »

A couple of things.

Most of the dead assembler elimination for sccz80 is done by copt. Running with -c-code-in-asm option will clobber some of the checks for a situation like this.

In this case, use a pre-increment rather than a post-increment.
siggi
Well known member
Posts: 343
Joined: Thu Jul 26, 2007 9:06 am

Post by siggi »

Thanks for that information.

Maybe you could put those kinds of things (like speed) into a wiki page that could be easily found?

Siggi
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

Be careful, though, because post-increment and pre-increment don't mean the same thing.

if (ev++) ...

means increment the value but use the old value in the if test. That's why the following "dec hl" must be there.

if (++ev) ...

means increment the value and use the new value for the if test.
dom
Well known member
Posts: 1188
Joined: Sun Jul 15, 2007 10:01 pm

Post by dom »

siggi wrote:
Thanks for that information.

Maybe you could put those kinds of things (like speed) into a wiki page that could be easily found?

Siggi

It's scattered about the place, but I've brought it together here: https://github.com/z88dk/z88dk/wiki/WritingOptimalCode

Some of the tips are based on my current conditional branch which will be merged in soon.
siggi
Well known member
Posts: 343
Joined: Thu Jul 26, 2007 9:06 am

Post by siggi »

Hi Dom
thanks again. That info helps a lot!

Siggi
siggi
Well known member
Posts: 343
Joined: Thu Jul 26, 2007 9:06 am

Post by siggi »

Hi Dom
here are my "benchmarks" using the latest compiler, and they give strange results!

My MIDI player has a counter which is incremented whenever a delay between two MIDI events (defined in the MIDI file) could not be met. That happens when the system (Zeddy player, USB stick holding the MIDI file, RS232 UART used to send the MIDI data) is too slow to get back from its work in time for the next MIDI event.
My first player version was compiled with the old Z88DK version dated December 2015. Playing my test song, I got 825 delay violations (the goal is 0).

After that I optimized the code for speed (also using your hints) and built it with the latest compiler, using these options:
zcc +zx81 -startup=2 -m -O2 --opt-code-speed=all -create-app -Cz--disable-autorun -vn -o MidiPlay.bin MidiPlayer.c

Running my test song I got 800 delay violations (good progress :) )
But when I compiled that optimized program again with the old compiler version and ran my test song again, I got a better result: 785 delay violations!
It seems that the old compiler version creates faster code than the new version ...

Then I compiled an old project where the size of the program is critical. I compiled it with these options:
zcc +zx81 -startup=2 -O3 -zorg=11192 -vn -DDRIVER=8192 -o ufm-11192.bin ufm-driver.c
which gave a file size of 5475 bytes (too big!) with the latest compiler.

With the old compiler version I got a file size of 5095 bytes (that is OK, the limit is 5192).

So the current state (at least for my projects) is:
the current compiler produces slower and bigger code than the old compiler (using the same source and compiler options).

???

Siggi
dom
Well known member
Posts: 1188
Joined: Sun Jul 15, 2007 10:01 pm

Post by dom »

That's odd. I can believe that the files may be bigger, since more stuff is being inlined, but slower? That shouldn't be the case at all, unless you're calling a lot of routines that use the index register.

Can you send me the sources and your binaries/.maps via email (dom /at/ z88dk /dot/ org)? I'll take a look.

The good news is that my tips worked!
siggi
Well known member
Posts: 343
Joined: Thu Jul 26, 2007 9:06 am

Post by siggi »

Hi Dom
e-mail is sent!

Regards
Siggi
dom
Well known member
Posts: 1188
Joined: Sun Jul 15, 2007 10:01 pm

Post by dom »

I'm working offline with Siggi on this, but it looks like the increase in size is due to library changes - probably the extra functionality within stdio and the importing of the new lib integer maths routines.

The slowdown may well be related to a library routine as well given the compiler generated code has only minor differences.
siggi
Well known member
Posts: 343
Joined: Thu Jul 26, 2007 9:06 am

Post by siggi »

This is the current state (including the speed optimizations) of my ZX81 MIDI player:
https://youtu.be/kD9Tkxjx7yg

:-)
Siggi
dom
Well known member
Posts: 1188
Joined: Sun Jul 15, 2007 10:01 pm

Post by dom »

That's really cool - I'm glad it's working so well.

Can you talk through that ZX81 setup? It doesn't look quite like my one.
siggi
Well known member
Posts: 343
Joined: Thu Jul 26, 2007 9:06 am

Post by siggi »

The Zeddy (a ZXNU: a ZX81 clone without ULA) is mounted on the left side of a small rack. Internally it has an interface for a VDRIVE2, so a USB stick can be used as mass storage. About the ZXNU:
http://forum.tlienhard.com/phpBB3/viewt ... f=2&t=1029
"Out of the box" the ZXNU has 80 KB of RAM, but I modded it to use 96 KB of the 128 KB RAM chip.

The backplane in the rack is connected to the Zeddy via a bus driver board (between the Zeddy and the left side of the rack). Seven cards can be connected to the backplane. Currently it is equipped with (from left to right):

- a sound card (AY compatible, active speakers on top of the rack)
- a keyboard buffer card for the external Memotech keyboard (see http://forum.tlienhard.com/phpBB3/viewt ... f=2&t=2745 )
- MMC card interface used as MEFISDOS drive
- a RS232 card (with 8251 UART) used for MIDI output (see http://forum.tlienhard.com/phpBB3/viewt ... f=2&t=2404 )
- a ZeddyNet (Ethernet) card (see http://forum.tlienhard.com/phpBB3/viewt ... 19#p10835)

The output of the serial board (RS232 voltage level) is converted on a small Vero board into a current-loop signal, as used by MIDI devices. The MIDI signal goes to a Yamaha synthesizer/keyboard.
cborn
Member
Posts: 10
Joined: Tue Oct 06, 2020 7:45 pm


Post by cborn »

Hello,
I don't know where to put my remark, so I'll try it here since the title says 'optimize'.
A thread on WOS mentions some z88dk asm code and how to optimize it.
It shows that the compiler (at that time) emitted an extra 'or a,a' after a 'dec a':
https://worldofspectrum.org/forums/disc ... ent_970778

I don't think the 'or a,a' is needed, but I really have no clue which part of which compiler produced this, or whether it still works like that.
OR resets most flags, while DEC can leave several different flag outcomes. Perhaps there are cases where the flags should be reset after a DEC A, but those will be rare and, as far as I can imagine, would only occur between separately compiled parts, not inside an asm loop. I hope I see this right and that it is useful.
dom
Well known member
Posts: 1188
Joined: Sun Jul 15, 2007 10:01 pm


Post by dom »

Why not just use memset for this "problem" - both compilers have logic to inline it where one or more parameters are const, using djnz as appropriate or ldir for longer blocks.

They can also inline:

memcpy
strcpy
strchr

Not that the z88dk library implementations are bad, but they are general purpose, so I think they assume 16-bit lengths, and you have to swallow the frame setup/call cost.
cborn
Member
Posts: 10
Joined: Tue Oct 06, 2020 7:45 pm


Post by cborn »

Hi,
actually I was not responding to optimizing the C code from 16 to 8 bit; I was pointing at the compiler output, i.e. the asm result, doing the same thing twice.
Two points:
a) djnz does NOT touch the Zero flag
b) 'or a,a' is used to set it afterwards

In the 8-bit variant djnz is avoided and a plain 'DEC A' is used.
Two points again:
a) DEC always affects the Zero flag
b) 'or a,a' just repeats that setting or clearing of the flag, and also overwrites the other flag results, e.g. resetting carry to 0

The compiler's output can be shorter and quicker if, IN CASE no djnz is used, the 'or a,a' is removed.
If I coded it manually, I would remove it.
So I think the compiler itself can be optimized, IF it really does (re)set the Zero flag twice like that.
And I will definitely look at the C functions you suggest, since I need to learn C instead of asm.
Edit:
I think using 'or a,a' is a kind of patch for 'djnz' and should only be used in some cases.
dom
Well known member
Posts: 1188
Joined: Sun Jul 15, 2007 10:01 pm


Post by dom »

Yes, the or a is redundant; it's one of those things that would be taken out by the peepholer.

However, my point stands: it's always best to use the library to achieve things if possible. Both the standard library and the target-specific library have a lot of functionality that should be used in preference to writing it yourself.