Flipping Bytes & Bits!
Flipping Bytes & Bits!
Hello there my fellow z88dk'ers
I was wondering, after having a look around on the interwebs, on how to horizontally flip a bitmap image so that you get a mirror - now with the oodles of power a PC has this is quite a trivial matter and can easily be brute forced - however implement this ( even for a small 5x13 cells [ 520 bytes ] ) on the humble z80 takes too long using such a method.
I have looked at off-loading this by simply including an already flipped bitmap image - however this eats away at the memory so I was wondering if there is a either a clever method involving a few instructions ( possibly an assembly rotate or two ) that can help me flip a byte horizontally such that
0x"10011100" would yield "00111001" or perhaps a lookup table of sorts to off-load the amount of ram require as well as the amount of instructions.
Currently I have a combination of ASM and C where bitmaps are written to a buffer and then that buffer is written to the ZX Spectrum "Display File"
just wondering if anyone out there has any idle thoughts, ideas or insight on how I can reduce the Ram requirements and not overly increase the render to buffer time.
Thanks for reading.
I was wondering, after having a look around on the interwebs, on how to horizontally flip a bitmap image so that you get a mirror - now with the oodles of power a PC has this is quite a trivial matter and can easily be brute forced - however implement this ( even for a small 5x13 cells [ 520 bytes ] ) on the humble z80 takes too long using such a method.
I have looked at off-loading this by simply including an already flipped bitmap image - however this eats away at the memory so I was wondering if there is a either a clever method involving a few instructions ( possibly an assembly rotate or two ) that can help me flip a byte horizontally such that
0x"10011100" would yield "00111001" or perhaps a lookup table of sorts to off-load the amount of ram require as well as the amount of instructions.
Currently I have a combination of ASM and C where bitmaps are written to a buffer and then that buffer is written to the ZX Spectrum "Display File"
just wondering if anyone out there has any idle thoughts, ideas or insight on how I can reduce the Ram requirements and not overly increase the render to buffer time.
Thanks for reading.
Re: Flipping Bytes & Bits!
It is in my wish/todo list to do something similar on the monochrome sprites.
The ideal approach depends on your needs, it could be necessary to swap also the colour attributes, have an knowh number (or proportion) of bytes to flip, etc..
Supercode provides the routines to reflect the characters on the X or Y axis or rotate them:
https://github.com/z88dk/techdocs/blob/ ... .asm#L5339
The ideal approach depends on your needs, it could be necessary to swap also the colour attributes, have an knowh number (or proportion) of bytes to flip, etc..
Supercode provides the routines to reflect the characters on the X or Y axis or rotate them:
https://github.com/z88dk/techdocs/blob/ ... .asm#L5339
Re: Flipping Bytes & Bits!
That's really helpful!
Thank you stefano!
Its given me an idea with a 32 byte index, breaking a byte into nibbles - using the lower nibble ( 0 - 3 = bits 1,2,3,4 ) as the index value... hmm something like this...
using the nibble patern value it can be used as an index = this value can then be "left shifted" 4 times to become the high nibble within a new byte
We can use a similar reverse method to "right shit" the "high nibble" bits 5,6,7 and 8, to then be bits 1,2,3,4 to use the index again and add these to the flipped byte.....
could it be that easy i wonder..... hmm... looks like I am coding tonight lol
Thank you stefano!
Its given me an idea with a 32 byte index, breaking a byte into nibbles - using the lower nibble ( 0 - 3 = bits 1,2,3,4 ) as the index value... hmm something like this...
Code: Select all
Nibble X Mirror index flipped value Change State
0000 = 0000 0 0 NC*
0001 = 1000 1 8 +7
0010 = 0100 2 4 HV
0011 = 1100 3 12 +9
0100 = 0010 4 2 HV
0101 = 1010 5 10 HV
0110 = 0110 6 6 NC*
0111 = 1110 7 14 HV
1000 = 0001 8 1 -7
1001 = 1001 9 9 NC*
1010 = 0101 10 5 HV
1011 = 1101 11 13 +2
1100 = 0011 12 3 -9
1101 = 1011 13 11 -2
1110 = 0111 14 7 HV
1111 = 1111 15 15 NC*
NC = no change
HV = Half Value
We can use a similar reverse method to "right shit" the "high nibble" bits 5,6,7 and 8, to then be bits 1,2,3,4 to use the index again and add these to the flipped byte.....
could it be that easy i wonder..... hmm... looks like I am coding tonight lol
Re: Flipping Bytes & Bits!
I should point out this is based on simple black/white bitmap image data as that's what I am working with...
Re: Flipping Bytes & Bits!
The usual way of flipping all 8 bits in an image is to use a lookup table for all the 256 combinations. It costs 256 bytes + a few bytes of code.
The alternative way to do it is to use assembly. This is usually too slow if you want to flip sprites in real time. Uses less bytes but slower.
A third way to to store both sides separately. Some games do this.
A 4th way is to draw direction independent sprites.
PS. I was surprised that sp1 couldn't do this, as it already used so many bytes for storage already.
PPS. On machines like Game Boy people use hardware flipping.
The alternative way to do it is to use assembly. This is usually too slow if you want to flip sprites in real time. Uses less bytes but slower.
A third way to to store both sides separately. Some games do this.
A 4th way is to draw direction independent sprites.
PS. I was surprised that sp1 couldn't do this, as it already used so many bytes for storage already.
PPS. On machines like Game Boy people use hardware flipping.
Re: Flipping Bytes & Bits!
SP1 works on a fixed sprite size. The picture data is provided by the programmer, thus the flip/rotate routines I mentioned could be applied to such data to save space.
I'm not very good with SP1 but a data manipulation routine should be easy to implement.
EDIT: your idea is very interesting. Dealing with nibbles is not very fast on the Z80 (try using RLD on the entire screen) but often it's just a matter of finding the right way to do things..
I'm not very good with SP1 but a data manipulation routine should be easy to implement.
EDIT: your idea is very interesting. Dealing with nibbles is not very fast on the Z80 (try using RLD on the entire screen) but often it's just a matter of finding the right way to do things..
Re: Flipping Bytes & Bits!
There is a function available to mirror a byte, mainly used with the z180 CSIO for SPI interfaces.I was wondering if there is a either a clever method involving a few instructions ( possibly an assembly rotate or two ) that can help me flip a byte horizontally such that
0x"10011100" would yield "00111001" or perhaps a lookup table of sorts to off-load the amount of ram require as well as the amount of instructions.
https://github.com/z88dk/z88dk/blob/mas ... mirror.asm
It might be too slow for what you need though.
Re: Flipping Bytes & Bits!
For those people who are still skeptical about a lookup table, here's a link from a random disassembly of a commercial game back in the day:
https://ultrabolido.github.io/skRex/Rex ... 44381.html
I've dissected many games before, and there is a similar table in many commercial games. Usually there's the same pattern when you look at the bytes displayed as pixels.
This is really the fastest way to save many bytes especially when you have very large sprites (unfortunately people often won't use large sprites with SP1, but that could be because one has to store flipped versions of these sprites).
https://ultrabolido.github.io/skRex/Rex ... 44381.html
I've dissected many games before, and there is a similar table in many commercial games. Usually there's the same pattern when you look at the bytes displayed as pixels.
This is really the fastest way to save many bytes especially when you have very large sprites (unfortunately people often won't use large sprites with SP1, but that could be because one has to store flipped versions of these sprites).
Re: Flipping Bytes & Bits!
Just for your information, Alvin and I had a small discussion about the problem of flipped sprites before: viewtopic.php?t=8912stefano wrote: ↑Fri Dec 01, 2023 6:26 pm SP1 works on a fixed sprite size. The picture data is provided by the programmer, thus the flip/rotate routines I mentioned could be applied to such data to save space.
I'm not very good with SP1 but a data manipulation routine should be easy to implement.
EDIT: your idea is very interesting. Dealing with nibbles is not very fast on the Z80 (try using RLD on the entire screen) but often it's just a matter of finding the right way to do things..
My current (2023) opinion is that sp1 is brilliant and is optimised for speed and is _the_ library to use in most cases. Unfortunately, the focus on speed and versatility and 1 pixel movement means that every sprite takes a huge chunk of memory (for example the shifting tables). The fact that most games use mirrored sprites, and as a result the need for authors to store separate versions of it, means that all practical uses of sprites in SP1 will end up with few small 16x16 sprites.
Supporting Mirroring would help, but now that I understand that SP1 is basically a JIT sprite compiler, means that adding support for mirroring could be a lot of work, and is possibly not backward compatible.
This (=the huge memory footprint) is also one of the main reasons that I haven't used SP1 lately.
Re: Flipping Bytes & Bits!
Hello my fellow z88dk'ers
A big thanks for all the input!!
So i have been experimenting with various results - so here are some "solutions"
bare with me.... this is a long one ( you have been cautioned ).
Preprocessor
so first we need a function that can actually mirror the byte so that "10111011" = "11011101" / "10000000" = "00000001"
so its time to get shifty
*please note that I cast the return of byte. this is because "byte promotion to int" may exist here, this is more my habit than it being a requirement
So the first test is to create an index of all possible mirrored bytes, 256 entities to be sure.
Thats the retainer file, while initially 256bytes sounds like a lot for a 128k machine, truth be said this reduces the overall memory foot print considerably, so much so I can now have three times as much "tile data" in the same space.
so now "mbyte_buffer[]" filled with mirrored byte values so to get a mirrored byte value it is as simple as taking the value of the one you have and using this as an index.
the following function writes 8 bytes ( a cell ) to "*dest" which is a zx spectrum DISPLAY FILE pointer.
to use this there is a small dependency and index to help speed things up
With this I decided to break the "Display File" in to CELL's - 24 (0-23) x 32 ( 0 -31 ) respectively. Those that know the ZX Spectrum Display File will lso know that it is in fact segmented ( non linear indexing ) in its vector alignment, distributed over 3 fields.
So instead of having to pre-calculate an X/Y Cell coordinate for the Displayfile I am using 24 bytes to represent the starting "Y" coordinate of the Cell in the Display File. With this I can simply add the "X" coordinate to get to the CELL I want to manuipulate. One could use a 192 byte buffer to represent all the byte lines of the "Y" axis - but thats for another day
Thats the retainer
Indeed its neither pretty or elegant, but its does get the job done.
Now one more function so we can derive the X/Y coordinates of the DISPLAY FILE
yeah... i know I am lazy...
lets tie this together with an example
time for a little inline Z80 assembly
The above just puts 8 bytes of data into the DISPLAY FILE, starting from *scr_ptr, and incrementing it by 256 bytes to get to the next line down in the CELL.
back to the C code
the "write_cell_mirror()" function is very similar to the assembly routine, however, it uses the value of the "*src" byte as an index when writing it the Display File "*dest"
This all seems to work well and indeed performs as expected... however when I use this code mentioned above
this code essentially does the byte conversion / shifting on the fly not needing a 256 byte ( index ) buffer
So I have tested both methods here, there is a caveat however, over 50 cells to the Display File *directly* and there is very little difference in speed between the two solutions.
The caveat being is that my project uses a buffer to "put the images into" and then "draws" this to the screen using the ASM function seen above.
So here are some lingering questions -
Is there a way I can compare the compiled output of ZCC, i.e. to output to assembly?
Is there a faster way to do this, perhaps a z80 assembly version of the byte mirror shifting
is this post way too long?
Thanks for reading
Z.
A big thanks for all the input!!
So i have been experimenting with various results - so here are some "solutions"
bare with me.... this is a long one ( you have been cautioned ).
Preprocessor
Code: Select all
// typedef
typedef unsigned char ubyte;
typedef unsigned int uword;
so its time to get shifty
Code: Select all
ubyte mbyte( ubyte byte );
ubyte mbyte( ubyte byte )
{
// bit shift the byte
byte = ( byte & 0xF0) >> 4 | ( byte & 0x0F) << 4;
byte = ( byte & 0xCC) >> 2 | ( byte & 0x33) << 2;
byte = ( byte & 0xAA) >> 1 | ( byte & 0x55) << 1;
return ( ubyte ) byte;
}
So the first test is to create an index of all possible mirrored bytes, 256 entities to be sure.
Code: Select all
// global vectors
ubyte mbyte_buffer[256];
Code: Select all
// functions
void init_mbyte_buffer( void );
void init_mbyte_buffer( void )
{
register ubyte index;
for( index = 0; index < 255; index +=1 )
{
mbyte_buffer[ index ] = mbyte( index );
}
}
the following function writes 8 bytes ( a cell ) to "*dest" which is a zx spectrum DISPLAY FILE pointer.
Code: Select all
void write_mirror_cell( void *dest, ubyte *src );
void write_mirror_cell( void *dest, ubyte *src )
{
register ubyte index = 0;
for( index =0; index < 8; index += 1 )
{
*dest = mbyte( src[ index ]);
dest += 256;
}
}
Code: Select all
// system static defines
#define Screen_File 16384;
So instead of having to pre-calculate an X/Y Cell coordinate for the Displayfile I am using 24 bytes to represent the starting "Y" coordinate of the Cell in the Display File. With this I can simply add the "X" coordinate to get to the CELL I want to manuipulate. One could use a 192 byte buffer to represent all the byte lines of the "Y" axis - but thats for another day
Code: Select all
// Global Vectores
uword ScreenBitmap_24[ 24 ];
Thats the retainer
Code: Select all
void init_ScreenBitmap_24( void ); // initializes an array of Screen File pointer address in the Y dimension
void init_ScreenBitmap_24( void ) // Array size is 24 bytes, reduces add and offset opps by half
{
register uword index = 0, row = 0; // this implementation divides up the screen into 32 cells wide ( 0 - 31 ) and 24 Cells long ( 0 - 23 )
// As the spectrum doesn't have a linear bitmap to the screen a small translation of cell location
// based on the tirciary of the screen with a Y lookup table
while ( index < 24 )
{
ScreenBitmap_24[ index ] = Screen_File;
if( index > 7 ) // if index is in the 2nd or 3rd bank of the screen file
{
if( index > 15 )
{
ScreenBitmap_24[ index ] = ScreenBitmap_24[ index ] + 3584; // the third bank offset of the screen file (less 256)
}
else
{
ScreenBitmap_24[ index ] = ScreenBitmap_24[ index ] + 1792; // the second bank offset of the screen file (less 256)
}
}
ScreenBitmap_24[ index ] = ScreenBitmap_24[ index ] + ( index * 32 ); // the Y cell location of the Screen File Bank
index = index +1;
};
}
Now one more function so we can derive the X/Y coordinates of the DISPLAY FILE
Code: Select all
void *get_screen_cell_XY( ubyte x, ubyte y ); // returns the Screen File address of the cell
void *get_screen_cell_XY( ubyte x, ubyte y ) // on the screen
{
return (void *) ScreenBitmap_24[ y ] + x;
}
lets tie this together with an example
Code: Select all
uword main( void )
{
void *scr_ptr;
ubyte cdata[8];
// a simple left leaning triangle
cdata[ 0 ] = 128;
cdata[ 1 ] = 192;
cdata[ 2 ] = 224;
cdata[ 3 ] = 240;
cdata[ 4 ] = 248;
cdata[ 5 ] = 252;
cdata[ 6 ] = 254;
cdata[ 7 ] = 255;
// now lets put this to the screen
scr_ptr = get_screen_cell_XY( 5, 5 );
asm_Write_Screen_Cell( scr_ptr, &cdata[0]); // This is a fast ( and very simple assembly routine to draw a cell
Code: Select all
void asm_Write_Screen_Cell( void *scr_ptr, unsigned char *cell_data ); // Optimized [ 278 Ts ]
void asm_Write_Screen_Cell( void *scr_ptr, unsigned char *cell_data )
{
#asm
ld hl,2 ; // [ 10 Ts ] jump the return address of the stack
add hl,sp ; // [ 11 Ts ] by incrementing the stack ptr by 2
ld c,(hl) ; // [ 7 Ts ] load the address of the *byte ptr from the stack to bc
inc hl ; // [ 6 Ts ] "" *Remember Least Significant Byte First
ld b,(hl) ; // [ 7 Ts ] ""
ld a,(bc) ; // [ 7 Ts ] store the value of BC in a "cell_data"
ld hl,4 ; // [ 10 Ts ] add a four byte jump to the stack
add hl,sp ; // [ 11 Ts ] getting to scr_ptr
ld e,(hl) ; // [ 7 Ts ] get the address
inc hl ; // [ 6 Ts ] of the
ld d,(hl) ; // [ 7 Ts ] "*scr_ptr"
ex de,hl ; // [ 4 Ts ] exchange registers
// [ 93 Ts ] Initialization
; // rolled out for speed
ld (hl), a ; // [ 7 Ts ] put the byte in the screen file
// [ 24 Ts ]
inc h ; // [ 4 Ts ] add 256 to get to the next row in the cell
inc bc ; // [ 6 Ts ] add 1 to the address of the byte
ld a,(bc) ; // [ 7 Ts ] put the value of the pointer of BC into A
ld (hl), a ; // [ 7 Ts ] put the byte in the screen file
// [ 24 Ts ]
inc h ; // add 256 to get to the next row in the cell
inc bc ; // add 1 to the address of the byte
ld a,(bc) ; // put the value of the pointer of BC into A
ld (hl), a ; // put the byte in the screen file
// [ 24 Ts ]
inc h ; // add 256 to get to the next row in the cell
inc bc ; // add 1 to the address of the byte
ld a,(bc) ; // put the value of the pointer of BC into A
ld (hl), a ; // put the byte in the screen file
// [ 24 Ts ]
inc h ; // add 256 to get to the next row in the cell
inc bc ; // add 1 to the address of the byte
ld a,(bc) ; // put the value of the pointer of BC into A
ld (hl), a ; // put the byte in the screen file
// [ 24 Ts ]
inc h ; // add 256 to get to the next row in the cell
inc bc ; // add 1 to the address of the byte
ld a,(bc) ; // put the value of the pointer of BC into A
ld (hl), a ; // put the byte in the screen file
// [ 24 Ts ]
inc h ; // add 256 to get to the next row in the cell
inc bc ; // add 1 to the address of the byte
ld a,(bc) ; // put the value of the pointer of BC into A
ld (hl), a ; // put the byte in the screen file
// [ 24 Ts ]
inc h ; // add 256 to get to the next row in the cell
inc bc ; // add 1 to the address of the byte
ld a,(bc) ; // put the value of the pointer of BC into A
ld (hl), a ; // put the byte in the screen file
ret ; // We are done here
// [ 278 Ts ] Execution Time
#endasm
}
back to the C code
Code: Select all
scr_ptr = get_screen_cell_XY( 6, 6); // get the DISPLAY FILE address for CELL X/Y
write_mirror_cell( scr_ptr, &cdata[0] );
}
Code: Select all
void write_cell_mirror( ubyte *mbuffer, void *dest, ubyte *src );
void write_cell_mirror( ubyte *mbuffer, void *dest, ubyte *src )
{
ubyte index=0;
// l0
*dest = mbuffer[ src[ index ] ];
// update for Display File;
// l1
dest = dest + 256;
index += 1;
*dest = mbuffer[ src[ index ] ];
// l2
dest = dest + 256;
index += 1;
*dest = mbuffer[ src[ index ] ];
// l3
dest = dest + 256;
index += 1;
*dest = mbuffer[ src[ index ] ];
// l4
dest = dest + 256;
index += 1;
*dest = mbuffer[ src[ index ] ];
// l5
dest = dest + 256;
index += 1;
*dest = mbuffer[ src[ index ] ];
// l6
dest = dest + 256;
index += 1;
*dest = mbuffer[ src[ index ] ];
// l7
dest = dest + 256;
index += 1;
*dest = mbuffer[ src[ index ] ];
}
Code: Select all
void write_mirror_cell( void *dest, ubyte *src );
void write_mirror_cell( void *dest, ubyte *src )
{
register ubyte index = 0;
for( index =0; index < 8; index += 1 )
{
*dest = mbyte( src[ index ]);
dest += 256;
}
}
Code: Select all
scr_ptr = get_screen_cell_XY( 8, 6);
write_mirror_cell( scr_ptr, &cdata[0] );
return 0;
}
The caveat being is that my project uses a buffer to "put the images into" and then "draws" this to the screen using the ASM function seen above.
So here are some lingering questions -
Is there a way I can compare the compiled output of ZCC, i.e. to output to assembly?
Is there a faster way to do this, perhaps a z80 assembly version of the byte mirror shifting
is this post way too long?
Thanks for reading
Z.
Re: Flipping Bytes & Bits!
If you can limit your code to blind computation, then we have "ticks", it is part of the z88dk kit.
It is a modified CP/M emulator benchmarking the code for you.
Optimizing the code for speed usually involves assembly. I suppose that most of the optimization attempts will be something like "choosing the other way round" everytime ? You could try the Timmy's approach and compare the timing.
It is a modified CP/M emulator benchmarking the code for you.
Optimizing the code for speed usually involves assembly. I suppose that most of the optimization attempts will be something like "choosing the other way round" everytime ? You could try the Timmy's approach and compare the timing.
Re: Flipping Bytes & Bits!
If you are working in C instead of assembly, there are also interesting routines in spectrum.h, like , instead of writing your own get_screen_cell_XY(). Then again, you learn a lot when you trying writing them yourself.
More info: https://github.com/z88dk/z88dk/wiki/LIB ... nipulators
I personally will write assembly for these routines. Maybe I can try to write one soon.
Code: Select all
uchar *zx_cyx2saddr(uchar row, uchar col)
More info: https://github.com/z88dk/z88dk/wiki/LIB ... nipulators
I personally will write assembly for these routines. Maybe I can try to write one soon.
Re: Flipping Bytes & Bits!
I'm reviving this old thread to inform you that I've pulled a function to flip vertically the picture in a monochrome sprite, it's 8080 compatible.
I also updated the library to get a sprite plotting variant for the 8080.
The horizontal flipping is way more tricky, and surely slow but I'm trying
I also updated the library to get a sprite plotting variant for the 8080.
The horizontal flipping is way more tricky, and surely slow but I'm trying
Re: Flipping Bytes & Bits!
This is probably the best collection of "Bit Twiddling Hacks": https://graphics.stanford.edu/~seander/ ... rseObvious
This is in C, but should give you an idea for assembler. Though I agree with others in this thread, on many 8 or 16 bit platforms a lookup table is the best solution if you can spare the memory. (On more modern platforms the cache loads may invalidate any savings from the table lookup.)
This is in C, but should give you an idea for assembler. Though I agree with others in this thread, on many 8 or 16 bit platforms a lookup table is the best solution if you can spare the memory. (On more modern platforms the cache loads may invalidate any savings from the table lookup.)
Re: Flipping Bytes & Bits!
Wow, it is very exaustive !
My implementation is still being debugged but works fine already on big pictures (I'm fixing the "single byte" conditions).
The nice feature I'm getting is a horizontal flip algorithm working on any picture size, with an even or odd number of horizontal bytes or pixels.
What I noticed in assembler is that, taking apart the tricks like pre-computed tables and loop unrolling, the actual loop optimization matters even more: if you run out of spare registers or need slow instructions, then a different approach which in theory should be slower could provide smaller and faster code.
Also the apparent size paradox must be considered, those accumulator related instructions fly.. you can put many in sequence and probably you'll be faster than a single operation with 16bit registers.
My implementation is still being debugged but works fine already on big pictures (I'm fixing the "single byte" conditions).
The nice feature I'm getting is a horizontal flip algorithm working on any picture size, with an even or odd number of horizontal bytes or pixels.
What I noticed in assembler is that, taking apart the tricks like pre-computed tables and loop unrolling, the actual loop optimization matters even more: if you run out of spare registers or need slow instructions, then a different approach which in theory should be slower could provide smaller and faster code.
Also the apparent size paradox must be considered, those accumulator related instructions fly.. you can put many in sequence and probably you'll be faster than a single operation with 16bit registers.
Re: Flipping Bytes & Bits!
By the way, I'm putting the new sprite modification functions here:
https://github.com/z88dk/z88dk/tree/mas ... gfx/common
https://github.com/z88dk/z88dk/tree/mas ... gfx/common