[z88dk-dev] Option for automatic malloc() configuration

Bridge to the z88dk-developers mailing list
Post Reply
stefano
Well known member
Posts: 2137
Joined: Mon Jul 16, 2007 7:39 pm

[z88dk-dev] Option for automatic malloc() configuration

Post by stefano »

I think I finally found a way to use ASMTAIL to configure the heap for malloc() automatically !
the prototype at the moment is in cpm_crt0.asm and osca_crt0.asm.
It is optionally activated with the ‘-DAMALLOC’ option and takes 3/4 of the free memory (all the space between the ens of the program and the Stack Pointer) for the malloc heap, and leaves the reminder for stack.
It seems good to me... opinions ?
stefano
Well known member
Posts: 2137
Joined: Mon Jul 16, 2007 7:39 pm

Post by stefano »

I think I found an acceptable solution, the -DAMALLOC1, -DAMALLOC2, -DAMALLOC3 (or -DAMALLOC, which is the same) will respecively get 1/4, 2/4 or 3/4 of the free memory for malloc() leaving the remaining space to the stack ;)
The Alvin's lib is impressively well designed but I think it was lacking a simple way to use it ..
stefano
Well known member
Posts: 2137
Joined: Mon Jul 16, 2007 7:39 pm

Post by stefano »

The new option is ready !



Simplest way to use it:

zcc +target (...) -DAMALLOC -lmalloc program.c



In some case (i.e. the ZX Spectrum) it makes no sense to leave space for the stack, because it is normally placed before the compiled program: for such platforms any option will leave the whole memory for malloc(); if the stack is on top of memory then it is possible to roughly tell which portion of the free space is for malloc() and what has to be left for the stack with -DAMALLOC1.. -DAMALLOC3.

AMALLOC with no trailing numbers will act as -DAMALLOC3 where applicable.

ROM targets and FAR memory models are not supported, use the (very flexible) fuctions declared in malloc.h (see the wiki docs).



Since we have a lot of targets, I could not test it everywhere and it is possible that we have special cases which will not work (yet) with the new option.
siggi
Well known member
Posts: 344
Joined: Thu Jul 26, 2007 9:06 am

Post by siggi »

Hmm. somehow my message was lost, so a new attempt:
-----------------------------------------------------------------

Hi Stefano
what do I have to do when compiling for my ZX81
sometimes I compile for address 16514 (P file), having the stack behind the program.
sometimes I compile for address 32K or 40K, having the stack before the program.

Do I have to use different options?

Siggi
Last edited by siggi on Wed Jun 19, 2013 9:29 am, edited 1 time in total.
stefano
Well known member
Posts: 2137
Joined: Mon Jul 16, 2007 7:39 pm

Post by stefano »

Hmm. somehow my message was lost, so a new attempt:
bear with us, the 'developers' topic is bound to the sourceforge equivalent and we are not able anymore to keep the sync bi-directional.. emails are received, by the way.
what do I have to do when compiling for my ZX81>sometimes I compile for address 16514 (P file), having the stack behind the program.>sometimes I compile for address 32K or 40K, having the stack before the program.>>Do I have to use different options?
Siggi
Ciao Siggi, I was looking forward to your question :)
The -DAMALLOC option is just a trick to simplify your life and to help keeping the code portable when possible, it is perfect for a standard zx81, let's say with 16K.

We both know that some memory expansion does not permit code execution and that the zx81 stack does not go over the 16K space automatically, so some of the possible custom memory maps are for experienced zx81 programmers and they depend on how the zx81 is configured.
The Alvin's malloc library is a wonder, so let me show few possible ways to get the maximum from it:

1 - you have a normal z88dk style zx81 program, but you have more RAM in non-runnable locations, i.e. overlapping the ROM (supposing it is possible) or over the 48K
.. use -DAMALLOC1 to leave as much space as possible to the stack, then use sbrk to reserve extra block, i.e. sbrk(49152,16384)
Note that you can add more blocks if you have more space somewhere else to chain-in for malloc().

2 - you have a normal z88dk style zx81 program on HRG mode, but you moved the HRG page to the upper memory
.. use -DAMALLOC1 .. 3 depending on how much space you wish to leave for stack and add the free space left over the HRG page with sbrk

3 - you are Siggi
.. define your own PTR for heap (must be a public long variable), initialize it and add the memory blocks you wish with sbrk:
http://www.z88dk.org/wiki/doku.php?id=l ... allocation
siggi
Well known member
Posts: 344
Joined: Thu Jul 26, 2007 9:06 am

Post by siggi »

Ciao Stefano

thanks for the additional information :-)

Siggi

PS: I also used ASMTAIL (here: mytail) to find at runtime free space to store URLs in my Zeddyfox-Web-Browser, depending on the address of main():
#define FREE_MEM ((main >= 0x8000) ? mytail : 0xC000)



------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
stefano
Well known member
Posts: 2137
Joined: Mon Jul 16, 2007 7:39 pm

Post by stefano »

Good idea, I'm glad to know it works !


-----Messaggio originale-----
From: siggi (siggi@...)
Sent: Friday, June 21, 2013 10:15 PM
To: z88dk-developers@...
Subject: Re: [z88dk-dev] Option for automatic malloc() configuration

Ciao Stefano

thanks for the additional information :-)

Siggi

PS: I also used ASMTAIL (here: mytail) to find at runtime free space to
store URLs in my Zeddyfox-Web-Browser, depending on the address of main():
#define FREE_MEM ((main >= 0x8000) ? mytail : 0xC000)



------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev



------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

I think it's good you've found a way to automatically set up the heap by default -- lots of people just want things to work out of the box but you can't really do that for a lot of situations on a small memory machine. I am moving more toward declarations of code and data segments and will probably do that for a default heap segment too.

There is a new version of malloc percolating that improves performance and adds a whole lot more besides. Part of the algorithm is similar to sdcc's allocator which allows O(1) realloc and free. This was one thing I wanted to revisit since some library functions needed to do realloc to accommodate incoming characters from a stream. So it's going to be:

(1) Again, support for multiple heaps; I think this is very important and z88dk's allocator is the only one that has this:

* different threads can have their own heaps so no lock is required to allocate memory (there are multithreading z80 systems out there!)
* different memory configurations can have their own heap so you can manage near memory allocation on bankswitched machines.
* different heaps can hold different kinds of memory. For example, the zx81 cannot execute code out of some memory areas so you could have one heap to allocate memory from a region that can execute code and another to allocate from a region that cannot.

(2) Again, any number of disjoint memory blocks can be added to a heap.

This allows scattered bits of available ram from around the memory map to be added to the heap

(3) New: allocate memory at a fixed address

You can request a memory allocation from a fixed address. It is not uncommon that a program needs to have memory at a specific address. Of course you're more likely to succeed in getting the block if the request is made at the start of the program.

(4) New: allocate memory from the largest available block in the heap

This would be done when the block will be constantly realloced to expand size as the program runs. By starting with the largest available block, the number of memory moves caused by realloc will be minimized, and likely zero if other memory is not freed while the realloc is going on. The new realloc algorithm is very fast when a memory block can be expanded in place.

(5) New: allocate aligned memory block

You can specify that the start address of the block needs to be aligned to some power of 2. So, eg, if you need a block of memory aligned on a 256-byte boundary for a table, you can now request one.


The allocation algorithm is still first fit currently but I have been thinking about first fit with hint where the hint would be a block following the one last allocated. Normally with a first fit algorithm you start at the beginning of the heap and linearly search for an empty spot big enough to hold it. The current z88dk implementation only holds a list of free blocks so this is not a problem. However this new implementation that allows O(1) realloc is similar to sdcc where you also traverse blocks that have already been allocated. So, if you've made 30 allocations (and no frees so that the allocations are adjacent) the first fit algorithm is searching fruitlessly through 30 blocks in the heap without any space available. This is potentially quite slow. The hint would tell the allocator to start search from the last block allocated rather than from the beginning of the heap each time.

I've been waffling on this since it would make malloc faster and I could make sure realloc hangs onto a large block for longer. But I'm not sure on such a small memory machine if there would be a lot of allocated blocks to search through. More importantly, I'm thinking the hint algorithm might lead to much worse fragmentation issues since it would distribute allocations throughout the heap's memory space. I don't know if you guys have any thoughts on that?


Anyway there are other means of memory allocation too. The balloc library is the very fast block memory allocator for small fixed size memory blocks. One byte overhead per block and no fragmentation if you can stick to fixed sizes.

There is also a compacted memory allocator coming up that will be used for environment variables. This one you provide a fixed memory space and you can add and remove items from it. The pointers and items themselves are stored in the block, with the data automatically moved around to keep allocations compacted. Out of block pointers can also be added (this is required by some environment C functions) which reference data outside the block that is not managed by the block. The overhead is two bytes per block for a pointer. Freeing would be slower as the block needs to be kept compacted but there is no fragmentation.

I'm not sure if the malloc+hint is worthwhile given the potential fragmentation problem and other options.



------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

I settled on the following:

First fit algorithm. I do believe next fit has a much higher chance of fragmenting the heap into uselessness on small memory machines. The allocation algorithm has changed to allow O(1) realloc if the block can be resized in place and is the same as sdcc's. The heap now consists of a linked list of regions (contiguous memory added to the heap using sbrk), with each region containing internally linked memory blocks that have been allocated. The change in allocation algorithm means malloc must search through currently allocated blocks as well as free blocks but I don't think that is a high price to pay for the O(1) realloc. sdcc implements a doubly linked list of free blocks so has an O(1) free but I think the associated six byte overhead is too high, so I've chosen four byte overhead and O(n) free. So, O(1) realloc and O(n) malloc/free.

The functions:

malloc(size) :- first fit algorithm, O(n). Malloc(0) returns 0 without error indication.

realloc(p, size) :- O(1) if existing block can be resized, otherwise O(n). Allocation is *largest block first.* In particular, realloc(0, size) will get your memory request placed in the largest block in the heap as opposed to malloc(size) which would give you the first block large enough. You're guaranteed the returned block has maximum resizing potential. Realloc(0,0) returns a 0 sized memory block from the largest block in the heap.

calloc :- as malloc with allocated block initialized to zero

free(p) :- O(n)

aligned_alloc(alignment, size) :- O(n) allocation with resulting pointer aligned to power of 2 address. This is a proposed C1x function

fixed_alloc(p, size) :- O(n) attempt to allocate size bytes at address p. Not proposed by anyone but copies aligned_malloc naming style

sbrk(p, size) :- add size bytes at address p to the heap



The above functions use an implied process heap and actually call into these heap allocation functions which allows the program to specify a heap to allocate from.


HeapAlloc :- malloc
HeapRealloc :- realloc
HeapCalloc :- calloc
HeapFree :- free
HeapAllocAligned :- aligned_alloc
HeapAllocFixed :- fixed_alloc
HeapSbrk :- sbrk


The Heap* functions are using Win32's naming. I don't see any other standard names for allocating from different heaps. The closest I've found is HP's amalloc() which allocates out of an arena (heap in z88dk terminology) but it doesn't seem common at all. The funny thing is the more I look into far memory allocation, the more I realize the implementation has to follow what is done in Win32 rather than linux/unix. The linux/unix mmap for large blocks relies on an MMU to keep virtual memory contiguous whereas Win32's heaps and virtual alloc seems to have been invented before good MMUs, which is the path we have to take on the z80.



------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/ ... stg.clktrk
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

aligned_alloc(alignment, size) :- O(n) allocation with resulting pointer aligned to power of 2 address. This is a proposed C1x function

fixed_alloc(p, size) :- O(n) attempt to allocate size bytes at address p. Not proposed by anyone but copies aligned_malloc naming style
Damn, I just realized aligned_alloc(a, size) requires size to be a multiple of the alignment. That pretty much makes it useless :P posix_memalign has an interface I don't like so maybe we should go to memalign(). Memalign is now deprecated in favour of posix_memalign, apparently because it doesn't necessarily require the pointer returned to be freeable with free.

memalign has the better interface, so I think replace the above with:

memalign(alignment, size) and make the pointer returned freeable

memfixed(p, size) for allocation at fixed address to copy the naming

Then add posix_memalign as option, which will do the same thing as memalign but with different interface. And add aligned_alloc as option since it appears to be coming out as part of C1x.



------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/ ... stg.clktrk
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

Am 11.07.2013 09:40, schrieb alvin (alvin_albrecht@...):
aligned_alloc(alignment, size) :- O(n) allocation with resulting
pointer aligned to power of 2 address. This is a proposed C1x
function

fixed_alloc(p, size) :- O(n) attempt to allocate size bytes at
address p. Not proposed by anyone but copies aligned_malloc naming
style
Damn, I just realized aligned_alloc(a, size) requires size to be a
multiple of the alignment. That pretty much makes it useless :P
Why do you think the requirement on the size makes aligned_alloc() useless?
Philipp
P.S.: aligned_alloc() made it into the C11 standard, which has been
around for some time now; we already support some of the C11 keywords in
sdcc.
I'm not sure if this reply is making it into the forum since it's taking a while so I'll just quote it in.

I should be more careful -- aligned_alloc() is not useless but its utility is much diminished on a small memory machine. The requirement that size be a multiple of alignment can potentially eat up a lot of ram in cases when the user does not need a multiple of alignment bytes, so this is not really an embedded-friendly function. I readily acknowledge that a lot of the time you actually want to request a multiple of alignment bytes but to actually require it is an unnecessary restriction. I can't imagine why the standards committee would require this, except to optimize implementation. What I really want to have is memalign, which offers the same facility but allows the user to decide if he wants size to be a multiple of alignment. align_alloc can then be regarded as a specialization of memalign.

I hadn't realized C11 had become a standard -- the stuff that is on my hard drive must be more than two years old as I am consulting the "proposed C1x standard" there. I'll have to update those files to make sure I have the latest.



------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/ ... stg.clktrk
Post Reply