<HTML>
<HEAD>
<TITLE> Conservative GC Algorithmic Overview </TITLE>
<AUTHOR> Hans-J. Boehm, HP Labs (Much of this was written at SGI)</author>
</HEAD>
<BODY>
<H1> <I>This is under construction</i> </h1>
The heap is managed in chunks of <TT>HBLKSIZE</tt> bytes,
typically on the order of the page size.
<P>
Large block sizes are rounded up to
the next multiple of <TT>HBLKSIZE</tt> and then allocated by
<TT>GC_allochblk</tt>. Recent versions of the collector
use an approximate best fit algorithm by keeping free lists for
several large block sizes.
The actual
implementation of <TT>GC_allochblk</tt>
is significantly complicated by black-listing issues
(see below).
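The rounding and the size-segregated search can be sketched as follows. This is a hedged illustration only: <TT>round_to_blocks</TT>, <TT>alloc_hblk</tt>, <TT>N_LISTS</tt>, and the free-list layout are invented stand-ins, and the real <TT>GC_allochblk</tt> additionally handles splitting, coalescing, and black-listing.

```c
#include <stddef.h>

#define HBLKSIZE 4096   /* illustrative; the real value is platform-dependent */

/* Round a large request up to the next multiple of HBLKSIZE. */
static size_t round_to_blocks(size_t bytes) {
    return (bytes + HBLKSIZE - 1) / HBLKSIZE * HBLKSIZE;
}

/* Approximate best fit: free lists segregated by block size; search
   upward from the smallest size class that can satisfy the request. */
#define N_LISTS 8
struct hblk { struct hblk *next; size_t size; };
static struct hblk *free_lists[N_LISTS];  /* slot i: blocks of (i+1)*HBLKSIZE */

static struct hblk *alloc_hblk(size_t bytes) {
    size_t blocks = round_to_blocks(bytes) / HBLKSIZE;
    for (size_t i = blocks - 1; i < N_LISTS; i++) {
        if (free_lists[i] != NULL) {
            struct hblk *h = free_lists[i];
            free_lists[i] = h->next;
            return h;  /* a larger-than-needed block would be split; omitted */
        }
    }
    return NULL;  /* caller would expand the heap or trigger a collection */
}

/* Tiny usage example: seed one free 2-block chunk and allocate it. */
static int alloc_demo(void) {
    static struct hblk h2 = { NULL, 2 * HBLKSIZE };
    free_lists[1] = &h2;
    return alloc_hblk(HBLKSIZE + 1) == &h2 && free_lists[1] == NULL;
}
```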
<P>
Small blocks are allocated in chunks of size <TT>HBLKSIZE</tt>.
Each chunk is
dedicated to only one object size and kind. The allocator maintains
separate free lists for each size and kind of object.
<P>
Once a large block is split for use in smaller objects, it can only
be used for objects of that size, unless the collector discovers a completely
empty chunk. Completely empty chunks are restored to the appropriate
large block free list.
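A minimal sketch of the size- and kind-segregated free lists follows. The names, kinds, and layout are invented for illustration; the collector's real structures also track block headers, mark bits, and more. The one faithful detail is that each free list is threaded through the first word of each free object.

```c
#include <stddef.h>

#define MAXOBJGRANULES 128
enum kind { NORMAL, PTRFREE, UNCOLLECTABLE, N_KINDS };

/* One free list per (size, kind) pair. */
static void *obj_free_list[N_KINDS][MAXOBJGRANULES + 1];

/* Fast path: pop the head of the matching free list. */
static void *small_alloc(enum kind k, size_t granules) {
    void **flh = &obj_free_list[k][granules];
    void *result = *flh;
    if (result != NULL) {
        *flh = *(void **)result;  /* next free object lives in the first word */
    }
    return result;  /* NULL: the list must be refilled from a fresh chunk */
}

/* Tiny demonstration: thread two slots into a list and pop them in order. */
static int small_alloc_demo(void) {
    static void *slot_a, *slot_b;
    slot_a = &slot_b;   /* slot_a's first word points at slot_b */
    slot_b = NULL;
    obj_free_list[PTRFREE][2] = &slot_a;
    void *first = small_alloc(PTRFREE, 2);
    void *second = small_alloc(PTRFREE, 2);
    return first == (void *)&slot_a && second == (void *)&slot_b
        && small_alloc(PTRFREE, 2) == NULL;
}
```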
<P>
In order to avoid allocating blocks for too many distinct object sizes,
the collector normally does not directly allocate objects of every possible
request size. Instead requests are rounded up to one of a smaller number
of standard sizes.
<P>
If the total allocation since the last collection is small relative to
the heap size, we
expand the heap. Otherwise, we initiate a garbage collection, so
that the amount of garbage collection work per allocated byte remains
constant.
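The expand-versus-collect decision might be sketched like this. <TT>GC_free_space_divisor</tt> is the collector's real tunable name, but the default shown and the surrounding logic are simplified assumptions:

```c
#include <stddef.h>

/* Simplified sketch: expand the heap when little has been allocated since
   the last collection (relative to heap size); otherwise collect. This
   keeps GC work per allocated byte roughly constant. */
static size_t GC_free_space_divisor = 3;  /* illustrative default */

/* Returns nonzero if we should expand the heap rather than collect. */
static int should_expand(size_t bytes_allocd_since_gc, size_t heap_size) {
    return bytes_allocd_since_gc < heap_size / GC_free_space_divisor;
}
```

Raising the divisor trades a smaller heap for more frequent collections; lowering it does the opposite.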
<P>
The above is in fact an oversimplification of the real heap expansion
and GC triggering heuristic, which adjusts slightly for root size
and certain kinds of
fragmentation. In particular:
<UL>
<LI> Programs with a large root set size and
little live heap memory will expand the heap to amortize the cost of
scanning the roots.
<P>
<LI> Versions 5.x of the collector actually collect more frequently in
nonincremental mode. The large block allocator usually refuses to split
large heap blocks once the garbage collection threshold is
reached. This often has the effect of collecting well before the
heap fills up, thus reducing fragmentation and working set size at the
expense of GC time. Versions 6.x choose an intermediate strategy depending
on how much large object allocation has taken place in the past.
(If the collector is configured to unmap unused pages, versions 6.x
use the 5.x strategy.)
<P>
<LI> In calculating the amount of allocation since the last collection we
give partial credit for objects we expect to be explicitly deallocated.
Even if all objects are explicitly managed, it is often desirable to collect
on rare occasion, since that is our only mechanism for coalescing completely
empty chunks.
</ul>
<P>
It has been suggested that this should be adjusted so that we favor
expansion if the resulting heap still fits into physical memory.
In many cases, that would no doubt help. But it is tricky to do this
in a way that remains robust if multiple applications are contending
for a single pool of physical memory.

<H2>Mark phase</h2>
<OL>
<LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
</ol>
The core mark routine, <TT>GC_mark_from</tt>, is called
repeatedly by several of the sub-phases when the mark stack starts to fill
up. It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
to empty the mark stack.
It performs only a bounded amount of work on
each call, so that it can also be used by the incremental collector.
It is fairly carefully tuned, since it usually consumes a large majority
of the garbage collection time.
<P>
The fact that it performs only a small amount of work per call also
allows it to be used as the core routine of the parallel marker. In that
case it is normally invoked on thread-private mark stacks instead of the
global mark stack. More details can be found in
<A HREF="scale.html">scale.html</a>.
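The bounded-work structure can be sketched as below. This is a toy model (uniform objects, no descriptors, no block headers), not the real <TT>GC_mark_from</tt>; all names are invented:

```c
#include <stddef.h>

struct obj { int marked; size_t n_fields; struct obj *fields[4]; };

#define MARK_STACK_MAX 1024
static struct obj *mark_stack[MARK_STACK_MAX];
static size_t mark_sp;  /* number of entries on the mark stack */

/* Do a bounded amount of marking work, so a single call stays short
   enough to interleave with the mutator (incremental/parallel use). */
static void mark_step(size_t credit) {
    while (credit-- > 0 && mark_sp > 0) {
        struct obj *o = mark_stack[--mark_sp];
        for (size_t i = 0; i < o->n_fields; i++) {
            struct obj *child = o->fields[i];
            if (child != NULL && !child->marked) {
                child->marked = 1;
                if (mark_sp < MARK_STACK_MAX)
                    mark_stack[mark_sp++] = child;
                /* on overflow the real collector sets MS_INVALID and
                   later restarts from already-marked objects */
            }
        }
    }
}

/* Usage example: mark a small object graph a -> {b, c}, b -> c. */
static int mark_demo(void) {
    static struct obj a, b, c;
    a.n_fields = 2; a.fields[0] = &b; a.fields[1] = &c;
    b.n_fields = 1; b.fields[0] = &c;
    a.marked = 1;                 /* roots are marked before being pushed */
    mark_stack[mark_sp++] = &a;
    while (mark_sp > 0)
        mark_step(4);
    return a.marked && b.marked && c.marked;
}
```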
<P>
The marker correctly handles mark stack overflows. Whenever the mark stack
overflows, the mark state is reset to <TT>MS_INVALID</tt>.
Since there are already marked objects in the heap,
marking can later be restarted from those objects.
<P>
Unmarked large objects are immediately returned to the
large object free list.
Each small object page is checked to see if all mark bits are clear.
If so, the entire page is returned to the large object free list.
Small object pages containing some reachable object are queued for later
sweeping, unless we determine that the page contains very little free
space, in which case it is not examined further.
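The header-only emptiness check might look like the following sketch. The header layout is invented for illustration; the real collector packs mark bits differently:

```c
#include <stddef.h>
#include <stdint.h>

#define MARK_BITS_WORDS 4
struct hblkhdr {
    uint32_t mark_bits[MARK_BITS_WORDS];  /* one bit per object in the chunk */
};

/* The initial sweep pass looks only at headers: if no mark bit is set,
   the whole chunk can go back to the large object free list. */
static int page_is_empty(const struct hblkhdr *hhdr) {
    for (size_t i = 0; i < MARK_BITS_WORDS; i++)
        if (hhdr->mark_bits[i] != 0)
            return 0;
    return 1;
}

/* Usage example: one empty header and one with a single live object. */
static int sweep_demo(void) {
    struct hblkhdr empty = {{0, 0, 0, 0}};
    struct hblkhdr live  = {{0, 1u << 7, 0, 0}};
    return page_is_empty(&empty) == 1 && page_is_empty(&live) == 0;
}
```

Because only headers are touched, this pass stays cheap even when the data pages themselves are not resident.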
<P>
This initial sweep pass touches only block headers, not
the blocks themselves. Thus it does not require significant paging, even
if much of the heap is not resident in physical memory.
<P>
When the collector marks from an unreachable object with a finalizer, and the
object itself becomes marked, we have uncovered
a cycle involving the object. This usually results in a warning from the
collector. Such objects are not finalized, since it may be
unsafe to do so. See the more detailed
<A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html"> discussion of finalization semantics</a>.
<P>
Any objects remaining unmarked at the end of this process are added to
a queue of objects whose finalizers can be run. Depending on collector
configuration, finalizers are dequeued and run either implicitly during
allocation calls, or explicitly in response to a user request.
(Note that the former is unfortunately both the default and not generally safe.
If finalizers perform synchronization, it may result in deadlocks.
Nontrivial finalizers generally need to perform synchronization, and
thus require a different collector configuration.)
<P>
The collector provides a mechanism for replacing the procedure that is
used to mark through objects. This is used both to provide support for
Java-style unordered finalization, and to ignore certain kinds of cycles,
<I>e.g.</i> those arising from C++ implementations of virtual inheritance.

<H2>Generational Collection and Dirty Bits</h2>
We basically use the concurrent and generational GC algorithm described in
<A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi91.ps.Z">"Mostly Parallel Garbage Collection"</a>,
by Boehm, Demers, and Shenker.
<P>
The most significant modification is that
the collector always starts running in the allocating thread.
There is no separate garbage collector thread. (If parallel GC is
enabled, helper threads may also be woken up.)
If an allocation attempt either requests a large object, or encounters
an empty small object free list, and notices that there is a collection
in progress, it immediately performs a small amount of marking work.
<P>
When the concurrent mark phase is otherwise complete,
the set of modified pages is retrieved, and we mark once again from
marked objects on those pages, this time with the mutator stopped.
<P>
We keep track of modified pages using one of several distinct mechanisms:
<OL>
<LI>
Through explicit mutator cooperation. Currently this requires
the use of <TT>GC_malloc_stubborn</tt>, and is rarely used.
<LI>
(<TT>MPROTECT_VDB</tt>) By write-protecting physical pages and
catching write faults. This is
implemented for many Unix-like systems and for win32. It is not possible
in a few environments.
<LI>
(<TT>PROC_VDB</tt>) By retrieving dirty bit information from /proc.
(Currently only Sun's
Solaris supports this. Though this is considerably cleaner, performance
may actually be better with mprotect and signals.)
<LI>
(<TT>PCR_VDB</tt>) By relying on an external dirty bit implementation, in this
case the one in Xerox PCR.
<LI>
(<TT>DEFAULT_VDB</tt>) By treating all pages as dirty. This is the default if
none of the other techniques is known to be usable, and
<TT>GC_malloc_stubborn</tt> is not used. Practical only for testing, or if
the vast majority of objects use <TT>GC_malloc_stubborn</tt>.
</ol>

<H2>Black-listing</h2>
The collector implements <I>black-listing</i> of pages, as described
in
<A HREF="http://www.acm.org/pubs/citations/proceedings/pldi/155090/p197-boehm/">
Boehm, ``Space Efficient Conservative Collection'', PLDI '93</a>, also available
<A HREF="papers/pldi93.ps.Z">here</a>.
<P>
During the mark phase, the collector tracks ``near misses'', i.e. attempts
to follow a ``pointer'' to just outside the garbage-collected heap, or
to a currently unallocated page inside the heap. Pages that have been
the targets of such near misses are likely to be the targets of
misidentified ``pointers'' in the future. To minimize the future
damage caused by such misidentifications they will be allocated only to
small pointer-free objects.
<P>
The collector understands two different kinds of black-listing. A
page may be black-listed for interior pointer references
(<TT>GC_add_to_black_list_stack</tt>), if it was the target of a near
miss from a location that requires interior pointer recognition,
<I>e.g.</i> the stack, or the heap if <TT>GC_all_interior_pointers</tt>
is set. In this case, we also avoid allocating large blocks that include
this page.
<P>
If the near miss came from a source that did not require interior
pointer recognition, it is black-listed with
<TT>GC_add_to_black_list_normal</tt>.
A page black-listed in this way may appear inside a large object,
so long as it is not the first page of a large object.
<P>
The <TT>GC_allochblk</tt> routine respects black-listing when assigning
a block to a particular object kind and size. It occasionally
drops (i.e. allocates and forgets) blocks that are completely black-listed
in order to avoid excessively long large block free lists containing
only unusable blocks. This would otherwise become an issue
if there is low demand for small pointer-free objects.

<H2>Thread support</h2>
We support several different threading models. Unfortunately Pthreads,
the only reasonably well standardized thread model, supports too narrow
an interface for conservative garbage collection. There appears to be
no completely portable way to allow the collector to coexist with various
Pthreads implementations. Hence we currently support only a few of the more
common Pthreads implementations.
<P>
In particular, it is very difficult for the collector to stop all other
threads in the system and examine the register contents. This is currently
accomplished with very different mechanisms for some Pthreads
implementations. The Solaris implementation temporarily disables much
of the user-level threads implementation by stopping kernel-level threads
("lwp"s). The Linux/HPUX/OSF1 and Irix implementations send signals to
individual Pthreads and have them wait in the signal handler.
<P>
The Linux and Irix implementations use
only documented Pthreads calls, but rely on extensions to their semantics.
The Linux implementation <TT>linux_threads.c</tt> relies on only very
mild extensions to the pthreads semantics, and already supports a large number
of other Unix-like pthreads implementations. Our goal is to make this the
only pthread support in the collector.
<P>
(The Irix implementation is separate only for historical reasons and should
clearly be merged. The current Solaris implementation probably performs
better in the uniprocessor case, but does not support thread operations in the
collector. Hence it cannot support the parallel marker.)
<P>
All implementations must
intercept thread creation and a few other thread-specific calls to allow
enumeration of threads and location of thread stacks. This is currently
accomplished with <TT>#define</tt>'s in <TT>gc.h</tt>
(really <TT>gc_pthread_redirects.h</tt>), or optionally
by using ld's function call wrapping mechanism under Linux.
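The redirection trick can be illustrated as below. <TT>GC_pthread_create</tt> here is a hypothetical stand-in for the collector's wrapper; only the macro-redirection idea reflects what <TT>gc.h</tt> actually does:

```c
#include <pthread.h>

/* Stand-in for the collector's thread table bookkeeping. */
static int threads_registered;

/* Hypothetical wrapper: register the new thread with the collector
   before (or around) creating it; the real wrapper also records stack
   bounds and arranges cleanup on thread exit. */
static int GC_pthread_create(pthread_t *t, const pthread_attr_t *attr,
                             void *(*start)(void *), void *arg) {
    threads_registered++;
    return pthread_create(t, attr, start, arg);
}

/* What gc.h (really gc_pthread_redirects.h) effectively does: */
#define pthread_create GC_pthread_create

static void *hello(void *arg) { return arg; }

/* Usage example: client code calls pthread_create as usual, but the
   macro silently routes it through the collector's wrapper. */
static int redirect_demo(void) {
    pthread_t t;
    int err = pthread_create(&t, NULL, hello, NULL);  /* -> GC_pthread_create */
    pthread_join(t, NULL);
    return err == 0 && threads_registered == 1;
}
```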
<P>
Comments are appreciated. Please send mail to
<A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a> or
<A HREF="mailto:Hans.Boehm@hp.com"><TT>Hans.Boehm@hp.com</tt></a>
<P>
This is a modified copy of a page written while the author was at SGI.
The original was <A HREF="http://reality.sgi.com/boehm/gcdescr.html">here</a>.
</body>
</html>