
Annotation of OpenXM_contrib2/asir2000/gc/doc/gcdescr.html, Revision 1.2

<HTML>
<HEAD>
    <TITLE> Conservative GC Algorithmic Overview </TITLE>
    <AUTHOR> Hans-J. Boehm, HP Labs (Much of this was written at SGI)</author>
</HEAD>
<BODY>
<H1> <I>This is under construction</i> </h1>
<H1> Conservative GC Algorithmic Overview </h1>
<P>
This is a description of the algorithms and data structures used in our
conservative garbage collector.  I expect the level of detail to increase
with time.  For a survey of GC algorithms, see for example
<A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
excellent paper</a>.  For an overview of the collector interface,
see <A HREF="gcinterface.html">here</a>.
<P>
This description is targeted primarily at someone trying to understand the
source code.  It specifically refers to variable and function names.
It may also be useful for understanding the algorithms at a higher level.
<P>
The description here assumes that the collector is used in default mode.
In particular, we assume that it is used as a garbage collector, and not just
a leak detector.  We initially assume that it is used in stop-the-world,
non-incremental mode, though the presence of the incremental collector
will be apparent in the design.
We assume the default finalization model, but the code affected by that
is very localized.
<H2> Introduction </h2>
The garbage collector uses a modified mark-sweep algorithm.  Conceptually
it operates roughly in four phases:

<OL>

<LI>
<I>Preparation</i> Clear all mark bits, indicating that all objects
are potentially unreachable.

<LI>
<I>Mark phase</i> Marks all objects that can be reached via chains of
pointers from variables.  Normally the collector has no real information
about the location of pointer variables in the heap, so it
views all static data areas, stacks and registers as potentially
containing pointers.  Any bit patterns that represent addresses inside
heap objects managed by the collector are viewed as pointers.
Unless the client program has made heap object layout information
available to the collector, any heap objects found to be reachable from
variables are again scanned similarly.

<LI>
<I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
objects, and returns them to an appropriate free list for reuse.  This is
not really a separate phase; even in non-incremental mode this operation
is usually performed on demand during an allocation that discovers an empty
free list.  Thus the sweep phase is very unlikely to touch a page that
would not have been touched shortly thereafter anyway.

<LI>
<I>Finalization phase</i> Unreachable objects which had been registered
for finalization are enqueued for finalization outside the collector.

</ol>

<P>
The remaining sections describe the memory allocation data structures,
and then the last 3 collection phases in more detail. We conclude by
outlining some of the additional features implemented in the collector.

<H2>Allocation</h2>
The collector includes its own memory allocator.  The allocator obtains
memory from the system in a platform-dependent way.  Under UNIX, it
uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>.
<P>
Most static data used by the allocator, as well as that needed by the
rest of the garbage collector, is stored inside the
<TT>_GC_arrays</tt> structure.
This allows the garbage collector to easily ignore the collector's own
data structures when it searches for root pointers.  Other allocator
and collector internal data structures are allocated dynamically
with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not
allow for deallocation, and is therefore used only for permanent data
structures.
<P>
The allocator allocates objects of different <I>kinds</i>.
Different kinds are handled somewhat differently by certain parts
of the garbage collector.  Certain kinds are scanned for pointers,
others are not.  Some may have per-object type descriptors that
determine pointer locations.  Or a specific kind may correspond
to one specific object layout.  Two built-in kinds are uncollectable.
One (<TT>STUBBORN</tt>) is immutable without special precautions.
In spite of that, it is very likely that most applications currently
use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects.
<P>
The collector uses a two level allocator.  A large block is defined to
be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2,
typically on the order of the page size.
<P>
Large block sizes are rounded up to
the next multiple of <TT>HBLKSIZE</tt> and then allocated by
<TT>GC_allochblk</tt>.  Recent versions of the collector
use an approximate best fit algorithm by keeping free lists for
several large block sizes.
The actual
implementation of <TT>GC_allochblk</tt>
is significantly complicated by black-listing issues
(see below).
<P>
Small blocks are allocated in chunks of size <TT>HBLKSIZE</tt>.
Each chunk is
dedicated to only one object size and kind.  The allocator maintains
separate free lists for each size and kind of object.
<P>
Once a large block is split for use in smaller objects, it can only
be used for objects of that size, unless the collector discovers a completely
empty chunk.  Completely empty chunks are restored to the appropriate
large block free list.
<P>
In order to avoid allocating blocks for too many distinct object sizes,
the collector normally does not directly allocate objects of every possible
request size.  Instead requests are rounded up to one of a smaller number
of allocated sizes, for which free lists are maintained.  The exact
allocated sizes are computed on demand, but subject to the constraint
that they increase roughly in geometric progression.  Thus objects
requested early in the execution are likely to be allocated with exactly
the requested size, subject to alignment constraints.
See <TT>GC_init_size_map</tt> for details.
<P>
The actual size rounding operation during small object allocation is
implemented as a table lookup in <TT>GC_size_map</tt>.
<P>
Both collector initialization and computation of allocated sizes are
handled carefully so that they do not slow down the small object fast
allocation path.  An attempt to allocate before the collector is initialized,
or before the appropriate <TT>GC_size_map</tt> entry is computed,
will take the same path as an allocation attempt with an empty free list.
This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
which performs the appropriate initialization checks.
<P>
In non-incremental mode, we make a decision about whether to garbage collect
whenever an allocation would otherwise have failed with the current heap size.
If the total amount of allocation since the last collection is less than
the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
expand the heap.  Otherwise, we initiate a garbage collection.  This ensures
that the amount of garbage collection work per allocated byte remains
constant.
<P>
The above is in fact an oversimplification of the real heap expansion
and GC triggering heuristic, which adjusts slightly for root size
and certain kinds of
fragmentation.  In particular:
<UL>
<LI> Programs with a large root set size and
little live heap memory will expand the heap to amortize the cost of
scanning the roots.
<LI> Versions 5.x of the collector actually collect more frequently in
nonincremental mode.  The large block allocator usually refuses to split
large heap blocks once the garbage collection threshold is
reached.  This often has the effect of collecting well before the
heap fills up, thus reducing fragmentation and working set size at the
expense of GC time.  Versions 6.x choose an intermediate strategy depending
on how much large object allocation has taken place in the past.
(If the collector is configured to unmap unused pages, versions 6.x
use the 5.x strategy.)
<LI> In calculating the amount of allocation since the last collection we
give partial credit for objects we expect to be explicitly deallocated.
Even if all objects are explicitly managed, it is often desirable to collect
on rare occasions, since that is our only mechanism for coalescing completely
empty chunks.
</ul>
<P>
It has been suggested that this should be adjusted so that we favor
expansion if the resulting heap still fits into physical memory.
In many cases, that would no doubt help.  But it is tricky to do this
in a way that remains robust if multiple applications are contending
for a single pool of physical memory.

<H2>Mark phase</h2>

The marker maintains an explicit stack of memory regions that are known
to be accessible, but that have not yet been searched for contained pointers.
Each stack entry contains the starting address of the block to be scanned,
as well as a descriptor of the block.  If no layout information is
available for the block, then the descriptor is simply a length.
(For other possibilities, see <TT>gc_mark.h</tt>.)
<P>
At the beginning of the mark phase, all root segments are pushed on the
stack by <TT>GC_push_roots</tt>.  If <TT>ALL_INTERIOR_PTRS</tt> is not
defined, then stack roots require special treatment.  In this case, the
normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
explicitly checks for interior pointers and pushes descriptors for target
objects.
<P>
The marker is structured to allow incremental marking.
Each call to <TT>GC_mark_some</tt> performs a small amount of
work towards marking the heap.
It maintains
explicit state in the form of <TT>GC_mark_state</tt>, which
identifies a particular sub-phase.  Some other pieces of state, most
notably the mark stack, identify how much work remains to be done
in each sub-phase.  The normal progression of mark states for
a stop-the-world collection is:
<OL>
<LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
objects.  In this case <TT>GC_objects_are_marked</tt> will simultaneously
be false, so the mark state is advanced to
<LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
uncollectable objects, roots, and then mark everything reachable from them.
<TT>Scan_ptr</tt> is advanced through the heap until all uncollectable
objects are pushed, and objects reachable from them are marked.
At that point, the next call to <TT>GC_mark_some</tt> calls
<TT>GC_push_roots</tt> to push the roots.  It then advances the
mark state to
<LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
empty, all reachable objects are marked.  Once in this state, we work
only on emptying the mark stack.  Once this is completed, the state
changes to
<LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
</ol>

The core mark routine, <TT>GC_mark_from</tt>, is called
repeatedly by several of the sub-phases when the mark stack starts to fill
up.  It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
to empty the mark stack.
The routine is designed to only perform a limited amount of marking at
each call, so that it can also be used by the incremental collector.
It is fairly carefully tuned, since it usually consumes a large majority
of the garbage collection time.
<P>
The fact that it performs only a small amount of work per call also
allows it to be used as the core routine of the parallel marker.  In that
case it is normally invoked on thread-private mark stacks instead of the
global mark stack.  More details can be found in
<A HREF="scale.html">scale.html</a>.
<P>
The marker correctly handles mark stack overflows.  Whenever the mark stack
overflows, the mark state is reset to <TT>MS_INVALID</tt>.
Since there are already marked objects in the heap,
this eventually forces a complete
scan of the heap, searching for pointers, during which any unmarked objects
referenced by marked objects are again pushed on the mark stack.  This
process is repeated until the mark phase completes without a stack overflow.
Each time the stack overflows, an attempt is made to grow the mark stack.
All pieces of the collector that push regions onto the mark stack have to be
careful to ensure forward progress, even in case of repeated mark stack
overflows.  Every mark attempt results in additional marked objects.
<P>
Each mark stack entry is processed by examining all candidate pointers
in the range described by the entry.  If the region has no associated
type information, then this typically requires that each 4-byte aligned
quantity (8-byte aligned with 64-bit pointers) be considered a candidate
pointer.
<P>
We determine whether a candidate pointer is actually the address of
a heap block.  This is done in the following steps:
<OL>
<LI> The candidate pointer is checked against rough heap bounds.
These heap bounds are maintained such that all actual heap objects
fall between them.  In order to facilitate black-listing (see below)
we also include address regions that the heap is likely to expand into.
Most non-pointers fail this initial test.
<LI> The candidate pointer is divided into two pieces; the most significant
bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
the least significant bits specify an offset within that page.
(A hardware page may actually consist of multiple such pages.
HBLKSIZE is usually the page size divided by a small power of two.)
<LI>
The page address part of the candidate pointer is looked up in a
<A HREF="tree.html">table</a>.
Each table entry contains either 0, indicating that the page is not part
of the garbage collected heap, a small integer <I>n</i>, indicating
that the page is part of a large object, starting at least <I>n</i> pages
back, or a pointer to a descriptor for the page.  In the first case,
the candidate pointer is not a true pointer and can be safely ignored.
In the last two cases, we can obtain a descriptor for the page containing
the beginning of the object.
<LI>
The starting address of the referenced object is computed.
The page descriptor contains the size of the object(s)
in that page, the object kind, and the necessary mark bits for those
objects.  The size information can be used to map the candidate pointer
to the object starting address.  To accelerate this process, the page header
also contains a pointer to a precomputed map of page offsets to displacements
from the beginning of an object.  The use of this map avoids a
potentially slow integer remainder operation in computing the object
start address.
<LI>
The mark bit for the target object is checked and set.  If the object
was previously unmarked, the object is pushed on the mark stack.
The descriptor is read from the page descriptor.  (This is computed
from information in <TT>GC_obj_kinds</tt> when the page is first allocated.)
</ol>
<P>
At the end of the mark phase, mark bits for left-over free lists are cleared,
in case a free list was accidentally marked due to a stray pointer.

<H2>Sweep phase</h2>

At the end of the mark phase, all blocks in the heap are examined.
Unmarked large objects are immediately returned to the large object free list.
Each small object page is checked to see if all mark bits are clear.
If so, the entire page is returned to the large object free list.
Small object pages containing some reachable object are queued for later
sweeping, unless we determine that the page contains very little free
space, in which case it is not examined further.
<P>
This initial sweep pass touches only block headers, not
the blocks themselves.  Thus it does not require significant paging, even
if large sections of the heap are not in physical memory.
<P>
Nonempty small object pages are swept when an allocation attempt
encounters an empty free list for that object size and kind.
Pages for the correct size and kind are repeatedly swept until at
least one empty block is found.  Sweeping such a page involves
scanning the mark bit array in the page header, and building a free
list linked through the first words in the objects themselves.
This does involve touching the appropriate data page, but in most cases
it will be touched only just before it is used for allocation.
Hence any paging is essentially unavoidable.
<P>
Except in the case of pointer-free objects, we maintain the invariant
that any object in a small object free list is cleared (except possibly
for the link field).  Thus it becomes the burden of the small object
sweep routine to clear objects.  This has the advantage that we can
easily recover from accidentally marking a free list, though that could
also be handled by other means.  The collector currently spends a fair
amount of time clearing objects, and this approach should probably be
revisited.
<P>
In most configurations, we use specialized sweep routines to handle common
small object sizes.  Since we allocate one mark bit per word, it becomes
easier to examine the relevant mark bits if the object size divides
the word length evenly.  We also suitably unroll the inner sweep loop
in each case.  (It is conceivable that profile-based procedure cloning
in the compiler could make this unnecessary and counterproductive.  I
know of no existing compiler to which this applies.)
<P>
The sweeping of small object pages could be avoided completely at the expense
of examining mark bits directly in the allocator.  This would probably
be more expensive, since each allocation call would have to reload
a large amount of state (e.g. next object address to be swept, position
in mark bit table) before it could do its work.  The current scheme
keeps the allocator simple and allows useful optimizations in the sweeper.

<H2>Finalization</h2>
Both <TT>GC_register_disappearing_link</tt> and
<TT>GC_register_finalizer</tt> add the request to a corresponding hash
table.  The hash table is allocated out of collected memory, but
the reference to the finalizable object is hidden from the collector.
Currently finalization requests are processed non-incrementally at the
end of a mark cycle.
<P>
The collector makes an initial pass over the table of finalizable objects,
pushing the contents of unmarked objects onto the mark stack.
After pushing each object, the marker is invoked to mark all objects
reachable from it.  The object itself is not explicitly marked.
This assures that objects on which a finalizer depends are neither
collected nor finalized.
<P>
If in the process of marking from an object the
object itself becomes marked, we have uncovered
a cycle involving the object.  This usually results in a warning from the
collector.  Such objects are not finalized, since it may be
unsafe to do so.  See the more detailed
<A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html"> discussion of finalization semantics</a>.
<P>
Any objects remaining unmarked at the end of this process are added to
a queue of objects whose finalizers can be run.  Depending on collector
configuration, finalizers are dequeued and run either implicitly during
allocation calls, or explicitly in response to a user request.
(Note that the former is unfortunately both the default and not generally safe.
If finalizers perform synchronization, it may result in deadlocks.
Nontrivial finalizers generally need to perform synchronization, and
thus require a different collector configuration.)
<P>
The collector provides a mechanism for replacing the procedure that is
used to mark through objects.  This is used both to provide support for
Java-style unordered finalization, and to ignore certain kinds of cycles,
<I>e.g.</i> those arising from C++ implementations of virtual inheritance.

<H2>Generational Collection and Dirty Bits</h2>
We basically use the concurrent and generational GC algorithm described in
<A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi91.ps.Z">"Mostly Parallel Garbage Collection"</a>,
by Boehm, Demers, and Shenker.
<P>
The most significant modification is that
the collector always starts running in the allocating thread.
There is no separate garbage collector thread.  (If parallel GC is
enabled, helper threads may also be woken up.)
If an allocation attempt either requests a large object, or encounters
an empty small object free list, and notices that there is a collection
in progress, it immediately performs a small amount of marking work
as described above.
<P>
This change was made both because we wanted to easily accommodate
single-threaded environments, and because a separate GC thread requires
very careful control over the scheduler to prevent the mutator from
out-running the collector, and hence provoking unneeded heap growth.
<P>
In incremental mode, the heap is always expanded when we encounter
insufficient space for an allocation.  Garbage collection is triggered
whenever we notice that more than
<TT>GC_heap_size</tt>/(2 * <TT>GC_free_space_divisor</tt>)
bytes of allocation have taken place.
After <TT>GC_full_freq</tt> minor collections a major collection
is started.
<P>
All collections initially run uninterrupted until a predetermined
amount of time (50 msecs by default) has expired.  If this allows
the collection to complete entirely, we can avoid correcting
for data structure modifications during the collection.  If it does
not complete, we return control to the mutator, and perform small
amounts of additional GC work during those later allocations that
cannot be satisfied from small object free lists. When marking completes,
the set of modified pages is retrieved, and we mark once again from
marked objects on those pages, this time with the mutator stopped.
<P>
We keep track of modified pages using one of several distinct mechanisms:
<OL>
<LI>
Through explicit mutator cooperation.  Currently this requires
the use of <TT>GC_malloc_stubborn</tt>, and is rarely used.
<LI>
(<TT>MPROTECT_VDB</tt>) By write-protecting physical pages and
catching write faults.  This is
implemented for many Unix-like systems and for win32.  It is not possible
in a few environments.
<LI>
(<TT>PROC_VDB</tt>) By retrieving dirty bit information from /proc.
(Currently only Sun's
Solaris supports this.  Though this is considerably cleaner, performance
may actually be better with mprotect and signals.)
<LI>
(<TT>PCR_VDB</tt>) By relying on an external dirty bit implementation, in this
case the one in Xerox PCR.
<LI>
(<TT>DEFAULT_VDB</tt>) By treating all pages as dirty.  This is the default if
none of the other techniques is known to be usable, and
<TT>GC_malloc_stubborn</tt> is not used.  Practical only for testing, or if
the vast majority of objects use <TT>GC_malloc_stubborn</tt>.
</ol>
                    440:
<H2>Black-listing</h2>

The collector implements <I>black-listing</i> of pages, as described
in
<A HREF="http://www.acm.org/pubs/citations/proceedings/pldi/155090/p197-boehm/">
Boehm, ``Space Efficient Conservative Collection'', PLDI '93</a>, also available
<A HREF="papers/pldi93.ps.Z">here</a>.
<P>
During the mark phase, the collector tracks ``near misses'', i.e. attempts
to follow a ``pointer'' to just outside the garbage-collected heap, or
to a currently unallocated page inside the heap.  Pages that have been
the targets of such near misses are likely to be the targets of
misidentified ``pointers'' in the future.  To minimize the future
damage caused by such misidentifications, they will be allocated only to
small pointer-free objects.
<P>
The collector understands two different kinds of black-listing.  A
page may be black-listed for interior pointer references
(<TT>GC_add_to_black_list_stack</tt>), if it was the target of a near
miss from a location that requires interior pointer recognition,
<I>e.g.</i> the stack, or the heap if <TT>GC_all_interior_pointers</tt>
is set.  In this case, we also avoid allocating large blocks that include
this page.
<P>
If the near miss came from a source that did not require interior
pointer recognition, it is black-listed with
<TT>GC_add_to_black_list_normal</tt>.
A page black-listed in this way may appear inside a large object,
so long as it is not the first page of a large object.
<P>
The <TT>GC_allochblk</tt> routine respects black-listing when assigning
a block to a particular object kind and size.  It occasionally
drops (i.e. allocates and forgets) blocks that are completely black-listed
in order to avoid excessively long large block free lists containing
only unusable blocks.  This would otherwise become an issue
if there is low demand for small pointer-free objects.

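<P>
The placement rules above admit a small hypothetical sketch, again with
invented names rather than the collector's actual data structures: pages
black-listed from interior-pointer sources are never used inside large
blocks, while ``normal'' black-listed pages are only barred from being
the first page of one.

```c
#include <stddef.h>

#define NPAGES 16

/* Hypothetical per-page black-list bits; the real collector keeps
   equivalent information in page-level tables. */
static unsigned char stack_blacklist[NPAGES];  /* interior-pointer misses */
static unsigned char normal_blacklist[NPAGES]; /* exact-pointer misses */

/* May a large object occupy pages [first, first+len)? */
static int usable_for_large_block(int first, int len)
{
    for (int i = first; i < first + len; i++)
        if (stack_blacklist[i])       /* never anywhere in a large block */
            return 0;
    return !normal_blacklist[first];  /* barred only from the FIRST page */
}
```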
<H2>Thread support</h2>
We support several different threading models.  Unfortunately Pthreads,
the only reasonably well standardized thread model, supports too narrow
an interface for conservative garbage collection.  There appears to be
no completely portable way to allow the collector to coexist with various
Pthreads implementations.  Hence we currently support only a few of the more
common Pthreads implementations.
<P>
In particular, it is very difficult for the collector to stop all other
threads in the system and examine the register contents.  This is currently
accomplished with very different mechanisms for some Pthreads
implementations.  The Solaris implementation temporarily disables much
of the user-level threads implementation by stopping kernel-level threads
("lwp"s).  The Linux/HPUX/OSF1 and Irix implementations send signals to
individual Pthreads and have them wait in the signal handler.
<P>
The Linux and Irix implementations use
only documented Pthreads calls, but rely on extensions to their semantics.
The Linux implementation <TT>linux_threads.c</tt> relies on only very
mild extensions to the Pthreads semantics, and already supports a large number
of other Unix-like Pthreads implementations.  Our goal is to make this the
only Pthreads support in the collector.
<P>
(The Irix implementation is separate only for historical reasons and should
clearly be merged.  The current Solaris implementation probably performs
better in the uniprocessor case, but does not support thread operations in the
collector.  Hence it cannot support the parallel marker.)
<P>
All implementations must
intercept thread creation and a few other thread-specific calls to allow
enumeration of threads and location of thread stacks.  This is currently
accomplished with <TT>#define</tt>'s in <TT>gc.h</tt>
(really <TT>gc_pthread_redirects.h</tt>), or optionally
by using ld's function call wrapping mechanism under Linux.
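<P>
The redirection idea can be shown with a short, self-contained sketch.
The wrapper below merely forwards, and its name is invented; it stands in
for the collector's real wrapper, which additionally registers the new
thread and its stack with the collector before running the user's start
routine.

```c
#include <pthread.h>

/* Stand-in for the collector's wrapper: a real implementation would
   record the thread and its stack bounds here before forwarding. */
static int GC_pthread_create_sketch(pthread_t *t, const pthread_attr_t *a,
                                    void *(*fn)(void *), void *arg)
{
    /* ... register the new thread with the collector here ... */
    return pthread_create(t, a, fn, arg);
}

/* gc_pthread_redirects.h does essentially this, so client code that
   calls pthread_create transparently enters the collector's wrapper: */
#define pthread_create GC_pthread_create_sketch
```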
                    512: <P>
Comments are appreciated.  Please send mail to
<A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a> or
<A HREF="mailto:Hans.Boehm@hp.com"><TT>Hans.Boehm@hp.com</tt></a>
<P>
This is a modified copy of a page written while the author was at SGI.
The original was <A HREF="http://reality.sgi.com/boehm/gcdescr.html">here</a>.
</body>
</html>
