Annotation of OpenXM_contrib2/asir2000/gc/doc/gcdescr.html, Revision 1.1
1.1 ! noro 1: <HTML>
! 2: <HEAD>
! 3: <TITLE> Conservative GC Algorithmic Overview </TITLE>
! 4: <AUTHOR> Hans-J. Boehm, Silicon Graphics</author>
! 5: </HEAD>
! 6: <BODY>
! 7: <H1> <I>This is under construction</i> </h1>
! 8: <H1> Conservative GC Algorithmic Overview </h1>
! 9: <P>
! 10: This is a description of the algorithms and data structures used in our
! 11: conservative garbage collector. I expect the level of detail to increase
! 12: with time. For a survey of GC algorithms, see for example
! 13: <A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
! 14: excellent paper</a>. For an overview of the collector interface,
! 15: see <A HREF="gcinterface.html">here</a>.
! 16: <P>
! 17: This description is targeted primarily at someone trying to understand the
! 18: source code. It specifically refers to variable and function names.
! 19: It may also be useful for understanding the algorithms at a higher level.
! 20: <P>
! 21: The description here assumes that the collector is used in default mode.
! 22: In particular, we assume that it used as a garbage collector, and not just
! 23: a leak detector. We initially assume that it is used in stop-the-world,
! 24: non-incremental mode, though the presence of the incremental collector
! 25: will be apparent in the design.
! 26: We assume the default finalization model, but the code affected by that
! 27: is very localized.
! 28: <H2> Introduction </h2>
! 29: The garbage collector uses a modified mark-sweep algorithm. Conceptually
! 30: it operates roughly in four phases:
! 31:
! 32: <OL>
! 33:
! 34: <LI>
! 35: <I>Preparation</i> Clear all mark bits, indicating that all objects
! 36: are potentially unreachable.
! 37:
! 38: <LI>
! 39: <I>Mark phase</i> Marks all objects that can be reachable via chains of
! 40: pointers from variables. Normally the collector has no real information
! 41: about the location of pointer variables in the heap, so it
! 42: views all static data areas, stacks and registers as potentially containing
! 43: containing pointers. Any bit patterns that represent addresses inside
! 44: heap objects managed by the collector are viewed as pointers.
! 45: Unless the client program has made heap object layout information
! 46: available to the collector, any heap objects found to be reachable from
! 47: variables are again scanned similarly.
! 48:
! 49: <LI>
! 50: <I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
! 51: objects, and returns them to an appropriate free list for reuse. This is
! 52: not really a separate phase; even in non incremental mode this is operation
! 53: is usually performed on demand during an allocation that discovers an empty
! 54: free list. Thus the sweep phase is very unlikely to touch a page that
! 55: would not have been touched shortly thereafter anyway.
! 56:
! 57: <LI>
! 58: <I>Finalization phase</i> Unreachable objects which had been registered
! 59: for finalization are enqueued for finalization outside the collector.
! 60:
! 61: </ol>
! 62:
! 63: <P>
! 64: The remaining sections describe the memory allocation data structures,
! 65: and then the last 3 collection phases in more detail. We conclude by
! 66: outlining some of the additional features implemented in the collector.
! 67:
! 68: <H2>Allocation</h2>
! 69: The collector includes its own memory allocator. The allocator obtains
! 70: memory from the system in a platform-dependent way. Under UNIX, it
! 71: uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>.
! 72: <P>
! 73: Most static data used by the allocator, as well as that needed by the
! 74: rest of the garbage collector is stored inside the
! 75: <TT>_GC_arrays</tt> structure.
! 76: This allows the garbage collector to easily ignore the collectors own
! 77: data structures when it searches for root pointers. Other allocator
! 78: and collector internal data structures are allocated dynamically
! 79: with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not
! 80: allow for deallocation, and is therefore used only for permanent data
! 81: structures.
! 82: <P>
! 83: The allocator allocates objects of different <I>kinds</i>.
! 84: Different kinds are handled somewhat differently by certain parts
! 85: of the garbage collector. Certain kinds are scanned for pointers,
! 86: others are not. Some may have per-object type descriptors that
! 87: determine pointer locations. Or a specific kind may correspond
! 88: to one specific object layout. Two built-in kinds are uncollectable.
! 89: One (<TT>STUBBORN</tt>) is immutable without special precautions.
! 90: In spite of that, it is very likely that most applications currently
! 91: use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects.
! 92: <P>
! 93: The collector uses a two level allocator. A large block is defined to
! 94: be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2,
! 95: typically on the order of the page size.
! 96: <P>
! 97: Large block sizes are rounded up to
! 98: the next multiple of <TT>HBLKSIZE</tt> and then allocated by
! 99: <TT>GC_allochblk</tt>. This uses roughly what Paul Wilson has termed
! 100: a "next fit" algorithm, i.e. first-fit with a rotating pointer.
! 101: The implementation does check for a better fitting immediately
! 102: adjacent block, which gives it somewhat better fragmentation characteristics.
! 103: I'm now convinced it should use a best fit algorithm. The actual
! 104: implementation of <TT>GC_allochblk</tt>
! 105: is significantly complicated by black-listing issues
! 106: (see below).
! 107: <P>
! 108: Small blocks are allocated in blocks of size <TT>HBLKSIZE</tt>.
! 109: Each block is
! 110: dedicated to only one object size and kind. The allocator maintains
! 111: separate free lists for each size and kind of object.
! 112: <P>
! 113: In order to avoid allocating blocks for too many distinct object sizes,
! 114: the collector normally does not directly allocate objects of every possible
! 115: request size. Instead request are rounded up to one of a smaller number
! 116: of allocated sizes, for which free lists are maintained. The exact
! 117: allocated sizes are computed on demand, but subject to the constraint
! 118: that they increase roughly in geometric progression. Thus objects
! 119: requested early in the execution are likely to be allocated with exactly
! 120: the requested size, subject to alignment constraints.
! 121: See <TT>GC_init_size_map</tt> for details.
! 122: <P>
! 123: The actual size rounding operation during small object allocation is
! 124: implemented as a table lookup in <TT>GC_size_map</tt>.
! 125: <P>
! 126: Both collector initialization and computation of allocated sizes are
! 127: handled carefully so that they do not slow down the small object fast
! 128: allocation path. An attempt to allocate before the collector is initialized,
! 129: or before the appropriate <TT>GC_size_map</tt> entry is computed,
! 130: will take the same path as an allocation attempt with an empty free list.
! 131: This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
! 132: which performs the appropriate initialization checks.
! 133: <P>
! 134: In non-incremental mode, we make a decision about whether to garbage collect
! 135: whenever an allocation would otherwise have failed with the current heap size.
! 136: If the total amount of allocation since the last collection is less than
! 137: the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
! 138: expand the heap. Otherwise, we initiate a garbage collection. This ensures
! 139: that the amount of garbage collection work per allocated byte remains
! 140: constant.
! 141: <P>
! 142: The above is in fat an oversimplification of the real heap expansion
! 143: heuristic, which adjusts slightly for root size and certain kinds of
! 144: fragmentation. In particular, programs with a large root set size and
! 145: little live heap memory will expand the heap to amortize the cost of
! 146: scanning the roots.
! 147: <P>
! 148: Versions 5.x of the collector actually collect more frequently in
! 149: nonincremental mode. The large block allocator usually refuses to split
! 150: large heap blocks once the garbage collection threshold is
! 151: reached. This often has the effect of collecting well before the
! 152: heap fills up, thus reducing fragmentation and working set size at the
! 153: expense of GC time. 6.x will chose an intermediate strategy depending
! 154: on how much large object allocation has taken place in the past.
! 155: (If the collector is configured to unmap unused pages, versions 6.x
! 156: will use the 5.x strategy.)
! 157: <P>
! 158: (It has been suggested that this should be adjusted so that we favor
! 159: expansion if the resulting heap still fits into physical memory.
! 160: In many cases, that would no doubt help. But it is tricky to do this
! 161: in a way that remains robust if multiple application are contending
! 162: for a single pool of physical memory.)
! 163:
! 164: <H2>Mark phase</h2>
! 165:
! 166: The marker maintains an explicit stack of memory regions that are known
! 167: to be accessible, but that have not yet been searched for contained pointers.
! 168: Each stack entry contains the starting address of the block to be scanned,
! 169: as well as a descriptor of the block. If no layout information is
! 170: available for the block, then the descriptor is simply a length.
! 171: (For other possibilities, see <TT>gc_mark.h</tt>.)
! 172: <P>
! 173: At the beginning of the mark phase, all root segments are pushed on the
! 174: stack by <TT>GC_push_roots</tt>. If <TT>ALL_INTERIOR_PTRS</tt> is not
! 175: defined, then stack roots require special treatment. In this case, the
! 176: normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
! 177: explicitly checks for interior pointers and pushes descriptors for target
! 178: objects.
! 179: <P>
! 180: The marker is structured to allow incremental marking.
! 181: Each call to <TT>GC_mark_some</tt> performs a small amount of
! 182: work towards marking the heap.
! 183: It maintains
! 184: explicit state in the form of <TT>GC_mark_state</tt>, which
! 185: identifies a particular sub-phase. Some other pieces of state, most
! 186: notably the mark stack, identify how much work remains to be done
! 187: in each sub-phase. The normal progression of mark states for
! 188: a stop-the-world collection is:
! 189: <OL>
! 190: <LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
! 191: objects. In this case <TT>GC_objects_are_marked</tt> will simultaneously
! 192: be false, so the mark state is advanced to
! 193: <LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
! 194: uncollectable objects, roots, and then mark everything reachable from them.
! 195: <TT>Scan_ptr</tt> is advanced through the heap until all uncollectable
! 196: objects are pushed, and objects reachable from them are marked.
! 197: At that point, the next call to <TT>GC_mark_some</tt> calls
! 198: <TT>GC_push_roots</tt> to push the roots. It the advances the
! 199: mark state to
! 200: <LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
! 201: empty, all reachable objects are marked. Once in this state, we work
! 202: only on emptying the mark stack. Once this is completed, the state
! 203: changes to
! 204: <LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
! 205: </ol>
! 206:
! 207: The core mark routine <TT>GC_mark_from_mark_stack</tt>, is called
! 208: repeatedly by several of the sub-phases when the mark stack starts to fill
! 209: up. It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
! 210: to empty the mark stack.
! 211: The routine is designed to only perform a limited amount of marking at
! 212: each call, so that it can also be used by the incremental collector.
! 213: It is fairly carefully tuned, since it usually consumes a large majority
! 214: of the garbage collection time.
! 215: <P>
! 216: The marker correctly handles mark stack overflows. Whenever the mark stack
! 217: overflows, the mark state is reset to <TT>MS_INVALID</tt>.
! 218: Since there are already marked objects in the heap,
! 219: this eventually forces a complete
! 220: scan of the heap, searching for pointers, during which any unmarked objects
! 221: referenced by marked objects are again pushed on the mark stack. This
! 222: process is repeated until the mark phase completes without a stack overflow.
! 223: Each time the stack overflows, an attempt is made to grow the mark stack.
! 224: All pieces of the collector that push regions onto the mark stack have to be
! 225: careful to ensure forward progress, even in case of repeated mark stack
! 226: overflows. Every mark attempt results in additional marked objects.
! 227: <P>
! 228: Each mark stack entry is processed by examining all candidate pointers
! 229: in the range described by the entry. If the region has no associated
! 230: type information, then this typically requires that each 4-byte aligned
! 231: quantity (8-byte aligned with 64-bit pointers) be considered a candidate
! 232: pointer.
! 233: <P>
! 234: We determine whether a candidate pointer is actually the address of
! 235: a heap block. This is done in the following steps:
! 236: <NL>
! 237: <LI> The candidate pointer is checked against rough heap bounds.
! 238: These heap bounds are maintained such that all actual heap objects
! 239: fall between them. In order to facilitate black-listing (see below)
! 240: we also include address regions that the heap is likely to expand into.
! 241: Most non-pointers fail this initial test.
! 242: <LI> The candidate pointer is divided into two pieces; the most significant
! 243: bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
! 244: the least significant bits specify an offset within that page.
! 245: (A hardware page may actually consist of multiple such pages.
! 246: HBLKSIZE is usually the page size divided by a small power of two.)
! 247: <LI>
! 248: The page address part of the candidate pointer is looked up in a
! 249: <A HREF="tree.html">table</a>.
! 250: Each table entry contains either 0, indicating that the page is not part
! 251: of the garbage collected heap, a small integer <I>n</i>, indicating
! 252: that the page is part of large object, starting at least <I>n</i> pages
! 253: back, or a pointer to a descriptor for the page. In the first case,
! 254: the candidate pointer i not a true pointer and can be safely ignored.
! 255: In the last two cases, we can obtain a descriptor for the page containing
! 256: the beginning of the object.
! 257: <LI>
! 258: The starting address of the referenced object is computed.
! 259: The page descriptor contains the size of the object(s)
! 260: in that page, the object kind, and the necessary mark bits for those
! 261: objects. The size information can be used to map the candidate pointer
! 262: to the object starting address. To accelerate this process, the page header
! 263: also contains a pointer to a precomputed map of page offsets to displacements
! 264: from the beginning of an object. The use of this map avoids a
! 265: potentially slow integer remainder operation in computing the object
! 266: start address.
! 267: <LI>
! 268: The mark bit for the target object is checked and set. If the object
! 269: was previously unmarked, the object is pushed on the mark stack.
! 270: The descriptor is read from the page descriptor. (This is computed
! 271: from information <TT>GC_obj_kinds</tt> when the page is first allocated.)
! 272: </nl>
! 273: <P>
! 274: At the end of the mark phase, mark bits for left-over free lists are cleared,
! 275: in case a free list was accidentally marked due to a stray pointer.
! 276:
! 277: <H2>Sweep phase</h2>
! 278:
! 279: At the end of the mark phase, all blocks in the heap are examined.
! 280: Unmarked large objects are immediately returned to the large object free list.
! 281: Each small object page is checked to see if all mark bits are clear.
! 282: If so, the entire page is returned to the large object free list.
! 283: Small object pages containing some reachable object are queued for later
! 284: sweeping.
! 285: <P>
! 286: This initial sweep pass touches only block headers, not
! 287: the blocks themselves. Thus it does not require significant paging, even
! 288: if large sections of the heap are not in physical memory.
! 289: <P>
! 290: Nonempty small object pages are swept when an allocation attempt
! 291: encounters an empty free list for that object size and kind.
! 292: Pages for the correct size and kind are repeatedly swept until at
! 293: least one empty block is found. Sweeping such a page involves
! 294: scanning the mark bit array in the page header, and building a free
! 295: list linked through the first words in the objects themselves.
! 296: This does involve touching the appropriate data page, but in most cases
! 297: it will be touched only just before it is used for allocation.
! 298: Hence any paging is essentially unavoidable.
! 299: <P>
! 300: Except in the case of pointer-free objects, we maintain the invariant
! 301: that any object in a small object free list is cleared (except possibly
! 302: for the link field). Thus it becomes the burden of the small object
! 303: sweep routine to clear objects. This has the advantage that we can
! 304: easily recover from accidentally marking a free list, though that could
! 305: also be handled by other means. The collector currently spends a fair
! 306: amount of time clearing objects, and this approach should probably be
! 307: revisited.
! 308: <P>
! 309: In most configurations, we use specialized sweep routines to handle common
! 310: small object sizes. Since we allocate one mark bit per word, it becomes
! 311: easier to examine the relevant mark bits if the object size divides
! 312: the word length evenly. We also suitably unroll the inner sweep loop
! 313: in each case. (It is conceivable that profile-based procedure cloning
! 314: in the compiler could make this unnecessary and counterproductive. I
! 315: know of no existing compiler to which this applies.)
! 316: <P>
! 317: The sweeping of small object pages could be avoided completely at the expense
! 318: of examining mark bits directly in the allocator. This would probably
! 319: be more expensive, since each allocation call would have to reload
! 320: a large amount of state (e.g. next object address to be swept, position
! 321: in mark bit table) before it could do its work. The current scheme
! 322: keeps the allocator simple and allows useful optimizations in the sweeper.
! 323:
! 324: <H2>Finalization</h2>
! 325: Both <TT>GC_register_disappearing_link</tt> and
! 326: <TT>GC_register_finalizer</tt> add the request to a corresponding hash
! 327: table. The hash table is allocated out of collected memory, but
! 328: the reference to the finalizable object is hidden from the collector.
! 329: Currently finalization requests are processed non-incrementally at the
! 330: end of a mark cycle.
! 331: <P>
! 332: The collector makes an initial pass over the table of finalizable objects,
! 333: pushing the contents of unmarked objects onto the mark stack.
! 334: After pushing each object, the marker is invoked to mark all objects
! 335: reachable from it. The object itself is not explicitly marked.
! 336: This assures that objects on which a finalizer depends are neither
! 337: collected nor finalized.
! 338: <P>
! 339: If in the process of marking from an object the
! 340: object itself becomes marked, we have uncovered
! 341: a cycle involving the object. This usually results in a warning from the
! 342: collector. Such objects are not finalized, since it may be
! 343: unsafe to do so. See the more detailed
! 344: <A HREF="finalization.html"> discussion of finalization semantics</a>.
! 345: <P>
! 346: Any objects remaining unmarked at the end of this process are added to
! 347: a queue of objects whose finalizers can be run. Depending on collector
! 348: configuration, finalizers are dequeued and run either implicitly during
! 349: allocation calls, or explicitly in response to a user request.
! 350: <P>
! 351: The collector provides a mechanism for replacing the procedure that is
! 352: used to mark through objects. This is used both to provide support for
! 353: Java-style unordered finalization, and to ignore certain kinds of cycles,
! 354: <I>e.g.</i> those arising from C++ implementations of virtual inheritance.
! 355:
! 356: <H2>Generational Collection and Dirty Bits</h2>
! 357: We basically use the parallel and generational GC algorithm described in
! 358: <A HREF="papers/pldi91.ps.gz">"Mostly Parallel Garbage Collection"</a>,
! 359: by Boehm, Demers, and Shenker.
! 360: <P>
! 361: The most significant modification is that
! 362: the collector always runs in the allocating thread.
! 363: There is no separate garbage collector thread.
! 364: If an allocation attempt either requests a large object, or encounters
! 365: an empty small object free list, and notices that there is a collection
! 366: in progress, it immediately performs a small amount of marking work
! 367: as described above.
! 368: <P>
! 369: This change was made both because we wanted to easily accommodate
! 370: single-threaded environments, and because a separate GC thread requires
! 371: very careful control over the scheduler to prevent the mutator from
! 372: out-running the collector, and hence provoking unneeded heap growth.
! 373: <P>
! 374: In incremental mode, the heap is always expanded when we encounter
! 375: insufficient space for an allocation. Garbage collection is triggered
! 376: whenever we notice that more than
! 377: <TT>GC_heap_size</tt>/2 * <TT>GC_free_space_divisor</tt>
! 378: bytes of allocation have taken place.
! 379: After <TT>GC_full_freq</tt> minor collections a major collection
! 380: is started.
! 381: <P>
! 382: All collections initially run interrupted until a predetermined
! 383: amount of time (50 msecs by default) has expired. If this allows
! 384: the collection to complete entirely, we can avoid correcting
! 385: for data structure modifications during the collection. If it does
! 386: not complete, we return control to the mutator, and perform small
! 387: amounts of additional GC work during those later allocations that
! 388: cannot be satisfied from small object free lists. When marking completes,
! 389: the set of modified pages is retrieved, and we mark once again from
! 390: marked objects on those pages, this time with the mutator stopped.
! 391: <P>
! 392: We keep track of modified pages using one of three distinct mechanisms:
! 393: <OL>
! 394: <LI>
! 395: Through explicit mutator cooperation. Currently this requires
! 396: the use of <TT>GC_malloc_stubborn</tt>.
! 397: <LI>
! 398: By write-protecting physical pages and catching write faults. This is
! 399: implemented for many Unix-like systems and for win32. It is not possible
! 400: in a few environments.
! 401: <LI>
! 402: By retrieving dirty bit information from /proc. (Currently only Sun's
! 403: Solaris supports this. Though this is considerably cleaner, performance
! 404: may actually be better with mprotect and signals.)
! 405: </ol>
! 406:
! 407: <H2>Thread support</h2>
! 408: We support several different threading models. Unfortunately Pthreads,
! 409: the only reasonably well standardized thread model, supports too narrow
! 410: an interface for conservative garbage collection. There appears to be
! 411: no portable way to allow the collector to coexist with various Pthreads
! 412: implementations. Hence we currently support only a few of the more
! 413: common Pthreads implementations.
! 414: <P>
! 415: In particular, it is very difficult for the collector to stop all other
! 416: threads in the system and examine the register contents. This is currently
! 417: accomplished with very different mechanisms for different Pthreads
! 418: implementations. The Solaris implementation temporarily disables much
! 419: of the user-level threads implementation by stopping kernel-level threads
! 420: ("lwp"s). The Irix implementation sends signals to individual Pthreads
! 421: and has them wait in the signal handler. The Linux implementation
! 422: is similar in spirit to the Irix one.
! 423: <P>
! 424: The Irix implementation uses
! 425: only documented Pthreads calls, but relies on extensions to their semantics,
! 426: notably the use of mutexes and condition variables from signal
! 427: handlers. The Linux implementation should be far closer to
! 428: portable, though impirically it is not completely portable.
! 429: <P>
! 430: All implementations must
! 431: intercept thread creation and a few other thread-specific calls to allow
! 432: enumeration of threads and location of thread stacks. This is current
! 433: accomplished with <TT># define</tt>'s in <TT>gc.h</tt>, or optionally
! 434: by using ld's function call wrapping mechanism under Linux.
! 435: <P>
! 436: Comments are appreciated. Please send mail to
! 437: <A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a>
! 438: </body>
FreeBSD-CVSweb <freebsd-cvsweb@FreeBSD.org>