[BACK]Return to gcdescr.html CVS log [TXT][DIR] Up to [local] / OpenXM_contrib2 / asir2000 / gc / doc

Annotation of OpenXM_contrib2/asir2000/gc/doc/gcdescr.html, Revision 1.1

1.1     ! noro        1: <HTML>
        !             2: <HEAD>
        !             3:     <TITLE> Conservative GC Algorithmic Overview </TITLE>
        !             4:     <AUTHOR> Hans-J. Boehm, Silicon Graphics</author>
        !             5: </HEAD>
        !             6: <BODY>
        !             7: <H1> <I>This is under construction</i> </h1>
        !             8: <H1> Conservative GC Algorithmic Overview </h1>
        !             9: <P>
        !            10: This is a description of the algorithms and data structures used in our
        !            11: conservative garbage collector.  I expect the level of detail to increase
        !            12: with time.  For a survey of GC algorithms, see for example
        !            13: <A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
        !            14: excellent paper</a>.  For an overview of the collector interface,
        !            15: see <A HREF="gcinterface.html">here</a>.
        !            16: <P>
        !            17: This description is targeted primarily at someone trying to understand the
        !            18: source code.  It specifically refers to variable and function names.
        !            19: It may also be useful for understanding the algorithms at a higher level.
        !            20: <P>
        !            21: The description here assumes that the collector is used in default mode.
        !            22: In particular, we assume that it used as a garbage collector, and not just
        !            23: a leak detector.  We initially assume that it is used in stop-the-world,
        !            24: non-incremental mode, though the presence of the incremental collector
        !            25: will be apparent in the design.
        !            26: We assume the default finalization model, but the code affected by that
        !            27: is very localized.
        !            28: <H2> Introduction </h2>
        !            29: The garbage collector uses a modified mark-sweep algorithm.  Conceptually
        !            30: it operates roughly in four phases:
        !            31:
        !            32: <OL>
        !            33:
        !            34: <LI>
        !            35: <I>Preparation</i> Clear all mark bits, indicating that all objects
        !            36: are potentially unreachable.
        !            37:
        !            38: <LI>
        !            39: <I>Mark phase</i> Marks all objects that can be reachable via chains of
        !            40: pointers from variables.  Normally the collector has no real information
        !            41: about the location of pointer variables in the heap, so it
        !            42: views all static data areas, stacks and registers as potentially containing
        !            43: containing pointers.  Any bit patterns that represent addresses inside
        !            44: heap objects managed by the collector are viewed as pointers.
        !            45: Unless the client program has made heap object layout information
        !            46: available to the collector, any heap objects found to be reachable from
        !            47: variables are again scanned similarly.
        !            48:
        !            49: <LI>
        !            50: <I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
        !            51: objects, and returns them to an appropriate free list for reuse.  This is
        !            52: not really a separate phase; even in non incremental mode this is operation
        !            53: is usually performed on demand during an allocation that discovers an empty
        !            54: free list.  Thus the sweep phase is very unlikely to touch a page that
        !            55: would not have been touched shortly thereafter anyway.
        !            56:
        !            57: <LI>
        !            58: <I>Finalization phase</i> Unreachable objects which had been registered
        !            59: for finalization are enqueued for finalization outside the collector.
        !            60:
        !            61: </ol>
        !            62:
        !            63: <P>
        !            64: The remaining sections describe the memory allocation data structures,
        !            65: and then the last 3 collection phases in more detail. We conclude by
        !            66: outlining some of the additional features implemented in the collector.
        !            67:
        !            68: <H2>Allocation</h2>
        !            69: The collector includes its own memory allocator.  The allocator obtains
        !            70: memory from the system in a platform-dependent way.  Under UNIX, it
        !            71: uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>.
        !            72: <P>
        !            73: Most static data used by the allocator, as well as that needed by the
        !            74: rest of the garbage collector is stored inside the
        !            75: <TT>_GC_arrays</tt> structure.
        !            76: This allows the garbage collector to easily ignore the collectors own
        !            77: data structures when it searches for root pointers.  Other allocator
        !            78: and collector internal data structures are allocated dynamically
        !            79: with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not
        !            80: allow for deallocation, and is therefore used only for permanent data
        !            81: structures.
        !            82: <P>
        !            83: The allocator allocates objects of different <I>kinds</i>.
        !            84: Different kinds are handled somewhat differently by certain parts
        !            85: of the garbage collector.  Certain kinds are scanned for pointers,
        !            86: others are not.  Some may have per-object type descriptors that
        !            87: determine pointer locations.  Or a specific kind may correspond
        !            88: to one specific object layout.  Two built-in kinds are uncollectable.
        !            89: One (<TT>STUBBORN</tt>) is immutable without special precautions.
        !            90: In spite of that, it is very likely that most applications currently
        !            91: use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects.
        !            92: <P>
        !            93: The collector uses a two level allocator.  A large block is defined to
        !            94: be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2,
        !            95: typically on the order of the page size.
        !            96: <P>
        !            97: Large block sizes are rounded up to
        !            98: the next multiple of <TT>HBLKSIZE</tt> and then allocated by
        !            99: <TT>GC_allochblk</tt>.  This uses roughly what Paul Wilson has termed
        !           100: a "next fit" algorithm, i.e. first-fit with a rotating pointer.
        !           101: The implementation does check for a better fitting immediately
        !           102: adjacent block, which gives it somewhat better fragmentation characteristics.
        !           103: I'm now convinced it should use a best fit algorithm.  The actual
        !           104: implementation of <TT>GC_allochblk</tt>
        !           105: is significantly complicated by black-listing issues
        !           106: (see below).
        !           107: <P>
        !           108: Small blocks are allocated in blocks of size <TT>HBLKSIZE</tt>.
        !           109: Each block is
        !           110: dedicated to only one object size and kind.  The allocator maintains
        !           111: separate free lists for each size and kind of object.
        !           112: <P>
        !           113: In order to avoid allocating blocks for too many distinct object sizes,
        !           114: the collector normally does not directly allocate objects of every possible
        !           115: request size.  Instead request are rounded up to one of a smaller number
        !           116: of allocated sizes, for which free lists are maintained.  The exact
        !           117: allocated sizes are computed on demand, but subject to the constraint
        !           118: that they increase roughly in geometric progression.  Thus objects
        !           119: requested early in the execution are likely to be allocated with exactly
        !           120: the requested size, subject to alignment constraints.
        !           121: See <TT>GC_init_size_map</tt> for details.
        !           122: <P>
        !           123: The actual size rounding operation during small object allocation is
        !           124: implemented as a table lookup in <TT>GC_size_map</tt>.
        !           125: <P>
        !           126: Both collector initialization and computation of allocated sizes are
        !           127: handled carefully so that they do not slow down the small object fast
        !           128: allocation path.  An attempt to allocate before the collector is initialized,
        !           129: or before the appropriate <TT>GC_size_map</tt> entry is computed,
        !           130: will take the same path as an allocation attempt with an empty free list.
        !           131: This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
        !           132: which performs the appropriate initialization checks.
        !           133: <P>
        !           134: In non-incremental mode, we make a decision about whether to garbage collect
        !           135: whenever an allocation would otherwise have failed with the current heap size.
        !           136: If the total amount of allocation since the last collection is less than
        !           137: the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
        !           138: expand the heap.  Otherwise, we initiate a garbage collection.  This ensures
        !           139: that the amount of garbage collection work per allocated byte remains
        !           140: constant.
        !           141: <P>
        !           142: The above is in fat an oversimplification of the real heap expansion
        !           143: heuristic, which adjusts slightly for root size and certain kinds of
        !           144: fragmentation.  In particular, programs with a large root set size and
        !           145: little live heap memory will expand the heap to amortize the cost of
        !           146: scanning the roots.
        !           147: <P>
        !           148: Versions 5.x of the collector actually collect more frequently in
        !           149: nonincremental mode.  The large block allocator usually refuses to split
        !           150: large heap blocks once the garbage collection threshold is
        !           151: reached.  This often has the effect of collecting well before the
        !           152: heap fills up, thus reducing fragmentation and working set size at the
        !           153: expense of GC time.  6.x will chose an intermediate strategy depending
        !           154: on how much large object allocation has taken place in the past.
        !           155: (If the collector is configured to unmap unused pages, versions 6.x
        !           156: will use the 5.x strategy.)
        !           157: <P>
        !           158: (It has been suggested that this should be adjusted so that we favor
        !           159: expansion if the resulting heap still fits into physical memory.
        !           160: In many cases, that would no doubt help.  But it is tricky to do this
        !           161: in a way that remains robust if multiple application are contending
        !           162: for a single pool of physical memory.)
        !           163:
        !           164: <H2>Mark phase</h2>
        !           165:
        !           166: The marker maintains an explicit stack of memory regions that are known
        !           167: to be accessible, but that have not yet been searched for contained pointers.
        !           168: Each stack entry contains the starting address of the block to be scanned,
        !           169: as well as a descriptor of the block.  If no layout information is
        !           170: available for the block, then the descriptor is simply a length.
        !           171: (For other possibilities, see <TT>gc_mark.h</tt>.)
        !           172: <P>
        !           173: At the beginning of the mark phase, all root segments are pushed on the
        !           174: stack by <TT>GC_push_roots</tt>.  If <TT>ALL_INTERIOR_PTRS</tt> is not
        !           175: defined, then stack roots require special treatment.  In this case, the
        !           176: normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
        !           177: explicitly checks for interior pointers and pushes descriptors for target
        !           178: objects.
        !           179: <P>
        !           180: The marker is structured to allow incremental marking.
        !           181: Each call to <TT>GC_mark_some</tt> performs a small amount of
        !           182: work towards marking the heap.
        !           183: It maintains
        !           184: explicit state in the form of <TT>GC_mark_state</tt>, which
        !           185: identifies a particular sub-phase.  Some other pieces of state, most
        !           186: notably the mark stack, identify how much work remains to be done
        !           187: in each sub-phase.  The normal progression of mark states for
        !           188: a stop-the-world collection is:
        !           189: <OL>
        !           190: <LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
        !           191: objects.  In this case <TT>GC_objects_are_marked</tt> will simultaneously
        !           192: be false, so the mark state is advanced to
        !           193: <LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
        !           194: uncollectable objects, roots, and then mark everything reachable from them.
        !           195: <TT>Scan_ptr</tt> is advanced through the heap until all uncollectable
        !           196: objects are pushed, and objects reachable from them are marked.
        !           197: At that point, the next call to <TT>GC_mark_some</tt> calls
        !           198: <TT>GC_push_roots</tt> to push the roots.  It the advances the
        !           199: mark state to
        !           200: <LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
        !           201: empty, all reachable objects are marked.  Once in this state, we work
        !           202: only on emptying the mark stack.  Once this is completed, the state
        !           203: changes to
        !           204: <LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
        !           205: </ol>
        !           206:
        !           207: The core mark routine <TT>GC_mark_from_mark_stack</tt>, is called
        !           208: repeatedly by several of the sub-phases when the mark stack starts to fill
        !           209: up.  It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
        !           210: to empty the mark stack.
        !           211: The routine is designed to only perform a limited amount of marking at
        !           212: each call, so that it can also be used by the incremental collector.
        !           213: It is fairly carefully tuned, since it usually consumes a large majority
        !           214: of the garbage collection time.
        !           215: <P>
        !           216: The marker correctly handles mark stack overflows.  Whenever the mark stack
        !           217: overflows, the mark state is reset to <TT>MS_INVALID</tt>.
        !           218: Since there are already marked objects in the heap,
        !           219: this eventually forces a complete
        !           220: scan of the heap, searching for pointers, during which any unmarked objects
        !           221: referenced by marked objects are again pushed on the mark stack.  This
        !           222: process is repeated until the mark phase completes without a stack overflow.
        !           223: Each time the stack overflows, an attempt is made to grow the mark stack.
        !           224: All pieces of the collector that push regions onto the mark stack have to be
        !           225: careful to ensure forward progress, even in case of repeated mark stack
        !           226: overflows.  Every mark attempt results in additional marked objects.
        !           227: <P>
        !           228: Each mark stack entry is processed by examining all candidate pointers
        !           229: in the range described by the entry.  If the region has no associated
        !           230: type information, then this typically requires that each 4-byte aligned
        !           231: quantity (8-byte aligned with 64-bit pointers) be considered a candidate
        !           232: pointer.
        !           233: <P>
        !           234: We determine whether a candidate pointer is actually the address of
        !           235: a heap block.  This is done in the following steps:
        !           236: <NL>
        !           237: <LI> The candidate pointer is checked against rough heap bounds.
        !           238: These heap bounds are maintained such that all actual heap objects
        !           239: fall between them.  In order to facilitate black-listing (see below)
        !           240: we also include address regions that the heap is likely to expand into.
        !           241: Most non-pointers fail this initial test.
        !           242: <LI> The candidate pointer is divided into two pieces; the most significant
        !           243: bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
        !           244: the least significant bits specify an offset within that page.
        !           245: (A hardware page may actually consist of multiple such pages.
        !           246: HBLKSIZE is usually the page size divided by a small power of two.)
        !           247: <LI>
        !           248: The page address part of the candidate pointer is looked up in a
        !           249: <A HREF="tree.html">table</a>.
        !           250: Each table entry contains either 0, indicating that the page is not part
        !           251: of the garbage collected heap, a small integer <I>n</i>, indicating
        !           252: that the page is part of large object, starting at least <I>n</i> pages
        !           253: back, or a pointer to a descriptor for the page.  In the first case,
        !           254: the candidate pointer i not a true pointer and can be safely ignored.
        !           255: In the last two cases, we can obtain a descriptor for the page containing
        !           256: the beginning of the object.
        !           257: <LI>
        !           258: The starting address of the referenced object is computed.
        !           259: The page descriptor contains the size of the object(s)
        !           260: in that page, the object kind, and the necessary mark bits for those
        !           261: objects.  The size information can be used to map the candidate pointer
        !           262: to the object starting address.  To accelerate this process, the page header
        !           263: also contains a pointer to a precomputed map of page offsets to displacements
        !           264: from the beginning of an object.  The use of this map avoids a
        !           265: potentially slow integer remainder operation in computing the object
        !           266: start address.
        !           267: <LI>
        !           268: The mark bit for the target object is checked and set.  If the object
        !           269: was previously unmarked, the object is pushed on the mark stack.
        !           270: The descriptor is read from the page descriptor.  (This is computed
        !           271: from information <TT>GC_obj_kinds</tt> when the page is first allocated.)
        !           272: </nl>
        !           273: <P>
        !           274: At the end of the mark phase, mark bits for left-over free lists are cleared,
        !           275: in case a free list was accidentally marked due to a stray pointer.
        !           276:
        !           277: <H2>Sweep phase</h2>
        !           278:
        !           279: At the end of the mark phase, all blocks in the heap are examined.
        !           280: Unmarked large objects are immediately returned to the large object free list.
        !           281: Each small object page is checked to see if all mark bits are clear.
        !           282: If so, the entire page is returned to the large object free list.
        !           283: Small object pages containing some reachable object are queued for later
        !           284: sweeping.
        !           285: <P>
        !           286: This initial sweep pass touches only block headers, not
        !           287: the blocks themselves.  Thus it does not require significant paging, even
        !           288: if large sections of the heap are not in physical memory.
        !           289: <P>
        !           290: Nonempty small object pages are swept when an allocation attempt
        !           291: encounters an empty free list for that object size and kind.
        !           292: Pages for the correct size and kind are repeatedly swept until at
        !           293: least one empty block is found.  Sweeping such a page involves
        !           294: scanning the mark bit array in the page header, and building a free
        !           295: list linked through the first words in the objects themselves.
        !           296: This does involve touching the appropriate data page, but in most cases
        !           297: it will be touched only just before it is used for allocation.
        !           298: Hence any paging is essentially unavoidable.
        !           299: <P>
        !           300: Except in the case of pointer-free objects, we maintain the invariant
        !           301: that any object in a small object free list is cleared (except possibly
        !           302: for the link field).  Thus it becomes the burden of the small object
        !           303: sweep routine to clear objects.  This has the advantage that we can
        !           304: easily recover from accidentally marking a free list, though that could
        !           305: also be handled by other means.  The collector currently spends a fair
        !           306: amount of time clearing objects, and this approach should probably be
        !           307: revisited.
        !           308: <P>
        !           309: In most configurations, we use specialized sweep routines to handle common
        !           310: small object sizes.  Since we allocate one mark bit per word, it becomes
        !           311: easier to examine the relevant mark bits if the object size divides
        !           312: the word length evenly.  We also suitably unroll the inner sweep loop
        !           313: in each case.  (It is conceivable that profile-based procedure cloning
        !           314: in the compiler could make this unnecessary and counterproductive.  I
        !           315: know of no existing compiler to which this applies.)
        !           316: <P>
        !           317: The sweeping of small object pages could be avoided completely at the expense
        !           318: of examining mark bits directly in the allocator.  This would probably
        !           319: be more expensive, since each allocation call would have to reload
        !           320: a large amount of state (e.g. next object address to be swept, position
        !           321: in mark bit table) before it could do its work.  The current scheme
        !           322: keeps the allocator simple and allows useful optimizations in the sweeper.
        !           323:
        !           324: <H2>Finalization</h2>
        !           325: Both <TT>GC_register_disappearing_link</tt> and
        !           326: <TT>GC_register_finalizer</tt> add the request to a corresponding hash
        !           327: table.  The hash table is allocated out of collected memory, but
        !           328: the reference to the finalizable object is hidden from the collector.
        !           329: Currently finalization requests are processed non-incrementally at the
        !           330: end of a mark cycle.
        !           331: <P>
        !           332: The collector makes an initial pass over the table of finalizable objects,
        !           333: pushing the contents of unmarked objects onto the mark stack.
        !           334: After pushing each object, the marker is invoked to mark all objects
        !           335: reachable from it.  The object itself is not explicitly marked.
        !           336: This assures that objects on which a finalizer depends are neither
        !           337: collected nor finalized.
        !           338: <P>
        !           339: If in the process of marking from an object the
        !           340: object itself becomes marked, we have uncovered
        !           341: a cycle involving the object.  This usually results in a warning from the
        !           342: collector.  Such objects are not finalized, since it may be
        !           343: unsafe to do so.  See the more detailed
        !           344: <A HREF="finalization.html"> discussion of finalization semantics</a>.
        !           345: <P>
        !           346: Any objects remaining unmarked at the end of this process are added to
        !           347: a queue of objects whose finalizers can be run.  Depending on collector
        !           348: configuration, finalizers are dequeued and run either implicitly during
        !           349: allocation calls, or explicitly in response to a user request.
        !           350: <P>
        !           351: The collector provides a mechanism for replacing the procedure that is
        !           352: used to mark through objects.  This is used both to provide support for
        !           353: Java-style unordered finalization, and to ignore certain kinds of cycles,
        !           354: <I>e.g.</i> those arising from C++ implementations of virtual inheritance.
        !           355:
        !           356: <H2>Generational Collection and Dirty Bits</h2>
        !           357: We basically use the parallel and generational GC algorithm described in
        !           358: <A HREF="papers/pldi91.ps.gz">"Mostly Parallel Garbage Collection"</a>,
        !           359: by Boehm, Demers, and Shenker.
        !           360: <P>
        !           361: The most significant modification is that
        !           362: the collector always runs in the allocating thread.
        !           363: There is no separate garbage collector thread.
        !           364: If an allocation attempt either requests a large object, or encounters
        !           365: an empty small object free list, and notices that there is a collection
        !           366: in progress, it immediately performs a small amount of marking work
        !           367: as described above.
        !           368: <P>
        !           369: This change was made both because we wanted to easily accommodate
        !           370: single-threaded environments, and because a separate GC thread requires
        !           371: very careful control over the scheduler to prevent the mutator from
        !           372: out-running the collector, and hence provoking unneeded heap growth.
        !           373: <P>
        !           374: In incremental mode, the heap is always expanded when we encounter
        !           375: insufficient space for an allocation.  Garbage collection is triggered
        !           376: whenever we notice that more than
        !           377: <TT>GC_heap_size</tt>/2 * <TT>GC_free_space_divisor</tt>
        !           378: bytes of allocation have taken place.
        !           379: After <TT>GC_full_freq</tt> minor collections a major collection
        !           380: is started.
        !           381: <P>
        !           382: All collections initially run interrupted until a predetermined
        !           383: amount of time (50 msecs by default) has expired.  If this allows
        !           384: the collection to complete entirely, we can avoid correcting
        !           385: for data structure modifications during the collection.  If it does
        !           386: not complete, we return control to the mutator, and perform small
        !           387: amounts of additional GC work during those later allocations that
        !           388: cannot be satisfied from small object free lists. When marking completes,
        !           389: the set of modified pages is retrieved, and we mark once again from
        !           390: marked objects on those pages, this time with the mutator stopped.
        !           391: <P>
        !           392: We keep track of modified pages using one of three distinct mechanisms:
        !           393: <OL>
        !           394: <LI>
        !           395: Through explicit mutator cooperation.  Currently this requires
        !           396: the use of <TT>GC_malloc_stubborn</tt>.
        !           397: <LI>
        !           398: By write-protecting physical pages and catching write faults.  This is
        !           399: implemented for many Unix-like systems and for win32.  It is not possible
        !           400: in a few environments.
        !           401: <LI>
        !           402: By retrieving dirty bit information from /proc.  (Currently only Sun's
        !           403: Solaris supports this.  Though this is considerably cleaner, performance
        !           404: may actually be better with mprotect and signals.)
        !           405: </ol>
        !           406:
        !           407: <H2>Thread support</h2>
        !           408: We support several different threading models.  Unfortunately Pthreads,
        !           409: the only reasonably well standardized thread model, supports too narrow
        !           410: an interface for conservative garbage collection.  There appears to be
        !           411: no portable way to allow the collector to coexist with various Pthreads
        !           412: implementations.  Hence we currently support only a few of the more
        !           413: common Pthreads implementations.
        !           414: <P>
        !           415: In particular, it is very difficult for the collector to stop all other
        !           416: threads in the system and examine the register contents.  This is currently
        !           417: accomplished with very different mechanisms for different Pthreads
        !           418: implementations.  The Solaris implementation temporarily disables much
        !           419: of the user-level threads implementation by stopping kernel-level threads
        !           420: ("lwp"s).  The Irix implementation sends signals to individual Pthreads
        !           421: and has them wait in the signal handler.  The Linux implementation
        !           422: is similar in spirit to the Irix one.
        !           423: <P>
        !           424: The Irix implementation uses
        !           425: only documented Pthreads calls, but relies on extensions to their semantics,
        !           426: notably the use of mutexes and condition variables from signal
        !           427: handlers.  The Linux implementation should be far closer to
        !           428: portable, though impirically it is not completely portable.
        !           429: <P>
        !           430: All implementations must
        !           431: intercept thread creation and a few other thread-specific calls to allow
        !           432: enumeration of threads and location of thread stacks.  This is current
        !           433: accomplished with <TT># define</tt>'s in <TT>gc.h</tt>, or optionally
        !           434: by using ld's function call wrapping mechanism under Linux.
        !           435: <P>
        !           436: Comments are appreciated.  Please send mail to
        !           437: <A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a>
        !           438: </body>

FreeBSD-CVSweb <freebsd-cvsweb@FreeBSD.org>