Annotation of OpenXM_contrib2/asir2000/gc/doc/debugging.html, Revision 1.1

1.1     ! noro        1: <HTML>
        !             2: <HEAD>
        !             3: <TITLE>Debugging Garbage Collector Related Problems</title>
        !             4: </head>
        !             5: <BODY>
        !             6: <H1>Debugging Garbage Collector Related Problems</h1>
        !             7: This page contains some hints on
        !             8: debugging issues specific to
        !             9: the Boehm-Demers-Weiser conservative garbage collector.
        !            10: It applies both to debugging issues in client code that manifest themselves
        !            11: as collector misbehavior, and to debugging the collector itself.
        !            12: <P>
        !            13: If you suspect a bug in the collector itself, it is strongly recommended
        !            14: that you try the latest collector release, even if it is labelled as "alpha",
        !            15: before proceeding.
        !            16: <H2>Bus Errors and Segmentation Violations</h2>
        !            17: <P>
        !            18: If the fault occurred in GC_find_limit, or with incremental collection enabled,
        !            19: this is probably normal.  The collector installs handlers to take care of
        !            20: these.  You will not see these unless you are using a debugger.
        !            21: Your debugger <I>should</i> allow you to continue.
        !            22: It's often preferable to tell the debugger to ignore SIGBUS and SIGSEGV
        !            23: ("<TT>handle SIGSEGV SIGBUS nostop noprint</tt>" in gdb,
        !            24: "<TT>ignore SIGSEGV SIGBUS</tt>" in most versions of dbx)
        !            25: and set a breakpoint in <TT>abort</tt>.
        !            26: The collector will call abort if the signal had another cause,
        !            27: and no other handler was previously installed.
        !            28: <P>
        !            29: We recommend debugging without incremental collection if possible.
        !            30: (This applies directly to UNIX systems.
        !            31: Debugging with incremental collection under win32 is worse.  See README.win32.)
        !            32: <P>
        !            33: If the application generates an unhandled SIGSEGV or equivalent, it may
        !            34: often be easiest to set the environment variable GC_LOOP_ON_ABORT.  On many
        !            35: platforms, this will cause the collector to loop in a handler when the
        !            36: SIGSEGV is encountered (or when the collector aborts for some other reason),
        !            37: and a debugger can then be attached to the looping
        !            38: process.  This sidesteps common operating system problems related
        !            39: to incomplete core files for multithreaded applications, etc.
        !            40: <H2>Other Signals</h2>
        !            41: On most platforms, the multithreaded version of the collector needs one or
        !            42: two other signals for internal use by the collector in stopping threads.
        !            43: It is normally wise to tell the debugger to ignore these.  On Linux,
        !            44: the collector currently uses SIGPWR and SIGXCPU by default.
        !            45: <H2>Warning Messages About Needing to Allocate Blacklisted Blocks</h2>
        !            46: The garbage collector generates warning messages of the form
        !            47: <PRE>
        !            48: Needed to allocate blacklisted block at 0x...
        !            49: </pre>
        !            50: when it needs to allocate a block at a location that it knows to be
        !            51: referenced by a false pointer.  These false pointers can be either permanent
        !            52: (<I>e.g.</i> a static integer variable that never changes) or temporary.
        !            53: In the latter case, the warning is largely spurious, and the block will
        !            54: eventually be reclaimed normally.
        !            55: In the former case, the program will still run correctly, but the block
        !            56: will never be reclaimed.  Unless the block is intended to be
        !            57: permanent, the warning indicates a memory leak.  Possible responses:
        !            58: <OL>
        !            59: <LI>Ignore these warnings while you are using GC_DEBUG.  Some of the routines
        !            60: mentioned below don't have debugging equivalents.  (Alternatively, write
        !            61: the missing routines and send them to me.)
        !            62: <LI>Replace allocator calls that request large blocks with calls to
        !            63: <TT>GC_malloc_ignore_off_page</tt> or
        !            64: <TT>GC_malloc_atomic_ignore_off_page</tt>.  You may want to set a
        !            65: breakpoint in <TT>GC_default_warn_proc</tt> to help you identify such calls.
        !            66: Make sure that a pointer to somewhere near the beginning of the resulting block
        !            67: is maintained in a (preferably volatile) variable as long as
        !            68: the block is needed.
        !            69: <LI>
        !            70: If the large blocks are allocated with realloc, we suggest instead allocating
        !            71: them with something like the following.  Note that the realloc size increment
        !            72: should be fairly large (e.g. a factor of 3/2) for this to exhibit reasonable
        !            73: performance.  But we all know we should do that anyway.
        !            74: <PRE>
        !            75: void * big_realloc(void *p, size_t new_size)
        !            76: {
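        !            77:     size_t old_size = (p == 0) ? 0 : GC_size(p);  /* GC_size needs a valid object */
        !            78:     void * result;
        !            79:
        !            80:     if (new_size <= 10000) return(GC_realloc(p, new_size));  /* small: ordinary realloc */
        !            81:     if (new_size <= old_size) return(p);  /* existing block is already big enough */
        !            82:     result = GC_malloc_ignore_off_page(new_size);
        !            83:     if (result == 0) return(0);
        !            84:     if (p != 0) memcpy(result,p,old_size);  /* memcpy is declared in string.h */
        !            85:     if (p != 0) GC_free(p);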
        !            86:     return(result);
        !            87: }
        !            88: </pre>
        !            89:
        !            90: <LI> In the unlikely case that even relatively small object
        !            91: (&lt;20KB) allocations are triggering these warnings, then your address
        !            92: space contains lots of "bogus pointers", i.e. values that appear to
        !            93: be pointers but aren't.  Usually this can be solved by using GC_malloc_atomic
        !            94: or the routines in gc_typed.h to allocate large pointer-free regions of bitmaps, etc.  Sometimes the problem can be solved with trivial changes of encoding
        !            95: in certain values.  It is possible to identify the source of the bogus
        !            96: pointers by building the collector with <TT>-DPRINT_BLACK_LIST</tt>,
        !            97: which will cause it to print the "bogus pointers", along with their location.
        !            98:
        !            99: <LI> If you get only a fixed number of these warnings, you are probably only
        !           100: introducing a bounded leak by ignoring them.  If the data structures being
        !           101: allocated are intended to be permanent, then it is also safe to ignore them.
        !           102: The warnings can be turned off by calling GC_set_warn_proc with a procedure
        !           103: that ignores these warnings (e.g. by doing absolutely nothing).
        !           104: </ol>
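<P>
The advice above about growing by a large factor can be sketched as a small
capacity policy.  The helper name <TT>next_capacity</tt> and the starting size of 16
are assumptions made for illustration; neither is part of the collector.  The helper
only computes the size a caller might pass to a routine like <TT>big_realloc</tt>,
so that repeated appends trigger few large reallocations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the collector): return a capacity of at
 * least `needed` bytes, growing the current capacity by a factor of 3/2 per
 * step so that the number of reallocations stays small. */
size_t next_capacity(size_t current, size_t needed)
{
    size_t cap = current;

    assert(needed > 0);
    if (cap == 0) cap = 16;               /* arbitrary starting size (assumption) */
    while (cap < needed) {
        size_t grown = cap + cap / 2;     /* factor of 3/2 */
        if (grown <= cap) return needed;  /* overflow, or cap too small to grow */
        cap = grown;
    }
    return cap;
}
```

<P>
A caller would keep the returned capacity alongside the buffer and call the
reallocation routine only when the capacity actually changes.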
        !           105:
        !           106: <H2>The Collector References a Bad Address in <TT>GC_malloc</tt></h2>
        !           107:
        !           108: This typically happens while the collector is trying to remove an entry from
        !           109: its free list, and the free list pointer is bad because the free list link
        !           110: in the last allocated object was bad.
        !           111: <P>
        !           112: With &gt; 99% probability, you wrote past the end of an allocated object.
        !           113: Try setting <TT>GC_DEBUG</tt> before including <TT>gc.h</tt> and
        !           114: allocating with <TT>GC_MALLOC</tt>.  This will try to detect such
        !           115: overwrite errors.
        !           116:
        !           117: <H2>Unexpectedly Large Heap</h2>
        !           118:
        !           119: Unexpected heap growth can be due to one of the following:
        !           120: <OL>
        !           121: <LI> Data structures that are being unintentionally retained.  This
        !           122: is commonly caused by data structures that are no longer being used,
        !           123: but were not cleared, or by caches growing without bounds.
        !           124: <LI> Pointer misidentification.  The garbage collector is interpreting
        !           125: integers or other data as pointers and retaining the "referenced"
        !           126: objects.
        !           127: <LI> Heap fragmentation.  This should never result in unbounded growth,
        !           128: but it may account for larger heaps.  This is most commonly caused
        !           129: by allocation of large objects.  On some platforms it can be reduced
        !           130: by building with -DUSE_MUNMAP, which will cause the collector to unmap
        !           131: memory corresponding to pages that have not been recently used.
        !           132: <LI> Per-object overhead.  This is usually a relatively minor effect, but
        !           133: it may be worth considering.  If the collector recognizes interior
        !           134: pointers, object sizes are increased, so that one-past-the-end pointers
        !           135: are correctly recognized.  The collector can be configured not to do this
        !           136: (<TT>-DDONT_ADD_BYTE_AT_END</tt>).
        !           137: <P>
        !           138: The collector rounds up object sizes so the result fits well into the
        !           139: chunk size (<TT>HBLKSIZE</tt>, normally 4K on 32 bit machines, 8K
        !           140: on 64 bit machines) used by the collector.   Thus it may be worth avoiding
        !           141: objects of size 2K + 1 (or 2K if a byte is being added at the end).
        !           142: </ol>
        !           143: The last two cases can often be identified by looking at the output
        !           144: of a call to <TT>GC_dump()</tt>.  Among other things, it will print the
        !           145: list of free heap blocks, and a very brief description of all chunks in
        !           146: the heap, the object sizes they correspond to, and how many live objects
        !           147: were found in the chunk at the last collection.
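<P>
The remark about objects of size 2K + 1 can be checked with a deliberately
simplified model of how objects pack into one chunk.  The names below are
hypothetical, the 4K chunk size is the value the text gives for 32 bit machines,
and the model ignores headers and any other rounding the collector applies, so
treat the numbers as an illustration rather than the collector's real accounting.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed chunk size; the text gives 4K as the usual value on 32 bit machines. */
#define CHUNK_BYTES ((size_t)4096)

/* Simplified model: how many objects of `size` bytes fit in one chunk. */
size_t objects_per_chunk(size_t size)
{
    assert(size > 0 && size <= CHUNK_BYTES);
    return CHUNK_BYTES / size;
}

/* Bytes of the chunk left unused under the same simplified model. */
size_t wasted_bytes(size_t size)
{
    return CHUNK_BYTES - objects_per_chunk(size) * size;
}
```

<P>
Under this model a 2048 byte object wastes nothing, while a 2049 byte object
leaves almost half of the chunk unused.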
        !           148: <P>
        !           149: Growing data structures can usually be identified by
        !           150: <OL>
        !           151: <LI> Building the collector with <TT>-DKEEP_BACK_PTRS</tt>,
        !           152: <LI> Preferably using debugging allocation (defining <TT>GC_DEBUG</tt>
        !           153: before including <TT>gc.h</tt> and allocating with <TT>GC_MALLOC</tt>),
        !           154: so that objects will be identified by their allocation site,
        !           155: <LI> Running the application long enough so
        !           156: that most of the heap is composed of "leaked" memory, and
        !           157: <LI> Then calling <TT>GC_generate_random_backtrace()</tt> from backptr.h
        !           158: a few times to determine why some randomly sampled objects in the heap are
        !           159: being retained.
        !           160: </ol>
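<P>
Once a retained structure has been found, the usual fix for data that is no
longer used but was never cleared is to overwrite the stale reference.  The
sketch below uses a hypothetical fixed-size stack (none of these names come from
the collector) whose pop operation clears the vacated slot, so that no leftover
pointer keeps the popped object alive.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 64

/* Hypothetical fixed-size stack of pointers to collectable objects. */
struct ptr_stack {
    void  *slot[STACK_MAX];
    size_t depth;
};

/* Push a pointer; returns 0 on success, -1 if the stack is full. */
int ptr_stack_push(struct ptr_stack *s, void *p)
{
    if (s->depth == STACK_MAX) return -1;
    s->slot[s->depth++] = p;
    return 0;
}

/* Pop the top element, clearing its slot so that no stale pointer is left
 * behind for a conservative collector to find.  Returns NULL if empty. */
void *ptr_stack_pop(struct ptr_stack *s)
{
    void *p;

    if (s->depth == 0) return NULL;
    p = s->slot[--s->depth];
    s->slot[s->depth] = NULL;   /* clear the vacated slot */
    return p;
}
```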
        !           161: <P>
        !           162: The same technique can often be used to identify problems with false
        !           163: pointers, by noting whether the reference chains printed by
        !           164: <TT>GC_generate_random_backtrace()</tt> involve any misidentified pointers.
        !           165: An alternate technique is to build the collector with
        !           166: <TT>-DPRINT_BLACK_LIST</tt> which will cause it to report values that
        !           167: almost, but not quite, look like heap pointers.  It is very likely that
        !           168: actual false pointers will come from similar sources.
        !           169: <P>
        !           170: In the unlikely case that false pointers are an issue, it can usually
        !           171: be resolved using one or more of the following techniques:
        !           172: <OL>
        !           173: <LI> Use <TT>GC_malloc_atomic</tt> for objects containing no pointers.
        !           174: This is especially important for large arrays containing compressed data,
        !           175: pseudo-random numbers, and the like.  It is also likely to improve GC
        !           176: performance, perhaps drastically so if the application is paging.
        !           177: <LI> If you allocate large objects containing only
        !           178: one or two pointers at the beginning, either try the typed allocation
        !           179: primitives in <TT>gc_typed.h</tt>, or separate out the pointer-free component.
        !           180: <LI> Consider using <TT>GC_malloc_ignore_off_page()</tt>
        !           181: to allocate large objects.  (See <TT>gc.h</tt> and above for details.
        !           182: Large means &gt; 100K in most environments.)
        !           183: </ol>
        !           184: <H2>Prematurely Reclaimed Objects</h2>
        !           185: The usual symptom of this is a segmentation fault, or an obviously overwritten
        !           186: value in a heap object.  This should, of course, be impossible.  In practice,
        !           187: it may happen for reasons like the following:
        !           188: <OL>
        !           189: <LI> The collector did not intercept the creation of threads correctly in
        !           190: a multithreaded application, <I>e.g.</i> because the client called
        !           191: <TT>pthread_create</tt> without including <TT>gc.h</tt>, which redefines it.
        !           192: <LI> The last pointer to an object in the garbage collected heap was stored
        !           193: somewhere where the collector couldn't see it, <I>e.g.</i> in an
        !           194: object allocated with system <TT>malloc</tt>, in certain types of
        !           195: <TT>mmap</tt>ed files,
        !           196: or in some data structure visible only to the OS.  (On some platforms,
        !           197: thread-local storage is one of these.)
        !           198: <LI> The last pointer to an object was somehow disguised, <I>e.g.</i> by
        !           199: XORing it with another pointer.
        !           200: <LI> Incorrect use of <TT>GC_malloc_atomic</tt> or typed allocation.
        !           201: <LI> An incorrect <TT>GC_free</tt> call.
        !           202: <LI> The client program overwrote an internal garbage collector data structure.
        !           203: <LI> A garbage collector bug.
        !           204: <LI> (Empirically less likely than any of the above.) A compiler optimization
        !           205: that disguised the last pointer.
        !           206: </ol>
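<P>
To make the disguised-pointer case above concrete, the following sketch shows why
an XORed pointer is invisible to a conservative scan.  All three function names are
hypothetical, and <TT>looks_like_pointer_into</tt> is only a toy stand-in for the
collector's real pointer test: it checks whether a word falls inside one given object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Disguise a pointer by XORing its bits with a key. */
uintptr_t disguise(const void *p, uintptr_t key)
{
    return (uintptr_t)p ^ key;
}

/* Recover the original pointer from the disguised word. */
void *reveal(uintptr_t hidden, uintptr_t key)
{
    return (void *)(hidden ^ key);
}

/* Toy stand-in for a conservative scan: does `word` fall inside the object
 * that starts at `base` and is `size` bytes long? */
int looks_like_pointer_into(uintptr_t word, const void *base, size_t size)
{
    uintptr_t lo = (uintptr_t)base;

    return word >= lo && word - lo < size;
}
```

<P>
The program can still recover the pointer with <TT>reveal</tt>, but a scan that
sees only the disguised word has no reason to treat the object as reachable.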
        !           207: The following relatively simple techniques should be tried first to narrow
        !           208: down the problem:
        !           209: <OL>
        !           210: <LI> If you are using the incremental collector, try turning it off for
        !           211: debugging.
        !           212: <LI> If you are using shared libraries, try linking statically.  If that works,
        !           213: ensure that DYNAMIC_LOADING is defined on your platform.
        !           214: <LI> Try to reproduce the problem with fully debuggable unoptimized code.
        !           215: This will eliminate the last possibility, as well as making debugging easier.
        !           216: <LI> Try replacing any suspect typed allocation and <TT>GC_malloc_atomic</tt>
        !           217: calls with calls to <TT>GC_malloc</tt>.
        !           218: <LI> Try removing any GC_free calls (<I>e.g.</i> with a suitable
        !           219: <TT>#define</tt>).
        !           220: <LI> Rebuild the collector with <TT>-DGC_ASSERTIONS</tt>.
        !           221: <LI> If the following works on your platform (i.e. if gctest still works
        !           222: if you do this), try building the collector with
        !           223: <TT>-DREDIRECT_MALLOC=GC_malloc_uncollectable</tt>.  This will cause
        !           224: the collector to scan memory allocated with malloc.
        !           225: </ol>
        !           226: If all else fails, you will have to attack this with a debugger.
        !           227: Suggested steps:
        !           228: <OL>
        !           229: <LI> Call <TT>GC_dump()</tt> from the debugger around the time of the failure.  Verify
        !           230: that the collector's idea of the root set (i.e. static data regions which
        !           231: it should scan for pointers) looks plausible.  If not, i.e. if it doesn't
        !           232: include some static variables, report this as
        !           233: a collector bug.  Be sure to describe your platform precisely, since this sort
        !           234: of problem is nearly always very platform dependent.
        !           235: <LI> Especially if the failure is not deterministic, try to isolate it to
        !           236: a relatively small test case.
        !           237: <LI> Set a breakpoint in <TT>GC_finish_collection</tt>.  This is a good
        !           238: point to examine what has been marked, i.e. found reachable, by the
        !           239: collector.
        !           240: <LI> If the failure is deterministic, run the process
        !           241: up to the last collection before the failure.
        !           242: Note that the variable <TT>GC_gc_no</tt> counts collections and can be used
        !           243: to set a conditional breakpoint in the right one.  It is incremented just
        !           244: before the call to GC_finish_collection.
        !           245: If object <TT>p</tt> was prematurely recycled, it may be helpful to
        !           246: look at <TT>*GC_find_header(p)</tt> at the failure point.
        !           247: The <TT>hb_last_reclaimed</tt> field will identify the collection number
        !           248: during which its block was last swept.
        !           249: <LI> Verify that the offending object still has its correct contents at
        !           250: this point.
        !           251: Then call <TT>GC_is_marked(p)</tt> from the debugger to verify that the
        !           252: object has not been marked, and is about to be reclaimed.
        !           253: <LI> Determine a path from a root, i.e. static variable, stack, or
        !           254: register variable,
        !           255: to the reclaimed object.  Call <TT>GC_is_marked(q)</tt> for each object
        !           256: <TT>q</tt> along the path, trying to locate the first unmarked object, say
        !           257: <TT>r</tt>.
        !           258: <LI> If <TT>r</tt> is pointed to by a static root,
        !           259: verify that the location
        !           260: pointing to it is part of the root set printed by <TT>GC_dump()</tt>.  If it
        !           261: is on the stack in the main (or only) thread, verify that
        !           262: <TT>GC_stackbottom</tt> is set correctly to the base of the stack.  If it is
        !           263: in another thread stack, check the collector's thread data structure
        !           264: (<TT>GC_thread[]</tt> on several platforms) to make sure that stack bounds
        !           265: are set correctly.
        !           266: <LI> If <TT>r</tt> is pointed to by heap object <TT>s</tt>, check that the
        !           267: collector's layout description for <TT>s</tt> is such that the pointer field
        !           268: will be scanned.  Print <TT>*GC_find_header(s)</tt> to look at the descriptor
        !           269: for the heap chunk.  The <TT>hb_descr</tt> field specifies the layout
        !           270: of objects in that chunk.  See gc_mark.h for the meaning of the descriptor.
        !           271: (If its low-order 2 bits are zero, then it is just the length of the
        !           272: object prefix to be scanned.  This form is always used for objects allocated
        !           273: with <TT>GC_malloc</tt> or <TT>GC_malloc_atomic</tt>.)
        !           274: <LI> If the failure is not deterministic, you may still be able to apply some
        !           275: of the above techniques at the point of failure.  But remember that objects
        !           276: allocated since the last collection will not have been marked, even if the
        !           277: collector is functioning properly.  On some platforms, the collector
        !           278: can be configured to save call chains in objects for debugging.
        !           279: Enabling this feature will also cause it to save the call stack at the
        !           280: point of the last GC in GC_arrays._last_stack.
        !           281: <LI> When looking at GC internal data structures remember that a number
        !           282: of <TT>GC_</tt><I>xxx</i> variables are really macros defined as
        !           283: <TT>GC_arrays._</tt><I>xxx</i>, so that
        !           284: the collector can avoid scanning them.
        !           285: </ol>
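<P>
The descriptor check mentioned in the step about <TT>hb_descr</tt> can be written
out as a small test on the low-order 2 bits.  The names below are illustrative and
are not taken from gc_mark.h; consult that header for the actual tag values and
for the meaning of the non-zero tags.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative name, not taken from gc_mark.h: mask for the low-order 2 bits. */
#define DESCR_TAG_MASK ((size_t)3)

/* True if the low-order 2 bits are zero, i.e. the descriptor is just the
 * length of the object prefix to be scanned. */
int descr_is_length(size_t descr)
{
    return (descr & DESCR_TAG_MASK) == 0;
}

/* Length of the prefix to scan; only meaningful for length descriptors. */
size_t descr_prefix_length(size_t descr)
{
    assert(descr_is_length(descr));
    return descr;
}
```

<P>
When inspecting <TT>hb_descr</tt> in a debugger, the same test amounts to looking
at the value modulo 4.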
        !           286: </body>
        !           287: </html>