Re: VM: killing... (and Oops!)

From: Greg Baker (greg.baker@amd.com)
Date: Fri Apr 21 2000 - 17:17:32 EDT


    I'd like to thank everyone for their comments and offer current status:

    According to memtest (http://reality.sgi.com/cbrady_denver/memtest86/),
    it seems I do have some bad memory. I had originally suspected this and
    tested with the VA Linux burn-in software
    (ftp://ftp.varesearch.com/pub/software/Cerberus), which reported no
    errors. I guess I shouldn't trust www.crucial.com for memory. I should
    have stuck with Mushkin.

    Unfortunately, I'll be in Washington DC for a week starting today, so
    I won't have a chance to get hands-on for further debugging until I'm
    back.

    Thanks for pointing out memtest. Hopefully I can isolate at least one
    good chip so I can do my benchmarks/testing. If anybody else has run
    large Mentor Calibre jobs on Linux (roughly: process size > 150MB,
    run time of 4+ hours), please let me know about successes, failures,
    and caveats.

    Thanks,

    --Greg

    FYI, under the heavy load I introduced today I also got 4 Oopses!

    OOPS 1:
    Apr 21 12:26:48 case kernel: Unable to handle kernel paging request at
    virtual address 00ddff28
    Apr 21 12:26:48 case kernel: current->tss.cr3 = 2bddb000, %cr3 =
    2bddb000
    Apr 21 12:26:48 case kernel: *pde = 00000000
    Apr 21 12:26:48 case kernel: Oops: 0000
    Apr 21 12:26:48 case kernel: CPU: 0
    Apr 21 12:26:48 case kernel: EIP: 0010:[del_timer+10/59]
    Apr 21 12:26:48 case kernel: EFLAGS: 00010046
    Apr 21 12:26:48 case kernel: eax: 2bddb000 ebx: 00000246
    ecx: 00ddff24 edx: ecd249c0
    Apr 21 12:26:48 case kernel: esi: 000250bf edi: 00000007
    ebp: ebddff0c esp: ebddff10
    Apr 21 12:26:48 case kernel: ds: 0018 es: 0018 ss: 0018
    Apr 21 12:26:48 case kernel: Process sshd (pid: 977, process nr: 165,
    stackpage=ebddf000)
    Apr 21 12:26:48 case kernel: Stack: c0111294 00ddff24 ebddff24
    00000000 00000040 00000000 00000000 000250bf
    Apr 21 12:26:48 case kernel: ebdde000 c0110f04 00000000
    c012e480 00000004 00000026 00000007 ed4fefa8
    Apr 21 12:26:48 case kernel: 00000104 00000007 ebdde000
    00000001 00000000 d5b73000 c012e927 00000007
    Apr 21 12:26:48 case kernel: Call Trace: [schedule_timeout+108/134]
    [process_timeout+0/15] [do_select+154/529] [sys_select+816/1134]
    [system_call+52/56]
    Apr 21 12:26:48 case kernel: Code: 8b 51 04 85 d2 74 12 8b 01 89 02 85
    c0 74 03 89 50 04 b8 01
    Apr 21 12:49:10 case kernel: md: md1: sync done.

    OOPS 2:

    Apr 21 14:57:32 case kernel: Unable to handle kernel NULL pointer
    dereference at virtual address 00000350
    Apr 21 14:57:32 case kernel: current->tss.cr3 = 00101000, %cr3 =
    00101000
    Apr 21 14:57:32 case kernel: *pde = 00000000
    Apr 21 14:57:32 case kernel: Oops: 0002
    Apr 21 14:57:32 case kernel: CPU: 0
    Apr 21 14:57:32 case kernel: EIP: 0010:[kmem_cache_free+205/368]
    Apr 21 14:57:32 case kernel: EFLAGS: 00010046
    Apr 21 14:57:32 case kernel: eax: 00000340 ebx: e4983fd0
    ecx: dc008fe0 edx: 00000340
    Apr 21 14:57:32 case kernel: esi: efeff740 edi: 00000286
    ebp: 00000031 esp: efed9f74
    Apr 21 14:57:32 case kernel: ds: 0018 es: 0018 ss: 0018
    Apr 21 14:57:32 case kernel: Process kswapd (pid: 5, process nr: 5,
    stackpage=efed9000)
    Apr 21 14:57:32 case kernel: Stack: dc008f90 c06d1f98 dc008fdc
    efed9fac c0129069 efeff740 dc008f90 dc008f90
    Apr 21 14:57:32 case kernel: dc008f90 c0129dab dc008f90
    dc008f90 c06d1f98 00000bfb 00000030 00000008
    Apr 21 14:57:32 case kernel: c011e2b2 c06d1f98 00000010
    00000006 c012365a 00000006 00000030 efed8000
    Apr 21 14:57:32 case kernel: Call
    Trace: [put_unused_buffer_head+33/76] [try_to_free_buffers+71/128]
    [shrink_mmap+218/300] [do_try_to_free_pages+42/124] [tvecs+7278/13856]
    [kswapd+107/164] [get_options+0/112]
    Apr 21 14:57:32 case kernel: [kernel_thread+35/48]
    Apr 21 14:57:32 case kernel: Code: 89 48 10 89 0e eb 9c 8d 74 26 00 57
    9d 56 53 68 67 5f 1e c0

    OOPS 3 & 4 crashed the system, and I didn't copy the screen dump
    down.
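
    For anyone reading the two dumps above: the "Oops:" line carries the
    i386 page-fault error code, whose low three bits say whether the page
    was present, whether the access was a read or a write, and whether the
    CPU was in user or kernel mode. Below is a minimal sketch of decoding
    it. The function name and the returned dictionary layout are
    illustrative only and are not part of any kernel tool. The result
    agrees with the "Code:" bytes: OOPS 1 faults on a load from 4(%ecx),
    and OOPS 2 faults on a store to 0x10(%eax), i.e. address 0x350.

```python
def decode_page_fault_code(code: int) -> dict:
    """Decode the low three bits of an i386 page-fault error code.

    Hypothetical helper for illustration; the bit meanings are the
    standard x86 ones (bit 0 = present, bit 1 = write, bit 2 = user).
    """
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


# The two values reported in the dumps above ("Oops: 0000", "Oops: 0002").
for label, raw in (("OOPS 1", "0000"), ("OOPS 2", "0002")):
    print(label, decode_page_fault_code(int(raw, 16)))
```

    Both faults are kernel-mode accesses to pages that are not mapped,
    which is what one would expect from a corrupted pointer, though the
    error code alone cannot confirm bad RAM as the cause.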

    On Fri, 21 Apr 2000, bert hubert wrote:

    |I took the liberty to forward your message to linux-raid:
    |
    |----- Forwarded message from "Georg P. Israel" <georg@web0.redwave.net> -----
    |
    |Date: Fri, 21 Apr 2000 22:00:35 +0200
    |From: "Georg P. Israel" <georg@web0.redwave.net>
    |To: linux-raid@vger.rutgers.edu, vince@digex.net
    |Subject: Re: [gbaker@hendrix.amd.com: Re: VM: killing...]
    |
    |Vince,
    |
    |I'm pretty sure you have some bad memory modules in your machine.
    |Run a memory test, e.g. memtest86,
    |to be sure that your memory is OK.
    |
    |
    |Georg
    |<g.israel@ieee.org>
