
HP TechBriefs

 
Using Kexec and Kdump
by Dilip Daya
 
This TechBrief provides information for configuring/enabling kdump as a crash dumping solution.

» Overview
» Setup/Installation and brief dump analysis (SLES10 and Fedora Core 6 - RHEL5)
» References and Bonus section
» About the author


Overview

This article describes how to set up kdump as a crash dumping solution in SLES10 and Fedora Core 6 (RHEL5) environments. The acceptance of kexec (a set of system calls: kexec_load(), ...) into the base 2.6.13 kernel enabled the creation of a powerful system debug facility, kdump. Kdump is a kexec-based crash dumping mechanism for Linux. It forms the basis for debugging production systems and is an invaluable aid both to developers and in customer production environments.

Recent 2.6 kernels (2.6.13 onwards) can set aside some memory for a "dump-capture / crash-dump kernel", which is soft-booted when the system crashes. /sbin/kexec is a user-space utility that loads another kernel (the dump-capture / crash-dump kernel) and asks the currently running kernel to start it, either on reboot or after a panic. The panic case is useful because an intact kernel is then available to write the crash dump. The crash-dump kernel re-initializes all the hardware it needs, boots, and writes the dump. The dump file is a standard ELF core file with some annotations.

Kexec is a fast-boot mechanism that allows booting a Linux kernel from the context of an already running kernel without going through system boot firmware. Firmware initialization can be very time consuming, especially on big servers with numerous peripherals, so kexec can save a lot of time for developers who reboot a machine many times.

Kdump is a new kernel crash dumping mechanism that is intended to be more reliable than previous dump implementations. The crash dump is captured from the context of a freshly booted kernel and not from the context of the crashed kernel. Kdump uses kexec to boot into a second kernel (dump-capture / crash-dump kernel) whenever the system crashes. This second kernel, often called a capture kernel, boots with very little memory and captures the dump image. The first kernel reserves a section of memory that the second kernel uses to boot. Kexec boots the capture kernel without going through system boot firmware, so the contents of the first kernel's memory are preserved; that preserved memory is essentially the kernel crash dump.

The kexec system call breaks up into three pieces:
  1. A generic part that loads the new kernel from the current address space and very carefully places the data in the allocated pages.
  2. A generic part that interacts with the kernel and tells all of the devices to shut down, preventing ongoing DMA and placing the devices in a consistent state so that a later kernel can reinitialize them.
  3. A machine-specific part that includes the syscall number, copies the image to its final destination, and jumps into the image at its entry point.
Figure 1

After successfully loading the crash-dump-capture kernel, the system will reboot into the crash-dump-capture kernel if a system crash is triggered. Trigger points are located in:
  • panic()
  • die() - The system boots into the dump-capture kernel if die() is called in a thread with pid 0 or 1, if die() is called in interrupt context, or if die() is called and panic_on_oops is set.
  • die_nmi() - If a hard lockup is detected and "NMI watchdog" is configured, the system will boot into the dump-capture kernel.
  • SysRq handler (ALT-SysRq-c)
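Two of the trigger points above depend on run-time settings: die() consults panic_on_oops, and the keyboard SysRq handler is governed by the sysrq setting. The following is a minimal sketch, not part of any distribution's kdump tooling, that prints both values. The function name show_trigger_settings and its optional directory argument are illustrative assumptions; by default it reads /proc/sys/kernel.

```shell
# Print the sysctl settings that influence the trigger points above.
# $1 is an optional directory to read instead of /proc/sys/kernel
# (an assumption of this sketch, useful for trying it on sample files).
show_trigger_settings() {
    dir=${1:-/proc/sys/kernel}
    for knob in panic_on_oops sysrq; do
        if [ -r "$dir/$knob" ]; then
            echo "$knob=$(cat "$dir/$knob")"
        else
            echo "$knob=unavailable"
        fi
    done
}
```

Calling show_trigger_settings with no argument reports the live values. A panic_on_oops value of 1 means an oops escalates to a panic, and a sysrq value of 0 means the keyboard SysRq key is disabled.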
Issues:
  • kexec does not sync or unmount filesystems, so if you need that to happen you must do it yourself.
  • Device drivers -- some drivers may simply ignore the request to shut down, others may be overzealous and deactivate the device completely, and some may leave the device in a state from which it cannot be brought back to life, either because the state itself is incorrect or unrecoverable, or because the driver does not know how to resume from that state. This might result in a driver initialization failure in the capture kernel.
  • Some corner-case hangs/lockups may not trigger a crash dump. For interesting reading on this, see:
    Evaluating Linux Kernel Crash Dumping Mechanisms:
    - http://lkdtt.sourceforge.net/docs/ols2006_lkdtt.pdf
    - http://lkdtt.sourceforge.net/results/kdump/kdump_results.html
Future work (TODO) for kexec-tools-1.101:
/usr/share/doc/kexec-tools-1.101/TODO has:
  • Restore enough state that DOS/arbitrary BIOS calls can be run on some platforms. Currently disk-related calls are quite likely to fail.
  • Merge reboot via kexec functionality into /sbin/reboot
  • In the kexec-on-panic case preserving memory the both kernels must use.
  • Finish the kexec-on-panic case.
  • Improve the documentation
  • Add support for loading a boot sector
  • Autobuilding of initramfs
    ###
  • Provide a kernel pages filtering mechanism, so core file size is not extreme on systems with huge memory banks.
  • Relocatable kernel can help in maintaining multiple kernels for crash_dump, and the same kernel as the system kernel can be used to capture the dump. Currently, the standard kernel and capture kernel (kernel-kdump) are two different entities, but work is underway to make the standard kernel relocatable (within memory), and thus usable as a capture kernel, eliminating the need for a separate kdump kernel.
kexec/kdump support for ia64 is also work-in-progress.

Partial man(8) page for kexec - directly boot into a new kernel
SYNOPSIS

/sbin/kexec [-v (--version)] [-f (--force)] [-x (--no-ifdown)] [-l (--load)] [-p (--load-panic)] [-u (--unload)] [-e (--exec)] [-t (--type)] [--mem-min=addr] [--mem-max=addr]

DESCRIPTION
kexec is a system call that enables you to load and boot into another kernel from the currently running kernel. kexec performs the function of the boot loader from within the kernel. The primary difference between a standard system boot and a kexec boot is that the hardware initialization normally performed by the BIOS or firmware (depending on architecture) is not performed during a kexec boot. This has the effect of reducing the time required for a reboot.

Make sure you have selected CONFIG_KEXEC=y when configuring the kernel.
The CONFIG_KEXEC option enables the kexec system call.

USAGE
Using kexec consists of
  1. loading the kernel to be rebooted to into memory, and
  2. actually rebooting to the pre-loaded kernel.
To load a kernel, the syntax is as follows:
kexec -l kernel-image --append=command-line-options --initrd=initrd-image
where kernel-image is the kernel file that you intend to reboot to.
Note: Compressed kernel images such as bzImage are not supported by kexec. Use the uncompressed vmlinux.
....
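The usage steps above, combined with the earlier note that kexec does not sync or unmount filesystems, suggest a command sequence for a plain kexec reboot. The sketch below only prints that sequence and executes nothing. The function name kexec_reboot_plan and the argument order are illustrative assumptions, and "umount -a" will not unmount a busy root filesystem, so treat this as an outline.

```shell
# Print, without executing, a command sequence for a kexec reboot.
# $1: kernel image, $2: initrd image, $3: kernel command line.
kexec_reboot_plan() {
    # Step 1: load the kernel to be rebooted into.
    printf 'kexec -l %s --append="%s" --initrd=%s\n' "$1" "$3" "$2"
    # kexec does not sync or unmount, so do that before the jump.
    printf 'sync\n'
    printf 'umount -a\n'
    # Step 2: boot the pre-loaded kernel.
    printf 'kexec -e\n'
}
```

For example, kexec_reboot_plan /boot/vmlinux /boot/initrd "root=/dev/hda1" prints four lines that can be reviewed before being run by hand; the paths here are placeholders, not files from the test systems.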
Setup and installation (unfortunately, due to time constraints, my testing was performed on a laptop)

SLES 10 kexec/kdump install/setup and crash dump analysis

sles10~# cat /etc/SuSE-release
SUSE Linux Enterprise Server 10 (i586)
VERSION = 10

sles10:~ # uname -a
Linux sles10 2.6.16.21-0.8-default #1 Mon Jul 3 18:25:39 UTC 2006 i686 i686 i386 GNU/Linux

Be sure that you have installed the kexec-tools, kernel-kdump and crash rpms. The kdump kernel revision does not have to match the running kernel; it can be any revision. In my case below I had kernel-kdump-2.6.16.21-0.8 and found that newer running kernels worked with kernel-kdump-2.6.16.21-0.8.

sles10:~ # rpm -qa | grep kexec
kexec-tools-1.101-32.14
sles10:~ # rpm -qa | grep kernel
kernel-default-2.6.16.21-0.8
kernel-xen-2.6.16.21-0.8
kernel-syms-2.6.16.21-0.8
kernel-debug-2.6.16.21-0.8
kernel-source-2.6.16.21-0.8
kernel-kdump-2.6.16.21-0.8
sles10:~ # rpm -qa | grep crash
crash-4.0-25.4
sles10:~ # chkconfig kdump on
sles10:~ # chkconfig --list | grep kdump
kdump 0:off 1:on 2:on 3:on 4:off 5:on 6:off

To enable a crash dump, you need to add an option to the boot loader that specifies the size and offset of the recovery kernel memory area. An example of this boot loader option is "crashkernel=64M@16M". The 64M is the amount of memory reserved for the kdump recovery kernel, and the 16M is the offset (start address) of the reserved area. You can add this option either with the YaST boot loader module, or by manually editing the boot loader configuration file.

The recommended values by architecture for the "crashkernel" option are:
i386: crashkernel=64M@16M
x86_64: crashkernel=64M@16M
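As a worked example of the arithmetic behind this option, the sketch below converts a SIZE@OFFSET value into the physical address range it reserves, printed in the start-end hexadecimal form used by /proc/iomem. The function names to_bytes and crash_region are illustrative assumptions, and only the K, M and G suffixes are handled.

```shell
# Convert a size such as 64M or 16M to bytes (K, M and G suffixes only).
to_bytes() {
    num=${1%[KMG]}
    case $1 in
        *K) echo $((num * 1024)) ;;
        *M) echo $((num * 1024 * 1024)) ;;
        *G) echo $((num * 1024 * 1024 * 1024)) ;;
        *)  echo "$1" ;;
    esac
}

# Print the physical range reserved by a crashkernel=SIZE@OFFSET value.
crash_region() {
    size=$(to_bytes "${1%@*}")      # part before the @
    start=$(to_bytes "${1#*@}")     # part after the @
    printf '%08x-%08x\n' "$start" $((start + size - 1))
}
```

Running crash_region 64M@16M prints 01000000-04ffffff: 16M is 0x01000000, and adding 64M (0x04000000) gives an end address of 0x04ffffff. That is the range expected on the "Crash kernel" line of /proc/iomem.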

sles10:~ # cat /proc/cmdline
root=/dev/hda1 vga=0x314 resume=/dev/hda2 splash=silent showopts crashkernel=64M@16M
sles10:~ # cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cffff : Video ROM
000d0000-000d17ff : Adapter ROM
000f0000-000fffff : System ROM
00100000-2ffcffff : System RAM
  00100000-0027478d : Kernel code
  0027478e-0030cc47 : Kernel data
  01000000-04ffffff : Crash kernel    <<< Crash-dump-capture kernel location
2ffd0000-2fff0bff : reserved
....
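To confirm the reservation without reading the whole /proc/iomem listing, the "Crash kernel" line can be extracted directly. This is a minimal sketch; the function name crash_kernel_range and its optional file argument are illustrative assumptions.

```shell
# Print the "Crash kernel" range from an iomem-style listing.
# $1 is the file to read; it defaults to /proc/iomem.
# The exit status is non-zero when no such line exists.
crash_kernel_range() {
    awk -F' : ' '
        $2 ~ /^Crash kernel/ {
            gsub(/^[ \t]+/, "", $1)   # drop the nesting indentation
            print $1
            found = 1
        }
        END { exit !found }
    ' "${1:-/proc/iomem}"
}
```

If the function prints nothing and returns a failure status, the crashkernel= option most likely did not take effect at boot.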

/etc/init.d/kdump script:
The kdump init script provides the support necessary for loading a kdump capture-kernel into memory at system bootup time, and for copying away a vmcore (/proc/vmcore to /var/log/dump/<date>/vmcore) at system panic time. /etc/init.d/kdump script extract:

coredir="${KDUMP_SAVEDIR}/`date +"%Y-%m-%d-%H:%M"`"
mkdir -p $coredir
echo -n "Saving crash dump to $coredir"
cp --sparse=always /proc/vmcore $coredir/vmcore

A crash dump can be triggered with sysrq-c. Core files end up in /var/log/dump/. Example: /var/log/dump/2006-10-30-19:01/vmcore
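The save step in the script extract can be tried in isolation on an ordinary file. The sketch below is a standalone approximation of that step, not the SLES10 init script itself; the function name save_vmcore and its two arguments are illustrative assumptions, and the --sparse=always option requires GNU cp.

```shell
# Copy a vmcore image into a time-stamped directory under a save root.
# $1: source image (the init script reads /proc/vmcore)
# $2: save root (the SLES10 example uses /var/log/dump)
save_vmcore() {
    src=${1:-/proc/vmcore}
    coredir="${2:-/var/log/dump}/$(date +%Y-%m-%d-%H:%M)"
    mkdir -p "$coredir" || return 1
    # --sparse=always (GNU cp) avoids writing runs of zero bytes to disk.
    cp --sparse=always "$src" "$coredir/vmcore" || return 1
    # Report where the dump was written.
    echo "$coredir/vmcore"
}
```

Because the directory name only has minute resolution, two dumps saved within the same minute would overwrite each other; that matches the date format in the extract above.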

sles10~# sync;sync;sync
sles10~# echo c > /proc/sysrq-trigger

When a crash happens in a graphical environment, you will likely have no GUI in the second kernel boot. If you used a VGA console, you might still have visual output from the secondary kernel. The default behavior of the Kdump script is to save the old vmcore image, and then reboot the system immediately.

The crash-dump-capture kernel soft-boots, saves /proc/vmcore and reboots as follows:

...
Saving crash dump to /var/log/dump/2006-10-30-19:01    done
INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
...

..after system reboot...

sles10~# pwd
/var/log/dump/2006-10-30-19:01
sles10:/var/log/dump/2006-10-30-19:01 # ll -h
total 694M
-r-------- 1 root root 704M Oct 30 19:02 vmcore
sles10:/var/log/dump/2006-10-30-19:01 # crash /boot/System.map-2.6.16.21-0.8-default \
> /boot/vmlinux-2.6.16.21-0.8-debug vmcore
...
crash: /boot/vmlinux-2.6.16.21-0.8-debug: no debugging data available
...same for:
crash: /boot/vmlinux-2.6.16.21-0.8-default: no debugging data available

Since I did not have a SLES10 debuginfo kernel, I compiled one as follows:

sles10:/usr/src # ll linux
lrwxrwxrwx 1 root root 19 2006-08-15 10:20 linux -> linux-2.6.16.21-0.8
sles10:/var/log/dump/2006-10-30-19:01 # cd /usr/src/linux
sles10:/usr/src/linux #
sles10:/usr/src/linux # make mrproper
sles10:/usr/src/linux # cp /boot/config-2.6.16.21-0.8-default .config
sles10:/usr/src/linux # cp Makefile Makefile_orig
...edit the Makefile, add "-g" in the CFLAGS line...
sles10:/usr/src/linux # diff Makefile Makefile_orig
308c308
< CFLAGS := -g -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
---
> CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
sles10:/usr/src/linux # make oldconfig
sles10:/usr/src/linux # make vmlinux    <<< Build the bare kernel, i.e. /usr/src/linux/vmlinux

If you do locate the appropriate debuginfo kernel rpm: kernel-default-debuginfo-2.6.16.21-0.8.i586.rpm, install this debuginfo rpm and then use /boot/vmlinux-2.6.16.21-0.8-default in your crash command below.
sles10:/var/log/dump/2006-10-30-19:01 # crash -s /usr/src/linux/vmlinux vmcore
... or ...

sles10:/var/log/dump/2006-10-30-19:01 # crash /boot/System.map-2.6.16.21-0.8-default \
> /usr/src/linux/vmlinux vmcore

crash 4.0-25.4
Copyright (C) 2002, 2003, 2004, 2005, 2006 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
...
...
GNU gdb 6.1
...
...
This GDB was configured as "i686-pc-linux-gnu"...

SYSTEM MAP: /boot/System.map-2.6.16.21-0.8-default
DEBUG KERNEL: /usr/src/linux/vmlinux (2.6.16.21-0.8-default)
DUMPFILE: vmcore
CPUS: 1
DATE: Mon Oct 30 19:01:22 2006
UPTIME: 02:36:33
LOAD AVERAGE: 0.04, 0.03, 0.00
TASKS: 109
NODENAME: sles10
RELEASE: 2.6.16.21-0.8-default
VERSION: #1 Mon Jul 3 18:25:39 UTC 2006
MACHINE: i686 (1196 Mhz)
MEMORY: 767.8 MB
PANIC: "SysRq : Trigger a crashdump"
PID: 7104
COMMAND: "bash"
TASK: efa66ab0 [THREAD_INFO: e80e8000]
CPU: 0
STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 7104 TASK: efa66ab0 CPU: 0 COMMAND: "bash"
 #0 [e80e9f10] crash_kexec at c012f230
 #1 [e80e9f54] __handle_sysrq at c01f4fd1
 #2 [e80e9f78] write_sysrq_trigger at c0175bee
 #3 [e80e9f84] vfs_write at c014b202
 #4 [e80e9f9c] sys_write at c014b71b
 #5 [e80e9fb8] sysenter_entry at c0102994
    EAX: 00000004 EBX: 00000001 ECX: b7ced000 EDX: 00000002
    DS: 007b ESI: 00000002 ES: 007b EDI: b7ced000
    SS: 007b ESP: bfaaa490 EBP: bfaaa4bc
    CS: 0073 EIP: ffffe410 ERR: 00000004 EFLAGS: 00000246

crash> ?
*         files     mod       runq      union
alias     foreach   mount     search    vm
ascii     fuser     net       set       vtop
bt        gdb       p         sig       waitq
btop      help      ps        struct    whatis
dev       irq       pte       swap      wr
dis       kmem      ptob      sym       q
eval      list      ptov      sys
exit      log       rd        task
extend    mach      repeat    timer

crash version: 4.0-25.4 gdb version: 6.1
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".

Fedora Core 6 (RHEL5) kexec/kdump install/setup and crash dump analysis

[root@localhost ~]# cat /etc/redhat-release
Fedora Core release 6 (Zod)
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.18-1.2798.fc6 #1 SMP Mon Oct 16 14:37:32 EDT 2006 i686 i686 i386 GNU/Linux
[root@localhost ~]# cat /proc/cmdline
ro root=LABEL=/ crashkernel=64M@16M
[root@localhost ~]# rpm -qa | grep kernel
kernel-devel-2.6.18-1.2798.fc6
kernel-2.6.18-1.2798.fc6
kernel-headers-2.6.18-1.2798.fc6
kernel-kdump-2.6.18-1.2798.fc6
kernel-debuginfo-2.6.18-1.2798.fc6
kernel-debuginfo-common-2.6.18-1.2798.fc6
[root@localhost ~]# rpm -qa | grep kexec
kexec-tools-1.101-51.fc6
[root@localhost ~]# rpm -qa | grep crash
crash-4.0-3.3
[root@localhost sysconfig]# chkconfig --list | grep kdump
kdump 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@localhost sysconfig]# service kdump status
Kdump is operational
[root@localhost ~]# cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cffff : Video ROM
000d0000-000d17ff : Adapter ROM
000f0000-000fffff : System ROM
00100000-2ffcffff : System RAM
  00400000-006168ba : Kernel code
  006168bb-006ef8cb : Kernel data
  01000000-04ffffff : Crash kernel    <<< Crash-dump-capture kernel location
2ffd0000-2fff0bff : reserved
...
[root@localhost ~]# cat /sys/kernel/kexec_crash_loaded
1    <<< Shows kexec crash-dump-capture kernel is loaded
[root@localhost ~]# cat /proc/sys/kernel/sysrq
1
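Two of the manual checks in this transcript, the crashkernel= boot option and the kexec_crash_loaded flag, can be folded into one readiness test. This is a minimal sketch; the function name kdump_ready and the optional root-prefix argument (which points the checks at a copy of /proc and /sys) are illustrative assumptions and not part of the kdump service.

```shell
# Report whether kdump looks ready to capture a dump.
# $1 is an optional root prefix; with no argument the live /proc and
# /sys are inspected.
kdump_ready() {
    root=${1:-}
    grep -q 'crashkernel=' "$root/proc/cmdline" || {
        echo "no crashkernel= option on the kernel command line"
        return 1
    }
    [ "$(cat "$root/sys/kernel/kexec_crash_loaded")" = "1" ] || {
        echo "capture kernel is not loaded"
        return 1
    }
    echo "kdump is ready"
}
```

The function stops at the first failed check, so the message points at the earliest missing step: first the memory reservation, then the loading of the capture kernel.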

/etc/init.d/kdump script:
The kdump init script provides the support necessary for loading a kdump capture-kernel into memory at system bootup time, and for copying away a vmcore (/proc/vmcore to /var/crash/<date>/vmcore) at system panic time. /etc/init.d/kdump script extract:

...
coredir="/var/crash/`date +"%Y-%m-%d-%H:%M"`"
mkdir -p $coredir
cp /proc/vmcore $coredir/vmcore
...

[root@localhost 2006-10-31-19:14]# echo "c" > /proc/sysrq-trigger
This causes the kernel to panic, after which the system soft-boots into the kdump capture-kernel. When the boot process reaches the point where it starts the kdump service, the vmcore is automatically copied from /proc/vmcore out to disk (by default, to /var/crash/<date>/vmcore). The system then reboots back into the normal kernel.

...system reboots...

[root@localhost 2006-10-31-19:14]# pwd
/var/crash/2006-10-31-19:14
[root@localhost 2006-10-31-19:14]# ll -h
total 678M
-r-------- 1 root root 704M Oct 31 19:14 vmcore

In the normal kernel, the previously installed crash utility can be used in conjunction with the previously installed kernel-debuginfo package to perform postmortem analysis.

[root@localhost 2006-10-31-19:14]# crash -s /usr/lib/debug/lib/modules/2.6.18-1.2798.fc6/vmlinux \
> vmcore

... or ...

[root@localhost 2006-10-31-19:14]# crash /boot/System.map-2.6.18-1.2798.fc6 \
> /usr/lib/debug/lib/modules/2.6.18-1.2798.fc6/vmlinux vmcore

crash 4.0-3.3
...
Copyright (C) 1999-2006 Hewlett-Packard Co
...
GNU gdb 6.1
...
This GDB was configured as "i686-pc-linux-gnu"...

SYSTEM MAP: /boot/System.map-2.6.18-1.2798.fc6
DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.18-1.2798.fc6/vmlinux (2.6.18-1.2798.fc6)
DUMPFILE: vmcore
CPUS: 1
DATE: Tue Oct 31 19:13:23 2006
UPTIME: 00:11:34
LOAD AVERAGE: 0.05, 0.30, 0.32
TASKS: 121
NODENAME: localhost.localdomain
RELEASE: 2.6.18-1.2798.fc6
VERSION: #1 SMP Mon Oct 16 14:37:32 EDT 2006
MACHINE: i686 (1196 Mhz)
MEMORY: 767.8 MB
PANIC: "SysRq : Trigger a crashdump"
PID: 2779
COMMAND: "bash"
TASK: efddf100 [THREAD_INFO: cf9e4000]
CPU: 0
STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 2779 TASK: efddf100 CPU: 0 COMMAND: "bash"
 #0 [cf9e4f00] crash_kexec at c0445f97
 #1 [cf9e4f44] sysrq_handle_crashdump at c053ea04
 #2 [cf9e4f4c] __handle_sysrq at c053e93a
 #3 [cf9e4f74] write_sysrq_trigger at c04a1f5c
 #4 [cf9e4f80] vfs_write at c046f803
 #5 [cf9e4f9c] sys_write at c046fe2d
 #6 [cf9e4fb8] system_call at c040400c
    EAX: 00000004 EBX: 00000001 ECX: b7fe5000 EDX: 00000002
    DS: 007b ESI: 00000002 ES: 007b EDI: b7fe5000
    SS: 007b ESP: bfff88ec EBP: bfff890c
    CS: 0073 EIP: 002cc402 ERR: 00000004 EFLAGS: 00000246

crash> ?
*         files     mod       runq      union
alias     foreach   mount     search    vm
ascii     fuser     net       set       vtop
bt        gdb       p         sig       waitq
btop      help      ps        struct    whatis
dev       irq       pte       swap      wr
dis       kmem      ptob      sym       q
eval      list      ptov      sys
exit      log       rd        task
extend    mach      repeat    timer

crash version: 4.0-3.3 gdb version: 6.1
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".

Debian GNU/Linux 'etch' release

The code name for the next major Debian GNU/Linux release after "sarge" is "etch". This release started as a copy of "sarge", and is currently in a state called "testing".

The following tests were performed for kexec/kdump on "etch". Unlike SLES10 and FC6/RHEL5 shown above, "etch" does not yet integrate kexec/kdump automation support, so the manual steps are listed here:

debian:~# cat /etc/issue
Debian GNU/Linux testing/unstable \n \l
debian:~# uname -a
Linux debian 2.6.17-2-686 #1 SMP Wed Sep 13 16:34:10 UTC 2006 i686 GNU/Linux
debian:~# cat /proc/version
Linux version 2.6.17-2-686 (Debian 2.6.17-9) (waldi@debian.org) (gcc version 4.1.2 20060901 (prerelease) (Debian 4.1.1-13)) #1 SMP Wed Sep 13 16:34:10 UTC 2006
debian:~# cat /proc/cmdline
root=/dev/hdb1 ro crashkernel=64M@16M
debian:~# cat /proc/iomem | grep Crash
01000000-04ffffff : Crash kernel
debian:~# dpkg -l | grep linux-image
ii linux-image-2.6-686 2.6.17+2 Linux kernel 2.6 image on PPro/Celeron/PII/P
ii linux-image-2.6.16-2-686 2.6.16-18 Linux kernel 2.6.16 image on PPro/Celeron/PI
ii linux-image-2.6.17-2-686 2.6.17-9 Linux 2.6.17 image on PPro/Celeron/PII/PIII/
debian:~# dpkg -l | grep linux-source
ii linux-source-2.6.17 2.6.17-9 Linux kernel source for version 2.6.17 with
debian:~# dpkg -l | grep kexec
ii kexec-tools 1.101-kdump10-2 kexec tool
debian:~# dpkg -l | grep crash
ii crash 4.0-3.4-1 kernel debugging utility, allowing gdb like
debian:/usr/src# ll -h
total 39M
lrwxrwxrwx 1 root src 20 2006-11-02 18:22 linux -> linux-source-2.6.17/
drwxr-xr-x 17 root root 4.0K 2006-11-02 14:14 linux-headers-2.6.17-2/
drwxr-xr-x 4 root root 4.0K 2006-11-02 14:14 linux-headers-2.6.17-2-686/
drwxr-xr-x 3 root root 4.0K 2006-11-02 14:14 linux-kbuild-2.6.17/
drwxr-xr-x 19 root root 4.0K 2006-11-04 09:45 linux-source-2.6.17/
-rw-r--r-- 1 root root 39M 2006-09-13 11:52 linux-source-2.6.17.tar.bz2

SLES10 and FC6/RHEL5 provide a crash-dump kernel (kernel-kdump-???), whereas none exists yet for Debian "etch", so I had to manually compile one for use with kexec/kdump as follows:

debian:/usr/src# cd linux
debian:/usr/src/linux#
debian:/usr/src/linux# make mrproper
debian:/usr/src/linux# cp /boot/config-2.6.17-2-686 .config

...make the following changes to .config via make menuconfig...

debian:/usr/src/linux# diff .config /boot/config-2.6.17-2-686
4c4
< # Sat Nov 4 09:24:00 2006
---
> # Wed Sep 13 15:43:05 2006
201,202c201,202
< CONFIG_CRASH_DUMP=y
< CONFIG_PHYSICAL_START=0x1000000
---
> # CONFIG_CRASH_DUMP is not set
> CONFIG_PHYSICAL_START=0x100000
3158d3157
< CONFIG_PROC_VMCORE=y
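Before starting a long kernel build, it can help to confirm that the .config really carries the three settings changed in the diff above. Note that CONFIG_PHYSICAL_START=0x1000000 is 16 MB, which matches the offset in crashkernel=64M@16M. The sketch below is one way to check; the function name check_capture_config is an illustrative assumption.

```shell
# Check that a kernel .config carries the capture-kernel settings from
# the diff above. $1 is the config file to inspect.
check_capture_config() {
    status=0
    for opt in CONFIG_CRASH_DUMP=y CONFIG_PROC_VMCORE=y \
               CONFIG_PHYSICAL_START=0x1000000; do
        # -x matches whole lines only, -F treats the option as plain text.
        if grep -qxF "$opt" "$1"; then
            echo "ok:      $opt"
        else
            echo "missing: $opt"
            status=1
        fi
    done
    return $status
}
```

Running check_capture_config .config in the source tree lists each option and returns a non-zero status if any of them is absent.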

debian:/usr/src/linux# cp Makefile Makefile_orig
...make the following changes to Makefile...

debian:/usr/src/linux# diff Makefile Makefile_orig
310c310
< CFLAGS := -g -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
---
> CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
debian:/usr/src/linux# make kernelversion
2.6.17
debian:/usr/src/linux# make kernelrelease
2.6.17
debian:/usr/src/linux# make oldconfig
debian:/usr/src/linux# make vmlinux
debian:/usr/src/linux# make modules
debian:/usr/src/linux# make modules_install
debian:/usr/src/linux# mkinitrd -o /boot/initrd.img-2.6.17 2.6.17
debian:/usr/src/linux# cp vmlinux /boot/vmlinux-2.6.17-kdump

The kernel /boot/vmlinux-2.6.17-kdump can now be used with kexec as a crash-dump-capture kernel.

I created a script to kexec load the kdump kernel...

debian:~# cat kexec.sh
kexec -p /boot/vmlinux-2.6.17-kdump --args-linux \
--append="root=/dev/hdb1 init 1 irqpoll maxcpus=1" \
--initrd=/boot/initrd.img-2.6.17
debian:~# sh kexec.sh
debian:~#

If there are no error messages, the kdump kernel has been loaded by kexec.
We are now ready to perform a test crash dump.

debian:~# cat /proc/sys/kernel/sysrq
1
debian:~# mkdir /var/log/dump
debian:~# sync;sync;sync
debian:~# echo c > /proc/sysrq-trigger

...system soft-boots into kdump kernel into runlevel 1 where you enter a password for maintenance...

...on the physical console...
debian:~# cp /proc/vmcore /var/log/dump/vmcore
...may take a while to copy since it depends on the size of RAM...
...once cp is complete you can 'ls -lh /var/log/dump' to confirm.
...reboot the system to load original kernel...
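Because the vmcore is roughly the size of RAM, it is worth checking for free space before starting the manual copy. The sketch below compares a size in bytes against the space available in a target directory. The function name has_room_for is an illustrative assumption, and obtaining the size with "stat -c %s /proc/vmcore" assumes GNU stat and that the capture kernel reports a size for that file.

```shell
# Succeed if directory $2 has room for a file of $1 bytes.
has_room_for() {
    need_kb=$(( ($1 + 1023) / 1024 ))                  # round up to KB
    # -P keeps each filesystem on one line; column 4 is available KB.
    free_kb=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge "$need_kb" ]
}
```

In the capture kernel this could be used as: has_room_for "$(stat -c %s /proc/vmcore)" /var/log/dump && cp /proc/vmcore /var/log/dump/vmcore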

debian:~# exit
...exit out of the kdump kernel and then reboot the system into the original kernel...
...system reboots...
debian:~# cd /var/log/dump/
debian:/var/log/dump# ll -h
total 320M
-r-------- 1 root root 320M 2006-11-04 11:28 vmcore
debian:/var/log/dump# crash /boot/System.map-2.6.17-2-686 /boot/vmlinux-2.6.17-kdump vmcore

crash 4.0-3.4
...
GNU gdb 6.1
...
This GDB was configured as "i686-pc-linux-gnu"...

SYSTEM MAP: /boot/System.map-2.6.17-2-686
DEBUG KERNEL: /boot/vmlinux-2.6.17-kdump (2.6.17)
DUMPFILE: vmcore
CPUS: 1
DATE: Sat Nov 4 11:24:29 2006
UPTIME: 00:40:17
LOAD AVERAGE: 0.01, 0.01, 0.01
TASKS: 120
NODENAME: debian
RELEASE: 2.6.17-2-686
VERSION: #1 SMP Wed Sep 13 16:34:10 UTC 2006
MACHINE: i686 (800 Mhz)
MEMORY: 384 MB
PANIC: "<6>SysRq : Trigger a crashdump"
PID: 4617
COMMAND: "bash"
TASK: b8527030 [THREAD_INFO: b9358000]
CPU: 0
STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 4617 TASK: b8527030 CPU: 0 COMMAND: "bash"
 #0 [b9359ec4] crash_kexec at b01374f0
 #1 [b9359f04] crash_kexec at b01374f9
 #2 [b9359f50] __handle_sysrq at b01ff183
 #3 [b9359f78] write_sysrq_trigger at b0180bd7
 #4 [b9359f84] vfs_write at b0153377
 #5 [b9359f9c] sys_write at b015395e
 #6 [b9359fb8] system_call at b0102b48
    EAX: 00000004 EBX: 00000001 ECX: 080f5c08 EDX: 00000002
    DS: 007b ESI: 00000002 ES: 007b EDI: 080f5c08
    SS: 007b ESP: aff597b0 EBP: aff597cc
    CS: 0073 EIP: a7f22d2e ERR: 00000004 EFLAGS: 00000246

crash> ?
*         files     mod       runq      union
alias     foreach   mount     search    vm
ascii     fuser     net       set       vtop
bt        gdb       p         sig       waitq
btop      help      ps        struct    whatis
dev       irq       pte       swap      wr
dis       kmem      ptob      sym       q
eval      list      ptov      sys
exit      log       rd        task
extend    mach      repeat    timer

crash version: 4.0-3.4 gdb version: 6.1
For help on any command above, enter "help <command>".
For help on input options, enter "help input".

Conclusion

All in all, it can be said that as far as kernel crash dumping is concerned Linux is heading in the right direction. Kdump is robust and most of the remaining issues are being dealt with.

References

[0] /usr/share/doc/kernel-doc-2.6.../Documentation/kdump/kdump.txt (Fedora Core 6)
[1] http://www.xmission.com/~ebiederm/files/kexec/README
[2] Kexec history: http://lwn.net/Articles/15468/
[3] http://lse.sourceforge.net/kdump/
[4] /usr/share/doc/packages/kexec-tools/README.SUSE (SLES10)
[5] http://www.novell.com/support/search.do?cmd=displayKC&docType=kc&externalId=3374462&sliceId=SAL_Public (SLES10)
[6] http://ftp.suse.com/pub/people/tiwai/kdump-training/kdump-training.pdf (SLES10)
[7] Evaluating Linux Kernel Crash Dumping Mechanisms:
    - http://lkdtt.sourceforge.net/docs/ols2006_lkdtt.pdf
    - http://lkdtt.sourceforge.net/results/kdump/kdump_results.html
[8] Crash analysis utility: http://people.redhat.com/anderson/crash_whitepaper/

About the author

Dilip Daya is a Project Lead and Lead Technical Engineer in HP Services, Worldwide Linux Level 3 support. Dilip enjoys working on Linux solutions and is part of the Worldwide Linux Ambassadors team which is built around key HP solutions, technologies and innovation that are worldwide partnerships of field, region and division experts. Dilip is based in Atlanta, Georgia.

© 2007 Hewlett-Packard Development Company, L.P.