HP TechBriefs
Using Kexec and Kdump
by Dilip Daya
This TechBrief provides information for configuring/enabling kdump as a
crash dumping solution.
Overview
This article describes how to enable kdump as a crash dumping solution in SLES10 and Fedora Core 6 (RHEL5) environments. The acceptance of kexec (a set of system calls: kexec_load(), ...) into the base 2.6.13 kernel enabled the creation of kdump, a powerful system debug facility. Kdump is a kexec-based crash dumping mechanism for Linux. It forms the basis for production-time debugging and is an invaluable aid both to developers and in customer production environments.
Recent 2.6 kernels (2.6.13 onwards) can set aside some memory for a dump-capture (crash-dump) kernel, which is soft-booted when the system crashes. /sbin/kexec is a user-space utility that loads another kernel (the dump-capture kernel) and asks the currently running kernel to do something with it. The running kernel may be asked to start the loaded kernel on reboot, or to start it after a panic. The panic case is useful because it provides an intact kernel for writing crash dumps: the dump-capture kernel re-initializes all the hardware it needs, boots into the system, and writes the dump. The dump file is a standard ELF core file with some annotations.
Kexec is a fastboot mechanism that boots a Linux kernel from the context of an already running kernel without going through the system boot firmware. Firmware initialization can be very time consuming, especially on big servers with numerous peripherals, so kexec can save a lot of time for developers who reboot a machine many times.
Kdump is a new kernel crash dumping mechanism and is intended to be more reliable than previous dump implementations. The crash dump is captured from the context of a freshly booted kernel, not from the context of the crashed kernel. Kdump uses kexec to boot into a second kernel (the dump-capture kernel) whenever the system crashes. This second kernel, often called a capture kernel, boots with very little memory and captures the dump image. The first kernel reserves a section of memory that the second kernel uses to boot. Because kexec boots the capture kernel without going through the system boot firmware, the contents of the first kernel's memory are preserved, and that preserved memory is essentially the kernel crash dump.
The kexec system call breaks up into three pieces:
- A generic part that loads the new kernel from the current address space and very carefully places the data in the allocated pages.
- A generic part that interacts with the kernel and tells all of the devices to shut down, preventing ongoing DMAs and placing the devices in a consistent state so that a later kernel can reinitialize them.
- A machine-specific part that includes the syscall number, copies the image to its final destination, and jumps into the image at its entry point.

After successfully loading the crash-dump-capture kernel, the system will reboot into the crash-dump-capture kernel if a system crash is triggered. Trigger points are located in:
- panic()
- die() - If die() is called by a thread with pid 0 or 1, inside interrupt context, or while panic_on_oops is set, the system will boot into the dump-capture kernel.
- die_nmi() - If a hard lockup is detected and "NMI watchdog" is configured, the system will boot into the dump-capture kernel.
- SysRq handler (ALT-SysRq-c)
Issues:
- kexec does not sync or unmount filesystems, so if you need that to happen you must do it yourself.
- Device drivers -- some drivers may simply ignore the request to shut down, others may be overzealous and deactivate the device completely, and some may leave the device in a state from which it cannot be brought back to life, either because the state itself is incorrect or irrecoverable, or because the driver does not know how to resume from that state. This might result in a driver initialization failure in the capture kernel.
- Some corner-case hangs/lockups may not trigger a crash dump. For details, see "Evaluating Linux Kernel Crash Dumping Mechanisms":
- http://lkdtt.sourceforge.net/docs/ols2006_lkdtt.pdf
- http://lkdtt.sourceforge.net/results/kdump/kdump_results.html
Future work (TODO) for kexec-tools-1.101:
/usr/share/doc/kexec-tools-1.101/TODO has:
- Restore enough state that DOS/arbitrary BIOS calls can be run on some platforms. Currently disk-related calls are quite likely to fail.
- Merge reboot via kexec functionality into /sbin/reboot
- In the kexec-on-panic case, preserve the memory that both kernels must use.
- Finish the kexec-on-panic case.
- Improve the documentation
- Add support for loading a boot sector
- Autobuilding of initramfs
###
- Provide a kernel pages filtering mechanism, so core file size is not extreme on systems with huge memory banks.
- Relocatable kernel can help in maintaining multiple kernels for crash_dump, and the same kernel as the system kernel can be used to capture the dump. Currently, the standard kernel and capture kernel (kernel-kdump) are two different entities, but work is underway to make the standard kernel relocatable (within memory), and thus usable as a capture kernel, eliminating the need for a separate kdump kernel.
kexec/kdump support for ia64 is also work-in-progress.
Partial man(8) page for kexec - directly boot into a new kernel
SYNOPSIS
|
/sbin/kexec [-v (--version)] [-f (--force)] [-x (--no-ifdown)]
[-l (--load)] [-p (--load-panic)] [-u (--unload)] [-e (--exec)]
[-t (--type)] [--mem-min=addr] [--mem-max=addr]
|
DESCRIPTION
kexec is a system call that enables you to load and boot into another kernel from the currently running kernel. kexec performs the function of the boot loader from within the kernel. The primary difference between a standard system boot and a kexec boot is that the hardware initialization normally performed by the BIOS or firmware (depending on architecture) is not performed during a kexec boot. This has the effect of reducing the time required for a reboot.
Make sure you have selected CONFIG_KEXEC=y when configuring the kernel.
The CONFIG_KEXEC option enables the kexec system call.
USAGE
Using kexec consists of
- loading the kernel to be rebooted to into memory, and
- actually rebooting to the pre-loaded kernel.
To load a kernel, the syntax is as follows:
kexec -l kernel-image --append=command-line-options --initrd=initrd-image
where kernel-image is the kernel file that you intend to reboot to.
Note: Compressed kernel images such as bzImage are not supported by kexec. Use the uncompressed vmlinux.
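The two steps above can be sketched as a small shell helper. This is only an illustration under my own assumptions: the function name kexec_reboot_plan and the initrd path are hypothetical, and the helper prints the commands instead of running them, because "kexec -e" boots the loaded kernel right away and, as noted under Issues, kexec does not sync or unmount filesystems for you.

```shell
# Print (do not run) the two kexec steps for a given kernel image, initrd
# and kernel command line. Only options listed in the SYNOPSIS are used.
kexec_reboot_plan() {
    kernel=$1      # uncompressed kernel image (vmlinux)
    initrd=$2      # matching initrd image (hypothetical path in the example)
    cmdline=$3     # command line to pass to the new kernel
    # Step 1: load the kernel to be rebooted to into memory.
    printf 'kexec -l %s --append="%s" --initrd=%s\n' "$kernel" "$cmdline" "$initrd"
    # Step 2: actually reboot to the pre-loaded kernel.
    printf 'kexec -e\n'
}

kexec_reboot_plan /boot/vmlinux-2.6.16.21-0.8-default \
    /boot/initrd-2.6.16.21-0.8-default "root=/dev/hda1"
```

Review the printed commands, sync and unmount what you need, and only then run them by hand as root.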
....
Setup and installation (unfortunately, due to time constraints, my testing was performed on a laptop)
SLES 10 kexec/kdump install/setup and crash dump analysis
sles10:~ # cat /etc/SuSE-release
SUSE Linux Enterprise Server 10 (i586)
VERSION = 10
sles10:~ # uname -a
Linux sles10 2.6.16.21-0.8-default #1 Mon Jul 3 18:25:39 UTC 2006 i686 i686 i386 GNU/Linux
|
Be sure that you have installed the kexec-tools, kernel-kdump and crash rpms. The kdump kernel revision does not have to match the running kernel; it can be any revision. In my case I had kernel-kdump-2.6.16.21-0.8 and found that newer running kernels also worked with it.
|
sles10:~ # rpm -qa | grep kexec
kexec-tools-1.101-32.14
sles10:~ # rpm -qa | grep kernel
kernel-default-2.6.16.21-0.8
kernel-xen-2.6.16.21-0.8
kernel-syms-2.6.16.21-0.8
kernel-debug-2.6.16.21-0.8
kernel-source-2.6.16.21-0.8
kernel-kdump-2.6.16.21-0.8
sles10:~ # rpm -qa | grep crash
crash-4.0-25.4
sles10:~ # chkconfig kdump on
sles10:~ # chkconfig --list | grep kdump
kdump 0:off 1:on 2:on 3:on 4:off 5:on 6:off
|
To enable crash dumps, add an option to the boot loader configuration that specifies the size and offset of the memory area reserved for the capture kernel, for example "crashkernel=64M@16M". Here 64M is the amount of memory reserved for the kdump capture kernel and 16M is the physical address at which the reserved area starts. You can add this option either with the YaST boot loader module or by manually editing the boot loader configuration file.
The recommended values by architecture for the "crashkernel" option are:
i386: crashkernel=64M@16M
x86_64: crashkernel=64M@16M
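As a worked example of the size@offset arithmetic, the following sketch converts a crashkernel value into the physical address range it reserves. The function name crashkernel_range is my own, and the parsing assumes both numbers are given in whole megabytes with an "M" suffix, as in the recommended values.

```shell
# Convert a crashkernel SIZE@OFFSET value (both in whole megabytes, e.g.
# "64M@16M") into the physical address range that it reserves.
crashkernel_range() {
    spec=$1
    size_mb=${spec%%M@*}        # text before "M@"   -> 64
    offset_mb=${spec##*@}       # text after "@"     -> 16M
    offset_mb=${offset_mb%M}    # drop trailing "M"  -> 16
    start=$(( offset_mb * 1024 * 1024 ))
    end=$(( start + size_mb * 1024 * 1024 - 1 ))
    printf '%08x-%08x\n' "$start" "$end"
}

crashkernel_range "64M@16M"     # prints 01000000-04ffffff
```

The printed range is what you should expect to see labelled "Crash kernel" in /proc/iomem once the system has booted with this option.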
|
sles10:~ # cat /proc/cmdline
root=/dev/hda1 vga=0x314 resume=/dev/hda2 splash=silent showopts crashkernel=64M@16M
sles10:~ # cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cffff : Video ROM
000d0000-000d17ff : Adapter ROM
000f0000-000fffff : System ROM
00100000-2ffcffff : System RAM
00100000-0027478d : Kernel code
0027478e-0030cc47 : Kernel data
01000000-04ffffff : Crash kernel <<< Crash-dump-capture kernel location
2ffd0000-2fff0bff : reserved
....
|
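The reservation can also be checked in the other direction, from the /proc/iomem listing back to a size. The sketch below is only an illustration (the function name crash_region_mb is hypothetical); it reads iomem-style text on standard input so that it can be tried on a saved copy of the listing as well as on the live file.

```shell
# Print the size in MB of the "Crash kernel" region found in /proc/iomem
# style text on stdin. Returns 1 if no such region is listed.
crash_region_mb() {
    range=$(sed -n 's/^ *\([0-9a-f]*-[0-9a-f]*\) : Crash kernel.*/\1/p' | head -n 1)
    [ -n "$range" ] || return 1
    start=${range%-*}           # hex start address, e.g. 01000000
    end=${range#*-}             # hex end address,   e.g. 04ffffff
    echo $(( (0x$end - 0x$start + 1) / 1024 / 1024 ))
}

# Two lines taken from the listing above; prints 64.
crash_region_mb <<'EOF'
00100000-2ffcffff : System RAM
  01000000-04ffffff : Crash kernel
EOF
```

On a running system the same check is "crash_region_mb < /proc/iomem", run as root.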
/etc/init.d/kdump script:
The kdump init script provides the support necessary for loading a kdump capture kernel into memory at system bootup time, and for copying away a vmcore (from /proc/vmcore to a timestamped directory under /var/log/dump) at system panic time. /etc/init.d/kdump script extract:
|
coredir="${KDUMP_SAVEDIR}/`date +"%Y-%m-%d-%H:%M"`"
mkdir -p $coredir
echo -n "Saving crash dump to $coredir"
cp --sparse=always /proc/vmcore $coredir/vmcore
|
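The save step in the extract above can be sketched as a standalone function. This is not the SLES script itself: the function name save_vmcore, the quoting, the error checks and the fallback to a plain copy are my own additions, made so that the logic can be tried on an ordinary file.

```shell
# Copy a dump image into a new timestamped directory under a base directory
# and print the name of the directory that was created.
save_vmcore() {
    src=$1                                   # normally /proc/vmcore
    base=$2                                  # e.g. /var/log/dump
    coredir="$base/$(date +%Y-%m-%d-%H:%M)"
    mkdir -p "$coredir" || return 1
    # Prefer a sparse copy as the init script does (GNU cp); fall back to a
    # plain copy where that option is not available.
    cp --sparse=always "$src" "$coredir/vmcore" 2>/dev/null ||
        cp "$src" "$coredir/vmcore" || return 1
    echo "$coredir"
}

# Usage inside the capture kernel (not executed here):
#   save_vmcore /proc/vmcore /var/log/dump
```

A sparse copy matters because large runs of zero-filled pages in the dump then take no space on disk.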
A crash dump can be triggered with SysRq-c. Core files end up under /var/log/dump/
Example: /var/log/dump/2006-10-30-19:01/vmcore
sles10:~ # sync;sync;sync
sles10:~ # echo c > /proc/sysrq-trigger
|
When a crash happens in a graphical environment, you will likely have no GUI in the second kernel boot. If you used a VGA console, you might still have visual output from the secondary kernel. The default behavior of the Kdump script is to save the old vmcore image, and then reboot the system immediately.
The crash-dump-capture kernel soft-boots, saves /proc/vmcore and reboots as follows:
...
Saving crash dump to /var/log/dump/2006-10-30-19:01 done
INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
...
|
..after system reboot...
|
sles10~# pwd
/var/log/dump/2006-10-30-19:01
sles10:/var/log/dump/2006-10-30-19:01 # ll -h
total 694M
-r-------- 1 root root 704M Oct 30 19:02 vmcore
sles10:/var/log/dump/2006-10-30-19:01 # crash /boot/System.map-2.6.16.21-0.8-default \
> /boot/vmlinux-2.6.16.21-0.8-debug vmcore
...
crash: /boot/vmlinux-2.6.16.21-0.8-debug: no debugging data available
...same for:
crash: /boot/vmlinux-2.6.16.21-0.8-default: no debugging data available
|
Since I did not have a SLES10 debuginfo kernel, I compiled one as follows:
|
sles10:/usr/src # ll linux
lrwxrwxrwx 1 root root 19 2006-08-15 10:20 linux -> linux-2.6.16.21-0.8
sles10:/var/log/dump/2006-10-30-19:01 # cd /usr/src/linux
sles10:/usr/src/linux #
sles10:/usr/src/linux # make mrproper
sles10:/usr/src/linux # cp /boot/config-2.6.16.21-0.8-default .config
sles10:/usr/src/linux # cp Makefile Makefile_orig
...edit the Makefile, add "-g" in the CFLAGS line...
sles10:/usr/src/linux # diff Makefile Makefile_orig
308c308
< CFLAGS := -g -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
---
> CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
sles10:/usr/src/linux # make oldconfig
sles10:/usr/src/linux # make vmlinux <<< Build the bare kernel, i.e. /usr/src/linux/vmlinux
|
If you can locate the appropriate debuginfo kernel rpm (kernel-default-debuginfo-2.6.16.21-0.8.i586.rpm), install it and then use /boot/vmlinux-2.6.16.21-0.8-default instead of /usr/src/linux/vmlinux in the crash commands below.
sles10:/var/log/dump/2006-10-30-19:01 # crash -s /usr/src/linux/vmlinux vmcore
... or ...
|
sles10:/var/log/dump/2006-10-30-19:01 # crash /boot/System.map-2.6.16.21-0.8-default \
> /usr/src/linux/vmlinux vmcore
crash 4.0-25.4
Copyright (C) 2002, 2003, 2004, 2005, 2006 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
...
...
GNU gdb 6.1
...
...
This GDB was configured as "i686-pc-linux-gnu"...
SYSTEM MAP: /boot/System.map-2.6.16.21-0.8-default
DEBUG KERNEL: /usr/src/linux/vmlinux (2.6.16.21-0.8-default)
DUMPFILE: vmcore
CPUS: 1
DATE: Mon Oct 30 19:01:22 2006
UPTIME: 02:36:33
LOAD AVERAGE: 0.04, 0.03, 0.00
TASKS: 109
NODENAME: sles10
RELEASE: 2.6.16.21-0.8-default
VERSION: #1 Mon Jul 3 18:25:39 UTC 2006
MACHINE: i686 (1196 Mhz)
MEMORY: 767.8 MB
PANIC: "SysRq : Trigger a crashdump"
PID: 7104
COMMAND: "bash"
TASK: efa66ab0 [THREAD_INFO: e80e8000]
CPU: 0
STATE: TASK_RUNNING (SYSRQ)
crash> bt
PID: 7104 TASK: efa66ab0 CPU: 0 COMMAND: "bash"
#0 [e80e9f10] crash_kexec at c012f230
#1 [e80e9f54] __handle_sysrq at c01f4fd1
#2 [e80e9f78] write_sysrq_trigger at c0175bee
#3 [e80e9f84] vfs_write at c014b202
#4 [e80e9f9c] sys_write at c014b71b
#5 [e80e9fb8] sysenter_entry at c0102994
EAX: 00000004 EBX: 00000001 ECX: b7ced000 EDX: 00000002
DS: 007b ESI: 00000002 ES: 007b EDI: b7ced000
SS: 007b ESP: bfaaa490 EBP: bfaaa4bc
CS: 0073 EIP: ffffe410 ERR: 00000004 EFLAGS: 00000246
crash> ?
* files mod runq union
alias foreach mount search vm
ascii fuser net set vtop
bt gdb p sig waitq
btop help ps struct whatis
dev irq pte swap wr
dis kmem ptob sym q
eval list ptov sys
exit log rd task
extend mach repeat timer
crash version: 4.0-25.4 gdb version: 6.1
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".
|
Fedora Core 6 (RHEL5) kexec/kdump install/setup and crash dump analysis
|
[root@localhost ~]# cat /etc/redhat-release
Fedora Core release 6 (Zod)
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.18-1.2798.fc6 #1 SMP Mon Oct 16 14:37:32 EDT 2006 i686 i686 i386 GNU/Linux
[root@localhost ~]# cat /proc/cmdline
ro root=LABEL=/ crashkernel=64M@16M
[root@localhost ~]# rpm -qa | grep kernel
kernel-devel-2.6.18-1.2798.fc6
kernel-2.6.18-1.2798.fc6
kernel-headers-2.6.18-1.2798.fc6
kernel-kdump-2.6.18-1.2798.fc6
kernel-debuginfo-2.6.18-1.2798.fc6
kernel-debuginfo-common-2.6.18-1.2798.fc6
[root@localhost ~]# rpm -qa | grep kexec
kexec-tools-1.101-51.fc6
[root@localhost ~]# rpm -qa | grep crash
crash-4.0-3.3
[root@localhost sysconfig]# chkconfig --list | grep kdump
kdump 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@localhost sysconfig]# service kdump status
Kdump is operational
[root@localhost ~]# cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cffff : Video ROM
000d0000-000d17ff : Adapter ROM
000f0000-000fffff : System ROM
00100000-2ffcffff : System RAM
00400000-006168ba : Kernel code
006168bb-006ef8cb : Kernel data
01000000-04ffffff : Crash kernel <<< Crash-dump-capture kernel location
2ffd0000-2fff0bff : reserved
...
[root@localhost ~]# cat /sys/kernel/kexec_crash_loaded
1 <<< Shows kexec crash-dump-capture kernel is loaded
[root@localhost ~]# cat /proc/sys/kernel/sysrq
1
|
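The checks performed by hand above can be collected into one function. This is a sketch under my own assumptions: the function name kdump_preflight and its optional root-prefix argument are hypothetical, the prefix exists only so that the function can be tried against copies of the files, and the three tests simply mirror the files inspected in this section.

```shell
# Report whether a system looks ready to capture a crash dump.
# Optional argument: a directory prefix holding copies of the /proc and
# /sys files (for testing); empty means the live system.
kdump_preflight() {
    root=${1:-}
    status=0
    if ! grep -q 'crashkernel=' "$root/proc/cmdline" 2>/dev/null; then
        echo "FAIL: no crashkernel= option on the kernel command line"
        status=1
    fi
    if ! grep -q 'Crash kernel' "$root/proc/iomem" 2>/dev/null; then
        echo "FAIL: no 'Crash kernel' region in /proc/iomem"
        status=1
    fi
    if [ "$(cat "$root/sys/kernel/kexec_crash_loaded" 2>/dev/null)" != 1 ]; then
        echo "FAIL: capture kernel is not loaded"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "OK: capture kernel is loaded into the reserved memory"
    fi
    return "$status"
}
```

Running "kdump_preflight" with no argument before a test dump avoids triggering a crash on a system that would simply panic without saving anything.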
/etc/init.d/kdump script:
The kdump init script provides the support necessary for loading a kdump capture kernel into memory at system bootup time, and for copying away a vmcore (from /proc/vmcore to a timestamped directory under /var/crash) at system panic time. /etc/init.d/kdump script extract:
|
...
coredir="/var/crash/`date +"%Y-%m-%d-%H:%M"`"
mkdir -p $coredir
cp /proc/vmcore $coredir/vmcore
...
|
[root@localhost 2006-10-31-19:14]# echo "c" > /proc/sysrq-trigger
This causes the kernel to panic and the system to soft-boot into the kdump capture kernel. When the boot process reaches the point where it starts the kdump service, the vmcore is automatically copied from /proc/vmcore out to disk (by default, to a timestamped directory under /var/crash). The system then reboots back into the normal kernel.
...system reboots...
|
[root@localhost 2006-10-31-19:14]# pwd
/var/crash/2006-10-31-19:14
[root@localhost 2006-10-31-19:14]# ll -h
total 678M
-r-------- 1 root root 704M Oct 31 19:14 vmcore
|
In the normal kernel, the previously installed crash utility can be used in conjunction with the previously installed kernel-debuginfo to perform postmortem analysis.
|
[root@localhost 2006-10-31-19:14]# crash -s /usr/lib/debug/lib/modules/2.6.18-1.2798.fc6/vmlinux vmcore
|
... or ...
|
[root@localhost 2006-10-31-19:14]# crash /boot/System.map-2.6.18-1.2798.fc6 \
> /usr/lib/debug/lib/modules/2.6.18-1.2798.fc6/vmlinux vmcore
crash 4.0-3.3
...
Copyright (C) 1999-2006 Hewlett-Packard Co
...
GNU gdb 6.1
...
This GDB was configured as "i686-pc-linux-gnu"...
SYSTEM MAP: /boot/System.map-2.6.18-1.2798.fc6
DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.18-1.2798.fc6/vmlinux (2.6.18-1.2798.fc6)
DUMPFILE: vmcore
CPUS: 1
DATE: Tue Oct 31 19:13:23 2006
UPTIME: 00:11:34
LOAD AVERAGE: 0.05, 0.30, 0.32
TASKS: 121
NODENAME: localhost.localdomain
RELEASE: 2.6.18-1.2798.fc6
VERSION: #1 SMP Mon Oct 16 14:37:32 EDT 2006
MACHINE: i686 (1196 Mhz)
MEMORY: 767.8 MB
PANIC: "SysRq : Trigger a crashdump"
PID: 2779
COMMAND: "bash"
TASK: efddf100 [THREAD_INFO: cf9e4000]
CPU: 0
STATE: TASK_RUNNING (SYSRQ)
crash> bt
PID: 2779 TASK: efddf100 CPU: 0 COMMAND: "bash"
#0 [cf9e4f00] crash_kexec at c0445f97
#1 [cf9e4f44] sysrq_handle_crashdump at c053ea04
#2 [cf9e4f4c] __handle_sysrq at c053e93a
#3 [cf9e4f74] write_sysrq_trigger at c04a1f5c
#4 [cf9e4f80] vfs_write at c046f803
#5 [cf9e4f9c] sys_write at c046fe2d
#6 [cf9e4fb8] system_call at c040400c
EAX: 00000004 EBX: 00000001 ECX: b7fe5000 EDX: 00000002
DS: 007b ESI: 00000002 ES: 007b EDI: b7fe5000
SS: 007b ESP: bfff88ec EBP: bfff890c
CS: 0073 EIP: 002cc402 ERR: 00000004 EFLAGS: 00000246
crash> ?
* files mod runq union
alias foreach mount search vm
ascii fuser net set vtop
bt gdb p sig waitq
btop help ps struct whatis
dev irq pte swap wr
dis kmem ptob sym q
eval list ptov sys
exit log rd task
extend mach repeat timer
crash version: 4.0-3.3 gdb version: 6.1
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".
|
Debian GNU/Linux 'etch' release
The code name for the next major Debian GNU/Linux release after "sarge" is "etch". This release started as a copy of "sarge", and is currently in a state called "testing".
The following tests exercise kexec/kdump on "etch". Unlike SLES10 and FC6/RHEL5 above, "etch" does not yet integrate kexec/kdump automation support, so the manual steps are listed:
|
debian:~# cat /etc/issue
Debian GNU/Linux testing/unstable \n \l
debian:~# uname -a
Linux debian 2.6.17-2-686 #1 SMP Wed Sep 13 16:34:10 UTC 2006 i686 GNU/Linux
debian:~# cat /proc/version
Linux version 2.6.17-2-686 (Debian 2.6.17-9) (waldi@debian.org) (gcc version 4.1.2 20060901 (prerelease) (Debian 4.1.1-13)) #1 SMP Wed Sep 13 16:34:10 UTC 2006
debian:~# cat /proc/cmdline
root=/dev/hdb1 ro crashkernel=64M@16M
debian:~# cat /proc/iomem | grep Crash
01000000-04ffffff : Crash kernel
debian:~# dpkg -l | grep linux-image
ii linux-image-2.6-686 2.6.17+2 Linux kernel 2.6 image on PPro/Celeron/PII/P
ii linux-image-2.6.16-2-686 2.6.16-18 Linux kernel 2.6.16 image on PPro/Celeron/PI
ii linux-image-2.6.17-2-686 2.6.17-9 Linux 2.6.17 image on PPro/Celeron/PII/PIII/
debian:~# dpkg -l | grep linux-source
ii linux-source-2.6.17 2.6.17-9 Linux kernel source for version 2.6.17 with
debian:~# dpkg -l | grep kexec
ii kexec-tools 1.101-kdump10-2 kexec tool
debian:~# dpkg -l | grep crash
ii crash 4.0-3.4-1 kernel debugging utility, allowing gdb like
debian:/usr/src# ll -h
total 39M
lrwxrwxrwx 1 root src 20 2006-11-02 18:22 linux -> linux-source-2.6.17/
drwxr-xr-x 17 root root 4.0K 2006-11-02 14:14 linux-headers-2.6.17-2/
drwxr-xr-x 4 root root 4.0K 2006-11-02 14:14 linux-headers-2.6.17-2-686/
drwxr-xr-x 3 root root 4.0K 2006-11-02 14:14 linux-kbuild-2.6.17/
drwxr-xr-x 19 root root 4.0K 2006-11-04 09:45 linux-source-2.6.17/
-rw-r--r-- 1 root root 39M 2006-09-13 11:52 linux-source-2.6.17.tar.bz2
|
SLES10 and FC6/RHEL5 provide a crash-dump kernel package (kernel-kdump-???), but none exists yet for Debian "etch", so I had to compile one manually for use with kexec/kdump as follows:
|
debian:/usr/src# cd linux
debian:/usr/src/linux#
debian:/usr/src/linux# make mrproper
debian:/usr/src/linux# cp /boot/config-2.6.17-2-686 .config
|
...make the following changes to .config via make menuconfig...
|
debian:/usr/src/linux# diff .config /boot/config-2.6.17-2-686
4c4
< # Sat Nov 4 09:24:00 2006
---
> # Wed Sep 13 15:43:05 2006
201,202c201,202
< CONFIG_CRASH_DUMP=y
< CONFIG_PHYSICAL_START=0x1000000
---
> # CONFIG_CRASH_DUMP is not set
> CONFIG_PHYSICAL_START=0x100000
3158d3157
< CONFIG_PROC_VMCORE=y
|
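The three settings in the diff above are easy to get wrong when rebuilding, so a quick check of the resulting .config can help. The sketch below is my own illustration (the function name check_capture_config is hypothetical). Note that CONFIG_PHYSICAL_START=0x1000000 is 16 MB, which matches the offset in crashkernel=64M@16M: the capture kernel is built to run from the start of the reserved area.

```shell
# Check a kernel .config for the options that the diff above changes.
# Prints each option that is missing; returns non-zero if any is missing.
check_capture_config() {
    config=$1
    missing=0
    for opt in CONFIG_CRASH_DUMP=y CONFIG_PROC_VMCORE=y \
               CONFIG_PHYSICAL_START=0x1000000; do
        # -x: match the whole line, -F: treat the option as a fixed string
        if ! grep -qxF "$opt" "$config"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Usage from the kernel source directory (not executed here):
#   check_capture_config .config
```

Run against the stock /boot/config-2.6.17-2-686, this would report all three options as missing, which is consistent with the diff.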
debian:/usr/src/linux# cp Makefile Makefile_orig
...make the following changes to Makefile...
|
debian:/usr/src/linux# diff Makefile Makefile_orig
310c310
< CFLAGS := -g -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
---
> CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
debian:/usr/src/linux# make kernelversion
2.6.17
debian:/usr/src/linux# make kernelrelease
2.6.17
debian:/usr/src/linux# make oldconfig
debian:/usr/src/linux# make vmlinux
debian:/usr/src/linux# make modules
debian:/usr/src/linux# make modules_install
debian:/usr/src/linux# mkinitrd -o /boot/initrd.img-2.6.17 2.6.17
debian:/usr/src/linux# cp vmlinux /boot/vmlinux-2.6.17-kdump
|
The kernel /boot/vmlinux-2.6.17-kdump can now be used with kexec as a crash-dump-capture kernel.
I created a script to kexec load the kdump kernel...
|
debian:~# cat kexec.sh
kexec -p /boot/vmlinux-2.6.17-kdump --args-linux \
--append="root=/dev/hdb1 init 1 irqpoll maxcpus=1" \
--initrd=/boot/initrd.img-2.6.17
debian:~# sh kexec.sh
debian:~#
|
If there are no error messages, the kdump kernel has been loaded by kexec.
We are now ready to perform a test crash dump.
|
debian:~# cat /proc/sys/kernel/sysrq
1
debian:~# mkdir /var/log/dump
debian:~# sync;sync;sync
debian:~# echo c > /proc/sysrq-trigger
|
...system soft-boots into kdump kernel into runlevel 1 where you enter
a password for maintenance...
...on the physical console...
debian:~# cp /proc/vmcore /var/log/dump/vmcore
...may take a while to copy since it depends on the size of RAM...
...once cp is complete you can 'ls -lh /var/log/dump' to confirm.
...reboot the system to load original kernel...
debian:~# exit
...exit out kdump kernel and then reboot system into original kernel...
...system reboots...
|
debian:~# cd /var/log/dump/
debian:/var/log/dump# ll -h
total 320M
-r-------- 1 root root 320M 2006-11-04 11:28 vmcore
debian:/var/log/dump# crash /boot/System.map-2.6.17-2-686 /boot/vmlinux-2.6.17-kdump vmcore
crash 4.0-3.4
...
GNU gdb 6.1
...
This GDB was configured as "i686-pc-linux-gnu"...
SYSTEM MAP: /boot/System.map-2.6.17-2-686
DEBUG KERNEL: /boot/vmlinux-2.6.17-kdump (2.6.17)
DUMPFILE: vmcore
CPUS: 1
DATE: Sat Nov 4 11:24:29 2006
UPTIME: 00:40:17
LOAD AVERAGE: 0.01, 0.01, 0.01
TASKS: 120
NODENAME: debian
RELEASE: 2.6.17-2-686
VERSION: #1 SMP Wed Sep 13 16:34:10 UTC 2006
MACHINE: i686 (800 Mhz)
MEMORY: 384 MB
PANIC: "<6>SysRq : Trigger a crashdump"
PID: 4617
COMMAND: "bash"
TASK: b8527030 [THREAD_INFO: b9358000]
CPU: 0
STATE: TASK_RUNNING (SYSRQ)
crash> bt
PID: 4617 TASK: b8527030 CPU: 0 COMMAND: "bash"
#0 [b9359ec4] crash_kexec at b01374f0
#1 [b9359f04] crash_kexec at b01374f9
#2 [b9359f50] __handle_sysrq at b01ff183
#3 [b9359f78] write_sysrq_trigger at b0180bd7
#4 [b9359f84] vfs_write at b0153377
#5 [b9359f9c] sys_write at b015395e
#6 [b9359fb8] system_call at b0102b48
EAX: 00000004 EBX: 00000001 ECX: 080f5c08 EDX: 00000002
DS: 007b ESI: 00000002 ES: 007b EDI: 080f5c08
SS: 007b ESP: aff597b0 EBP: aff597cc
CS: 0073 EIP: a7f22d2e ERR: 00000004 EFLAGS: 00000246
crash> ?
* files mod runq union
alias foreach mount search vm
ascii fuser net set vtop
bt gdb p sig waitq
btop help ps struct whatis
dev irq pte swap wr
dis kmem ptob sym q
eval list ptov sys
exit log rd task
extend mach repeat timer
crash version: 4.0-3.4 gdb version: 6.1
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
|
Conclusion
All in all, it can be said that as far as kernel crash dumping is concerned Linux is heading in the right direction. Kdump is robust and most of the remaining issues are being dealt with.
References
[0] /usr/share/doc/kernel-doc-2.6.../Documentation/kdump/kdump.txt (Fedora Core 6)
[1] http://www.xmission.com/~ebiederm/files/kexec/README
[2] Kexec history: http://lwn.net/Articles/15468/
[3] http://lse.sourceforge.net/kdump/
[4] /usr/share/doc/packages/kexec-tools/README.SUSE (SLES10)
[5] http://www.novell.com/support/search.do?cmd=displayKC&docType=kc&externalId=3374462&sliceId=SAL_Public (SLES10)
[6] http://ftp.suse.com/pub/people/tiwai/kdump-training/kdump-training.pdf (SLES10)
[7] Evaluating Linux Kernel Crash Dumping Mechanisms:
- http://lkdtt.sourceforge.net/docs/ols2006_lkdtt.pdf
- http://lkdtt.sourceforge.net/results/kdump/kdump_results.html
[8] Crash analysis utility: http://people.redhat.com/anderson/crash_whitepaper/
About the author
Dilip Daya is a Project Lead and Lead Technical Engineer in HP Services,
Worldwide Linux Level 3 support. Dilip enjoys working on Linux solutions
and is part of the Worldwide Linux Ambassadors team which is built
around key HP solutions, technologies and innovation that are worldwide
partnerships of field, region and division experts. Dilip is based in Atlanta, Georgia.