HP TechBriefs
Writing Extensions for Crash
by Alex Sidorenko
This TechBrief describes how scripting functionality can be added to crash.
Crash is an open source tool (developed mainly by Red Hat) for working with kernel dumps. It can also be used on a live kernel. It provides two sets of commands:
- general commands, such as printing variable values or structures
- kernel-specific commands, such as displaying a list of processes or the sockets used by a process
The first set is rather stable and usually does not require changes to the crash sources when a new kernel is released. The second set embeds knowledge about Linux kernel internals. As the Linux kernel adds new features, these structures often change, and the 'crash' sources must be changed as well. To maintain backward compatibility, the code has to perform multiple checks. These checks select the right algorithms and structures, so that information is extracted correctly from different kernels.
Why do we need extensions?
Let us briefly compare the tools available for dump analysis on a mature Unix (HPUX) and on Linux. On Linux we have just crash/lcrash with very basic GDB scripting. There is no clean, powerful API for writing extensions easily.
Based on our practical experience with some HPUX tools, there are several reasons why we need extensions and scripting for dump analysis:
- We do not know exactly where the problem is and would like to do a massive sanity check of internal structures. For example, we have 20,000 TCP connections and would like to report anything unusual. Even if the basic commands are available in 'crash', we need to process their output programmatically and issue further 'crash' commands based on it.
- The needed command does not exist in the 'crash' tool yet, and we need to extract this information from the dump. For example, there are no commands to print the routing tables or NETFILTER tables. Doing this manually would be very time-consuming, as routing tables are implemented as 3-level hash tables (FIB).
- We need to write a special test for our problem and do not want to repeat the same sequence of commands manually for several dumps.
How scripting can be added
A well-known approach for adding scripting functionality to existing tools is to drive them externally. You send commands as text, intercept the tool's output as text, parse it and extract the needed data. For example, several GUI tools are built on top of GDB using exactly this approach. There is already a project that uses it to control crash and process the results programmatically, with Perl as the programming language: http://alicia.sourceforge.net.
This works fine in simple cases, but performance suffers when we need to process a significant amount of data. For example, the Linux hash tables that store information about TCP connections have about 100,000 buckets each. Parsing crash output for all of them is not a good approach. As a result, I decided to write a new framework from scratch, using Python as the programming language. It is easy to embed Python in any C application. Using the crash API, I was able to package embedded Python as a dynamically loadable crash extension in 15 minutes.
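The sketch below shows why the text-driven approach is slow. The function run_crash_command() is hypothetical. It returns canned text instead of talking to a real crash session over a PTY, and the reply formats in FAKE_OUTPUT are assumed for illustration. Every value costs a full command round trip plus a regular-expression parse. The next command often depends on the parsed result, so repeating this for 100,000 hash buckets is expensive.

```python
import re

# Canned replies standing in for a real crash session (formats assumed)
FAKE_OUTPUT = {
    "sym dev_base": "c03bd400 (D) dev_base\n",
    "rd c03bd400": "c03bd400:  eacde000   ....\n",
}

def run_crash_command(cmd):
    """Hypothetical: send cmd to crash over a PTY and return its reply.
    Here it just looks up a canned answer."""
    return FAKE_OUTPUT[cmd]

SYM_RE = re.compile(r"([0-9a-f]+)\s+\((\w)\)\s+(\S+)")

def symbol_address(name):
    """Issue 'sym NAME' and parse the address out of the text reply."""
    out = run_crash_command("sym " + name)
    m = SYM_RE.match(out)
    if m is None:
        raise ValueError("unexpected output: %r" % out)
    return int(m.group(1), 16)

def read_word(addr):
    """A second round trip whose command depends on the parsed address."""
    out = run_crash_command("rd %x" % addr)
    return int(out.split()[1], 16)

addr = symbol_address("dev_base")
print("dev_base at 0x%x points to 0x%x" % (addr, read_word(addr)))
```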
Mapping C-structures to Python objects
The Linux kernel is written in C (plus a bit of assembly). To be able to write useful scripts easily, we need two things:
- to be able to read variables and struct/union contents
- to be able to write Python code easily by looking at the related C sources
For example, if we want to print routing tables from a dump, we start by looking at how they are accessed in the /proc routines. It would be nice to copy and paste pieces of the related C sources into our script. However, this would be extremely tricky, even if we used an embedded C interpreter instead of Python.
Python has most of the operators that C has. However, it has no direct equivalent of the dereference operator '*' or the related '->', because Python passes everything by reference. Even so, it is easy to mimic reading a C struct/union in Python:
```c
struct blk_major_name {
    struct blk_major_name *next;
    int major;
    char name[16];
} svar;
```

The corresponding Python code reads this structure from a given address:

```python
s = readSU('struct blk_major_name', addr)
major = s.major
print "%3d %-11s" % (major, s.name)
```
Here we read 'struct blk_major_name' from a given address and printed its 'major' and 'name' fields. To do this we used the dot '.', which is Python's attribute access operator. Python has many built-in data types, including integers, floating-point numbers and strings. It makes sense to implement attribute access for objects representing structs/unions so that they return the appropriate type automatically, without the need to specify the type explicitly. There is no 'pointer' type in Python, but integers are good enough to represent pointers. So in the example above it seems reasonable to return
- s.next as an integer
- s.major as an integer
- s.name as a string
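The sketch below shows one way such attribute access might be implemented. It assumes a 32-bit little-endian layout and a fake in-memory dump. The names readmem(), LAYOUTS and this toy readSU() are illustrative stand-ins, not the framework's implementation. Python calls __getattr__ only for names it does not find on the object normally, so each C field is decoded on demand into the matching Python type:

```python
import struct

# Hypothetical 32-byte "dump" that starts at address BASE
BASE = 0x1000
MEMORY = bytearray(32)

def readmem(addr, size):
    """Return size bytes of the fake dump starting at addr."""
    off = addr - BASE
    return bytes(MEMORY[off:off + size])

# Assumed layout of struct blk_major_name (32-bit, little-endian):
# field -> (offset, kind, size)
LAYOUTS = {
    "struct blk_major_name": {
        "next": (0, "ptr", 4),
        "major": (4, "int", 4),
        "name": (8, "char[]", 16),
    },
}

# Fill in one instance: next = NULL, major = 8, name = "sd" (NUL-padded)
struct.pack_into("<Ii16s", MEMORY, 0, 0, 8, b"sd")

class StructResult(object):
    """Toy struct/union object: fields are decoded on attribute access."""

    def __init__(self, sname, addr):
        self._sname = sname
        self._addr = addr

    def __getattr__(self, field):
        # Only called for names not found normally, i.e. the C fields
        try:
            off, kind, size = LAYOUTS[self._sname][field]
        except KeyError:
            raise AttributeError(field)
        raw = readmem(self._addr + off, size)
        if kind == "ptr":
            return struct.unpack("<I", raw)[0]          # pointer -> plain int
        if kind == "int":
            return struct.unpack("<i", raw)[0]          # int -> int
        return raw.split(b"\0", 1)[0].decode("ascii")   # char[] -> str

def readSU(sname, addr):
    """Toy stand-in for the framework's readSU()."""
    return StructResult(sname, addr)

s = readSU("struct blk_major_name", BASE)
print("%3d %-11s" % (s.major, s.name))
```

In the real tool, the layout comes from the kernel's debugging information rather than from a hand-written table like LAYOUTS.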
There are some problems with this approach. What if 'char name[16]' is intended to be used not as a string but as an array of 1-byte integers? To work around this, we introduce a special 'SmartString' type. It mimics a null-terminated string but also lets you access the data like a normal array. So if the 16 bytes of name are 'a', 'b', 'c', 0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, then
print s.name      # will print abc
print s.name[4]   # will print 5
By default, struct/union members defined as char pointers or char arrays are returned as 'SmartString' objects. Members with explicit 'signed' or 'unsigned' specifiers are returned as integer arrays.
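One way to get SmartString-like behaviour is to subclass str, as in the sketch below. This design is an assumption, not the framework's implementation. The visible string stops at the first NUL byte, while indexing returns the integer value of any byte in the underlying char array:

```python
class SmartString(str):
    """A str holding the text up to the first NUL byte, while indexing
    reaches every byte of the underlying char array."""

    def __new__(cls, raw):
        # raw: the full contents of the char array as bytes
        text = raw.split(b"\0", 1)[0].decode("latin-1")
        obj = str.__new__(cls, text)
        obj._raw = bytearray(raw)
        return obj

    def __getitem__(self, i):
        # Indexing returns the integer value of byte i of the array
        return self._raw[i]

# The 16-byte example from the text: "abc", a NUL, then bytes 5..16
raw = b"abc\0" + bytes(bytearray(range(5, 17)))
name = SmartString(raw)
print(name)      # abc
print(name[4])   # 5
```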
Dereferencing pointers in Structs and Unions (Emulating * and -> Operators)
What if we want to follow the 'next' pointer in the example above? In C we can write
*svar.next;
svar.next->next;
and the compiler takes the pointer type into account automatically. In Python we can do it in two ways:
- load the needed structure from the known address manually
next = readSU("struct blk_major_name", s.next)
- use a special 'Deref' attribute which emulates dereference
next = s.Deref.next
So, to emulate the missing features, we use special attributes (in Python you can bind an arbitrary action to attribute access). This would create problems if a struct had a field named 'Deref'. Luckily, this is highly improbable for kernel structures. The "Linux Coding Style" document, http://www.llnl.gov/linux/slurm/coding_style.pdf, says "mixed-case names are frowned upon", so using 'Deref' should be safe enough. The "internal" methods of Python classes are all named like __aname__, and I have never seen Linux kernel structures with names following that pattern either.
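The following sketch shows how a special Deref attribute could be built on top of __getattr__. It uses a toy readSU() over a fake 32-bit dump containing two linked blk_major_name nodes. The layout table, addresses and helper names are hypothetical. The pointer field records its target type, so s.Deref.next can load the right structure without the caller naming it, much like '->' in C:

```python
import struct

# Hypothetical 64-byte "dump" starting at BASE
BASE = 0x2000
MEMORY = bytearray(64)

def readmem(addr, size):
    off = addr - BASE
    if off < 0 or off + size > len(MEMORY):
        raise ValueError("address 0x%x not in dump" % addr)
    return bytes(MEMORY[off:off + size])

# Assumed layout: field -> (offset, kind, target struct for pointers)
LAYOUTS = {
    "struct blk_major_name": {
        "next": (0, "ptr", "struct blk_major_name"),
        "major": (4, "int", None),
    },
}

# Node A at BASE (major 3) -> node B at BASE+32 (major 8) -> NULL
struct.pack_into("<Ii", MEMORY, 0, BASE + 32, 3)
struct.pack_into("<Ii", MEMORY, 32, 0, 8)

class StructResult(object):
    def __init__(self, sname, addr):
        self._sname, self._addr = sname, addr

    def __getattr__(self, field):
        if field == "Deref":
            return _Deref(self)          # special attribute, not a C field
        try:
            off, kind, target = LAYOUTS[self._sname][field]
        except KeyError:
            raise AttributeError(field)
        fmt = "<I" if kind == "ptr" else "<i"
        return struct.unpack(fmt, readmem(self._addr + off, 4))[0]

class _Deref(object):
    """s.Deref.f loads the struct that pointer field f points to."""

    def __init__(self, parent):
        self._parent = parent

    def __getattr__(self, field):
        p = self._parent
        try:
            off, kind, target = LAYOUTS[p._sname][field]
        except KeyError:
            raise AttributeError(field)
        if kind != "ptr":
            raise TypeError("field %r is not a pointer" % field)
        addr = getattr(p, field)         # the pointer value, an int
        if addr == 0:
            raise ValueError("NULL pointer in field %r" % field)
        return readSU(target, addr)      # uses the declared target type

def readSU(sname, addr):
    """Toy stand-in for the framework's readSU()."""
    return StructResult(sname, addr)

s = readSU("struct blk_major_name", BASE)
print(s.Deref.next.major)                                # via Deref
print(readSU("struct blk_major_name", s.next).major)     # loaded by hand
```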
Emulating & Operator
In some cases we need the address of a struct/union member rather than its value. For example, consider a field that is defined as a struct (not a pointer):

```
type = struct task_struct {
    volatile long int state;
    ...
    struct list_head tasks;
}
```

When we access the 'tasks' attribute, we obtain an object representing a structure. For such objects we can use the Addr(obj) function to obtain the associated address, e.g.
init_task = readSymbol('init_task')
init_task_saddr = Addr(init_task.tasks)
If we need the address of a field of another type (e.g. an int), we have to compute it manually using low-level functions, e.g.
dev_base = readSymbol("dev_base")
offset = member_offset("struct net_device", "next")
addr_next = Addr(dev_base) + offset
In the future we might add a special attribute similar to 'Deref'. For example, if it were named Addr, we would be able to write
addr_next = dev_base.Addr.next
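The sketch below shows both forms. It assumes made-up offsets for struct net_device, which are not the real kernel layout, and a toy object that only records its struct name and address. Addr() returns the address the object was read from. The proposed Addr attribute simply adds member_offset() to that address, so both spellings give the same result:

```python
# Hypothetical offsets (invented for illustration, not the 2.6 layout)
OFFSETS = {
    "struct net_device": {"name": 0, "next": 16, "mtu": 20},
}

def member_offset(sname, field):
    """Toy stand-in for the framework's member_offset()."""
    return OFFSETS[sname][field]

def Addr(obj):
    """Address the struct object was read from (C: &obj)."""
    return obj._addr

class StructResult(object):
    def __init__(self, sname, addr):
        self._sname, self._addr = sname, addr

    def __getattr__(self, name):
        if name == "Addr":
            return _AddrOf(self)         # the proposed special attribute
        raise AttributeError(name)       # field reading omitted in this sketch

class _AddrOf(object):
    """obj.Addr.f returns the address of field f (C: &obj->f)."""

    def __init__(self, parent):
        self._parent = parent

    def __getattr__(self, field):
        p = self._parent
        try:
            return Addr(p) + member_offset(p._sname, field)
        except KeyError:
            raise AttributeError(field)

dev_base = StructResult("struct net_device", 0xeacde000)
addr_next = Addr(dev_base) + member_offset("struct net_device", "next")
print("0x%x 0x%x" % (addr_next, dev_base.Addr.next))
```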
Implementation
The framework consists of a Python extension module (a dynamically loaded shared library) and pure Python code. The extension module is written in C and linked with the Python library. It is loaded into crash using the 'extend' command. This module implements a Python interface to low-level functions such as dump memory access and reading
High-Level API
This is a basic tutorial. If you have never used Python before, please read one of the excellent tutorials available online, such as the one at http://docs.python.org/tut/tut.html. At an absolute minimum, you need to know enough Python to understand all our examples. Code blocks in Python are marked by indentation only; there are no {} brackets! You print using print expr1, expr2, ..., exprn. Instead of printf("%s %d", s1, i1) you use print "%s %d" % (s1, i1). if-statements are similar to C, but you cannot use an assignment in the condition, e.g.
if (s1 = func())
will not work.
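The usual Python idiom is to assign first and test afterwards. C code often calls a function in a loop until it returns NULL. For that pattern, the built-in iter(callable, sentinel) does the job. In this sketch, next_item() is a hypothetical helper that returns None when there is nothing left:

```python
items = ["lo", "eth0", "eth1"]

def next_item():
    """Hypothetical helper: return the next name, or None when done."""
    return items.pop(0) if items else None

# C:  if ((s1 = next_item()) != NULL) { ... }
# Python: assign first, then test the value
s1 = next_item()
if s1 is not None:
    print("first: %s" % s1)

# C:  while ((s1 = next_item()) != NULL) { ... }
# Python: iter(callable, sentinel) keeps calling next_item() until it
# returns the sentinel None
rest = list(iter(next_item, None))
print(rest)
```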
for-statements loop over a sequence (or an iterable). Examples:
```python
for s in ("s1", "s2", "s3"):
    print s
for i in range(1, 10):
    print i
```
Here range(1, 10) expands to a sequence of integers starting at 1 and ending at 9, similar to the C loop for (i = 1; i < 10; i++). Functions and methods can have a variable number of arguments, and some of them can be defined as keyword arguments, e.g. func(1, 2, mykeyarg=3). The same Python API is used by both the PTY-driven version and the loadable-extension version. Here is a very basic example showing how to use the API:
```python
#!/usr/bin/env python
# This imports all the functions you can use from Crashlib
from Crashlib.API import *

# Check whether a symbol exists
if symbol_exists('all_bdevs'):
    print "all_bdevs exists"

# Read the contents of the 'tcp_hashinfo' table. The result type is
# determined automatically from the symbol definition
tcp_hashinfo = readSymbol('tcp_hashinfo')
# Print the size of the ESTABLISHED hash table
print tcp_hashinfo.__tcp_ehash_size
```
Running this script (saved as test.py), assuming that the kernel image (vmlinux) and the vmcore are in the /Dumps/Linux/test directory:
```
{alexs 17:07:48} test.py /Dumps/Linux/test
*** no embedded Python
all_bdevs exists
131072
```
The first line, *** no embedded Python, means that the low-level module (implemented as a crash extension) is not available. Running the same test from inside crash:
```
{alexs 12:05:40} crash /Dumps/Linux/test/vmlinux-2.6.9-22.ELsmp /Dumps/Linux/test/vmcore-netdump-2.6.9-22.ELsmp
...
crash> extend Extension/python32.so
./Extension/python32.so: shared object loaded
crash> epython test.py
*** Initializing Embedded Python ***
all_bdevs exists
131072
-- 0.13s --
crash> epython test.py
all_bdevs exists
131072
-- 0.00s --
```
Note that when we ran the script the second time, it took less than 0.01s to execute. The reason is that the most expensive operations are those that parse symbolic information. They are done only once, and the results are cached in memory. We do not unload the Python interpreter after the command is executed, so all cached information remains available until we exit crash. All the functions you need are defined in the API module, which calls low-level methods as needed. If the embedded Python extension is available, the API module uses it. If not, it drives crash externally via a PTY, sending commands and parsing the output as needed.
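The fallback described above can be sketched with a guarded import. The module name crash_lowlevel and both backend classes are hypothetical, and the PTY side is only a placeholder. The printed message mirrors the one shown in the first run above:

```python
class EmbeddedBackend(object):
    """Fast path: would call the C extension module directly."""
    name = "embedded"

    def __init__(self, module):
        self.module = module

class PtyBackend(object):
    """Slow path placeholder: would drive crash over a PTY and parse text."""
    name = "pty"

def select_backend():
    """Prefer the embedded extension; fall back to PTY driving."""
    try:
        import crash_lowlevel  # hypothetical module provided by the .so
    except ImportError:
        print("*** no embedded Python")
        return PtyBackend()
    return EmbeddedBackend(crash_lowlevel)

backend = select_backend()
print("using the %s backend" % backend.name)
```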
Implementation Details And Performance
Most low-level access functions are written in C: reading a range of memory, converting a symbol to an address, and so on. In addition, it is possible to issue any GDB or crash command and obtain its output as a string. The non-trivial part is accessing information about structs/unions. At the moment this part is written in Python: we issue the 'gdb ptype sname' command and then parse its output. The parser is implemented using the PyParsing framework and is rather slow. But it does not make sense just to rewrite the parser in C, because the 'gdb ptype sname' command is very slow by itself. So the future reimplementation will use lower-level access to the symbolic information about structs/unions.
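To give an idea of what parsing 'ptype' output involves, here is a deliberately simplified parser based on regular expressions. It is not the PyParsing-based parser used by the framework. It assumes one member per line and does not handle nested structs/unions, bit-fields or function pointers. PTYPE_OUTPUT is sample text in the shape gdb prints:

```python
import re

# Sample text in the shape gdb prints for "ptype struct blk_major_name"
PTYPE_OUTPUT = """type = struct blk_major_name {
    struct blk_major_name *next;
    int major;
    char name[16];
}"""

# One member per line: type words, optional '*'s, name, optional [N]
MEMBER_RE = re.compile(r"^\s*(?P<type>[\w\s]+?)\s*(?P<stars>\**)(?P<name>\w+)"
                       r"(?:\[(?P<dim>\d+)\])?;\s*$")

def parse_ptype(text):
    """Return a list of (name, base_type, pointer_depth, array_dim) tuples.
    Handles only simple members (see the limitations noted above)."""
    members = []
    for line in text.splitlines()[1:-1]:   # skip "type = ... {" and "}"
        m = MEMBER_RE.match(line)
        if m is None:
            raise ValueError("unsupported member: %r" % line)
        dim = m.group("dim")
        members.append((m.group("name"), m.group("type").strip(),
                        len(m.group("stars")), int(dim) if dim else None))
    return members

for member in parse_ptype(PTYPE_OUTPUT):
    print(member)
```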
Struct/union definitions do not change while the tool is running, so the parsing results are cached. The next time we need information about the same struct, the cached contents are used. The performance is quite acceptable. It takes about 1.5s to print all IP connections starting from the hash tables (each hash table has 131072 buckets, most of them empty). Running the same command a second time takes only 0.09s, as we do not need to retrieve the symbolic data again.
Some Practical Results
Using the framework, we have written a number of scripts. Some test the framework (and identify what improvements are needed), and others are used when working on real cases. At the moment we have mainly networking scripts. They print routing tables from a dump (on the 2.6 kernel this means traversing 4 levels of FIB structures), status information for LAN card drivers, details of IP connections based on hash tables, and so on. Example output of the 'netdev.py' script:
```
-------------------------------------------
0xc03bd080 lo 127.0.0.1 6 (__LINK_STATE_START|__LINK_STATE_PRESENT)
open=, mtu=16436 promisc=0
last_rx 20.80 s ago
Qdisc addr=0xc03ee560 size=96 qlen=0
enqueue= dequeue=
--------------------------------------------------
0xeacde000 eth0 131.225.26.9 6 (__LINK_STATE_START|__LINK_STATE_PRESENT)
open=, mtu=1500 promisc=0
last_rx 20.83 s ago
trans_start 20.80 s ago
Qdisc addr=0xc83a6180 size=96 qlen=0
enqueue= dequeue=
== Bands ==
sk_buff_head=0xc83a61e0 len=0
sk_buff_head=0xc83a61f0 len=0
sk_buff_head=0xc83a6200 len=0
--------------------------------------------------
0xeacde800 eth1 192.168.72.9 6 (__LINK_STATE_START|__LINK_STATE_PRESENT)
open=, mtu=1500 promisc=0
last_rx 46.88 s ago
trans_start 20.80 s ago
Qdisc addr=0xe9a0ce80 size=96 qlen=0
enqueue= dequeue=
== Bands ==
sk_buff_head=0xe9a0cee0 len=0
sk_buff_head=0xe9a0cef0 len=0
sk_buff_head=0xe9a0cf00 len=0
===== Per-CPU Data =====
--CPU 0
input_pkt_queue=0xc045980c qlen=0
netif_rx_stats total=21556, dropped=0
--CPU 1
input_pkt_queue=0xc0459a0c qlen=0
netif_rx_stats total=19922, dropped=0
--CPU 2
input_pkt_queue=0xc0459c0c qlen=0
netif_rx_stats total=15908, dropped=0
--CPU 3
input_pkt_queue=0xc0459e0c qlen=0
netif_rx_stats total=16400, dropped=0
-- 0.95s --
```
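The connection-printing scripts mentioned above walk kernel hash tables in which most buckets are empty. The sketch below shows that traversal pattern with a toy in-memory model: a Python list of bucket heads and a dict of entries, rather than real kernel structures. In the real tool the whole bucket array could be fetched with one memory read, so skipping an empty bucket costs only a comparison:

```python
# Toy model of a chained hash table. buckets[i] holds the address of the
# first entry in chain i (0 means empty); entries maps an address to
# (payload, address of next entry). Payloads and addresses are made up.
NBUCKETS = 8
buckets = [0] * NBUCKETS
entries = {}

def insert(addr, payload, bucket):
    """Push an entry onto the head of a bucket's chain."""
    entries[addr] = (payload, buckets[bucket])
    buckets[bucket] = addr

insert(0x100, "10.0.0.1:22", 3)
insert(0x200, "10.0.0.2:80", 3)
insert(0x300, "10.0.0.3:443", 6)

def walk(heads):
    """Yield every payload, skipping empty buckets with one comparison."""
    for head in heads:
        if head == 0:
            continue
        addr = head
        while addr:
            payload, addr = entries[addr]
            yield payload

for conn in walk(buckets):
    print(conn)
```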
Status of the project
The project has been released as open source software by HP. The sourceforge project site is http://sourceforge.net/projects/pykdump. We believe the Python/Crash API provides a useful framework for extending and scripting crash. There are definitely various improvements that can be made. For example, performance is already pretty acceptable for many practical applications, but could benefit from some additional work.
If you need the complete documentation, or, better yet, would like to contribute to the project, email the author or visit the project website for more details.
Alex Sidorenko is a WTEC Engineer in Global Solution Engineering (GSE)
division of TSG. Before joining HP in 1997 he worked as a research scientist
in Computer Physics in Russia. He specializes in Unix networking (both
protocols and their implementations on HPUX and Linux) but likes to work in
other areas too, especially developing software in his favorite programming
language, Python. Alex is based in Montreal, Canada.
References
1. Crash tool home page http://people.redhat.com/anderson
2. Python programming language http://www.python.org
3. The sourceforge site http://sourceforge.net/projects/pykdump/