Installing and Using Ftrace
From OMAPpedia
The following Wiki explains how to use Ftrace and its different tracers on Linux releases. It describes which tracers work with kernel-omap3 mainline or linux-omap-2.6 mainline, and how they work.
Note: All functionality was tested on the v2.6.32 and v2.6.33 kernels.
Note: Some options are enabled on Android (Function Tracer, Irq Tracer...), but the patches in this article won't work there.
Note: Added section 13, Ftrace on the p-android-omap-3.0 kernel tree with an Android file system [ICS]. The nop, function, and irqsoff tracers were tested on a v3.0.21 kernel.
[edit] Kernel configuration & Re-build
The kernel must be reconfigured and rebuilt to enable Ftrace support.
- Clone the chosen kernel and check out the appropriate commit id.
- Run the following command from the kernel folder
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm distclean
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm omap_zoom2_defconfig
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm xconfig
- Set the following in xconfig
Kernel Hacking -> Tracers -> FUNCTION_TRACER
Kernel Hacking -> Tracers -> FUNCTION_GRAPH_TRACER (if possible)
Kernel Hacking -> Tracers -> STACK_TRACER
Kernel Hacking -> Tracers -> DYNAMIC_FTRACE (see enabling Dynamic Ftrace)
This kernel configuration focuses on using Ftrace as a debugging tool. Other Ftrace configurations serve different purposes, such as measuring latency or analyzing the system; they are explained later.
- Compile the kernel
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm uImage
(For further information about the compilation of the kernel see Linux OMAP Kernel Project)
- Boot the system with this new kernel
[edit] Using Ftrace
Ftrace has its control files in the debugfs filesystem. This is usually mounted in /sys/kernel/debug. If it is not already mounted, you can mount it yourself with:
# mount -t debugfs nodev /sys/kernel/debug
Many developers prefer to have this directory at the root of the filesystem:
# mkdir /debug
# mount -t debugfs nodev /debug
Mounting debugfs creates a tracing subdirectory (here /debug/tracing) which is used to control Ftrace and to read the tool's output.
Note: all the commands in this article are issued from the /debug/tracing/ directory.
[edit] Choosing a tracer
To find out which tracers are available, simply cat the available_tracers file in the tracing directory:
# cat available_tracers
function sched_switch nop
To enable, for example, the function tracer, just echo "function" into the current_tracer file.
# echo function > current_tracer
# cat current_tracer
function
Note: Ftrace can run only one tracer at a time.
[edit] Using a tracer
At this point Ftrace is ready to trace functions. To start tracing, echo 1 into the tracing_on file:
# echo 1 > tracing_on
To stop the trace, just echo 0 in the same file:
# echo 0 > tracing_on
The trace is stored in the trace file. Here is an example of the output from a function trace; the header helps decode the various fields.
# cat trace | head -10
# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
    avahi-daemon-630   [000]  9507.400848: vfs_read <-sys_read
    avahi-daemon-630   [000]  9507.400848: rw_verify_area <-vfs_read
    avahi-daemon-630   [000]  9507.400848: do_sync_read <-vfs_read
    avahi-daemon-630   [000]  9507.400848: sock_aio_read <-do_sync_read
    avahi-daemon-630   [000]  9507.400848: alloc_sock_iocb <-sock_aio_read
    avahi-daemon-630   [000]  9507.400848: unix_stream_recvmsg <-sock_aio_read
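The steps above can be combined into a small wrapper script. This is a minimal sketch: the debugfs mount point and the workload (`ls`) are assumptions to adapt to your setup.

```shell
#!/bin/sh
# Sketch: capture a function trace around a single workload.
# Assumes debugfs is mounted at /sys/kernel/debug; change T if you
# mounted it elsewhere (e.g. /debug).
T=${TRACING_DIR:-/sys/kernel/debug/tracing}
if [ -w "$T/tracing_on" ]; then
    echo function > "$T/current_tracer"
    echo 1 > "$T/tracing_on"        # start recording
    ls / > /dev/null                # the workload to trace
    echo 0 > "$T/tracing_on"        # stop recording
    head -20 "$T/trace"             # inspect the captured trace
else
    echo "tracing unavailable (need root and a mounted debugfs)"
fi
```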
[edit] Tracing a specific process
Perhaps you only need to trace a specific process, or set of processes. The file set_ftrace_pid lets you specify specific processes that you want to trace. To just trace the current thread you can do the following:
[tracing]# echo $$ > set_ftrace_pid
The above will set the function tracer to only trace the bash shell that executed the echo command. If you want to trace a specific process, you can create a shell script wrapper program.
[tracing]# cat ~/bin/ftrace-me
#!/bin/sh
DEBUGFS=`grep debugfs /proc/mounts | awk '{ print $2; }'`
echo $$ > $DEBUGFS/tracing/set_ftrace_pid
echo function > $DEBUGFS/tracing/current_tracer
exec $*
[tracing]# ~/bin/ftrace-me ls -ltr
Note, you must clear the set_ftrace_pid file if you want to go back to generic function tracing after performing the above.
[tracing]# echo -1 > set_ftrace_pid
[edit] The Tracers in detail
All the following tracers are Ftrace plugins; they present the trace differently and trace different things (latency, scheduler events, ...).
[edit] nop tracer
This is the default tracer; it traces no functions. It can still be useful for tracing events (see #Trace event).
[edit] Function tracer
This tracer is activated by enabling CONFIG_FUNCTION_TRACER in the kernel configuration. To work, it needs the kernel variable ftrace_enabled to be turned on, otherwise this tracer is a nop.
# sysctl kernel.ftrace_enabled=1
# echo function > current_tracer
# cat current_tracer
function
[tracing]# cat trace | head -10
# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
            bash-16939 [000]  6075.461561: mutex_unlock <-tracing_set_tracer
          <idle>-0     [001]  6075.461561: _spin_unlock_irqrestore <-hrtimer_get_next_event
          <idle>-0     [001]  6075.461562: rcu_needs_cpu <-tick_nohz_stop_sched_tick
            bash-16939 [000]  6075.461563: inotify_inode_queue_event <-vfs_write
          <idle>-0     [001]  6075.461563: mwait_idle <-cpu_idle
            bash-16939 [000]  6075.461563: __fsnotify_parent <-vfs_write
The header explains the format of the output pretty well. The first two items are the traced task name and PID. The CPU that the trace was executed on is within the brackets. The timestamp is the time since boot; it is followed by the traced function, with its caller shown after the '<-' symbol.
This information is quite powerful and shows the flow of functions nicely. But it can be a bit hard to follow.
[edit] How does the Function Tracer work?
The function tracer (enabled by CONFIG_FUNCTION_TRACER) is a way to trace almost all functions in the kernel. When function tracing is enabled, the kernel is compiled with the gcc option -pg. This is a profiler that will make all functions call a special function named mcount. Unfortunately, this could cause a very large overhead. To improve it, enable the dynamic Ftrace (#Dynamic Ftrace), then the mcount calls, when not in use, are converted at run time to nops. This allows the function tracer to have zero overhead when not in use. (Finding Origins of Latencies Using Ftrace by Steven Rostedt)
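The -pg instrumentation can be observed on any development host: every function in an object compiled with -pg gets a reference to the profiling hook (mcount on most targets, __gnu_mcount_nc on recent ARM EABI toolchains). A quick host-side check, assuming gcc is installed:

```shell
# Compile a trivial file with -pg and look for the profiling-hook
# reference that ftrace patches at run time (assumes a host gcc).
cat > /tmp/pg_demo.c <<'EOF'
int add(int a, int b) { return a + b; }
EOF
if command -v gcc >/dev/null; then
    gcc -pg -c /tmp/pg_demo.c -o /tmp/pg_demo.o
    nm /tmp/pg_demo.o | grep -i 'mcount\|fentry'   # the inserted profiling call
else
    echo "gcc not installed"
fi
```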
[edit] Sched_switch Tracer
This tracer, activated by enabling CONFIG_CONTEXT_SWITCH_TRACER, traces the context switches and wakeups between tasks.
# cat trace
# tracer: sched_switch
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
            bash-2948  [000]   749.962524: 2948:120:S   + [000]  2948:120:S bash
            bash-2948  [000]   749.962616: 2948:120:S ==> [000]  8369:120:R <...>
           <...>-8369  [000]   749.964264: 8369:120:R   + [000]   625:120:S dbus-daemon
           <...>-8369  [000]   749.964294: 8369:120:R ==> [000]   625:120:R dbus-daemon
     dbus-daemon-625   [000]   749.965454:  625:120:S ==> [000]  8369:120:R <...>
           <...>-8369  [000]   749.965576: 8369:120:R   + [000]   625:120:S dbus-daemon
Wake ups are represented by a "+" and the context switches are shown as "==>". The format is:
- Context switches:

       Previous task              Next Task
  <pid>:<prio>:<state>  ==>  <next_pid>:<next_prio>:<next_state>

- Wake ups:

       Current task               Task waking up
  <pid>:<prio>:<state>    +  <pid>:<prio>:<state>
The prio is the internal kernel priority, which is the inverse of the priority usually displayed by user-space tools. Zero represents the highest kernel priority (user-space RT priority 99). Prio 100 starts the "nice" priorities, with 100 equal to nice -20 and 139 equal to nice 19. Prio 140 is reserved for the idle task, the lowest-priority thread (pid 0). Below is a quick chart mapping kernel priorities to user-land priorities.
 Kernel Space               User Space
 ===============================================================
 0(high) to 98(low)         user RT priority 99(high) to 1(low)
                            with SCHED_RR or SCHED_FIFO
 ---------------------------------------------------------------
 99                         sched_priority is not used in
                            scheduling decisions (it must be
                            specified as 0)
 ---------------------------------------------------------------
 100(high) to 139(low)      user nice -20(high) to 19(low)
 ---------------------------------------------------------------
 140                        idle task priority
 ---------------------------------------------------------------
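The mapping above is easy to script. The helper below is an illustrative sketch (the function name kprio_to_user is ours, not part of any tool):

```shell
# Helper sketch: translate a kernel-internal priority (as printed by
# the sched_switch tracer) into the user-space value, per the chart.
kprio_to_user() {
    p=$1
    if   [ "$p" -le 98 ];  then echo "RT priority $((99 - p))"
    elif [ "$p" -eq 99 ];  then echo "unused"
    elif [ "$p" -le 139 ]; then echo "nice $((p - 120))"
    else echo "idle task"
    fi
}
kprio_to_user 120   # prints "nice 0"
```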
The task states are:
- R - running : wants to run, may not actually be running
- S - sleep : process is waiting to be woken up (handles signals)
- D - disk sleep (uninterruptible sleep) : process must be woken up (ignores signals)
- T - stopped : process suspended
- t - traced : process is being traced (with something like gdb)
- Z - zombie : process waiting to be cleaned up
- X - unknown
[edit] Sched_switch viewer
To visualize the temporal relationship of Linux tasks, a graphical tool is available. It converts the trace data to VCD (value change dump) format; the context switches can then be viewed in a VCD viewer such as GTKWave. The tool is available at Media:sched_switch-0.1.1.tar.gz.
To use this program, first record an ftrace trace and export it to a text file:
cat /debug/tracing/trace > /tmp/trace.txt
Now convert the trace data and start the viewer:
sched_switch /tmp/trace.txt /tmp/trace.vcd
gtkwave /tmp/trace.vcd
To display a particular data channel within gtkwave, select sched_switch in the top left window and drag and drop the label of the requested channel from the bottom left window labeled "Signals" to the drop area at the left side of the main wave display area (also labeled "Signals"). Alternatively, you may select the requested channels in the bottom left window labeled "Signals" and then click on the "Insert" button below the window.
The VCD uses the following definitions :
- Z no cpu assigned
- U undefined (wake up is done for a cpu to become ready)
- 0/1 binary encoded cpu number
- L/H binary encoded cpu number with priority inheritance
[edit] Function Graph Tracer
This tracer depends on the CONFIG_FUNCTION_GRAPH_TRACER kernel option.
This tracer is similar to the function tracer except that it probes a function on its entry and its exit, and also the duration of each function.
[edit] Enabling Function Graph Tracer for OMAP3
Currently, the function graph tracer for the ARM architecture is not supported in mainline. Tim Bird is working on patches to enable this option (source), but some improvements are still needed for OMAP3. The following patches enable this tracer on OMAP3: Media:Function_graph_tracer_support.tar.gz.
[edit] Enabling Function Graph Tracer for OMAP4
OMAP4 is missing a notrace annotation on omap_readl(). Apply the following patch to enable the trace. [1]
[edit] The Function graph presentation
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)   0.000 us  |            } /* clear_buddies */
 0)   0.000 us  |            update_min_vruntime();
 0) + 30.517 us |          } /* dequeue_entity */
 0)             |          dequeue_entity() {
 0)             |            update_curr() {
 0)   0.000 us  |              update_min_vruntime();
 0)   0.000 us  |            }
 0)   0.000 us  |            clear_buddies();
 0)   0.000 us  |            update_min_vruntime();
 0) + 30.518 us |          }
The function graph tracer traces both the entry and exit of a function, which lets the tracer know the depth of nested calls. It can make the flow of execution within the kernel much easier to follow with the human eye.
This gives the start and end of a function denoted with the C like annotation of "{" to start a function and "}" at the end. Leaf functions, which do not call other functions, simply end with a ";". The DURATION column shows the time spent in the corresponding function. The function graph tracer records the time the function was entered and exited and reports the difference as the duration.
The "+" signs are annotation markers: when a duration is greater than 10 microseconds, a "+" is shown. If the duration is greater than 100 microseconds, a "!" is displayed.
[edit] Function Profiling
With the function profiler, it is possible to take a good look at the actual function execution and not just samples. If CONFIG_FUNCTION_GRAPH_TRACER is configured in the kernel, the function profiler will use the function graph infrastructure to record how long a function has been executing. If just CONFIG_FUNCTION_TRACER is configured, the function profiler will just count the functions being called.
# echo nop > current_tracer
# echo 1 > function_profile_enabled
# cat trace_stat/function0 | head
  Function          Hit     Time            Avg
  --------          ---     ----            ---
  schedule          923     325906219 us    353094.4 us
  vfs_read          238     109621612 us    460595.0 us
  sys_read          208     108491821 us    521595.2 us
  do_sync_read      123     49456207 us     402082.9 us
  pipe_read         63      48292785 us     766552.1 us
  pipe_wait         63      48276550 us     766294.4 us
  sys_futex         5       48068145 us     9613629 us
  do_futex          5       48068115 us     9613623 us
The above also includes the times that a function has been preempted or schedule() was called and the task was swapped out. This may seem useless, but it does give an idea of what functions get preempted often. Ftrace also includes options that allow you to have the function graph tracer ignore the time the task was scheduled out.
# echo 0 > options/sleep-time
# echo 0 > function_profile_enabled
# echo 1 > function_profile_enabled
# cat trace_stat/function0 | head
  Function              Hit     Time            Avg
  --------              ---     ----            ---
  cpuidle_idle_call     337     20587921 us     61091.75 us
  omap3_enter_idle      337     20555786 us     60996.39 us
  omap3_enter_idle_bm   329     20523254 us     62380.71 us
  omap_sram_idle        324     19609313 us     60522.57 us
  asm_do_IRQ            2487    1954010 us      785.689 us
  do_page_fault         5639    1825439 us      323.716 us
  handle_mm_fault       5656    1386108 us      245.068 us
  do_DataAbort          3548    1214050 us      342.178 us
Another option that affects profiling is graph-time (again with a "-"). By default it is enabled. When enabled, the times for a function include the times of all the functions that were called within the function. As you can see from the output in the above examples, several system calls are listed with the highest average. When disabled, the times only include the execution of the function itself, and do not contain the times of functions that are called from the function:
# echo 0 > options/graph-time
# echo 0 > function_profile_enabled
# echo 1 > function_profile_enabled
# cat trace_stat/function0 | head
  Function                        Hit     Time            Avg
  --------                        ---     ----            ---
  omap_sram_idle                  41      1189849 us      29020.72 us
  sub_preempt_count               15131   56701.65 us     3.747 us
  add_preempt_count               14506   52795.40 us     3.639 us
  _pwrdm_state_switch             1513    31951.91 us     21.118 us
  omap_i2c_set_wfc_mpu_wkup_lat   150     31738.27 us     211.588 us
  prm_read_mod_bits_shift         2579    25024.42 us     9.703 us
  __do_fault                      472     23559.57 us     49.914 us
  handle_level_irq                352     18859.87 us     53.579 us
- NOTE: Disabling the function profiler and then re-enabling it causes the numbers to reset.
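A profiling run can be scripted end to end. The sketch below is illustrative only: the dd workload and the debugfs mount point are assumptions to adapt.

```shell
# Sketch: profile only while a workload runs, then show the costliest
# functions on CPU 0. Re-enabling the profiler resets the counters.
T=${TRACING_DIR:-/sys/kernel/debug/tracing}
if [ -w "$T/function_profile_enabled" ]; then
    echo nop > "$T/current_tracer"
    echo 0 > "$T/options/sleep-time"         # ignore time spent scheduled out
    echo 1 > "$T/function_profile_enabled"   # counters start from zero here
    dd if=/dev/zero of=/dev/null bs=1M count=100 2>/dev/null   # example workload
    echo 0 > "$T/function_profile_enabled"
    head "$T/trace_stat/function0"
else
    echo "function profiler unavailable"
fi
```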
[edit] FTD (Function Trace Dump)
To analyse system data, a post-processing tool called ftd was written by Tim Bird (source).
To install:
* Download the script
* Rename it to 'ftd': mv Ftd.txt ftd
* Make it executable: chmod a+x ftd
* Put it on your path somewhere: sudo mv ftd /usr/local/bin
FTD is a script that can show call counts and cumulative times for the functions in a trace log.
Note: Ftd is written in Python, so it is recommended to move your trace log to a development host and run ftd there, rather than on the target.
To retrieve the trace log data, use:
#cat trace > /tmp/trace-data.txt
To see a list of functions, sorted by total time spent in them, use:
$ ftd /tmp/trace-data.txt
Function                        Count   Time        Average     Local
-----------------------------   -----   ----------  --------    ----------
do_page_fault                   159     76416012    480603      4364019
handle_mm_fault                 159     56945800    358149      4516606
do_DataAbort                    94      50079342    532758      1373293
asm_do_IRQ                      36      43182375    1199510     1098636
__do_fault                      140     40649404    290352      7293704
flush_old_exec                  1       33416739    33416739    152581
mmput                           1       32623290    32623290    30522
exit_mmap                       1       32318110    32318110    427241
Other useful tasks that ftd can be used for include:
- Sorting the function list by function count—the number of times that the function was called during the trace.
- Examining the local time of a function. The local time of a function is the elapsed time between the start and end of the function, minus the time spent in all functions called between the start and end of the function. Note that this includes not just children function called by this function, but also interrupts. Local time also includes the time spent in user space, and in other processes’ kernel functions (ie, when the function’s process is scheduled out.) So local time should be interpreted cautiously, with this understanding.
- Finding the subroutines called by functions the most times.
See ftd -h for usage help, for the command line options to use for these tasks.
[edit] Boot Tracer
This tracer is activated with the CONFIG_BOOT_TRACER kernel configuration option.
This tracer helps developers to optimize boot time: it records the timings of initcalls and traces key events and the identity of tasks that can cause boot delays.
The easiest way is to pass initcall_debug and printk.time=1 on the kernel command line and boot the system. Then copy/paste the console output to a file (say boot.log), or type dmesg > boot.log.
Then, generate a boot graph by using a utility provided in the Linux kernel sources:
cat boot.log | perl ${KERNEL_SRC_DIR}/scripts/bootgraph.pl > boot.svg
This graph can be viewed with Inkscape or Firefox. It gives a visual representation of the delays in initcalls.
Another way to get the same results is to pass in initcall_debug and ftrace=initcall to the kernel command line. You can then access timing information in the /sys/kernel/debug/tracing/trace file:
# cat current_tracer
initcall
# cat trace | head -20
# tracer: initcall
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
[    0.000000000] calling  init_mmap_min_addr+0x0/0x10 @ 1
[    0.000000000] initcall init_mmap_min_addr+0x0/0x10 returned 0 after 0 msecs
[    0.000000000] calling  net_ns_init+0x0/0x1ec @ 1
[    0.000000000] initcall net_ns_init+0x0/0x1ec returned 0 after 0 msecs
[    0.000000000] calling  ptrace_break_init+0x0/0x24 @ 1
[    0.000000000] initcall ptrace_break_init+0x0/0x24 returned 0 after 0 msecs
[    0.000000000] calling  consistent_init+0x0/0x108 @ 1
[    0.000000000] initcall consistent_init+0x0/0x108 returned 0 after 0 msecs
[    0.000000000] calling  v6_userpage_init+0x0/0x8 @ 1
[    0.000000000] initcall v6_userpage_init+0x0/0x8 returned 0 after 0 msecs
[    0.000000000] calling  sysctl_init+0x0/0x1c @ 1
[    0.000000000] initcall sysctl_init+0x0/0x1c returned 0 after 0 msecs
[    0.000000000] calling  ksysfs_init+0x0/0xc4 @ 1
         swapper-1     [000]     0.000000: 1:120:R   + [000]  6:120:S khelper
         swapper-1     [000]     0.000000: 1:120:D ==> [000]  6:120:R khelper
         khelper-6     [000]     0.000000: 6:120:R   + [000]  9:120:R <...>
You will then run:
cat /sys/kernel/debug/tracing/trace | perl ${KERNEL_SRC_DIR}/scripts/bootgraph.pl > output.svg
- Note: The clock used for printk.time=1 does not necessarily start running from zero. On such platforms the bootgraph does not work properly, as the calculated boot time will be too long. To correct this, two patches modify the script to accept only non-zero start times ([2] & [3]).
- The boot trace can also be analyzed through a VCD file created by the sched_switch tool; for more information see Sched_switch viewer.
Note: It is possible to trace events during the boot phase with this parameter in the bootargs (more information #Trace Event)
trace_event=[event-list]
[edit] Latency Tracers
When the latency-format option is enabled (Tracer option), the trace file gives somewhat more information to see why a latency happened.
Here is a typical trace:
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 2.6.33-dirty
# --------------------------------------------------------------------
# latency: 4516 us, #7/7, CPU#0 | (M:server VP:0, KP:0, SP:0 HP:0)
#    -----------------
#    | task: -25671 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: ret_slow_syscall
#  => ended at:   vector_swi
#
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /_--=> lock-depth
#                |||||/     delay
#  cmd     pid   |||||| time  |   caller
#     \   /      ||||||   \   |   /
gdm-bina-25671   0d....    0us : __up_read
gdm-bina-25671   0d....    0us : __up_read
gdm-bina-25671   0d....    0us : trace_hardirqs_on <-__up_read
gdm-bina-25671   0d....    0us!: ret_slow_syscall
gdm-bina-25671   0d.... 4517us : vector_swi
gdm-bina-25671   0d.... 4517us : trace_hardirqs_on <-vector_swi
gdm-bina-25671   0d.... 4517us :
This shows that the current tracer is "irqsoff", tracing the time for which interrupts were disabled. It gives the trace version and the kernel version on which this was executed (2.6.33). Then it displays the max latency in microseconds (4516 us), the number of trace entries displayed, and the total number recorded (both are seven: #7/7). VP, KP, SP, and HP are always zero and are reserved for later use.
The task is the process that was running when the latency occurred (pid 25671).
The start and stop points (the functions in which the interrupts were disabled and re-enabled, respectively) that caused the latency:
- ret_slow_syscall is where the interrupts were disabled.
- vector_swi is where they were enabled again.
The next lines after the header are the trace itself. The header explains which is which.
- cmd: The name of the process in the trace.
- pid: The PID of that process.
- CPU#: The CPU which the process was running on.
- irqs-off: 'd' interrupts are disabled. '.' otherwise.
Note: If the architecture does not support a way to read the irq flags variable, an 'X' will always be printed here.
- need-resched: 'N' task need_resched is set, '.' otherwise.
- hardirq/softirq:
- 'H' - hard irq occurred inside a softirq.
- 'h' - hard irq is running
- 's' - soft irq is running
- '.' - normal context.
- preempt-depth: The level of preempt_disabled
The above is mostly meaningful for kernel developers.
- time: When the latency-format option is enabled, the trace file output includes a timestamp relative to the start of the trace. This differs from the output when latency-format is disabled, which includes an absolute timestamp.
- delay: This is just to help catch your eye a bit better. And needs to be fixed to be only relative to the same CPU. The marks are determined by the difference between this current trace and the next trace.
- '!' - greater than preempt_mark_thresh (default 100)
- '+' - greater than 1 microsecond
- ' ' - less than or equal to 1 microsecond.
The rest is the same as the 'trace' file.
For more precise results, you can disable function tracing, which can add a large overhead, by turning off the kernel variable ftrace_enabled:
# sysctl kernel.ftrace_enabled=0
WARNING: Before launching each latency tracer, don't forget to reset tracing_max_latency:
echo 0 > tracing_max_latency
NOTE: The following tracers are made for measuring latency, so the latency-format option will be enabled automatically.
[edit] Irqsoff Latency Tracer
This tracer is activated with CONFIG_IRQSOFF_TRACER.
The irqsoff plugin measures the times during which interrupts are disabled in the kernel. When a new maximum latency is hit, the tracer saves the trace leading up to that latency point, so that every time a new maximum is reached, the old saved trace is discarded and the new trace is saved. An example trace is shown above.
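A typical irqsoff measurement session can be sketched as follows (the 2-second window and the mount point are arbitrary choices):

```shell
# Sketch: record the worst interrupts-off path seen during a window.
T=${TRACING_DIR:-/sys/kernel/debug/tracing}
if [ -w "$T/tracing_max_latency" ]; then
    echo 0 > "$T/tracing_max_latency"   # reset the previous maximum
    echo irqsoff > "$T/current_tracer"
    echo 1 > "$T/tracing_on"
    sleep 2                             # measurement window
    echo 0 > "$T/tracing_on"
    cat "$T/tracing_max_latency"        # worst latency, in microseconds
    head -30 "$T/trace"                 # the trace leading up to it
else
    echo "irqsoff tracer unavailable"
fi
```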
[edit] Wakeup Tracer
This tracer is activated by enabling CONFIG_SCHED_TRACER.
This tracer tracks the latency of the highest priority task to be scheduled in, starting from the point it has woken up.
# echo wakeup > current_tracer
# chrt -f 5 sleep 2
# cat trace
# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 2.6.33-dirty
# --------------------------------------------------------------------
# latency: 61 us, #4/4, CPU#0 | (M:server VP:0, KP:0, SP:0 HP:0)
#    -----------------
#    | task: sleep-2237 (uid:0 nice:0 policy:1 rt_prio:5)
#    -----------------
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /_--=> lock-depth
#                |||||/     delay
#  cmd     pid   |||||| time  |   caller
#     \   /      ||||||   \   |   /
dbus-dae-623     0d.h..   31us : 623:120:R   + [000]  2237: 94:S sleep
dbus-dae-623     0d.h..   31us+: 0
dbus-dae-623     0.....   61us : schedule
dbus-dae-623     0.....   61us : 623:120:R ==> [000]  2237: 94:R sleep
Running this on an idle system, we see that it took 61 microseconds to perform the task switch.
[edit] Tracer options
Some options can be added to a tracer to change its output. The list of all options is contained in the trace_options file:
# cat trace_options
print-parent
nosym-offset
nosym-addr
noverbose
noraw
nohex
nobin
noblock
nostacktrace
trace_printk
noftrace_preempt
nobranch
annotate
nouserstacktrace
nosym-userobj
noprintk-msg-only
context-info
nolatency-format
sleep-time
graph-time
nofunc_stack_trace
To enable an option, just echo its name into trace_options:
# echo sym-addr > trace_options
# cat trace | head -10
# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
            Xorg-15046 [000] 10769.470612: kmem_cache_free <-user_path_at
            Xorg-15046 [000] 10769.470612: sys_stat64 <-ret_fast_syscall
            Xorg-15046 [000] 10769.470612: vfs_stat <-sys_stat64
            Xorg-15046 [000] 10769.470612: vfs_fstatat <-sys_stat64
            Xorg-15046 [000] 10769.470612: user_path_at <-vfs_fstatat
            Xorg-15046 [000] 10769.470612: getname <-user_path_at
In this example, the sym-addr option adds the address of the called symbol. To remove an option, echo its name preceded by "no" into the same trace_options file:
# echo nosym-addr > trace_options
WARNING: the func_stack_trace option is very dangerous and must be used with care. Its usage is detailed in the #Dynamic Ftrace part.
[edit] Dynamic Ftrace
It is highly recommended to enable CONFIG_DYNAMIC_FTRACE because of this performance enhancement (see How the Function Tracer work ? and Benchmark). In addition, CONFIG_DYNAMIC_FTRACE gives the ability to filter which function should be traced.
Note: Dynamic Ftrace does not work with the function graph tracer patches (see Function Graph Tracer).
[edit] Enabling Dynamic Ftrace for ARM
Currently, Dynamic Ftrace for ARM is not officially supported, but a patch makes it possible to activate this tracer.
- Download this patch : (kernel patches)
- Move/copy the patch into the kernel folder
- Run the following command:
$ patch -p1 < dynamic_ftrace.patch
patching file Makefile
patching file arch/arm/Kconfig
patching file arch/arm/Kconfig.debug
patching file arch/arm/include/asm/ftrace.h
patching file arch/arm/kernel/armksyms.c
patching file arch/arm/kernel/entry-common.S
patching file arch/arm/kernel/ftrace.c
patching file kernel/trace/Kconfig
patching file scripts/Makefile.build
patching file scripts/recordmcount.pl
[edit] Using Dynamic Ftrace
Running the function tracer can be overwhelming. The amount of data may be vast, and very hard to get a hold of by the human brain. Ftrace provides a way to limit what functions you see. Two files exist that let you limit what functions are traced:
- set_ftrace_filter
- set_ftrace_notrace
When any function is listed in the set_ftrace_filter, only those functions will be traced. This will help the performance of the system when the trace is active. Tracing every function incurs a large overhead, but when using the set_ftrace_filter, only those functions listed in that file will have the NOPs changed to call the tracer. Depending on which functions are being traced, just having a couple of hundred functions enabled is hardly noticeable.
NOTE: The functions listed in set_ftrace_notrace take precedence. That is, if a function is listed in both set_ftrace_notrace and set_ftrace_filter, that function will not be traced.
[edit] Selecting filter function
A list of functions that can be added to the filter files is found in the available_filter_functions file, which contains almost all functions in the kernel.
# cat available_filter_functions | head
do_one_initcall
run_init_process
init_post
name_to_dev_t
elf_check_arch
arm_elf_read_implies_exec
elf_set_personality
set_irq_flags
show_interrupts
machine_halt
....
To add a filter, just echo its name into set_ftrace_filter:
echo do_one_initcall > set_ftrace_filter
Adding all the wanted functions one by one can be tedious. Fortunately, these files also accept wildcards; the following glob expressions are valid:
- value* - Select all functions that begin with value.
- *value* - Select all functions that contain the text value.
- *value - Select all functions that end with value.
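For example, to trace only scheduler-related functions (the `sched_*` glob is purely illustrative):

```shell
# Sketch: set a wildcard filter, verify what matched, then clear it.
T=${TRACING_DIR:-/sys/kernel/debug/tracing}
if [ -w "$T/set_ftrace_filter" ]; then
    echo 'sched_*' > "$T/set_ftrace_filter"
    head "$T/set_ftrace_filter"       # lists every function the glob matched
    echo function > "$T/current_tracer"
    # ... run the workload of interest here ...
    echo > "$T/set_ftrace_filter"     # an empty echo clears the filter
else
    echo "set_ftrace_filter unavailable"
fi
```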
To remove one or more filters, use the "!" symbol:

echo '!do_one_initcall' > set_ftrace_filter
All these commands also work with set_ftrace_notrace.
[edit] Stop the trace on a specific function
Enabling and disabling recording to the trace file can be done via the tracing_on file in debugfs. It can also be controlled precisely from kernel code through the tracing_on() and tracing_off() functions, which unfortunately requires recompiling the kernel. In the special case where you just want to switch tracing on/off in a specific kernel module, you don't need to rebuild the whole kernel: just add tracing_on() and tracing_off() calls to your module and recompile it. A last method allows stopping the trace at a particular function; this method uses the set_ftrace_filter file.
The format of the command to have the function trace enable or disable the ring buffer is as follows:
function:command[:count]
This will execute the command at the start of the function. The command is either traceon or traceoff, and an optional count can be added to have the command only execute a given number of times. If the count is left off (including the leading colon) then the command will be executed every time the function is called.
# echo '__do_softirq:traceoff' > set_ftrace_filter
# cat set_ftrace_filter
#### all functions enabled ####
__do_softirq:traceoff:unlimited
Notice that functions with commands do not affect the general filters. Even though a command has been added to __do_softirq, the filter still allowed all functions to be traced. Commands and filter functions are separate and do not affect each other.
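The optional count is useful for catching only the first few occurrences. A hedged example (the count of 5 is arbitrary):

```shell
# Sketch: stop the ring buffer the 5th time __do_softirq runs.
T=${TRACING_DIR:-/sys/kernel/debug/tracing}
if [ -w "$T/set_ftrace_filter" ]; then
    echo '__do_softirq:traceoff:5' > "$T/set_ftrace_filter"
    cat "$T/set_ftrace_filter"       # shows the command and its remaining count
    echo '!__do_softirq:traceoff' > "$T/set_ftrace_filter"   # remove it again
else
    echo "filter commands unavailable"
fi
```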
[edit] What calls a specific function?
Sometimes it is useful to know what is calling a particular function. The immediate predecessor is helpful, but an entire backtrace is even better. The function tracer contains an option that will create a backtrace in the ring buffer for every function that is called by the tracer.
To use the function tracer backtrace feature, it is imperative that the functions being called are limited by the function filters. The option to enable the function backtracing is unique to the function tracer and activating it can only be done when the function tracer is enabled. This means you must first enable the function tracer before you have access to the option:
# echo wakeup_softirqd > set_ftrace_filter
# echo function > current_tracer
# echo 1 > options/func_stack_trace
# cat trace | tail
 => ftrace_call
        events/0-5     [000] 21472.572144: wakeup_softirqd <-__tasklet_schedule
        events/0-5     [000] 21472.572174: <stack trace>
 => ftrace_call
        events/0-5     [000] 21472.987244: wakeup_softirqd <-__tasklet_schedule
        events/0-5     [000] 21472.987244: <stack trace>
 => ftrace_call
        events/0-5     [000] 21473.025970: wakeup_softirqd <-__tasklet_schedule
        events/0-5     [000] 21473.026001: <stack trace>
 => ftrace_call
# echo 0 > options/func_stack_trace
# echo > set_ftrace_filter
Warning: cat set_ftrace_filter before enabling the func_stack_trace option to ensure that the filter is set. Disable options/func_stack_trace before clearing the filter.
Warning: also note that the option is non-volatile; even if you switch to another tracer plugin in current_tracer, the option will still be enabled if you re-enable the function tracer.
[edit] Trace Event
Using the previous plugin tracers is not always practical. Some add a large overhead, and others do not give enough information about what happened. Luckily, there is the event tracer. Event tracing is not a plugin: when events are enabled, they are recorded by any plugin, including the special nop plugin.
[edit] Using Event Tracing
[edit] Via the 'set_event' interface
The events which are available for tracing can be found in the file "available_events".
# cat available_events
skb:kfree_skb
skb:skb_copy_datagram_iovec
block:block_remap
block:block_split
block:block_plug
block:block_bio_complete
block:block_bio_bounce
block:block_rq_remap
block:block_unplug_io
block:block_unplug_timer
To enable a particular event, such as 'sched_wakeup', simply echo it to set_event. For example:
# echo sched_wakeup >> /sys/kernel/debug/tracing/set_event
Note: '>>' is necessary here; with '>' the write first disables all events.
To disable an event, echo the event name to the set_event file prefixed with an exclamation point:
# echo '!sched_wakeup' >> set_event
To disable all events, echo an empty line to the set_event file:
# echo > /sys/kernel/debug/tracing/set_event
To enable all events, echo '*:*' or '*:' to the set_event file:
# echo *:* > /sys/kernel/debug/tracing/set_event
The events are organized into subsystems, such as ext4, irq, sched, etc., and a full event name looks like this:
# echo 'irq:*' > /sys/kernel/debug/tracing/set_event
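The set_event rules above (append with '>>', a '!' prefix to disable, an empty line to clear) can be captured in a few small shell helpers. This is a hypothetical sketch: the function names and the TRACING path are assumptions, not part of the Ftrace interface.

```shell
# Sketch: tiny helpers around the set_event interface. Remember that '>'
# clears the list first, so enabling must always append with '>>'.
# TRACING is an assumption -- adjust to where debugfs is mounted.
TRACING=${TRACING:-/sys/kernel/debug/tracing}

event_on()     { echo "$1"  >> "$TRACING/set_event"; }  # append: keeps others
event_off()    { echo "!$1" >> "$TRACING/set_event"; }  # '!' prefix disables
events_clear() { echo > "$TRACING/set_event"; }         # empty line: all off
```

For example, `event_on 'sched:sched_wakeup'` followed by `event_on 'irq:*'` enables both without one write clobbering the other.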
[edit] Via the 'enable' toggle
The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy of directories.
To enable event 'sched_wakeup':
# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
To disable it:
# echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
To enable all events in sched subsystem:
# echo 1 > /sys/kernel/debug/tracing/events/sched/enable
To enable all events:
# echo 1 > /sys/kernel/debug/tracing/events/enable
When reading one of these enable files, there are four results:
- 0 - all events this file affects are disabled
- 1 - all events this file affects are enabled
- X - there is a mixture of events enabled and disabled
- ? - this file does not affect any event
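The four possible values can be decoded with a small sketch like the following; enable_state is a hypothetical helper name, and the mapping simply mirrors the list above.

```shell
# Sketch: decode the four possible values of an 'enable' file into a
# human-readable state. Pure string mapping, so it works on any file path.
enable_state() {              # usage: enable_state <path-to-enable-file>
    case "$(cat "$1")" in
        0) echo "all disabled" ;;
        1) echo "all enabled"  ;;
        X) echo "mixed"        ;;
        *) echo "no effect"    ;;
    esac
}
```

For example, `enable_state events/sched/enable` would print "mixed" if only some sched events are on.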
[edit] Boot option
In order to facilitate early boot debugging, use boot option:
trace_event=[event-list]
The format of this boot option is the same as the 'set_event' interface
[edit] Example
# echo nop > current_tracer
# cat trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
# echo sched:* > set_event
# cat trace | head
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
            bash-3871  [000] 20480.632751: sched_wakeup: comm=bash pid=3871 prio=120 success=0 target_cpu=000
            bash-3871  [000] 20480.632813: sched_stat_runtime: comm=bash pid=3871 runtime=3295899 [ns] vruntime=14957505137978 [ns]
            bash-3871  [000] 20480.632843: sched_switch: prev_comm=bash prev_pid=3871 prev_prio=120 prev_state=S ==> next_comm=gdm-binary next_pid=25515 next_prio=120
      gdm-binary-25515 [000] 20480.636444: sched_stat_runtime: comm=gdm-binary pid=25515 runtime=3631591 [ns] vruntime=14957507973670 [ns]
      gdm-binary-25515 [000] 20480.636505: sched_stat_runtime: comm=gdm-binary pid=25515 runtime=61035 [ns] vruntime=14957508034705 [ns]
      gdm-binary-25515 [000] 20480.636536: sched_wakeup: comm=events/0 pid=5 prio=120 success=1 target_cpu=000
[edit] More Information
For further details about event tracing, read the file Documentation/trace/events.txt in the kernel source tree.
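As a quick illustration of working with event traces from the shell, the sketch below counts how many times each event appears in a trace dump like the one in the example above. It is a hypothetical helper that assumes the default line format, where the event name is the fourth whitespace-separated field and ends in ':'.

```shell
# Sketch: summarize how many times each event fired in a trace dump.
# Assumes the default line format shown above (field 4 is the event name
# followed by ':'); comment lines starting with '#' are skipped.
count_events() {
    awk '!/^#/ && NF >= 4 { sub(":", "", $4); n[$4]++ }
         END { for (e in n) print n[e], e }' "$@" | sort -rn
}
```

Usage: `count_events trace` or `cat trace | count_events`; the most frequent events are printed first.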
[edit] Using FTrace with Trace-cmd
Using echo and cat to set up Ftrace can be tedious. Trace-cmd lets you set up and use Ftrace with very few commands. It also provides a small GUI to analyze trace data.
[edit] Install Trace-cmd
To get the code of Trace-cmd, clone the following Git repository:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
You need to have "asciidoc" and "swig" packages installed before building Trace-cmd.
Note: the following instructions have to be done in this order if you want to install trace-cmd on your Zoom2 and the GUI on your computer.
[edit] Trace-cmd on the OMAP Zoom2
- Download and apply the patch Media:no_cpu_correction.tar.gz, which corrects a bug in how Trace-cmd detects the number of CPUs on the Zoom2 board.
- Once this patch is applied, you can compile Trace-cmd for the ARM architecture and copy it into your filesystem.
$ make clean
$ make LDFLAGS=-static CC=$HOME/bin/arm-2009q3/bin/arm-none-linux-gnueabi-gcc trace-cmd
$ cp trace-cmd /tmp/mmc2
[edit] Install the Trace-cmd GUI
$ make clean
$ make install_gui
The last command installs the GUI in the $HOME/bin directory.
[edit] Tracing with Trace-cmd
Trace-cmd is very easy to use with simple commands. For example, the following one enables all events and traces them during an ls command.
# trace-cmd record -e all ls
If we had to use the Ftrace debugfs interface, it would be a little more difficult:
# echo 1 > events/enable
# echo 1 > tracing_on
# ls
# echo 0 > tracing_on
In this case, trace-cmd is also more precise because it traces only while the ls command executes.
At the end of the trace, trace-cmd creates a pre-formatted file (trace.dat by default; this can be changed with the -o option) which contains all the information of the trace.
To see all the possible parameters of trace-cmd, run it without any arguments:
# trace-cmd
trace-cmd version 0.7

usage:
 trace-cmd record [-v][-e event [-f filter]][-p plugin][-F][-d][-o file] \
           [-s usecs][-O option ][-l func][-g func][-n func] \
           [-P pid][-N host:port][-t][-b size][command ...]
          -e run command with event enabled
          -f filter for previous -e event
          -p run command with plugin enabled
          -F filter only on the given process
          -P trace the given pid like -F for the command
          -l filter function name
          -g set graph function
          -n do not trace function
          -v will negate all -e after it (disable those events)
          -d disable function tracer when running
          -o data output file [default trace.dat]
          -O option to enable (or disable)
          -s sleep interval between recording (in usecs) [default: 1000]
          -N host:port to connect to (see listen)
          -t used with -N, forces use of tcp in live trace
          -b change kernel buffersize (in kilobytes per CPU)
...........
[edit] Analyzing trace data with GUI
Note: this works only with traces generated by the trace-cmd tool.
Trace-cmd provides a small GUI called kernelshark. This GUI helps to understand what happened during a trace. To launch kernelshark, just type its name in a terminal. If a trace.dat file is present in the current directory, kernelshark will load it automatically. Otherwise, you can load the file of your choice with "Load info" in the File menu. The GUI lets you apply different filters to the data, watch the activity of each task, and watch the activity per CPU.
[edit] How to use kernelshark
- Type kernelshark in terminal to launch it
- Load trace data with the File/Load Info menu
- To zoom, just click and drag from left to right on the graph
- To zoom out, just click and drag from right to left
- To plot a precise task, select it in the Plots/Tasks menu
- You can apply filters to the event list or to the graph with the Filter menu (the filters are only applied to the CPU graphs)
- NOTE: Presentation of trace-cmd at the Linux Foundation summit 2010 here
[edit] Precision of Ftrace
Currently on OMAP, Ftrace uses a 32K timer with a resolution of only ~31 microseconds. This does not allow trace event times to be distinguished accurately. Fortunately, another clock source with higher resolution is available: the MPU_TIMER. This timer provides more intra-tick resolution than the 32KHz timer, but consumes more power. Tim Bird is working on a patch to use this timer and will submit it soon.
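As a sanity check on the figure above, the resolution of a 32768 Hz timer can be computed directly:

```shell
# One tick of a 32768 Hz (32K) timer lasts 1/32768 s,
# i.e. about 30.5 microseconds -- the ~31 us resolution quoted above.
awk 'BEGIN { printf "%.1f\n", 1000000 / 32768 }'
# prints 30.5
```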
[edit] Benchmark
Like every measuring tool, Ftrace adds overhead to every function call made by the kernel. To get a good idea of this overhead, three benchmarks were run:
- Hackbench, this test is a benchmark for measuring the performance, overhead, and scalability of the Linux scheduler. Created by Rusty Russell, it uses client and server processes grouped to send and receive data in order to simulate the connections established for a chat room. Each client sends a message to each server in the group.
- Interbench, which is designed to measure the effect of changes in Linux kernel design or system configuration, such as CPU, I/O scheduler and filesystem changes and options.
- The time taken to uncompress a large tar.gz file.
[edit] Hackbench
According to this graph, the function tracer creates a large overhead, but the graph also shows the importance of Dynamic Ftrace: this option improves performance by almost 45% when tracing with the function tracer. Moreover, when no trace is running, Dynamic Ftrace adds no overhead, whereas plain Ftrace has an overhead of 47%. Finally, when a function filter is used, the overhead is minimal and does not impact the system.
[edit] Interbench
This benchmark measures the performance of the system under different loads. The results show that for a light task (playing 5 ms audio intervals, which uses only 5% CPU), the impact is only on the latency of the function. For heavier tasks, such as gaming, which uses all the CPU it can get, the impact is more important, particularly when a heavy job runs in the background (here, reading a file the size of physical RAM): about 60% of the expected work is not executed compared to a run without tracing, which translates into bugs and freezes for the user, and latencies are even higher. The Dynamic Function Tracer obtains the same results, and its overhead is still significant for heavy tasks.
[edit] Uncompressing a large .tar.gz file
This test consists of decompressing a large tar.gz file (an Ubuntu minimal filesystem, ~90 MB) and measuring the elapsed time. According to the graph, the function tracer adds a large overhead (with or without the Dynamic Ftrace option). The other tracers take almost the same time, and the deviations are too small to be interpreted.
In conclusion, these benchmarks show that the overhead of Ftrace is significant, particularly for the function tracer. Ftrace also adds overhead even when no trace is running, but this can be solved by enabling the Dynamic Ftrace option. This option also reduces the function tracer's overhead, although it remains significant without a function filter.
[edit] For more information
[edit] patches for gcc 4.4.0
Note: recent ARM toolchains instrument the code with calls to '__gnu_mcount_nc' instead of 'mcount'. If you get a compiler warning about the missing symbol '__gnu_mcount_nc', apply the patch mentioned in this e-mail (http://marc.info/?l=linux-arm-kernel&m=124946219616111&w=2)
[edit] How to use Ftrace on p-android-omap-3.0 Kernel tree base with Android File system [ICS]
[edit] Kernel configuration & Re-build
The kernel should be reconfigured and rebuilt to enable the support of FTrace
- Clone the chosen kernel and point to the appropriate commit id. Choose one DB and point to its kernel commit id.
- Run the following command from the kernel folder
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm distclean
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm blaze_defconfig
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm menuconfig
- Set the following in menuconfig
General setup -> Kprobes
Kernel Hacking -> Tracers -> Kernel FUNCTION_TRACER
Kernel Hacking -> Tracers -> Interrupts-off Latency Tracer
Kernel Hacking -> Tracers -> Preemption-off Latency Tracer
Kernel Hacking -> Tracers -> Trace max stack
Kernel Hacking -> Tracers -> Enable kprobes-based dynamic events
Kernel Hacking -> Tracers -> enable/disable ftrace tracepoints dynamically
Note Kernel Function Graph Tracer needs additional patches to support ARM (http://elinux.org/Ftrace_Function_Graph_ARM). So please ensure that you have disabled it.
Note: follow the Module incompatibility rules when you modify or build the kernel locally.
- Compile the kernel
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm uImage
(For further information about the compilation of the kernel see Linux OMAP Kernel Project)
[edit] Dependent Modules Rebuild
With the Android file system, the system also loads (via insmod) the WLAN and GPS modules. These modules must be built against the kernel in which Ftrace has been enabled; otherwise the system will reboot continuously. Build the WLAN/GFX modules against your kernel per these instructions: Building Modules
- WLAN
git clone git://git.omapzoom.org/platform/hardware/ti/wlan.git
cd wlan
git checkout --track -b p-ics-mr1-release origin/p-ics-mr1-release
cd mac80211/compat_wl12xx
export KLIB=/path/to/kernel
export KLIB_BUILD=/path/to/kernel
export ARCH=arm
make
This produces the WLAN modules. Replace the WLAN modules of the Android file system DB with these modules:
- compat/compat.ko --> system/lib/modules/
- net/wireless/cfg80211.ko --> system/lib/modules/
- net/mac80211/mac80211.ko --> system/lib/modules/
- drivers/net/wireless/ti/wl12xx/wl12xx.ko --> system/lib/modules/
- drivers/net/wireless/ti/wl12xx/wl12xx_sdio.ko --> system/lib/modules/
- GFX
git clone git://git.omapzoom.org/device/ti/proprietary-open.git
cd proprietary-open
git checkout --track -b ics-mr1-release origin/ics-mr1-release
Untar the sgx.tgz tarball located in proprietary-open/omap4. You should now have the directory proprietary-open/omap4/sgx. Follow instructions at device/ti/proprietary-open/omap4/sgx/README. Specifically:
cd omap4/sgx
cd src/eurasia_km/eurasiacon/build/linux2/omap4430_android
export KERNELDIR=path_to_kernel
If you are building for OMAP4430/4460, use the make command:
make ARCH=arm CROSS_COMPILE=arm-none-linux-gnueabi- TARGET_PRODUCT="blaze_tablet" BUILD=release TARGET_SGX=540 PLATFORM_VERSION=4.0
If you are building for OMAP4470, use the make command:
make ARCH=arm CROSS_COMPILE=arm-none-linux-gnueabi- TARGET_PRODUCT="blaze_tablet" BUILD=release TARGET_SGX=544sc PLATFORM_VERSION=4.0
This produces the GFX modules. Replace the GFX modules of the Android file system DB with these modules:
OMAP 4430/60
- target/kbuild/omaplfb_sgx540_120.ko --> system/lib/modules/
- target/kbuild/pvrsrvkm_sgx540_120.ko --> system/lib/modules/
OMAP 4470
- target/kbuild/omaplfb_sgx544_120.ko --> system/lib/modules/
- target/kbuild/pvrsrvkm_sgx544_112.ko --> system/lib/modules/
- GPS
This is a proprietary module. The source code is available in ClearCase; contact the DB team for its location and build instructions. If you cannot build the GPS module against your kernel, rename the GPS module in system/lib/modules so that it does not get loaded.
[edit] Using Ftrace
Ftrace has its control files in the debugfs filesystem, usually mounted at /sys/kernel/debug. If it is not already mounted, you can mount it yourself with:
# mount -t debugfs nodev /sys/kernel/debug
Many developers prefer to have this directory at the root of the system:
# mkdir /mnt/debug
# mount -t debugfs nodev /mnt/debug
Mounting debugfs exposes a tracing subdirectory which is used to control Ftrace and to get output from the tool.
Note: all the commands in this article are run from the tracing/ directory of the debugfs mount (e.g. /sys/kernel/debug/tracing/ or /d/tracing/).
[edit] Choosing a tracer
To find out which tracers are available, simply cat the available_tracers file in the tracing directory:
# cat available_tracers
function sched_switch nop
To enable, for example, the function tracer, just echo "function" into the current_tracer file.
# echo function > current_tracer
# cat current_tracer
function
Note: Ftrace can handle only one tracer at a time.
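Because Ftrace rejects names that are not listed in available_tracers, a defensive way to switch tracers is to check the list first. The sketch below is hypothetical (set_tracer is not a real Ftrace command), and the TRACING path is an assumption to adjust to your debugfs mount.

```shell
# Sketch: set the current tracer only if it is listed in available_tracers,
# since Ftrace handles a single tracer at a time and rejects unknown names.
# TRACING is an assumption -- adjust to where debugfs is mounted.
TRACING=${TRACING:-/sys/kernel/debug/tracing}

set_tracer() {                # usage: set_tracer <name>
    if grep -qw "$1" "$TRACING/available_tracers"; then
        echo "$1" > "$TRACING/current_tracer"
    else
        echo "tracer '$1' not available" >&2
        return 1
    fi
}
```

For example, `set_tracer function` succeeds on the listing above, while `set_tracer foo` fails without touching current_tracer.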
[edit] Using a tracer
At this point Ftrace is ready to trace functions. To start tracing, echo 1 into the tracing_enabled file:
# echo 1 > tracing_enabled
To stop the trace, just echo 0 in the same file:
# echo 0 > tracing_enabled
The trace is contained in the trace file, here is an example of the output from a function trace. The header helps to decode the various fields in the trace.
# cat trace | head -10
# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
    avahi-daemon-630   [000]  9507.400848: vfs_read <-sys_read
    avahi-daemon-630   [000]  9507.400848: rw_verify_area <-vfs_read
    avahi-daemon-630   [000]  9507.400848: do_sync_read <-vfs_read
    avahi-daemon-630   [000]  9507.400848: sock_aio_read <-do_sync_read
    avahi-daemon-630   [000]  9507.400848: alloc_sock_iocb <-sock_aio_read
[edit] Function tracer
This tracer is activated by enabling CONFIG_FUNCTION_TRACER in the kernel configuration. To work, it needs the kernel variable ftrace_enabled to be turned on, otherwise this tracer is a nop.
# echo function > current_tracer
# cat current_tracer
function
For example, to trace the prcm_interrupt_handler function:
# echo 'prcm_interrupt*' > set_ftrace_filter
# cat set_ftrace_filter
prcm_interrupt_handler
# echo 1 > tracing_enabled
  (run the use case)
# echo 0 > tracing_enabled
[tracing]# cat trace | head -10
# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
          <idle>-0     [000]   858.965424: prcm_interrupt_handler <-handle_irq_event_percpu
          <idle>-0     [000]   858.965759: prcm_interrupt_handler <-handle_irq_event_percpu
          <idle>-0     [000]   861.355255: prcm_interrupt_handler <-handle_irq_event_percpu
          <idle>-0     [000]   861.355591: prcm_interrupt_handler <-handle_irq_event_percpu
  UEventObserver-210   [000]   861.724243: prcm_interrupt_handler <-handle_irq_event_percpu
          <idle>-0     [000]   861.725037: prcm_interrupt_handler <-handle_irq_event_percpu
           <...>-931   [000]   861.730316: prcm_interrupt_handler <-handle_irq_event_percpu
           <...>-931   [000]   861.730682: prcm_interrupt_handler <-handle_irq_event_percpu
Note: the trace file might already contain some boot-up function traces. You can clear it with the command below before starting your own trace.
# echo 0 > trace
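When a filter like 'prcm_interrupt*' is active, it is often interesting to know which code paths triggered the function. The hypothetical helper below tallies the caller column ('<-...') of a function-tracer dump such as the one above; it assumes the default one-line output format.

```shell
# Sketch: tally which callers invoked the filtered function, by stripping
# the '<-' prefix from the caller column of function tracer output.
# Assumes the default one-line format; comment lines are skipped.
count_callers() {
    awk '!/^#/ && NF >= 5 { sub(/^<-/, "", $5); n[$5]++ }
         END { for (c in n) print n[c], c }' "$@" | sort -rn
}
```

Usage: `count_callers trace` or `cat trace | count_callers`; the most frequent callers are printed first.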
[edit] Using pyTimechart to visualize Ftrace
- Run the following command
$ sudo apt-get install pytimechart
- Recording the trace
Script to record the traces. Can be edited based on the events of interest. [4]
$ chmod a+x pytimechart-record
Usage of this script:
# ./pytimechart-record start
#... run your workload now
# ./pytimechart-record stop
trace written to ~/traceYY-MM-DD-hh-mm-ss.txt.lzma
- The recorded trace can be opened with pytimechart
pytimechart ~/traceYY-MM-DD-hh-mm-ss.txt.lzma
More information on using the pytimechart user interface can be found at [5]