Installing and Using Ftrace

From OMAPpedia

Jump to: navigation, search

The following Wiki explains how to use Ftrace and its different tracers on Linux releases. It describes which tracers work with kernel-omap3 mainline or linux-omap-2.6 mainline, and how they work.

Note: Every functionality was tested on v2.6.33 and v2.6.32 kernel.

Note: Some options are enable on Android (Function Tracer, Irq Tracer...) but the patches of this article won't work.

Note: Added section 13. Ftrace on p-android-omap-3.0 kernel tree base with Android File system [ICS].Tested nop, function,irqsoff tracers on v3.0.21 kernel


Contents

[edit] Kernel configuration & Re-build

The kernel should be reconfigured and rebuilt to enable the support of FTrace

$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm distclean
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm omap_zoom2_defconfig 
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm xconfig
Kernel Hacking -> Tracers -> FUNCTION_TRACER
Kernel Hacking -> Tracers -> FUNCTION_GRAPH_TRACER (if possible)
Kernel Hacking -> Tracers -> STACK_TRACER
Kernel Hacking -> Tracers -> DYNAMIC_FTRACE (see  enabling Dynamic Ftrace)

This kernel configuration focus on using Ftrace as a debugging tool. Some configurations for Ftrace are used for other purposes, like finding latency or analyzing the system, they will be explain later.

$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm uImage

(For further information about the compilation of the kernel see Linux OMAP Kernel Project)

[edit] Using Ftrace

Ftrace has its control files in the debugfs system.This is usually mounted in /sys/kernel/debug. If it is not already mounted, then you can mount it yourself with:

# mount -t debugfs nodev /sys/kernel/debug

Many Developpers prefer to have this directory at the root of the system

# mkdir /mnt/debug
# mount -t debugfs nodev /mnt/debugfs

That creates a /debug/tracing subdirectory which is used to control ftrace and for getting output from the tool.

Note: all the command of this article will be pass in the /debug/tracing/ directory


[edit] Choosing a tracer

To find out which tracers are available, simply cat the available_tracers file in the tracing directory:

# cat available_tracers 
function sched_switch nop

To enable, for example, the function tracer, just echo "function" into the current_tracer file.

# echo function > current_tracer
# cat current_tracer
function

Note: Ftrace can handle only one tracer at the same time


[edit] Using a tracer

At this point Ftrace is ready to trace the function, to start it just echo 1 in the tracing_on file

# echo 1 > tracing_on

To stop the trace, just echo 0 in the same file:

# echo 0 > tracing_on

The trace is contained in the trace file, here is an example of the output from a function trace. The header helps to decode the various fields in the trace.

# cat trace | head -10
# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
    avahi-daemon-630   [000]  9507.400848: vfs_read <-sys_read
    avahi-daemon-630   [000]  9507.400848: rw_verify_area <-vfs_read
    avahi-daemon-630   [000]  9507.400848: do_sync_read <-vfs_read
    avahi-daemon-630   [000]  9507.400848: sock_aio_read <-do_sync_read
    avahi-daemon-630   [000]  9507.400848: alloc_sock_iocb <-sock_aio_read
    avahi-daemon-630   [000]  9507.400848: unix_stream_recvmsg <-sock_aio_read


[edit] Tracing a specific process

Perhaps you only need to trace a specific process, or set of processes. The file set_ftrace_pid lets you specify specific processes that you want to trace. To just trace the current thread you can do the following:

  [tracing]# echo $$ > set_ftrace_pid

The above will set the function tracer to only trace the bash shell that executed the echo command. If you want to trace a specific process, you can create a shell script wrapper program.

  [tracing]# cat ~/bin/ftrace-me
  #!/bin/sh
  DEBUGFS=`grep debugfs /proc/mounts | awk '{ print $2; }'`
  echo $$ > $DEBUGFS/tracing/set_ftrace_pid
  echo function > $DEBUGFS/tracing/current_tracer
  exec $*
  [tracing]# ~/bin/ftrace-me ls -ltr

Note, you must clear the set_ftrace_pid file if you want to go back to generic function tracing after performing the above.

  [tracing]# echo -1 > set_ftrace_pid



[edit] The Tracers in detail

All the following tracers are plugins of Ftrace which bring different presentation of trace, and trace different things (Latency, scheduler events)


[edit] nop tracer

This is the tracer by default which trace no function. But it can still be usefull to trace events (see #Trace event)

[edit] Function tracer

This tracer is activated by enabling CONFIG_FUNCTION_TRACER in the kernel configuration. To work, it needs the kernel variable ftrace_enabled to be turned on, otherwise this tracer is a nop.

    # sysctl kernel.ftrace_enabled=1
    # echo function > current_tracer
    # cat current_tracer
    function
    [tracing]# cat trace | head -10
    # tracer: function
    #
    #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
    #              | |       |          |         |
                bash-16939 [000]  6075.461561: mutex_unlock <-tracing_set_tracer
              -0     [001]  6075.461561: _spin_unlock_irqrestore <-hrtimer_get_next_event
              -0     [001]  6075.461562: rcu_needs_cpu <-tick_nohz_stop_sched_tick
                bash-16939 [000]  6075.461563: inotify_inode_queue_event <-vfs_write
              -0     [001]  6075.461563: mwait_idle <-cpu_idle
                bash-16939 [000]  6075.461563: __fsnotify_parent <-vfs_write

The header explains the format of the output pretty well. The first two items are the traced task name and PID. The CPU that the trace was executed on is within the brackets. The timestamp is the time since boot in .format, followed by the function name with its parent following the "<-" symbol.

This information is quite powerful and shows the flow of functions nicely. But it can be a bit hard to follow.


[edit] How the Function Tracer work ?

The function tracer (enabled by CONFIG_FUNCTION_TRACER) is a way to trace almost all functions in the kernel. When function tracing is enabled, the kernel is compiled with the gcc option -pg. This is a profiler that will make all functions call a special function named mcount. Unfortunately, this could cause a very large overhead. To improve it, enable the dynamic Ftrace (#Dynamic Ftrace), then the mcount calls, when not in use, are converted at run time to nops. This allows the function tracer to have zero overhead when not in use. (Finding Origins of Latencies Using Ftrace by Steven Rostedt)

[edit] Sched_switch Tracer

This tracer, also activated by enabling CONFIG_FUNCTION_TRACER, traces the context switches and wakeups between tasks.

# cat trace
# tracer: sched_switch
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
            bash-2948  [000]   749.962524:   2948:120:S   + [000]  2948:120:S bash
            bash-2948  [000]   749.962616:   2948:120:S ==> [000]  8369:120:R <...>
           <...>-8369  [000]   749.964264:   8369:120:R   + [000]   625:120:S dbus-daemon
           <...>-8369  [000]   749.964294:   8369:120:R ==> [000]   625:120:R dbus-daemon
     dbus-daemon-625   [000]   749.965454:    625:120:S ==> [000]  8369:120:R <...>
           <...>-8369  [000]   749.965576:   8369:120:R   + [000]   625:120:S dbus-daemon


Wake ups are represented by a "+" and the context switches are shown as "==>". The format is:

      Previous task              Next Task

 ::  ==>  ::
      Current task               Task waking up

 ::    +  ::


The prio is the internal kernel priority, which is the inverse of the priority that is usually displayed by user-space tools. Zero represents the highest priority (99). Prio 100 starts the "nice" priorities with 100 being equal to nice -20 and 139 being nice 19. The prio "140" is reserved for the idle task which is the lowest priority thread (pid 0).Below is a quick chart to map the kernel priority to user land priorities.


  Kernel Space                     User Space
===============================================================
  0(high) to  98(low)     user RT priority 99(high) to 1(low)
                          with SCHED_RR or SCHED_FIFO
---------------------------------------------------------------
 99                       sched_priority is not used in scheduling
                          decisions(it must be specified as 0)
---------------------------------------------------------------
100(high) to 139(low)     user nice -20(high) to 19(low)
---------------------------------------------------------------
140                       idle task priority
---------------------------------------------------------------

The task states are:

R - running : wants to run, may not actually be running
S - sleep  : process is waiting to be woken up (handles signals)
D - disk sleep (uninterruptible sleep) : process must be woken up (ignores signals)
T - stopped : process suspended
t - traced  : process is being traced (with something like gdb)
Z - zombie  : process waiting to be cleaned up
X - unknown


[edit] Sched_switch viewer

To visualize the temporal relationship of Linux tasks, a graphical tools makes it possible. This tool convert the trace data to VCD (value change dump) data. After this, it is possible to view the context switches in a vcd viewer, such as GTKWave. This tools is available Media:sched_switch-0.1.1.tar.gz.

To use this program first make a ftrace file, and export it into a text file

cat /debug/tracing/trace > /tmp/trace.txt

Now convert the trace data and start the viewer:

sched_switch /tmp/trace.txt /tmp/trace.vcd
gtkwave /tmp/trace.vcd

To display a particular data channel within gtkwave, select sched_switch in the top left window and drag and drop the label of the requested channel from the bottom left window labeled "Signals" to the drop area at the left side of the main wave display area (also labeled "Signals"). Alternatively, you may select the requested channels in the bottom left window labeled "Signals" and then click on the "Insert" button below the window.

The VCD uses the following definitions :

[edit] Function Graph Tracer

This tracer depends of the CONFIG_FUNCTION_GRAPH kernel option.

This tracer is similar to the function tracer except that it probes a function on its entry and its exit, and also the duration of each function.

[edit] Enabling Function Graph Tracer for OMAP3

Actually, Function Graph tracer for the ARM architecture is not supported in the mainline. Tim Bird works on patches to enable this option (source), but some improvements has to be done for OMAP3. The following patches will enable this tracer on OMAP3 Media:Function_graph_tracer_support.tar.gz.

[edit] Enabling Function Graph Tracer for OMAP4

OMAP4 is missing a notrace annotation on omap_readl(). Apply the following patch to enable the trace. [1]

[edit] The Function graph presentation

# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)   0.000 us    |                      } /* clear_buddies */
 0)   0.000 us    |                      update_min_vruntime();
 0) + 30.517 us   |                    } /* dequeue_entity */
 0)               |                    dequeue_entity() {
 0)               |                      update_curr() {
 0)   0.000 us    |                        update_min_vruntime();
 0)   0.000 us    |                      }
 0)   0.000 us    |                      clear_buddies();
 0)   0.000 us    |                      update_min_vruntime();
 0) + 30.518 us   |                    }


The function graph tracer traces both the entry and exit of a function, which gives the tracer the ability to know the depth of functions that are called. The function graph tracer can make following the flow of execution within the kernel much easier to follow with the human eye.


This gives the start and end of a function denoted with the C like annotation of "{" to start a function and "}" at the end. Leaf functions, which do not call other functions, simply end with a ";". The DURATION column shows the time spent in the corresponding function. The function graph tracer records the time the function was entered and exited and reports the difference as the duration.


The "+" that are there are annotation marker. When the duration is greater than 10 microseconds, a "+" is shown. If the duration is greater than 100 microseconds a "!" will be displayed.

[edit] Function Profiling

With the function profiler, it is possible to take a good look at the actual function execution and not just samples. If CONFIG_FUNCTION_GRAPH_TRACER is configured in the kernel, the function profiler will use the function graph infrastructure to record how long a function has been executing. If just CONFIG_FUNCTION_TRACER is configured, the function profiler will just count the functions being called.


# echo nop > current_tracer
# echo 1 > function_profile_enabled
# cat trace_stat/function0 |head
 Function                               Hit    Time            Avg
 --------                               ---    ----            ---
 schedule                               923    325906219 us     353094.4 us 
 vfs_read                               238    109621612 us     460595.0 us 
 sys_read                               208    108491821 us     521595.2 us 
 do_sync_read                           123    49456207 us     402082.9 us 
 pipe_read                               63    48292785 us     766552.1 us 
 pipe_wait                               63    48276550 us     766294.4 us 
 sys_futex                                5    48068145 us     9613629 us 
 do_futex                                 5    48068115 us     9613623 us 

The above also includes the times that a function has been preempted or schedule() was called and the task was swapped out. This may seem useless, but it does give an idea of what functions get preempted often. Ftrace also includes options that allow you to have the function graph tracer ignore the time the task was scheduled out.

# echo 0 > options/sleep-time
# echo 0 > function_profile_enabled
# echo 1 > function_profile_enabled
# cat trace_stat/function0  | head
Function                               Hit    Time            Avg
 --------                               ---    ----            ---
 cpuidle_idle_call                      337    20587921 us     61091.75 us 
 omap3_enter_idle                       337    20555786 us     60996.39 us 
 omap3_enter_idle_bm                    329    20523254 us     62380.71 us 
 omap_sram_idle                         324    19609313 us     60522.57 us 
 asm_do_IRQ                            2487    1954010 us     785.689 us  
 do_page_fault                         5639    1825439 us     323.716 us  
 handle_mm_fault                       5656    1386108 us     245.068 us  
 do_DataAbort                          3548    1214050 us     342.178 us

Another option that affects profiling is graph-time (again with a "-"). By default it is enabled. When enabled, the times for a function include the times of all the functions that were called within the function. As you can see from the output in the above examples, several system calls are listed with the highest average. When disabled, the times only include the execution of the function itself, and do not contain the times of functions that are called from the function:

# echo 0 > options/graph-time
# echo 0 > function_profile_enabled
# echo 1 > function_profile_enabled
# cat trace_stat/function0  | head
Function                               Hit    Time            Avg
--------                               ---    ----            ---
omap_sram_idle                          41    1189849 us     29020.72 us 
sub_preempt_count                    15131    56701.65 us     3.747 us    
add_preempt_count                    14506    52795.40 us     3.639 us    
_pwrdm_state_switch                   1513    31951.91 us     21.118 us   
omap_i2c_set_wfc_mpu_wkup_lat          150    31738.27 us     211.588 us  
prm_read_mod_bits_shift               2579    25024.42 us     9.703 us    
__do_fault                             472    23559.57 us     49.914 us   
handle_level_irq                       352    18859.87 us     53.579 us  


[edit] FTD (Function Trace Dump)

To analyse system data, a post-processing tool called ftd was written by Tim Bird (source).

To install:

* Download
* rename to 'ftd': mv Ftd.txt ftd
* make it executable: chmod a+x
* Put it on your path somewhere: sudo mv ftd /usr/local/bin

FTD is a script with capability to show call counts and cumulative time for functions in a trace log.

Note: Ftd is written in Python, so it is recommended to move your trace log to a development host and run ftd there, rather than on the target.

To retrieve the trace log data, use:

#cat trace > /tmp/trace-data.txt

To see a list of functions, sorted by total time spent in them, use:

$ ftd /tmp/trace-data.txt
Function                            Count Time       Average  Local     
----------------------------------- ----- ---------- -------- ----------
do_page_fault                         159   76416012   480603    4364019
handle_mm_fault                       159   56945800   358149    4516606
do_DataAbort                           94   50079342   532758    1373293
asm_do_IRQ                             36   43182375  1199510    1098636
__do_fault                            140   40649404   290352    7293704
flush_old_exec                          1   33416739 33416739     152581
mmput                                   1   32623290 32623290      30522
exit_mmap                               1   32318110 32318110     427241

Other useful tasks that ftd can be used for include:

See ftd -h for usage help, for the command line options to use for these tasks.

[edit] Boot Tracer

This tracer is activated with the CONFIG_BOOT_TRACER kernel configuration option.

This tracer helps developers to optimize boot time: it records the timings of initcalls and traces key events and the identity of tasks that can cause boot delays.

The easiest way is to pass initcall_debug and printk.time=1 to the kernel command line, and boot the system. Then copy/paste the console output to a file (say boot.log or type dmesg > boot.log.

Then, generate a boot graph by using a utility provided in the Linux kernel sources:

cat boot.log | perl $(KERNEL_SRC_DIR)/scripts/bootgraph.pl > boot.svg

This graph can be viewed with Inkscape or Firefox. It gives a visual representation of the delays in initcalls.

Another way to get the same results is to pass in initcall_debug and ftrace=initcall to the kernel command line. You can then access timing information in the /sys/kernel/debug/tracing/trace file:

# cat current_tracer 
initcall
# cat trace |head -20
# tracer: initcall
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
[    0.000000000] calling  init_mmap_min_addr+0x0/0x10 @ 1
[    0.000000000] initcall init_mmap_min_addr+0x0/0x10 returned 0 after 0 msecs
[    0.000000000] calling  net_ns_init+0x0/0x1ec @ 1
[    0.000000000] initcall net_ns_init+0x0/0x1ec returned 0 after 0 msecs
[    0.000000000] calling  ptrace_break_init+0x0/0x24 @ 1
[    0.000000000] initcall ptrace_break_init+0x0/0x24 returned 0 after 0 msecs
[    0.000000000] calling  consistent_init+0x0/0x108 @ 1
[    0.000000000] initcall consistent_init+0x0/0x108 returned 0 after 0 msecs
[    0.000000000] calling  v6_userpage_init+0x0/0x8 @ 1
[    0.000000000] initcall v6_userpage_init+0x0/0x8 returned 0 after 0 msecs
[    0.000000000] calling  sysctl_init+0x0/0x1c @ 1
[    0.000000000] initcall sysctl_init+0x0/0x1c returned 0 after 0 msecs
[    0.000000000] calling  ksysfs_init+0x0/0xc4 @ 1
         swapper-1     [000]     0.000000:      1:120:R   + [000]     6:120:S khelper
         swapper-1     [000]     0.000000:      1:120:D ==> [000]     6:120:R khelper
         khelper-6     [000]     0.000000:      6:120:R   + [000]     9:120:R <...>

You will then run:

cat /sys/kernel/debug/tracing/trace | perl $(KERNEL_SRC_DIR)/scripts/bootgraph.pl > output.svg


Note: It is possible to trace events during the boot phase with this parameter in the bootargs (more information #Trace Event)

trace_event=[event-list]

[edit] Latency Tracers

When the latency-format option is enabled (Tracer option), the trace file gives somewhat more information to see why a latency happened.

Here is typical trace :

# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 2.6.33-dirty
# --------------------------------------------------------------------
# latency: 4516 us, #7/7, CPU#0 | (M:server VP:0, KP:0, SP:0 HP:0)
#    -----------------
#    | task: -25671 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: ret_slow_syscall
#  => ended at:   vector_swi
#
#
#                  _------=> CPU#            
#                 / _-----=> irqs-off        
#                | / _----=> need-resched    
#                || / _---=> hardirq/softirq 
#                ||| / _--=> preempt-depth   
#                |||| /_--=> lock-depth       
#                |||||/     delay             
#  cmd     pid   |||||| time  |   caller      
#     \   /      ||||||   \   |   /           
gdm-bina-25671   0d....    0us : __up_read
gdm-bina-25671   0d....    0us : __up_read
gdm-bina-25671   0d....    0us : trace_hardirqs_on <-__up_read
gdm-bina-25671   0d....    0us!: ret_slow_syscall
gdm-bina-25671   0d.... 4517us : vector_swi
gdm-bina-25671   0d.... 4517us : trace_hardirqs_on <-vector_swi
gdm-bina-25671   0d.... 4517us : 


This shows that the current tracer is "irqsoff" tracing the time for which interrupts were disabled. It gives the trace version and the version of the kernel upon which this was executed on (2.6.33). Then it displays the max latency in microsecs (4516 us). The number of trace entries displayed and the total number recorded (both are seven: #7/7).VP, KP, SP, and HP are always zero and are reserved for later use.

The task is the process that was running when the latency occurred. (pid: 25671).

The start and stop (the functions in which the interrupts were disabled and enabled respectively) that caused the latencies:

ret_slow_syscall is where the interrupts were disabled.
vector_swi is where they were enabled again.

The next lines after the header are the trace itself. The header explains which is which.

Note: If the architecture does not support a way to read the irq flags variable, an 'X' will always be printed here.

'H' - hard irq occurred inside a softirq.
'h' - hard irq is running
's' - soft irq is running
'.' - normal context.

The above is mostly meaningful for kernel developers.

'!' - greater than preempt_mark_thresh (default 100)
'+' - greater than 1 microsecond
' ' - less than or equal to 1 microsecond.

The rest is the same as the 'trace' file.

For more precise results, you can disabled the function tracing, which can bring a large overhead, by turning off the kernel variable ftrace_enabled.

# sysctl kernel.ftrace_enabled=1


WARNING Before launching each latency tracer, don't forget to reset the tracing_max latency

echo 0 > tracing_max_latency


NOTE: The following tracers are made for measuring latency, so the latency-format option will be enabled automatically.


[edit] Irqsoff Latency Tracer

This tracer is activate with CONFIG_IRQSOFF_TRACER

The plugin irqsoff is a way to measure times in the kernel that interrupts are disabled. When a new maximum latency is hit, the tracer saves the trace leading up to that latency point so that every time a new maximum is reached, the old saved trace is discarded and the new trace is saved. An example of trace is visible above


[edit] Wakeup Tracer

This tracer is activate by enabling the CONFIG_SCHED_TRACER

This tracer tracks the latency of the highest priority task to be scheduled in, starting from the point it has woken up.

# echo wakeup > current_tracer 
# chrt -f 5 sleep 2
# cat trace
# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 2.6.33-dirty
# --------------------------------------------------------------------
# latency: 61 us, #4/4, CPU#0 | (M:server VP:0, KP:0, SP:0 HP:0)
#    -----------------
#    | task: -2237 (uid:0 nice:0 policy:1 rt_prio:5)
#    -----------------
#
#                  _------=> CPU#            
#                 / _-----=> irqs-off        
#                | / _----=> need-resched    
#                || / _---=> hardirq/softirq 
#                ||| / _--=> preempt-depth   
#                |||| /_--=> lock-depth       
#                |||||/     delay             
#  cmd     pid   |||||| time  |   caller      
#     \   /      ||||||   \   |   /           
dbus-dae-623     0d.h..   31us :    623:120:R   + [000]  2237: 94:S sleep
dbus-dae-623     0d.h..   31us+: 0
dbus-dae-623     0.....   61us : schedule 
dbus-dae-623     0.....   61us :    623:120:R ==> [000]  2237: 94:R sleep

Running this on an idle system, we see that it took 61 microseconds to perform the task switch.


[edit] Tracer options

Some options can be added to the tracer to change its output. The list of all options is contain in the ftrace_options file

# cat trace_options
print-parent
nosym-offset
nosym-addr
noverbose
noraw
nohex
nobin
noblock
nostacktrace
trace_printk
noftrace_preempt
nobranch
annotate
nouserstacktrace
nosym-userobj
noprintk-msg-only
context-info
nolatency-format
sleep-time
graph-time
nofunc_stack_trace

To enable an option, just echo its name in the ftrace_options

# echo sym-addr > trace_options 
# cat trace | head -10
# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
            Xorg-15046 [000] 10769.470612: kmem_cache_free  <-user_path_at 
            Xorg-15046 [000] 10769.470612: sys_stat64  <-ret_fast_syscall 
            Xorg-15046 [000] 10769.470612: vfs_stat  <-sys_stat64 
            Xorg-15046 [000] 10769.470612: vfs_fstatat  <-sys_stat64 
            Xorg-15046 [000] 10769.470612: user_path_at  <-vfs_fstatat 
            Xorg-15046 [000] 10769.470612: getname  <-user_path_at 

In this example the sym-addr option add the adress of the symbol call. To remove an option, just echo its name preceded by no in the same ftrace_options file

# echo nosym-addr > trace_options

WARNING the func_stack_trace option is very dangerous and must be use with precaution. Its usage is detailled in the #Dynamic Ftrace part. WARNING




[edit] Dynamic Ftrace

It is highly recommended to enable CONFIG_DYNAMIC_FTRACE because of this performance enhancement (see How the Function Tracer work ? and Benchmark). In addition, CONFIG_DYNAMIC_FTRACE gives the ability to filter which function should be traced.

Note: Dynamic FTrace does not work with Function_graph tracer patches.(see Function Graph Tracer)

[edit] Enabling Dynamic Ftrace for ARM

Actually Dynamic Ftrace for ARM is not officialy supported but a patch permit to activate this tracer.

$ patch -p1 < dynamic_ftrace.patch 
patching file Makefile
patching file arch/arm/Kconfig
patching file arch/arm/Kconfig.debug
patching file arch/arm/include/asm/ftrace.h
patching file arch/arm/kernel/armksyms.c
patching file arch/arm/kernel/entry-common.S
patching file arch/arm/kernel/ftrace.c
patching file kernel/trace/Kconfig
patching file scripts/Makefile.build
patching file scripts/recordmcount.pl

[edit] Using Dynamic Ftrace

Running the function tracer can be overwhelming. The amount of data may be vast, and very hard to get a hold of by the human brain. Ftrace provides a way to limit what functions you see. Two files exist that let you limit what functions are traced:

  • set_ftrace_filter
  • set_ftrace_notrace

When any function is listed in the set_ftrace_filter, only those functions will be traced. This will help the performance of the system when the trace is active. Tracing every function incurs a large overhead, but when using the set_ftrace_filter, only those functions listed in that file will have the NOPs changed to call the tracer. Depending on which functions are being traced, just having a couple of hundred functions enabled is hardly noticeable.

NOTE: The functions listed in set_ftrace_notrace take precedence. That is, if a function is listed in both set_ftrace_notrace and set_ftrace_filter, that function will not be traced.


[edit] Selecting filter function

A list of functions that can be added to the filter files is shown in the available_filter_functions file. This file contain almost all functions in the kernel.

# cat available_filter_functions |head
do_one_initcall
run_init_process
init_post
name_to_dev_t
elf_check_arch
arm_elf_read_implies_exec
elf_set_personality
set_irq_flags
show_interrupts
machine_halt
....


To add a filter, just echo his name in set_function_filter

echo do_one_initcall > set_ftrace_filter

But adding one by one all the wanted event can be boring Fortunately, these files also take wildcards; the following glob expressions are valid:

To remove a filter or more, use the !symbol echo '!do_one_initcall' > set_function_filter

All these commands work also with set_ftrace_notrace

[edit] Stop the trace on a specific function

Enabling and disabling recording to the trace file can be done using tracing_on file from debugfs. It can also be controlled accurately through tracing_on() and tracing_off() functions from the kernel, which require unfortunately to recompile the kernel. In the special case where you just want to switch tracing on/off in a specific kernel module you don't need to rebuild the whole kernel. Just add tracing_on() and tracing_off() to your module and recompile it. A last method allows stopping the trace at a particular function. This method uses the set_ftrace_filter file.

The format of the command to have the function trace enable or disable the ring buffer is as follows:

  function:command[:count]

This will execute the command at the start of the function. The command is either traceon or traceoff, and an optional count can be added to have the command only execute a given number of times. If the count is left off (including the leading colon) then the command will be executed every time the function is called.

# echo '__do_softirq:traceoff' > set_ftrace_filter 
# cat set_ftrace_filter 
#### all functions enabled ####
__do_softirq:traceoff:unlimited

Notice that functions with commands do not affect the general filters. Even though a command has been added to __do_softirq, the filter still allowed all functions to be traced. Commands and filter functions are separate and do not affect each other.

[edit] What calls a specific function?

Sometimes it is useful to know what is calling a particular function. The immediate predecessor is helpful, but an entire backtrace is even better. The function tracer contains an option that will create a backtrace in the ring buffer for every function that is called by the tracer.

To use the function tracer backtrace feature, it is imperative that the functions being called are limited by the function filters. The option to enable the function backtracing is unique to the function tracer and activating it can only be done when the function tracer is enabled. This means you must first enable the function tracer before you have access to the option:

# echo wakeup_softirqd > set_ftrace_filter 
# echo function > current_tracer 
# echo 1 > options/func_stack_trace 
# cat trace | tail
 => ftrace_call
        events/0-5     [000] 21472.572144: wakeup_softirqd <-__tasklet_schedule
        events/0-5     [000] 21472.572174: 
 => ftrace_call
        events/0-5     [000] 21472.987244: wakeup_softirqd <-__tasklet_schedule
        events/0-5     [000] 21472.987244: 
 => ftrace_call
        events/0-5     [000] 21473.025970: wakeup_softirqd <-__tasklet_schedule
        events/0-5     [000] 21473.026001: 
 => ftrace_call
# echo 0 > options/func_stack_trace
# echo > set_ftrace_filter 


Warning cat the set_ftrace_filter before enabling the func_stack_trace option to ensure that the filter was enabled. Disable the options/func_stack_trace before disabling the filter.

Warning Also note that the option is non-volatile, that is, even if you enable another tracer plugin in current_tracer, the option will still be enabled if you re-enable the function tracer.


[edit] Trace Event

Using the different previous plugin tracers is not always very practical. Some can bring a large overhead and others do not give enough information to see what happened. Luckily, there is the event tracer. The event tracing is not a plugin. When events are enabled, they will be recorded in any plugin, including the special nop plugin.


[edit] Using Event Tracing

[edit] Via the 'set_event' interface

The events which are available for tracing can be found in the file "available_events".

# cat available_events
skb:kfree_skb
skb:skb_copy_datagram_iovec
block:block_remap
block:block_split
block:block_plug
block:block_bio_complete
block:block_bio_bounce
block:block_rq_remap
block:block_unplug_io
block:block_unplug_timer


To enable a particular event, such as 'sched_wakeup', simply echo it to set_event. For example:

# echo sched_wakeup >> /sys/kernel/debug/tracing/set_event

Note: '>>' is necessary, otherwise it will firstly disable all the events.

To disable an event, echo the event name to the set_event file prefixed with an exclamation point:

# echo '!sched_wakeup' >> set_event

To disable all events, echo an empty line to the set_event file:

# echo > /sys/kernel/debug/tracing/set_event

To enable all events, echo '*:*' or '*:' to the set_event file:

# echo *:* > /sys/kernel/debug/tracing/set_event

The events are organized into subsystems, such as ext4, irq, sched, etc., and a full event name looks like this: :. The subsystem name is optional, but it is displayed in the available_events file. All of the events in a subsystem can be specified via the syntax ":*"; for example, to enable all irq events, you can use the command:

# echo 'irq:*' > /sys/kernel/debug/tracing/set_event


[edit] Via the 'enable' toggle

The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy of directories.

To enable event 'sched_wakeup':

# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable

To disable it:

# echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable

To enable all events in sched subsystem:

# echo 1 > /sys/kernel/debug/tracing/events/sched/enable

To enable all events:

# echo 1 > /sys/kernel/debug/tracing/events/enable

When reading one of these enable files, there are four results:

0 - all events this file affects are disabled
1 - all events this file affects are enabled
X - there is a mixture of events enabled and disabled
? - this file does not affect any event


[edit] Boot option

In order to facilitate early boot debugging, use boot option:

trace_event=[event-list]

The format of this boot option is the same as the 'set_event' interface

[edit] Example

#echo nop > current_tracer 
# cat trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
# echo sched:* > set_event 
# cat trace | head
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
            bash-3871  [000] 20480.632751: sched_wakeup: comm=bash pid=3871 prio=120 success=0 target_cpu=000
            bash-3871  [000] 20480.632813: sched_stat_runtime: comm=bash pid=3871 runtime=3295899 [ns] vruntime=14957505137978 [ns]
            bash-3871  [000] 20480.632843: sched_switch: prev_comm=bash prev_pid=3871 prev_prio=120 prev_state=S ==>  next_comm=gdm-binary next_pid=25515 next_prio=120
      gdm-binary-25515 [000] 20480.636444: sched_stat_runtime: comm=gdm-binary pid=25515 runtime=3631591 [ns] vruntime=14957507973670 [ns]
      gdm-binary-25515 [000] 20480.636505: sched_stat_runtime: comm=gdm-binary pid=25515 runtime=61035 [ns] vruntime=14957508034705 [ns]
      gdm-binary-25515 [000] 20480.636536: sched_wakeup: comm=events/0 pid=5 prio=120 success=1 target_cpu=000


[edit] More Informations

For further detail about event tracing, read the file Documents/Trace/events.txt in the kernel directory


[edit] Using FTrace with Trace-cmd

Using echo and cat to set up Ftrace can be boring. Trace-cmd allow to setup and use Ftrace with very few command. It also provide a tiny GUI to analyse trace data.

[edit] Install Trace-cmd

To get the code of Trace-cmd, just clone the following GIT

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git

You need to have "asciidoc" and "swig" packages installed before building Trace-cmd.

Note: the following instructions has to be done in this order, if you want to install trace-cmd on your Zoom2 and the GUI on your computer


[edit] Trace-cmd on the OMAP Zoom2

$ make clean
$ make LDFLAGS=-static CC=$HOME/bin/arm-2009q3/bin/arm-none-linux-gnueabi-gcc trace-cmd
$ cp trace-cmd /tmp/mmc2

[edit] Install the Trace-cmd GUI

$ make clean
$ make install_gui

The last command install the GUI in the $HOME/bin directory.

[edit] Tracing with Trace-cmd

Trace-cmd is very easy to use with simple command. For example, the following one will enable all events and trace them during a ls command.

# trace-cmd record -e all ls

If we had to use Ftrace debugfs interface, it would be a little more difficult

# echo 1 > events/enable
# echo 1 > tracing_on
# ls
# echo 0 > tracing_on

And in this case, trace-cmd is more precise because it trace only during the ls command execution.

At the end of the trace, Trace-cmd create a pre-formated file (trace.dat by default, can be change with the -o option) which contain all information of the trace

If you want to know all the parameter possible with trace-cmd, type trace-cmd without any parameters

 #  trace-cmd

trace-cmd version 0.7

usage:
 trace-cmd record [-v][-e event [-f filter]][-p plugin][-F][-d][-o file] \
           [-s usecs][-O option ][-l func][-g func][-n func]
           [-P pid][-N host:port][-t][-b size][command ...]
          -e run command with event enabled
          -f filter for previous -e event
          -p run command with plugin enabled
          -F filter only on the given process
          -P trace the given pid like -F for the command
          -l filter function name
          -g set graph function
          -n do not trace function
          -v will negate all -e after it (disable those events)
          -d disable function tracer when running
          -o data output file [default trace.dat]
          -O option to enable (or disable)
          -s sleep interval between recording (in usecs) [default: 1000]
          -N host:port to connect to (see listen)
          -t used with -N, forces use of tcp in live trace
          -b change kernel buffersize (in kilobytes per CPU)
...........


[edit] Analyzing trace data with GUI

Note : Works only with trace generated by Trace-cmd tools

Trace-cmd provides a tiny GUI call kernelshark. This GUI helps to understand what happens during a trace. To launch kernelshark, just type is name in a terminal. If a trace.dat file is present in the current directory, kernelshark will automatically load it. Otherwise, you can load the file of your choice with "Load info" in the File menu. This GUI permits to apply different filter to the data, watch the process of each task and the process per cpu.

Trace of all events during a 'ls' command (#trace-cmd -e all -F ls)


[edit] How to use kernelshark




[edit] Precision of Ftrace

Actually on OMAP, Ftrace use a 32K Timer with a resolution of only 31 microseconds.This did not allow trace event times to be distinguished accurately. Luckily, there was another clock source available that had higher resolution, the MPU_TIMER. This timer provides more intra-tick resolution than the 32KHz timer, but consumes more power. Tim Bird work on patch to use this timer and will submit it soon.

[edit] Benchmark

As every measuring tool, Ftrace brings an overhead for every function call by the kernel. To get a good idea of the overhead, three benchmark was tested


[edit] Hackbench

Hackbench of Ftrace

According to this graphic, we can see that the Function tracer create a large overhead, but this graph also show up the importance of Dynamic FTrace. Thus, we can see that this option bring an improvement of almost 45% on the performance when tracing the Function Tracer. Moreover, when there is no trace, dynamic Ftrace does not bring an overhead contrary to Ftrace which has an overhead of 47%. Obviously, when using a function filter on Ftrace, the overhead is minimal and does not impact the system


[edit] Interbench

This test bench will measuring the performance of the system for different load. The results show up that for light task (playing 5ms audio intervals, which use only 5% CPU), the impact is only on the latency of the function. But for more heavy task, like gaming which use all CPU it can get,the impact is more important and particularly if an heavy run in background (here reading a file of the size of physical RAM) and major part of the function are not executed, about 60% is not executed compared to an execution without a trace, this will be traduce by bug and freeze for the user, futhermore the latency are still more important. Dynamic Function Tracer obtains the same result, and his overhead is still important for the heavy task.

[edit] Uncompressing a large .tar.gz file

Decompression of a large tar.gz file

This test consists to decompressing a large tar.gz file (an ubuntu minimal filesystem ~90MB) and measuring the time elapse to do it. According to the graph, we can see the large overhead that bring the function tracer (with or without the Dynamic FTrace option). The other tracers have almost the same time and the deviation are not enough important to be interpreted.


In conclusion, with this test bench, we can say that the overhead of Ftrace is quite important, particularly for the function tracer. We also see that Ftrace brings an overhead even if there is no trace but this problem can be solve by enabling the Dynamic Ftrace option. This options also bring an improvement for the function tracer overhead even if it is still important without a function filter.



[edit] For more information

[edit] patches for gcc 4.4.0

Note recent ARM toolchains instrument the code with calls to '__gnu_mcount_nc' instead of 'mcount'. If you get a compiler warning about missing the symbol '__gnu_mcount_nc', you should apply the patch mentioned in this e-mail (http://marc.info/?l=linux-arm-kernel&m=124946219616111&w=2)

[edit] How to use Ftrace on p-android-omap-3.0 Kernel tree base with Android File system [ICS]

[edit] Kernel configuration & Re-build

The kernel should be reconfigured and rebuilt to enable the support of FTrace

$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm distclean
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm blaze_defconfig
$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm menuconfig
General setup -> Kprobes
Kernel Hacking -> Tracers -> Kernel FUNCTION_TRACER
Kernel Hacking -> Tracers -> Interrupts-off Latency Tracer
Kernel Hacking -> Tracers -> Preemption-off Latency Tracer  
Kernel Hacking -> Tracers -> Trace max stack
Kernel Hacking -> Tracers -> Enable kprobes-based dynamic events
Kernel Hacking -> Tracers -> enable/disable ftrace tracepoints dynamically

Note Kernel Function Graph Tracer needs additional patches to support ARM (http://elinux.org/Ftrace_Function_Graph_ARM). So please ensure that you have disabled it.


Note:Follow Module incompatibility rules when you modify/build kernel locally.

$ make CROSS_COMPILE=arm-none-linux-gnueabi- ARCH=arm uImage

(For further information about the compilation of the kernel see Linux OMAP Kernel Project)

[edit] Dependent Modules Rebuild

With Android File system, the system also loads [via insmod] the WLAN and GPS modules. We should build these modules against the kernel for which ftrace has enabled. Without which the system will have continuous reboots!! Please Build WLAN/GFX modules against your kernel, per these instructions: Building Modules

git clone git://git.omapzoom.org/platform/hardware/ti/wlan.git
cd wlan
git checkout --track -b p-ics-mr1-release origin/p-ics-mr1-release
cd mac80211/compat_wl12xx
export KLIB=/path/to/kernel
export KLIB_BUILD=/path/to/kernel
export ARCH=arm
make

This results WLAN modules. Replace these modules with Android file system DB WLAN modules.

compat/compat.ko --> system/lib/modules/
net/wireless/cfg80211.ko --> system/lib/modules/
net/mac80211/mac80211.ko --> system/lib/modules/
drivers/net/wireless/ti/wl12xx/wl12xx.ko --> system/lib/modules/
drivers/net/wireless/ti/wl12xx/wl12xx_sdio.ko --> system/lib/modules/
git clone git://git.omapzoom.org/device/ti/proprietary-open.git
cd proprietary-open
git checkout --track -b ics-mr1-release origin/ics-mr1-release

Untar the sgx.tgz tarball located in proprietary-open/omap4. You should now have the directory proprietary-open/omap4/sgx. Follow instructions at device/ti/proprietary-open/omap4/sgx/README. Specifically:

cd omap4/sgx
cd src/eurasia_km/eurasiacon/build/linux2/omap4430_android
export KERNELDIR=path_to_kernel

If you are building for OMAP4430/4460, use the make command:

make ARCH=arm CROSS_COMPILE=arm-none-linux-gnueabi- TARGET_PRODUCT="blaze_tablet" BUILD=release TARGET_SGX=540 PLATFORM_VERSION=4.0

If you are building for OMAP4470, use the make command:

make ARCH=arm CROSS_COMPILE=arm-none-linux-gnueabi- TARGET_PRODUCT="blaze_tablet" BUILD=release TARGET_SGX=544sc PLATFORM_VERSION=4.0

This results GFX modules. Replace these modules with Android file system DB GFX modules.

OMAP 4430/60

target/kbuild/omaplfb_sgx540_120.ko --> system/lib/modules/
target/kbuild/pvrsrvkm_sgx540_120.ko --> system/lib/modules/

OMAP 4470

target/kbuild/omaplfb_sgx544_120.ko --> system/lib/modules/
target/kbuild/pvrsrvkm_sgx544_112.ko --> system/lib/modules/
This is proprietary module. The source code is available in Clear case. Please contact DB team
for location and build instructions.
If one can not able to build GPS modules against your kernel , you can rename the gps module 
in system/lib/modules. With this GPS module will not get loaded.

[edit] Using Ftrace

If debugfs is not mounted please follow below instruction to mount it. Ftrace has its control files in the debugfs system.This is usually mounted in /sys/kernel/debug. If it is not already mounted, then you can mount it yourself with:

# mount -t debugfs nodev /sys/kernel/debug

Many Developpers prefer to have this directory at the root of the system

# mkdir /mnt/debug
# mount -t debugfs nodev /mnt/debugfs

That creates a /debug/tracing subdirectory which is used to control ftrace and for getting output from the tool.

Note: all the command of this article will be pass in the /debug/tracing/ directory or d/tracing directory.

[edit] Choosing a tracer

To find out which tracers are available, simply cat the available_tracers file in the tracing directory:

# cat available_tracers 
function sched_switch nop

To enable, for example, the function tracer, just echo "function" into the current_tracer file.

# echo function > current_tracer
# cat current_tracer
function

Note: Ftrace can handle only one tracer at the same time

[edit] Using a tracer

At this point Ftrace is ready to trace the function, to start it just echo 1 in the tracing_on file

# echo 1 > tracing_enabled

To stop the trace, just echo 0 in the same file:

# echo 0 > tracing_enabled

The trace is contained in the trace file, here is an example of the output from a function trace. The header helps to decode the various fields in the trace.

# cat trace | head -10
# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
    avahi-daemon-630   [000]  9507.400848: vfs_read <-sys_read
    avahi-daemon-630   [000]  9507.400848: rw_verify_area <-vfs_read
    avahi-daemon-630   [000]  9507.400848: do_sync_read <-vfs_read
    avahi-daemon-630   [000]  9507.400848: sock_aio_read <-do_sync_read
    avahi-daemon-630   [000]  9507.400848: alloc_sock_iocb <-sock_aio_read

[edit] Function tracer

This tracer is activated by enabling CONFIG_FUNCTION_TRACER in the kernel configuration. To work, it needs the kernel variable ftrace_enabled to be turned on, otherwise this tracer is a nop.

    # echo function > current_tracer
    # cat current_tracer
    function

For Ex: I want to trace prcm_interrupt_handler function

    # echo 'prcm_interrupt*' > set_ftrace_filter
    # cat set_ftrace_filter
    prcm_interrupt_handler
    # echo 1 > tracing_enabled
    run the use case
    # echo 0 > tracing_enabled
    [tracing]# cat trace | head -10
    # tracer: function
    #
    #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
    #              | |       |          |         |
         -0     [000]   858.965424: prcm_interrupt_handler <-handle_irq_event_percpu
         -0     [000]   858.965759: prcm_interrupt_handler <-handle_irq_event_percpu
         -0     [000]   861.355255: prcm_interrupt_handler <-handle_irq_event_percpu
         -0     [000]   861.355591: prcm_interrupt_handler <-handle_irq_event_percpu
 UEventObserver-210   [000]   861.724243: prcm_interrupt_handler <-handle_irq_event_percpu
         -0     [000]   861.725037: prcm_interrupt_handler <-handle_irq_event_percpu
          <...>-931   [000]   861.730316: prcm_interrupt_handler <-handle_irq_event_percpu
          <...>-931   [000]   861.730682: prcm_interrupt_handler <-handle_irq_event_percpu

Note: trace file might already contain some boot up function traces. You can delete contents using below command before actual function trace.

  # echo 0 > trace


[edit] Using pyTimechart to visualize Ftrace

$ sudo apt-get install pytimechart

Script to record the traces. Can be edited based on the events of interest. [4]

$ chmod a+x pytimechart-record

usage of this script

  # ./pytimechart-record start
  #... run your workload now
  # ./pytimechart-record stop
  trace written to ~/traceYY-MM-DD-hh-mm-ss.txt.lzma
  pytimechart ~/traceYY-MM-DD-hh-mm-ss.txt.lzma
pytimechart screenshot


More information on using user interface of pytimechart can be found at [5]

Personal tools
Namespaces
Variants
Views
  • Read
  • Edit
  • View history
Actions
Navigation
Toolbox
  • What links here
  • Related changes
  • Special pages
  • Printable version