USB testing

From OMAPpedia



Formatting eMMC partition using g_file_storage gadget

@ OMAP side:

1. insmod g_file_storage file=/dev/block/mmcblk1 stall=0

2. Alternatively, with the gadget module already loaded, point the LUN at the eMMC device: echo /dev/block/mmcblk1 > /sys/devices/platform/musb_hdrc/gadget/lun0/file

@ Linux PC side:

1. Connect a USB cable from the OMAP board to the Linux PC

2. Run “cat /proc/partitions” to check whether the partition has been recognized

3. Use “mkfs.ext3” on the PC to format the partition
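The PC-side steps can be sketched as shell commands. The actual device node depends on your host (check dmesg after plugging in the board); to keep the sketch safe to run anywhere, it formats a loopback image file instead of a real device:

```shell
# Stand-in for the exported LUN: an 8 MB image file. On real hardware you
# would use the device node reported by dmesg (e.g. /dev/sdb1) instead.
dd if=/dev/zero of=/tmp/lun0.img bs=1M count=8 2>/dev/null
mkfs.ext3 -q -F /tmp/lun0.img      # format it as ext3
cat /proc/partitions               # on real hardware, the new disk shows up here
```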


USB mass-storage throughput measurement

The USB mass-storage throughput depends on a number of variables: the SD card, the host operating system, whether the filesystem is involved, and whether transfers are synchronous or asynchronous.

For best performance, use a SanDisk SDHC Class 6 Extreme III card with a Linux host. Use only SD Formatter 2.0 to format SD/SDHC cards; the tool can be downloaded from:

http://www.sdcard.org/consumers/formatter/


A. Throughput with the filesystem involved:


1. Using a Windows Host Machine:


Make sure you have the following selection in place:



Asynchronous Transfers:

The 'FUA' bit of the CBW packet is effectively what selects synchronous vs. asynchronous write mode. When transferring files from Windows/MS-DOS, the FUA bit is always set, meaning all write accesses are synchronous. Since Windows/MS-DOS cannot be changed, the bit is instead ignored on the gadget side: in file_storage.c, comment out the following lines in the do_write() routine:

if (fsg->cmnd[1] & 0x08) {  // FUA
 spin_lock(&curlun->filp->f_lock);
 curlun->filp->f_flags |= O_DSYNC;
 spin_unlock(&curlun->filp->f_lock);
}

This ensures that whatever FUA value the host sets is ignored by the gadget driver, so all write accesses are asynchronous. To measure throughput:


From an MS-DOS command window, type: usb_ms_perf_benchmarking.bat [source] [destination]


Here 'source' is a file on the PC host and 'destination' is the SD card as seen from the PC host; this measures a write transfer. For a read transfer it is the reverse: 'source' is a file on the SD card and 'destination' is a location on the PC host's hard drive.


Synchronous Transfers:

When transferring files from a Windows/MS-DOS system, the FUA bit is always set, so all transfers are synchronous. To measure throughput:


From an MS-DOS command window, type: usb_ms_perf_benchmarking.bat [source] [destination]


Here 'source' is a file on the PC host and 'destination' is the SD card as seen from the PC host; this measures a write transfer. For a read transfer it is the reverse: 'source' is a file on the SD card and 'destination' is a location on the PC host's hard drive.


For instance, execute:

C:\OMAP_PSI\OMAP3630\USB_Mass_Storage>usb_ms_perf_benchmarking.bat test_file.avi E:\

And here is the kind of output you get:


USB MS Performance Benchmarking                                 
version : alpha -11/18/2009     
author : 
       
15:39:18.92
       1 file(s) copied.
15:42:06.93

Process took 0 hours, 2 minutes, 48 seconds, 1 centiseconds
Process took 16801 centiseconds

Then perform a basic computation: 701 MB (the size of test_file.avi in this example) / 168.01 s => 4.17 MB/s
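The computation can be scripted with an illustrative awk one-liner; the 701 MB file size and 16801 centiseconds are taken from the sample run above:

```shell
# Throughput = file size / elapsed time; the batch script reports elapsed
# time in centiseconds, so divide by 100 to get seconds.
FILE_MB=701         # size of test_file.avi in the sample run
CENTISECONDS=16801  # reported by the batch script
awk -v mb="$FILE_MB" -v cs="$CENTISECONDS" \
    'BEGIN { printf "%.2f MB/s\n", mb / (cs / 100) }'
```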


Attached is the executable script usb_ms_perf_benchmarking.txt; save it with a .BAT extension on your Windows host.


2. Using a Linux Host Machine:


Asynchronous Transfers:

/usr/bin/time -p dd if=/dev/zero of=/media/boot/test.bin bs=102400  count=1024; //Write 100 MB

Un-mount the drive, then detach and re-attach the USB cable

/usr/bin/time -p dd if=/media/boot/test.bin of=/dev/null bs=102400 count=1024;  //Read 100 MB


Synchronous Transfers:

/usr/bin/time -p dd if=/dev/zero of=/media/boot/test.bin bs=102400  count=1024;/usr/bin/time -p sync;  //Write 100 MB

Un-mount the drive, then detach and re-attach the USB cable

/usr/bin/time -p dd if=/media/boot/test.bin of=/dev/null bs=102400 count=1024;/usr/bin/time -p sync;   //Read 100 MB


Here is the kind of output you get from the dd command:


/usr/bin/time -p dd if=/dev/zero of=/media/boot/test.bin bs=102400  count=1024; // Asynchronous Write of 100 MB

100+0 records in
100+0 records out
64857600 bytes (65 MB) copied, 5.40164 s, 12.0 MB/s
real 5.40
user 0.00
sys 0.76

Then calculate throughput = 64857600 bytes / 5.40 s ≈ 11.5 MB/s


/usr/bin/time -p dd if=/dev/zero of=/media/boot/test.bin bs=102400  count=1024;/usr/bin/time -p sync;  //Synchronous Write of 100 MB

100+0 records in
100+0 records out
64857600 bytes (65 MB) copied, 5.40164 s, 12.0 MB/s
real 5.40
user 0.00
sys 0.76
real 1.92
user 0.00
sys 0.01

Then calculate throughput = 64857600 bytes / (5.40 s + 1.92 s) ≈ 8.5 MB/s


B. Raw throughput without the filesystem involved, using msc and hdparm:


MSC:


msc is a USB Mass Storage Class verification tool written by Felipe Balbi as part of his usb-tools tree, for testing Mass Storage Class (MSC) devices. It contains a number of read/write test cases, selected with the -t (test number) parameter.


Ex:

-t 0: Simple write/Read/verify
-t 1: write/Read/verify 1 sector at a time
-t 2: write/Read/verify 8 sectors at a time
-t 3: write/Read/verify 32 sectors at a time
-t 4: write/Read/verify 64 sectors at a time
-t 5: SG write/read/verify 2 sectors at a time
...
-t n: attempt to read past last sector
-t n: attempt to read starting past the last sector
-t n: attempt to write past last sector
-t n: write 1 64k sg and read in several of random size
-t n: write several of random size, read 1 64k
-t n: write and read several of random size

...many more
* n is the appropriate test number; read the source code for the exact number needed


You can download the source code using:

git clone git://gitorious.org/usb/usb-tools.git

Compile the code:

make CROSS_COMPILE=arm-none-linux-gnueabi-

(this invokes: arm-none-linux-gnueabi-gcc -Wall -O2 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -g -o msc msc.c)

This produces the msc executable.


Hdparm:


'hdparm' is a command line utility for the Linux and Windows operating systems to set and view SATA and IDE hard disk hardware parameters. It can set parameters such as drive caches, sleep mode, power management, acoustic management, and DMA settings. Changing hardware parameters from suboptimal conservative defaults to their optimal settings can improve performance greatly. For example, turning on DMA can in some instances double or triple data throughput.


Unfortunately at present there's no reliable method for determining the optimal settings for a given controller/drive combination, except careful trial and error; nor is there yet any central database that collects and shares the combined experience of hdparm users.


Following are the parameters of our interest:


-T Perform timings of cache reads for benchmark and comparison purposes.

For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading directly from the Linux buffer cache without disk access. This measurement is essentially an indication of the throughput of the processor, cache, and memory of the system under test.


-t Perform timings of device reads for benchmark and comparison purposes.

For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading through the buffer cache to the disk without any prior caching of data. This measurement is an indication of how fast the drive can sustain sequential data reads under Linux, without any filesystem overhead. To ensure accurate measurements, the buffer cache is flushed during the processing of -t using the BLKFLSBUF ioctl.


I was able to get this tool running on my system; here are the results:

$ sudo ./msc -t 0 -o /dev/sdi -s 65536 -c 1024
test 0: sent    64.0000 MBytes read   17937.22 kB/s write    7471.40 kB/s ... success

$ sudo hdparm -tT /dev/sdi

/dev/sdi:

Timing cached reads:   558 MB in  2.00 seconds = 278.89 MB/sec
Timing buffered disk reads:   40 MB in  3.05 seconds =  13.13 MB/sec

USB mass-storage average CPU load measurement

A. Using the Linux top command:

Determining the average CPU load from the %id (idle) field of the top command is not correct.

%idle represents the time the CPU is truly idle; while our test case runs, however, the CPU spends most of its time waiting on I/O operations (reads/writes to the SD card, in mmcqd), which is not counted as idle. This is why we see a low %idle and a high %io during the transfer. While in this wait state the CPU can certainly perform other work (mmcqd does call schedule()), but it is waiting on I/O rather than idle, so %idle stays low.


Sample:

# top d 1 | grep '^C'

CPU:  0.0% usr 13.0% sys  0.0% nic  0.0% idle 87.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 21.2% sys  0.0% nic  0.0% idle 78.7% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 30.0% sys  0.0% nic  0.0% idle 70.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 30.6% sys  0.0% nic  0.0% idle 69.3% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 32.3% sys  0.0% nic 12.1% idle 55.5% io  0.0% irq  0.0% sirq

%idle is therefore not a reliable measure of CPU load. Instead, use one of the following methods to determine the average CPU load:


1. Using the 'Load average' field of top command:


-Execute this command on OMAP:

top d 1 | grep '^L'

-Execute the transfer of data between the Host and SD card [You will see the load numbers increasing]


Sample top output on OMAP:

# top d 1 | grep '^L'
Load average: 0.01 0.02 0.00 1/41 695
Load average: 0.01 0.02 0.00 1/41 695
Load average: 0.01 0.02 0.00 1/41 695
Load average: 0.01 0.02 0.00 1/41 695
Load average: 0.01 0.02 0.00 4/41 695
Load average: 0.01 0.02 0.00 2/41 695
Load average: 0.33 0.08 0.02 1/41 695
Load average: 0.33 0.08 0.02 2/41 695
Load average: 0.54 0.13 0.04 2/41 695
Load average: 0.54 0.13 0.04 2/41 695
Load average: 0.54 0.13 0.04 2/41 695
Load average: 0.74 0.18 0.05 2/41 695
Load average: 0.74 0.18 0.05 1/41 695
Load average: 0.76 0.19 0.06 1/41 695
Load average: 0.76 0.19 0.06 1/41 695
…


-When the transfer completes, let the top command continue running until the first (leftmost) load number slowly decays back to its initial steady value (in this example, ~0.01)

-Then, for the collected output, add all the leftmost numbers and divide by the total number of output lines from top (i.e., take the average). This should be roughly < 0.25; use it as the % CPU utilization, i.e. (25% * ARM frequency)
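The averaging step can be sketched with awk. The sample lines below are copied from the output above, and /tmp/top.log is just an illustrative capture file; in practice you would redirect the live output, e.g. `top d 1 | grep '^L' > /tmp/top.log`:

```shell
# Average the leftmost (1-minute) load figure over a captured top log.
cat > /tmp/top.log <<'EOF'
Load average: 0.01 0.02 0.00 1/41 695
Load average: 0.33 0.08 0.02 1/41 695
Load average: 0.54 0.13 0.04 2/41 695
EOF
# The leftmost load figure is the 3rd whitespace-separated field.
awk '{ sum += $3; n++ } END { printf "avg load: %.2f\n", sum / n }' /tmp/top.log
```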


2. Use the average load of all the active threads over the whole file transfer duration to determine average CPU load

-Execute this command on OMAP:

top d 1 | egrep 'mmcqd|file-storage|flush'

-Execute the transfer of data between the Host and SD card [you will see the %cpu numbers increasing for mmcqd, file-storage-ga, pdflush ]

Note: the number before the process name is the %cpu


Sample top output on OMAP:

# top d 1 | egrep 'mmcqd|file-storage|flush'
 702   659 root     S     2536  1.0   0  0.0 egrep mmcqd|file-storage|flush
 562     2 root     SW<      0  0.0   0  0.0 [mmcqd]
 299     2 root     SW       0  0.0   0  0.0 [pdflush]
 662     2 root     DW<      0  0.0   0 19.7 [file-storage-ga]
 299     2 root     SW       0  0.0   0  3.9 [pdflush]
 562     2 root     DW<      0  0.0   0  1.9 [mmcqd]
 702   659 root     S     2536  1.0   0  0.0 egrep mmcqd|file-storage|flush
 562     2 root     RW<      0  0.0   0  3.9 [mmcqd]
 702   659 root     S     2536  1.0   0  0.0 egrep mmcqd|file-storage|flush
 662     2 root     DW<      0  0.0   0  0.0 [file-storage-ga]
 299     2 root     SW       0  0.0   0  0.0 [pdflush]
 562     2 root     DW<      0  0.0   0 16.6 [mmcqd]
 662     2 root     DW<      0  0.0   0  4.9 [file-storage-ga]
 702   659 root     S     2536  1.0   0  0.0 egrep mmcqd|file-storage|flush
 299     2 root     SW       0  0.0   0  0.0 [pdflush]
 562     2 root     DW<      0  0.0   0 27.6 [mmcqd]
 662     2 root     DW<      0  0.0   0  8.9 [file-storage-ga]


-When this transfer is completed, stop the top command.

-Now add all the %cpu numbers for the mmcqd occurrences and divide by the number of mmcqd occurrences; this gives the average %cpu utilization for mmcqd over the file transfer. Calculate the same average %cpu for the ‘file-storage’ and ‘pdflush’ processes

-Then,

Total Avg %cpu = Avg %cpu[mmcqd] + Avg %cpu[file-storage] + Avg %cpu[pdflush] + ~1.5% [other commands that we don’t calculate]

This total Avg %cpu should be roughly < 25%. Use it as the % CPU utilization, i.e. (25% * ARM frequency)
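The per-process averaging can be sketched the same way; %cpu is the 8th field of the busybox top lines shown above, and /tmp/proc.log is an illustrative capture file with lines copied from that output:

```shell
cat > /tmp/proc.log <<'EOF'
 562     2 root     DW<      0  0.0   0  1.9 [mmcqd]
 562     2 root     DW<      0  0.0   0 16.6 [mmcqd]
 562     2 root     DW<      0  0.0   0 27.6 [mmcqd]
EOF
# Average the %cpu column (field 8) for every [mmcqd] occurrence; repeat
# with a file-storage or pdflush pattern for the other two processes.
awk '/\[mmcqd\]/ { sum += $8; n++ } END { printf "mmcqd avg: %.1f%%\n", sum / n }' /tmp/proc.log
```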


3. Use the ‘%sys’ field from the top command output, which is the ‘system CPU time: time the CPU has spent running the kernel and its processes’. This more or less reflects the load on the system.

-Execute this command on OMAP:

top d 1 | grep '^C'

-Execute the transfer of data between the Host and SD card [You will see the %sys numbers increasing ]

Note: use the percentage printed before ‘sys’


Sample top output on OMAP:

# top d 1 | grep '^C'
CPU:  0.0% usr  9.0% sys  0.0% nic 90.9% idle  0.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr  0.9% sys  0.0% nic 99.0% idle  0.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr  0.9% sys  0.0% nic 99.0% idle  0.0% io  0.0% irq  0.0% sirq
CPU:  0.9% usr  0.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq
CPU:  1.0% usr  4.0% sys  0.0% nic 95.0% idle  0.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 16.8% sys  0.0% nic 63.3% idle 19.8% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 13.0% sys  0.0% nic  0.0% idle 87.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 21.2% sys  0.0% nic  0.0% idle 78.7% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 30.0% sys  0.0% nic  0.0% idle 70.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 30.6% sys  0.0% nic  0.0% idle 69.3% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 32.3% sys  0.0% nic 12.1% idle 55.5% io  0.0% irq  0.0% sirq
CPU:  0.4% usr 21.3% sys  0.0% nic  0.0% idle 78.1% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 31.3% sys  0.0% nic  0.0% idle 68.6% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 27.7% sys  0.0% nic  0.0% idle 72.2% io  0.0% irq  0.0% sirq
CPU:  0.9% usr  8.9% sys  0.0% nic  0.0% idle 90.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 31.6% sys  0.0% nic  0.0% idle 68.3% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 10.9% sys  0.0% nic 35.7% idle 53.2% io  0.0% irq  0.0% sirq
CPU:  0.0% usr  1.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq
CPU:  0.9% usr  0.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr  0.9% sys  0.0% nic 59.4% idle 39.6% io  0.0% irq  0.0% sirq
CPU:  0.9% usr  0.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr  0.9% sys  0.0% nic 99.0% idle  0.0% io  0.0% irq  0.0% sirq
CPU:  0.9% usr  0.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr  1.9% sys  0.0% nic 98.0% idle  0.0% io  0.0% irq  0.0% sirq


-When the transfer completes, stop the top command.

-Now add all the %sys numbers and divide by the number of output lines; this gives the average %cpu utilization over the file transfer.

This total Avg %sys should be roughly < 25%. Use it as the % CPU utilization, i.e. (25% * ARM frequency)
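The same awk approach works for the %sys method; the sample lines below are copied from the output above, and /tmp/cpu.log is an illustrative capture file:

```shell
cat > /tmp/cpu.log <<'EOF'
CPU:  0.0% usr  9.0% sys  0.0% nic 90.9% idle  0.0% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 16.8% sys  0.0% nic 63.3% idle 19.8% io  0.0% irq  0.0% sirq
CPU:  0.0% usr 30.0% sys  0.0% nic  0.0% idle 70.0% io  0.0% irq  0.0% sirq
EOF
# %sys is the 4th field ("9.0%"); strip the '%' sign before averaging.
awk '{ v = $4; sub(/%/, "", v); sum += v; n++ }
     END { printf "avg sys: %.1f%%\n", sum / n }' /tmp/cpu.log
```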


B. Using OPROFILE:

Here is the wiki link to get oprofile running:

[[1]]
