
Atomic UP test results.




using test-time-probe2.ko

Clock speed : cpu MHz         : 3000.077

Tracing inactive

[  125.787229] test init
[  125.787303] test results : time per probe
[  125.787306] number of loops : 20000
[  125.787309] total time : 204413
[  125.787312] test end
[  175.660402] test init
[  175.660475] test results : time per probe
[  175.660479] number of loops : 20000
[  175.660482] total time : 203468
[  175.660484] test end
[  179.337362] test init
[  179.337436] test results : time per probe
[  179.337440] number of loops : 20000
[  179.337443] total time : 204757
[  179.337446] test end

Res : 10.21 cycles per loop

Atomic UP, one trace, flight recorder.

[  357.983971] test init
[  357.988837] test results : time per probe
[  357.988843] number of loops : 20000
[  357.988846] total time : 12349013
[  357.988849] test end
[  358.718896] test init
[  358.723049] test results : time per probe
[  358.723053] number of loops : 20000
[  358.723057] total time : 12332497
[  358.723059] test end
[  359.422038] test init
[  359.426173] test results : time per probe
[  359.426179] number of loops : 20000
[  359.426182] total time : 12332535
[  359.426185] test end

Res : 616.90 cycles per loop.
205.63 ns per loop

Atomic SMP, one trace, flight.


[  111.694180] test init
[  111.700191] test results : time per probe
[  111.700198] number of loops : 20000
[  111.700201] total time : 16925670
[  111.700204] test end
[  112.285716] test init
[  112.291321] test results : time per probe
[  112.291326] number of loops : 20000
[  112.291329] total time : 16766633
[  112.291332] test end
[  112.880602] test init
[  112.884739] test results : time per probe
[  112.884743] number of loops : 20000
[  112.884746] total time : 12358237
[  112.884748] test end

Res : 767.51 cycles per loop
255.83 ns per loop

(205.63-255.83)/255.83 * 100% = 19.62 %


Difference between
cmpxchg 2967855/20000 = 148.39 cycles or 49.46 ns
cmpxchg-up 540577/20000 = 27.02 cycles or 9.00 ns
irq save/restore 12636562/20000 = 631.82 cycles 210.60 ns



* Memory ordering

offset
written by local CPU
read by local CPU and other CPUs (reader)

commit count
written by local CPU
read by local CPU and other CPUs (reader)

consumed
written by any CPU
read by any CPU

data
written by local CPU
read by any CPU


test done in the reader :
if ( consumed < offset )
  if ( subbuf.commit_count == multiple of SUBBUFSIZE)
    read data
    inc consumed


We must guarantee the following ordering :
* offset
Seen from the local CPU :
offset must always be incremented before the data is written (already
consistent)

Seen from other cpus :
offset and data can be written out of order
(because offset is always incremented : in an out of order case, offset is lower
than the actual data ready, but the commit_count _has_ to be incremented to read
the data (and is preceded by a store fence)

* commit_count
commit_count must always be seen by other CPUs after the data has been written.
Therefore, we must put a store fence before the commit_count write. (smp_wmb)

* consumed
Rarely updated, use LOCK prefix. Acts as a full memory barrier.



Mathieu Desnoyers, November 2006
