Compensated (Kahan) summation and compilers  
Author Message
braams@mathcs.emory.edu





PostPosted: 2008-6-15 8:04:36 Top

fortran, Compensated (Kahan) summation and compilers Hello All,

I am interested to employ compensated summation (Kahan summation) in a
few subsections of my Fortran-95 code. The code is used on a number
of different platforms and is shared with colleagues who compile it at
their own institutions, now and in the future, also on new machines
and using new compiler releases. I can't be looking over their
shoulders all the time.

You can guess where this is going. I need to write the Fortran in a
manner that is robust against possible compiler treatments, and I
wonder if anyone can offer experience with this. I also need high
efficiency; if the Kahan summation would cost more than a small factor
then I can't afford it, and then I must live with standard double
precision. For that reason I think I must use in-line code. I like
to assume that the compiler conforms to the Fortran standard, so that
if explicit parentheses are used in the code then those won't be
broken. However, the compiler may certainly optimize temporary
variables into registers, and I don't believe that the Fortran
standard then prohibits the register arithmetic from being, say, 80-
bit accurate, with variables truncated to 64 bits when stored.

The setting where I need the Kahan summation is in an accumulation.
There is no parallellism involved. In the following all real
variables are double precision real.

s = 0
do while (..)
x = ... ! computed here
s = s+x
enddo

I've been testing various implementations of Kahan summation for the
special case of an inner product, basically a Kahan version of the
dot_product intrinsic. For anyone who wants to do experiments I offer
my small, uncommented code here below. Functions xdot, ydot, and zdot
all implement Kahan summation, function udot uses standard summation,
and the test code also simply calls the dot_product intrinsic.

Using either Sun Workshop Fortran or Intel Fortran, functions xdot,
ydot and zdot all behave correctly, returning an accurate inner
product, when optimization is turned off at the compilation stage.
With Sun Fortran all three still give the expected good result when
compilation is done with -O5, but when the -fast option is used only
zdot behaves correctly. Here is the output for that case.

> f90 -fast xdot-experiments.f90 ; ./a.out
f90: Warning: -xarch=native has been explicitly specified, or
implicitly specified by a macro option, -xarch=native on this
architecture implies -xarch=sparcvis2 which generates code that does
not run on pre UltraSPARC III processors
dot_product: 512.2501221299369 -81.9599609375
xdot: 512.2501221299312 -131.935546875
ydot: 512.2501221299369 -81.9599609375
zdot: 512.2501221299462 0.0E+0
udot: 512.2501221299369 -81.9599609375
exact: 512.2501221299462
also exact: 512.2501221299462
final term: 1.0664280010966473E-17

This gave me some hope that with zdot I had found an implementation
that is robust against compiler treatments, but it didn't survive the
Ifort test. With Ifort and option -O0, again all of xdot, ydot and
zdot behave correctly in this test, but all fail when -O1 is used.
Here is the output for the case of option -O0.

$ ifort -O0 xdot-experiments.f90 ; ./a.out
dot_product: 512.250122129931 -130.936035156250
xdot: 512.250122129946 0.000000000000000E+000
ydot: 512.250122129946 0.000000000000000E+000
zdot: 512.250122129946 0.000000000000000E+000
udot: 512.250122129931 -130.936035156250
exact: 512.250122129946
also exact: 512.250122129946
final term: 1.066428001096647E-017

I'll appreciate advice about implementating Kahan summation in
Fortran. The code must be correct, I value efficiency in execution,
and I value it if I can use in-line code and then not have to worry
with every new release of a compiler or every transfer to a new
platform that the code will no longer behave correctly.

Here is the little test program. It evaluates the inner product
between the identical vectors x and y of which the general element is
x(i) = t0**i (0.le.i.lt.n); in the initialization phase the elements
are computed by a nonstandard recursion as will be seen. The
summation is in the natural order, 0.le.i.lt.n, so in order of
decreasing value of the addends. This way it is a good test of the
benefit of Kahan summation.

program main
implicit none
integer, parameter :: wp=selected_real_kind(12,300)
integer, parameter :: n=20000
integer :: i
real (kind=wp) :: x(0:n-1), y(0:n-1), t0, t1
t0 = 1023/1024.0_wp
t1 = (1-t0**(2*n))/(1-t0**2)
x(0) = 1 ; x(1) = t0
do i = 2, n-1
x(i) = x(i/2)*x((i+1)/2)
enddo
y = x
write (*,*) 'dot_product:', dot_product(x,y), &
(dot_product(x,y)-t1)/(epsilon(t1)*t1)
write (*,*) 'xdot: ', xdot(x,y), &
(xdot(x,y)-t1)/(epsilon(t1)*t1)
write (*,*) 'ydot: ', ydot(x,y), &
(ydot(x,y)-t1)/(epsilon(t1)*t1)
write (*,*) 'zdot: ', zdot(x,y), &
(zdot(x,y)-t1)/(epsilon(t1)*t1)
write (*,*) 'udot: ', udot(x,y), &
(udot(x,y)-t1)/(epsilon(t1)*t1)
write (*,*) 'exact: ', t1
write (*,*) 'also exact: ', (1-x(n/2)**2*x((n+1)/2)**2)/(1-x(1)**2)
write (*,*) 'final term: ', x(n-1)*y(n-1)
stop
contains
function xdot (x, y)
real (kind=wp), intent (in) :: x(0:), y(0:)
real (kind=wp) :: xdot
integer :: i
real (kind=wp) :: s, e, t0, t1
s = 0 ; e = 0
do i = 0, size(x)-1
t0 = x(i)*y(i)+e ; t1 = s+t0
e = t0-(t1-s)
s = t1
enddo
xdot = s+e
return
end function xdot
function ydot (x, y)
real (kind=wp), intent (in) :: x(0:), y(0:)
real (kind=wp) :: ydot
integer :: i
real (kind=wp) :: s, e, t0
s = 0 ; e = 0
do i = 0, n-1
t0 = x(i)*y(i)+e
e = t0-((s+t0)-s)
s = s+t0
enddo
ydot = s+e
return
end function ydot
function zdot (x, y)
real (kind=wp), intent (in) :: x(0:), y(0:)
real (kind=wp) :: zdot
integer :: i
real (kind=wp) :: s, e, t0, t1
s = 0 ; e = 0
do i = 0, size(x)-1
t0 = x(i)*y(i)+e ; t1 = s ; s = s+t0
e = t0-(s-t1)
enddo
zdot = s+e
return
end function zdot
function udot (x, y)
real (kind=wp), intent (in) :: x(0:), y(0:)
real (kind=wp) :: udot
integer :: i
real (kind=wp) :: s
s = 0
do i = 0, size(x)-1
s = s+x(i)*y(i)
enddo
udot = s
return
end function udot
end

Bas Braams


 
Tim Prince





PostPosted: 2008-6-14 21:43:00 Top

fortran >> Compensated (Kahan) summation and compilers braams@mathcs.emory.edu wrote:
> Hello All,
>
> I am interested to employ compensated summation (Kahan summation) in a
> few subsections of my Fortran-95 code. The code is used on a number
> of different platforms and is shared with colleagues who compile it at
> their own institutions, now and in the future, also on new machines
> and using new compiler releases. I can't be looking over their
> shoulders all the time.
>
> You can guess where this is going. I need to write the Fortran in a
> manner that is robust against possible compiler treatments, and I
> wonder if anyone can offer experience with this. I also need high
> efficiency; if the Kahan summation would cost more than a small factor
> then I can't afford it, and then I must live with standard double
> precision. For that reason I think I must use in-line code. I like
> to assume that the compiler conforms to the Fortran standard, so that
> if explicit parentheses are used in the code then those won't be
> broken. However, the compiler may certainly optimize temporary
> variables into registers, and I don't believe that the Fortran
> standard then prohibits the register arithmetic from being, say, 80-
> bit accurate, with variables truncated to 64 bits when stored.
>
> The setting where I need the Kahan summation is in an accumulation.
> There is no parallellism involved. In the following all real
> variables are double precision real.
>
> s = 0
> do while (..)
> x = ... ! computed here
> s = s+x
> enddo
>
> I've been testing various implementations of Kahan summation for the
> special case of an inner product, basically a Kahan version of the
> dot_product intrinsic. For anyone who wants to do experiments I offer
> my small, uncommented code here below. Functions xdot, ydot, and zdot
> all implement Kahan summation, function udot uses standard summation,
> and the test code also simply calls the dot_product intrinsic.
>
> Using either Sun Workshop Fortran or Intel Fortran, functions xdot,
> ydot and zdot all behave correctly, returning an accurate inner
> product, when optimization is turned off at the compilation stage.
> With Sun Fortran all three still give the expected good result when
> compilation is done with -O5, but when the -fast option is used only
> zdot behaves correctly. Here is the output for that case.
>
>> f90 -fast xdot-experiments.f90 ; ./a.out
> f90: Warning: -xarch=native has been explicitly specified, or
> implicitly specified by a macro option, -xarch=native on this
> architecture implies -xarch=sparcvis2 which generates code that does
> not run on pre UltraSPARC III processors
> dot_product: 512.2501221299369 -81.9599609375
> xdot: 512.2501221299312 -131.935546875
> ydot: 512.2501221299369 -81.9599609375
> zdot: 512.2501221299462 0.0E+0
> udot: 512.2501221299369 -81.9599609375
> exact: 512.2501221299462
> also exact: 512.2501221299462
> final term: 1.0664280010966473E-17
>
> This gave me some hope that with zdot I had found an implementation
> that is robust against compiler treatments, but it didn't survive the
> Ifort test. With Ifort and option -O0, again all of xdot, ydot and
> zdot behave correctly in this test, but all fail when -O1 is used.
> Here is the output for the case of option -O0.
>
> $ ifort -O0 xdot-experiments.f90 ; ./a.out
> dot_product: 512.250122129931 -130.936035156250
> xdot: 512.250122129946 0.000000000000000E+000
> ydot: 512.250122129946 0.000000000000000E+000
> zdot: 512.250122129946 0.000000000000000E+000
> udot: 512.250122129931 -130.936035156250
> exact: 512.250122129946
> also exact: 512.250122129946
> final term: 1.066428001096647E-017
>
> I'll appreciate advice about implementating Kahan summation in
> Fortran. The code must be correct, I value efficiency in execution,
> and I value it if I can use in-line code and then not have to worry
> with every new release of a compiler or every transfer to a new
> platform that the code will no longer behave correctly.
>
> Here is the little test program. It evaluates the inner product
> between the identical vectors x and y of which the general element is
> x(i) = t0**i (0.le.i.lt.n); in the initialization phase the elements
> are computed by a nonstandard recursion as will be seen. The
> summation is in the natural order, 0.le.i.lt.n, so in order of
> decreasing value of the addends. This way it is a good test of the
> benefit of Kahan summation.
>
> program main
> implicit none
> integer, parameter :: wp=selected_real_kind(12,300)
> integer, parameter :: n=20000
> integer :: i
> real (kind=wp) :: x(0:n-1), y(0:n-1), t0, t1
> t0 = 1023/1024.0_wp
> t1 = (1-t0**(2*n))/(1-t0**2)
> x(0) = 1 ; x(1) = t0
> do i = 2, n-1
> x(i) = x(i/2)*x((i+1)/2)
> enddo
> y = x
> write (*,*) 'dot_product:', dot_product(x,y), &
> (dot_product(x,y)-t1)/(epsilon(t1)*t1)
> write (*,*) 'xdot: ', xdot(x,y), &
> (xdot(x,y)-t1)/(epsilon(t1)*t1)
> write (*,*) 'ydot: ', ydot(x,y), &
> (ydot(x,y)-t1)/(epsilon(t1)*t1)
> write (*,*) 'zdot: ', zdot(x,y), &
> (zdot(x,y)-t1)/(epsilon(t1)*t1)
> write (*,*) 'udot: ', udot(x,y), &
> (udot(x,y)-t1)/(epsilon(t1)*t1)
> write (*,*) 'exact: ', t1
> write (*,*) 'also exact: ', (1-x(n/2)**2*x((n+1)/2)**2)/(1-x(1)**2)
> write (*,*) 'final term: ', x(n-1)*y(n-1)
> stop
> contains
> function xdot (x, y)
> real (kind=wp), intent (in) :: x(0:), y(0:)
> real (kind=wp) :: xdot
> integer :: i
> real (kind=wp) :: s, e, t0, t1
> s = 0 ; e = 0
> do i = 0, size(x)-1
> t0 = x(i)*y(i)+e ; t1 = s+t0
> e = t0-(t1-s)
> s = t1
> enddo
> xdot = s+e
> return
> end function xdot
> function ydot (x, y)
> real (kind=wp), intent (in) :: x(0:), y(0:)
> real (kind=wp) :: ydot
> integer :: i
> real (kind=wp) :: s, e, t0
> s = 0 ; e = 0
> do i = 0, n-1
> t0 = x(i)*y(i)+e
> e = t0-((s+t0)-s)
> s = s+t0
> enddo
> ydot = s+e
> return
> end function ydot
> function zdot (x, y)
> real (kind=wp), intent (in) :: x(0:), y(0:)
> real (kind=wp) :: zdot
> integer :: i
> real (kind=wp) :: s, e, t0, t1
> s = 0 ; e = 0
> do i = 0, size(x)-1
> t0 = x(i)*y(i)+e ; t1 = s ; s = s+t0
> e = t0-(s-t1)
> enddo
> zdot = s+e
> return
> end function zdot
> function udot (x, y)
> real (kind=wp), intent (in) :: x(0:), y(0:)
> real (kind=wp) :: udot
> integer :: i
> real (kind=wp) :: s
> s = 0
> do i = 0, size(x)-1
> s = s+x(i)*y(i)
> enddo
> udot = s
> return
> end function udot
> end
>
> Bas Braams
>
>
Kahan summation will certainly cost "more than a small factor" compared
with "vectorized" dot product reduction, i.e. a factor of 3 or more.
In order to get consistent results among a wide variety of compilers with
Kahan summation of double precision, without removing optimization, I
declare the critical intermediate values as elements of a COMMON block.
 
robert.corbett





PostPosted: 2008-6-15 13:00:00 Top

fortran >> Compensated (Kahan) summation and compilers On Jun 14, 5:04 pm, "bra...@mathcs.emory.edu"
<bra...@mathcs.emory.edu> wrote:

> With Sun Fortran all three still give the expected good result when
> compilation is done with -O5, but when the -fast option is used only
> zdot behaves correctly.

The option -fast implies two other options, -fns and -fsimple=2,
that can cause problems for codes that depend on the exact
nature of the arithmetic being done. In this case, my guess is
that it is the option -fsimple=2 that is causing the problem.
Try the option combination

-fast -fsimple=1

Order is important. When there are conflicting options, the
last option wins.

Bob Corbett
 
 
Craig Powers





PostPosted: 2008-6-17 4:03:00 Top

fortran >> Compensated (Kahan) summation and compilers braams@mathcs.emory.edu wrote:
> With Sun Fortran all three still give the expected good result when
> compilation is done with -O5, but when the -fast option is used only
> zdot behaves correctly.

As a general rule, compilers tend to offer "extreme" optimization
options that may do things that are unsafe mathematically. I don't
think you can entirely protect against that---I'd tend to think that you
would need to document a restriction against certain classes of
optimizations. (Users who don't know what risk is entailed with extreme
speed optimizations shouldn't be using them at all, and I don't know
that it makes sense to try to protect them from themselves.)
 
 
Jan Vorbr黦gen





PostPosted: 2008-6-17 18:18:00 Top

fortran >> Compensated (Kahan) summation and compilers > As a general rule, compilers tend to offer "extreme" optimization
> options that may do things that are unsafe mathematically. I don't
> think you can entirely protect against that---I'd tend to think that you
> would need to document a restriction against certain classes of
> optimizations. (Users who don't know what risk is entailed with extreme
> speed optimizations shouldn't be using them at all, and I don't know
> that it makes sense to try to protect them from themselves.)

The IBM XLF compiler goes so far to issue a warning when these options
are used, telling you that results might change from what they were
without the use of those options. And yes, strange results will happen...

Jan