: ) wonderful world ( :

the metasyntactic variable

Archive for February 2008

floating point ST15

without comments

Let fl(x) denote the floating point number which is determined by a machine from x. Other than that, e0 is the smallest floating point number with radix a, t long significand, and k- as the smallest exponent: e0=ak--1. Let (1+e1) be the floating point number (which numbers are rational numbers) after 1, e1 equals to a1-t. The mathematical meaning of a floating point number, or the number it represents is like:

     t
   ----- m
 k \      i
a   )    --
   /      i
   ----- a
   i = 1

Theorem.

|fl(x)-x| is less then or equal to e1|x| if |x|>e0

Proof.

It is enough to see the proof for a x>0, the x<0 case is similar.
Note that x can be written in a form

   /                         \
   |   t        infinity     |
   | ----- m    --------- m  |
 k | \      i   \          i |
a  |  )    -- +  )        -- |
   | /      i   /          i |
   | ----- a    --------- a  |
   | i = 1      i = t + 1    |
   \                         /

After truncation the second sum’ll disappear so that is the distance of fl(x) and x. Now let’s see e1x:

   infinity
   -------- m
 k \         i - t + 1
a   )       ----------
   /         i
   -------- a
    i = t

Because of normalization, m1>0.

QuED..

Written by grault

February 3, 2008 - 12:01 pm at February 3, 2008 - 12:01 pm

Posted in math