Computer Engineering / Computer Science 126 Supplemental Notes
Doug Sapp

Floating Point

A) Introduction

Computers are integer machines. In order to hold a number other than an integer, we need to develop a representation. The Institute of Electrical and Electronics Engineers (IEEE) has defined a widely used standard for floating point notation, which we will also use. As you will soon see, performing arithmetic operations on floating point numbers is computationally intensive. Performing these operations in software would be a great burden on the CPU, so they are instead performed by dedicated hardware called floating point units (FPUs) or math coprocessors.
B) Floating point numbers resemble scientific notation

A number in scientific notation, such as -1.2345678 x 10^3, has a sign, a mantissa (the significant digits), a base, and an exponent. Floating point numbers are stored in the same sign / mantissa / exponent form, just in binary.
C) Normalized numbers

Just as in scientific notation, floating point mantissas must be normalized. Normalized form means that there is only one digit in the ones place and the rest remain to the right of the decimal point. To normalize a number we adjust the exponent. To normalize -123.45678 x 10^1, all we have to do is shift the decimal point two places to the left, increasing the exponent by 1 each time we move the decimal point one place to the left, giving -1.2345678 x 10^3. The same idea applies to the denormalized number -0.0012345678 x 10^6, except this time we move the decimal point to the right and decrease the exponent each time, arriving at -1.2345678 x 10^3 again.

D) Floating Point Notation

We will focus on the IEEE 754 (32 bit single precision) floating point standard. Once you understand this notation you can adapt it to all other floating point notations. Floating point numbers are nothing more than binary numbers in a certain predefined form. The form consists of three main parts:
   BYTE 1  |  BYTE 2  |  BYTE 3  |  BYTE 4
  seeeeeee | emmmmmmm | mmmmmmmm | mmmmmmmm

  s = sign (1 bit)       0 = positive, 1 = negative
  e = exponent (8 bits)  stored with a bias of 127
  m = mantissa (23 bits) the hidden leading 1 is not stored

The exponent is stored with a bias of 127. This enables it to represent very small numbers along with very large numbers. When deriving the exponent you need to subtract 127 from the stored value to get the actual exponent. If your stored exponent is 10000000b (128d), subtract 127 to get the actual exponent of 1. If your stored exponent is 01111100b (124d), subtract 127 to get the actual exponent of -3. There are some special cases, which we will learn about later, that limit the exponent to a range of -126 to 127. Since we are dealing in binary, the base is 2, unlike scientific notation's base 10. The mantissa is 23 bits long with a hidden bit at the beginning. Remember what we said about normalizing numbers? The mantissa must begin with a 1. If it requires a 1 every time, why not assume it is always there and free up an extra bit? By doing this we effectively extend the mantissa to 24 bits: the assumed 1 plus the 23 stored bits. There are some special cases where the hidden bit is a 0, which we will cover later.

E) Conversions

Example: #E0781CF8h

   BYTE 1  |  BYTE 2  |  BYTE 3  |  BYTE 4
  11100000 | 01111000 | 00011100 | 11111000

Sign: 1 (negative)
Exponent: 11000000b = 192d, so the actual exponent is 192 - 127 = 65
Mantissa: 1111000 00011100 11111000, or 1.11110000001110011111000b with the hidden 1 restored

In general the value is (-1)^s x m x 2^e, where:

  s = the sign bit
  m = the 24 bit mantissa (hidden 1 included), read as 1.mmm...b
  e = the actual (unbiased) exponent

Here that gives -1.11110000001110011111000b x 2^65, which is about -7.15 x 10^19.
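One way to check a conversion like this is to let a short C program do the same field extraction. The sketch below assumes the fixed width types from <stdint.h> and an IEEE 754 single precision format, and must be linked with -lm for pow():

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Split a 32 bit single precision pattern into its three fields and
   rebuild the value by hand.  Normalized numbers only; the special
   cases (zeros, denormals, infinities, NaNs) are covered in section F. */
int main(void)
{
    uint32_t bits = 0xE0781CF8u;               /* the example pattern        */

    unsigned sign     = (bits >> 31) & 0x1;    /* 1 bit                      */
    unsigned exponent = (bits >> 23) & 0xFF;   /* 8 bits, biased by 127      */
    unsigned mantissa = bits & 0x7FFFFF;       /* 23 bits, hidden 1 missing  */

    int actual_exp = (int)exponent - 127;

    /* restore the hidden 1, then scale the 24 bit integer back to 1.mmm...b */
    double m = (double)(mantissa | 0x800000) / (double)0x800000;

    double value = (sign ? -1.0 : 1.0) * m * pow(2.0, actual_exp);

    printf("sign=%u exponent=%u (actual %d) mantissa=%06X\n",
           sign, exponent, actual_exp, mantissa);
    printf("value = %g\n", value);             /* about -7.15e+19            */
    return 0;
}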
Example: 1234.5678d

1234d = 10011010010b and 0.5678d is approximately 0.1001000101011b, so

  1234.5678d = 10011010010.1001000101011b = 1.00110100101001000101011b x 2^10 (normalized)

Sign: 0 (positive)
Exponent: 10 + 127 = 137d = 10001001b
Mantissa (hidden 1 dropped): 0011010 01010010 00101011

   BYTE 1  |  BYTE 2  |  BYTE 3  |  BYTE 4
  01000100 | 10011010 | 01010010 | 00101011

In hex: #449A522Bh
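The same conversion can be checked by letting the compiler produce the pattern and then dumping the raw bits. The sketch below copies the float's bytes into an integer, assuming (as on essentially all current machines) that float is the IEEE 754 32 bit format:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Convert 1234.5678 to single precision and dump the resulting fields. */
int main(void)
{
    float    f = 1234.5678f;
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);   /* reinterpret the 4 bytes, no conversion */

    printf("pattern : #%08Xh\n", (unsigned)bits);              /* #449A522Bh */
    printf("sign    : %u\n", (unsigned)(bits >> 31));
    printf("exponent: %u (actual %d)\n",
           (unsigned)((bits >> 23) & 0xFF),
           (int)((bits >> 23) & 0xFF) - 127);
    printf("mantissa: %06X (hidden 1 not stored)\n",
           (unsigned)(bits & 0x7FFFFF));
    return 0;
}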
F) Special Cases

Zeros - stored exponent of 0 and a mantissa of 0. The sign bit can be either value, so there is both a +0 (#00000000h) and a -0 (#80000000h).
Infinities - stored exponent of 255 (all 1s) and a mantissa of 0. The sign bit gives +infinity (#7F800000h) and -infinity (#FF800000h).
NaNs (Not a Number) - stored exponent of 255 and a nonzero mantissa. Produced by undefined operations such as 0/0.
Denormals - stored exponent of 0 and a nonzero mantissa. These are the special case where the hidden bit is a 0 instead of a 1, which allows numbers even smaller than the smallest normalized number.
As you can see, this is why the actual exponent of a normalized number can only go from -126 to 127. A stored exponent of 0 (which would be an actual exponent of -127) is reserved for zeros and denormals, and a stored exponent of 255 (an actual exponent of 128) is reserved for infinities and NaNs.
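These reserved patterns are easy to test for in code. The sketch below classifies a pattern purely from its exponent and mantissa fields, using the rules just described:

#include <stdio.h>
#include <stdint.h>

/* Classify a 32 bit pattern using only the reserved exponent values:
   stored exponent 0   -> zero or denormal (the hidden bit is 0)
   stored exponent 255 -> infinity or NaN
   anything else       -> an ordinary normalized number               */
static const char *classify(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFF;
    uint32_t mantissa = bits & 0x7FFFFF;

    if (exponent == 0)
        return mantissa == 0 ? "zero" : "denormal";
    if (exponent == 0xFF)
        return mantissa == 0 ? "infinity" : "NaN";
    return "normalized";
}

int main(void)
{
    uint32_t samples[] = {
        0x00000000u,   /* +0                       */
        0x80000000u,   /* -0                       */
        0x00000001u,   /* smallest denormal        */
        0x7F800000u,   /* +infinity                */
        0xFF800000u,   /* -infinity                */
        0x7FC00000u,   /* a NaN, e.g. from 0/0     */
        0x449A522Bu    /* 1234.5678 from section E */
    };
    int n = (int)(sizeof samples / sizeof samples[0]);

    for (int i = 0; i < n; i++)
        printf("#%08Xh -> %s\n", (unsigned)samples[i], classify(samples[i]));
    return 0;
}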
G) Other Notations

The same layout carries over to the other IEEE 754 formats; only the field widths and the bias change. Double precision (64 bit), for example, uses 1 sign bit, an 11 bit exponent biased by 1023, and a 52 bit mantissa.

H) Representation Problems

Arithmetic Overflow - the result of an operation is too large in magnitude to represent, i.e. it would need an actual exponent greater than 127. The result is returned as an infinity.
Arithmetic Underflow - the result of an operation is not zero but is too close to zero to represent, i.e. it falls below even the smallest denormal. The result is returned as zero.
Cancellation Error - when two nearly equal numbers are subtracted, their leading digits cancel and the result retains only the few low-order digits in which they differed, most of which are rounding noise.
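Cancellation is easy to demonstrate. In the sketch below the two values agree in their first six significant digits, and since single precision only carries about seven, almost nothing meaningful is left after the subtraction:

#include <stdio.h>

/* Subtract two nearly equal single precision numbers.  Near 1234 the
   representable values are spaced about 0.00012 apart, so the computed
   difference (about 0.00061) barely resembles the true answer 0.0007. */
int main(void)
{
    float a = 1234.5678f;
    float b = 1234.5671f;

    float diff = a - b;               /* the subtraction itself is exact, but
                                         it works on already-rounded inputs */

    printf("a - b = %.10f\n", diff);  /* prints roughly 0.0006103516        */
    return 0;
}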
I) Addition in Floating Point
Example: #7271A05Fh + #702B847Ch (4.785905E+30 + 2.123284E+29)

1) #702B847Ch is the smaller number, so shift its mantissa right until its exponent matches the larger number's exponent (the larger exponent is 101d, the smaller is 97d, so shift right 4 places).
2) Perform a 24 bit addition on the two mantissas (hidden bits included).
3) There is no carry out of the top bit, so the exponent stays the same. Combine the sign, exponent, and mantissa back together to form the new floating point answer: #727C58A6h = 4.998233E+30
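The same three steps can be carried out directly on the bit patterns. The sketch below assumes both operands are positive normalized numbers and simply drops the bits shifted out of the smaller mantissa, exactly as the worked example does:

#include <stdio.h>
#include <stdint.h>

/* Add two single precision patterns by hand, following the three steps
   above.  Sketch only: both inputs are assumed positive and normalized,
   there is no rounding, and no overflow or special-case handling. */
static uint32_t fp_add(uint32_t a, uint32_t b)
{
    /* make 'a' the operand with the larger magnitude */
    if ((b & 0x7FFFFFFF) > (a & 0x7FFFFFFF)) { uint32_t t = a; a = b; b = t; }

    uint32_t ea = (a >> 23) & 0xFF;
    uint32_t eb = (b >> 23) & 0xFF;

    /* 24 bit mantissas with the hidden 1 restored */
    uint32_t ma = (a & 0x7FFFFF) | 0x800000;
    uint32_t mb = (b & 0x7FFFFF) | 0x800000;

    /* step 1: shift the smaller mantissa right until the exponents match */
    mb >>= (ea - eb);

    /* step 2: 24 bit addition of the mantissas */
    uint32_t m = ma + mb;

    /* step 3: if the sum carried into bit 24, shift right once and bump
       the exponent; otherwise the exponent stays the same */
    if (m & 0x1000000) { m >>= 1; ea += 1; }

    /* drop the hidden bit and reassemble the fields */
    return (a & 0x80000000) | (ea << 23) | (m & 0x7FFFFF);
}

int main(void)
{
    uint32_t r = fp_add(0x7271A05Fu, 0x702B847Cu);
    printf("#%08Xh\n", (unsigned)r);   /* prints 727C58A6 */
    return 0;
}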
J) Multiplication in Floating Point

Example: #42421010h * #44003311h (48.51569 * 512.7979)

1) Add the two stored exponents and subtract the bias once, since each stored exponent already includes it: 132 + 136 - 127 = 141d.
2) Multiply the two 24 bit mantissas (hidden bits included) and keep the top 24 bits of the 48 bit product.
3) If the product's mantissa is 2 or greater, shift it right one place and add 1 to the exponent; here no adjustment is needed. Combine the sign, exponent, and mantissa back together.
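A sketch of these steps on the raw bit patterns (both inputs assumed normalized; the low bits of the 48 bit product are truncated rather than rounded):

#include <stdio.h>
#include <stdint.h>

/* Multiply two single precision patterns by hand: add the exponents
   (removing one copy of the bias), multiply the 24 bit mantissas,
   renormalize, and reassemble.  Sketch only: normalized inputs,
   truncation instead of rounding, no overflow or special cases. */
static uint32_t fp_mul(uint32_t a, uint32_t b)
{
    uint32_t sign = (a ^ b) & 0x80000000;            /* signs multiply    */
    int32_t  e = (int32_t)((a >> 23) & 0xFF)
               + (int32_t)((b >> 23) & 0xFF) - 127;  /* remove one bias   */

    uint64_t ma = (a & 0x7FFFFF) | 0x800000;         /* 24 bit mantissas  */
    uint64_t mb = (b & 0x7FFFFF) | 0x800000;

    /* 24 x 24 -> up to 48 bit product; p is the significand scaled by 2^46 */
    uint64_t p = ma * mb;

    /* renormalize: the product of two values in [1,2) lies in [1,4),
       so at most one right shift is needed */
    if (p & (1ULL << 47)) { p >>= 1; e += 1; }

    /* keep the top 24 bits, drop the hidden bit, reassemble */
    uint32_t m = (uint32_t)(p >> 23) & 0x7FFFFF;
    return sign | ((uint32_t)e << 23) | m;
}

int main(void)
{
    uint32_t r = fp_mul(0x42421010u, 0x44003311u);
    printf("#%08Xh\n", (unsigned)r);   /* prints 46C25D7C */
    return 0;
}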
Carrying the example through these steps gives #46C25D7Ch = 24878.7421875

This document is (c) 1998 Doug Sapp.