IEEE Standard 754 for Floating Point Numbers
This article specifies how single precision (32 bit) and double precision (64 bit) floating point numbers are to be represented as per IEEE Standard 754.
Floating point Representation:
Floating-point representation represents real numbers in scientific notation. Scientific notation represents numbers as a base number and an exponent. For example, in decimal, 123.456 could be represented as 1.23456 × 10^2.
In binary, the number 1100.111 might be represented as 1.100111 × 2^3. Here, the value part, i.e. 1.100111, is referred to as the “mantissa” and the power part, i.e. 3, is called the “exponent”. The 2 here is referred to as the base of the exponent.
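As a rough illustration of the normalization described above, the following Python sketch splits a value into a mantissa in the range [1, 2) and a power of two. The helper name `normalize` is made up for this example; it relies on the standard `math.frexp`, which returns a mantissa in [0.5, 1), so the result is shifted by one binary place.

```python
import math

def normalize(x):
    """Return (mantissa, exponent) with 1 <= |mantissa| < 2 and
    x == mantissa * 2**exponent. Hypothetical helper for illustration."""
    if x == 0:
        return 0.0, 0
    m, e = math.frexp(x)      # frexp gives 0.5 <= |m| < 1
    return m * 2, e - 1       # shift one binary place to get 1.f form

value = 0b1100111 / 2**3      # binary 1100.111, i.e. 12.875 in decimal
mantissa, exponent = normalize(value)
print(mantissa, exponent)     # 1.609375 3  (1.609375 is binary 1.100111)
```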
From the storage layout point of view, floating point numbers have three components: the sign, the exponent, and the mantissa.
IEEE floating point numbers come in two sizes, 32-bit single precision and 64-bit double precision numbers. The layouts for the parts of a floating point number are:
| Single-Precision | Sign | Exponent | Fraction |
|---|---|---|---|
| Bit Positions | 31 | 30-23 | 22-00 |
| Number of bits | 1 | 8 | 23 |
| Bias | | 127 | |
| Double-Precision | Sign | Exponent | Fraction |
|---|---|---|---|
| Bit Positions | 63 | 62-52 | 51-00 |
| Number of bits | 1 | 11 | 52 |
| Bias | | 1023 | |
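To make the bit positions concrete, here is a minimal Python sketch that pulls the three fields out of a number using shifts and masks. The function names `float32_fields` and `float64_fields` are invented for this example; the packing itself uses the standard `struct` module.

```python
import struct

def float32_fields(x):
    """Split x (rounded to single precision) into
    (sign, stored_exponent, fraction). Illustrative helper."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30-23
    fraction = bits & 0x7FFFFF        # bits 22-0
    return sign, exponent, fraction

def float64_fields(x):
    """Same split for double precision. Illustrative helper."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                     # bit 63
    exponent = (bits >> 52) & 0x7FF       # bits 62-52
    fraction = bits & ((1 << 52) - 1)     # bits 51-0
    return sign, exponent, fraction

print(float32_fields(1.0))    # (0, 127, 0)
print(float64_fields(-2.0))   # (1, 1024, 0)
```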
The Sign
A zero in the sign bit indicates that the number is positive; a one indicates a negative number.
The Exponent
The exponent base 2 is implicit and is not stored. To keep things simple, the exponent is not stored as a signed number. Instead, a bias is added to the actual exponent to get the stored exponent. A single-precision number uses eight bits for the exponent, so the stored value ranges from zero to 255. The bias for single precision numbers is 127, so stored values would map to actual exponents from -127 through +128; the two extreme stored values are reserved (as noted below), which leaves actual exponents from -126 through +127 for ordinary numbers. Similarly, a double-precision number uses 11 bits for the exponent, the stored value ranges from zero to 2047, and the bias is 1023.
For example, in case of single precision float, a stored exponent of 150 means that the actual exponent is 23.
Note that exponent fields with all bits set to 0 or all bits set to 1 are reserved for special numbers.
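The bias arithmetic is simple enough to sketch in a few lines of Python. The constant and function names below are made up for illustration and are not part of any library.

```python
BIAS_SINGLE = 127    # bias for the 8-bit exponent field
BIAS_DOUBLE = 1023   # bias for the 11-bit exponent field

def true_exponent(stored, bias):
    """Actual exponent represented by a stored exponent field."""
    return stored - bias

def stored_exponent(true, bias):
    """Value written to the exponent field for an actual exponent."""
    return true + bias

def is_reserved(stored, exponent_bits):
    """True if the field is all zeros or all ones (special numbers)."""
    return stored == 0 or stored == (1 << exponent_bits) - 1

print(true_exponent(150, BIAS_SINGLE))    # 23
print(stored_exponent(0, BIAS_DOUBLE))    # 1023
print(is_reserved(255, 8), is_reserved(2047, 11), is_reserved(150, 8))
# True True False
```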
The Mantissa
The mantissa represents the precision bits of the number. It has an implicit leading bit (1) and the fraction bits.
To find out the value of the implicit leading bit, consider that a binary number can be expressed in scientific notation in different ways, like 1.0011 × 2^3 or 100.11 × 2^1. The mantissa is normalized so that the most significant digit is just to the left of the binary point. Since in binary the only possible non-zero digit is 1, the leading 1 does not need to be stored explicitly. As a result, the mantissa of a single precision float, for example, effectively has 24 bits of resolution while storing only 23 fraction bits.
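One way to see the 24 bits of resolution in practice is to round integers through single precision and check which ones survive. This is only a sketch; `to_float32` is a hypothetical helper built on the standard `struct` module.

```python
import struct

def to_float32(x):
    """Round a Python float to single precision and back.
    Illustrative helper, not a library function."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

fits = float(2**24 - 1)      # needs exactly 24 significant bits
too_wide = float(2**24 + 1)  # needs 25 significant bits

print(to_float32(fits) == fits)          # True
print(to_float32(too_wide) == too_wide)  # False: rounded to 2**24
```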
Thus,
The sign bit is 0 for positive, 1 for negative.
The exponent's base is two.
The exponent field contains the bias plus the true exponent.
The mantissa always looks like 1.f, where f is the field of fraction bits.
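Putting the points above together, the value of a normalized single precision number can be rebuilt from its three fields. The sketch below assumes the exponent field is neither all zeros nor all ones, since those cases are reserved for special numbers; the function name is invented for this example.

```python
def decode_normal_float32(sign, stored_exponent, fraction):
    """Value of a normalized single precision number from its fields.
    Illustrative helper; special exponent fields are rejected."""
    if not 0 < stored_exponent < 255:
        raise ValueError("exponent field reserved for special numbers")
    mantissa = 1 + fraction / 2**23          # the 1.f form
    return (-1) ** sign * mantissa * 2.0 ** (stored_exponent - 127)

print(decode_normal_float32(0, 127, 0))          # 1.0
print(decode_normal_float32(1, 128, 0x400000))   # -3.0
```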
Now, let’s see what goes into 32 bits of a single precision float when we assign 2865412.25 to it:
First, get the sign: since the sign of the number is positive, a 0 goes into the top bit.
Second, convert the number to binary: 2865412.25 is 1010111011100100000100.01 in binary.
Third, normalize the number: we can normalize the binary to 1.01011101110010000010001 × 2^21.
Fourth, store the exponent: adding in the bias, the exponent will be stored as 21 + 127 = 148, which is 10010100 in binary.
Fifth, store the mantissa: the mantissa is handled by dropping the most significant bit, leaving us with the 23 fraction bits 0101 1101 1100 1000 0010 001.
Thus, the result is stored in 32 bits as:

| Sign | Exponent | Mantissa |
|---|---|---|
| 0 | 10010100 | 01011101110010000010001 |

i.e. 4A2EE411 in hex.
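The result can be cross-checked with a few lines of Python. This is just one way to do it, using the standard `struct` module to get the raw 32 bits of the single precision value.

```python
import struct

# Pack 2865412.25 as a big-endian single precision float and
# reinterpret the same four bytes as an unsigned 32-bit integer.
bits = struct.unpack(">I", struct.pack(">f", 2865412.25))[0]

print(f"{bits:08X}")   # 4A2EE411
print(f"{bits >> 31:b} {(bits >> 23) & 0xFF:08b} {bits & 0x7FFFFF:023b}")
# 0 10010100 01011101110010000010001

# Going the other way recovers the original number exactly.
print(struct.unpack(">f", bytes.fromhex("4A2EE411"))[0])   # 2865412.25
```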
You can also visit the following link, which provides a visual tool to convert IEEE-754 hexadecimal representations to decimal floating-point numbers and vice versa, and shows the details of all the components:
https://babbage.cs.qc.edu/courses/cs341/IEEE-754.html
Check my next blog, which will contain details regarding the range of values for IEEE-754 floating point numbers, denormalized forms, NaN, and some simple algorithms for conversion between IEEE-754 hexadecimal representations and decimal floating-point numbers.