Monday, July 10, 2017

Conversion of a number from Single precision floating point representation to a Half precision floating point



Hope it is relevant here.


I have code that needs to work with half-precision floating-point numbers. To do that, I wrote my own C++ class, fp16, and overloaded all the operators for this type (arithmetic, logical, relational) with custom functions, including the cases where a single-precision floating-point number is used together with a half-precision one.


Half precision floating point = 1 Sign bit, 5 exponent bits, 10 significand bits = 16 bits


Single precision floating point = 1 Sign bit, 8 exponent bits, 23 significand bits = 32 bits
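To make the two layouts concrete, here is a minimal sketch of pulling the three fields out of a single-precision value. It assumes `float` is a 32-bit IEEE 754 type on the target platform; `Fp32Fields` and `split_fp32` are hypothetical names used only for illustration, not part of my fp16 class.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the three bit fields of a 32-bit IEEE 754 float.
struct Fp32Fields {
    std::uint32_t sign;         // 1 bit
    std::uint32_t exponent;     // 8 bits, stored with a bias of 127
    std::uint32_t significand;  // 23 bits (the leading 1 is implicit)
};

// Assumes float is 32-bit IEEE 754. memcpy is used to read the bit pattern
// without violating aliasing rules.
inline Fp32Fields split_fp32(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

For example, `split_fp32(1.0f)` gives a stored exponent of 127 (that is, 0 after removing the bias) and an all-zero significand field.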


This is what I do to convert a single-precision floating-point number to a half-precision floating-point number:


For the significand bits, I use truncation, i.e. I lose the low 13 bits of the 23-bit significand to get the 10-bit significand of the half-precision float.
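As a small sketch of that truncation step (the function name `truncate_significand` is hypothetical, and the input is assumed to be the 23-bit significand field already masked out of the float):

```cpp
#include <cstdint>

// Keep the top 10 of the 23 single-precision significand bits.
// The low 13 bits are simply discarded, which rounds toward zero.
inline std::uint16_t truncate_significand(std::uint32_t significand23) {
    return static_cast<std::uint16_t>((significand23 & 0x7FFFFFu) >> 13);
}
```

Truncation is the simplest choice; rounding to nearest would also have to look at the discarded bits and could carry into the exponent.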


What should I do to handle the exponent bits? How do I go from 8 exponent bits to 5 exponent bits?
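For reference, one common approach is to re-bias the exponent: the stored single-precision exponent has a bias of 127 and the half-precision one a bias of 15, so the new stored exponent is the old one minus 127 plus 15, with special handling when the result falls outside the 5-bit range. The following is only a sketch under stated assumptions: `float` is 32-bit IEEE 754, the significand is truncated rather than rounded, and `float_to_half_bits` is a hypothetical free function, not the API of any particular library.

```cpp
#include <cstdint>
#include <cstring>

// Sketch: convert a 32-bit IEEE 754 float to a half-precision bit pattern,
// truncating the significand (round toward zero).
inline std::uint16_t float_to_half_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);

    const std::uint32_t sign  = (bits >> 16) & 0x8000u;  // sign moved to bit 15
    const std::uint32_t exp32 = (bits >> 23) & 0xFFu;    // stored exponent, bias 127
    const std::uint32_t man32 = bits & 0x7FFFFFu;        // 23-bit significand field

    if (exp32 == 0xFFu) {
        // Infinity or NaN: all-ones exponent. Force a non-zero significand
        // for NaN so that truncation cannot turn it into infinity.
        const std::uint32_t man16 = man32 ? (0x200u | (man32 >> 13)) : 0u;
        return static_cast<std::uint16_t>(sign | 0x7C00u | man16);
    }

    // Re-bias: remove the single-precision bias, add the half-precision bias.
    const std::int32_t exp16 = static_cast<std::int32_t>(exp32) - 127 + 15;

    if (exp16 >= 31) {
        // Magnitude too large for half precision: overflow to infinity.
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    if (exp16 <= 0) {
        // Too small for a normal half: subnormal result, or signed zero.
        if (exp16 < -10) {
            return static_cast<std::uint16_t>(sign);
        }
        const std::uint32_t man = man32 | 0x800000u;  // make the implicit 1 explicit
        return static_cast<std::uint16_t>(sign | (man >> (14 - exp16)));
    }
    // Normal case: 5-bit exponent, top 10 significand bits.
    return static_cast<std::uint16_t>(
        sign | (static_cast<std::uint32_t>(exp16) << 10) | (man32 >> 13));
}
```

In this sketch, values whose re-biased exponent does not fit in 5 bits become infinity, and values too small for a normal half become subnormals or zero. Single-precision subnormals are far below the half-precision range and end up as signed zero.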


Any good reading material would help.



