Closed
Description
This doesn't look correct. The sign plus exponent takes 6 bits in a 16-bit float (IEEE 754 half precision) and 9 bits in a 32-bit float. If you simply shift the bits by 16, part of the half-precision mantissa ends up in the single-precision exponent, and in the reverse conversion part of the single-precision exponent ends up in the half-precision mantissa.
impl Into<f32> for BFloat16 {
    fn into(self) -> f32 {
        unsafe {
            // Assumes that the architecture uses IEEE-754 natively for floats
            // and twos-complement for integers.
            mem::transmute::<u32, f32>((self.0 as u32) << 16)
        }
    }
}
impl From<f32> for BFloat16 {
    fn from(value: f32) -> Self {
        unsafe {
            // Assumes that the architecture uses IEEE-754 natively for floats
            // and twos-complement for integers.
            BFloat16((mem::transmute::<f32, u32>(value) >> 16) as u16)
        }
    }
}
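For reference, here is a minimal sketch of the bit layouts involved. It assumes the type follows the bfloat16 layout (1 sign, 8 exponent, 7 mantissa bits) rather than IEEE 754 binary16 (1 sign, 5 exponent, 10 mantissa bits). Under that assumption the 16-bit shift keeps the sign and exponent fields aligned with `f32`; for binary16 it would not. The `BFloat16` struct below is a stand-in for the quoted type, the helper names are hypothetical, and the safe `f32::from_bits` / `f32::to_bits` are swapped in for `mem::transmute`.

```rust
/// Hypothetical stand-in for the `BFloat16` wrapper quoted above.
#[derive(Clone, Copy, Debug, PartialEq)]
struct BFloat16(u16);

/// Widen by placing the 16 stored bits in the upper half of an f32.
fn bf16_to_f32(x: BFloat16) -> f32 {
    f32::from_bits((x.0 as u32) << 16)
}

/// Narrow by dropping the low 16 mantissa bits (truncation, no rounding).
fn f32_to_bf16(v: f32) -> BFloat16 {
    BFloat16((v.to_bits() >> 16) as u16)
}

/// Split a 16-bit pattern, assuming the bfloat16 layout:
/// (sign, exponent, mantissa) = (1, 8, 7) bits.
fn bf16_fields(bits: u16) -> (u16, u16, u16) {
    (bits >> 15, (bits >> 7) & 0xFF, bits & 0x7F)
}

/// Split an f32 pattern: (sign, exponent, mantissa) = (1, 8, 23) bits.
fn f32_fields(bits: u32) -> (u32, u32, u32) {
    (bits >> 31, (bits >> 23) & 0xFF, bits & 0x7F_FFFF)
}

fn main() {
    let v = -1.5f32;
    let b = f32_to_bf16(v);

    // Compare the sign and exponent fields before and after narrowing.
    println!("f32  fields: {:?}", f32_fields(v.to_bits()));
    println!("bf16 fields: {:?}", bf16_fields(b.0));

    // Widen again; values whose low 16 mantissa bits are zero round-trip exactly.
    println!("round trip:  {}", bf16_to_f32(b));
}
```

If the type were meant to be IEEE 754 binary16 instead, the conversion would need to re-bias the exponent and shift the mantissa separately, which this sketch does not attempt.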