TensorFloat-32

TensorFloat-32 (TF32) is a floating-point number format designed for the Tensor Cores of certain Nvidia GPUs.

Format

The binary format is:

1 sign bit
8 exponent bits
10 fraction bits (also called mantissa, or precision bits)

The 19 bits in total fit within a double word (32 bits). Although TF32 has less precision than a standard 32-bit IEEE 754 floating-point number (10 fraction bits instead of 23), it provides much faster computation: up to 8 times faster on an A100 than FP32 on a V100.^[1]
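The bit layout above can be illustrated with a minimal Python sketch that starts from an ordinary FP32 value and keeps only the bits TF32 retains. The helper names to_tf32 and tf32_fields are hypothetical, and the sketch assumes simple truncation of the 13 low fraction bits; the rounding actually performed by Tensor Core hardware may differ.

```python
import struct

FP32_FRACTION_BITS = 23
TF32_FRACTION_BITS = 10
# Low-order fraction bits that FP32 has and TF32 does not (13).
DROPPED_BITS = FP32_FRACTION_BITS - TF32_FRACTION_BITS


def to_tf32(x: float) -> float:
    """Return x reduced to TF32 precision (assumption: truncation, not rounding)."""
    # Reinterpret the FP32 encoding of x as a 32-bit unsigned integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Clear the 13 low fraction bits; sign and exponent are untouched.
    bits &= 0xFFFFFFFF ^ ((1 << DROPPED_BITS) - 1)
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y


def tf32_fields(x: float) -> tuple:
    """Split x into the (sign, exponent, fraction) fields listed above."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31                          # 1 bit
    exponent = (bits >> 23) & 0xFF             # 8 bits, biased by 127
    fraction = (bits >> DROPPED_BITS) & 0x3FF  # top 10 fraction bits
    return sign, exponent, fraction
```

Under these assumptions, to_tf32(1 + 2**-10) is returned unchanged because it needs only 10 fraction bits, while to_tf32(1 + 2**-11) collapses to 1.0, which shows the precision lost relative to FP32.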

References

[1] https://deeprec.readthedocs.io/en/latest/NVIDIA-TF32.html (accessed 23 May 2024)

