Floating-point numbers in JavaScript

JavaScript’s Number type, which is essentially IEEE 754 basic 64-bit binary floating-point, has 53-bit significands. 52 bits are encoded in the “trailing significand” field. The leading bit is encoded via the exponent field (an exponent field of 1-2046 means the leading bit is one, an exponent field of 0 means the leading bit is zero, and an exponent field of 2047 is used for infinity or NaN).
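One way to see this layout concretely is to read the 8 bytes of a Number back out with a DataView. The following is a small illustrative sketch (float64Fields is a helper written for this article, not a built-in); DataView reads big-endian by default, so the first 32 bits hold the sign, the 11-bit exponent field, and the top 20 significand bits:

```javascript
// Inspect the IEEE 754 fields of a Number.
function float64Fields(x) {
  const buf = new ArrayBuffer(8);
  const view = new DataView(buf);
  view.setFloat64(0, x);                 // big-endian by default
  const hi = view.getUint32(0);          // sign + exponent + top 20 significand bits
  const lo = view.getUint32(4);          // low 32 significand bits
  const sign = hi >>> 31;
  const exponent = (hi >>> 20) & 0x7ff;  // 11-bit biased exponent field (0-2047)
  const significandHigh = hi & 0xfffff;  // top 20 of the 52 trailing bits
  return { sign, exponent, significandHigh, significandLow: lo };
}

console.log(float64Fields(1));        // exponent field 1023 (bias 1023, leading bit implied)
console.log(float64Fields(0.1));      // exponent field 1019, i.e. 2^-4 times the significand
console.log(float64Fields(Infinity)); // exponent field 2047
```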

Floating-point numbers are a method of representing real numbers in computing, particularly in programming languages like JavaScript. They are designed to handle a wide range of values, both very large and very small, within a fixed amount of memory.

In JavaScript and many other programming languages, floating-point numbers are the default way of representing and performing arithmetic operations on real numbers.

However, this representation also has some limitations, especially when performing operations that require high precision or when comparing floating-point values for equality:

Adding Large and Small Numbers

let largeNumber = 1e17; // 100000000000000000
let smallNumber = 0.1;
let sum = largeNumber + smallNumber;
console.log(sum); // still 1e17

In this example, the small number is lost entirely: near 1e17, adjacent representable doubles are 16 apart, so adding 0.1 cannot change the stored value at all. The result is not what exact arithmetic would give, purely because of the limited precision of floating-point representation.
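The gap between adjacent doubles near 1e17 can be demonstrated directly; anything below half that gap disappears, and anything above it rounds to the next representable value:

```javascript
// Near 1e17 adjacent doubles are 16 apart (2 ** (56 - 52)), so any addend
// smaller than half that gap is simply absorbed.
const big = 1e17;
console.log(big + 4 === big);        // true: 4 is below half the gap
console.log(big + 12 === big + 16);  // true: 12 rounds up to the next double
console.log(big + 0.1 === big);      // true: the 0.1 from the example is lost
```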

Subtracting Nearly Equal Numbers

let num1 = 0.3;             // 0.3
let num2 = 0.1 + 0.1 + 0.1; // 0.30000000000000004
let difference = num1 - num2;
console.log(difference);    // -5.551115123125783e-17

Here, even though num1 and num2 seem almost equal, the subtraction reveals a small error, on the order of 10^-17, caused by the way floating-point numbers are stored.

Multiplying Large and Small Numbers

let largeNum = 1e16;   // 10000000000000000
let smallNum = 0.07;   // stored as roughly 0.07000000000000000666
let product = largeNum * smallNum;
console.log(product);  // 700000000000000.1

In this case, the small factor cannot be represented exactly in binary, and multiplying it by a large number magnifies that tiny representation error until it becomes visible: the product prints as 700000000000000.1 instead of exactly 700000000000000.
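When exactness matters, currency being the classic case, a common workaround is to do the arithmetic on scaled integers, which are exact as long as they stay below 2^53, and divide back once at the end. A sketch with a hypothetical addMoney helper:

```javascript
// Scale decimals to integers (exact below 2 ** 53), do the arithmetic
// there, and divide back once at the end.
const SCALE = 100;  // work in hundredths, e.g. cents
function addMoney(a, b) {
  return (Math.round(a * SCALE) + Math.round(b * SCALE)) / SCALE;
}

console.log(0.1 + 0.2);           // 0.30000000000000004
console.log(addMoney(0.1, 0.2));  // 0.3
```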

Associative Property

let a = 0.1;
let b = 0.2;
let c = 0.3;

let result1 = (a + b) + c;
let result2 = a + (b + c);

console.log(result1);  // 0.6000000000000001
console.log(result2);  // 0.6

Even though mathematically the associative property should hold true, in floating-point arithmetic, the order of operations can sometimes lead to slightly different results.
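One standard mitigation for order-dependent rounding in sums is Kahan (compensated) summation, shown here as a minimal sketch rather than a library-grade implementation:

```javascript
// Kahan (compensated) summation: carry the rounding error of each addition
// in a separate compensation term instead of discarding it.
function kahanSum(values) {
  let sum = 0;
  let c = 0;            // running compensation for lost low-order bits
  for (const v of values) {
    const y = v - c;    // subtract the error carried from the last step
    const t = sum + y;  // big + small: low-order bits of y are lost here...
    c = (t - sum) - y;  // ...but can be recovered algebraically
    sum = t;
  }
  return sum;
}

console.log(kahanSum([0.1, 0.2, 0.3]));  // at least as close to 0.6 as any naive order
```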

Comparing Floating-Point Numbers

let num1 = 0.1 + 0.2;   // 0.30000000000000004
let num2 = 0.3;         // 0.3
console.log(num1 === num2);  // false
console.log(num1.toFixed(1) === num2.toFixed(1));  // true

Comparing floating-point numbers directly using === can lead to unexpected results because of these precision issues. Rounding both values to the same number of decimal places before comparing (as with toFixed above, which compares the resulting strings) is one crude workaround; comparing against a small tolerance is usually more robust.
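A tolerance-based comparison along these lines is sketched below. approxEqual is an illustrative helper, not a built-in, and the factor of 4 on Number.EPSILON is an arbitrary choice:

```javascript
// Compare within a tolerance instead of exactly. Number.EPSILON (2 ** -52)
// is the gap between 1 and the next representable double, so scale it by
// the magnitude of the operands for a relative comparison.
function approxEqual(a, b, relTol = Number.EPSILON * 4) {
  return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
}

console.log(approxEqual(0.1 + 0.2, 0.3));  // true
console.log(approxEqual(0.1, 0.2));        // false
```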

Let's delve into the details of why numbers like 0.1 and 0.2 cannot be exactly represented in binary floating-point format due to their decimal expansions.

Binary Floating-Point Representation

In computers, floating-point numbers are stored using a binary representation, which means they are represented as a sum of powers of 2 rather than powers of 10 as in our decimal number system. This binary representation is based on the IEEE 754 standard, which uses a combination of a significand (also known as mantissa) and an exponent.

Decimal to Binary Conversion

When we write decimal numbers like 0.1 and 0.2, they have non-terminating, recurring binary expansions. This means that when you try to convert these decimal fractions to binary fractions, the conversion process never truly ends. The recurring part continues indefinitely.

Rounding Error

Since computers have finite memory (a fixed number of bits) to represent floating-point numbers, they have to round these infinite binary expansions to a finite number of bits. This rounding introduces a small error known as rounding error or precision error.

Example

Let's focus on the number 0.1: in binary it is the recurring fraction 0.000110011001100110011..., with the block 0011 repeating forever.

However, due to the limited number of bits available in the IEEE 754 double-precision format, the actual stored binary representation is an approximation of this recurring binary expansion. This approximation is very close, but not exact.
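You can observe this directly in JavaScript: toString(2) prints the binary expansion of the value that is actually stored.

```javascript
// toString(2) shows the binary expansion of the stored double. For 0.1 the
// repeating 0011 blocks are visible, but the expansion terminates: the last
// bits are where the infinite tail was rounded off.
console.log((0.1).toString(2));  // 0.000110011001100110011..., then it stops
console.log((0.5).toString(2));  // 0.1 exactly: one half is a power of two
```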

When you perform arithmetic operations on these approximated binary representations, such as addition, the precision error accumulates and can lead to small discrepancies in the result.
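A minimal demonstration of that accumulation:

```javascript
// Adding 0.1 ten times does not give exactly 1: each addition rounds,
// and the per-step errors accumulate.
let total = 0;
for (let i = 0; i < 10; i++) {
  total += 0.1;
}
console.log(total);        // 0.9999999999999999
console.log(total === 1);  // false
```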