Calculation Process:
1. Identify the bytes per param, based on the bit precision: float32
(4 bytes), float16/BF16 (2 bytes), int8 (1 byte), or int4 (0.5 bytes).
2. Calculate storage size: Multiply the number of parameters by the
bytes per param.
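3. Estimate training memory: as a rule of thumb, training needs 3-4x
the memory needed for inference, because gradients and optimizer
states are stored alongside the weights.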
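
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive estimator: the function names, the 7-billion-parameter model, and the use of 1 GB = 10^9 bytes are assumptions made for the example, and the estimate covers weights only (no activations or caches).

```python
# Step 1: bytes per parameter for each precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def inference_bytes(num_params: int, precision: str) -> float:
    """Step 2: storage size = number of parameters x bytes per param."""
    return num_params * BYTES_PER_PARAM[precision]


def training_bytes_range(num_params: int, precision: str) -> tuple[float, float]:
    """Step 3: apply the 3-4x rule of thumb to the inference estimate."""
    base = inference_bytes(num_params, precision)
    return 3 * base, 4 * base


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7-billion-parameter model
    for precision in BYTES_PER_PARAM:
        gb = inference_bytes(n, precision) / 1e9  # assumes 1 GB = 10^9 bytes
        print(f"{precision:>8}: {gb:.1f} GB")
    low, high = training_bytes_range(n, "float16")
    print(f"training (float16): {low / 1e9:.0f}-{high / 1e9:.0f} GB")
```

For the hypothetical 7-billion-parameter model, this gives about 28 GB at float32, 14 GB at float16/BF16, 7 GB at int8, and 3.5 GB at int4, with a rough training estimate of 42-56 GB at float16.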