Integers#

How they are stored inside of your computers and how to manipulate them with care

Notebook Setup#

from ipywidgets import interact

Introduction#

Integers are the bread and butter of every computer program. In fact, for a long time, there was no floating points in computers and they were the only available number representation (check floating point notebook for more detail). One big advantage is that they are reliable, easy to store and to understand as we will see together. Some languages even gives access to their binary representation to allow some advanced operation by the programmer.


I - How it works#

You might already know that computers stores values with 1 and 0, but do you really know how it is handled ?

A - Memory storage#

Computers get their data from Hard drives](https://www.explainthatstuff.com/harddrive.html) or SSD when they boot up. With both these [storage methods, data is layed out and read with only binary, 0 and 1 values. If you put 8 bits together it is called a byte. If you continue to group bytes together, you access common measurements for memory. 1 000 bytes are 1 Kilo Byte (KB), and a Mega Byte (MB) is 1 000 KBs, this continue onward.

Note that there is a parrallel measurement unit which has a different basis : 1 024 bytes are 1 Kibi Byte (Kio). It is the binary basis opposed to the decimal one.

Note

A bit is a value that can be in only 2 states, true or false, 0 or 1.
Is is a shorthand name for binary digits

It is no surprise that memory works in binary as the base unit of operation in computers are transistors](https://electronics.howstuffworks.com/transistor.htm). It is a physical electronic component that can be in 2 states, open or closed, the electricity goes through, or is blocked. Transistors are used to create [logic gates which allow to make simple operations on binary values. With enough of those gates, you can make mathematical operation on your binary digits. Any operation made on the computer will be deduced as a mathematical operation made on these logic circuit.

Note

A logic circuit is a system composed of logic gates.
With the basic operations of logic gates, once put in serie, you can make any mathematical operation.

B - Binary to decimal#

Everything in computer are in binary (base 2), but we human, works with the decimal basis. Fortunately is is easy to get the decimal value (base 10) from a binary number.

For example, the number 1101

1

-

1

-

0

-

1

(base 2)

\( 1 \times 2^3\)

+

\( 1 \times 2^2 \)

+

\( 0 \times 2^1 \)

+

\( 1 \times 2^0 \)

(base 10)

\( 1 \times 8 \)

+

\( 1 \times 4 \)

+

\( 0 \times 2 \)

+

\( 1 \times 1 \)

(base 10)

\( 8 \)

+

\( 4 \)

+

\( 0 \)

+

\( 1 \)

(base 10)

\[ = 13 \]

As you see, each digits is multiplied by 2 with a power depending its position. Then the result is summed. The general formula for this process is :

\[ x = \sum_{i=0}^{m} a_i 2^{m-i} \]

And there is the implementation of the formula :

@interact(binary = '1101')
def binaryToDecimal(binary: str):
    array = [int(x) for x in str(binary)]
    x = 0
    for i, a in enumerate(array):
        x += a * pow(2, (len(array) - 1) - i)
    print("Decimal representation of " + binary + " is : " + str(x))

Operation made in binary basis are still valid when we represent them in decimal. So every operation in computers are made in binary format and then translated for our eyes. For exemple let’s see the binary addition between 1011 and 0010, respectively 11 and 2 in decimal :

\[\begin{split} 1011 \\ + 0010 \\ = 1101 \end{split}\]

As you can see, we obtain 13.

C - Decimal to binary#

To obtain the binary representation of a decimal number, you have to divide the number by 2 and keep the remainder. Put together, the remainders are be the binary representation.

@interact(decimal = (0, 26))
def decimalToBinary(decimal : int) -> str:
    binary = ''
    while decimal > 0:
        binary += str(decimal % 2)
        decimal = decimal // 2
    return binary[::-1] # Reverse string

II - Limitations#

Even though they are great, integers have some caveats by design

A - Size limits#

Computer memory has a limit, and so must be the variable size. Many languages are using different types of integers variables with different max sizes. If you store the number 2 with a unsigned int in C++ (32 bits), it will take the same size in memory that the number \(2^{32} - 1\), so choosing the right variable type depending on your usage is important.

In C++, there is a difference between signed and unsigned integers. The unsigned cannot be negative, but they can use a greater range of values.

For really big number, you often need an additionnal library to handle those depending on the language.

B - Use cases#

✔️

The advantages of integers is that they are always stable, their values remain mathematically correct after any set of operations. This is why we use them in databases to store crucial information like money or geographical location. We could think that floating points would be more suited for this job but it is false (check notebook of floating points for more details).

Yet, there are jobs that they can’t do. Try to tell a mathematician or a computer graphics engineer that they can’t use floats to see how they react. The fact is that for the large amount of operation in 3D graphics, or any kind of linear algebra, you need to keep track of the fractionnal part of each operation. Something that integers simply can’t do (but workaround exists, check Game Engine Black book for more details on old days tricks).


III - Binary operations usages in programming#

Because it is easy to understand, some langages offer operation on their binary representation. Let’s see why

A - Binary shifting#

You can decal the binary representation of an integer to the left. For exemple :

a = 1 # 0000 0001
print(a << 3) # 0000 1000
8

It is a uber-fast way to make any \(2^x\)

@interact(x = 5)
def pow2(x: int) -> int:
    return 1 << x

B - Bitmask#

When you want to provide a function with multiple options in the parameter, one way is to use flags to create a bitmask. With it, users can set any kind of flags they want, it is cheap and uberfast (way more than using an array of enums).

To do so, you create constants with different bit values. The bits must be at different places so for an int you have 8 possible flags (64 for a long int in C++, or almost unlimited with a bitset). Then you create your mask by adding them with the OR operator (|) and you can check flags sets in the mask with the AND operator (&).

In the function, you simply have to check for the bits set to 1 in the mask to activate asked options.

flag0 = 1 << 0 # 0000 0001 
flag1 = 1 << 1 # 0000 0010 
flag2 = 1 << 2 # 0000 0100

myMask = flag0 | flag2
print(decimalToBinary(myMask)) # 0000 0101

# Check if the flag is set
print(myMask & flag0) # True
print(myMask & flag1) # False
101
1
0

Conclusion#

Integers are really useful and are used everywhere. Their easy-to use binary representation allows programmer to optimize bit and bits of their app, and their stability make them great for storing data with confidence. However, they lack by nature the precision offered by floating points which make them unsuited for many use cases.


📖 Sources#

Books#

Name

Author

Date

Game Engine Black Book

Fabien Sanglard

2018