Mark Yosef - Security Researcher and Data Scientist at Valid Network.
Has an MSc in Information and Software Systems and Engineering from Ben-Gurion University. Specializes in Machine Learning and Big Data, and is extremely passionate about Cybersecurity, Blockchain and Algo-trading. Enjoys collecting Pokémon cards and bobble-heads of rappers.
Modern computers store data of different types, such as numbers and text. This article focuses on integer overflow, although other kinds of overflow (such as buffer overflow) exist. There are infinitely many numbers between -∞ and ∞, but computers have a finite storage capacity, so there is a limit on how large or how small a stored number can be.
In addition, computers don’t understand human languages, so data is represented in binary, a base-2 number system that consists of 0’s and 1’s. Sometimes these values are written in hexadecimal, a base-16 number system, with a ‘0x’ prefix.
In many programming languages, every storage slot has a type that defines what kind of information is stored within it and what the maximum capacity of that slot is. In languages like C and Java, numbers can be represented by several types, such as ‘int’, ‘short’, ‘double’, etc. An int type typically declares that the maximum length of a value stored in the slot is 4 Bytes or 32 bits (each Byte consists of 8 bits).
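Thus, we can deduce that in such languages the largest value an unsigned 32-bit slot can hold is 32 ones in binary, which is 0xFFFFFFFF or 4,294,967,295.
This can also be calculated as 2³² - 1: there are 2³² possible combinations of 0’s and 1’s of length 32, and we subtract 1 because computers start counting from 0.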
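The calculation above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are our own. Note that a signed 32-bit int, which is what ‘int’ usually means in C and Java, reserves one bit for the sign and therefore tops out at 2³¹ - 1.

```python
BITS = 32

# Largest unsigned value: all 32 bits set to 1.
max_unsigned = 2**BITS - 1
# Largest signed value: one bit is reserved for the sign.
max_signed = 2**(BITS - 1) - 1

print(max_unsigned)                    # 4294967295
print(hex(max_unsigned))               # 0xffffffff
print(bin(max_unsigned).count("1"))    # 32
print(max_signed)                      # 2147483647
```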
Well, if there is a limit to the size of numbers in computers, what happens when we cross this limit?
In Ethereum, every unsigned int slot in the storage is 32 Bytes or 256 bits. Let’s say you want to add 2 valid unsigned integers:
What Just Happened?
The result of this arithmetic addition is a number that is greater than the maximum possible integer. It consists of one followed by 256 zeroes (b100…000), which in total has a length of 257 bits. But the slot of the integer value in the storage can only have 256 bits. Therefore, only the 256 RMB (rightmost bits) are stored in the storage, and everything else is ignored.
As a result, the value that is actually stored is 0x00…000 (0). This is an integer overflow.
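The truncation described above can be modeled in Python, whose integers are unbounded, by masking the result to 256 bits. This is a minimal sketch that assumes the two operands are the maximum 256-bit value and 1, which is one pair that produces the result described; `evm_add` is a hypothetical helper name, not an EVM API.

```python
WORD_BITS = 256
MASK = 2**WORD_BITS - 1  # MAX_UINT: 256 one-bits


def evm_add(a: int, b: int) -> int:
    """Add two unsigned words and keep only the rightmost 256 bits."""
    return (a + b) & MASK


true_sum = MASK + 1          # the mathematically correct result, 2**256
stored = evm_add(MASK, 1)    # what fits in a 256-bit slot

print(true_sum.bit_length())  # 257
print(stored)                 # 0
```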
There are two types of entities on the Ethereum network:
• User (EOA) - Externally Owned Account which is also called a wallet.
• Smart Contract - an entity that has a code and persistent storage.
When a user or a contract wants to interact with a contract on the Ethereum network, it creates a transaction that invokes some functionality of the called contract code and sends it to the network. Once a potential block miner pulls this transaction from the transaction pool, the miner executes it using the Ethereum Virtual Machine (EVM).
The miner uses an Ethereum node such as ‘geth’ to interact with the Ethereum network. EVM is a component of the node that is responsible for executing transactions. It starts by getting the contract’s context, the immutable code, and the persistent storage. Then, it executes the called code and stores the changes to the storage. The size of every slot in the EVM data structure (stack/memory/storage) is 32 Bytes or 256 bits.
The code of a contract consists of byte codes, somewhat similar to Java’s JVM and byte codes. These byte codes are complicated for a human to read, so usually they are a product of a compilation process from a high-level programming language like Java, or in our case Solidity.
Solidity is a high-level programming language used to develop smart contracts. Solidity’s compiler compiles the contracts to EVM byte codes that can be deployed to Ethereum’s network.
In Solidity, you can perform many different operations on numbers. One of them is arithmetic, and the problem associated with it is that an integer overflow can occur in such code.
There are two types of integers in Solidity: unsigned integers (uint) and signed integers (int).
In a signed int, the LMB (leftmost bit) represents the sign of the number, hence the name. A 0 in the LMB stands for non-negative numbers, and a 1 stands for negative numbers. Therefore, the number of bits available for the magnitude of the number is reduced from 256 to 255.
At first glance, one can see that in the unsigned number circle, either the addition of 2 numbers overflows to a smaller value, or the subtraction of 2 numbers underflows to a greater value. In the signed circle, however, because of the sign, the same operation can either overflow or underflow depending on its operands.
Let’s look at the addition of 2 signed numbers for example:
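As a rough sketch of signed addition, the Python model below wraps the sum to 256 bits and then reads the leftmost bit as the sign, assuming two's-complement encoding. The helper names `to_signed` and `signed_add` are our own.

```python
WORD_BITS = 256
MASK = 2**WORD_BITS - 1
INT_MAX = 2**(WORD_BITS - 1) - 1   # 0x7f…fff
INT_MIN = -(2**(WORD_BITS - 1))    # 0x80…000


def to_signed(word: int) -> int:
    """Interpret a 256-bit word as a two's-complement signed integer."""
    return word - 2**WORD_BITS if word >> (WORD_BITS - 1) else word


def signed_add(a: int, b: int) -> int:
    """Add two signed values the way a 256-bit slot would store the result."""
    return to_signed((a + b) & MASK)


# Overflow: adding to the largest positive value wraps to the most negative.
print(signed_add(INT_MAX, 1) == INT_MIN)    # True
# Underflow: the same operation, with a negative operand, wraps the other way.
print(signed_add(INT_MIN, -1) == INT_MAX)   # True
```

The same addition therefore overflows or underflows depending only on the signs of its operands.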
In this section, we will present the process and challenges of detecting integer overflow in Ethereum.
While other software languages and machine codes provide an indication of arithmetic integer overflow (for example, the Overflow flag in Assembly), that is not the case for the EVM. There is no indication that an overflow has occurred during the execution of a transaction on the EVM. In some cases, you can deduce that an overflow has occurred from the values that are stored after the execution of the transaction. Most probably, however, you will have to re-run the transaction and detect overflows using different heuristics.
Integer overflow/underflow can occur after the addition or subtraction of 2 numbers. Because the multiplication operation is based on addition, it can cause an overflow as well. The same goes for the exponent operation, which is based on multiplication. So, specifically for the EVM, these are some of the opcodes that can cause an integer overflow: ADD, SUB, MUL, EXP.
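One possible heuristic for the re-run is sketched below: compute the exact result with unbounded integers and compare it with the value that survives truncation to 256 bits. This sketch assumes both operands are unsigned; `wraps_unsigned` and the `OPS` table are hypothetical names, not part of any Ethereum tool.

```python
MASK = 2**256 - 1

# Exact (unbounded) semantics of the opcodes listed above.
OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
    "EXP": lambda a, b: a ** b,  # note: costly for very large exponents
}


def wraps_unsigned(opcode: str, a: int, b: int) -> bool:
    """Return True if the exact result does not fit in an unsigned 256-bit word."""
    exact = OPS[opcode](a, b)
    return exact != (exact & MASK)


print(wraps_unsigned("ADD", MASK, 1))            # True
print(wraps_unsigned("SUB", 0, 1))               # True (underflow)
print(wraps_unsigned("MUL", 2**128, 2**128))     # True
print(wraps_unsigned("EXP", 2, 256))             # True
print(wraps_unsigned("ADD", 1, 2))               # False
```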
Things get even more complicated when we consider the type of the operands. As I have mentioned above, the same hexadecimal value in the storage can be interpreted differently based on the type of the slot. For example, 0xff…fff is -1 as a signed int, but MAX_UINT (2²⁵⁶ - 1) as an unsigned int. Therefore, the detection of integer overflows should be aware of the slot types.
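To illustrate how one stored word yields two different numbers, the sketch below decodes the same 32 bytes once as unsigned and once as signed using Python's built-in `int.from_bytes`. The `decode` helper is our own name.

```python
def decode(word_hex: str) -> tuple[int, int]:
    """Return (unsigned, signed) readings of a 32-byte big-endian word."""
    raw = bytes.fromhex(word_hex)
    unsigned = int.from_bytes(raw, "big", signed=False)
    signed = int.from_bytes(raw, "big", signed=True)
    return unsigned, signed


# 0xff…fff: MAX_UINT when unsigned, -1 when signed.
u, s = decode("ff" * 32)
print(u == 2**256 - 1, s)   # True -1

# 0x80…000: 2**255 when unsigned, INT_MIN when signed.
u2, s2 = decode("80" + "00" * 31)
print(u2 == 2**255, s2 == -(2**255))   # True True
```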
Let’s go through an example:
We can see this behavior clearly with the circle of integers visualization.
Generally, signed integers are more complex and may have more overflow issues than unsigned integers. There is also an arithmetic operation that can cause an overflow only in signed numbers. When we divide 2 unsigned integers A and B (non-negative whole numbers), the result of A / B is never greater than A, so it cannot overflow. However, let’s look at an edge case of the division operation (SDIV) with signed numbers:
Mathematically, -2²⁵⁵ / -1 equals 2²⁵⁵, whose hexadecimal value is 0x80…000. But in a signed integer type, 0x80…000 represents -2²⁵⁵ [INT_MIN], because the largest positive signed value is 2²⁵⁵ - 1. So instead of getting a positive number by dividing 2 negative numbers, we get a negative number, which is an overflow. This isn’t possible with unsigned division.
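The edge case can be reproduced with a small Python model of signed division. It is a sketch under two assumptions: the quotient is truncated toward zero, and division by zero yields 0, which is how the EVM defines SDIV. The function names are hypothetical.

```python
WORD_BITS = 256
MASK = 2**WORD_BITS - 1
INT_MIN = -(2**(WORD_BITS - 1))


def to_signed(word: int) -> int:
    """Interpret a 256-bit word as a two's-complement signed integer."""
    return word - 2**WORD_BITS if word >> (WORD_BITS - 1) else word


def sdiv(a: int, b: int) -> int:
    """Signed division, truncated toward zero and wrapped to 256 bits."""
    if b == 0:
        return 0
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    return to_signed(quotient & MASK)


print(sdiv(-7, 2))                   # -3
print(sdiv(INT_MIN, -1) == INT_MIN)  # True: the positive result does not fit
```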
Even if we can identify that an overflow has occurred, which operation caused it, and whether the operands are signed or unsigned integers, that is sometimes still not enough. Sometimes an overflow is desirable behavior. Some compilers create an overflow intentionally to run some functionality, and sometimes even the smart contract’s developers base their coding logic on desirable overflows. Therefore, even when we detect an overflow, we can’t be sure whether it is unexpected behavior that can be a potential vulnerability or desirable functionality. Thus, the FP (False Positive) rate of integer overflow detection is high.
The types of unsigned and signed integers are declared in the high-level programming language, which for Ethereum is Solidity. There are no types at the machine code or byte code level. Therefore, what happens when there is no Solidity source code for a contract? How can we know whether the addition of 2 numbers is a signed or an unsigned addition, without knowing the types of the slots storing those numbers?
Let’s look at the following example:
These two pieces of Solidity code are almost identical, except for the types of the parameters ‘a’ and ‘b’. They have been compiled to EVM byte codes using Remix IDE. Unfortunately, we can see that the compiled byte codes of the addition operation are identical, even though the types of the parameters are different. Therefore, we can’t distinguish between them based only on the byte codes.
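One reason the same addition byte code can serve both types is that, under two's-complement encoding, wrapping addition produces the same bits whether the operands are read as signed or unsigned. The sketch below illustrates this; `encode` and `add_word` are names we made up for the illustration.

```python
MASK = 2**256 - 1


def encode(value: int) -> int:
    """Encode a (possibly negative) integer as a 256-bit two's-complement word."""
    return value & MASK


def add_word(x: int, y: int) -> int:
    """Type-agnostic addition of two 256-bit words."""
    return (x + y) & MASK


# Signed view: -5 + 3
signed_result = add_word(encode(-5), encode(3))
# Unsigned view: the very same words, read as huge positive numbers
unsigned_result = add_word(2**256 - 5, 3)

print(signed_result == unsigned_result)   # True: identical bits
print(signed_result - 2**256)             # -2 when read as signed
```

Only the interpretation of the resulting word differs, and that interpretation lives in the source code rather than in the byte codes.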
The BeautyChain (BEC) contract is a great example of an integer overflow being used as a vulnerability to attack a contract. The attacker used the behavior of integer overflow to bypass some security checks and stole a huge amount of BEC tokens. A link to a great blog describing the attack is mentioned in .
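The pattern behind the attack, as it is commonly described, is an unchecked multiplication of the number of receivers by the transferred value, followed by a balance check against the wrapped product. The Python model below is a simplified sketch of that pattern and not the contract's actual code; the function and account names are hypothetical.

```python
MASK = 2**256 - 1


def batch_transfer(balances: dict, sender: str, receivers: list, value: int) -> int:
    """Simplified model of a batch transfer with an unchecked multiplication."""
    amount = (len(receivers) * value) & MASK   # wraps silently on overflow
    if not (value > 0 and balances[sender] >= amount):
        raise ValueError("revert: insufficient balance")
    balances[sender] = (balances[sender] - amount) & MASK
    for receiver in receivers:
        balances[receiver] = (balances.get(receiver, 0) + value) & MASK
    return amount


balances = {"attacker": 0}
# 2 receivers * 2**255 = 2**256, which wraps to 0 and passes the balance check.
charged = batch_transfer(balances, "attacker", ["r1", "r2"], 2**255)

print(charged)                      # 0
print(balances["r1"] == 2**255)     # True
```

The sender is charged nothing, while every receiver is credited with an enormous balance.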
Luckily, some solutions can prevent integer overflow issues.
SafeMath.sol [9, 10] is a well-known library used in many contracts. It provides the basic arithmetic operations and also checks preconditions and postconditions to determine whether an overflow has occurred. If it has, the library fails the execution of the transaction and the transaction is marked as ‘Reverted’.
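The kind of precondition and postcondition checks such a library performs can be sketched in Python as follows. This is a model of the idea and not the library's code; the `Reverted` exception and the function names are our own stand-ins for a failed transaction.

```python
MAX_UINT = 2**256 - 1


class Reverted(Exception):
    """Stand-in for a transaction that fails and is reverted."""


def safe_add(a: int, b: int) -> int:
    c = (a + b) & MAX_UINT
    if c < a:                      # postcondition: the sum must not shrink
        raise Reverted("addition overflow")
    return c


def safe_sub(a: int, b: int) -> int:
    if b > a:                      # precondition: no negative results
        raise Reverted("subtraction underflow")
    return a - b


def safe_mul(a: int, b: int) -> int:
    if a == 0:
        return 0
    c = (a * b) & MAX_UINT
    if c // a != b:                # postcondition: division must undo the product
        raise Reverted("multiplication overflow")
    return c


print(safe_add(1, 2))   # 3
try:
    safe_mul(2, 2**255)
except Reverted as err:
    print(err)          # multiplication overflow
```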
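You can compile your code with a newer compiler version (Solidity 0.8.0 and later check arithmetic for overflow by default). This way, checks like those of external libraries such as SafeMath are embedded in the compiled code. However, be sure to design your code properly to avoid Denial of Service attacks that are based on integer overflow, since an operation that reverts on overflow can make a function unusable.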
Many people involved in blockchain do not fully comprehend the impact of software flaws and how they can introduce vulnerabilities. To understand the full scope of these vulnerabilities, it is critical to understand how numbers are represented in computers, what signed and unsigned numbers are, and what an integer overflow attack is.
Valid Network focuses on providing a holistic solution to deal with such integer overflow issues and to mitigate risk and vulnerability in applications. What can seem like a simple issue can lead to catastrophic consequences in software operations and, as the examples above show, to potentially exploitable situations.
Unless the code is written to mitigate such issues, the smart contracts and software operations that handle digital assets are inherently at risk. This is why Valid Network provides a holistic solution for dealing with potential attack vectors and software issues that can occur on the Ethereum network. We believe that no matter how small the vulnerability, it has the very real potential to cause incredible damage.
Valid Data’s real-time and predictive insights are used by Cryptocurrency traders and exchanges, as well as investors and hedge funds, to make better investment and trading decisions, to protect the value of their digital assets, and to capitalize on market opportunities that only Valid Network’s technology can uncover.