Side Channel Analysis (SCA) is a method by which an adversary can gather information about cryptographic keys by examining the physical environment surrounding the microprocessor while it is performing cryptographic operations. In this article, we present our research which is focused upon devising methods to increase the difficulty of conducting SCA successfully on a microprocessor running Advanced Encryption Standard (AES) encryption. We make use of the open-source, soft-core Java Optimized Processor (JOP) implemented on a Xilinx Virtex 5 ML506 Field Programmable Gate Array (FPGA) evaluation board to evaluate the effectiveness of SCA countermeasures in attacks against the cryptographic algorithm. The experimental results show that implementing a power normalizing mask can increase the security of a device by requiring an adversary to collect up to 87% more data to successfully attack AES.

## I Introduction

Security and cryptography in electronics have played an integral part in society for several decades. Starting with securing military communication channels and in the civilian sector with Automated Teller Machines (ATMs), the need for security has been on the rise for decades. Secure crypto-processors in particular (microprocessors that process cryptographic algorithms) have become the backbone of modern security solutions. One can find crypto-processors in smart cards, cable and satellite TV set top boxes, lottery ticket vending machines, and mobile-phone systems. As adversarial techniques and skills have evolved to compromise crypto-processors, so have the means used by manufacturers to protect or prevent system tampering, reproduction, disabling, and reverse-engineering [3].

There are basically four different classes of attack by which an adversary can attack a crypto-processor: Semi-Invasive Attacks, Invasive Attacks, Remote Attacks, Local Noninvasive Attacks [3]. In this section, we briefly review the attributes of each of the attack classes. Semi-Invasive Attacks do not require damaging the coating of the semiconductor surface, known as the passivation layer, because it uses lasers to ionize atoms within the transistors and change its state. This method is difficult to employ in practice due to the variability inherent when attempting to ionize specific transistors making information extraction unreliable. Invasive Attacks involve actual damage to the device and monitoring of the device interior. Although this can be useful to gain information, it also destroys the device which is unacceptable when an adversary only has a limited number of devices, or only a single device, to analyze. Remote Attacks interface with a device in normal operation over a communication channel such as exploiting a buffer overflow exploit in a networked device. Remote attacks have their place, but they deal solely with programming vulnerabilities and not hardware vulnerabilities. Local Noninvasive Attacks involve gaining information about the device through close observation of the device in operation, watching Electromagnetic (EM) radiation emissions, current consumption, and other environmental effects surrounding the device. Local Noninvasive Attacks were chosen to be the focus of this research because of the magnitude of the risk they pose. They allow attackers to circumvent cryptographic algorithms just by having physical access to the device. Side Channel Analysis (SCA) attacks, characterized as Local Non-Invasive attacks, are the method by which an adversary can cleverly deduce information about a cryptographic system by watching the interaction of a circuit with its surrounding environment. The three main branches of SCA are timing, power-analysis, and EM attacks. In all types, the basic idea is to determine a cryptographic device’s secret key by measuring its execution time, power consumption, and/or electromagnetic field [16].

In this paper, we present the findings of our initial research focused upon improving the security of cryptographic processors. The goal of our research is to propose new methods to protect cryptographic information by making dynamic changes to the underlying architecture of a microprocessor. To measure the effectiveness of different protection methods, we implemented the Advanced Encryption Standard (AES) cryptographic algorithm in a soft-core Java Optimized Processor (JOP) contained within an FPGA and measured the time required to expose the underlying cryptographic key using standard SCA methods.

## II. Related Work

### A. Background

The basic premise of SCA attacks stem from the reality that the switching activity of Complementary Metal-Oxide Semiconductor (CMOS) circuits leak information. When a CMOS circuit charges to logic level ‘1’ or discharges to logic level ‘0’, a change in the electric potential creates a change in the electric field (or current) which is measurable outside the chip. Generally the quantization of the energy for a given value is derived from either the Hamming Weight (HW) or the Hamming Distance (HD). In the case of the HW, the value of a given data is the summation of the bits that are in a ‘non zero’ state. For example, the HW of 0x50 (0b01010000) and the HW of 0x03 (0b00000011) are both two. The HW of 0xFF (0b11111111) is eight. In contrast, the HD is a measure of the change of a value, measuring the number of bits that change from the previous state to the current state. For example the hamming distance between 0x50 and 0x03 is four, while the hamming distance between 0x50 and 0xFF is six. The Hamming Weight can also be thought of as the Hamming Distance between the given value and zero (0x00). Commonly, the model used to describe the information leakage off a chip is given by , where is the “exclusive or” (XOR) of a and b, HW is the Hamming Weight function , is the power consumption used by the circuit when inverting the bit, and is the noise [4].

After monitoring the execution time, power consumption and/or the electric field from a microprocessor, the three main branches of SCA attacks used to find secret key information are: Simple Power Analysis (SPA), Differential Power Analysis (DPA), and Second Order Differential Power Analysis (SODPA) [14]. A SPA attack involves directly observing a system’s power consumption and can be done with only one trace. DPA is significantly more powerful than SPA, but is more complicated and requires the collection of many more traces. DPA looks at the changes in the trace values over time to narrow down using statistical hypothesis testing. DPA is normally done by looking at the difference of means or using Correlation Power Analysis (CPA). Lastly SODPA is a method often used to overcome many time variable masking countermeasures. It involves looking at the values of traces at several points in time for a trace so that all of the mask will be accounted for when various correlation methods are used.

Defenses against these SCA attacks fit into two high-level categories: algorithmic countermeasures (changes made to the algorithm of encryption) and circuit-level countermeasures (changes made to the actual hardware). Countermeasures can be further classified based on the method by which they try to decouple the power consumption with the data being processed, these are: masking countermeasures (trying to make data appear as a different value) and elimination countermeasures [14] (trying to remove any correlation of the data being processed and the power signatures being measured).

### B. Masking Techniques

Many masking techniques at the circuit level introduce random power consumptions which are akin to noise. Examples include Random Switching Logic (RSL) [15], masking-AND [18], and Dynamic Voltage and Frequency Switching (DVFS) [19]. The RSL countermeasure adds in random logic paths, masking-AND masks every output with random inputs, and DVFS randomly modulates voltage and switching frequency to introduce randomness into power traces. All of these circuit level masking techniques, however, are still susceptible to glitches. Glitches are the transitions at the output of a gate that occur before the gate switches to the correct output. Because glitches add to the power signature, they are susceptible to leak information, especially when they leak key information before the correct mask is applied [9][2]. RSL uses random input and enable control signals to randomize the power signature and is thus able to avoid the information leakage posed by glitching, but the enable signals need to be carefully timed to ensure it functions properly [2].

Masking at the algorithmic level has the key notion of minimizing the correlation between intermediate values and the secret key [5]. One simple method to accomplish this is to introduce noise into the power consumption measurements. This method can be overcome by the collection of more samples. In theory, if the variance of the noise is great, then the necessary sample size might be large and infeasible. However, this method is still surmountable by increasing the number of samples used in the analysis [7].

Another option for masking power traces at the algorithmic level is the introduction of Random Process Interrupts (RPI) during the cryptographic algorithm. This approach can be done by interleaving random dummy commands or “No Operation” (NOOP) commands randomly throughout the code thus masking the actual cryptographic algorithm execution sequence. To attack a circuit using RPIs, the correlation spikes can be reconstructed by integrating the signal over the number of consecutive clock cycles equal to the greatest variance in the clock cycles [6]. This method to overcome RPIs is called the “sliding window attack.” For this attack, several traces are integrated together and then compared against other integrated traces for the power spikes [8].

A more common method of masking however is to split a value Z into d shares such that and where is a function like the XOR or modular addition [12]. A masking operation is said to be (d-1)^{th}-order depending on the number of shares d. When a (d-1)^{th} order masking is used, a d^{th}-order DPA can be performed by combining the leakage signals at time intervals L(t_{1}), … , L(t_{d}) resulting from the manipulation of the d shares that make up the value Z. This method of masking generally can be circumvented through the use of higher-order differential power analysis. By combing the leakage signals at time intervals L(t_{1}), … , L(t_{d}) that are the resulting leakages from the manipulation of the d shares that make up the value Z, the differential power spike for correct key guesses can be reproduced [12][10][11].

### C. Elimination Techniques

Elimination is another method that can be used to confound power variation. The key notion of elimination (hiding) is to remove power variation information from the attacker. Where masking seeks to decouple the power variation from the data being processed, elimination seeks to eliminate it. Four ways that elimination is used to protect circuitry are [7]:

- Using constant execution path code
- Choosing operations that leak less information in their power consumption
- Balancing hamming weights and state transitions
- By physically shielding the device

With the goal of elimination techniques to be no variation in power due to the key, no information is leaked through side channels. One example of this is Dynamic and Differential Logic (DDL), where elimination of power differences is done through ensuring that one of the outputs is charged for any input, be it the output or the complimented output, and it ensures that one output transition occurs in every clock cycle. More specifically, DDL logic is split into a precharge state, where all outputs are at zero, and then an evaluation phase, where at least one output or its compliment goes high [17]. Sense Amplifier Based Logic is an implementation of DDL that uses dynamic-CMOS logic. Using this method of DDL, requires the circuit designer to deal with the effects of cascading circuits and can introduce signal integrity issues that degrade the signal making it inefficient [17].

Wave Dynamic and Differential Logic (WDDL) is another type of DDL. WDDL uses a static CMOS implementation of AND and OR gates. Each gate in the WDDL has both the gate with the inputs and a complimentary gate with the inverse of the inputs. By introducing complementary structures, the information that is leaked via the side channel is reduced.

### D. Advanced Encryption Standard (AES)

Advanced Encryption Standard (AES), the cryptography algorithm used in this research, is a symmetric key crypto-algorithm. A symmetric cryptographic algorithm uses the same key to both encrypt and decrypt data. The AES algorithm is made up of eleven rounds. With the exception of the first and last round, each round is made up of the four functions: SubBytes, ShiftRows, MixColumns, and AddRoundKey. Of particular note for this research are SubBytes and AddRoundKey. SubBytes uses a simple substitution algorithm and takes the current hex values and substitutes the values with known quantities in a look-up table called a “substitution box” (also known as the Sbox). AddRoundKey is the function where the key, modified slightly for each round, is XORed with the current text state [1].

Of those four functions, only AddRoundkey directly manipulates the data based on key, which makes AddRoundkey the target for key extraction in SCA. The first call to AddRoundKey uses the original key, and each subsequent call to AddRoundKey uses a different version of the key. Following the AddRoundKey phase is SubBytes which uses a simple substitution algorithm where by the current state of the plain text is used to find the corresponding substitution value in the Sbox. The best place to attack the AES algorithm is between the AddRoundKey phase of the previous round and the SubBytes of the next round. This location is highly vulnerable because given the plaintext, an attacker knows exactly what the state of the plaintext is going into AddRoundKey, but he does not know what the state will be after returning from AddRoundKey as they do not yet know the key. The attacker does, however, know the simple look-up table used in SubBytes and if he can correlate power signatures to approximate values, it’s possible with enough traces to use a statistical algorithm to derive what the key is. The specific location of AES attack is shown in Figure 1.

Fig. 1. Two rounds of AES with location of attack noted with dotted lines