DEVELOPMENT OF THE SEARCH METHOD FOR NON-LINEAR SHIFT REGISTERS USING HARDWARE, IMPLEMENTED ON FIELD PROGRAMMABLE GATE ARRAYS

Nikolay Poluyanenko
Department of Information Systems and Technologies Security
V. N. Karazin Kharkiv National University
4 Svobody Sq., Kharkiv, Ukraine, 61022
rnos@mail.ua

Abstract
The nonlinear feedback shift registers of the second order in GF(2) are considered, because based on them it can be developed a generator of stream ciphers with enhanced cryptographic strength.

Feasibility of nonlinear feedback shift register search is analyzed. These registers form a maximal length sequence, using programmable logic devices.

Performance evaluation of programmable logic devices in the generation of pseudo-random sequence by nonlinear feedback shift registers is given. Recommendations to increase this performance are given. The dependence of the maximum generation rate (clock frequency), programmable logic devices on the number of concurrent nonlinear registers is analyzed.

A comparison of the generation rate of the sequences that are generated by nonlinear feedback shift registers is done using hardware and software.

The author suggests, describes and explores the search method of nonlinear feedback shift registers, generating a sequence with a maximum period. As the main result are found non-linear 26, 27, 28 and 29 degrees polynomials.

Keywords: stream ciphers, random number generators, M-sequence, search of nonlinear shift registers, non-linear polynomials.

DOI: 10.21303/2461-4262.2017.00271

1. Introduction
There is currently rapid development of cryptanalytic systems. One of the main requirements to the main element of the cryptographic stream encryption system – a generator of pseudo-random sequences (PRSs) is an indiscernible of the sequence, complexity, speed and repetition period for PRSs [1]. Cryptographic primitives that meet these requirements are constructions on the basis of linear feedback shift registers (LFSRs).

Common cryptographic algorithms, which are built using LFSRs, are: stream cipher A5/1, used to ensure the privacy of telephone mobile communication of GSM standard [2]; stream cipher E0, used in the Bluetooth protocol [3], etc. LFSR main disadvantage is its linearity, which leads to a relatively simple cryptanalysis [4].

2. Overview of the problem and formulation of research problems
As an alternative for LFSRs to PRSs generation in the stream cipher nonlinear feedback shift registers (NLFSRs) are proposed. NLFSRs on the basis of stream ciphers are included in Achtterbahn [5], Dragon [6], Grain [7], Trivium [8] and VEST [9]. In [10] it is shown that NLFSRs are more resistant to cryptanalytic attacks than LFSRs. Using L cells NLFSRs, a cryptanalyst can take up to \( O(2^L) \) [11] or as given in [12], a sequence of \( L \cdot (L+1)/2 + L \) bits is necessary to determine the structure of L-bit NLFSR generating this sequence.

At the same time, large size NLFSR generation with a guaranteed period remains an unsolved problem [13]. Only some special cases were considered in [14]. It is known that LFSRs generate maximum length sequence (M-sequence) equal to \( 2^L-1 \) if and only if when its characteristic polynomial is primitive [15]. For NLFSRs such property is not found to this day. Small NLFSRs with a maximum period can be built with the help of simulation. Nevertheless, modern computing capacities allow to simulate NLFSRs with a size only to \( L<35 \) [16]. In [17], NLFSRs with a size to \( L<26 \) are shown. This is insufficient for cryptographic applications that require long periods, for example, \( 2^{28} \) [18].
Non-standard hardware solution for implementation of cryptographic algorithms is the use of field programmable gate arrays (FPGAs) [19]. FPGA allows to construct digital devices with high-level hardware description languages, which reduces the complexity of the development and allows the reuse the code through the use of IP-cores [20].

In [21], a search of NLFSRs that are implemented on FPGAs on the basis of analysis of de Bruijn sequences is carried out. Good statistical properties of NLFSR-generated sequences are confirmed and obtained non-linear 25 and 27 polynomials are given.

FPGAs have a high versatility and reliability, which ensures thorough testing [22]. Programmable logic is attractive due to possibility of providing performance close to ASIC technology, to achieve high throughput with high construction flexibility and low power consumption [23].

One of the important advantages of the implementation of cryptographic algorithms on FPGAs is the ability to construct a parallel and asynchronous architecture, superior to GPU and CPU microprocessor-based solutions in performance [24].

In this paper we study NLFSRs performance implemented on the FPGAs and problems of their optimization. Search method of NLFSRs generating M-sequence (M-NLFSRs) is given. This method is based on a practical synthesis of the results obtained previously. The aim of this research is to explore the possibility of NLFSR implementation of FPGA, as well as the development of an acceptable, for hardware and temporal parameters, the search algorithm for M-NLFSR.

3. M-NLFSR search method

The registers are used multiplication of only two cells, and such NLFSRs are called NLFSRs of the second order. Later, in NLFSR operation we understand NLFSR of the second order in $\mathbb{GF}(2)$. General NLFSR construction for the register, consisting of $L = 4$ cells, is shown in Fig. 1.

![Fig. 1. General NLFSR construction](image)

Where $a_i \in \{0,1\}$ appropriate the presence or absence of feedback, $q_i(t) \in \{0,1\}$ — value of the $i$-th register at the time $t$, $Q$ — generated sequence, bits. $\oplus$ denotes a nonlinear function of multiplication and $\odot$ — linear function of summation.

Preliminary feedback coefficients $a_i$ are limited that equate some of the coefficients to zero. The total number of variables (initially not equal to zero) of feedback coefficient $a_i$ denoted as $n_i$. As has been shown in [25], the number of feedback coefficients $a_i$ for NLFSR with $L$ size is calculated by ratio $n_i = L \cdot (L+1) / 2$, therefore, $n_i$ can take values in the range $0 \leq n_i \leq n_L$.

M-NLFSR search method consists of two stages that can be carried out simultaneously or sequentially.

Stage 1: There is a decrease in the test set by excluding NLFSRs not forming an M-sequence. The total amount of rejected polynomials is selected in the range of 90–99% of the possible amount. The volume of rejected polynomials is limited by available memory and time resources.

Rejected set is defined by an analytical method, which is implemented in software. The analytical method consists of checking the feedback coefficients to meet certain requirements as
detailed in [25, 26, 12]. The essence of these requirements is to analyze the place, type and arrangement of non-zero coefficients of feedback $a_i$. Mismatch of $a_i$ coefficients to specified requirements means the inability of NLFSR to form M-sequence and, therefore, its rejection.

After the first stage, we obtain the set of polynomials, acceptable (for time-consuming) to check in the second stage. Let’s denote the number of such polynomials as $k_{FPGA}^{NLFSR}$.

Stage 2: Direct verification of the set of polynomials, obtained after the first stage, is carried out for the possibility of M-sequence generation by them is carried out using computer system.

As shown below, for moving the computational process from CPU on the FPGA, performance (speed) of each of NLFSRs is maintained. This fact allows to tens or hundreds of times increase the overall performance of the complex for M-NLFSR search compared to use only a PC.

4. The structure of the computer system for M-NLFSR search

Complete structure of computer system, which was used for the calculations, is shown in Fig. 2.

![Fig. 2. The structure of the computer system for M-NLFSR search](image)

The hardware consists of the card with integrated chip of Altera company of Cyclone IV E FPGA EP4CE10E22C8N family (China). FPGAs are made at 60 nm optimized process and includes 10 320 logic elements (LEs – Logic Elements). Project for FPGA is written in Verilog HDL, for compilation and simulation CAD Cadence Virtex Prime Version 16.0.0 was used.

USB-TTL converter was used as communication equipment (the maximum data transfer speed of 921 600 bps), performs the function of Universal Asynchronous Receiver/Transmitter (UART) between the FPGA and the PC. Two USB-TTL converters were used to enhance the data rate. They operate in parallel with a maximum data rate.

The software realizes: reading data from the hard disk; formation of a package to send to the FPGA; synchronous (for two USB-TTL converters) data sending to the specified COM port; receiving information through the COM port from the FPGA; decoding, visualization and storage on disk test results carried out in the FPGA.

5. FPGA-based NLFSR performance analysis

Hardware performance is a number of tested NLFSRs per time unit. Hardware performance consists of the following components:

- time for one cycle in the module that implements NLFSR operation (denoted as $t_{FPGA}^{cycle}$). Cycle is implementation of cyclic changes in NLFSR state in which all possible combinations are implemented for M-NLFSR, and the system returns to the initial state. Registers change their state in one clock cycle of the reference frequency, therefore amount of these clock cycles in a cycle should be $2^L$;
- the number of simultaneously tested NLFSRs per cycle (denote as $k_{FPGA}^{NLFSR}$). The parameter depends on the number of LEs, engaged for one NLFSR and total capacity of used FPGA;
- operational clock frequency. Upon reaching a certain maximum clock frequency ($f_{max}^{FPGA}$) FPGA elements have no time to fully execute that leads to errors in the determination of M-RSNOS.

The increase of $k_{FPGA}^{NLFSR}$ leads to an increase in the total number of involved LEs. As an example, the number LEs involved in FPGA to M-NLFSR search for $L = 29$ and $n_1 = 28$ depending from $k_{FPGA}^{NLFSR}$ are shown in Fig. 3.
Due to the increase in the number of LEs involved in FPGA, their mutual arrangement and increasing the physical distance between the interacting elements in the chip, there is a decrease of $f_{\text{max}}^{\text{FPGA}}$. Also $f_{\text{max}}^{\text{FPGA}}$ significantly affect the temperature factor, which can be partly explains a small deviation observed in Fig. 4.

Let’s consider in detail how to change the above characteristics on the number of simultaneously tested NLFSRs. Fig. 4, a shows the dependence of observed $f_{\text{max}}^{\text{FPGA}}$ on $k_{\text{NLFSR}}^{\text{FPGA}}$ for NLFSRs with a size of $L=28$ and $n=28$.

Taking into account obtained results, FPGA configuration modification has been made for $L=29$. Modification allowed to improve performance for $k_{\text{NLFSR}}^{\text{FPGA}}=1$ to $f_{\text{max}}^{\text{FPGA}}=725$ MHz and reduce LEs in the module. Fig. 4, b shows the results of dependence of $f_{\text{max}}^{\text{FPGA}}$ on $k_{\text{NLFSR}}^{\text{FPGA}}$ for NLFSRs with a size of $L=29$ and $n=28$. The change of the clock frequency for $f_{\text{max}}^{\text{FPGA}}$ determination was conducted discretely, in steps of 25 MHz.

Knowing $f_{\text{max}}^{\text{FPGA}}$, we can determine $t_{\text{cycle}}^{\text{FPGA}}$ for different number of simultaneously calculated NLFSRs in FPGA. Based on these data, we can calculate the time $t_{0}^{\text{FPGA}}$ that is necessary to spend for the analysis of all $k_{\text{NLFSR}}^{\text{FPGA}}$ polynomials in the search for the M-NLFSR in hardware. Elapsed time (in hours) will be calculated by the following relationship:

$$t_{0}^{\text{FPGA}} = \frac{1}{3600 \text{sec}} \cdot k_{\text{NLFSR}}^{\text{FPGA}} \cdot t_{\text{cycle}}^{\text{FPGA}}.$$  

The dependence of the estimated time for M-NLFSR search for $L=29$ and restrictions imposed, in which $n=28$, $k_{0}^{\text{FPGA}}=17619713$, on the number of simultaneously processed polynomials in FPGA is shown in Fig. 5.

As can be seen in the graph, maximum system performance can’t be achieved with maximum use of all FPGA elements due to the need to reduce the clock frequency. The optimal number of simultaneously tested NLFSRs in this case is in the range from 79 to 115, after which the time for search begins to increase.

In [27], there is a generation time of 1 GB by the generator, written in assembler and optimized for different NLFSRs, including LFSRs. To compile all the examples FASM was used (flat assembler version 1.71.51). The calculations were performed on a personal computer (64 bit Windows 7 SP 1, Intel Core i5-3210M CPU 2,5GHz processor). According to the research results,
the time to generate the 1 GB ranges from 16 to 76 seconds, depending on the size and form of the polynomial.

For comparison, Table 1 shows the time for generation of 1 GB NLFSR implemented in FPGA on different clock frequencies ($f_{\text{FPGA}}$).

![Fig. 5. Dependence of $t_{\text{FPGA}}$ (in hours) on $k_{\text{NLFSR}}$ for $L = 29$](image)

<table>
<thead>
<tr>
<th>$f_{\text{FPGA}}$</th>
<th>$t$</th>
</tr>
</thead>
<tbody>
<tr>
<td>50 MHz</td>
<td>172 s</td>
</tr>
<tr>
<td>100 MHz</td>
<td>86 s</td>
</tr>
<tr>
<td>200 MHz</td>
<td>43 s</td>
</tr>
<tr>
<td>400 MHz</td>
<td>21.5 s</td>
</tr>
<tr>
<td>600 MHz</td>
<td>14.3 s</td>
</tr>
<tr>
<td>700 MHz</td>
<td>12.3 s</td>
</tr>
<tr>
<td>725 MHz</td>
<td>11.8 s</td>
</tr>
</tbody>
</table>

As can be seen from Table 1, the use of the FPGA-based hardware components for PRSs generation can improve performance (two and more times) as compared to using only the CPU of a personal computer.

It is worth noting that these results are valid for used FPGA model. Using another FPGA model, it is likely to get higher system efficiency. Thus, [28] shows the results of the efficiency of hardware implementations of ciphers of GRACE-S family for various FPGA families. Unfortunately, investigated families didn’t contain Cyclone IV, but tested FPGAs of Stratix IV and Stratix V family showed more than two times greater efficiency than FRGAs of Cyclone II or Cyclone V family.

6. Search results for M-NLFSRs

M-NLFSRs search with a size of $L = 26, 27, 28$ and $29$ with the above described method was carried out. Obtained polynomials, as well as time expenditures, are shown below.

$L = 26$:

\[
\begin{align*}
  x^{26} + x^{24} + x^{23} + x^{20} + x^{18} + x^{14} + x^{12} + x^6 + x^5 + x^4 + x^3 + x^2 + x^1 + x^0, \\
  x^{26} + x^{25} + x^{22} + x^{21} + x^{19} + x^{17} + x^{16} + x^{15} + x^{13} + x^6 + x^5 + x^4 + x^3 + x^2, \\
  x^{26} + x^{14} + x^{12} + x^{10} + x^{8} + x^{6} + x^{4} + x^{3} + x^{2} + x^{1} + x^{0}, \\
  x^{26} + x^{22} + x^{21} + x^{20} + x^{19} + x^{17} + x^{16} + x^{15} + x^{14} + x^{13} + x^{12} + x^{11} + x^{8} + x^{7} + x^{6} + x^{5} + x^{2} + x^{1} + x^{0}.
\end{align*}
\]

Search time – 43 hours 48 minutes.
The use of hardware component will significantly reduce M-NLFSRs search time. As an example, for software search at L=25 was spent 18 days, and by using hardware components for L=26 – less than two days. Estimated time for which would have made a similar search for L=26, using a software method, would constitute more than 3 months. Taking into account the modifications, introduced in the search algorithm for L=27, 28 and 29, we can expect another three-time increase in performance for L=26, while using hardware components.

Thus, using this method, we get a general performance increase for search of M-NLFSRs generating M-sequences by more than 150 times as compared with the method using only PC CPU resources. The use of more expensive and productive FPGAs will further reduce the time to search of M-NLFSRs.

7. Conclusions

The method for M-NLFSRs search using hardware-software system is given. Based on the method, as an example, the result of the search of non-linear 26, 27, 28 and 29 degrees polynomials for generation of M-sequences is given.

Use of FPGA-based hardware component can significantly improve the performance of complex for search of M-NLFSR. The time for M-NLFSR search is increased by more than 150 times as compared to using only the computing power of PC.

FPGA-based NLFSR performance capabilities are given. Recommendations on optimizing the performance of the complex to search of M-NLFSR are given.
A performance comparison of NLFSR generators performed by using the software and hardware implementation is carried out. It is shown that the hardware implementation of single FPGA-based NLFSR isn’t inferior to the performance of its program analogue and in the case of a generator with several parallel NLFSRs can significantly exceed its.

Stable performance of FPGA-based NLFSR is implemented with a clock frequency (generation speed in NLFSR), equal to 725 MHz.

On the basis of the results it can be developed stream ciphers with enhanced cryptographic strength. The application of obtained non-linear polynomial allows to obtain stream ciphers, which are analogues of modern developments in this field, and the use of the described search method allows to find higher degree M-NLFSR.

References


