Cryptographic Algorithms
This page lists commonly used cryptographic algorithms and
methods and explains the basic underlying concepts. It also tries to give references to implementations and textbooks.
Where available, comments are also made about the usefulness or other aspects of the algorithms.
Public Key Cryptosystems
Public key cryptosystems were invented in the late 1970s, possibly with help from the development of the complexity theory of algorithms around that time. It was observed that, starting from a problem so difficult that it would need thousands of years to solve, and with some luck, a cryptosystem could be developed which would have two keys, a private key and a public key. With the public key one could encrypt messages, and decrypt them with the private key. Thus the owner of the private key would be the only one who could decrypt the messages, but anyone knowing the public key could send them in privacy.
Another idea that was observed was that of a key exchange.
In a two-party communication it would be useful to generate a common secret key for bulk encryption using a secret
key cryptosystem (e.g. some block cipher).
Indeed, Whitfield Diffie and Martin Hellman used ideas from number theory to construct a key exchange protocol that started the era of public key cryptosystems. Shortly after that, Ron Rivest, Adi Shamir and Leonard Adleman developed RSA, the first real public key cryptosystem capable of both encryption and digital signatures.
Later several public key cryptosystems followed, using many different underlying ideas (e.g. knapsack problems, various groups over finite fields, and lattices). Many of them were soon proven to be insecure. However, the Diffie-Hellman protocol and RSA appear to have remained two of the strongest up to now.
Terminology
The basic ingredient in any public key cryptosystem is a difficult computational problem. The security of the cryptosystem is based on the fact that the secret key can be computed from the public key only by solving this difficult problem. We will introduce enough terminology that all the basic problems in public key cryptography can be explained.
- Algorithms.
An algorithm is an explicit description of how a particular computation should be performed (or a problem solved). The efficiency of an algorithm can be measured as the number of elementary steps it takes to solve the problem. So if we claim that an algorithm takes time O(n), we mean that it takes on the order of n elementary steps, but we do not specify how long one step takes.
- Computational complexity.
A problem is said to be polynomial time, or in P, if it can be solved by an algorithm which takes less than O(n^t) steps, where t is a suitably chosen (i.e. large enough) constant and n is a variable that measures the size of the problem instance. If a guessed solution to a problem can be verified in polynomial time, then the problem is said to be in NP (non-deterministic polynomial time). A problem is NP-hard if it is at least as hard as any problem in NP. There is no known polynomial time algorithm for any NP-hard problem, and it is believed that such algorithms in fact do not exist.
In public key cryptography the attacker is interested in
solving particular instances of a problem (factoring some given number), rather than providing a general solution
(an algorithm to factor any possible number efficiently). This causes some concern for cryptographers, as some
instances of a problem that is NP-hard in general may be easily solvable.
- Primes.
A prime is a number that is not divisible by any other number than itself and 1. Thus the integers 2, 3, 5, 7, 11, ... and so on are primes. There are infinitely many primes, and (one of) the biggest prime numbers currently known is 2^6,972,593 - 1.
- Factoring.
Every integer can be represented uniquely as a product of prime numbers. For example, 10 = 2 * 5 (the notation * is commonly used for multiplication in computer science), and this representation is unique (except for the order of the factors 2 and 5). The art of factorization is almost as old as mathematics itself. However, the study of fast algorithms for factoring is only a few decades old.
The most immediate algorithm tries to divide the input by all small prime numbers iteratively until the remaining number is prime. This is feasible only for integers that are, say, about 10^16, as this already requires trying all primes up to 10^8. In public key cryptosystems based on the problem of factoring, the numbers are of size about 10^300; this would require trying all primes up to 10^150, and by the prime number theorem there are about 10^147 such prime numbers. This exceeds the number of atoms in the universe, and such an enumeration is unlikely to be achieved by any effort.
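As an illustration of this immediate approach, here is a minimal Python sketch of factoring by trial division; it is practical only for the small sizes discussed above.

    def trial_division(n):
        """Factor n by dividing out small factors; feasible only for small n."""
        factors = []
        d = 2
        while d * d <= n:
            while n % d == 0:      # divide out each small factor repeatedly
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)      # whatever remains is itself prime
        return factors

    assert trial_division(10) == [2, 5]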
The easy instances of factoring are those where the given integer has only small prime factors. For example, 759375 is easy to factor as we can write it as 3^5 * 5^5. In cryptography we want to use only those integers that have only large prime factors. Preferably we select an integer with two large prime factors, as is done in the RSA cryptosystem.
Currently one of the best factoring algorithms is the Number Field Sieve (NFS), which consists of a sieving phase and a matrix step. The sieving phase can be distributed (and has been, several times) among a large number of participants, but the matrix step needs to be performed on large supercomputers. The effectiveness of NFS is apparent in that it can factor integers around 10^150 in a few months' time. The NFS algorithm takes subexponential time (which is still not very efficient).
There is no known proof that factorization of integers is NP-hard, nor that it is polynomial time solvable. If any NP-hard problem were polynomial time solvable, then factoring would be as well, but there is very little hope that this will happen. It is plausible under current knowledge that factoring is not polynomial time solvable.
- Discrete logarithm.
Another important class of problems is that of finding n given only some y such that y = g^n. The problem is easy for integers, but when we are working in a slightly different setting it becomes very hard. To obscure the nature of n in g^n, we divide the infinite set of integers into a finite set of remainder classes. Intuitively, we take the string of integers and wrap it around a roll of, say, toilet tissue (with circumference m), looking at it from the side. The numbers 0, m, 2m, 3m, ... all cover the same spot on the roll, and are therefore said to be in the same equivalence class (we also write "0 = m = 2m = ... (mod m)"). Each equivalence class has a least representative in 0 .. m-1. So you can write any integer n as t + km for some integer k, where 0 <= t < m. It is a convention to write n = t (mod m) in this case. Here m is said to be the modulus.
It can be shown that you can add, multiply and exponentiate with these classes of integers (modulo some m). This structure, when m = p with p a prime number, is often called a prime field or a Galois field GF(p). In proper mathematical terminology it is a finite field of characteristic p, where p is the modulus. If m is not a prime number then the structure is called a (finite) ring. More information about groups, fields and rings can be found in any good elementary text on algebra.
The discrete logarithm problem in a finite field (of characteristic p, where p is a prime number) is then stated as follows: given two positive non-zero integers a and g (both less than p), compute n such that a = g^n (mod p). We may require that the answer exists. To make this problem cryptographically hard, p should be a large prime number (about 10^300) and n, in general, of the same magnitude.
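As a small illustration (assuming a toy modulus; a real p would have hundreds of digits), the following Python sketch contrasts the easy direction, modular exponentiation via the built-in pow, with the hard direction, recovering the exponent by brute force:

    p, g = 101, 2                      # toy prime modulus and generator

    def discrete_log_bruteforce(a):
        """Find n with g^n = a (mod p) by trying every exponent."""
        for n in range(p - 1):
            if pow(g, n, p) == a:      # pow(g, n, p) is fast modular exponentiation
                return n
        raise ValueError("no discrete logarithm exists for a")

    a = pow(g, 57, p)                  # easy: compute g^57 mod p
    assert discrete_log_bruteforce(a) == 57   # hard in general: recover 57 from a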
This problem is currently considered as hard as factoring. The best method known at this time is the Number Field Sieve for discrete logarithms (which uses similar ideas as the NFS for factoring). For the discrete logarithm problem there are also other important methods: Pollard's rho and lambda algorithms, and Shanks' baby-step giant-step algorithm. These are generic discrete logarithm algorithms (not tied to the above structure). They use only the basic group structure and require about the square root of the order of the element g steps to find the discrete logarithm in the general case. There is also the important Pohlig-Hellman algorithm for the case where the order of g factors exclusively into small primes.
The discrete logarithm problem may appear more complicated than integer factoring, but in many ways they are similar. Many of the ideas that work for factoring can also be applied in the setting of discrete logarithms. There is little hope of finding a polynomial time algorithm for discrete logarithm computation (for example, in a finite field of characteristic p). In such a case it would be likely that factoring problems could also be solved efficiently.
Other structures allowing difficult discrete logarithm problems include elliptic curves and other group structures over finite fields. The generic methods work in all of these structures, but the NFS algorithm does not. As a consequence, there are some key size benefits to using discrete logarithm based public key cryptosystems rather than factoring based ones.
- Knapsacks.
Given a small set of integers, the knapsack problem consists in determining a subset of these integers such that their sum is equal to a given integer. For example, given (2, 3, 5, 7) and 10, we can find the solution 2 + 3 + 5 = 10 by brute force search, and thus solve this knapsack instance.
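A brute force search over all subsets is easy to sketch; the following Python fragment (illustration only) solves the tiny instance above, but the approach becomes infeasible as the set grows:

    from itertools import combinations

    def solve_knapsack(weights, target):
        """Return some subset of weights summing to target, or None."""
        for r in range(1, len(weights) + 1):
            for subset in combinations(weights, r):   # try every subset
                if sum(subset) == target:
                    return subset
        return None

    assert sum(solve_knapsack((2, 3, 5, 7), 10)) == 10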
The general knapsack problem is provably NP-hard, and thus appears superior to factorization and discrete logarithms for use in public key cryptosystems. Unfortunately, all cryptosystems that have used this underlying idea have been broken, as the instances of the problem actually used have not really been NP-hard.
- Lattices.
We define basis vectors w_i = (w_i1, ..., w_in) for i = 1, ..., m, and the lattice generated by this basis. That is, the elements of the lattice are all vectors of the form t_1*w_1 + t_2*w_2 + ... + t_m*w_m, where the t_i are integers.
The problem of finding the shortest vector in a lattice (using the usual Euclidean distance) is an NP-hard problem (for lattices of sufficiently large dimension).
However, the celebrated LLL algorithm by Lenstra, Lenstra and Lovasz computes an approximate solution in polynomial time. The effectiveness of the LLL algorithm comes from the fact that in many cases approximate solutions are good enough, and that surprisingly often the LLL algorithm actually gives the shortest vector. Indeed, this algorithm has often been used to break cryptosystems based on lattice problems or knapsacks. It has also been applied to attacks against RSA and DSS.
Practical cryptosystems
The wide interest in public key cryptography has produced several practically important cryptosystems. In the following they are listed by underlying problem.
As a basic guideline, a public key cryptosystem is built from a difficult problem as follows: take a difficult problem (for example, an NP-hard one) for which you can find instances that can be solved in polynomial time. To encrypt a message, convert the message into such an easy instance of the difficult problem, then use the public key to convert the easy problem into a difficult one. The result is then sent to the recipient through an insecure channel. To decrypt, use the private key to convert the difficult problem back into the easy one and solve it. All public key systems use this principle, although they differ significantly in the details (such as the underlying problem or the structure of the public and private keys).
For a good survey of appropriate key lengths, see Lenstra and Verheul's Selecting Cryptographic Key Sizes (which appeared in Public Key Cryptography 2000). They present a complete analysis of key sizes for almost all cryptosystems.
Below, along with each cryptosystem, you will find the current recommendations for key sizes where appropriate. These recommendations do not always agree with Lenstra and Verheul's.
Factorization: RSA, Rabin
- RSA (Rivest-Shamir-Adleman) is probably the most commonly used public
key algorithm. It can be used both for encryption and for digital signatures. The security of RSA is generally
considered equivalent to factoring, although this has not been proved.
RSA computation takes place with integers modulo n = p * q, for two large secret primes p and q. To encrypt a message m, it is exponentiated with a small public exponent e. For decryption, the recipient of the ciphertext c = m^e (mod n) computes the multiplicative inverse d = e^(-1) (mod (p-1)*(q-1)) (we require that e is selected suitably for it to exist) and obtains c^d = m^(e*d) = m (mod n). The private key consists of n, p, q, e and d (where p and q can be discarded); the public key consists only of n and e. The problem for the attacker is that computing the inverse d of e is assumed to be no easier than factoring n. More details are available in many sources, such as the Handbook of Applied Cryptography.
The key size (the size of the modulus) should be greater than 1024 bits (i.e. of magnitude 10^300) for a reasonable margin of security. Keys of size, say, 2048 bits should give security for decades.
Dramatic advances in factoring large integers would make
RSA vulnerable, but other attacks against specific variants are also known. Good implementations use redundancy
(or padding with specific structure) in order to avoid attacks using the multiplicative structure of the ciphertext.
RSA is also vulnerable to chosen plaintext attacks and to hardware and fault attacks. Important attacks against very small exponents exist as well, as do attacks against partially revealed factorizations of the modulus.
The proper implementation of the RSA algorithm with redundancy is well explained in the PKCS standards (see RFCs 2314, 2315 and 2437). These give detailed explanations of how to implement encryption and digital signatures, as well as formats for storing the keys. The plain RSA algorithm should not be used in any application. It is recommended that implementations follow the standard, as this also has the additional benefit of interoperability with most major protocols.
RSA is currently the most important public key algorithm. It is patented in the United States (the patent expires in 2000).
- The Rabin cryptosystem may be seen as a relative of RSA, although it has a quite different decryption process. What makes it interesting is that breaking Rabin is provably equivalent to factoring. Rabin uses the exponent 2 (or any even integer) instead of an odd integer as in RSA. This has two consequences. First, the Rabin cryptosystem can be proven to be equivalent to factoring; second, decryption becomes more difficult, at least in some sense. The latter is due to problems in deciding which of the possible outcomes of the decryption process is correct.
As it is equivalent to factoring the modulus, the size of
the modulus is the most important security parameter. Moduli of more than 1024 bits are assumed to be secure.
There are currently no standards for the Rabin algorithm
but it is explained in several books. The IEEE P1363 project
might propose a standard and thus make it more widely used.
The equivalence to factoring means only that being able to decrypt any message encrypted with the Rabin cryptosystem yields a method for factoring the modulus. Thus it is no guarantee of security in a strong sense. Of course, it might still be possible for an attacker to exploit, for example, some implementation details.
Discrete logs: Diffie-Hellman, ElGamal, DSS
- Diffie-Hellman is a commonly used protocol for key exchange.
In many cryptographic protocols two parties wish to begin communicating. However, assume they do not initially share any common secret and thus cannot use secret key cryptosystems. The Diffie-Hellman key exchange protocol remedies this situation by allowing the construction of a common secret key over an insecure communication channel. It is based on a problem related to discrete logarithms, namely the Diffie-Hellman problem. This problem is considered hard, and it is in some instances as hard as the discrete logarithm problem.
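A minimal sketch of the exchange using ordinary arithmetic modulo a prime is given below; the toy prime and generator are assumptions of this example and far too small for real use, where primes of roughly 1024 bits or more are chosen:

    import secrets

    p = 2147483647                     # toy prime (2^31 - 1), illustration only
    g = 5                              # toy generator element

    a = secrets.randbelow(p - 2) + 1   # Alice's secret exponent
    b = secrets.randbelow(p - 2) + 1   # Bob's secret exponent

    A = pow(g, a, p)                   # Alice sends A = g^a mod p
    B = pow(g, b, p)                   # Bob sends   B = g^b mod p

    # Both parties now compute the same shared secret g^(a*b) mod p.
    assert pow(B, a, p) == pow(A, b, p)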
The Diffie-Hellman protocol is generally considered to be
secure when an appropriate mathematical group is used. In particular, the generator element used in the exponentiations
should have a large period (i.e. order).
Discrete logarithm algorithms can be used to attack Diffie-Hellman, and against passive attackers that is the best currently possible, assuming correctly chosen parameters. If Diffie-Hellman is applied with the usual arithmetic modulo a prime number, it suffices to select a large enough prime and to take some care in selecting the generator element. Subtle problems may be caused by bad choices of the generator. Many papers have been written on the problems that may occur; one of the more well-known references is van Oorschot and Wiener's On Diffie-Hellman key agreement with short exponents (Eurocrypt '96).
Attacks against Diffie-Hellman include also the man-in-the-middle attack. This attack requires adaptive intervention, but is in practice very
easy if the protocol doesn't use countermeasures such as digital signatures.
Usually Diffie-Hellman is not implemented in hardware, and thus hardware attacks are not an important threat. This may change in the future as mobile communication becomes more widespread.
- DSS (Digital Signature Standard). A signature-only mechanism endorsed by the United States Government. The underlying algorithm, DSA (Digital Signature Algorithm), is similar to the one used by ElGamal and to the Schnorr signature algorithm. It is fairly efficient and does not lag behind other signature algorithms, although RSA is necessarily faster at verification. The standard defines DSS to use the SHA-1 hash function exclusively to compute message digests.
The main problem with DSS is the fixed subgroup size (the order of the generator element), which limits the security to a ballpark of only 80 bits. Hardware attacks can be a concern for some implementations of DSS. However, it is widely used and accepted as a good algorithm.
- The ElGamal public key cipher. This is a straightforward extension of Diffie and Hellman's original idea of shared secret generation. Essentially, it generates a shared secret and uses it as a one-time pad to encrypt one block of data. ElGamal is the predecessor of DSS and is perfectly usable today, although no widely known standard has been created for it.
- Elliptic curve cryptosystems are just another way of implementing discrete logarithm methods. An elliptic curve in cryptography is basically a set of points satisfying the equation y^2 = x^3 + ax + b, considered in a finite field of characteristic p (where p must be larger than 3). Slightly different equations are needed for the small characteristic cases p = 2 and p = 3.
The points on an elliptic curve can be added together, and they form a structure called a group (in fact, an abelian group). This is just a way of saying that you can do arithmetic with them as you can with integers using just addition and subtraction.
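As a sketch of this group arithmetic (assuming a toy curve over GF(p) with p > 3, in affine coordinates), the following Python fragment implements point addition and scalar multiplication; the parameters are illustrative only and far too small for real use:

    p, a, b = 17, 2, 2                    # toy curve y^2 = x^3 + 2x + 2 over GF(17)

    def ec_add(P, Q):
        """Add two curve points; None stands for the point at infinity."""
        if P is None: return Q
        if Q is None: return P
        (x1, y1), (x2, y2) = P, Q
        if x1 == x2 and (y1 + y2) % p == 0:
            return None                                            # P + (-P) = O
        if P == Q:
            lam = (3 * x1 * x1 + a) * pow(2 * y1, p - 2, p) % p    # tangent slope
        else:
            lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p           # chord slope
        x3 = (lam * lam - x1 - x2) % p
        return (x3, (lam * (x1 - x3) - y1) % p)

    def ec_mul(k, P):
        """Scalar multiple k*P by double-and-add (the analogue of exponentiation)."""
        R = None
        while k:
            if k & 1:
                R = ec_add(R, P)
            P = ec_add(P, P)
            k >>= 1
        return R

    P = (5, 1)                    # a point on the curve: 1^2 = 5^3 + 2*5 + 2 (mod 17)
    Q = ec_mul(11, P)             # recovering 11 from P and Q is the EC discrete log problem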
Elliptic curves have some theoretical benefits but are also very practical. There is no known subexponential algorithm for computing discrete logarithms on elliptic curves, unlike for discrete logarithms in (the multiplicative group of) a finite field, in hyperelliptic curves (of large genus), or in many other groups. One practical benefit of the absence of fast discrete logarithm computation for elliptic curves is that the key size, as well as the resulting digital signatures and encrypted messages, are small. Indeed, a very simplistic way of computing the security limit for the key size is to take the key size of a secret key cryptosystem in bits and just multiply it by 2. This gives a rough estimate that is good at the moment for a generic instance of elliptic curves.
Elliptic curves can be implemented very efficiently in hardware
and software, and they compete well in speed with cryptosystems such as RSA and DSS. There are several standardization
attempts for elliptic curve cryptosystems (for example, ECDSA by ANSI). At the moment elliptic curves are widely
known, but not very widely used in practice.
The security of elliptic curve cryptosystems has been rather stable for years, although significant advances have been achieved in attacks against special instances. Nevertheless, these were conjectured by leading researchers several years ago, and no great surprises have yet emerged.
The algorithm XTR recently introduced by Lenstra and Verheul
might become a good competitor for elliptic curves. However, elliptic curves appear to be slightly better in performance,
and definitely scale better in the key size.
- LUC is a public key cryptosystem that uses a special group based on Lucas sequences
(related to Fibonacci series) as its basic building block. It can implement all the common discrete logarithm based
algorithms, and in a sense LUC is a class of public key algorithms.
It is possible to view the underlying structure of the algorithm as a certain multiplicative group of a finite field of characteristic p and degree 2 (written shortly as GF(p^2)), and this can be used to show that there exists a subexponential algorithm for computing discrete logarithms and thus attacking the LUC algorithm. Thus it might seem that LUC is of little interest, as it is basically just another way of implementing a discrete logarithm based algorithm over finite fields. However, LUC uses specific arithmetic operations derived from Lucas sequences, which may be slightly faster (by a constant factor) than a more direct approach.
The different public key algorithms based on LUC arithmetic
are called LUCDIF (LUC Diffie-Hellman), LUCELG (LUC ElGamal), and LUCDSA (LUC Digital Signature Algorithm). Several
of these are patented.
The fact that the values used in the LUC algorithm can be represented as a pair of values gives some additional advantage over just using integers modulo p: for equivalent security, the computations only involve numbers needing half the bits that would be required in the latter case. As the LUC group operation is easy to compute, this makes LUC algorithms competitive with RSA and DSS.
There appears to be little reason to use LUC cryptosystems,
as they offer little benefit over elliptic curves or XTR.
- XTR is a public key cryptosystem developed by Arjen Lenstra and Eric Verheul. XTR is similar to LUC in that it uses a specific multiplicative group of a particular finite field (in fact GF(p^6)) as its underlying group. However, XTR has novel features, such as needing only about 1/3 of the bits for signatures and encrypted messages. This is achieved by using a specific trace representation of the elements of this group and performing all computations in this representation.
All discrete logarithm based public key algorithms can be implemented with XTR ideas. So in a way XTR is a generic name for a class of public key algorithms, similarly to LUC.
Perhaps surprisingly, the algorithm is efficient, and according to Lenstra and Verheul it might be a good substitute for elliptic curves, DSS and even RSA. It has the advantage over elliptic curves that it is based essentially on the same discrete log problem as, say, DSS, which may help cryptographers and others accept it more quickly as a strong algorithm.
Knapsacks
There are only a few interesting knapsack public key cryptosystems, none of which are of practical importance.
- The Chor-Rivest cryptosystem is based on a particular variant of the knapsack problem. It is one of the knapsack cryptosystems that has best resisted attacks.
- Merkle-Hellman. This was the first knapsack cryptosystem. It was based on the simple idea of hiding an easy super-increasing knapsack problem by masking. However, it was broken already in the early 1980s.
- References:
- M. E. Hellman and R. C. Merkle: Public Key Cryptographic
Apparatus and Method. US Patent 4,218,582, 1980.
- B. Chor and R. L. Rivest: A knapsack type public key cryptosystem based on arithmetic in finite fields, Crypto '84.
Lattices
In recent years much interest has been directed towards lattice based cryptosystems. One of the reasons is that certain classes of lattice problems are NP-hard, and several efficient cryptosystems have been proposed that appear strong.
Secret Key Cryptosystems (Symmetric Ciphers)
Secret key algorithms use the same key for both encryption and decryption (or one key is easily derivable from the other). This is the more straightforward approach to data encryption: it is mathematically less problematic and has been used for many centuries.
The following terminology is often used when symmetric ciphers
are examined.
- S-boxes:
lookup tables that map n bits to m bits (where n and m often are equal).
There are several ways of constructing good S-boxes for ciphers, as well as several ways of measuring them. Some designers use a rigorous mathematical approach, using bent functions (or related constructions) to create S-boxes which can be proven resistant against some particular attacks. Other designers use heuristic approaches, which may lead to S-boxes that are more difficult to handle in mathematical proofs, but which can have additional benefits (such as several different implementation options).
The S-box may even be the only non-linear part of the cipher. This is the case in the block cipher DES, where the S-boxes may thus be regarded as its single most important part. In fact, many consider DES's S-boxes so good that they use them in their own designs (for example, Serpent).
- Feistel networks:
a Feistel network is a general method for constructing block ciphers from simple functions. The original idea was used in the block cipher Lucifer, invented by Horst Feistel. Later, several variations have been devised.
Simply put, the standard Feistel network takes a function
from n
bits to n
bits and produces an invertible function from 2n bits to 2n bits. The basic function upon which the structure is based is often
called the round function. The essential property of Feistel networks that makes them so useful in cipher design
is that the round function need not be invertible, but the resulting function always is.
If the round function depends on, say, k bits of a key, then the Feistel cipher requires r*k bits of key material, where r is the number of rounds used. The security of the Feistel structure is not obvious, but analysis of DES has shown that it is a good way to construct ciphers. A Feistel cipher must have enough rounds, but just adding more rounds does not always guarantee security.
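Below is a minimal Python sketch of a balanced Feistel network; the round function F is a made-up toy, not taken from any standardized cipher, which illustrates the point that F need not be invertible for the whole construction to be:

    def F(half, subkey):
        # toy non-invertible round function on 32-bit halves (illustration only)
        return ((half * 0x9E3779B1) ^ subkey) & 0xFFFFFFFF

    def feistel_encrypt(block64, subkeys):
        L, R = block64 >> 32, block64 & 0xFFFFFFFF
        for k in subkeys:                       # one round per round key
            L, R = R, L ^ F(R, k)
        return (L << 32) | R

    def feistel_decrypt(block64, subkeys):
        L, R = block64 >> 32, block64 & 0xFFFFFFFF
        for k in reversed(subkeys):             # same structure, round keys in reverse
            L, R = R ^ F(L, k), L
        return (L << 32) | R

    keys = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
    c = feistel_encrypt(0x0123456789ABCDEF, keys)
    assert feistel_decrypt(c, keys) == 0x0123456789ABCDEF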
- The operation of taking the user key and expanding it into r*k bits for the Feistel rounds is called key scheduling. This is often a non-linear operation, so that finding out any of the r*k bits of the round keys does not directly provide any information about the actual user key. Several ciphers have this basic structure: Lucifer, DES, and Twofish, just to name a few.
- Expansion, Permutation:
these are common tools in mixing bits in a round function. They are linear operations, and thus not sufficient
to guarantee security. However, when used with good non-linear S-boxes (as in DES) they are vital for the security
because they propagate the non-linearity uniformly over all bits.
- Bit slice operations (bitwise logical operations XOR, AND, OR, NOT and bit permutations): The idea of bitslice implementations of block ciphers is due to Eli Biham. Operating on many bits in parallel is common practice on vector machines; Biham applied the idea to serial machines by using the large registers available in modern computers. The term "bitslice" is due to Matthew Kwan.
Basically all block ciphers can be written in a bitslice manner, but operations such as addition and multiplication may become very slow. On the other hand, permutations are almost free, as they just require renaming the registers, which can be done at the coding level.
Thus, for example, in exhaustive DES key search using bitslice techniques, one can increment the current key in a fraction of the time usually needed for key scheduling.
The new AES candidate Serpent is designed
to be implemented using bitslice operations only. This makes it particularly efficient on modern architectures
with many registers.
The One-Time Pad
The one-time pad (OTP) is the only cipher that has been proven to be unconditionally secure, i.e. unbreakable even in principle. It has also been proven that any unbreakable, unconditionally secure cipher must in principle be a one-time pad.
The Vernam cipher (invented by G. Vernam in 1917) is a famous
instance of an OTP. This cipher is very simple: take a stream of bits that contains the plaintext message, and
a secret random bit-stream of the same length as the plaintext which is considered the key. To encrypt the plaintext
with the key, sequentially exclusive-or each pair of key bit and plaintext bit to obtain the ciphertext bit. If
the key is truly random, it can be proven that the attacker has no means of deciding whether some guessed plaintext
is more likely than any other when having only the ciphertext and no knowledge of the plaintext.
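A minimal Python sketch of the Vernam scheme follows; using the operating system's random source for the pad is an assumption of this example:

    import os

    def otp_xor(data: bytes, pad: bytes) -> bytes:
        assert len(pad) == len(data), "the pad must be exactly as long as the message"
        return bytes(d ^ k for d, k in zip(data, pad))

    message = b"ATTACK AT DAWN"
    pad = os.urandom(len(message))               # one-time random key, never reused
    ciphertext = otp_xor(message, pad)
    assert otp_xor(ciphertext, pad) == message   # XORing again recovers the plaintext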
The practical problem is that the key does not have a small constant length, but rather the same length as the message, and no part of a key should ever be used twice (or the cipher can be broken). So we have just traded the problem of exchanging secret data for the problem of exchanging secret random keys of the same length. However, this cipher has allegedly been in widespread use since its invention, and even more so since the security proof by C. Shannon in 1949. Although the security of this cipher had admittedly been conjectured earlier, it was Shannon who actually found a formal proof of it.
More information can be found, for example, in the book by D. Stinson, Cryptography: Theory and Practice.
DES
The Data Encryption Standard (DES) is an algorithm developed in the mid-1970s. It was turned into a
standard by the US National
Institute of Standards and Technology (NIST), and
was also adopted by several other governments worldwide. It was and still is widely used, especially in the financial
industry.
DES is a block cipher with a 64-bit block size. It uses 56-bit keys. This makes it susceptible to exhaustive key search with modern computers and special-purpose hardware. DES is still strong enough to keep most random hackers and individuals out, but it is easily breakable with special hardware by governments, criminal organizations, or major corporations. In large volumes, the cost of breaking DES keys is on the order of tens of dollars. DES is getting too weak, and should not be used in new applications.
A variant of DES, Triple-DES (also 3DES) is based
on using DES three times (normally in an encrypt-decrypt-encrypt sequence with three different, unrelated keys).
Triple-DES is arguably much stronger than (single) DES; however, it is rather slow compared to some newer block ciphers.
Nevertheless, even though DES seems to be of little interest for today's applications, there are many reasons for still considering it important. It was the first block cipher to be widely deployed in the public sector. Thus it played an important role in making strong cryptography available to the public.
Also, the design was exceptionally good for a cipher that was meant to be used for only a few years. DES proved to be a very strong cipher, and it took over a decade for any interesting cryptanalytic attacks against it to be developed (which is not to underestimate the pioneering efforts that led to this breakthrough). The development of differential cryptanalysis and linear cryptanalysis opened ways to really understand the design of block ciphers.
Although the design philosophy of DES was kept secret at the time of its introduction, this did not discourage its analysis - on the contrary. Some information about its design has since been published, and one of the original designers, Don Coppersmith, has commented that the designers discovered ideas similar to differential cryptanalysis already while designing DES in 1974. However, it was just a matter of time before these fundamental ideas were rediscovered.
Even today, when DES is no longer considered a practical solution, it is often used to describe new cryptanalytic techniques. It is remarkable that even now there are no cryptanalytic techniques that would completely break DES in a structural way; indeed, the only real weakness known is the short key size (and perhaps the small block size).
AES
In response to the growing feasibility of attacks against DES, NIST has launched a call for proposals for an official successor that meets 21st century security needs. This successor will be called the Advanced Encryption Standard (AES), and the decision will be made in the summer of 2001. Five algorithms have made it into the second round, from which one or more will be turned into the final standard. We will now have a quick look at each of them and their cryptographic peculiarities. All the ciphers have a 128 bit block size, and they support 128, 192 and 256 bit keys. The rather large key sizes are probably required to give the means for constructing efficient hash functions.
- MARS by Zunic et al. (IBM).
This is an interesting new design (using a special type of
a Feistel network), which depends heavily on the instruction sets available on modern 32-bit processors. This has
the benefit that on these target machines it is efficient, but it may lead to implementation difficulties in cheaper
architectures like smart cards.
- RC6 by Rivest, Robshaw and Yin from RSA Laboratories.
RC6 follows the ideas of RC5, implementing many improvements. For example, it attempts to avoid some of the differential attacks against RC5's data dependent rotations. However, there are some attacks that get quite far, and it is unclear whether RC6 is well enough analysed yet.
- Rijndael
by Joan Daemen and Vincent Rijmen.
Rijndael follows the tradition of square ciphers (i.e. it
is based on ideas similar to the Square cipher). It is very fast - one of the fastest AES candidates on any platform.
This seems to be very good, but there have been comments that it has too few rounds.
- Serpent by Anderson, Biham and Knudsen (Cambridge University).
Serpent has a conservative but also innovative design. It
may be implemented by bitslice (or vector) gate logic throughout. This makes it rather complicated to implement
from scratch, and writing it in a non-bitslice way involves an efficiency penalty.
The 32 rounds lead to probably the highest security margin of all AES candidates, while it is still fast enough for all purposes.
- Twofish by Schneier et al. (Counterpane)
Twofish is a new block cipher designed by Counterpane (whose CEO is Bruce Schneier). The design is highly delicate, with many alternative ways of implementation. It has been cryptanalysed in much detail by the very authoritative "extended Twofish team". It is basically a Feistel cipher, but utilizes many different ideas.
This cipher has key dependent S-boxes like Blowfish (another cipher by Bruce Schneier).
Blowfish
Blowfish was designed by Bruce Schneier. It is a block cipher
with 64-bit block size and variable length keys (up to 448 bits). It has gained a fair amount of acceptance in
a number of applications, including Nautilus and PGPfone.
Blowfish utilizes the randomized S-box idea: while doing key scheduling, it generates large pseudo-random lookup tables by doing several encryptions. The tables depend on the user-supplied key in a very complex way. This approach has proven to be highly resistant against many attacks such as differential and linear cryptanalysis. Unfortunately, it also means that Blowfish is not the algorithm of choice for environments where a large memory space (something like 4096 bytes) is not available.
The only known attacks against Blowfish are based on its
weak key classes.
IDEA
IDEA (International Data Encryption Algorithm) is an algorithm developed at ETH Zurich in Switzerland by Xuejia Lai and James Massey. It uses a 128 bit key, and it is generally considered to be very secure. It has been one of the best publicly known algorithms for some time (before the AES standardization started its second round; see the AES section above). It has been around for several years, and no practical attacks on it have been published despite numerous attempts to analyze it.
One of the best attacks uses the impossible differential idea of Biham, Shamir and Biryukov. It can attack only 4.5 rounds of IDEA, and this poses no threat to the full 8.5 rounds used in IDEA, although Lai claimed that IDEA is secure against differential attacks after just 4 rounds and this attack already gets (almost) past that point.
IDEA is patented in the United States and in most of the
European countries.
RC4
RC4 is a cipher designed by Ron Rivest at RSA Data Security,
Inc. It used to be a trade secret, until someone posted source code for an algorithm on the usenet, claiming it
to be equivalent to RC4. There is very strong evidence that the posted algorithm is indeed equivalent to RC4. The
algorithm is very fast. Its security is unknown, but breaking it does not seem trivial either. Because of its speed,
it may have uses in certain applications. It accepts keys of arbitrary length.
RC4 is essentially a pseudo random number generator, and
the output of the generator is exclusive-ored with the data stream. For this reason, it is very important that
the same RC4 key never be used to encrypt two different data streams.
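A short Python rendering of the alleged RC4 algorithm (as posted to Usenet) shows the idea: a key-scheduled 256-byte state array drives a byte generator whose output is XORed with the data.

    def rc4_keystream(key: bytes):
        S = list(range(256))
        j = 0
        for i in range(256):                          # key-scheduling algorithm (KSA)
            j = (j + S[i] + key[i % len(key)]) % 256
            S[i], S[j] = S[j], S[i]
        i = j = 0
        while True:                                   # pseudo-random generation (PRGA)
            i = (i + 1) % 256
            j = (j + S[i]) % 256
            S[i], S[j] = S[j], S[i]
            yield S[(S[i] + S[j]) % 256]

    def rc4_crypt(data: bytes, key: bytes) -> bytes:
        ks = rc4_keystream(key)
        return bytes(b ^ next(ks) for b in data)      # same call encrypts and decrypts

    c = rc4_crypt(b"hello", b"secret key")
    assert rc4_crypt(c, b"secret key") == b"hello"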
Modes Of Operation
Many commonly used ciphers are block ciphers.
Block ciphers transform a fixed-size block of data (usually 64 bits) into another fixed-size block (possibly 64 bits wide again) using a function selected by the key. If the key, input block and output block all have n bits, a block cipher basically defines, for each n-bit key, a one-to-one mapping (a permutation) of the n-bit input blocks.
If the same block is encrypted twice with the same key, the resulting ciphertext blocks are also the same (this mode of encryption is called electronic code book, or ECB). This information could be useful for an attacker. To cause identical plaintext blocks to encrypt to different ciphertext blocks, two standard modes are commonly used:
- CBC (cipher block chaining): a ciphertext block is obtained by first XORing the plaintext block with the previous ciphertext block, and then encrypting the resulting value (see the sketch after this list). This way leading blocks influence all trailing blocks, which increases the number of plaintext bits one ciphertext bit depends on, but also leads to synchronization problems if one block is lost.
- CFB (cipher feedback): the k-th ciphertext block is obtained by encrypting the (k-1)-th ciphertext block and XORing the result onto the k-th plaintext block. Interestingly, a CFB feedback loop can also be used as a pseudo-random number generator if one simply feeds one block of truly random data followed by blocks of zeroes into the encryption routine (although the expected period of this PRNG would be only about 2^(n/2), where n is the block size of the cipher).
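To make the chaining concrete, here is a minimal Python sketch of CBC encryption over an arbitrary block encryption function; the stand-in "block cipher" is a mere placeholder (a simple XOR with the key) and provides no security by itself:

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
        return xor(block, key)                        # placeholder for a real block cipher

    def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes, n: int = 8) -> bytes:
        assert len(plaintext) % n == 0, "plaintext must be padded to the block size"
        prev, out = iv, b""
        for i in range(0, len(plaintext), n):
            block = xor(plaintext[i:i + n], prev)     # XOR with previous ciphertext block
            prev = toy_block_encrypt(block, key)      # then encrypt the result
            out += prev
        return out

    c = cbc_encrypt(b"16 byte message!", b"8bytekey", b"initvect")
    # identical plaintext blocks now produce different ciphertext blocks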
Block ciphers have also interesting relationships to other
classes of ciphers. For example:
- Stream ciphers.
A stream cipher consists of a state machine that outputs at each state transition one bit of information. This
stream of output bits is commonly called the running
key. The encryption can be implemented by just exclusive-oring the running key with the plaintext message.
The state machine is nothing more than a pseudo-random number generator. For example, we can build one from a block cipher by repeatedly encrypting its own output. Typically, more elaborate constructions are used for stream ciphers to obtain high speed.
Some of the more well-known stream ciphers are RC4 and SEAL.
Several stream ciphers are based on linear-feedback shift registers (LFSR), such as A5/1 used in GSM. These have
the benefit of being very fast (several times faster than usual block ciphers).
- SSSC
(self-synchronizing stream ciphers): The class of self-synchronizing
stream ciphers has the convenient property that
it forgets about bit flips or even dropped bits after a short time (say, a few hundred bits).
SSSCs can be constructed using block ciphers in a CFB-related way. Assume that we have already encrypted n bits of the message and know that much of the ciphertext (where n denotes the block length of the cipher). Then we produce a new running key bit (see above) by encrypting those n ciphertext bits, taking one bit of the cipher's output as the new running key bit. Moving one bit further, we can iterate this procedure for the whole length of the message.
It is easy to see that a one bit error in a ciphertext cannot
affect the decrypted plaintext after n bits. This makes the cipher self-synchronizing.
The block cipher used should have sufficiently large block
size to avoid substitution attacks, for example.
More information on cipher modes can be found in the Handbook
by Menezes et al.
Before the 1970s
In this section some of the famous ciphers of the past are
listed, with links to more complete information where possible.
- Fish was used by the Germans in WWII to encipher high-command communications. The traffic was produced by a stream cipher machine called the Lorenz machine; Fish was the name given to it by British cryptanalysts. It was important because it caused difficulties for British analysts, who finally developed a machine called Colossus, which was arguably the first, or one of the first, digital computers.
The Colossus machine may have been an important factor in the planning and success of the Allied attack on Normandy. Given the intelligence produced by Fish cryptanalysis, the Allied forces knew the positions of pretty much every German division.
- Enigma was the cipher used by the Germans in World War II. The machine used several rotors and looked like a typewriter. However, first Polish and then later British mathematicians were able to keep up with the development of the machine. Most communication using the basic version of Enigma was deciphered by British analysts at Bletchley Park within a few hours of interception. One of the strongest Enigma variants was used in submarine communications, but British analysts managed to break it as well, with great implications for the battle of the Atlantic.
There are several good books about Enigma and Bletchley Park. Also the work of the major figure of British cryptanalysis, Alan Turing, has been explained in many articles and books. Recently his original notes about cryptanalysis from that time have been released to the public.
This cipher, or a variant of it, is used by the Unix crypt(1) program. It is unlikely that any variant of Enigma could be considered very secure by today's standards.
- Vernam cipher is described in detail above.
- Substitution cipher. This is one of the basic pencil-and-paper methods. Make a table by first listing your alphabet in order on the first row, and then making a random permutation of the alphabet on the second row. You can then encrypt any character of the alphabet by first looking it up on the first row and then writing down the corresponding random character on the second row. The key of this method is the permutation of the alphabet on the second row. Decryption works in reverse.
This method is susceptible to frequency analysis as long as the alphabet size is small. Modern block ciphers can be seen as a variant of this idea, in the sense that they try to hide the message under a very large alphabet that depends on the key. In the block cipher case the permutation is generated by the secret key, and the key space might not cover all the possible permutations.
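A minimal Python sketch of this table method, assuming the 26-letter uppercase alphabet:

    import random, string

    alphabet = string.ascii_uppercase
    key = "".join(random.sample(alphabet, len(alphabet)))   # random permutation = the key

    encrypt_table = str.maketrans(alphabet, key)
    decrypt_table = str.maketrans(key, alphabet)

    ciphertext = "ATTACK AT DAWN".translate(encrypt_table)
    assert ciphertext.translate(decrypt_table) == "ATTACK AT DAWN"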
- Vigenere. This cipher uses clock arithmetic (addition modulo the alphabet size) to add together the key and the message. The difference between OTP and Vigenere is that in Vigenere we explicitly reuse the short key several times for one message.
Methods for attacking Vigenere ciphers include the Kasiski test, the index of coincidence, etc. These lead to effective methods which break even very short messages (relative to the key size, of course).
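A minimal Python sketch on the 26-letter alphabet (uppercase letters only is an assumption of this example):

    def vigenere(text: str, key: str, decrypt: bool = False) -> str:
        """Add (or subtract) the repeated key to the message, letter by letter, modulo 26."""
        sign = -1 if decrypt else 1
        out = []
        for i, c in enumerate(text):
            shift = ord(key[i % len(key)]) - ord("A")
            out.append(chr((ord(c) - ord("A") + sign * shift) % 26 + ord("A")))
        return "".join(out)

    c = vigenere("ATTACKATDAWN", "LEMON")
    assert vigenere(c, "LEMON", decrypt=True) == "ATTACKATDAWN"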
- Hill cipher. The Hill cipher uses matrices in clock arithmetic, and is highly susceptible to known plaintext attacks.
Cryptographic Hash Functions
- SHA-1 (Secure Hash Algorithm) (also SHS, Secure Hash Standard): This is a cryptographic hash algorithm published by the United States Government. It produces a 160 bit hash value from an arbitrary length string. Many people consider it quite good.
The official standard text can be found here.
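For example, SHA-1 (and several of the other digests mentioned below) is available in Python's standard hashlib module:

    import hashlib

    print(hashlib.sha1(b"abc").hexdigest())   # 160-bit digest as 40 hex characters
    print(hashlib.md5(b"abc").hexdigest())    # 128-bit MD5 digest, for comparison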
- Tiger
is a recent hash algorithm developed by Anderson and Biham. It is available from ftp.funet.fi:/pub/crypt/hash/tiger.
- RIPEMD-160
is a hash algorithm designed to replace MD4 and MD5 (see below). It produces a digest of 20 bytes (160
bits, hence the name), reportedly runs at 40 Mb/s on a 90 MHz Pentium and has been placed in the public domain by its designers.
The RIPEMD-160 homepage is at http://www.esat.kuleuven.ac.be/~bosselae/ripemd160.cfm
- MD5 (Message Digest Algorithm 5) is a cryptographic hash algorithm developed
at RSA Laboratories. It can be used to hash an arbitrary length byte string into a 128
bit value.
MD5's ancestor, MD4, has been broken, and there are some concerns about the safety of MD5 as well. For instance, "keyed MD5" (typically used for authentication by having a shared secret and computing an authentication value by hashing first the secret (as a key) and then the data to be hashed) has been reported to be broken. It has also been reported that one could build a special-purpose machine costing a few million dollars to find a plaintext matching a given hash value in a few weeks.
Despite these problems, MD5 is still in wide use and is reasonably safe for non-cryptographic applications of hash functions.
- MD2,
MD4:
These are older hash algorithms from RSA Data Security. They have known flaws (Hans Dobbertin, FSE'96, LNCS 1039),
and their use is not recommended.
Random Number Generators
Cryptographic systems need cryptographically strong (pseudo-)random numbers that cannot be guessed by an attacker. Random numbers are typically used to generate session keys, and their quality is critical for the quality of the resulting systems. The random number generator is easily overlooked, and can easily become the weakest point of the system.
Some machines may have special-purpose hardware noise generators. Noise from the leak current of a diode or transistor, the least significant bits of audio inputs, times between interrupts, etc. are all good sources of randomness when processed with a suitable cryptographic hash function. It is a good idea to acquire true environmental noise whenever possible.
One cryptographic random number generator is Yarrow by Counterpane. A good page for further material on (statistical) pseudo-random number generators is the pLab site. Any cryptographically good pseudo-random number generator should pass all the basic tests for statistical randomness.
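As an illustration, in Python one would draw key material from the operating system's entropy pool (os.urandom or the secrets module) rather than from the statistical generator in the random module:

    import os, secrets

    session_key = os.urandom(16)       # 128 bits of OS-provided randomness for a session key
    token = secrets.token_hex(20)      # 160-bit random value, hex-encoded
    print(session_key.hex(), token)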