Notice that more accurate information can be found in "New Methods in Hard Disk Encryption", Chapter 3, available from http://clemens.endorphin.org/cryptography.

Linux hard disk encryption settings

This page intends to educate the reader about the existing weaknesses of the public-IV on-disk format commonly used with cryptoloop and dm-crypt (used in IV-plain mode). This page aims to facilitate risk calculation when utilising Linux hard disk encryption. The attacks presented on this page may pose a thread to you, but at the same time may be totally irrelevant for others. At the end of this document, the reader should be able to make a good choice according to his security needs.

A good quote with respect to this topic is ''All security involves trade-offs'' from Beyond Fear (Bruce Schneier). You should keep in mind that perfect security is unachievable and by all means shouldn't be your goal. For instance, when using pass phrase based cryptography, you have to trust in that the underlying system is secure, the computer system has not been tampered with, and nobody is watching you. The most obvious weakness is the last one, but even if you make sure nobody nor any camera is around, how about the keyboard you're typing on? Has it been manipulated while you have been getting your lunch?

So security comes for a price, and the price when designing cryptography security algorithms is performance. You will be introduced to the fastest of all setups available, the "public-IV", which sacrifices security properties for speed. After that we will talk about ESSIV, the newest of IV modes implemented. It comes for a small price, but it can deal with watermarking for a relatively small price. Then you'll be introduced to the draft specifications of the Security in Storage Working Group (SISWG). Currently SISWG is considering EME and LRW for standardisation. EME along with it's cousin CMC seems to provide the best security level, but imposes additional encryption steps. Plumb-IV is discussed only for reference, because it has the same performance penalty as CMC, but in constrast suffers from weaknesses of CBC encryption.

As convention, this document will use the term "blocks", when it referes to a single block of plain or cipher text (usually 16 byte), and will use the term "sectors", when it refers to a 512-byte wide hard disk block.

CBC Mode: The basic level

Most hard disk encryption systems utilise CBC to encrypt bulk data. Good descriptions on CBC and other common cipher modes are available at

Wikipedia
Connected: An Internet Encyclopedia
NIST: Recommendation for Block Cipher Modes of Operation (CBC is at PDF Page 17)

Please make sure you're familiar with CBC before proceeding.

Since CBC encryption is a recursive algorithm, the encryption of the n-th block requires the encryption of all preceding blocks, 0 till n-1. Thus, if we would run the whole hard disk encryption in CBC mode, one would have to re-encrypt the whole hard disk, if the first computation step changed, this is, when the first plain text block changed. Of course, this is an undesired property, therefore the CBC chaining is cut every sector and restarted with a new initialisation vector (IV), so we can encrypt sectors individually. The choice of the sector as smallest unit matches with the smallest unit of hard disks, where a sector is also atomic in terms of access.

For reference, I will give a formal definition of CBC encryption and decryption. Note, that decryption is not recursive, in contrast to encryption, since it's a function only of C_n-1 and C_n.

Encryption:

C_-1 = IV
C_n = E(P_n ⊕ C_n-1)

Decryption:

C_-1 = IV
P_n = C_n-1 ⊕ D(C_n)

The next sections will deal with how this IV is chosen.

The IV Modes

The "public-IV"

The IV for sector n is simply the 32-bit version of the number n encoded in little-endian padded with zeros to the block-size of the cipher used, if necessary. This is the most simple IV mode, but at the same the most vulnerable.

ESSIV

E(Sector|Salt) IV, short ESSIV, derives the IV from key material via encryption of the sector number with a hashed version of the key material, the salt. ESSIV does not specify a particular hash algorithm, but the digest size of the hash must be an accepted key size for the block cipher in use. As the IV depends on a none public piece of information, the key, the sequence of IV is not known, and the attacks based on this can't be launched.

plumb IV

The IV is computed by hashing (or MAC-ing) the plain text from the second block till the last. Additionally, the sector number and the key are used as input as well. If a byte changes in the plain text of the blocks 2 till n, the first block is influenced by the change of the IV. As the first encryption effects all subsequent encryption steps due to the nature of CBC, the whole sector is changed.

Decryption is possible because CBC is not recursive for decryption. The prerequisites for a successful CBC decryption are two subsequent cipher blocks. The former one is decrypted and the first one is XOR-ed into the decryption result yielding the original plain text. Therefore independent of the IV scheme, decryption is possible from the 2nd to the last block. After the recovery of these plain text blocks, the IV can be computed, and finally the first block can be decrypted as well.

The only weakness of this scheme is it's performance. It has to process data twice: first for obtaining the IV, and then to produce the CBC encryption with this IV. With the same performance penalty CMC is able to achieve better security properties (CMC is discussed later), thus plumb-IV will remain unimplemented.

The attack arsenal

Content leaks

This attack can be mounted against any system operating in CBC Mode. It rests on the property, that in CBC decryption, the preceding cipher block's influence is simple, that is, it's XORed into the plain text. The preceding cipher block, C_n-1, is readily available on disk (for n > 0) or may be deduced from the IV (for n = 0). If an attacker finds two blocks with identical cipher text, he knows that both cipher texts have been formed according to:

C_m = E(P_m ⊕ C_m-1 )
C_n = E(P_n ⊕ C_n-1 )

Since he found that C_m = C_n, it holds

P_m ⊕ C_m-1 = P_n ⊕ C_n-1

which can be rewritten as

C_m-1 ⊕ C_n-1 = P_n ⊕ P_m

The left hand side is known to the attacker by reading the preceding cipher text from disk. If one of the blocks is the first block of a sector, the IV must be examined instead (when it's available as it is in public-IV). The attacker is now able to deduce the difference between the plain texts by examining the difference of C_m-1 and C_n-1. If one of the plain text blocks happens to be zero, the difference yields the original content of the other related plain text block.

Another information is available to the attacker. Any succeeding identical pair of cipher text, that follows the initial identical cipher pair, is equal. No information about the content of those pairs can be extracted, since the information is extracted from the respective preceding cipher blocks, but those are all required to be equal.

Let's have a look at the chance of succeeding with this attack. Assuming the output of a cipher forms an uniform distribution, the chance, p, of finding an identical block is 2^-blocksize. For instance, p = 1/2^128 for a 128-bit cipher. Because the number of possible pairs develops as an arithmetic series in n, the number of sectors, the chance of not finding two identical blocks is given by

(1-p)^n(n-1)/2

As p is very small, but in contrast the power is very big, we apply the logarithm to get meaningful answers, that is

n(n-1)/2 ln (1-p)

An example: The number of cipher blocks available on 200GB disk with known C_n-1 is 200GB × 1024² KB/GB × 64/1KB ¹. Or in other words, a 128-bit block is 16 bytes, so the number of 16-byte blocks in a 200GB hard disk is 13.4 billion. Therefore, n = 1.342e10. For a 128-bit cipher, p = 2^-128. Hence,

ln(1-p) = -2.939e-39
n(n-1)/2 = 9.007e19

n(n-1)/2 ln (1-p) = -2.647e-19
1-e^-2.776e-13 = 2.647e-19

The last term is the chance of finding at least one pair of identical cipher blocks. But how does this number grow in n? Obviously exponentially. Plotting a few a decimal powers shows that the chance for finding at least on identical cipher pair flips to 1 around n = 10^20 (n = 10^40 for a 256-bit cipher). This inflexion point is reached for a 146 million TB storage (or a hundered thousand trillion trillions TB storage for a 256-bit cipher).

¹The blocks with available preceding cipher blocks is 62/1KB for all non-public IV schemes, i.e. ESSIV/plumb IV

Data existence leak: The Watermark

No IV format discussed on this page allows the user to deny the existence of encrypted data. Neither cryptoloop nor dm-crypt is an implementation of a deniable cryptography system. But the problem is more serious with public-IV.

With public IV and the predicable difference it introduces in the first blocks of a sequence of plain text, data can be watermarked, which means, the watermarked data is detectable even when the key has not been recovered. As shown in the paragraph above, the existence of two blocks with identical cipher text is very unlikely and coincidence can be excluded, which is relevant when somebody tries to demonstrate before the law that certain data is in an encrypted partition.

As the IV progresses with a foreseeable pattern and is guaranteed to change the least significant bit ever step, we can build identical pair of cipher text by writing three consecutive sectors each with a flipped LSB relative to the previous. (The reason it's three instead of two is, that the second least significant bit might change as well.) This "public-IV"-driven CBC encryption will output exactly the same cipher text for two consecutive sectors. An attacker can search the disk for identical consecutive blocks to find the watermark. This can be done in a single pass, and is much more feasible than finding to identical blocks, that are scattered on the disk, as in the previous attack. A few bits of information can be encoded into the watermarks, which might serve as tag to prove the existence copyright infringing material.

A complete description of watermarking can be found in Encrypted Watermarks and Linux Laptop Security. The attack can be defeated by using ESSIV.

Data modification leak

CBC encryption is recursive, so the n-th block depends on all previous blocks. But the other way round would also be nice. Why? The weakness becomes visible, if storage on a remote computer is used, or more likely, the hard disk exhibits good forensic properties. The point is, the attacker has to have access to preceding (in time) cipher text of a sector, either by recording it from the network, or by using forensic methods.

An attacker can now guess data modification patterns by examining the historic data. If a sector is overwritten with a partial changed plain text, there is an amount of bytes at the beginning, which are unchanged. This point of change² is directly reflected in the cipher text. So an attacker can deduce the point of the change in plain text by finding the point where the cipher text starts to differ.

This weakness is present in public-IV and ESSIV.

²aligned to the cipher block size boundaries

Malleable plain text

The decryption structure of CBC is the source of this weakness. Malleability (with respect to cryptography) is defined as a modification of the cipher text that will resulting in a predictable change in plain text. To put it formally, there is a function f(C), that, if applied to the cipher text, C' = f(C), will result in a known function f', which will predict the resulting plain text, P' = D(C'), correctly assuming P is known, that is P' = f'(P).

As we can see in it's definition, CBC decryption depends on C_n-1. An attacker can flip arbitrary bits in the plain text by flipping bit in C_n-1. More formally³, if

P = P₁ || P₂ || ... || P_i || ... || P_n
C = E_CBC(P)
C = C₁ || C₂ || ... || C_i-1 || ... || C_n

the function

f(C₁ || ... || C_n) = C₁ || ... || C_i-1 XOR M || ... || C_n

follows the function f', which predicts the resulting plain text correctly as,

f'(P₁ || ... || P_n) = P₁ || ... || P_i XOR M || ... || P_n

The first block of the CBC cipher text stream is not malleable, because it depends on the IV, which is not modifiable for an attacker.

³The IV parameter for E_CBC has been intentionally omitted.

Movable

On the expense of one block decrypting to garbage, an attacker can move around plain text as he likes. CBC decryption depends on two variables, C_n-1 and C_n. Both can be modified at free will. To make meaningful modifications, an attacker has to replace the pair C_n-1 and C_n with other cipher text pair from disk. The first block C_n-1 will decrypt to garbage, but the second block C_n will yield a copy of the plain text of the copied cipher block. This attack is also known as copy&paste attack. This attack is mountable against any CBC setup. The only limitation is, the first block, C₀, can't be replaced with something meaningful, as C_-1 can't be modified, because it's the IV.

CMC and EME: Tweakable wide block cipher modes

CMC is a new chaining mode. It stands for ''CBC-Mask-CBC''. It works by processing the data in three steps, first CBC, then masking the cipher text, and then another CBC step, but this time backwards. The last step introduces a dependency from the last block to the first block. The authors of the CMC paper provide a prove for the security of this mode, making a secure 128-bit cipher a secure 4096-bit cipher (sector size). As in normal CBC, this scheme also takes an IV, but the authors call it tweak.

EME is CMC's cousin. EME has also been authored by Haveli and Rogaway as well been authored for the same purpose. The difference to CMC is, that EME is parallelizable, that is, all operations of the underlying cipher can be evaluated in parallel. To introduce an interdependency among the resulting cipher blocks, the encryption happens in two stages. Between these stages a mask is computed from all intermediate blocks and applied to each intermediate block. This step causes an interdependency among the cipher blocks. After applying the mask, another encryption step diffuses the mask.

The interdependency among the resulting blocks allow CMC and EME to be nonmovable, nonmalleable, to prevent content leaks and in-sector data modification patterns. The tweaks are encrypted by both cipher modes, thus both are nonwatermarkable.

For simplicity, the EME description above omitted the pre- and post- whitening steps as well as the multiplications in GF(2^128). An in-depth specification can be found at the Cryptology ePrint Archive. An applicable draft specification for EME-32-AES can be found at SISWG. I have written an EME-32-AES test implementation for Linux 2.6. It's available here. The CMC paper is available from the Cryptology ePrint Archive as well.

LRW: A tweakable narrow block cipher mode

EME as well as CMC are comparatively secure cipher modes, but heavy in terms of performance. LRW tries to cope with most of security requirements, and at the same time provide a good performance. LRW is a narrow block cipher mode, that is, it operates only on one block, instead of a whole sector. To make a cipher block tied to a location on disk (to make it unmovable), a logical index is included in the computation. For LRW you have to provide two keys, one for the cipher and one for the cipher mode. The second key is multiplied with a logical index under GF(2^128) and used as pre- and post- whitening for encryption. With those whitening steps the block is effectively tied to a logical index. The logical index is usually the absolute position on disk measured with the block size of the cipher algorithm. The different choice of the measuring unit is the only different between the logical index and the public-IV.

The LRW draft is available from the SISWG mailing list archive.

Summarising

The following table shows a comparison between the security properties of different encryption setups and their computational costs. The number of cipher calls, XOR operations and additional operations are stated in terms of encryption blocks, n.

IV mode	cipher mode	content leaks	watermarkable	malleable	movable	modification detection⁵	cipher calls	XOR ops	additional op.
public-IV	CBC	Yes	Yes	Yes	Yes	Yes	n	n	None
ESSIV	CBC	Yes	No	Yes	Yes	Yes	n+1	n	None
Plumb-IV1⁴	CBC	Yes	No	Yes	Yes	No	2n-1	2n	None
public-IV	CMC	No	No	No	No	No	2n+1	2n+1	1 LW GF ⊗
public-IV	EME	No	No	No	No	No	2n+1	5n	3n-1 LW GF ⊗
public-IV	LRW	No	No	No	No	Yes	n	2n	n HW GF ⊗

Legend:

LW GF ⊗: light-weight Galois field multiplication, that is, a multiplication with a constant x^2ⁱ, which can be computed in θ(1).
HW GF ⊗: heavy-weight Galois field multiplication, that is, a multiplication with an arbitrary constant, which can be computed in θ(bits).

⁴plumb-IV1 uses CBC-MAC instead of hashing, so we can make a good comparison with other ciphers in terms of cipher/XOR calls.
⁵detectable partial in-sector modification

Clemens Fruhwirth, , also author of LUKS and ESSIV, porter of cryptoloop, aes-i586 for 2.6., twofish-i586, and implementor of EME-32-AES. This text is an excerpt of my diploma thesis.

This page has been reviewed by

Dr. Ernst Molitor
Arno Wagner
James Hughes , "Security in Storage Working Group" chair

Additional thanks to Pascal Brisset, for pointing out an error in the Bernoulli estimation in an earlier version of this document, further Adam J. Richter for pointing out an error in the KB/GB ratio.