Backup trick: moving user caches into /var/cache

By Bear Giles

I’ve been refining my backup scripts and want to share a quick trick.

There are many caches under the ~/.cache directory. Pretty much by definition, caches only contain information that can be downloaded or recomputed… and restoring a stale cache might even cause problems. This is such an important point that tar(1) explicitly recognizes the CACHEDIR.TAG file as a marker when you specify the --exclude-caches option or one of its variants.

One drawback to this approach is that you need to remember to set that flag. If you don’t, your archives will balloon in size.

Fortunately there’s a straightforward solution.

  1. Create the /var/cache/users/user directory and change its ownership to user:user.
  2. Move the chromium, evolution and mozilla caches (as appropriate):
    1. mv ~/.cache/chromium /var/cache/users/user/chromium
    2. ln -s /var/cache/users/user/chromium /home/user/.cache/chromium
  3. Move the thunderbird cache (as appropriate):
    1. mv ~/.thunderbird /var/cache/users/user/thunderbird
    2. ln -s /var/cache/users/user/thunderbird /home/user/.thunderbird
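If you would rather script the move-and-symlink steps than type them, here is a minimal sketch in Java (the language used elsewhere on this blog). The class name CacheMover and the method relocate are made up for illustration. It assumes the old and new locations are on the same filesystem, since Files.move only renames; a non-empty directory on a different filesystem would need a recursive copy instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class CacheMover {
    /**
     * Moves a cache directory to a new location and leaves a symlink behind.
     * Assumption: both paths are on the same filesystem, so the move is a rename.
     */
    static void relocate(Path cache, Path target) throws IOException {
        Files.createDirectories(target.getParent()); // e.g. /var/cache/users/user
        Files.move(cache, target);                   // mv cache target
        Files.createSymbolicLink(cache, target);     // ln -s target cache
    }

    public static void main(String[] args) throws IOException {
        // usage: java CacheMover.java <cache dir> <new location>
        relocate(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Creating /var/cache/users/user and changing its ownership still has to be done as root beforehand.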

As a developer I want to do the same thing with my Maven repository at ~/.m2/repository.

At this point routine backups of my home directory won't pick up the caches unless I add an explicit --dereference (or -h) flag. This makes backups faster and leaner.

It should go without saying that you should only do this with true caches that you can easily regenerate. You'll also want to document the need to recreate the /var/cache/users tree when restoring the system.

linux

Some Simple Questions

By Bear Giles

Some simple questions for the developers of Garmin Training Center, the bundled software with Garmin GPS sports watches.

Why does your graph include paces of -15 or even -30 minutes/mile?

You can NEVER have a negative pace. It’s a positive pace going in the opposite direction. All this does is compress the range of the meaningful information, often to the point where it’s unusable.

Why does your graph include paces of more than 60 minutes/mile?

This one is a little more ambiguous. I’ve never seen my pace slower than 60 minutes/mile when I have it set to auto-pause. (I’m not actually going that slowly, it’s when I’m stopped at traffic lights or because my dog is doing dog stuff.) So for me anything beyond that, again, compresses the useful data to where it’s unusable.

I suppose it’s possible that you can see a slower pace if the watch isn’t set to auto-pause… but if someone is going that slowly does a chart even make sense? Isn’t this in the realm of the drift we see from the nature of the GPS system itself?

Why does your graph of heart rates include 250 beats-per-minute?

Can someone have a HR over 200 BPM?

Outside of a hospital?

Okay, I’ll grant that a teenager with a naturally high heart rate range could have a HR just over 200. But NOBODY is going to have a HR over 250 BPM. So why does the chart include that wasted space?

The bigger lesson

The technical lesson, of course, is that auto-ranging needs to be used intelligently. A lot of the time there are natural constraints on the meaningful range, be they physical, legal, or anything else. If you ignore them you can seriously compromise the usefulness of your charts.
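To make that concrete, here is a minimal sketch of constrained auto-ranging. The class AxisRange and the limits used in main (0 to 60 minutes/mile) are my own illustration, not anything taken from Training Center. Samples outside the meaningful window are ignored when ranging, so a single bad reading cannot compress the useful part of the chart.

```java
/** Auto-ranging that never extends past a caller-supplied "meaningful" window. */
final class AxisRange {
    final double min;
    final double max;

    AxisRange(double min, double max) {
        this.min = min;
        this.max = max;
    }

    /** Ranges over the samples that fall inside [floor, ceiling]. */
    static AxisRange clamped(double[] samples, double floor, double ceiling) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double s : samples) {
            if (s < floor || s > ceiling) {
                continue; // outside the meaningful window
            }
            lo = Math.min(lo, s);
            hi = Math.max(hi, s);
        }
        if (lo > hi) {
            // no usable samples: fall back to the full window
            return new AxisRange(floor, ceiling);
        }
        return new AxisRange(lo, hi);
    }

    public static void main(String[] args) {
        // pace in minutes/mile; -15 and 95 stand in for GPS or pause artifacts
        double[] pace = {8.5, 9.1, -15.0, 10.2, 95.0, 8.9};
        AxisRange r = clamped(pace, 0.0, 60.0);
        System.out.println(r.min + " .. " + r.max);
    }
}
```

A real chart would probably add some padding and round to sensible tick values, but the clamping step is the part that matters here.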

This application has had this problem for a very long time and it’s very visible to anyone who uses it. Why wasn’t it fixed years ago? Is it a case where the only thing worse than learning your website has been down for a day is realizing that your website has been down for a day and nobody noticed? Is the app so bad that nobody ever uses it?

For what it’s worth this is a big reason why I stopped using Training Center a few years ago. I only noticed this because I recently bought a newer watch and that required a newer version of Training Center (to download data from the watch) and it still has the same problem.


Using Sequences as Encryption IVs

By Bear Giles

Anyone using low-level encryption libraries should know that you need both a SecretKey and a “random” IV. The easiest way to get a good IV is to use a SecureRandom instance to generate the necessary value.

This approach has one major drawback – it requires you to store or transmit a 16- or even 32-byte value. This is not always a trivial concern. As one example many applications require the user to enter a BASE-32-encoded license key and including a full IV can double the length of the license key. Longer keys can be a real headache when dealing with low-bandwidth channels such as a telephone call to customer support or stickers on optical media – people are more likely to mis-enter the key, you have to use a smaller font which could cause problems for visually impaired users, etc.

Another example is database records. A full IV can significantly increase the size of a record that contains only an integer primary key and an encrypted value – that means fewer records per page. Records/page isn’t a big concern on smaller databases but it matters as you scale up.

What if the full IV could be replaced by a 4-byte counter? That drops 12 bytes per record, or a staggering 20 characters when using BASE-32 encoding.

  • Using the counter directly is insecure. Merging the counter with a salt, e.g. by using XOR or adding it using BigInteger math, is better but still relatively insecure.
  • Encrypting the counter with the same encryption key as the data is also insecure.
  • Encrypting the counter using a separate encryption key is secure. You don’t have to mix the counter with a salt but there’s no downside to it.

The last approach gives us another benefit – it’s a natural way to split the effective encryption key into two pieces that can be stored independently. (E.g., one key is stored on the filesystem outside of the webapp and the other key is provided via the application container using JNDI.) If we use a salt with the counter we have three pieces that can be stored independently.
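A minimal JCE sketch of the third option, under my own assumptions: the counter is mixed into a 16-byte salt, the result is encrypted as a single AES block under a dedicated IV key, and the output is used as the CBC IV for the data key. The class and method names (SequenceIv, deriveIv) are hypothetical, and the keys are generated on the fly only so the example runs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

final class SequenceIv {
    private static final int BLOCK = 16; // AES block size in bytes

    /** Derives a full IV by encrypting (salt XOR counter) under the separate IV key. */
    static byte[] deriveIv(SecretKey ivKey, byte[] salt, int counter)
            throws GeneralSecurityException {
        if (salt.length != BLOCK) {
            throw new IllegalArgumentException("salt must be " + BLOCK + " bytes");
        }
        byte[] block = salt.clone();
        byte[] ctr = ByteBuffer.allocate(4).putInt(counter).array();
        for (int i = 0; i < 4; i++) {
            block[BLOCK - 4 + i] ^= ctr[i]; // mix the counter into the last 4 bytes
        }
        Cipher ecb = Cipher.getInstance("AES/ECB/NoPadding"); // exactly one block
        ecb.init(Cipher.ENCRYPT_MODE, ivKey);
        return ecb.doFinal(block);
    }

    static byte[] encrypt(SecretKey dataKey, byte[] iv, byte[] plaintext)
            throws GeneralSecurityException {
        Cipher cbc = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cbc.init(Cipher.ENCRYPT_MODE, dataKey, new IvParameterSpec(iv));
        return cbc.doFinal(plaintext);
    }

    static byte[] decrypt(SecretKey dataKey, byte[] iv, byte[] ciphertext)
            throws GeneralSecurityException {
        Cipher cbc = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cbc.init(Cipher.DECRYPT_MODE, dataKey, new IvParameterSpec(iv));
        return cbc.doFinal(ciphertext);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey ivKey = gen.generateKey();   // demo keys only
        SecretKey dataKey = gen.generateKey();
        byte[] salt = new byte[BLOCK];         // all zeros, i.e. no salt
        int counter = 42;                      // the only value stored with the record

        byte[] ct = encrypt(dataKey, deriveIv(ivKey, salt, counter),
                "secret".getBytes(StandardCharsets.UTF_8));
        byte[] pt = decrypt(dataKey, deriveIv(ivKey, salt, counter), ct);
        System.out.println(new String(pt, StandardCharsets.UTF_8));
    }
}
```

Only the 4-byte counter travels with the ciphertext; the reader re-derives the IV from the counter, the salt and the IV key. The counter must never repeat for a given data key, since a repeated counter means a repeated IV.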

This gives us a lot of flexibility when using a low-level library like JCE. Higher-level libraries like ESAPI or GPG handle IVs themselves but it’s often possible to explicitly set the IV. For instance the ESAPI CipherSpec class provides a mechanism to set the IV.

Does this mean that we should always use a sequence to generate an IV? No – it requires a little more effort to generate an IV from a sequence than to use a random byte array. You also have to consider the possibility that someone will “improve” the implementation later and use the same encryption key for IV and data, or remove the encryption altogether. However it’s a good tool to have in your arsenal for the times when it is appropriate.

java, security

More on Password Encryption

By Bear Giles

I was recently reading Spring Security 3 and it made an interesting point about some LDAP servers.

The servers never provide the (hashed) password.

That raises the immediate question of how you can possibly authenticate the user. The answer is straightforward – the underlying database query becomes

select prop1, prop2, prop3 from users where username='user' and password='hashed.password'

instead of

select * from users where username='user'

This is hard to enforce in a standard database unless you use stored procedures or, maybe, use views and column-based access permissions. The former would work well; the latter would undoubtedly be fragile and quietly relaxed by someone who “didn’t see the harm”.

However the benefits of this approach should be obvious. If an attacker is limited to SQL injection, the first approach prevents attackers from seeing the hashed passwords (provided the column is properly protected). The second approach does not. Of course this is not a panacea since there are many other attack vectors.

How do you handle hashing?

This approach immediately raises the question of how you handle hashing. There are three immediate possibilities.

Use a system-wide salt. This is the easiest approach but it’s also the weakest.

Use a hybrid username-based/system-wide salt. Salts based on usernames alone are fairly weak since they’re easy to guess. Hybrid salts based on usernames and a system-wide salt are stronger. A minimum salt would be H(username.secret).

Use a random salt. Random salts are the strongest but they require two queries. The first gets the salt by username, the second gets the rest of the data by username and hashed password.

In all cases the standard hashing rules apply. If you use a different app, e.g., LDAP, to manage the authentication information then you should do whatever it expects. If you’re rolling your own you should use a trusted library like bcrypt or modern hashing techniques.

A quick hand-wave of the latter:

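// needs java.security.MessageDigest, java.security.NoSuchAlgorithmException
// and java.nio.charset.StandardCharsets
byte[] hash(String password, String username, String secret) throws NoSuchAlgorithmException {
   MessageDigest md = MessageDigest.getInstance("SHA-256");

   // hybrid salt: H(username.secret)
   byte[] salt = md.digest((username + secret).getBytes(StandardCharsets.UTF_8));

   // first round: H(password.salt)
   md.update(password.getBytes(StandardCharsets.UTF_8));
   byte[] hash = md.digest(salt);

   // stretching: repeatedly hash (hash XOR salt)
   for (int round = 0; round < 1000; round++) {
      for (int i = 0; i < hash.length; i++) {
         hash[i] ^= salt[i];
      }
      hash = md.digest(hash);
   }

   return hash;
}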

The multiple rounds are required since serious attackers now have access to extensive rainbow tables and GPU farms. The salt defeats the precomputed tables; the repeated hashing slows down brute-force guessing.
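If you would rather not roll your own loop, the JDK ships PBKDF2, which does the salting and stretching for you. Here is a minimal sketch of the random-salt option using it. The class name PasswordHasher, the iteration count and the key length are my assumptions; the iteration count in particular should be tuned to your own hardware.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordHasher {
    private static final int ITERATIONS = 100_000; // assumed work factor
    private static final int KEY_BITS = 256;

    /** Generates a fresh random salt; store it next to the hash. */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Derives the stored value from the password and the per-user salt. */
    static byte[] hash(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } finally {
            spec.clearPassword(); // wipe the internal copy of the password
        }
    }

    /** Constant-time comparison of a candidate password against the stored hash. */
    static boolean verify(char[] password, byte[] salt, byte[] expected)
            throws GeneralSecurityException {
        return MessageDigest.isEqual(hash(password, salt), expected);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        byte[] salt = newSalt();
        byte[] stored = hash("correct horse".toCharArray(), salt);
        System.out.println(verify("correct horse".toCharArray(), salt, stored));
        System.out.println(verify("wrong guess".toCharArray(), salt, stored));
    }
}
```

With the query style described above, the first query would fetch the salt by username and the second would pass the derived hash in the WHERE clause.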

LDAP

Going back a step – why does the Spring Security book discuss LDAP?

There are several reasons. First, there are now several good embedded LDAP implementations. It can be a separate .rar in your .ear file, it could even be a single .war file. This should be no more surprising than an embedded web server like Jetty or an embedded database like H2.

Second, the implementation has already been written. You have to configure it, of course, but you don’t have to write any code.

Finally, it integrates with enterprise systems. This isn’t an issue with a public-facing site but is very important with internal sites.

java, security

Database Encryption Using JPA Listeners

By Bear Giles

I recently had to add database encryption to a few fields and discovered a lot of bad advice out there.

Architectural Issues

The biggest problem is architectural. If your persistence manager quietly handles your encryption then, by definition, your architecture demands a tight and unnecessary binding between your persistence and security designs. You can’t touch one without also touching the other.

This might seem to be unavoidable but there is a respected school of thought that the best architecture is one where you have independent teams of app developers and security developers. The app developers can’t be sloppy but overall their sole focus is feature completion. The security developers are responsible for designing and implementing the security. The only place where both pieces are considered together is the architecture and top-level design.

In the past this wasn’t very practical but Aspect-Oriented Programming (AOP) and similar concepts have changed that. Now it’s entirely reasonable to inject an interceptor between the service and persistence layer so values the caller isn’t authorized to see are quietly dropped. A list of 10 items might be cut to 7, or an update might throw an exception instead of modifying a read-only value. It’s a little more complicated when persisting collections but the general approach should be clear.

The key thing here is that the app developers never need to see the security code. It can all be handled by AOP injection added via configuration files at the time of deployment. More importantly it can be changed at any time without requiring modification of the application itself. (You may need to perform an update process that will change values in the database.)

An interceptor can even prevent calls to undocumented methods – one less worry about rogue programmers.
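To show the shape of such an interceptor without pulling in an AOP framework, here is a minimal sketch using a plain JDK dynamic proxy. Everything in it is hypothetical: the ItemDao interface, the secure method and the authorization predicate are stand-ins for whatever your service and security layers actually provide. A real deployment would more likely use Spring AOP, AspectJ or EJB interceptors wired up through configuration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class SecurityInterceptor {
    /** Wraps a DAO so that list results are filtered by an authorization check. */
    static ItemDao secure(ItemDao target, Predicate<Object> mayRead) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            Object result;
            try {
                result = method.invoke(target, methodArgs);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the DAO's own exception
            }
            if (result instanceof List) {
                // quietly drop the items the caller is not authorized to see
                List<Object> visible = new ArrayList<>();
                for (Object item : (List<?>) result) {
                    if (mayRead.test(item)) {
                        visible.add(item);
                    }
                }
                return visible;
            }
            return result;
        };
        return (ItemDao) Proxy.newProxyInstance(
                ItemDao.class.getClassLoader(), new Class<?>[] {ItemDao.class}, handler);
    }

    public static void main(String[] args) {
        ItemDao dao = () -> Arrays.asList("public-1", "secret-2", "public-3");
        ItemDao secured = secure(dao, item -> !item.toString().startsWith("secret"));
        System.out.println(secured.findAll());
    }
}

/** Hypothetical persistence-layer interface, invented for this sketch. */
interface ItemDao {
    List<String> findAll();
}
```

The application code only ever sees an ItemDao; whether it is the raw DAO or the secured proxy is decided when the pieces are wired together.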

In practice many sites will have several developers wear both hats instead of having a dedicated security team. That’s not a problem as long as they can keep their distinct responsibilities in mind.

Transparent encryption in JPA or Hibernate fields is definitely better than putting the encryption/decryption code in your POJO but it still imposes a high level of unnecessary binding between the security and persistence layers. It also has serious security issues.

Security Issues

There is a critical question any time you’re dealing with encryption – can this object be written to disk? The most obvious threat is serialization, e.g., by an appserver that is passivating data to free up memory or to migrate it to a different server.

In practice this means that your keys and plaintext content must be marked ‘transient’ (for the serialization engine) and ‘@Transient’ (for JPA or Hibernate). If you’re really paranoid you’ll even override the implicit serialization method writeObject so you can absolutely guarantee that these fields are never written to disk.

This works… but it blows the transparent encryption/decryption out of the water since the entire point of that code is to make these fields look like just another field. You must maintain two fields – a persistent encrypted value and a transient unencrypted value – and have some way to keep them in sync, all without putting any crypto code into your POJO.
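Here is a minimal sketch of the “really paranoid” variant, leaving JPA out of it entirely. The Credential class is hypothetical. The plaintext field is marked transient and the serialization hooks are overridden so that only the encrypted field is ever written, even if someone later removes the transient keyword by accident.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical value holder: only the encrypted value may leave memory. */
final class Credential implements Serializable {
    private static final long serialVersionUID = 1L;

    private String encryptedPassword;  // persistent, safe to serialize
    private transient String password; // plaintext, never serialized

    Credential(String encryptedPassword, String password) {
        this.encryptedPassword = encryptedPassword;
        this.password = password;
    }

    String getEncryptedPassword() { return encryptedPassword; }
    String getPassword() { return password; }

    /** Writes the encrypted field explicitly and nothing else. */
    private void writeObject(ObjectOutputStream out) throws IOException {
        ObjectOutputStream.PutField fields = out.putFields();
        fields.put("encryptedPassword", encryptedPassword);
        out.writeFields();
    }

    /** Restores the encrypted field; the plaintext must be re-derived later. */
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        encryptedPassword = (String) fields.get("encryptedPassword", null);
        password = null;
    }

    /** Serializes and deserializes in memory, as passivation would. */
    static Credential roundTrip(Credential c) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(c);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (Credential) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Credential copy = roundTrip(new Credential("b64-ciphertext", "hunter2"));
        System.out.println(copy.getEncryptedPassword() + " / " + copy.getPassword());
    }
}
```

After a round trip the plaintext is gone and has to be recovered by decrypting again, which is exactly the synchronization problem described above.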

A more subtle problem is that your objects may still be written to disk if an attacker can trigger a core dump by crashing the appserver. Careful site administrators will disable core dumps but many overlook it. It’s harder to work around this but it’s possible if the AOP decrypts/encrypts values immediately around the methods that need the decrypted values. Your application shouldn’t care where the decryption occurs as long as it’s decrypted when it needs it. This is the type of decision that should be left to the security team.

A third way objects can be written to disk is via OS swap files but that should be a non-issue since swap files are usually encrypted now.

JPA EntityListeners

A solution is JPA EntityListeners or the corresponding Hibernate class. These are Listener classes that can provide methods called before or after database object creation, deletion or modification.

Sample Code

This is easiest to see with some sample code. Consider a situation where we must keep a user’s password for a third-party site. In this case we must use encryption, not hashing.

(Note: I doubt this is the actual information required by Twitter for third-party apps – it’s solely for the purpose of illustration.)

The entity

/**
 * Conventional POJO. Following other conventions the sensitive
 * information is written to a secondary table in addition to being
 * encrypted.
 */
@Entity
@Table(name="twitter")
@SecondaryTable(name="twitter_pw", pkJoinColumns=@PrimaryKeyJoinColumn(name="twitter_id"))
@EntityListeners(TwitterUserPasswordListener.class)
public class TwitterUser {
   private Integer id;
   private String twitterUser;
   private String encryptedPassword;
   transient private String password;

   @Id
   @GeneratedValue(strategy = GenerationType.IDENTITY)
   public Integer getId() { return id; }

   @Column(name = "twitter_user")
   public String getTwitterUser() { return twitterUser; }

   @Column(name = "twitter_pw", table = "twitter_pw")
   @Lob
   public String getEncryptedPassword() { return encryptedPassword; }

   @Transient
   public String getPassword() { return password; }

   // similar definitions for setters....
}

The DAO

/**
 * Conventional DAO to access login information.
 */
@LocalBean
@Stateless
public class TwitterDao {
   @PersistenceContext
   private EntityManager em;

   /**
    * Read an object from the database.
    */
   @TransactionAttribute(TransactionAttributeType.SUPPORTS)
   public TwitterUser getUserById(Integer id) {
      return em.find(TwitterUser.class, id);
   }

   /**
    * Create a new record in the database.
    */
   @TransactionAttribute(TransactionAttributeType.REQUIRED)
   public void saveTwitterUser(TwitterUser user) {
      em.persist(user);
   }

   /**
    * Update an existing record in the database.
    *
    * Note: this method uses JPA semantics. The Hibernate
    * saveOrUpdate() method uses slightly different semantics
    * but the required changes are straightforward.
    */
   @TransactionAttribute(TransactionAttributeType.REQUIRED)
   public TwitterUser updateTwitterUser(TwitterUser user) {
      TwitterUser tw = em.merge(user);

      // we need to make one change from the standard method -
      // during a 'merge' the old data read from the database
      // will result in the decrypted value overwriting the new
      // plaintext value - changes won't be persisted! This isn't
      // a problem when the object is eventually evicted from
      // the JPA/Hibernate cache so we're fine as long as we
      // explicitly copy any fields that are hit by the listener.
      tw.setPassword(user.getPassword());

      return tw;
   }
}

The EntityListener

To keep a clean separation between the persistence and security layers the listener does nothing but call a service that handles the encryption. It is completely ignorant of the encryption details.

public class TwitterUserPasswordListener {
   @Inject
   private EncryptorBean encryptor;

   /**
    * Decrypt password after loading.
    */
   @PostLoad
   @PostUpdate
   public void decryptPassword(Object pc) {
      if (!(pc instanceof TwitterUser)) {
         return;
      }

      TwitterUser user = (TwitterUser) pc;
      user.setPassword(null);

      if (user.getEncryptedPassword() != null) {
         user.setPassword(
            encryptor.decryptString(user.getEncryptedPassword()));
      }
   }

   /**
    * Encrypt password before persisting
    */
   @PrePersist
   @PreUpdate
   public void encryptPassword(Object pc) {
      if (!(pc instanceof TwitterUser)) {
         return;
      }

      TwitterUser user = (TwitterUser) pc;
      user.setEncryptedPassword(null);

      if (user.getPassword() != null) {
         user.setEncryptedPassword(
            encryptor.encryptString(user.getPassword()));
      }
   }
}

The EncryptorBean

The EncryptorBean handles encryption but does not know what’s being encrypted. This is a minimal implementation – in practice we’ll probably want to pass a keyId in addition to the ciphertext/plaintext. This would allow us to quietly rotate encryption keys with minimal disruption – something that is definitely not possible with the usual ‘easy encryption’ approaches.

This class uses OWASP/ESAPI for encryption since 1) it should already be used by your application and 2) the portable format allows other applications to use our database as long as they also use the OWASP/ESAPI library.

The implementation only covers Strings – a robust solution should have methods for all primitive types and possibly domain-specific classes such as credit cards.

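import java.security.spec.KeySpec;

import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import javax.ejb.Stateless;

import org.owasp.esapi.ESAPI;
import org.owasp.esapi.Encryptor;
import org.owasp.esapi.codecs.Base64;
import org.owasp.esapi.crypto.CipherText;
import org.owasp.esapi.crypto.PlainText;
import org.owasp.esapi.errors.EncryptionException;
import org.owasp.esapi.reference.crypto.JavaEncryptor;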

@Stateless
public class EncryptorBean {
   private static final String PBE_ALGORITHM = "PBEWITHSHA256AND128BITAES-CBC-BC";
   private static final String ALGORITHM = "AES";

   // hardcoded for demonstration use. In production you might get the
   // salt from the filesystem and the password from a appserver JNDI value.
   private static final String SALT = "WR9bdtN3tMHg75PDK9PoIQ==";
   private static final char[] PASSWORD = "password".toCharArray();

   // the key
   private transient SecretKey key;

   /**
    * Constructor creates secret key. In production we may want
    * to avoid keeping the secret key hanging around in memory for
    * very long.
    */
   public EncryptorBean() {
      try {
         // create the PBE key
         KeySpec spec = new PBEKeySpec(PASSWORD, Base64.decode(SALT), 1024);
         SecretKey skey = SecretKeyFactory.getInstance(PBE_ALGORITHM).generateSecret(spec);
         // recast key as straightforward AES without padding.
         key = new SecretKeySpec(skey.getEncoded(), ALGORITHM);
      } catch (java.security.GeneralSecurityException ex) {
         // handle appropriately...
      }
   }

   /**
    * Decrypt String
    */
   public String decryptString(String ciphertext) {
      String plaintext = null;

      if (ciphertext != null) {
         try {
            Encryptor encryptor = JavaEncryptor.getInstance();
            CipherText ct = CipherText.fromPortableSerializedBytes(Base64.decode(ciphertext));
            plaintext = encryptor.decrypt(key, ct).toString();
         } catch (EncryptionException e) {
            // handle exception. Perhaps set value to null?
         }
      }

      return plaintext;
   }

   /**
    * Encrypt String
    */
   public String encryptString(String plaintext) {
      String ciphertext= null;

      if (plaintext!= null) {
         try {
            Encryptor encryptor = JavaEncryptor.getInstance();
            CipherText ct = encryptor.encrypt(key, new PlainText(plaintext));
            ciphertext = Base64.encodeBytes(ct.asPortableSerializedByteArray());
         } catch (EncryptionException e) {
            // handle exception. Perhaps set value to null?
         }
      }

      return ciphertext;
   }
}

Final Thoughts

There is no reason why you must have a one-to-one relationship between unencrypted and encrypted fields. It is perfectly reasonable to bundle related fields into a single value – in fact it is probably preferable to encrypting each field individually. The values could be represented in CSV, XML, JSON, or even a properties file.
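As a rough illustration of the bundling idea, here is a sketch that packs several fields into one string in properties format and unpacks them again. The FieldBundle class and the field names are made up; the encryption step itself is left out, since in the listener approach above the packed string would simply be handed to something like encryptString().

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

final class FieldBundle {
    /** Packs related fields into a single string that can be encrypted as one value. */
    static String pack(Properties fields) throws IOException {
        StringWriter out = new StringWriter();
        fields.store(out, null); // properties-file format, with escaping handled for us
        return out.toString();
    }

    /** Restores the individual fields from the decrypted bundle. */
    static Properties unpack(String bundle) throws IOException {
        Properties fields = new Properties();
        fields.load(new StringReader(bundle));
        return fields;
    }

    public static void main(String[] args) throws IOException {
        Properties fields = new Properties();
        fields.setProperty("twitterUser", "example");
        fields.setProperty("password", "hunter2");

        String bundle = pack(fields); // this is what would be encrypted
        Properties restored = unpack(bundle);
        System.out.println(restored.getProperty("twitterUser"));
    }
}
```

One side effect worth knowing: Properties.store() writes a timestamp comment, so the same fields produce a slightly different bundle each time.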


PL/Java Code Finally Available

By Bear Giles

About a year ago I published a number of articles on PL/Java:

I had always intended to publish the code but never had the time to clean it up for publication – fleshing out the unit tests, adding the copyright and licensing notices, etc.

No longer: I’ve created a Google Code project for the code. At the moment it only has two user-defined types (Rational and Complex) and the unit tests are far from complete, but I’ll flesh it out. Check back in… 10 months! :-)

Google Code Project: PostgreSQL PL/Java examples.

I should point out that there’s a known bug in both UDTs when performing implicit casts. If the first implicit cast is from a string everything works. If the first implicit cast is from an int then all implicit casts are screwed up. I’m following up on this on the pl/java mailing list.

(Sidenote: there’s also a project containing the code I was using in my discussion on digital certificates. It’s much more ambitious and still needs a lot of work but it’s reached the ‘minimally useful’ threshold. Google Code Project: Otter CA.)

java

Not Responsible For Broken Windshields

By Bear Giles

My wife says I get grouchy when I’m studying or doing security. I have no idea why….

Today’s beef is with the open-bed truck companies that have “not responsible for broken windshield” stickers on the back of their trucks.

It’s bullshit.

Seriously. Grade A bullshit.

In the first place the state law is absolutely clear that drivers are responsible for anything that falls from their vehicles. Full stop.

In the second place you can’t unilaterally impose contracts. Full stop. That statement is as unenforceable as one saying “The driver of the following car must run forward and give driver of this truck $100 at next red light.”

(A broken windshield doesn’t directly benefit the trucking company but in both cases you’re out hard cash that you could have used elsewhere.)

In the final place you can’t impose restrictions on others’ use of public spaces. Full stop. There is an apparent exception when you reserve a picnic table or campsite but that restriction is actually imposed by the city etc., not the person who rented the space.

As a practical matter it is nearly impossible to prove that your broken windshield was caused by something that blew off of a particular truck. You might have seen it happen but can you prove it in a courtroom? So is there any real harm?

I think there is: it creates a moral hazard. If a driver is told that he’s going to be docked $1000 by his employer if something flies out of his truck and damages a car behind him then he’s going to take a lot more care to ensure the load is properly capped than if he thinks that he won’t face any consequences no matter how poorly he secured his load. This won’t change the behavior of a conscientious driver but not all drivers are conscientious.

What does this have to do with software? It comes back to the first three items. Things aren’t true just because you say they are. The law trumps your terms of service. The courts may say that your users weren’t actually bound by your terms of service if nothing of value was exchanged. (This is a particular concern with free sites.)

So how many corners are you cutting because you’re sure your TOS will protect you?

security

What’s On My Desk

By Bear Giles

A few people have asked what’s on my desk since I’m posting (irregularly) on unusual topics. The answer is actually pretty boring. My posts are mostly the result of some lateral thinking and being unable to find any answers in Google and Stack Exchange searches.

Scala

I’m taking the 7 week Functional Programming Principles in Scala course at Coursera. I’m also reading the book Programming In Scala. (I have the first edition.) I’ll probably also pick up Scala in Depth and/or Scala in Action.

As a rule I’ve found that it takes about 6 months in a new language to become familiar with the standard libraries and at least two years to become familiar with the ecosystem. (That’s the difference between being familiar with java.util.* etc and being familiar with Spring/EJB3, hibernate, at least one web framework, plus whatever you need for your specific tasks.)

So do I think I’ll be fluent in Scala in two months? No, of course not, but I should be able to work in the language even if I don’t yet have a good familiarity with, e.g., Play or Akka.

Sidenote: Ruby and JRuby are popular in many shops in the Boulder area.

Information Security

I’m also taking the 10 week Information Security and Risk Management in Context course at Coursera. The Coursera class is free but you can also enroll at the University of Washington for a certification program or for graduate level credit.

I’m still on the fence on this class. I’m a techie but have been studying for PMP and CISSP certs to get a broader perspective (but see below). The first week focused on the role of the CISO (C-level executive for information security – think CEO, CFO, CIO, CTO, etc.) and that’s a bit too far from my world. But we’ll see how the next few classes go.

Do Certificates Have Value?

There are two answers to this question. The less important one is that they can get you past the HR gatekeepers. Today the tech job market is extremely hot but at times in the past, and undoubtedly at times in the future, there have been far more applicants than positions and the HR gatekeepers have used things like certifications to winnow the resumes. The cert won’t get you the job but it might get you the interview.

The more important answer is that studying for a cert forces you to take a broader view. I’ve never used much of what I learned when studying for my Security+ and java certs… but I did use things that I would have never seen unless I had studied for those exams.

Hence studying for the PMP and CISSP exams. I don’t have the practical experience for either cert but I’ve learned a tremendous amount by studying for them. And who knows – I’m still on the fence about getting a CISSP (Assoc) cert. I could have probably passed an earlier revision of the exam but it’s a moving target.

Prep Work For Next Job

EJB3 in Action. I know the Spring framework very well but some sites use EJB3. This should go quickly since I’ve read the first edition of the book and took a 3-day class on EJB 3.1 on the Sun Oracle campus in Broomfield, CO.

PCI DSS specification. These are the security requirements for any system that manages credit card data. I’ve touched on many of them previously. Again this should go quickly (cough) since I’ve already read the specification and I’ll just be refreshing my memory.

java, security

Adding Native Encryption to ‘dump’, part 2: Cryptanalysis

By Bear Giles

This is the second of two articles on adding native encryption to the Unix ‘dump’ application. We conclude with a cryptanalysis of our options. N.B., this is an overview and not a formal paper.

Threat Analysis

DUMP files are most commonly used in two situations:

  • Backups which are used for disaster recovery and must only be kept for a week to a month. The main concern is unauthorized disclosure.
  • Archives which are used in legal proceedings and must be kept indefinitely. The main concerns are unauthorized disclosure, data integrity and non-repudiation.

The second case is more restrictive so it will be used in subsequent analysis.

A few requirements follow directly from the nature of our problem:

  • We cannot encrypt the full archive as a whole – besides performance issues archives are often multiple gigabytes in size and a single bit error could cause a massive loss of data.
  • We cannot run a third-party app such as GPG on individual files or tape segments due to the performance hits involved in launching an app thousands of times. We may be able to use their supporting libraries.
  • We must use standard algorithms.

Some typical threat scenarios are:

  • An employee leaves a backup tape in an unlocked car and it is stolen. (Note: “backup tape” includes laptops containing backup files.)
  • A rogue employee duplicates a backup tape and provides it to a third party.
  • A rogue employee replaces a valid backup tape with one provided by a third party and “restores” files from it.
  • A lawyer prepares to respond to a wrongful termination suit and retrieves the appropriate archive tapes. He does not have the necessary decryption keys.
  • Same lawyer – he retrieves the appropriate archive tapes but is unable to state with full confidence that the files have not been modified.
  • Same lawyer – he retrieves the appropriate archive tapes and believes they have been modified but has no way to prove it.
  • Same lawyer – but one of the archive tapes is missing.

(On rogue employees – everyone seeks to hire the best but all it takes is one person turned by the classics – cash, drugs or hookers.)

These scenarios show us the following requirements:

  • The archive must be encrypted with keys and algorithm appropriate for long term (10+ year) storage. Call it AES encryption for symmetric encryption, 2048-bit keys for RSA encryption, for archives written today.
  • We should have a mechanism to rekey archives, if possible. (There may be legal reasons why archives can’t be rekeyed regardless of circumstances.)
  • The archive must provide digital signatures to detect modifications.

There are several additional practical requirements.

Key Management

The necessary key management for a solid encryption system is straightforward.

Session Key – a unique, random session key is created for each archive. The session key contains two (AES) keys. One key is used to encrypt the data; the second key is used to encrypt a nonce to be used as the Initialization Vector (IV) for the data encryption.

Nonce – a known value used to produce the Initialization Vector (IV). We have two good candidates – the inode value when performing per-file encryption and the tapea value when performing per-tape-segment encryption.
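Here’s a minimal sketch of the session key and the nonce-to-IV step. It is not the scheme as specified: the Python standard library has no AES, so I’ve swapped in HMAC-SHA256 truncated to 16 bytes where the text encrypts the nonce with the IV key. The names new_session_key and derive_iv, the 256-bit key size and the 64-bit nonce encoding are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
import struct


def new_session_key() -> dict:
    """Generate the two independent random keys that make up a session key."""
    return {
        "data_key": secrets.token_bytes(32),  # encrypts the data
        "iv_key": secrets.token_bytes(32),    # turns a nonce into an IV
    }


def derive_iv(iv_key: bytes, nonce: int) -> bytes:
    """Derive a 16-byte IV from a known nonce (inode or tapea value).

    Stand-in only: HMAC-SHA256 truncated to one AES block replaces the
    AES encryption of the nonce described in the text.
    """
    message = struct.pack(">Q", nonce)  # nonce as a fixed-width 64-bit integer
    return hmac.new(iv_key, message, hashlib.sha256).digest()[:16]
```

The point of the design is that the IV never has to be stored: anyone holding the IV key can recompute it from the inode or tapea value already present in the archive.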

Key Encryption – the session key, encrypted by a public key, can be stored in either the standard TAPE segment header or in a new DUMP segment. In the former case the encrypted session key can be stored in every tape segment; otherwise it will need to be written once at the top of each volume. The session key should be encrypted by multiple public keys to facilitate recovery.

Digital Signatures I – the data must be digitally signed at the individual (per-file or per-tape-segment) level.

Digital Signatures II – the data must be digitally signed at the volume level.

Finally there’s the possibility of desiring to perform no compression or authentication, just block-to-block substitution in the same fashion as disk-wide encryption. There are several solutions to this problem but it’s hard to foresee much demand for it.

Approach 1: Per-Tape Segment Encryption

This approach requires the least change to the existing format.

Encryption

  • Write the TAPE segment header. The header should contain extensions that indicate 1) that encryption has been used and 2) the encryption algorithm, e.g., AES/CBC/PKCS5Padding. (Anyone suggesting ECB will be taken out and shot.)
  • Create an Initialization Vector by encrypting the TAPE segment’s tapea value with the IV session key.
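  • Write the payload of the tape segment to a buffer. Append an HMAC value for the payload for each private key. (Strictly speaking a plain HMAC uses a shared secret, so the value must be signed with the private key.) This provides non-repudiation.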
  • Compress the results.
  • Encrypt the results using the data session key and the IV determined above. Use padding.
  • Write the encrypted data to the tape. This provides data secrecy.

The order of these steps is important in order to prevent information disclosure.

Decryption

  • Read the TAPE segment header and check whether encryption has been used. Assuming it has been…
  • Create an Initialization Vector by encrypting the TAPE segment’s tapea value with the IV session key.
  • Read the encrypted data and attempt to decrypt it into a buffer. If padding has been used a bad encryption key will cause the decryption to fail. (Strictly speaking there’s a remote chance that decryption will still ‘succeed’ but that’s handled below.)
  • Decompress the buffer.
  • Attempt to verify the HMAC value(s) using one or more public keys.
  • If an HMAC matches then we have confidence that the contents of this tape segment have been unmodified and we can write them to disk.
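The ordering of the encryption and decryption steps above can be sketched roughly as follows. This is a sketch under loud assumptions, not a usable implementation: the cipher is passed in as a callable because the Python standard library has no AES, and a shared-key HMAC-SHA256 stands in for the per-key signature. The names seal and unseal are hypothetical.

```python
import hashlib
import hmac
import zlib

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def seal(payload: bytes, auth_key: bytes, encrypt) -> bytes:
    """Authenticate, then compress, then encrypt a tape segment payload."""
    tag = hmac.new(auth_key, payload, hashlib.sha256).digest()
    compressed = zlib.compress(payload + tag)
    return encrypt(compressed)  # caller supplies the real cipher


def unseal(blob: bytes, auth_key: bytes, decrypt) -> bytes:
    """Reverse of seal(); raises ValueError if any step fails."""
    try:
        buffer = zlib.decompress(decrypt(blob))
    except zlib.error as exc:
        raise ValueError("segment did not decrypt to valid data") from exc
    if len(buffer) < TAG_LEN:
        raise ValueError("segment too short to hold a tag")
    payload, tag = buffer[:-TAG_LEN], buffer[-TAG_LEN:]
    expected = hmac.new(auth_key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("segment failed the integrity check")
    return payload
```

Compressing before encrypting matters in practice as well as in theory: well-encrypted data looks random and will not compress at all.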

The primary benefit to this approach is that it requires the least change to the existing format. The primary drawback is that it requires the developer to be comfortable using cryptographic libraries correctly. That’s a serious hurdle – cryptographic libraries are notorious for nuances that can make the difference between a strong system and one that’s easily cracked.

Approach 2: Per-Tape Segment Encryption using OpenPGP Payloads

A slightly more complex approach is to replace the standard tape segment payload with an OpenPGP Message (RFC 4880) segment. Specifically we want to create a Sym. Encrypted Integrity Protected Data Packet that contains a Compressed Data Packet.

If this approach is taken then the archive can also provide the keying material in OpenPGP packets.

The primary benefit to this approach is that open source libraries are widely available that implement the OpenPGP specification so developers are less likely to misuse the cryptographic libraries. The drawback is that developers must learn how to use a new library in a non-standard way.

Approach 3: Per-File Encryption using OpenPGP Payloads

A much more complex approach is to encrypt the contents of the file with one or more Sym. Encrypted Integrity Protected Data Packets. The compressed data will have to be suitably padded to a full data block.

The primary benefit to this approach is that a minimally modified ‘restore’ application can still work with these files. Extracted files will still be encrypted but with an easy change I believe they could be converted to standard PGP/GPG encrypted files. This provides substantially stronger security since it would permit relatively untrusted parties to extract encrypted files.

The primary drawback is that it requires a substantially more complex process and will probably lose information about ‘holes’ in the file.

N.B., if this approach is used the encrypted files should not include the standard PGP/GPG headers, key material, etc. since it is unnecessary and will only serve to bloat the archive.

Conclusion

The first approach follows the spirit of the existing format the best but it can be tricky to code correctly and will require a lot of work to develop the necessary key management tools.

The second approach is a compromise that introduces a new dependency but which allows the application to use standard PGP/GPG software for key management.

The final approach is arguably the most useful but would only be appropriate during a major refactor. That should never be undertaken casually but the size of the dump/restore application is modest and adding support for a GUI (e.g., GNOME) would require refactoring anyway.

The best choice comes down to policy. If you see the DUMP format as dying as new filesystems are introduced then the best choice is probably #2 (due to the key management issues). If you see the format having a future and plan to introduce a GUI then the best choice is clearly #3. It is possible to do both (setting a header bit appropriately) but I believe that would introduce unnecessary confusion.

Sidenote: digital certificates and keystores are more likely to be used in a corporate environment. That’s a moot point though since it’s easy to get keypairs from either GPG/PGP keys or PKI keystores. The main thing is that there needs to be infrastructure to support key management; the mechanism used for it is somewhat irrelevant.

Addendum 9/21/2012

Another approach came to me shortly after publishing this page (naturally) – an OpenPGP packet could be used for each 1024 byte block instead of a full tape buffer. This will require minimal changes to the existing software but still allow individual encrypted files to be extracted. It also allows the INODE block to be written uncompressed. That results in a modest data leak but it will make it much easier to recover from a corrupted archive.

linux, security

Adding native encryption to ‘dump’, part 1: Fast File System and DUMP format

By Bear Giles

This is the first of two articles on adding native encryption to the Unix ‘dump’ application. We begin with an overview of the Fast File System (FFS) and the format of DUMP tapes/files.

Fast File System

Most non-journaled Linux filesystems, including ext2, are ultimately derived from the Unix Fast File System (FFS). Some journaled filesystems, e.g., ext3, are also ultimately derived from the FFS. This format strongly drives the format of DUMP tapes/files. The FFS and the original Unix filesystem it replaced date to the late 70s and early 80s, a time when memory was extremely constrained and a large hard disk might be in the single megabyte range.

The most important thing about the FFS is that it separates the disk into inodes and data blocks. The disk space required for inodes is allocated during formatting and cannot be modified. In the past files were typically fairly small and it was possible to run out of inodes before you filled a disk but today files are so large that many people recommend reducing the amount of space allocated to inodes since it can represent a substantial loss of usable disk space.

Inodes

An inode contains all of the standard metadata about a file (owner, permissions, etc.) and a list of the data block indexes that contain the contents of the file. The inode does not contain the name of the file.

The structure of the data block list is highly optimized for small files. If more than a dozen or so data blocks are required (ext2 has 12 direct slots) then the final slots contain the indexes of data blocks that hold the indirect tables: first-tier, second-tier and, in ext2, third-tier. (I’m not 100% certain that all filesystems store the indirect tables in the data blocks – some or all may store them in the inode space.)
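As a rough illustration of the tiered block list, the sketch below maps a logical block number to the tier that locates it. The numbers (12 direct slots, 4 KiB blocks, 4-byte block indexes) are assumptions loosely modeled on ext2, not values taken from any particular FFS implementation, and tier_for_block is a hypothetical helper.

```python
# Assumed layout parameters, loosely modeled on ext2; real filesystems differ.
DIRECT_SLOTS = 12
BLOCK_SIZE = 4096
INDEX_SIZE = 4
PER_TABLE = BLOCK_SIZE // INDEX_SIZE  # block indexes held by one indirect table


def tier_for_block(logical_block: int) -> str:
    """Return which part of the inode's block list locates a logical block."""
    if logical_block < 0:
        raise ValueError("block number must be non-negative")
    if logical_block < DIRECT_SLOTS:
        return "direct"
    logical_block -= DIRECT_SLOTS
    if logical_block < PER_TABLE:
        return "first-tier indirect"
    logical_block -= PER_TABLE
    if logical_block < PER_TABLE ** 2:
        return "second-tier indirect"
    return "beyond second tier"


def max_direct_bytes() -> int:
    """Largest file that needs no indirect table at all."""
    return DIRECT_SLOTS * BLOCK_SIZE
```

With these assumed numbers any file up to 48 KiB is reachable without a single extra disk read, which is the small-file optimization in a nutshell.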

Sparse files

The FFS does not require that all data blocks be allocated – it allows there to be ‘holes’ in a file. These are called sparse files. This might seem useless but it can greatly simplify some tasks. In fact the SQLite3 library used by, e.g., firefox, will create sparse files.

In these files a read operation simply returns a buffer containing null values. A write operation will trigger allocation of the necessary space. This is important to keep in mind since a sparse file can be far larger than the size of the media. The backup format needs to keep this in mind or else you may have a file that can’t be backed up or restored because of its size.
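A quick way to see this for yourself, assuming a POSIX system and a filesystem that supports holes. The helper names are hypothetical, and the allocated figure depends entirely on the filesystem, so the sketch reports it without promising a value.

```python
import os
import tempfile


def make_sparse_file(path: str, size: int) -> None:
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file; the filesystem may leave a hole


def sizes(path: str) -> tuple:
    """Return (apparent size, bytes actually allocated) for a file."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on POSIX systems.
    return st.st_size, st.st_blocks * 512


def demo() -> None:
    """Print both sizes for a 20 MiB file that contains nothing but a hole."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "sparse.img")
        make_sparse_file(path, 20 * 1024 * 1024)
        apparent, allocated = sizes(path)
        print(f"apparent={apparent} allocated={allocated}")
```

A backup tool that only looks at the apparent size will happily write 20 MiB of zeros for this file.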

Data blocks

The data blocks contain directories and files. Directories are standard files that contain a simple list of filenames and inode numbers. The first two entries in a directory are always ‘.’ (the directory itself) and ‘..’ (the parent directory). An inode can be referenced by more than one directory entry; each reference is called a ‘hard link’. (An inode may instead contain the path of another directory entry. In this case the reference is called a ‘soft link’.)
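The difference between the two kinds of link is easy to check from a script. This is a small sketch assuming a POSIX filesystem; link_demo and the file names are made up for the example.

```python
import os
import tempfile


def link_demo(directory: str) -> dict:
    """Create a file, a hard link and a soft link, then report their inodes."""
    original = os.path.join(directory, "original.txt")
    hard = os.path.join(directory, "hard.txt")
    soft = os.path.join(directory, "soft.txt")

    with open(original, "w") as f:
        f.write("hello\n")
    os.link(original, hard)     # second directory entry for the same inode
    os.symlink(original, soft)  # new inode whose content is a path

    return {
        "original_inode": os.stat(original).st_ino,
        "hard_inode": os.stat(hard).st_ino,
        "soft_inode": os.lstat(soft).st_ino,  # lstat does not follow the link
        "link_count": os.stat(original).st_nlink,
        "soft_target": os.readlink(soft),
    }
```

The hard link reports the same inode number as the original and bumps its link count, while the soft link has an inode of its own.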

The original FFS used a simple unordered list of directory entries and performance could be seriously degraded if there were more than a hundred entries. More recent designs added optimizations such as keeping the directory entries sorted or even creating a tree structure.

Files are… files. Arbitrary data.

More recent designs also store extended attributes in data blocks. These are arbitrary attributes associated with an inode, e.g., SELinux labels.

Limitations of TAR and ZIP formats

There are two major limitations of the TAR and ZIP formats with FFS files. The first limitation is that most implementations of TAR do not intelligently handle sparse files. This means that sparse files are blown up to their full size in the archives and subsequently extracted at their full size. A real-world example of this is creating a 20 GB sparse file, mounting it via a loopback device and then installing a Linux distribution on the virtual device. The virtual system will see itself on a 20 GB disk but the actual disk space required may be only 3-4 GB.

The second limitation is that TAR does not support extended attributes. I think ZIP does have support for extended attributes via a standard extension but many implementations will not implement it.

DUMP format

The DUMP format is a streaming format based on a low-level understanding of the FFS format. It has four segments:

  1. CLRI (start of volume marker, bitmap)
  2. BITS (bitmap)
  3. list of INODE (inode information) and ADDR (file data)
  4. END (end of volume marker)

(In addition there’s a TAPE segment discussed below.)

The CLRI and BITS bitmaps contain information used by incremental backups. I will not discuss them.

The next segment contains a sequence of INODE and ADDR segments. The INODE segment contains the file attributes and a bitmap indicating any ‘holes’ in the file. This means that an INODE segment may require more than a single block. If the inode is associated with data the INODE segment is followed by one or more ADDR segments that contain the contents of the file, extended attributes, etc. Like TAR the data blocks must contain the entire disk block and are neither compressed nor encrypted. (ZIP allows the file contents to be compressed and/or encrypted although the record header and footer must not be.) There can be more than one ADDR segment per inode, e.g., if there are extended attributes.
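The idea behind the hole bitmap can be sketched in a few lines. This is only an illustration of the concept, not the actual on-tape layout: the one-byte-per-block map and the function names are assumptions.

```python
def build_hole_map(blocks):
    """Split a file's logical blocks into a presence map and the data to write.

    `blocks` is a list in which None marks a hole and bytes marks real data.
    """
    present = bytes(0 if block is None else 1 for block in blocks)
    data = [block for block in blocks if block is not None]
    return present, data


def rebuild(present, data, block_size):
    """Inverse of build_hole_map(): holes come back as runs of zero bytes."""
    remaining = iter(data)
    return [next(remaining) if flag else bytes(block_size) for flag in present]
```

Only the blocks flagged as present are written to the archive, so a mostly empty file costs little more than its map. A real restore would seek past the holes rather than write zeros, so that the restored file stays sparse.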

The DUMP utility has an optimization that all directory entries are written to the tape before the other regular (and special) files. This can be used, together with an external index file, to seek directly to any desired file.

The END record indicates the end of the volume. For historic reasons involving the limitations of tape drives there will typically be multiple END records. There is no ‘end of archive’ marker.

“Tape” modifications

Magnetic tapes were the only backup media used in the early days of Unix. (Earlier backup media included punch cards and paper tape!) The earliest media were reel-to-reel tapes, with self-contained cartridges introduced later.

Tape media has a number of physical limitations – it can be stretched or broken, different drives may be calibrated slightly differently, etc. Sequential access, e.g., for backups, is straightforward but in actual use it’s often necessary to seek to a specific location on the tape, read it, do some processing, and then overwrite the tape. There are always positioning errors. Finally even if you’re careful a spot defect in the media during manufacturing may cause single-bit errors.

To address all these problems tapes are written in small segments with a bit of empty space, a header, a payload of ca. 10 blocks, then some more empty space. This is a practical tradeoff between performance (you want larger tape segments) and reliability (you can recover data starting at the next tape segment). This is handled by a mixture of software and hardware.

The DUMP format discussed above is modified so that a stream is tokenized in the same manner. That is, the data stream is broken into a number of tape segments consisting of a TAPE segment header followed by a ‘blocksize’ number of data blocks. Historic archives may have a small blocksize but new files will routinely have a blocksize of 32k blocks or even higher.
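A toy version of this tokenizing looks something like the following. The 8-byte header (segment number and payload length) is invented for the example and is nothing like the real TAPE header; the 1024-byte block size is the only detail I’m taking from the format.

```python
import struct

BLOCK = 1024     # block size in bytes
HEADER_LEN = 8   # hypothetical header: segment number + payload length


def to_segments(stream: bytes, blocksize: int):
    """Yield tape segments: a small header followed by up to `blocksize` blocks."""
    payload_len = blocksize * BLOCK
    for number, offset in enumerate(range(0, len(stream), payload_len)):
        payload = stream[offset:offset + payload_len]
        yield struct.pack(">II", number, len(payload)) + payload


def from_segments(segments) -> bytes:
    """Strip the headers and reassemble the original stream."""
    out = bytearray()
    for segment in segments:
        _number, length = struct.unpack(">II", segment[:HEADER_LEN])
        out += segment[HEADER_LEN:HEADER_LEN + length]
    return bytes(out)
```

Note that neither function looks inside the stream. A file, or even an INODE segment, can straddle two tape segments without either side knowing or caring.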

The TAPE segment contains the dump date, hostname, devname (e.g., “/dev/sda3”), filesys (e.g., “/home”), dump level (0-9), human readable label, and misc. other values.

It cannot be overemphasized that breaking the archive into tape segments is entirely blind to the contents of that archive. It is nothing like the per-file headers and footers in other archive formats. In practice everyone writes their software to use the format specified above and then wraps it in a shell that handles nothing but the tape segments.

Tape compression

Tape compression was originally supported at the hardware level – the tape drive would write an uncompressed header followed by the compressed data. This is blind compression and makes no effort to understand what the data actually contains. This is very different from ZIP compression where each file is compressed separately and in total.
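In software the same blind approach might look like the sketch below: the header is left readable and only the payload is compressed, with no knowledge of where one file ends and the next begins. The 4-byte length field and the function names are assumptions for the example, and zlib is simply the compressor that ships with Python, not what any tape drive uses.

```python
import zlib


def compress_segment(header: bytes, payload: bytes) -> bytes:
    """Keep the header uncompressed and compress only the payload."""
    body = zlib.compress(payload)
    return header + len(body).to_bytes(4, "big") + body


def decompress_segment(segment: bytes, header_len: int):
    """Return (header, payload) from a segment built by compress_segment()."""
    header = segment[:header_len]
    size = int.from_bytes(segment[header_len:header_len + 4], "big")
    start = header_len + 4
    return header, zlib.decompress(segment[start:start + size])
```

Because each segment is compressed on its own, a damaged segment costs you that segment and nothing more.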

linux, security