Notebook: Database/Webapp Security
Tips of the trade for database and related webapp security. These tips are not presented in any particular order. I assume you’re already familiar with the concepts of ‘sql injection‘ and ‘little bobby tables’;.
Know how to write secure code
Know your threat model
Are you worried about fumble-fingered employees? Disgruntled employees? Script kiddies? Professional hackers motivated by money? Many of these ideas are cheap to implement but some aren’t and you want to make sure your effort goes to the best place.
Know your architecture
Some people argue that an ideal architecture has two elements:
- The application itself that is free of all security checks. You can go a long way by ensuring that you only present the user with valid choices, e.g., in search results.
- An set of Aspect Oriented Programming (AOP) classes that are interwoven at runtime to provide all of the security checks. These classes are virtually ignorant of the details of the application – they do everything though Subject IDs (SIDS) – users, Granted Authorities (GA) – roles, and Access Control Lists (ACL).
Know your tools
Modern databases are extremely powerful tools. Don’t treat it as just a file system. Learn about permissions, triggers, stored procedures, and more.
Always use positional parameters or an ORM
It’s been a long time since you could safely write your own sanitizer code. It’s easy in ASCII but a lot of people have been tripped up by unicode. You should leave this to the database drivers by using positional parameters or an ORM. Even this isn’t a guarantee since there’s very occasionally a disconnect between what the JDBC driver thinks needs to be escaped and what the database server actually needs to be escaped. (IIRC PostgreSQL and its JDBC driver were bit by this a few years ago.)
There’s one notable but non-portable exception to this. IF your database driver provides a sanitizer method AND IF you can make a compelling argument then you can create the query in a string buffer. But don’t do this casually – it’s easy for code rot to introduce a vulnerability.
What’s a compelling argument? How many positional parameters can you use? You might exceed that limit with a long multi-row insert.
Consider using stored procedures
Some people recommend using stored procedures in place of insert, update and delete. It’s not hard to revoke these permissions and prevent direct manipulation of the data via SQL. It’s true that a knowledgeable attacker could still get a list of stored procedures and figure out what procedures to call but that takes time and is much harder to do via SQL injection.
A few people go as far as recommending revoking select permissions in addition to the other three.
Quote arguments in stored procedures
Related to the last two items is the perception that you don’t need to worry about escaping data passed in via stored procedures. That’s not necessarily true – stored procedures still use those arguments in internal queries so there’s the potential for abuse. This is a limited (but real) potential problem with stored procedures that return a single record but a real cause for concern with stored procedures that return a SETOF records.
Fortunately it’s easy to quote passed values within the stored procedure. In PostgreSQL it’s the plpgsql commands quote_ident(value) and quote_literal(value).
Use different roles for database ‘owner’ and ‘user’
You can eliminate all DDL attacks by using a different roles for the database owner and user. Think landlord and tenant – the landlord (owner) owns the property and controls the structure (tables, triggers, etc.) The tenant (user) owns the contents (data) but can’t change the structure itself. If you take this to its logical conclusion the database owner shouldn’t have the basic CRUD permissions to the data either.
The webapp (or its container) needs the credentials for the database user. The database owner credentials should never be on the system. It’s also important to remember that there may be more than one ‘user’ in practice, e.g., the webapp itself, a batch processing task, even individuals accessing the underlying database via a swing application.
Consider using audit logs
Consider, for discussion, that you just discovered that you’ve been attacked. You don’t know when so you can’t simply restore a known-good backup, and that wouldn’t work anyway since you can’t discard all of the changes since then.
What do you do?
A good start is to maintain an audit log on critical tables. An audit log is simply a record of what was changed during database updates. It may still take a lot of work to pull out the information you need but you CAN see everything that changed. In a highly-secured environment, e.g., law enforcement, even an attempt to read a record is logged.
An application-level audit log is easiest to write and has the most information (e.g., current webapp user) but it will only work when the change occurs through the normal application. On the other hand application-level audit logs can be built into the application itself, e.g., providing a change log to users. The downsides to application-level audit logs are that it may not catch SQL Injection details and it definitely won’t catch any direct access to the database. This approach also requires mingling application and security code, something that’s poor architecture.
A database-level audit log is done via INSERT, UPDATE and DELETE triggers on the protected tables. Less information is available but it will be run no matter how the database is accessed. Database-level audit logs should never be accessed by the application.
Use a different schema
In either case you should use a different database schema for your audit logs. The webapp should never be able to update or delete audit log entries. In a highly-secured environment the webapp shouldn’t even be able to see what information was captured – don’t let the bad guy see what you know about what he did.
Secure audit logs
If you must have the highest confidence in your audit logs you should do two things. First, include a sequence number. Second, calculate an HMAC on the entire message including sequence number. This will allow you to detect any attempts to insert or delete records or to modify the contents of a record.
Use container-based authentication or a webapp filter
Very simple – the webapp needs to know who its users are. It doesn’t need to know how the details about how they are authenticated. The solution is to pull entirely authentication outside of the application. Let the container do it, or a webapp filter, or even an application on a separate system (e.g., SiteMinder).
There’s no need for the application to have access to the authentication information so even if you handle authentication within your application you should keep that data in a separate schema.
The authority information, that is, what they can do, is a little more complex. If it’s simple role-based authorizations you can pass it in at the same time you provide the authentication information. There’s no need for the application to see these tables either.
Access Control Lists
Access Control Lists (ACLs) are a different matter. There are too many to pass in via the container so they have to be loaded dynamically. Fortunately we can use a hybrid model where we create a canonical ACL on demand from the user’s roles and only keep exceptions in the database.
Authentication (authn)/Authorization (authz) tables
There should usually be audit triggers (on INSERT and UPDATE at a minimum) on the tables in the security schema.
Use a separate schema or in-memory database for session information
One common design is to keep little more than the user id in the container-managed session and to load additional information via a filter. This requires an additional database hit but is much more scalable than using heavy sessions.
This information is neither fish nor fowl. It’s not container-managed but it’s not content either. Unlike user credentials and content it’s ephemeral since it only lasts as long as the session. Finally it’s something of high value to an attacker seeking privilege escalation so it must be well-protected.
Put everything together you want this information to be maintained in a separate schema than your content. Better yet a single webapp container could use an in-memory database like H2 or Derby. This may not be practical if you have more than container since they must all access the same database.
Use strong hashes
If you must store passwords or other sensitive information that you don’t need to recover you should either use a salted hash or an HMAC digest. The options, in order of increasing strength, are:
- a common salt
- a non-random per-user salt (e.g., using the username as salt)
- a random per-user salt
- a random per-user salt + compound hashing (highly debated)
- a random per-user salt + HMAC
A HMAC is similar to a message digest (MAC) but is cryptographically signed. The main benefit is that you can detect if somebody has somehow slipped unauthorized authentication information into the database, e.g., via a direct database connection. It comes at the cost of additional complexity.
If you do not use any salt an attacker (with the hash values) could use an established ‘rainbow’ table to determine the user’s effective password. (It may not be what the user types but it has the same hash value.)
If you do not use a per-user salt the attacker can look for entries with the same hashed value. This could give him insight into the common password.
Consider using stored procedures to access user credentials
This goes back to the thoughts that
- It’s much more difficult for an attacker to identify and use stored procedures than the standard SQL commands
- User credentials and authz information never needs to be combined with other records via joins
- Access to user credentials should be a rare event that’s handled outside of your main application.
Given all of this the cost/benefit ratio of using stored procedures to access this information is pretty desirable.
Use encryption wisely
Encryption is difficult to use properly. Think hard whether the data can be partitioned somehow, e.g., into a separate schema that only a handful of methods can access. If you still need encryption read on.
Security classification levels
The first question is always whether you need to encrypt something. A good touchstone are three standard classification levels:
- Confidential – sensitive information that doesn’t cause direct harm if disclosed. (E.g., the encryption algorithm used.)
- Secret – operational details will materially harm someone until the details are changed.
- Top Secret – design details that will materially harm everyone and cannot easily be changed.
As an example consider a HTTPS site. Each user’s password is secret since anyone can impersonate them until the password is changed. The website’s certificate is top secret since anyone can impersonate the website itself until the certificate is invalidated and replaced — and even then many browsers won’t or can’t verify that the old certificate has been revoked.
Confidential information can be protected by usual access control.
Secret information should be encrypted.
Top secret information should be encrypted and kept outside of the database.
Know your legal restrictions
You cannot keep some information. For instance the CVV code on the back of the credit card must never be kept in any manner – you can only keep it long enough to get the authorization code from the bank and then it must be totally forgotten. If you’re discovered keeping it anyway the credit card companies will come down on you like a ton of bricks for breach of contract.
Keep only what you must
Do you need to keep a credit card number? Do you need to make recurring payments – if so, then keep it encrypted. If you need it to process easy refunds, then keep it encrypted but only for 30 days.
Do you only need it to verify receipts? Can you get by with just the last 4 digits of the card? What about a hash of the entire CC number?
The key point is that you don’t have to worry about protecting what you don’t have. Always ask yourself if you need to (or legally can) keep the data. Then ask yourself if a hash of the value will be enough.
Partition your sensitive data
We saw this with the user authentication information earlier. If you have a table with both sensitive and non-sensitive data split the data across two tables. The odds are that much of your activity will only require the non-sensitive information and you can avoid hassles by simply not loading it unless you actually need it.
Keep your keys secure
Don’t make it easy – break your master key into two or more pieces. This can be as simple as creating a random block the same length as your keyfile and XORing them together. Store the random file in one place (e.g., the database), store the XORed key in a second place (e.g., the filesystem) and re-XOR them when you need the key.
Boojums: Cryptographic Engineering (Section 8.3.4) covers a similar approach when discussing how to minimize data retention in memory. If we’re only protecting 256 bits, say, then we XOR the key with a 256-bit hash of the random value. That is, our two blocks are R and h(R)^key instead of R and R^key. I don’t know what the benefits are to adding the hash but when these experts talk I listen.
Use working keys
Don’t use your master key to encrypt your data. Use your master key to protect working keys. These are the keys used to encrypt the data.
Periodically change your keys
Assume your working keys will be stolen. Periodically change them. That is – go through the database and for each encrypted field decrypt it with the old key and re-encrypt it with the new key. This sounds easy but there are a lot of devils in the details since you don’t want to have to hit every record in a database with hundreds of millions or billions of record.
In practice a sliding windows of working keys is used and the row contains the encryption key id. Data is always written or updated with the most recent encryption key. When the key is finally decommissioned you only have to rekey the rows that haven’t been touched in months.
Where to handle encryption
The final question is where you should handle encryption. The easiest approach is to use Jasypt + Hibernate 3 to transparently handle everything but key management within the hibernate layer. (tutorial)
For truly sensitive information you want to perform the encryption and decryption in the business logic. (You can still use a DAO interface when pulling this information from the filesystem.)
A third approach I’ve been exploring is using AOP interceptors to transparently encrypt/decrypt data as it passes between methods. (E.g., the logic bean will pass unencrypted data to the DAO but the AOP layer will intercept it and encrypt it before passing it on to the DAO.) I suspect this is more effort than it’s worth.
Put presentation layer information in a separate (read-only) schema or database instance
This is a logical continuation of the ideas above. The presentation layer information (internationalized text, etc.) is only used by the presentation layer. The application data is only used by the application business logic. There’s almost no connection between them. So why not use a separate schema?
Benefits: it’s cleaner (since each schema does only one thing). It’s more resistant to ‘defacement’ attacks. It makes it easier to scale since you can pull the presentation information from a local database (read-only replication from parent) instead of having to call central database server. It lets the business layer feed multiple independent front ends without needing to make any backend changes.
Drawbacks: there is one place where the presentation and business logic overlap – labels for enumerated values. You want to make sure that every enumeration has a label but you can only do that if you know all possible values. This isn’t insurmountable, just the one place where you need to know something about the application.
Consider using a frozen in-memory database like H2
We can take this a step further – there’s no reason why the presentation and content must share a single database instance… in fact there are compelling architectural reasons why they should not. I think the main reason people have put everything into the same instance was the difficulty and expense in setting up and maintaining multiple database instances.
This has changed with high-quality in-memory databases like H2 and Derby. It’s now easy to create these databases and use them independently of your primary database instance.
Don’t put the admin user in the first row of the user table
How many webapps bootstrap by putting the administrative user in the first row of the user table? How many attackers know this?
There are a few simple solutions. Start the sequence at 100 – this will stop anyone who hardcodes ‘id = 0′ but not anyone who does ‘id = (select min(id) from users)’.
Put in a few tossaway accounts. Put ‘guest’ at id=0, not ‘admin’.
Related tip: don’t give the admin account an obvious name. Again the attackers know to try ‘admin’, ‘root’, ‘sa’, etc.
A typical query (using SQL injection) could be
- select * from user
- where username='username'
- and password='password'
- limit 1;
select * from user where username='username' and password='password' limit 1;
The problem occurs if the username is bob’ or user_id = 0; –
- select * from user
- where username='bob' or user_id = 0; --' and password='password' limit 1;
select * from user where username='bob' or user_id = 0; --' and password='password' limit 1;
Likewise a username of admin'; –
- select * from user
- where username='admin'; --' and password='password' limit 1;
select * from user where username='admin'; --' and password='password' limit 1;
We all use positional parameters so this isn’t a problem… but an attacker may have direct access to the database by other means.
A program could cycle through a few hundred or few thousand possible user ids. A script kiddie will probably only try a few common values.
Create a tripwire account
What do you put in the first row (id = 0) if not the admin account? What about a tripwire account? This can be done in two ways. First the user table can have a SELECT trigger that raises an alarm whenever somebody tries to read this record. You won’t have many details but WILL know that security has been breached somewhere. This also works if the attacker has circumvented your application, e.g. a direct connection to the database.
The second approach is to handle this in the application. Again the application has more information about what’s going on but it requires the security code to be tied into the application code. That’s often problematic.
The third approach is to handle this in the application via aspect injection. In this case the tripwire user would be a valid (but powerless) account and would behave identically to any ‘guest’ account. However AOP methods would be injected that watch for any activity on the tripwired account and take appropriate action.
There’s one gotcha on the last approach – you still need to modify the application data so the user doesn’t appear in any search results. The account should only be visible to someone using SQL injection.
All of this has to be balanced with the extra effort caused during troubleshooting on a live system. It’s far too easy for an innocent support query to be misinterpreted as an attack. Perhaps this is a case for different users – one for the webapp (which trips triggers) and a second for support staff (which does not). You can’t do just one thing though – if an attacker is able to get the support staff credentials then he won’t set off the triggers either.
Beware malicious insertions
We’re usually concerned with unauthorized disclosure and modification. Unauthorized deletions is usually a denial-of-service attack, not a more targeted attack. We shouldn’t overlook the possibility of malicious insertions. The best example of this ‘privilege escalation’ if the authorization table is attacked. This is maliciously adding ‘roles’ so someone with a guest account may suddenly have the same rights as a user, even an administrator’s rights.
Another example would be adding an ‘authorized user’ to a credit card account. There’s no need to extract a user’s password if you can get the system to give you access anyway.
Both of these are examples of why it’s so critical to protect the authentication/authorization information.
Encrypt JSF view state (session information)
JSF has the ability to store user session information in a comment at the bottom of the web page. It’s a simple serialization so it could be modified and returned by the user. As a result all JSF implementations have a way to encrypt the data and usually have it enabled by default. That’s enough to stop the clueless but not anyone who knows the default encryption key for that server release – so set your own key!
System Administration Tips
- Encrypt your backups.
- Do egress filtering so your webapp server can only connect to your database server – no other database servers.
- Do ingress and egress filtering so your database server can only talk to your webapp servers – no other servers.
- Don’t trust DNS.
- Don’t trust NTP.