Invariant Properties

Rethinking ‘dump’

Bear Giles | May 18, 2011

Backup programs are interesting critters. You need one that backs up what you do, but you also end up doing only the things that you can back up. The most commonly used formats, tar and zip, are great for what they do, but they have some serious limitations on a modern system.

  • Sparse File – a sparse file appears to be quite large but only consumes disk space where you actually write data to it. A good modern use is creating a disk image – you can create a large sparse file, put a filesystem on it, and only require a bit more disk space than the files you put in it. The file will expand to its nominal size when you burn it to a CD or DVD, but you want your backups to be more space-efficient.
  • Extended Attributes – boolean file attributes beyond the standard Unix discretionary permissions (e.g., read/write/execute). Some examples are:
    • immutable – nobody can modify the file, not even root.
    • append-only – the file cannot be modified other than appending data to it. You’ll often see this attribute on log files.
    • secure deletion – the disk sectors are zeroed out when the file is deleted.
    • do not back up – don’t back up this file. You’ll often see this attribute on private key files.
  • Access Control Lists – extended attributes that provide finer access control than the standard Unix controls. E.g., you can say that a file is read-only for everyone except user ids 1003, 1073 and 1083, who can also write to it, and user 1077, who can’t read the file at all.
  • Mandatory Access Control Labels (SELinux) – labels that tie a file to a system-wide security policy that ordinary users cannot override. E.g., you can label a file to say that it’s used by the web server and all of the policies associated with the web server should be applied to it.
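
To make the sparse file point concrete, here is a minimal sketch that creates one and compares its apparent size with the space actually allocated. It assumes a Unix-like system where `st_blocks` is reported in 512-byte units and a filesystem that supports sparse files; the helper names are made up for illustration.

```python
import os
import tempfile


def make_sparse_file(path, nominal_size):
    """Create a file whose apparent size is nominal_size without writing data."""
    with open(path, "wb") as f:
        # truncate() extends the file; no data blocks are written
        f.truncate(nominal_size)


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for path."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units (true on Linux)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")
        make_sparse_file(image, 100 * 1024 * 1024)
        apparent, allocated = sizes(image)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

A backup tool that reads such a file naively sees the full apparent size, which is why sparse-aware archiving matters.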

Many if not all of these attributes are being standardized, but there’s no support for them in the (standard) tar and zip formats. We don’t need this functionality on the typical home system, but it can be critical on servers.

So what’s wrong with ‘dump’ for Ext2/3/4 filesystems?

  • No Indexing – it should be possible to quickly determine whether a file is present in an archive and to retrieve it. ‘Dump’ provides limited support with a proprietary data format, but it’s not easy to create an index spanning multiple archives.
  • No Error Detection – there is no way to determine that an archived file has been corrupted.
  • No Encryption – there is no native encryption for the archives. This is important if you write your archive directly to tape or are unable to load the complete archive for decryption before restoring files.
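
As a rough illustration of what error detection could look like, the sketch below records a SHA-256 digest when data is archived and recomputes it at restore time. This is a generic technique, not something ‘dump’ does today, and the function names are hypothetical.

```python
import hashlib
import io

CHUNK = 64 * 1024  # read in chunks so large files need not fit in memory


def stream_digest(stream):
    """Return the SHA-256 hex digest of everything readable from stream."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK), b""):
        h.update(chunk)
    return h.hexdigest()


def is_intact(stream, recorded_digest):
    """True if the stream's content still matches the digest recorded at backup time."""
    return stream_digest(stream) == recorded_digest


if __name__ == "__main__":
    original = b"contents of an archived file"
    recorded = stream_digest(io.BytesIO(original))

    corrupted = b"contents of an archived f1le"  # one byte changed
    print(is_intact(io.BytesIO(original), recorded))
    print(is_intact(io.BytesIO(corrupted), recorded))
```

A digest only detects corruption. It does not repair it, and it does not protect against deliberate tampering unless the digest itself is stored somewhere trustworthy.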

There is one additional issue when performing disk-based backups – a modern kernel will cache a great deal of information and the raw block device may not be fully consistent. LVM-based partitions will help tremendously – we can sync(1) the filesystem and immediately create a snapshot. Applications may still have unwritten caches but we can’t do anything about that without taking the system down to a quiet state and unmounting the partition.
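
The sync-then-snapshot sequence could be scripted roughly as follows. This is a sketch under stated assumptions: the volume group, logical volume, snapshot name, snapshot size and archive path are all hypothetical, the commands need root, and by default the function only prints what it would run.

```python
import subprocess


def run_snapshot_backup(vg, lv, archive_path,
                        snap_name="backup_snap", snap_size="1G",
                        dry_run=True):
    """Sync, snapshot an LVM volume, dump the snapshot, then drop the snapshot.

    Returns the list of commands in the order they were (or would be) run.
    """
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap_name}"
    executed = []

    def execute(cmd):
        executed.append(cmd)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

    execute(["sync"])  # flush kernel caches to disk first
    execute(["lvcreate", "--snapshot", "--size", snap_size,
             "--name", snap_name, origin])
    try:
        # level 0 (full) dump of the frozen snapshot, not the live volume
        execute(["dump", "-0", "-f", archive_path, snapshot])
    finally:
        # remove the snapshot even if the dump fails
        execute(["lvremove", "--force", snapshot])
    return executed


if __name__ == "__main__":
    run_snapshot_backup("vg0", "home", "/backup/home.dump")
```

The snapshot only freezes what has reached the block device, so application-level caches are still a concern, as noted above.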

Important: do not attempt to create file-based backups of running databases! Use the backup program provided by the database if you need to back up a running database.

So why doesn’t anyone do something about this? Funny you should ask… In fact, I’ve started working on this and hope to submit a patch to the maintainers soon. The first patch will provide a SQLite index to the archive in addition to the existing format. The second patch will provide error detection.
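
To give a feel for what a SQLite index spanning several archives might look like, here is a small sketch. The schema, table name, column names and archive names are all invented for illustration and are not taken from the actual patch.

```python
import sqlite3

# Hypothetical schema: one row per (archive, path) pair
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    archive TEXT NOT NULL,
    path    TEXT NOT NULL,
    inode   INTEGER NOT NULL,
    PRIMARY KEY (archive, path)
);
CREATE INDEX IF NOT EXISTS entries_by_path ON entries (path);
"""


def open_index(db_path=":memory:"):
    """Open (or create) the index database and make sure the schema exists."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, archive, path, inode):
    """Record that `path` is stored in `archive`."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO entries (archive, path, inode) VALUES (?, ?, ?)",
            (archive, path, inode),
        )


def find_archives(conn, path):
    """Return the names of all archives that contain `path`, sorted by name."""
    rows = conn.execute(
        "SELECT archive FROM entries WHERE path = ? ORDER BY archive", (path,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_index()
    add_entry(conn, "full-2011-05-01.dump", "/etc/passwd", 1201)
    add_entry(conn, "incr-2011-05-08.dump", "/etc/passwd", 1201)
    add_entry(conn, "full-2011-05-01.dump", "/etc/hosts", 1187)
    print(find_archives(conn, "/etc/passwd"))
```

Keeping the index outside the archives themselves means one query can answer "which archive has this file?" without reading any tapes.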

Encryption support is much more difficult. It’s easy to do something, but it’s also easy to screw up and have a much weaker system than you realized.

Categories
linux, security