Data Security

Dorothy E. Denning and Peter J. Denning

Overview

Absolute data security is impossible (security is a matter of economics),
but we can try to ensure that computer hardware and software are not the
weakest links. This paper examines four kinds of controls, each protecting
a different area: access controls, flow controls, inference controls, and
cryptographic controls.

Access Controls

Access controls govern the availability of objects to users for various
uses. For example, records in a database, or files in a filesystem, can be
read or written by some users but not others. Three main features are
necessary:

   * Proper user identification is available
        o Passwords ("something you know") are easy to implement, but also
          easy to thwart.
        o (The paper did not mention tokens or smartcards: "something you
          have".)
        o Biometrics ("something you are") are expensive.
   * No snooping is going on, of either kind:
        o Network snooping: watching the data being accessed by another
          person
        o Data retrieval: stealing backup tapes or disks
     Use encryption to foil snoops.
   * Access control information is very privileged
        o No normal user should be able to modify the list of who can access
          what data.
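
As a minimal sketch of the "something you know" check, the following
stores a salted, slow hash of each password instead of the password itself,
so that a stolen credential table is harder to exploit. This is an
illustration under stated assumptions, not something the paper prescribes:
the function names, the iteration count, and the choice of PBKDF2 are all
hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor; tune for real deployments


def enroll(password: str):
    """Return (salt, digest) to store in place of the plaintext password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

The stored (salt, digest) pair is itself access control information, so it
should be writable only by privileged code.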

Different kinds of systems have different requirements.

   * Transaction Processing Systems
        o Users cannot write their own programs, so only the user interface
          needs to be protected.
        o The user interface can keep track of the user and only issue
          queries for records to which the user has access
   * General Purpose Systems
        o Protections in the runtime environment and the hardware are
          usually needed.
        o Use the Principle of Least Privilege to protect against Trojan
          horses (and bugs!): compare setuid to rings of privilege.
        o Use ACLs and capabilities to grant access to objects.
        o Revoking access is easy with a centralized capability list (though
          most Unices don't allow this); harder with distributed
          capabilities (in this case, you can "link" capabilities through
          the owner).
        o The important question is the "safety problem": can user X ever
          obtain read access to file Y? Unfortunately, this is undecidable
          in general (the halting problem can be reduced to it).
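
The access-list ideas above can be sketched as a small access matrix in
which only the owner of an object may change who can access it. The class
and method names are hypothetical, and the owner-only rule is an assumed
policy chosen for illustration; real systems differ.

```python
class AccessMatrix:
    """Toy access matrix: (subject, object) -> set of rights."""

    def __init__(self):
        self._rights = {}  # (subject, obj) -> set of right names
        self._owner = {}   # obj -> owning subject

    def create(self, owner, obj):
        """Register a new object; its creator gets read and write."""
        self._owner[obj] = owner
        self._rights[(owner, obj)] = {"read", "write"}

    def _require_owner(self, subject, obj):
        # Assumed policy: access control data is modifiable by the owner only.
        if self._owner.get(obj) != subject:
            raise PermissionError(f"{subject} does not own {obj}")

    def grant(self, granter, subject, obj, right):
        self._require_owner(granter, obj)
        self._rights.setdefault((subject, obj), set()).add(right)

    def revoke(self, revoker, subject, obj, right):
        # Revocation is a single update because the list is centralized.
        self._require_owner(revoker, obj)
        self._rights.get((subject, obj), set()).discard(right)

    def check(self, subject, obj, right):
        return right in self._rights.get((subject, obj), set())
```

Granting only the rights a subject needs (here, "read" without "write") is
the Principle of Least Privilege in miniature.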

Flow Controls

Flow controls govern the ability of information to be transmitted from one
part of the system to another (or, ultimately, from one user to another).
The main idea is to assign a "security class" to each piece of data, and to
require that the security class of data never be lowered.

   * Often, this is very coarse-grained: processes have a security class,
     and can only read data of that class or lower, and only write to that
     class or higher.
   * This is a problem for processes that need to manipulate data of
     different classes; data tends to become overclassified.
   * Data flow analysis can alleviate this problem, but is potentially
     complicated and expensive. (taintperl does something like this.)
   * Covert channels (transmitting data by some non-obvious means, such as
     run time, power consumption, or load average) are extremely hard to
     eliminate.
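
The coarse-grained rule above (read at your class or lower, write at your
class or higher) can be sketched as follows. The three class names and
their linear ordering are assumptions made for illustration; the general
model allows richer orderings.

```python
# Assumed linear ordering of security classes, lowest to highest.
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2}


def can_read(process_class: str, data_class: str) -> bool:
    """A process may read data of its own class or lower."""
    return LEVELS[data_class] <= LEVELS[process_class]


def can_write(process_class: str, data_class: str) -> bool:
    """A process may write only to its own class or higher."""
    return LEVELS[data_class] >= LEVELS[process_class]


def result_class(*input_classes: str) -> str:
    """Data derived from several inputs takes the highest input class."""
    return max(input_classes, key=LEVELS.__getitem__)
```

The `result_class` helper shows where overclassification comes from: a
result that mixes one "secret" input with many "unclassified" ones is
labelled "secret" even if little sensitive information actually flows.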

Inference Controls

Inference controls govern the ability of users to determine specific
information in a database, if they are allowed to query for summary
information. The system must try to make the cost of reconstructing the
specific information prohibitive. Three types of controls are possible:

   * Restrict queries
        o minimum query set control
             + If a query results in only one record having a certain
               combination of characteristics, more specific queries can
               determine other information about that particular record,
               so queries matching too few (or too many) records are
               refused.
             + It turns out, though, that restricting queries to those with
               a certain min and max size isn't good enough.
        o partitioned database
             + Store records in groups instead of individually; allow
               queries about groups only, not individuals.
             + Bad groupings can cause misleading statistics.
             + Dynamically changing databases can be expensive to
               continually regroup.
   * Distort responses
        o Adding random values to the data can usually be defeated.
        o Introducing large enough errors to defeat an attacker usually
          produces bad statistics.
        o Data swapping is better, but it is usually hard to find
          appropriate records whose fields could be swapped.
   * Random samples
        o Apply queries only to (pseudo-)random samples of the database.
        o Dynamic databases do not really benefit from this, apparently.
        o Combining this and minimum query set control works well.
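
To see why a minimum (and maximum) query set size alone is not enough,
here is a small sketch of a difference attack of the kind usually called a
"tracker". The records, field names, and threshold K are all made up for
illustration. Each query the attacker issues is individually permitted, yet
subtracting the two answers isolates one person's salary.

```python
# Hypothetical database: one record per employee.
RECORDS = [
    {"dept": "cs", "sex": "f", "salary": 90},
    {"dept": "cs", "sex": "m", "salary": 70},
    {"dept": "cs", "sex": "m", "salary": 75},
    {"dept": "ee", "sex": "f", "salary": 80},
    {"dept": "ee", "sex": "f", "salary": 85},
    {"dept": "ee", "sex": "m", "salary": 60},
    {"dept": "me", "sex": "m", "salary": 65},
    {"dept": "me", "sex": "f", "salary": 72},
]
K = 2  # assumed minimum query set size


def sum_salary(predicate, records=RECORDS, k=K):
    """Answer a SUM query only if k <= |query set| <= n - k."""
    matched = [r for r in records if predicate(r)]
    if not (k <= len(matched) <= len(records) - k):
        raise PermissionError("query set size outside allowed range")
    return sum(r["salary"] for r in matched)


def tracker_attack():
    """Infer the salary of the only woman in 'cs' from two legal queries."""
    everyone_in_cs = sum_salary(lambda r: r["dept"] == "cs")
    men_in_cs = sum_salary(lambda r: r["dept"] == "cs" and r["sex"] == "m")
    return everyone_in_cs - men_in_cs
```

The direct query (`dept == "cs"` and `sex == "f"`) matches one record and
is refused, but `tracker_attack()` recovers the same value anyway.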

As well, use threat monitoring to watch for suspicious queries in log files
(but then what about the privacy issues of this?!).

Cryptographic Controls

Cryptographic controls govern who can read data that is not otherwise
protected (by an operating system, for example). This includes data being
transmitted over a network, and data stored on disks or tapes. There are
two major classes of encryption:

   * Symmetric encryption
        o The key must be transmitted to the recipient separately, and in a
          secure manner.
        o Different schemes for key management exist (Kerberos, for
          example); this is one of the most important parts of the system.
   * Asymmetric encryption
        o Slower than symmetric encryption, but no need to have a secure
          channel to transmit encryption keys.
        o There is still the issue of verifying the authenticity of public
          keys (PGP web of trust vs. SSL certification authorities).

(Usually, the performance issue is mitigated by using a hybrid approach:
pick a random key, use it to do symmetric encryption, and transmit the
result of this encryption, as well as the result of encrypting the key
itself with public key encryption.)
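
The hybrid approach can be sketched with toy primitives. Everything here
is an assumption made for illustration and is NOT secure: the "public key"
scheme is textbook RSA built from two small Mersenne primes with no
padding, and the "symmetric cipher" is an XOR against a SHA-256 counter
keystream. Only the structure (random session key, symmetric encryption of
the message, public-key encryption of the session key) reflects the text.

```python
import hashlib
import secrets

# Toy RSA parameters (assumption: far too small for real use).
P = 2**61 - 1
Q = 2**31 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent; needs Python 3.8+


def keystream_xor(key: int, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (toy symmetric cipher)."""
    key_bytes = key.to_bytes(8, "big")
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key_bytes + counter).digest()
        out.extend(b ^ s for b, s in zip(data[offset:offset + 32], block))
    return bytes(out)


def hybrid_encrypt(message: bytes):
    """Return (session key wrapped with the public key, ciphertext)."""
    session_key = secrets.randbits(64)    # fresh random symmetric key
    wrapped_key = pow(session_key, E, N)  # slow public-key step, tiny input
    return wrapped_key, keystream_xor(session_key, message)


def hybrid_decrypt(wrapped_key: int, ciphertext: bytes) -> bytes:
    """Unwrap the session key with the private key, then decrypt."""
    session_key = pow(wrapped_key, D, N)
    return keystream_xor(session_key, ciphertext)
```

The expensive public-key operation touches only the 64-bit session key,
while the bulk of the message goes through the fast symmetric step.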