Data Security
Dorothy E. Denning and Peter J. Denning

Overview

Absolute data security is impossible (security is economics), but we can try to make sure the computer hardware and software are not the weakest links. This paper examines four kinds of controls, each protecting a different area: access controls, flow controls, inference controls, and cryptographic controls.

Access Controls

Access controls govern the availability of objects to users for various uses. For example, records in a database, or files in a filesystem, can be read or written by some users but not others. Three main features are necessary:

* Proper user identification is available
  o Passwords ("something you know") are easy to implement, but not hard to defeat.
  o (The paper did not mention tokens or smartcards: "something you have".)
  o Biometrics ("something you are") are expensive.
* No snooping is going on, either:
  o Network snooping: watching the data being accessed by another person
  o Data retrieval: stealing backup tapes or disks
  Use encryption to foil snoops.
* Access control information is very privileged
  o No normal user should be able to modify the list of who can access what data.

Different kinds of systems have different requirements.

* Transaction Processing Systems
  o Users cannot write their own programs, so only the user interface needs to be protected.
  o The user interface can keep track of the user and only issue queries for records to which the user has access.
* General Purpose Systems
  o Protections in the runtime environment and the hardware are usually needed.
  o Use the Principle of Least Privilege to protect against trojan horses (and bugs!): compare setuid to rings of privilege.
  o Use ACLs and capabilities to grant access to objects.
  o Revoking access is easy with a centralized capability list (though most Unices don't allow this); it is harder with distributed capabilities (in this case, you can "link" capabilities through the owner).
  o The important question is the "safety problem": can user X ever gain the right to read file Y? Unfortunately, this is undecidable in general (the halting problem can be reduced to it).

Flow Controls

Flow controls govern the ability of information to be transmitted from one part of the system to another (or, ultimately, from one user to another). The main idea is to assign a "security class" to each piece of data, and to require that the security class of data is never lowered.

* Often, this is very coarse-grained: processes have a security class, and can only read data of that class or lower, and only write to that class or higher.
* This is a problem for processes that need to manipulate data of different classes; data tends to become overclassified.
* Data flow analysis can alleviate this problem, but is potentially complicated and expensive. (taintperl does something like this.)
* Covert channels (transmitting data by some non-obvious means, such as run time, power consumption, or load average) are extremely hard to eliminate.

Inference Controls

Inference controls govern the ability of users to determine specific information in a database when they are allowed to query only for summary information. The system must try to make the cost of reconstructing the specific information prohibitive. Three types of controls are possible:

* Restrict queries
  o Minimum query set control
    + If a query results in only one record having a certain combination of characteristics, more specific queries can determine other information about that particular record.
    + It turns out, though, that restricting queries to those with a certain min and max size isn't good enough: combinations of permitted queries can still isolate an individual record.
  o Partitioned database
    + Store records in groups instead of individually; allow queries about groups only, not individuals.
    + Bad groupings can cause misleading statistics.
    + Dynamically changing databases can be expensive to continually regroup.
* Distort responses
  o Adding random values to the data can usually be defeated.
  o Introducing errors large enough to defeat an attacker usually produces bad statistics.
  o Data swapping is better, but it is usually hard to find appropriate records whose fields could be swapped.
* Random Samples
  o Apply queries only to (pseudo-)random samples of the database.
  o Dynamic databases do not really benefit from this, apparently.
  o Combining this and minimum query set control works well.

In addition, use threat monitoring to watch for suspicious queries in log files (but then what about the privacy issues of this?!).

Cryptographic Controls

Cryptographic controls govern who can read data that isn't otherwise protected (by an operating system, for example). This includes data being transmitted over a network, and data stored on disks or tapes. There are two major classes of encryption:

* Symmetric encryption
  o The key must be transmitted to the recipient separately, and in a secure manner.
  o Different schemes for key management exist (Kerberos, for example); this is one of the most important parts of the system.
* Asymmetric encryption
  o Slower than symmetric encryption, but no need to have a secure channel to transmit encryption keys.
  o There is still the issue of verifying the authenticity of public keys (PGP web of trust vs. SSL certificate authorities).

(Usually, the performance issue is mitigated by using a hybrid approach: pick a random key, use it to encrypt the data symmetrically, and transmit the resulting ciphertext together with that key encrypted under the recipient's public key.)
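
The hybrid approach can be sketched roughly as follows. This is an illustration only, not anything taken from the paper: it assumes Python with just the standard library, the RSA key pair is a toy built from two small Mersenne primes with no padding, and the symmetric cipher is a stand-in XOR keystream derived from SHA-256. Neither is secure; they are chosen only so the example is self-contained. All names (keystream_xor, hybrid_encrypt, hybrid_decrypt) are hypothetical.

```python
import hashlib
import secrets
from typing import Tuple

# Toy RSA key pair (assumption for illustration; far too small for real use,
# and real systems also apply padding such as OAEP).
P = 2_147_483_647                    # 2**31 - 1, a Mersenne prime
Q = 2_305_843_009_213_693_951        # 2**61 - 1, a Mersenne prime
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (modular inverse)

KEY_BYTES = 8  # session key must be smaller than N, so keep it to 64 bits


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in symmetric cipher: XOR data with a SHA-256 counter keystream.

    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def hybrid_encrypt(message: bytes, n: int, e: int) -> Tuple[int, bytes]:
    """Encrypt the message with a fresh random key, then wrap that key."""
    session_key = secrets.token_bytes(KEY_BYTES)
    ciphertext = keystream_xor(session_key, message)
    wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)
    return wrapped_key, ciphertext


def hybrid_decrypt(wrapped_key: int, ciphertext: bytes, n: int, d: int) -> bytes:
    """Unwrap the session key with the private exponent, then decrypt."""
    session_key = pow(wrapped_key, d, n).to_bytes(KEY_BYTES, "big")
    return keystream_xor(session_key, ciphertext)


# Usage: only the holder of D can recover the session key and the message.
message = b"summary statistics for the quarter"
wrapped_key, ciphertext = hybrid_encrypt(message, N, E)
recovered = hybrid_decrypt(wrapped_key, ciphertext, N, D)
```

The slow public-key operation touches only the 8-byte session key, while the bulk of the data goes through the fast symmetric step, which is the point of the hybrid design. A real system would substitute vetted primitives for both stand-ins.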