A Hardware Architecture for Implementing Protection Rings 
--- Schroeder, Michael D. and Saltzer, Jerome H.

Criteria to judge access control mechanisms:

Functional capability (or functionality)
     A mechanism must ``meet interesting protection needs'', which depends
     on the primitives it provides, and do so ``in a natural way'', which
     depends on the user interface.
Economy (or performance)
     The cost of specification and enforcement should be low enough not to
     be a concern in the selection of a mechanism. The actual cost should be
     proportional to the functionality actually used, not to the
     functionality available.
Simplicity
     Lack of simplicity often means lack of security (since complex systems
     are tough to understand and test).
Generality
     The internal structure of components should be independent of external
     access controls.

The above criteria are usually combined by keeping the latter three
constrained within reasonable ranges, and trying to provide as much of the
first (functional capability) as possible.

In this work, the classic Multics segmentation architecture is assumed,
ignoring paging.

Protection rings were motivated by the need to change a process's privileges
as it executes different code. Note that in Multics, a process is loosely
equivalent to a user shell, and different segments are loosely equivalent to
programs/processes in Unix.

A protection ring is a generalization of protection domains. Instead of only
user and supervisor domains, there are now r domains, totally ordered in
terms of the privileges they convey (0 being the most powerful). Each
process executes in a single ring at any time. Each segment defines a
privilege bracket (a contiguous range of ring numbers) for every possible
right (read, write, execute), which applies when the right is enabled by the
associated flag. A process has the right if and only if its ring lies within
the associated bracket for the segment accessed.
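
The bracket rule above can be sketched as a simple range test per right. This
is a minimal illustration, not the hardware's actual descriptor layout: the
names Bracket, Segment and has_right are hypothetical and do not come from
the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bracket:
    """A contiguous range of rings for one right (hypothetical model)."""
    low: int              # most privileged ring in the bracket
    high: int             # least privileged ring in the bracket
    enabled: bool = True  # the per-right flag mentioned in the notes

    def permits(self, ring: int) -> bool:
        # The right is granted iff the flag is on and the ring is in range.
        return self.enabled and self.low <= ring <= self.high


@dataclass(frozen=True)
class Segment:
    read: Bracket
    write: Bracket
    execute: Bracket


def has_right(segment: Segment, right: str, ring: int) -> bool:
    """Check one right ("read", "write" or "execute") for a given ring."""
    return getattr(segment, right).permits(ring)


# Illustrative segment: readable in rings 0-4, writable in 0-3,
# executable in 3-4.
seg = Segment(read=Bracket(0, 4), write=Bracket(0, 3), execute=Bracket(3, 4))
print(has_right(seg, "write", 3))    # True
print(has_right(seg, "write", 4))    # False
print(has_right(seg, "execute", 2))  # False
```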
 
A process can increase its ring freely, since a ring increase means a
monotonic reduction of privileges. A ring decrease, however, is heavily
controlled, since it imparts additional privileges. Special locations within
a segment, called gates, are defined for this purpose. A process can lower
its ring number to that necessary to execute a particular segment only when
it jumps directly into one of those gates. The ring decrease is allowed only
when jumping into the gates from a particular range of rings (called the
gate extension), which lies just above the segment's execute bracket. For
rings above the gate extension, a downward jump is disallowed. Gates and
gate extensions are stored in the descriptor of a segment and are originally
read from the access control list associated with the segment on storage.

If both execute and write rights are allowed on a segment, their brackets
overlap in exactly one ring: the top of the write bracket is the bottom of
the execute bracket. This ensures that, among the rings that can execute a
segment, only the most privileged one is also allowed to write it.

For example, consider a segment A with a 0-3 write bracket, a 3-4 execute
bracket and a +2 gate extension. A process in ring 5 may jump into one of
A's gates, and lower its ring to 4, since ring 5 is within 2 of ring 4 (the
top of the execute bracket of A). However, a process in ring 7 may not jump
into one of A's gates (and therefore, may not lower its ring through A),
because ring 7 is not within 2 of ring 4 (the top of the execute bracket of
A).
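
The example can be replayed with a small sketch of the gate check. The
function name and parameters are hypothetical. Not checking gates for calls
made from inside the execute bracket is an assumption of this sketch, and
upward calls are deliberately left unmodeled.

```python
def ring_after_gate_call(ring, exec_low, exec_high, gate_ext, target_is_gate):
    """Return the ring a process runs in after calling into a segment.

    Raises PermissionError when the call is refused, and
    NotImplementedError for upward calls, which this sketch does not model.
    """
    if ring < exec_low:
        # Caller is more privileged than the execute bracket: upward call.
        raise NotImplementedError("upward call: not modeled in this sketch")
    if ring <= exec_high:
        # Already inside the execute bracket: no ring change
        # (assumption: gates are not checked in this case).
        return ring
    if ring <= exec_high + gate_ext:
        # Inside the gate extension: only a jump into a gate is allowed,
        # and the ring drops to the top of the execute bracket.
        if not target_is_gate:
            raise PermissionError("downward jump must enter through a gate")
        return exec_high
    raise PermissionError("ring is above the gate extension")


# Segment A: execute bracket 3-4, gate extension +2.
print(ring_after_gate_call(5, 3, 4, 2, target_is_gate=True))  # 4
try:
    ring_after_gate_call(7, 3, 4, 2, target_is_gate=True)
except PermissionError as err:
    print("refused:", err)
```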

Calls and returns can cause problems when the protection domain changes
between caller and callee. Caller-callee environments must be independent
(i.e., the environment of lower-ring segments should not be accessible by
higher-ring segments). For example, caller and callee stacks should be
independent: stack records written at ring n should not be accessible at
ring m>n. Otherwise, consider segment A executing at n and calling segment B
executing at m. If B has access to A's stack records, it can alter them,
thereby causing A to return to code it didn't intend to execute. This is
solved by having a different stack segment for each ring, making sure that
the stack segment for ring n is not accessible at any ring m>n.
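
The per-ring stack rule can be sketched as follows. The sketch assumes eight
rings (matching the 0-7 numbering used later in these notes) and assumes that
the stack segment for ring n carries read and write brackets of 0..n; the
helper names are hypothetical.

```python
NUM_RINGS = 8  # assumption: rings 0-7


def stack_accessible(stack_ring, current_ring):
    """True if code running in current_ring may touch ring stack_ring's stack.

    Assumed brackets for the ring-n stack segment: 0..n, so any ring m > n
    is locked out.
    """
    return 0 <= current_ring <= stack_ring


def stacks_visible_from(current_ring):
    """List the rings whose stack segments are accessible from current_ring."""
    return [n for n in range(NUM_RINGS) if stack_accessible(n, current_ring)]


# Code in ring 4 can reach its own stack and those of less privileged
# rings, but not the stacks of rings 0-3.
print(stacks_visible_from(4))  # [4, 5, 6, 7]
```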

In the case where the call lowers the ring number, the callee must ensure it
doesn't rely on the caller for its correct operation (since the caller is,
in fact, less trusted). The callee must also make sure it doesn't allow the
caller to circumvent trust boundaries:
   * The callee must be able to locate a stack area without relying on any
     information received from the caller. This is accomplished by reserving
     specific segment numbers for the per-ring stack segments (e.g., ring
     n has its stack in segment f(n)). A special word in each stack segment
     (reserved by convention) stores the current stack pointer offset for
     that ring.
   * The callee shouldn't allow the caller to pass as arguments data the
     caller couldn't access in the first place (for example, a caller at m
     shouldn't trick the callee into using arguments accessible only at
     l<m, even if the callee executes at n<=l<m). Therefore, the callee must
     validate arguments against the caller's ring number, to make sure they
     are legitimate arguments given the caller. The hardware allows the
     association of a higher ring number with specific instructions.
     Therefore, if the callee knows the caller's ring number, it can access
     the arguments using this more restricted ring number.
   * For the reasons described in the previous item, the callee must know
     the caller's ring. This is also important when returning from the call,
     since the ring should be restored to no lower than what the caller had.
     This is solved by having the processor put the ring from which it
     switched in a process-accessible register.
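
The argument validation described in the list above can be sketched as
follows. The idea is that the callee touches each argument as if it were
running in the less privileged of its own ring and the caller's ring. The
function names and the (low, high) tuple used for a bracket are hypothetical
choices for this sketch.

```python
def effective_ring(callee_ring, caller_ring):
    """Ring used for argument references: the less privileged of the two."""
    return max(callee_ring, caller_ring)


def argument_allowed(bracket, callee_ring, caller_ring):
    """Check an argument's bracket (low, high) against the effective ring."""
    low, high = bracket
    return low <= effective_ring(callee_ring, caller_ring) <= high


# A ring-4 caller invokes a ring-1 callee through a gate.
# Data readable only in rings 0-2 would be readable by the callee itself,
# but is rejected because the caller could not have read it.
print(argument_allowed((0, 2), callee_ring=1, caller_ring=4))  # False
# Data readable in rings 0-5 is a legitimate argument for this caller.
print(argument_allowed((0, 5), callee_ring=1, caller_ring=4))  # True
```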

Calls within the same protection domain work exactly as described above.
Upward calls, however, have problems of their own:

   * It's possible that the caller's arguments are inaccessible to the
     callee. The paper describes some possible hardware solutions, but all
     of them are unpleasant and clumsy.
   * The return jump must proceed through a gate (since it goes from a
     higher to a lower ring). Such a gate should be used only for the return
     jump (i.e., not for other jumps which could be performed further down
     the calling stack). This means that gates would have to be managed in a
     stack-like fashion, which is beyond what the hardware could do.

To fix the problems above (which seem to be difficult to fix correctly in
hardware), upward calls trap into the OS, which sets up the environment
accordingly.

A suggested ring organization for Multics would be:

   * Hard-core supervisor routines (access control, I/O, virtual memory,
     scheduling) on ring 0.
   * Less critical supervisor routines (accounting, ...) on ring 1.
   * Procedures with access to supervisor routines (through defined gates)
     on rings 2-5. At this level, protected user-level subsystems could live
     on rings 2 and 3. All normal user processes would run on 4. Therefore,
     common shared libraries on 2 and 3 could be protected, without the need
     to be included in the supervisor. Ring 5 could be used for debugging
     (i.e., self protection).
   * Procedures without access to supervisor routines on rings 6 and 7.
     These would be controlled-environment procedures, checked by user-level
     procedures running on 4. For example, a generating procedure on 6 could
     be tested by an evaluator running on 4.

Protection rings don't deal with ``mutually suspicious programs'' that run
with the same amount of privileges under a trusted arbiter.