Stanford University
Computer Science Department
Class Presentation
K. Loudon, V. Mehta, M. Patel
Self-Monitoring and Self-Adapting Operating Systems
Margo Seltzer and Christopher Small
Harvard University
Introduction
- An extensible OS permits modification of kernel behavior
- How do we know which parts of the system to extend?
Answer: Self-monitoring.
- By what mechanism is the system actually extended?
Answer: Self-adaptation.
VINO Approach to Self-Monitoring and Self-Adaptation
- VINO is an extensible system that can be made to do self-monitoring and self-adaptation.
- Extensibility is provided using grafts.
- Grafts allow the following types of extensibility:
- Replacing a method of an object.
- Registering a handler for a kernel event.
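The two kinds of graft can be sketched roughly as follows. This is a toy illustration in Python, not VINO's actual interface: the Kernel and Pager classes, graft_method, and the "page_fault" event name are all hypothetical.

```python
from collections import defaultdict


class Kernel:
    """Toy stand-in for an extensible kernel (hypothetical, not VINO's API)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register_handler(self, event, handler):
        # Graft type 2: attach a handler to a named kernel event.
        self._handlers[event].append(handler)

    def raise_event(self, event, payload):
        # Deliver the event to every registered handler, in order.
        return [handler(payload) for handler in self._handlers[event]]


class Pager:
    """Toy kernel object whose eviction policy can be replaced."""

    def pick_victim(self, resident_pages):
        # Default policy: evict the first resident page.
        return resident_pages[0]


def graft_method(obj, name, replacement):
    # Graft type 1: replace one method on a single object instance.
    setattr(obj, name, replacement.__get__(obj, type(obj)))


def evict_last(self, resident_pages):
    # Replacement policy supplied by the graft: evict the last page.
    return resident_pages[-1]


kernel = Kernel()
pager = Pager()

before = pager.pick_victim([3, 7, 9])
graft_method(pager, "pick_victim", evict_last)
after = pager.pick_victim([3, 7, 9])

kernel.register_handler("page_fault", lambda page: f"fault on page {page}")
handled = kernel.raise_event("page_fault", 7)
```

Only the grafted Pager instance changes behavior; other instances keep the default method.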
Features of self-monitoring and self-adaptation in VINO:
- Continuous monitoring of the system to construct a database of performance statistics.
- Correlation of the database by process, process type, and process group.
- Collection of traces and logs of process activity.
- Deriving heuristics and algorithms to improve performance for the observed patterns.
- In situ simulation of new algorithms using statistics, logs, and traces.
- Adapting the system according to the results of simulation.
Monitoring
- Each VINO subsystem includes a statistics module that maintains counts of events handled by the subsystem.
- A graft is created to poll each of these subsystems at a regular interval.
- The SUIF compiler (Harvard) can also be used to gather useful profiling data.
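A minimal sketch of per-subsystem event counts and a polling graft, assuming a simple in-memory list as the statistics database. The names StatsModule and PollingGraft, the event names, and the integer tick standing in for the polling interval are all hypothetical.

```python
from collections import Counter


class StatsModule:
    """Per-subsystem statistics module: counts events the subsystem handles."""

    def __init__(self):
        self.counts = Counter()

    def record(self, event):
        self.counts[event] += 1

    def snapshot(self):
        # Return a copy so later events do not alter stored samples.
        return dict(self.counts)


class PollingGraft:
    """Polls every subsystem and appends the samples to a database."""

    def __init__(self, subsystems):
        self.subsystems = subsystems
        self.database = []

    def poll(self, tick):
        for name, stats in self.subsystems.items():
            self.database.append((tick, name, stats.snapshot()))


vm, fs = StatsModule(), StatsModule()
graft = PollingGraft({"vm": vm, "fs": fs})

vm.record("page_fault")
fs.record("read")
graft.poll(tick=0)

vm.record("page_fault")
graft.poll(tick=1)
```

In a real system the poll would be driven by a timer at the chosen interval; here it is called by hand to keep the example deterministic.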
Traces and Logs
- Capture dynamic data about the system.
- A trace is a record of requests coming into a module.
- A log is a record of requests going out from a module.
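The trace/log distinction can be illustrated with a toy buffer cache sitting in front of a disk. The RecordingCache and Disk classes are hypothetical; the point is only that the trace records what comes into the module and the log records what the module sends out.

```python
class Disk:
    """Toy lower-level module that the cache calls into."""

    def read_block(self, block):
        return f"data{block}"


class RecordingCache:
    """Toy buffer cache that keeps a trace (incoming) and a log (outgoing)."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}
        self.trace = []  # requests coming into this module
        self.log = []    # requests going out from this module

    def read(self, block):
        self.trace.append(("read", block))
        if block not in self.cache:
            # Cache miss: the module issues an outgoing request.
            self.log.append(("read_block", block))
            self.cache[block] = self.backend.read_block(block)
        return self.cache[block]


cache = RecordingCache(Disk())
results = [cache.read(block) for block in (1, 2, 1)]
```

Three requests arrive, but only two go out, because the second read of block 1 is served from the cache.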
Simulation
- Statistics, traces, and logs create a picture of what is happening on a system.
- VINO allows a module under investigation to be replaced and rerun, and the results to be compared.
- Simulations run alongside other processes on the system (in situ) but do not affect the state of the system.
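As a rough illustration of replaying a recorded trace against a candidate replacement, the sketch below compares two page-replacement policies on the same trace. The trace contents, the frame count, and the choice of FIFO versus LRU are assumptions made for the example, not taken from the paper. The simulation works on its own private state, so the live system is untouched.

```python
from collections import OrderedDict


def count_faults(trace, frames, policy):
    """Replay a page-reference trace and count faults under a policy.

    policy is "fifo" or "lru". All state is local to this call.
    """
    resident = OrderedDict()
    faults = 0
    for page in trace:
        if page in resident:
            if policy == "lru":
                # A hit refreshes recency under LRU only.
                resident.move_to_end(page)
            continue
        faults += 1
        if len(resident) >= frames:
            # Evict the oldest entry (insertion order for FIFO,
            # least recently used for LRU).
            resident.popitem(last=False)
        resident[page] = True
    return faults


recorded_trace = [1, 2, 3, 1, 4, 1, 5, 1, 2]
fifo_faults = count_faults(recorded_trace, frames=3, policy="fifo")
lru_faults = count_faults(recorded_trace, frames=3, policy="lru")
better_policy = "lru" if lru_faults < fifo_faults else "fifo"
```

On this trace FIFO takes 7 faults and LRU takes 6, so the comparison would favor LRU for this workload.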
System Analysis
- There are two types of analysis mechanisms, off-line and on-line.
- Off-line analysis is performed at night and is responsible for the following:
- Determining how frequently measurements should be taken.
- Setting thresholds for each resource.
- Assessing the long-term performance of the system.
- On-line analysis is performed continuously and is responsible for the following:
- Determining instantaneous resource utilization and rate of change.
- Detecting "red-flag" situations as specified by the off-line system.
- Selecting adaptation heuristics or installing the appropriate trace-generating graft.
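A minimal sketch of the on-line check, assuming the off-line analysis has produced a threshold for utilization and one for rate of change. The function name, the threshold keys, the sample format, and the numeric values are all hypothetical.

```python
def analyze(samples, thresholds):
    """Return (utilization, rate_of_change, red_flag) for the latest sample.

    samples: list of (time_seconds, utilization_percent), oldest first.
    thresholds: limits assumed to be set by the off-line analysis.
    """
    (t0, u0), (t1, u1) = samples[-2], samples[-1]
    rate = (u1 - u0) / (t1 - t0)  # percentage points per second
    red_flag = u1 > thresholds["utilization"] or rate > thresholds["rate"]
    return u1, rate, red_flag


# Hypothetical thresholds from the off-line analysis.
thresholds = {"utilization": 90, "rate": 5}

calm = analyze([(0, 50), (10, 55)], thresholds)
busy = analyze([(0, 50), (10, 55), (20, 95)], thresholds)
```

When the flag is raised, the on-line system would go on to select an adaptation heuristic or install a trace-generating graft; that step is omitted here.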
Adaptation
- The paper notes that precisely how the system adapts itself is one of the key areas for future research.
- A few simple adaptation techniques are presented at the end of the paper.
Problems with the Paper
- Good techniques in theory, but do they really work?
How much overhead is required?
No performance measurement data at all.
- Very little discussion of alternative methods/grafts
- Source
- Security
- Number of Grafts
- Performance Overhead
- Discusses only the simple cases (e.g., linear paging)