Contents

  1. OS Architecture
  2. Process&Thread&Scheduling
  3. Memory Management
  4. Concurrency/Sync&Mutex
  5. Distributed Systems
  6. Virtual Machine Monitor
  7. Network
  8. File System/Storage
  9. Scalability
  10. Bugs/Security/Fault-Tolerant/Recovery/Update
  11. Encryption&Authentication
  12. Interface Design (API/ABI/ Software-Hardware Interface...)
  13. Verification/Proof
  14. Devices
  15. Language
  16. Overview

OS Architecture

  1. The Structure of the "THE"-Multiprogramming System, Edsger W. Dijkstra, Technological University, Eindhoven, The Netherlands, SOSP, 1968

    • SIGOPS: The Hall of Fame Award The first paper to suggest that an operating system be built in a structured way. That structure was a series of layers, each a virtual machine that introduced abstractions built using the functionality of lower layers. The paper stimulated a great deal of subsequent work in building operating systems as structured systems.
  2. Tenex, A Paged Time Sharing System for the PDP-10, Daniel G. Bobrow, Jerry D. Burchfiel, Daniel L. Murphy and Raymond S. Tomlinson. Communications of the ACM 15(3), March 1972.

    • SIGOPS: The Hall of Fame Award The Tenex system pioneered many ideas that are prominent in modern operating systems. It included one of the first page based memory systems, copy on write sharing, mapping of files into virtual memory, and user/group/other file protection. It also had mnemonic commands with command-line completion and automatic file versioning. As one reviewer said, “Reading it now, I’m pleasantly surprised by how much is familiar — thanks to its successors.”
  3. On the criteria to be used in decomposing systems into modules, David L. Parnas, Communications of the ACM 15(12), December 1972, 1053-1058.

    • SIGOPS: The Hall of Fame Award This paper introduced a technique for decomposing a complex system into modules. Through a simple example it showed that a modularization that emphasizes what is now known as “information hiding” is superior to more obvious module decompositions in terms of the software engineering lifecycle. The paper argues the beneficial decomposition can be achieved with minimal performance overheads. The “information hiding” approach has influenced software engineering in areas including operating systems, distributed systems, databases, and programming languages.
  4. On Micro-Kernel Construction, J. Liedtke, ACM SIGOPS Operating Systems Review 29(5):237-250, December 1995

    • SIGOPS: The Hall of Fame Award This paper presented the core design ideas behind the L4 microkernel, especially the minimality principle, which states that functionality must only be implemented inside the kernel if moving it outside would prevent the implementation of required system functionality. This principle was at the heart of L4’s design, and supported a ruthless performance focus, which allowed L4 to outperform other microkernels by an order of magnitude. The core ideas of this paper led to a family of L4 microkernels which were commercially deployed on a large scale, and eventually enabled unprecedented assurance through formal verification.
  5. Exokernel: An Operating System Architecture for Application-Level Resource Management, Dawson R. Engler, M. Frans Kaashoek, and James O’Toole Jr. MIT, SOSP ’95, 1995

  6. Singularity: Rethinking the Software Stack, Galen C. Hunt and James R. Larus, Microsoft Research Redmond, OSR 2007

  7. The UNIX Time-Sharing System, SOSP 1973, The Bell System Technical Journal 57 no. 6, part 2 (July-August 1978)

  8. Extensibility, Safety and Performance in the SPIN Operating System, Brian N. Bershad et al., University of Washington, 1995

  9. Secure Virtual Architecture: A Safe Execution Environment for Commodity Operating Systems, John Criswell, Andrew Lenharth, Dinakar Dhurjati, and Vikram Adve, University of Illinois at Urbana–Champaign, SOSP 2007

  10. Multiprogramming a 64 kB Computer Safely and Efficiently, Amit Levy and Bradford Campbell and Branden Ghena and Daniel Giffin and Pat Pannuto and Prabal Dutta and Philip Levis. SOSP 2017

  11. The benefits and costs of writing a POSIX kernel in a high-level language, Cody Cutler, M. Frans Kaashoek, and Robert T. Morris, OSDI 2018

  12. LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation, Yizhou Shan, Yutong Huang, Yilun Chen, and Yiying Zhang, Purdue University, OSDI 2018

  13. The Nucleus of a Multiprogramming System, P. B. Hansen, Communications of the ACM, Vol. 13, No. 4, April 1970, pp. 238-241, 250.

  14. [Do OS abstractions make sense on FPGAs?](https://www.usenix.org/system/files/osdi20-korolija.pdf), Dario Korolija, Timothy Roscoe, and Gustavo Alonso, ETH Zurich, OSDI 2020

  15. A Linux in Unikernel Clothing, Hsuan-Chi Kuo (University of Illinois at Urbana-Champaign), Dan Williams, Ricardo Koller (IBM T.J. Watson Research Center), Sibin Mohan (University of Illinois at Urbana-Champaign), EUROSYS 2020

  16. Twizzler: a Data-Centric OS for Non-Volatile Memory, Daniel Bittman and Peter Alvaro, UC Santa Cruz; Pankaj Mehra, IEEE Member; Darrell D. E. Long, UC Santa Cruz; Ethan L. Miller, UC Santa Cruz / Pure Storage, USENIX ATC 2020

  17. Lightweight Preemptible Functions, Sol Boucher, Carnegie Mellon University; Anuj Kalia, Microsoft Research; David G. Andersen, Carnegie Mellon University; Michael Kaminsky, BrdgAI / Carnegie Mellon University, USENIX ATC 2020

Process&Thread&Scheduling

  1. Programming Semantics for Multiprogrammed Computations, Jack B. Dennis, Earl C. Van Horn. Communications of the ACM, Volume 9 Issue 3, March 1966.
    • SIGOPS: The Hall of Fame Award The paper lays out the conceptual foundations for multiprogramming and protection in computer systems.
  2. Lottery Scheduling: Flexible Proportional-Share Resource Management, OSDI 1994
  3. Stride Scheduling: Deterministic Proportional-Share Resource Management, tech report, 1995
  4. Supporting Time-Sensitive Applications on a Commodity OS, OSDI 2002
  5. Borrowed-Virtual-Time (BVT) scheduling: supporting latency-sensitive threads in a general-purpose scheduler, SOSP 1999
  6. Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler, PACT 2007
  7. A Hierarchical CPU Scheduler for Multimedia Operating Systems, OSDI 1996
  8. Reinventing scheduling for multicore systems, HOTOS 2009
  9. Addressing shared resource contention in multicore processors via scheduling, ASPLOS 2010

Memory Management

  1. The working set model for program behavior, Peter J. Denning, SOSP 1967, In Communications of the ACM 11(5), May 1968
    • SIGOPS: The Hall of Fame Award This paper introduced the working set model, which has become a key concept in the understanding of locality of memory references and for implementing virtual memory. Most paging algorithms can trace their roots back to this work.
  2. The Multics Virtual Memory: Concepts and Design, Andre Bensoussan, Charlie T. Clingen, Robert C. Daley, Communications of the ACM 15(5):308-318, May 1972.
    • SIGOPS: The Hall of Fame Award
  3. Memory Coherence in Shared Virtual Memory Systems, Kai Li (李凯), Paul Hudak. ACM TOCS 7(4), Nov 1989, pp 321–359.
    • SIGOPS: The Hall of Fame Award The paper shows how to simulate coherent shared memory on a cluster, and also introduces directory-based distributed cache-coherence. It spawned an entire research area, and introduced cache coherence mechanisms that are widely used in industry.
  4. Transactional memory: architectural support for lock-free data structures, Maurice Herlihy and J. Eliot B. Moss, ISCA 1993.
    • SIGOPS: The Hall of Fame Award This paper introduced transactional memory, an architectural concept intended to make lock-free synchronization as efficient and easy to use as conventional techniques based on mutual exclusion. This concept has found its way into commercial multicore processors, and has generated a large amount of follow-on work in software transactional memory.
  5. Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures, Richard Rashid, Avadis Tevanian, Michael Young, David Golub, Robert Baron, David Black, William Bolosky, and Jonathan Chew, ASPLOS 1987.
    • SIGOPS: The Hall of Fame Award
  6. WSClock - A Simple and Effective Algorithm for Virtual Memory Management, ACM SIGOPS Operating Systems Review 15(5):87-95, December 1981
  7. Simple But Effective Techniques for NUMA Memory Management, SOSP 1989
  8. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems, In Proceedings of the 1994 Winter USENIX Conference, 1994
  9. The Slab Allocator: An Object-Caching Kernel Memory Allocator, USENIX Summer Technical Conference, 1994
  10. Mnemosyne: Lightweight Persistent Memory, ASPLOS 2011
  11. Process-in-Process: Techniques for Practical Address-Space Sharing, Atsushi Hori, HPDC 2018
    • Best Paper Award of HPDC 2018
    • The idea of sharing address space between multiple processes is not new. So why is a new model needed? The answer lies with advances in high-performance computing, notably many-core computers with more parallelism in a node and frequent communication between processes. Unlike other models, PiP’s design is completely in user space.
    • PiP project
  12. Classifying Memory Access Patterns for Prefetching, Grant Ayers, Heiner Litz, Christos Kozyrakis, Partha Ranganathan, ASPLOS 2020
  13. Learning-based Memory Allocation for C++ Server Workloads, Martin Maas, David G. Andersen, Michael Isard, Mohammad Mahdi Javanmard, ASPLOS 2020
  14. Elastic Cuckoo Page Tables: Rethinking Virtual Memory Translation for Parallelism, Dimitrios Skarlatos, Apostolos Kokolis, Tianyin Xu, ASPLOS 2020
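
As a rough illustration of the working set idea from item 1, the sketch below computes the set of distinct pages referenced in a sliding window. It assumes the usual textbook formulation, where the working set at virtual time t with window τ is the set of pages touched by the last τ references. The function name `working_set` and the sample reference string are made up for this example and do not come from the paper.

```python
def working_set(references, t, tau):
    """Return the distinct pages referenced at virtual times t-tau+1 .. t.

    `references` is a list of page numbers; references[i - 1] is the page
    touched at virtual time i (times start at 1).
    """
    start = max(0, t - tau)
    return set(references[start:t])


# Hypothetical page reference string, chosen only for illustration.
refs = [1, 2, 1, 3, 4, 4, 4, 2]

print(working_set(refs, t=5, tau=3))   # pages touched at times 3, 4, 5
print(working_set(refs, t=8, tau=3))   # pages touched at times 6, 7, 8
```

A small working set relative to the window size indicates strong locality, which is the property that paging policies derived from this model try to exploit.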

Concurrency/Sync&Mutex

  1. Experience with processes and monitors in Mesa, Butler W. Lampson and David D. Redell, SOSP 1979.
    • SIGOPS: The Hall of Fame Award When this paper was written, monitors had emerged as the synchronization method of choice in programming languages conferences and operating systems textbooks. This paper was the first to look closely at the practical issues that monitors pose when used in a large production system. These issues remain contemporary, and indeed researchers working on transactional memory mechanisms would do well to reread this wonderful paper.
  2. On optimistic methods for concurrency control, H. T. Kung (孔祥重) and John T. Robinson, ACM Transactions on Database Systems (TODS) 6(2), June 1981, 213-226.
    • SIGOPS: The Hall of Fame Award This paper introduced the notion of optimistic concurrency control, proceeding with a transaction without locking the data items accessed, in the expectation that the transaction’s accesses will not conflict with those of other transactions. This idea, originally introduced in the context of conventional databases, has proven very powerful when transactions are applied to general-purpose systems.
  3. Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors, ACM Transactions on Computer Systems, Feb. 1991.
  4. Scalable Read-mostly Synchronization Using Passive Reader-Writer Locks, USENIX ATC 2014
  5. Non-scalable locks are dangerous, Linux Symposium 2012.
  6. Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System, OSDI 1999
  7. Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design, USENIX 2005
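
The optimistic approach described under item 2 (run a transaction without locks, then check for conflicts before making its writes visible) can be sketched as follows. This is a simplified, single-threaded illustration that validates by comparing per-key version numbers, which is an assumption made here for brevity and not the validation scheme from the paper. The names `VersionedStore`, `Transaction`, and `ConflictError` are hypothetical.

```python
class ConflictError(Exception):
    """Raised when validation finds that an item read earlier has changed."""


class VersionedStore:
    """In-memory key-value store that keeps a version number per key."""

    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        return self.data.get(key, (None, 0))


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen during the read phase
        self.writes = {}         # key -> new value, buffered until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        value, version = self.store.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validation phase: every item read must still be at the version seen.
        for key, seen in self.read_versions.items():
            if self.store.read(key)[1] != seen:
                raise ConflictError(key)
        # Write phase: install the buffered writes and bump their versions.
        for key, value in self.writes.items():
            _, version = self.store.read(key)
            self.store.data[key] = (value, version + 1)


store = VersionedStore()
t1, t2 = Transaction(store), Transaction(store)
t1.put("x", (t1.get("x") or 0) + 1)
t2.put("x", (t2.get("x") or 0) + 1)
t1.commit()                      # no conflict, the write is installed
try:
    t2.commit()                  # "x" changed after t2 read it
    t2_committed = True
except ConflictError:
    t2_committed = False         # t2 would be retried from the start
```

No lock is held while the transactions run; the cost of a conflict is paid only by the transaction that loses validation and has to retry.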

Distributed Systems

  1. Time, Clocks, and the Ordering of Events in a Distributed System, Leslie Lamport, Communications of the ACM 21(7):558-565, July 1978.
    • SIGOPS: The Hall of Fame Award Perhaps the first true “distributed systems” paper, it introduced the concept of “causal ordering”, which turned out to be useful in many settings. The paper proposed the mechanism it called “logical clocks”, but everyone now calls these “Lamport clocks.”
  2. Implementing Remote Procedure Calls, ACM Transactions on Computer Systems 2(1):39-59, February 1984.
    • SIGOPS: The Hall of Fame Award This is the paper on RPC, which has become the standard for remote communication in distributed systems and the Internet. The paper does an excellent job laying out the basic model for RPC and the implementation options.
  3. Grapevine: An Exercise in Distributed Computing, Andrew D. Birrell, Roy Levin, Roger M. Needham, and Michael D. Schroeder, SOSP 1981.
    • SIGOPS: The Hall of Fame Award
  4. Scale and Performance in a Distributed File System, John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West, SOSP 1987.
    • SIGOPS: The Hall of Fame Award
  5. VAXclusters: A Closely-Coupled Distributed System, Nancy P. Kronenberg, Henry M. Levy, and William D. Strecker, SOSP 1985.
    • SIGOPS: The Hall of Fame Award The VAX Clusters system was the first modern clustered system supporting such basic features as a distributed file system and a distributed locking service. The SOSP paper on VAX Clusters remains a classic today. VAXclusters was a huge commercial success, and set the stage for today's massive data centers.
  6. Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems, Brian M. Oki, Barbara H. Liskov. PODC 1988.
    • SIGOPS: The Hall of Fame Award The paper introduces a replication protocol very similar to what is now known as Paxos. That protocol has become the standard for consistent, fault-tolerant state-machine replication, and is widely used in data centers to keep the state consistent despite failures and reconfiguration.
  7. The Part-Time Parliament, Leslie Lamport. ACM TOCS 16(2), May 1998, 133–169.
    • SIGOPS: The Hall of Fame Award The work (originally published in 1989) was independent and roughly concurrent with the Viewstamped Replication work also recognized this year. It describes the protocol in a more general setting, adds a correctness argument, and forms the basis for modern Paxos implementations.
  8. Distributed Snapshots: Determining Global States of a Distributed System, K. Mani Chandy and Leslie Lamport. ACM Transactions on Computer Systems 3(1), February 1985.
    • SIGOPS: The Hall of Fame Award This paper takes the idea of consistency for distributed predicate evaluation, formalizes it, distinguishes between stable and dynamic predicates, and shows precise conditions for correct detection of stable conditions. The fundamental techniques in the paper are the secret sauce in many distributed algorithms for deadlock detection, termination detection, consistent checkpointing for fault tolerance, global predicate detection for debugging and monitoring, and distributed simulation.
  9. Exploiting Virtual Synchrony in Distributed Systems, Kenneth P. Birman and Thomas A. Joseph. SOSP 1987.
    • SIGOPS: The Hall of Fame Award This paper describes a methodology for building distributed applications comprised of multiple components, each realized by a group of replicated servers. It defines a number of group communication primitives and then ties fault notification into the fabric of group services by introducing the virtual synchrony principle, which orders communication and fault notifications consistently among group members and across multiple groups.
  10. Managing update conflicts in Bayou, a weakly connected replicated storage system, D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser, SOSP 1995.
    • SIGOPS: The Hall of Fame Award Bayou is a replicated storage system that anticipated the world of numerous small mobile devices executing collaborative applications over unreliable networks. The paper describes a client-server storage structure supporting eventual consistency, anti-entropy protocols, disconnected operation, log-based recovery, and an application-centered approach to detecting and resolving update conflicts to arrive at consistent replicas. These concepts were backed up by a prototype implementation, two applications, and a simple performance evaluation. Bayou is still relevant to the problems faced by, and the solutions employed by, a large number of today’s modern applications.
  11. Chord: A scalable peer-to-peer lookup service for Internet applications, Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, SIGCOMM 2001.
    • SIGOPS: The Hall of Fame Award This paper introduced a novel protocol that enables efficient key lookup in a large-scale and dynamic environment; the paper shows how to utilize consistent hashing to achieve provable correctness and performance properties while maintaining a simplicity and elegance of design. The core ideas within this paper have had a tremendous impact both upon subsequent academic work as well as upon industry, where numerous popular key-value storage systems employ similar techniques. The ability to scale while gracefully handling node addition and deletion remains an essential property required by many systems today.
  12. MapReduce: simplified data processing on large clusters, Jeffrey Dean and Sanjay Ghemawat, OSDI 2004.
    • SIGOPS: The Hall of Fame Award The paper proposed a simple yet highly effective approach for processing large data sets in a scalable and fault-tolerant manner. An impressive aspect of the design is its simplicity: it elegantly captures a common pattern that solves two critical problems faced by many developers today (scalability and fault tolerance), while still retaining a clean, easy-to-use interface that supports a wide range of applications. The impact of MapReduce has been huge. It is widely used in industry, with virtually every large company running MapReduce. As a sign of great system design, developers have adopted MapReduce in many use cases beyond its original goals and inspired many follow-on systems.
  13. Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, OSDI 2006.
    • SIGOPS: The Hall of Fame Award
  14. Dynamo: Amazon’s Highly Available Key-value Store, Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels, SOSP 2007.
    • SIGOPS: The Hall of Fame Award Dynamo is a scalable and highly reliable distributed key-value store. The paper describes how Dynamo manages the tradeoffs between availability, consistency, cost-effectiveness, and performance, and explains how the system combines a variety of techniques: consistent hashing, vector clocks, sloppy quorums, Merkle trees, and gossip-based membership and failure detection protocols. In particular, the paper emphasizes the value of supporting eventual consistency in order to provide high availability in a distributed system. Dynamo evolved within Amazon to become the basis of a popular cloud service, and also inspired open-source systems such as Cassandra.
  15. The Chubby lock service for loosely-coupled distributed systems, Mike Burrows, OSDI 2006.
    • SIGOPS: The Hall of Fame Award The Chubby lock service provides coarse-grained locking and reliable, low-volume storage for a loosely-coupled distributed system, and is particularly useful for synchronizing activities between clients. Chubby uses Paxos internally, but exposes a lock-service API to its clients, intended to simplify its adoption by programmers. The paper was one of the first to discuss the challenges of engineering a high-availability service for use by a wide range of programmers in a globally-distributed environment. While Chubby itself is widely used only within Google, the paper inspired open-source implementations of similar services, such as Zookeeper, that provide similar functionality.
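
The logical clocks mentioned under item 1 can be sketched in a few lines. This is a minimal illustration of the standard rules (increment on every local event, and on receipt move past the larger of the local clock and the message timestamp). The class name `LamportClock` and its method names are chosen here for illustration only.

```python
class LamportClock:
    """Minimal logical clock for one process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, move past both the local clock and the message timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                   # a is now at 1
stamp = a.send()           # a is now at 2, and the message carries 2
b.tick()                   # b is now at 1
print(b.receive(stamp))    # prints 3, i.e. max(1, 2) + 1
```

The receive rule guarantees that a send always carries a smaller timestamp than the matching receive, so timestamps are consistent with the causal order of events.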

Virtual Machine Monitor

  1. A virtual machine time-sharing system, R. A. Meyer and L. H. Seawright. IBM Systems Journal 9(3), September 1970, 199-218.
    • SIGOPS: The Hall of Fame Award This paper described the second generation of the very first virtual machine system. It was originally built in 1966 for an IBM 360/40 with custom virtual memory hardware and then ported to a 360/67, which had virtual memory built in. In addition to a virtual machine monitor called CP, the system included a single user interactive system called CMS, heavily influenced by MIT’s CTSS; to support multiple users the system ran CMS in a separate VM for each user. Because of the clean architecture of the 360, CP could virtualize the hardware perfectly (except for timing dependencies and self-modifying channel programs) without binary translation, though it did have to translate the channel programs. It could run most of the existing IBM operating systems in virtual machines. CP/67 evolved into VM/370, which became the main time-sharing system for IBM mainframes.
  2. Memory Resource Management in VMware ESX Server, Carl A. Waldspurger, OSDI 2002.
    • SIGOPS: The Hall of Fame Award This paper introduced elegant and effective techniques of hypervisor memory management. Memory ballooning allows the hypervisor to reclaim memory from a virtual machine in accordance with the unmodified guest’s operating system policies. Transparent page sharing supports efficient memory use with small overhead. The combination of active memory estimation, idle memory tax, and proportional fair sharing, along with admission-controlled memory reservation, provides the basis for service level agreements and reasoned overcommitment. This paper has been highly influential; many of its techniques have been adopted by widely-used hypervisors.
  3. Virtual machine monitors: current technology and future trends, IEEE Computer, vol. 38, issue 5, pp. 39-47, 2005
  4. Diagnosing performance overheads in the xen virtual machine environment, VEE 2005
  5. A comparison of software and hardware techniques for x86 virtualization, ASPLOS 2006
  6. Live migration of virtual machines, NSDI 2005
  7. Dune: Safe User-level Access to Privileged CPU Features, OSDI 2012.
  8. COLO: COarse-grained LOck-stepping Virtual Machines for Non-stop Service, SoCC 2013
  9. Xen and the Art of Virtualization, SOSP 2003
  10. My VM is Lighter (and Safer) than your Container, SOSP 2017
  11. NEVE: Nested Virtualization Extensions for ARM, SOSP 2017
  12. ACRN: a big little hypervisor for IoT development, VEE 2019
  13. TEEv: Virtualizing Trusted Execution Environments on Mobile Platforms, VEE 2019 (Best Paper)
  14. Disco: running commodity operating systems on scalable multiprocessors, Edouard Bugnion, Scott Devine, and Mendel Rosenblum, SOSP 1997, ACM Transactions on Computer Systems, 1997
  15. Design of a Symbolically Executable Embedded Hypervisor, Jan Nordholz (TU Berlin / PTB), EUROSYS 2020

Network

  1. End-To-End Arguments in System Design, J. H. Saltzer, D. P. Reed, and D. D. Clark, ACM Transactions on Computer Systems 2(4):277-288, November 1984.
    • SIGOPS: The Hall of Fame Award This paper gave system designers, and especially Internet designers, an elegant framework for making sound decisions. A paper that launched a revolution and, ultimately, a religion.
  2. The Click Modular Router, Eddie Kohler, Robert Morris, Benjie Chen, John Jannotti and Frans Kaashoek. ACM Transactions on Computer Systems (TOCS), 18(3), August 2000.
    • SIGOPS: The Hall of Fame Award Click defines a simple, modular, and efficient framework for constructing network routers with different services and properties. Since this paper’s publication, Click has been an essential tool for the networking and systems research communities with dozens and perhaps hundreds of systems and papers built on it, including several commercially successful systems.
  3. The x-Kernel: An Architecture for Implementing Network Protocols, IEEE Transactions on Software Engineering, 1991
  4. MegaPipe: A New Programming Interface for Scalable Network I/O, OSDI 2012
  5. Improving network connection locality on multicore systems, EUROSYS 2012
  6. IX: A Protected Dataplane Operating System for High Throughput and Low Latency, OSDI 2014.
  7. mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems, NSDI 2014
  8. Scalable Kernel TCP Design and Implementation for Short-Lived Connections, ASPLOS 2016
  9. ZygOS: Achieving Low Tail Latency for Microsecond-scale Networked Tasks, SOSP 2017

File System/Storage

  1. All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI 2014.
  2. Rethink the Sync, OSDI 2006.
  3. Serverless Network File Systems, SOSP 1995
  4. The Design and Implementation of a Log-Structured File System, Mendel Rosenblum, John K. Ousterhout. ACM TOCS 10(1), Feb 1992, pp 26–52.
    • SIGOPS: The Hall of Fame Award The paper introduces log-structured file storage, where data is written sequentially to a log and continuously de-fragmented. The underlying ideas have influenced many modern file and storage systems like NetApp’s WAFL file systems, Facebook’s picture store, aspects of Google’s BigTable, and the Flash translation layers found in SSDs.
  5. A Fast File System for UNIX, Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry, ACM Transactions on Computer Systems (TOCS) 2(3), August 1984, 181-197.
    • SIGOPS: The Hall of Fame Award This paper introduced techniques to make the file system “disk aware”, thus demonstrating the importance of understanding the interplay between hardware technology and file-system design. The structuring concept of the cylinder group, while simple, is found in some form in many current systems (including the widely-deployed Linux ext* family) and serves as an excellent example of the importance of locality in storage. The paper also introduced numerous functionality improvements, including symbolic links and atomic rename, which have since become commonplace features in modern file systems.
  6. Disconnected operation in the Coda File System, James J. Kistler and M. Satyanarayanan, ACM Transactions on Computer Systems (TOCS) 10(1), February 1992, 3-25.
    • SIGOPS: The Hall of Fame Award This paper was the first to describe the use of caching to provide availability in addition to improved performance in a distributed setting where clients use files stored at remote file servers, leading to potential loss of service during disconnection. The Coda design provided a thoughtful and elegant approach to supporting continued service during disconnection. Disconnected clients continued to service user requests using locally cached content; however all potential modifications performed while disconnected were logged locally, and when service was restored the system attempted to reconcile the local modifications with the current server state. The Coda design inspired much follow-on research on distributed file systems and its techniques were adopted in other systems.
  7. The Google File System, Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, SOSP 2003
    • SIGOPS: The Hall of Fame Award This paper presented an effective design for a large-scale distributed file system that provided fault tolerance while running on inexpensive commodity hardware. It provided very large amounts of I/O bandwidth by having all data transfers happen directly between client processes and machines storing the actual data, handled automatic recovery from failed disks and machines, and retained the simplicity of managing file system metadata in a single centralized master. GFS formed the basis for the design for the open-source HDFS system, as well as the backbone for the evolution of large-scale distributed file systems at Google and elsewhere.
  8. Design Tradeoffs for SSD Performance, USENIX 2008
  9. A Study of Linux File System Evolution, FAST 2013
  10. F2FS: A New File System for Flash Storage, FAST 2015
  11. Hare: a file system for non-cache-coherent multicores, EUROSYS 2015
  12. CrossFS: A Cross-layered Direct-Access File System, Yujie Ren, Rutgers University; Changwoo Min, Virginia Tech; Sudarsun Kannan, Rutgers University, OSDI 2020
  13. Write Dependency Disentanglement with HORAE, Xiaojian Liao, Youyou Lu, Erci Xu, and Jiwu Shu, Tsinghua University, OSDI 2020
  14. AsymNVM: An Efficient Framework for Implementing Persistent Data Structures on Asymmetric NVM Architecture, Teng Ma, Mingxing Zhang, Kang Chen, Zhuo Song, Yongwei Wu, Xuehai Qian, ASPLOS 2020
  15. Pronto: Easy and Fast Persistence for Volatile Data Structures, Amirsaman Memaripour, Joseph Izraelevitz, Steven Swanson, ASPLOS 2020
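
The log-structured idea summarized under item 4 (write everything sequentially to a log and reclaim space by cleaning) can be sketched with a toy key-value store. This is only an illustration under simplifying assumptions: the log is an in-memory list instead of on-disk segments, and the cleaner rewrites the whole log at once. The class `LogStructuredStore` and its method names are hypothetical.

```python
class LogStructuredStore:
    """Toy append-only store with an index and a cleaner."""

    def __init__(self):
        self.log = []    # append-only list of (key, value) records
        self.index = {}  # key -> position of the newest record for that key

    def write(self, key, value):
        # Every update is an append; older records for the key become dead.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def read(self, key):
        return self.log[self.index[key]][1]

    def clean(self):
        # Cleaner: copy only live records into a fresh log, dropping dead ones.
        new_log, new_index = [], {}
        for pos, (key, value) in enumerate(self.log):
            if self.index.get(key) == pos:
                new_index[key] = len(new_log)
                new_log.append((key, value))
        self.log, self.index = new_log, new_index


store = LogStructuredStore()
store.write("a", 1)
store.write("b", 2)
store.write("a", 3)   # supersedes the first record for "a"
store.clean()         # the dead record for "a" is dropped
print(len(store.log), store.read("a"), store.read("b"))
```

Writes never modify data in place, which keeps them sequential; the price is the background cleaning work needed to reclaim space held by superseded records.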

Scalability

  1. Corey: An Operating System for Many Cores, OSDI 2008
  2. An analysis of Linux scalability to many cores, OSDI 2010
  3. Locating cache performance bottlenecks using data profiling, EUROSYS 2010
  4. Non-scalable locks are dangerous, Linux Symposium 2012.
  5. Scalable Kernel TCP Design and Implementation for Short-Lived Connections, ASPLOS 2016
  6. Scalable Address Spaces Using RCU Balanced Trees, ASPLOS 2012
  7. mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems, NSDI 2014
  8. The Multikernel: A new OS architecture for scalable multicore systems, SOSP 2009
  9. [Disco: running commodity operating systems on scalable multiprocessors](http://research.cs.wisc.edu/areas/os/Qual/papers/disco.pdf), Edouard Bugnion, Scott Devine, and Mendel Rosenblum, SOSP 1997, ACM Transactions on Computer Systems, 1997
    • SIGOPS: The Hall of Fame Award
  10. Scaling a file system to many cores using an operation log, SOSP 2017

Bugs/Security/Fault-Tolerant/Recovery/Update

  1. Implementing Fault-Tolerant Services Using the State Machine Approach: a tutorial, Fred B. Schneider, ACM Computing Surveys 22(4):299-319, December 1990.
    • SIGOPS: The Hall of Fame Award The paper that explained how we should think about replication … a model that turns out to underlie Paxos, Virtual Synchrony, Byzantine replication, and even Transactional 1-Copy Serializability.
  2. Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency, Cary G. Gray and David R. Cheriton, SOSP 1989.
    • SIGOPS: The Hall of Fame Award The Gray and Cheriton paper pioneered through its analysis of the Leases mechanism, which has become one of the most widely-used mechanisms for managing distributed caches. The paper is particularly striking for its careful analysis of the semantics of leases, its detailed experiments, and its thoughtful discussion of fault-tolerance issues.
  3. Crash Recovery in a Distributed Data Storage System, Butler Lampson and Howard Sturgis, Technical report, Xerox Palo Alto Research Center, June 1979.
    • SIGOPS: The Hall of Fame Award
  4. The Recovery Manager of the System R Database Manager, Jim Gray, Paul McJones, Mike Blasgen, Bruce Lindsay, Raymond Lorie, Tom Price, Franco Putzolu, and Irving Traiger, ACM Computing Surveys, June 1981.
    • SIGOPS: The Hall of Fame Award
  5. Why Do Computers Stop And What Can Be Done About It?, Jim Gray, HP Labs Technical Report TR-85.7.
    • SIGOPS: The Hall of Fame Award The paper presents the first large scale quantitative study of computer failures in practice, of a system built using best practices at the time to achieve fault-tolerance.
  6. Reflections on Trusting Trust, Ken Thompson. Communications of the ACM, Volume 27 Issue 8, Aug 1984.
    • SIGOPS: The Hall of Fame Award The paper demonstrated that to have trust in a program, one cannot just rely on trust in the person who wrote it, or even on verifying the source code. One must also ensure that the entire tool chain used to produce and execute binaries is trustworthy.
  7. A NonStop Kernel, Joel Bartlett, SOSP 1981.
    • SIGOPS: The Hall of Fame Award Tandem was the first commercial database to achieve fault tolerance. To accomplish this, the Tandem system had to bring together many techniques — including message-passing, mirroring, fast failure detection, and failover — into a practical design and implementation.
  8. Efficient software-based fault isolation, Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham, SOSP 1993.
    • SIGOPS: The Hall of Fame Award This paper demonstrated that compiler or code-rewriting techniques could isolate untrusted code modules, preventing them from writing or jumping to addresses outside their “fault domain”, without the overhead of crossing hardware-enforced address space boundaries, and without much increase in execution time of code within a domain. The paper inspired substantial subsequent research, and the basic techniques have been implemented in widely-deployed software, such as Web browsers.
  9. ReVirt: Enabling intrusion analysis through virtual-machine logging and replay, George W. Dunlap, Samuel T. King, Sukru Cinar, Murtaza A. Basrai, and Peter M. Chen, OSDI 2002.
    • SIGOPS: The Hall of Fame Award The paper demonstrated that the execution of an arbitrary program inside a virtual machine can be replayed deterministically and efficiently. Originally intended primarily as a tool for intrusion analysis, record-and-replay has been used subsequently for debugging, fault-tolerance, to audit program executions, and other virtual machine services. The work has directly influenced commercial products and sparked a research area that continues to this day.
  10. An empirical study of operating systems errors, SOSP 2001.
  11. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs, OSDI 2008.
  12. PF-Miner: A New Paired Functions Mining Method for Android Kernel in Error Paths, COMPSAC 2014
  13. RID: Finding Reference Count Bugs with Inconsistent Path Pair Checking, ASPLOS 2016
  14. Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics, ASPLOS 2008
  15. Triage: diagnosing production run failures at the user's site, SOSP 2007
  16. MUVI: automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs, SOSP 2007
  17. Faults in Linux: ten years later, ASPLOS 2011
  18. Linux kernel vulnerabilities: State-of-the-art defenses and open problems, APSys 2011
  19. Understanding and detecting real-world performance bugs, PLDI 2012
  20. Production-run software failure diagnosis via hardware performance counters, ASPLOS 2013
  21. Early Detection of Configuration Errors to Reduce Failure Damage, OSDI 2016
  22. Towards optimization-safe systems: Analyzing the impact of undefined behavior, SOSP 2013
  23. Improving integer security for systems with KINT, OSDI 2012
  24. DeepXplore: Automated Whitebox Testing of Deep Learning Systems, SOSP 2017
  25. Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold, SOSP 2017
  26. kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels, USENIX Security 2017
  27. Theseus: an Experiment in Operating System Structure and State Management
    • Theseus project, Kevin Boos, Rice University; Namitha Liyanage, Yale University; Ramla Ijaz, Rice University; Lin Zhong, Yale University, OSDI 2020
  28. RedLeaf: Isolation and Communication in a Safe Operating System, Vikram Narayanan, Tianjiao Huang, David Detweiler, Dan Appel, and Zhaofeng Li, University of California, etc., OSDI 2020
  29. Keystone: An Open Framework for Architecting Trusted Execution Environments, Dayeol Lee, David Kohlbrenner, Shweta Shinde, Krste Asanovic, Dawn Song (UC Berkeley), EuroSys 2020
  30. SPINFER: Inferring Semantic Patches for the Linux Kernel, Lucas Serrano and Van-Anh Nguyen, Sorbonne University/Inria/LIP6, etc., USENIX ATC 2020
  31. Occlum: Secure and Efficient Multitasking Inside a Single Enclave of Intel SGX, Youren Shen, Hongliang Tian, Yu Chen, Kang Chen, Runji Wang, Yi Xu, Yubin Xia, ASPLOS 2020
  32. Cross-Failure Bug Detection in Persistent Memory Programs, Sihang Liu, Korakit Seemakhupt, Yizhou Wei, Thomas Wenisch, Aasheesh Kolli, Samira Khan, ASPLOS 2020

Encryption&Authentication

  1. Using Encryption for Authentication in Large Networks of Computers, Roger Needham and Michael Schroeder, Communications of the ACM, December 1978.
    • SIGOPS: The Hall of Fame Award

Interface Design (API/ABI/ Software-Hardware Interface...)

  1. Hints for Computer System Design, Butler W. Lampson, SOSP 1983.
    • SIGOPS: The Hall of Fame Award A classic study of experience building large systems, distilled into a cookbook of wisdom for the operating systems researcher. As time has passed, the value of these hints has only grown and the range of systems to which they apply enlarged.
  2. End-To-End Arguments in System Design, J. H. Saltzer, D. P. Reed, and D. D. Clark, ACM Transactions on Computer Systems 2(4):277-288, November 1984.
    • SIGOPS: The Hall of Fame Award This paper gave system designers, and especially Internet designers, an elegant framework for making sound decisions. A paper that launched a revolution and, ultimately, a religion.
  3. A Critique of the Windows Application Programming Interface, Computer Standards & Interfaces, 20:1–8, November 1998
  4. Mars Code, Communications of the ACM, Volume 57 Issue 2, February 2014, Pages 64-73
  5. The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors, SOSP 2013.
  6. MegaPipe: A New Programming Interface for Scalable Network I/O, OSDI 2012.
  7. FlexSC: Flexible System Call Scheduling with Exception-Less System Calls, OSDI 2010
  8. Light-Weight Contexts: An OS Abstraction for Safety and Performance, OSDI 2016
  9. A fork() in the road, HotOS 2019

Verification/Proof

  1. Safe Kernel Extensions Without Run-Time Checking, George C. Necula and Peter Lee, OSDI 1996
    • SIGOPS: The Hall of Fame Award This paper introduced the notion of proof carrying code (PCC) and showed how it could be used for ensuring safe execution by kernel extensions without incurring run-time overhead. PCC turns out to be a general approach for relocating trust in a system; trust is gained in a component by trusting a proof checker (and using it to check a proof the component behaves as expected) rather than trusting the component per se. PCC has become one of the cornerstones of language-based security.
  2. A Logic of Authentication, Michael Burrows, Martin Abadi, and Roger Needham, ACM Transactions on Computer Systems 8(1):18-36, February 1990.
    • SIGOPS: The Hall of Fame Award This paper introduced to the systems community a logic-based notation for authentication protocols to precisely describe certificates, delegations, etc. With this precise description a designer can easily reason whether a protocol is correct or not, and avoid the security flaws that have plagued protocols. “Speaks-for” and “says” are now standard tools for system designers.
  3. seL4: Formal Verification of an OS Kernel, SOSP 2009.
  4. Jitk: A Trustworthy In-Kernel Interpreter Infrastructure, OSDI 2014.
  5. Using Crash Hoare Logic for certifying the FSCQ file system, SOSP 2015
  6. Push-Button Verification of File Systems via Crash Refinement, OSDI 2016
  7. Specifying and Checking File System Crash-Consistency Models, ASPLOS 2016
  8. An Empirical Study on the Correctness of Formally Verified Distributed Systems, EuroSys 2017
  9. Hyperkernel: Push-Button Verification of an OS Kernel, SOSP 2017
    • hyperkernel project
  10. Verifying a high-performance crash-safe file system using a tree specification, SOSP 2017

Devices

  1. A Case for Redundant Arrays of Inexpensive Disks (RAID), David A. Patterson, Garth Gibson, Randy H. Katz, Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data.
    • SIGOPS: The Hall of Fame Award The paper shows how to achieve efficient, fault tolerant and highly available storage using cheap, unreliable disk hardware components.
  2. Improving the Reliability of Commodity Operating Systems, SOSP 2003
  3. Understanding modern device drivers, ASPLOS 2012
  4. Tolerating Hardware Device Failures in Software, SOSP 2009
  5. Gdev: First-Class GPU Resource Management in the Operating System, USENIX ATC 2012
  6. GPUvm: Why not virtualizing GPUs at the hypervisor?, USENIX ATC 2014

Language

  1. The Evolution of C Programming Practices: A Study of the Unix Operating System 1973–2015, Diomidis Spinellis, etc., ICSE 2016
  2. Early Experience with Mesa, Charles M. Geschke, James H. Morris Jr., and Edwin H. Satterthwaite, Communications of the ACM, 20(8):540–553, 1977.
  3. Extensibility, Safety and Performance in the SPIN Operating System, Brian N. Bershad, etc., University of Washington, SOSP 1995
  4. The nesC Language: A Holistic Approach to Networked Embedded Systems, David Gay, etc., PLDI 2003
  5. A Principled Approach to Operating System Construction in Haskell, Thomas Hallgren, etc., ICFP 2005
  6. Towards a Strongly Typed Functional Operating System, Arjen van Weelden and Rinus Plasmeijer, IFL 2002
  7. Language Support for Fast and Reliable Message-based Communication in Singularity OS, EuroSys 2006
  8. Singularity: Rethinking the Software Stack, Galen C. Hunt and James R. Larus, Microsoft Research Redmond, OSR 2007
  9. Multiprogramming a 64 kB Computer Safely and Efficiently, Amit Levy, Bradford Campbell, Branden Ghena, Daniel Giffin, Pat Pannuto, Prabal Dutta, and Philip Levis, SOSP 2017
  10. The benefits and costs of writing a POSIX kernel in a high-level language, Cody Cutler, M. Frans Kaashoek, and Robert T. Morris, OSDI 2018
  11. MirageOS: Towards a smaller and safer OS, Thomas Gazagnaire, Tech Talk, 2018
  12. The Case for Writing Network Drivers in High-Level Programming Languages, ANCS’19, 2019
  13. Practical Safe Linux Kernel Extensibility, Samantha Miller, etc., HotOS 2019
  14. go-pmem: Native Support for Programming Persistent Memory in Go, Jerrin Shaji George, Mohit Verma, Rajesh Venkatasubramanian, and Pratap Subrahmanyam, VMware, USENIX ATC 2020

Overview

  1. Inter-Disciplinary Research Challenges in Computer Systems for the 2020s, Albert Cohen, Xipeng Shen, Josep Torrellas, James Tuck, Yuanyuan Zhou, etc., ASPLOS 2018.