Type-Safe Stack Traversal for Garbage Collector Implementation

Colin Stebbins Gordon

June 2008

Abstract

Garbage collectors are an important part of many modern language runtimes. Essentially all tools for developing and debugging programs using garbage collection assume the correctness of the collector, and therefore provide no means for detecting garbage collector errors. As a result it is especially important that garbage collector implementations be free of errors. This goal is even more challenging in the face of the typical implementation strategy for collectors: implementation in C, making error-prone inferences from complex bit patterns, where an error could result in dereferencing an invalid pointer or corrupting program data. One approach to reducing errors in collector implementation is to improve both the type-safety and memory-safety of garbage collector implementations. Prior work in this direction has focused on the use of modern type systems to statically detect errors in the collector code at compile time, but has practical shortcomings. The prior work replaces the standard machine stack with a heap allocated data structure to avoid unsafe walks of the native stack. Traversal of the runtime stack is normally not possible in higher-level languages because they trade the flexibility of arbitrary memory access - typically used to gather a root set from a runtime stack - for the safety of being unable to cause memory access errors. We present a method for addressing the safe stack traversal problem at the compiler level, by lifting actual machine stack frames up to the level of explicit data structures in Standard ML, such that complete stack traversal can be performed with minimal unsafe code. We implement a garbage collector in the ML Kit using the techniques described and provide details on key parts of the implementation.

Bibtex

@misc{ugradthesis,
  author="Colin Stebbins Gordon",
  title={{Type-Safe Stack Traversal for Garbage Collector Implementation}},
  howpublished={Brown University Senior Honors Thesis},
  year=2008,
  month="May",
  note="Undergraduate Thesis",
  abstract = {
Garbage collectors are an important part of many modern language runtimes. Essentially all tools for developing and debugging programs using garbage collection assume the correctness of the collector, and therefore provide no means for detecting garbage collector errors. As a result it is especially important that garbage collector implementations be free of errors. This goal is even more challenging in the face of the typical implementation strategy for collectors: implementation in C, making error-prone inferences from complex bit patterns, where an error could result in dereferencing an invalid pointer or corrupting program data. One approach to reducing errors in collector implementation is to improve both the type-safety and memory-safety of garbage collector implementations. Prior work in this direction has focused on the use of modern type systems to statically detect errors in the collector code at compile time, but has practical shortcomings. The prior work replaces the standard machine stack with a heap allocated data structure to avoid unsafe walks of the native stack. Traversal of the runtime stack is normally not possible in higher-level languages because they trade the flexibility of arbitrary memory access - typically used to gather a root set from a runtime stack - for the safety of being unable to cause memory access errors. We present a method for addressing the safe stack traversal problem at the compiler level, by lifting actual machine stack frames up to the level of explicit data structures in Standard ML, such that complete stack traversal can be performed with minimal unsafe code. We implement a garbage collector in the ML Kit using the techniques described and provide details on key parts of the implementation.
  },
  bibtex_show = {true},
  pdf = {papers/ugrad_thesis.pdf},
  url = {http://cs.brown.edu/research/pubs/theses/ugrad/2008/gordon.pdf}
}