We are beginning to write new Shadow code in Rust instead of C. Before we get too far, we need to work out how to make Shadow's data structures play along with Rust's ownership models.
Conceptually, Shadow models many Hosts, each with multiple Processes, each with multiple Threads. These objects often need to access each-other; and sometimes need to mutate each-other. Ownership is mostly a tree, but occasionally references need to be stored outside the tree; e.g. in event objects that are stored in scheduler queues. Because of this, most objects have reference counts that are maintained manually.
Most of the simulation works without locking - safety is maintained by manually ensuring that a Host is the root of a partitioned graph of an objects, such that handing a Host over to a worker threads doesn't leave any pointers behind that might be accessed by some other thread.
How can we model these objects and their relationships in Rust?
Using raw pointers¶
Let's start by directly translating the existing structure. We'll start with just sketching out Host
:
mod literal_translation {
use std::collections::HashMap;
struct Host {
ref_count : i32,
processes : HashMap<i32, *mut Process>,
}
fn host_new() -> *mut Host {
// Allocate a pointer on the heap, and then convert to a raw pointer.
Box::into_raw(Box::new(Host{ref_count: 1, processes: HashMap::new()}))
}
unsafe fn host_inc(hostp : *mut Host) {
let host : &mut Host;
host = &mut *hostp;
host.ref_count += 1;
}
unsafe fn host_dec(hostp : *mut Host) {
let host : &mut Host;
host = &mut *hostp;
host.ref_count -= 1;
if host.ref_count == 0 {
// Transfer ownership into the Box object and let it get dropped.
Box::from_raw(hostp);
}
}
struct Process {}
}
Using Rc (reference counted objects)¶
A seemingly obvious improvement is to use Rust's reference counting instead of doing it manually. We don't want to introduce synchronization here, so we'll start with Rc
. While we're at it, let's replace host_new
with a more idiomatic Host::new
:
mod using_rc {
use std::rc::Rc;
use std::collections::HashMap;
struct Host {
// We now store ref-counted Process objects instead of raw pointers.
processes : HashMap<i32, Rc<Process>>,
}
impl Host {
// Our idiomatic constructor returns a literal Host objects; it's up to the caller
// to move that into an Rc<Host>.
fn new() -> Host {
Host{processes: HashMap::new()}
}
}
// Increment and decrement no longer needed!
struct Process {}
}
Storing references¶
Unfortunately things get complicated when a Process
needs to access the Host
to which it belongs. Let's try the simplest thing that might work: storing a reference. We have to do add some lifetime specifiers as a result, effectively telling Rust that the Host
must outlive the Process
, which seems reasonable enough. We can get the structure definitions and Process::new
to compile:
mod ref_to_parent {
use std::rc::Rc;
use std::collections::HashMap;
struct Host<'a> {
processes : HashMap<i32, Rc<Process<'a>>>,
}
struct Process<'a> {
host: &'a Host<'a>,
}
impl<'a> Process<'a> {
fn new(host : &'a Host<'a>) -> Process<'a> {
Process{host}
}
}
}
Once we actually try to spawn a new process, though, we run into trouble:
mod ref_to_parent {
use std::rc::Rc;
use std::collections::HashMap;
struct Host<'a> {
next_pid : i32,
processes : HashMap<i32, Rc<Process<'a>>>,
}
impl<'a> Host<'a> {
fn spawn_process(&'a mut self) {
self.processes.insert(self.next_pid, Rc::new(Process::new(self)));
self.next_pid += 1;
}
}
struct Process<'a> {
host: &'a Host<'a>,
}
impl<'a> Process<'a> {
fn new(host : &'a Host<'a>) -> Process<'a> {
Process{host}
}
}
}
Rust prevents a mutable reference and an immutable reference to a single object from existing at the same time. If we want to store a reference in a Process
to its Host
, we can never create a mutable reference to Host
again. We can get a bit further though with interior mutability - putting any mutations that need to happen inside a guarded data structure, such as RefCell
, such that those operations can be done from an immutable reference.
mod ref_to_parent_with_refcell {
use std::rc::Rc;
use std::collections::HashMap;
use std::cell::RefCell;
pub struct Host<'a> {
next_pid : RefCell<i32>,
processes : RefCell<HashMap<i32, Rc<Process<'a>>>>,
}
impl<'a> Host<'a> {
pub fn new() -> Host<'a> {
Host { next_pid : RefCell::new(0), processes : RefCell::new(HashMap::new())}
}
pub fn spawn_process(&'a self) {
self.processes.borrow_mut().insert(*self.next_pid.borrow(), Rc::new(Process::new(self)));
*self.next_pid.borrow_mut() += 1;
}
}
struct Process<'a> {
host: &'a Host<'a>,
}
impl<'a> Process<'a> {
fn new(parent : &'a Host<'a>) -> Process<'a> {
Process{host: parent}
}
}
}
{
use ref_to_parent_with_refcell as m;
let h = m::Host::new();
h.spawn_process();
}
How does Rust actually prevent the reference from process to host from dangling? Let's try it and see where things go wrong.
mod drop_ref_to_parent_with_refcell {
use std::rc::Rc;
use std::collections::HashMap;
use std::cell::RefCell;
pub struct Host<'a> {
next_pid : RefCell<i32>,
processes : RefCell<HashMap<i32, Rc<Process<'a>>>>,
}
impl<'a> Host<'a> {
pub fn new() -> Host<'a> {
Host { next_pid : RefCell::new(0), processes : RefCell::new(HashMap::new())}
}
pub fn spawn_process(&'a self) {
let mut next_pid = self.next_pid.borrow_mut();
self.processes.borrow_mut().insert(*next_pid, Rc::new(Process::new(self, *next_pid)));
*next_pid += 1;
}
pub fn get_process(&self, pid : i32) -> Rc<Process<'a>> {
self.processes.borrow().get(&pid).unwrap().clone()
}
}
pub struct Process<'a> {
host: &'a Host<'a>,
pid: i32,
}
impl<'a> Process<'a> {
fn new(parent : &'a Host<'a>, pid: i32) -> Process<'a> {
Process{host: parent, pid}
}
}
}
{
use drop_ref_to_parent_with_refcell as m;
let _p = {
let h = m::Host::new();
h.spawn_process();
h.get_process(0)
};
}
Ah ha - because the Process
's type is parameterized with its Host
's lifetime, Rust won't allow a Process
object to outlive its Host
.
I suspect we'd run into trouble though proving to the compiler that this constraint is satisfied. Interestingly, it looks like if we implement the drop trait, the host becomes undroppable after calling spawn_process
:
mod drop_ref_to_parent_with_refcell {
use std::rc::Rc;
use std::collections::HashMap;
use std::cell::RefCell;
pub struct Host<'a> {
next_pid : RefCell<i32>,
processes : RefCell<HashMap<i32, Rc<Process<'a>>>>,
}
impl<'a> Host<'a> {
pub fn new() -> Host<'a> {
Host { next_pid : RefCell::new(0), processes : RefCell::new(HashMap::new())}
}
pub fn spawn_process(&'a self) {
let mut next_pid = self.next_pid.borrow_mut();
self.processes.borrow_mut().insert(*next_pid, Rc::new(Process::new(self, *next_pid)));
*next_pid += 1;
}
pub fn get_process(&self, pid : i32) -> Rc<Process<'a>> {
self.processes.borrow().get(&pid).unwrap().clone()
}
}
impl<'a> Drop for Host<'a> {
fn drop(&mut self) {
println!("Dropped host!");
}
}
pub struct Process<'a> {
host: &'a Host<'a>,
pid: i32,
}
impl<'a> Process<'a> {
fn new(parent : &'a Host<'a>, pid: i32) -> Process<'a> {
Process{host: parent, pid}
}
}
}
{
use drop_ref_to_parent_with_refcell as m;
let h = m::Host::new();
h.spawn_process();
}
Storing weak pointers¶
We could relax the static constraints by having the Process
store an Rc<Host>
instead of a &Host
. This is closer to what we do in current Shadow code; it introduces a cycle in the object graph, but we break that cycle when tearing down a Host
by clearing its references to its Process
es.
In Rust it's probably better to store a Weak<Host>
instead. This ensures we don't have a cycle in the graph, but forces the Process
to do a (cheap) upgrade
to an Rc<Host>
before it can actually operate on it.
mod weak_ref {
use std::rc::{Rc, Weak};
use std::collections::HashMap;
use std::cell::RefCell;
pub struct Host {
next_pid : RefCell<i32>,
processes : RefCell<HashMap<i32, Rc<Process>>>,
}
impl Host {
pub fn new() -> Host {
Host { next_pid : RefCell::new(0), processes : RefCell::new(HashMap::new())}
}
pub fn get_process(&self, pid : i32) -> Rc<Process> {
self.processes.borrow().get(&pid).unwrap().clone()
}
}
// Implementing Drop is now ok.
impl Drop for Host {
fn drop(&mut self) {
println!("Dropped host!");
}
}
// To create a weak reference for the host, we need its `Rc` wrapper; `self` is insufficient.
// Since passing `self` would be redundant (and an opportunity to provide inconsistent inputs),
// I pulled this out to a standalone function.
pub fn spawn_process(host: &Rc<Host>) {
let weak = Rc::downgrade(host);
host.processes.borrow_mut().insert(*host.next_pid.borrow(), Rc::new(Process::new(weak)));
*host.next_pid.borrow_mut() += 1;
}
pub struct Process {
host: Weak<Host>,
}
impl Process {
fn new(parent : Weak<Host>) -> Process {
Process{host: parent}
}
pub fn run(&self) {
// We need to upgrade the weak pointer to an Rc pointer. The `unwrap` will panic
// at runtime if the `Host` no longer exists.
let h = self.host.upgrade().unwrap();
println!("Process method, accessing Host. Host's next_pid: {}", *h.next_pid.borrow())
}
}
}
{
use weak_ref as m;
use std::rc::Rc;
let host = Rc::new(m::Host::new());
m::spawn_process(&host);
let process = host.get_process(0);
process.run();
}
What about threads?¶
So far all of the above doesn't use any locks. There are some run-time checks in Rc
and RefCell
, but they are lockless. RefCell
does implement Send
, meaning we can transfer it and its containing structs between threads, but Rc
implements neither Send
nor Sync
. An Rc
can't be transferred between threads because there could be other outstanding Weak
or Rc
objects referring to the same underlying data, and it wouldn't be safe to have those being accessed by different threads.
{
use weak_ref as m;
use std::rc::Rc;
use std::thread;
let h = Rc::new(m::Host::new());
m::spawn_process(&h);
let worker = thread::spawn(move || {
let p = h.get_process(0);
p.run();
});
}
We could simply replace these with their thread-safe counterparts Arc
and Mutex
, but that would introduce a fair amount of locking overhead, which shouldn't be necessary given that the whole object graph conceptually only belongs to a single thread at once.
Unfortunately, with this design its difficult to prove to the Rust compiler that the whole object graph is transferred from one thread to another -- that there aren't any lingering references. We could circumvent Rust's thread-safety checks by using unsafe
to "smuggle" a Host
across threads as a raw pointer. This should be safe as long as we don't accidentally hang onto any references to the graph's internals, nor the graph stores references to any unlocked data structures that don't conceptually belong to that Host
.
{
use weak_ref as m;
use std::rc::Rc;
use std::thread;
let h = Rc::new(m::Host::new());
m::spawn_process(&h);
// Smuggle the host into the worker thread as a pointer, disguised as an integer.
// I *think* this is safe as long as graph of objects reachable from the host really
// is partitioned from all other objects. e.g. Rust wouldn't know to stop us here if
// we continued to hold another Rc to the host or its internals, but accessing those
// objects before we smuggle ownership of the host back to this scope would be undefined behavior.
let host_ptr = Rc::into_raw(h) as usize;
let worker = thread::spawn(move || {
let h = unsafe {
Rc::from_raw(host_ptr as *const m::Host)
};
let p = h.get_process(0);
p.run();
// Smuggle the host back to the scheduler.
Rc::into_raw(h) as usize
});
let h = unsafe { Rc::from_raw(worker.join().unwrap() as *const m::Host) };
}
The above approach is probably workable, but gives up some of Rust's safety. It's pretty much the model of how safety is maintained in the current C code, and it mostly works, but going with this model puts a damper on taking advantage of Rust's "fearless concurrency".
Passing the graph around explicitly¶
Instead of objects storing references to other parts of the object graph, we could pass them in as method parameters when appropriate. e.g. since the Host
should be the root of the graph that any of these need to access, we could pass it around to most methods, and in the objects themselves only store identifiers that can be used to look up parts of the graph as needed (e.g. pid
s, tid
's, etc).
Note though that from a Host
method, we can't pass a mutable reference to the Host
itself to a method on one of the child object it owns, since that'd invalidate the reference to the child:
mod passing_mutable_graph_root {
use std::collections::HashMap;
pub struct Host {
processes : HashMap<i32, Process>,
}
impl Host {
fn run_processes(&mut self) {
for (_pid, process) in self.processes.iter() {
process.run(self);
}
}
}
pub struct Process {}
impl Process {
fn run(&self, _host : &mut Host) {
}
}
}
So, we'd still need to pass around immutable references and use RefCell
or similar to implement internal mutability.
Since we're not using Rc
wrappers though, the Host
doesn't actually have a way to return a reference to other objects it owns. Here's a failed attempt:
mod passing_immutable_graph_root {
use std::collections::HashMap;
use std::cell::{Ref, RefCell};
pub struct Host {
next_pid : RefCell<i32>,
processes : RefCell<HashMap<i32, RefCell<Process>>>,
}
impl Host {
pub fn new() -> Host {
Host { next_pid : RefCell::new(0), processes : RefCell::new(HashMap::new())}
}
pub fn get_process<'a>(&'a self, pid : i32) -> Ref<'a, Process> {
self.processes.borrow().get(&pid).unwrap().borrow()
}
pub fn spawn_process(&self) {
let mut next_pid = self.next_pid.borrow_mut();
let mut processes = self.processes.borrow_mut();
(*processes).insert(*next_pid, RefCell::new(Process::new(*next_pid)));
(*next_pid) += 1;
}
}
pub struct Process {
pid : i32,
}
impl Process {
fn new(pid : i32) -> Process {
Process{pid}
}
pub fn run(&self, host: &Host) {
println!("Process method, accessing Host. Host's next_pid: {}", *host.next_pid.borrow())
}
}
}
{
use passing_immutable_graph_root as m;
let h = m::Host::new();
h.spawn_process();
let p = h.get_process(0);
p.run(&h);
}
Using arenas¶
The problem above is due to nested ownership. Since processes are nested under the host, we can't borrow a single process without also borrowing the host to which it belongs.
We can get it working by flattening out ownership, passing around a "dumb" arena object rather than the Host
itself. This lets us do most/all of the RefCell
manipulation when borrowing individual objects from the arena. As expected, we can safely transfer the arena between threads.
mod passing_arena {
use std::collections::HashMap;
use std::cell::{Ref, RefCell};
pub struct HostArena {
pub host : RefCell<Host>,
pub processes : RefCell<HashMap<i32, RefCell<Process>>>,
}
impl HostArena {
pub fn new() -> Self {
Self { host: RefCell::new(Host::new()), processes : RefCell::new(HashMap::new())}
}
}
pub struct Host {
// Notice that internals of Host no longer necessarily need to be wrapped in RefCell,
// since the host itself is wrapped in a RefCell.
next_pid : i32,
}
impl Host {
pub fn new() -> Host {
Host { next_pid : 0 }
}
pub fn spawn_process(&mut self, host_arena : &HostArena) {
let mut processes = host_arena.processes.borrow_mut();
(*processes).insert(self.next_pid, RefCell::new(Process::new(self.next_pid)));
(self.next_pid) += 1;
}
}
pub struct Process {
pid : i32,
}
impl Process {
fn new(pid : i32) -> Process {
Process{pid}
}
pub fn run(&self, host_arena : &HostArena) {
println!("Process method, accessing Host. Host's next_pid: {}", host_arena.host.borrow().next_pid)
}
}
}
{
use std::thread;
// Set up the host arena with one process on one thread...
use passing_arena as m;
let host_arena = m::HostArena::new();
{
// Limit the scope of the mutable borrow.
let mut host = host_arena.host.borrow_mut();
host.spawn_process(&host_arena);
}
// Move the arena into a worker thread and run it there.
// Since we're no longer using Rc, the arena is `Send`
// (can be safely transferred between threads).
let worker = thread::spawn(move || {
let processes = host_arena.processes.borrow();
processes.get(&0).unwrap().borrow().run(&host_arena);
});
worker.join();
}
Dealing with mutation cycles (e.g. fork)¶
This gives us the start of a workable model. We'll still need to be careful to avoid patterns of circular mutable borrows, which will result in panic
s. Such cycles could represent real bugs though, and such panic
s are preferable to the more subtle problems that might arise in corresponding C code.
One clear problem we'll run into is when the execution of a Process
wants to call fork
- i.e., create another process on the host. Since the collection holding the Process
will have already been borrowed to call Process::execute
, trying to insert a new Process
into that collection will fail (ht @ stevenengler for spotting this). We have a few ideas for breaking the cycle in such cases:
- Have the return-type of
Process::execute
be anEnum
message to the parentHost
which can request that theHost
performs some action, such as adding a providedProcess
object to theHostArena
, and then callsexecute
again to continue. - Have
Process::execute
push the new process and/or a callback onto an event queue to be handled later. - Instead of
HostArena
providing direct access toHashMap
s of objects, create (or find) an abstraction that could keep lists of pending additions and deletions. It could provide an api likefn run_with<F: CallOnce>(&self, id: i32, f : F)
that borrows the internalHashMap
, borrows the object (checking for relevant pending insertions or deletions), runs the function, and then performs the pending operations if there are no other outstanding borrows. Likewise it could provideinsert
anddelete
operations that push to the corresponding operation lists if the mainHashMap
is currently borrowed.
We're still thinking if there are other approaches we could take, including other ways we can tweak Shadow's design to fit better into Rust's ownership model. Happy to hear ideas :)