User:Ehsan/Safe C++
This page tries to explain some rough ideas about what a safe subset of C++ suitable for use inside Gecko would look like. The ideas below mostly focus on the high-level semantics, and there are still a lot of details to be figured out.
Note that the goal here is protecting against use-after-free bugs, and also some runtime crashes. This proposal specifically doesn't protect against data races as it does not (yet) address mutability.
Owning references
When creating a new object on the heap, use OwningRef<T>, for example:
OwningRef<T> object(1, 2, 3); // instead of nsRefPtr<T> object(new T(1, 2, 3))
OwningRefs do not convert to a raw pointer. They do provide an operator->(), which can *only* be called as part of a member access. This will be ensured through a custom analysis.
OwningRefs cannot be copied. They can hand out borrowed refs, explained below. They can be moved inside a scope (possibly to the caller of the function) if they haven't handed out borrowed refs. If they have, a custom analysis will prevent the code from being compiled.
After an object has been moved, calling operator->() on it is a compile-time error. This will be ensured through a custom analysis.
void good() {
  OwningRef<T> object(1, 2, 3);
  object->DoSomething();
  OwningRef<T> newObject(std::move(object)); // compiler is happy!
  newObject->DoSomethingElse();
}

void bad() {
  OwningRef<T> object(1, 2, 3);
  T* sneaky = object.operator->(); // error: operator->() can only be used to access the members. nice try though!
  OwningRef<T> newObject1(std::move(object));
  object->DoSomething(); // error: operator->() cannot be used after the object has been moved.
  BorrowedRef<T> borrow1(newObject1.Borrow());
  goo(borrow1);
  BorrowedRef<T> borrow2(newObject1.Borrow());
  quo(borrow2);
  OwningRef<T> newObject2(std::move(newObject1)); // error: Cannot move object because of outstanding borrowed refs!
}

void takeBorrowed(BorrowedRef<T>);
void takeOwnership(OwningRef<T>&&);

void passer() {
  OwningRef<T> object(1, 2, 3);
  takeBorrowed(object.Borrow());
  takeOwnership(std::move(object)); // This moves the object
  object->DoSomething(); // error: operator->() cannot be used after the object has been moved.
  BorrowedRef<T> borrowed(object.Borrow()); // error: Borrow() cannot be used after the object has been moved.
}

// Returning an OwningRef to the caller (by value, so the caller receives the moved object)
OwningRef<T> goodCreator() {
  OwningRef<T> object(1, 2, 3);
  goo(object.Borrow()); // Pass a borrowed ref in a temporary to a function
  return std::move(object);
}

OwningRef<T> badCreator() {
  OwningRef<T> object(1, 2, 3);
  BorrowedRef<T> borrowed(object.Borrow());
  goo(borrowed); // Pass a borrowed ref in a non-temporary to a function
  return std::move(object); // error: Cannot move object because of outstanding borrowed refs!
}
I believe that all of the checks required for OwningRef should be enforceable at compile time.
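To make the rules above concrete, here is a minimal sketch of how the runtime side of OwningRef might look, assuming C++14 or later. It is only an illustration: the class body, the IsValid() helper, and the Point type are hypothetical, and the custom compile-time analysis described above is replaced by a runtime assertion, which is much weaker than what this page proposes.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical sketch of OwningRef<T>; not an existing Gecko class.
template <typename T>
class OwningRef {
public:
  // Construct the pointee in place so callers never write `new`.
  template <typename... Args>
  explicit OwningRef(Args&&... aArgs)
      : mPtr(std::make_unique<T>(std::forward<Args>(aArgs)...)) {}

  // Move-only: ownership can be transferred but never duplicated.
  OwningRef(OwningRef&&) noexcept = default;
  OwningRef(const OwningRef&) = delete;
  OwningRef& operator=(const OwningRef&) = delete;

  // The proposal rejects use-after-move at compile time; this sketch can
  // only assert at runtime.
  T* operator->() const {
    assert(mPtr && "operator->() used after the object has been moved");
    return mPtr.get();
  }

  // Hypothetical helper, used here only to observe the moved-from state.
  bool IsValid() const { return mPtr != nullptr; }

private:
  std::unique_ptr<T> mPtr;
};

// Illustrative pointee type.
struct Point {
  int x, y;
  Point(int aX, int aY) : x(aX), y(aY) {}
  int Sum() const { return x + y; }
};

static_assert(!std::is_copy_constructible<OwningRef<Point>>::value,
              "OwningRef must not be copyable");

int sumAfterMove() {
  OwningRef<Point> object(1, 2);
  OwningRef<Point> newObject(std::move(object));  // ownership transferred
  return newObject->Sum();
}

bool sourceInvalidAfterMove() {
  OwningRef<Point> object(1, 2);
  OwningRef<Point> newObject(std::move(object));
  return !object.IsValid() && newObject.IsValid();
}
```

Nothing in plain C++ stops a caller from writing object.operator->() and keeping the raw pointer, which is why the member-access-only rule still needs the custom analysis.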
Borrowed references
Borrowed references can be obtained from owning references. It is guaranteed at compile time that a BorrowedRef cannot outlive the *validity* of the OwningRef that handed it out. Note that the validity is potentially shorter than the lifetime of the object: in bad() above, for example, object stops being valid once it has been moved, even though it stays in scope.
Borrowed references can't be copied (unless the copy happens when passing a BorrowedRef to another function that accepts one) and they can never be moved. Because they can't be copied, they also can't be returned from a function. They cannot be used to obtain a raw pointer out of the object, and similar to OwningRef, their operator-> will only be usable for member access.
One very common use case of BorrowedRef is for passing pointers to functions where the function doesn't want to own the data passed to it.
Instantiating a BorrowedRef with global storage is a compile-time error. Otherwise a global borrowed ref, once assigned to, would stay outstanding forever and prevent doing anything with the OwningRef it was created from.
Instantiating a BorrowedRef as a class member is a compile-time error. This is necessary because it's impossible to represent a lifetime variable in C++ similar to Rust, which makes it impossible to perform lifetime checks on such members at compile time. SharedRef must be used instead.
BorrowedRef<T> gBorrowed; // error: Cannot instantiate BorrowedRef with global storage.

struct S {
  static BorrowedRef<T> sBorrowed; // error: Cannot instantiate BorrowedRef with global storage.
  BorrowedRef<T> mBorrowed; // error: Cannot instantiate BorrowedRef as a class member.
};
Similar to the above, instantiating a BorrowedRef from a member of another BorrowedRef is an error too:
struct S {
  OwningRef<T> mOwned;
};

BorrowedRef<S> borrowedS /* comes from somewhere */;
BorrowedRef<T> borrowedT = borrowedS->mOwned.Borrow(); // error: Can't borrow something from a BorrowedRef
I believe that all of the checks required for BorrowedRef should be enforceable at compile time.
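The "outstanding borrowed refs" rule can be approximated at runtime with a counter, which may help when reasoning about what the static analysis has to prove. The sketch below is an assumption-laden illustration, not the proposed design: the counter, OutstandingBorrows(), the stripped-down OwningRef, and the Counter type are all hypothetical, and C++14 or later is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

template <typename T> class OwningRef;

// Hypothetical sketch: the counter stands in for the static analysis.
template <typename T>
class BorrowedRef {
public:
  // Copying exists only so a borrow can be passed by value to a callee.
  BorrowedRef(const BorrowedRef& aOther)
      : mPtr(aOther.mPtr), mCount(aOther.mCount) { ++*mCount; }
  ~BorrowedRef() { --*mCount; }
  BorrowedRef& operator=(const BorrowedRef&) = delete;

  // No heap allocation, so a borrow cannot outlive its scope that way.
  static void* operator new(std::size_t) = delete;

  T* operator->() const { return mPtr; }

private:
  friend class OwningRef<T>;
  BorrowedRef(T* aPtr, std::size_t* aCount) : mPtr(aPtr), mCount(aCount) {
    ++*mCount;
  }
  T* mPtr;
  std::size_t* mCount;
};

// Stripped-down owner that tracks how many borrows are still alive.
template <typename T>
class OwningRef {
public:
  template <typename... Args>
  explicit OwningRef(Args&&... aArgs)
      : mPtr(std::make_unique<T>(std::forward<Args>(aArgs)...)) {}

  OwningRef(OwningRef&& aOther) noexcept : mPtr(std::move(aOther.mPtr)) {
    assert(aOther.mBorrows == 0 && "cannot move: outstanding borrowed refs");
  }
  OwningRef(const OwningRef&) = delete;
  OwningRef& operator=(const OwningRef&) = delete;
  ~OwningRef() { assert(mBorrows == 0 && "a borrow outlived its owner"); }

  BorrowedRef<T> Borrow() {
    assert(mPtr && "Borrow() used after the object has been moved");
    return BorrowedRef<T>(mPtr.get(), &mBorrows);
  }

  std::size_t OutstandingBorrows() const { return mBorrows; }

private:
  std::unique_ptr<T> mPtr;
  std::size_t mBorrows = 0;
};

// Illustrative pointee type.
struct Counter {
  int value;
  explicit Counter(int aValue) : value(aValue) {}
};

// A callee that does not want ownership takes a BorrowedRef by value.
int readValue(BorrowedRef<Counter> aCounter) { return aCounter->value; }

int readThroughBorrow() {
  OwningRef<Counter> object(7);
  return readValue(object.Borrow());  // temporary borrow ends with the call
}

// Encodes the live-borrow count inside the scope (tens digit) and after it
// (ones digit).
std::size_t borrowsDuringScope() {
  OwningRef<Counter> object(7);
  std::size_t during = 0;
  {
    BorrowedRef<Counter> borrowed(object.Borrow());
    during = object.OutstandingBorrows();
  }
  return during * 10 + object.OutstandingBorrows();
}
```

A runtime counter only detects a violation when the offending path executes, whereas the checks described above are meant to reject the code before it ever runs.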
Shared references
Shared references can either be obtained from OwningRefs or, as a shorthand, created directly. SharedRefs can be borrowed from with the exact same semantics as OwningRef. When creating a SharedRef from an OwningRef, the same semantics apply as if we'd created a BorrowedRef from the OwningRef.
SharedRefs can be moved to other SharedRefs with the same semantics as moving OwningRefs.
void good() {
  SharedRef<T> object(1, 2, 3);
  object->DoSomething();
  SharedRef<T> newObject(std::move(object)); // compiler is happy!
  newObject->DoSomethingElse();
}

void bad() {
  SharedRef<T> object(1, 2, 3);
  T* sneaky = object.operator->(); // error: operator->() can only be used to access the members. nice try though!
  SharedRef<T> newObject1(std::move(object));
  object->DoSomething(); // error: operator->() cannot be used after the object has been moved.
  BorrowedRef<T> borrow1(newObject1.Borrow());
  goo(borrow1);
  BorrowedRef<T> borrow2(newObject1.Borrow());
  quo(borrow2);
  SharedRef<T> newObject2(std::move(newObject1)); // error: Cannot move object because of outstanding borrowed refs!
}
SharedRefs, as opposed to OwningRefs, can be copied. Copying a shared ref increments its reference count (moving it does not). Multiple SharedRefs cannot be obtained from the same OwningRef object. This will be enforced through a custom analysis.
void bad() {
  OwningRef<T> owning(1, 2, 3);
  SharedRef<T> shared1(owning.Share());
  SharedRef<T> shared2(owning.Share()); // error: Share() cannot be used after the object has been shared.
  SharedRef<T> shared3(shared1);
  SharedRef<T> shared4(std::move(shared1)); // super efficient, no refcount twiddling
  take(owning.Borrow()); // error: Borrow() cannot be used after the object has been shared.
  take(shared4.Borrow()); // fine!
}
Because OwningRef and SharedRef have similar semantics, it is possible to write functions that accept BorrowedRefs: such functions don't claim ownership over their arguments, and they also don't dictate what kind of reference the caller holds.
SharedRef's main use case is for class members. As a rule of thumb, if the class owns something, the programmer should use an OwningRef member, and if it doesn't, they should use a SharedRef member.
Some of the properties of SharedRef can be statically enforced. Their other properties will be enforced through reference counting at runtime. Two different implementations of SharedRef can be conceived: an XPCOM style where the reference count is stored inside the object, and an std::shared_ptr style where the reference count is stored inside the smart pointer. The latter model is more flexible, but it's very likely that we can use the former model too with a very similar interface, possibly without some facilities such as constructing a SharedRef from an OwningRef (since creating an OwningRef to an object with a refcount inside it is meaningless!). Some details will need to be clarified here.
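As a rough illustration of the XPCOM-style option, the sketch below stores the count inside the object and shows that copying bumps it while moving does not. Everything here is hypothetical: the RefCounted base class, the Create() factory (used instead of a forwarding constructor so that it cannot compete with the copy constructor during overload resolution), and the Node type are assumptions for the example, and the count is not thread-safe.

```cpp
#include <cassert>
#include <utility>

// Hypothetical base class holding the count inside the object (XPCOM style).
class RefCounted {
public:
  void AddRef() { ++mRefCnt; }
  // Returns true when the last reference has gone away.
  bool Release() { return --mRefCnt == 0; }
  unsigned RefCount() const { return mRefCnt; }

private:
  unsigned mRefCnt = 0;
};

// Hypothetical sketch of an intrusive SharedRef<T>; T derives from RefCounted.
template <typename T>
class SharedRef {
public:
  // Hypothetical factory: constructs the pointee and takes the first reference.
  template <typename... Args>
  static SharedRef Create(Args&&... aArgs) {
    return SharedRef(new T(std::forward<Args>(aArgs)...));
  }

  // Copy: both refs stay usable, so the count goes up by one.
  SharedRef(const SharedRef& aOther) : mPtr(aOther.mPtr) {
    if (mPtr) mPtr->AddRef();
  }
  // Move: the reference changes hands, so the count is untouched.
  SharedRef(SharedRef&& aOther) noexcept : mPtr(aOther.mPtr) {
    aOther.mPtr = nullptr;
  }
  SharedRef& operator=(const SharedRef&) = delete;

  ~SharedRef() {
    if (mPtr && mPtr->Release()) delete mPtr;
  }

  T* operator->() const {
    assert(mPtr && "operator->() used after the object has been moved");
    return mPtr;
  }

private:
  explicit SharedRef(T* aPtr) : mPtr(aPtr) { mPtr->AddRef(); }
  T* mPtr;
};

// Illustrative pointee type.
struct Node : RefCounted {
  int id;
  explicit Node(int aId) : id(aId) {}
};

unsigned countAfterCopyAndMove() {
  SharedRef<Node> first = SharedRef<Node>::Create(1);
  SharedRef<Node> second(first);             // copy: count goes from 1 to 2
  SharedRef<Node> third(std::move(second));  // move: count stays at 2
  return third->RefCount();
}
```

With the std::shared_ptr style, the same interface could instead wrap a std::shared_ptr<T> and leave the pointee type free of any base class, at the cost of a separately allocated count.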
Unsafe references
C++ makes it impossible to use smart pointers inside unions. For this reason, we need a "safe" way to extract a raw pointer from OwningRef. This can be implemented using a helper class like this:
void extractPtr() {
  OwningRef<T> object(1, 2, 3);
  UnsafeRef<T> thief(object.Steal());
  SharedRef<T> shared(object.Share()); // error: Share() cannot be used after the reference has been stolen
  BorrowedRef<T> borrowed(object.Borrow()); // error: Borrow() cannot be used after the reference has been stolen
  T* guts(thief.Extract());
  thief.Extract(); // error: Extract() cannot be called after the reference has been extracted
}
Extract() is the only supported operation on UnsafeRef. These objects cannot be copied, moved, assigned, etc. The destructor of UnsafeRef can delete the object if it has not been extracted yet.
UnsafeRef objects can only be allocated on the stack.
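A possible shape for UnsafeRef, following the rules above, is sketched here. It assumes that Steal() would hand over a raw pointer (the sketch takes one directly in its constructor), uses a runtime assertion where the proposal wants a compile-time error, and introduces a hypothetical Tracked type purely to observe whether the destructor ran.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of UnsafeRef<T>.
template <typename T>
class UnsafeRef {
public:
  // Assumption: takes the raw pointer that OwningRef::Steal() would release.
  explicit UnsafeRef(T* aPtr) : mPtr(aPtr) {}

  // Not copyable or assignable; deleting the copy operations also
  // suppresses the implicit move operations.
  UnsafeRef(const UnsafeRef&) = delete;
  UnsafeRef& operator=(const UnsafeRef&) = delete;

  // Stack only: plain heap allocation of the wrapper is rejected.
  static void* operator new(std::size_t) = delete;
  static void* operator new[](std::size_t) = delete;

  // Deletes the object unless it has already been extracted.
  ~UnsafeRef() { delete mPtr; }

  // The only supported operation; valid exactly once.
  T* Extract() {
    assert(mPtr && "Extract() called after the reference has been extracted");
    T* result = mPtr;
    mPtr = nullptr;
    return result;
  }

private:
  T* mPtr;
};

// Illustrative type that counts how many instances are alive.
struct Tracked {
  static int sLive;
  Tracked() { ++sLive; }
  ~Tracked() { --sLive; }
};
int Tracked::sLive = 0;

int liveAfterUnextractedScope() {
  { UnsafeRef<Tracked> thief(new Tracked()); }  // never extracted: deleted here
  return Tracked::sLive;
}

int liveAfterExtract() {
  Tracked* guts = nullptr;
  {
    UnsafeRef<Tracked> thief(new Tracked());
    guts = thief.Extract();  // the caller now owns the raw pointer
  }
  int live = Tracked::sLive;
  delete guts;  // manual cleanup is the caller's responsibility from here on
  return live;
}
```

Once Extract() has been called, the raw pointer is outside anything the analysis can track, which is presumably why the operation is confined to this one helper class.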
Disallowed C++ language features
The safety properties that we desire are inherently incompatible with three main classes of C++ features:
- Unchecked references to objects. These are C++ pointers and references to objects (not primitive types). Since the language doesn't offer a way to tag them with ownership information, the usage of them in Safe C++ is completely prohibited.
- Raw arrays of objects, again, since the language doesn't offer a way to tag them with ownership information.
- Direct control over lifetimes. For obvious reasons, giving the programmer direct control over lifetimes goes against the goals of Safe C++. C++ keywords new, new[], delete and delete[] are completely prohibited in Safe C++ code.
Unsafe C++
Not everything can be expressed in Safe C++. We would need to come up with some syntactic sugar to turn off the extra compiler checks in some places. Example:
void ugly() {
  unsafe(); // magic syntax declaring the block as unsafe
  T* object = new T(1, 2, 3);
  delete object;
  DoStuff(object); // game over!
}
Mutability
Making a C++ code base const correct is very difficult. But assuming that works, we should be able to borrow some of Rust's mutability handling mechanisms as well. I have no good ideas here that are baked enough yet.
One part of Rust's semantics that should be possible to implement without relying much on const correctness would be to disallow operator->() on an OwningRef while there are outstanding borrowed refs to it. This would implement something similar to the idea of freezing in Rust, but it remains to be seen how ergonomic that would be (given that outlawing operator-> in that case prevents read-only access to the underlying data as well).
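The freezing idea can be mimicked at runtime to get a feel for its ergonomics. The sketch below is a loose illustration under several assumptions: FreezableOwner, its nested Guard class (a stand-in for a BorrowedRef), IsFrozen(), and the Config type are all hypothetical names, the owner holds its value inline rather than on the heap, and the check is a runtime assertion rather than the compile-time rule discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owner whose operator->() is unavailable while borrowed.
template <typename T>
class FreezableOwner {
public:
  explicit FreezableOwner(T aValue) : mValue(std::move(aValue)) {}

  // RAII stand-in for a BorrowedRef: the owner is frozen while a Guard lives.
  class Guard {
  public:
    explicit Guard(FreezableOwner& aOwner) : mOwner(aOwner) {
      ++mOwner.mLiveBorrows;
    }
    ~Guard() { --mOwner.mLiveBorrows; }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;

    // Read-only access remains possible through the borrow itself.
    const T& Get() const { return mOwner.mValue; }

  private:
    FreezableOwner& mOwner;
  };

  bool IsFrozen() const { return mLiveBorrows > 0; }

  // Rejected while frozen; the proposal would make this a compile-time error.
  T* operator->() {
    assert(!IsFrozen() && "owner is frozen by an outstanding borrow");
    return &mValue;
  }

private:
  T mValue;
  std::size_t mLiveBorrows = 0;
};

// Illustrative payload type.
struct Config {
  int level;
};

bool frozenWhileBorrowed() {
  FreezableOwner<Config> owner(Config{3});
  FreezableOwner<Config>::Guard guard(owner);
  return owner.IsFrozen() && guard.Get().level == 3;
}

int levelAfterBorrowEnds() {
  FreezableOwner<Config> owner(Config{3});
  { FreezableOwner<Config>::Guard guard(owner); }  // borrow ends with the scope
  owner->level = 4;  // allowed again: no live borrows
  return owner->level;
}
```

Even in this toy form the ergonomic cost is visible: while a borrow is alive, the owner cannot be used for reads either, so any read has to go through the borrow.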
TODO
- How to deal with copy constructors?