Basic Learnings
Introduction
This is a collection of non-obvious key knowledge that I found necessary to get comfortable with Rust.
Types
Unit Type
The unit type () is equivalent to TypeScript's void type and undefined value.
Arrays
Array bounds are checked at runtime.
Statements / Expressions
With a semicolon you get a statement, without you get an expression.
Expressions are values that you need to use/assign/return.
Expressions can be put inside their own scopes {...}.
The expression scopes can have statements inside, and only if they end with an expression do they return a value other than ().
If the last element in a function is an expression (no trailing semicolon), its value is automatically returned.
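A minimal sketch of the rules above. The function name add_one is made up for illustration:

```rust
/// Returns the value of its last expression; no `return` keyword is needed.
fn add_one(x: i32) -> i32 {
    x + 1 // no semicolon: this expression is the function's value
}

fn main() {
    // A block is an expression: its value is the final expression inside it.
    let y = {
        let x = 3; // statement
        add_one(x) * 2 // expression, becomes the value of the block
    };

    // A block that ends with a statement evaluates to the unit type `()`.
    let unit: () = {
        let _ignored = add_one(1);
    };

    println!("y = {y}, unit = {unit:?}");
}
```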
Comments
Line comments are the norm: 2 slashes for standard comments (//) and 3 for documentation comments (///). Block comments (/* ... */) also exist but are rarely used. See Publishing a Crate to Crates.io.
Vocabulary
Code branches are also referred to as "arms".
If
Ifs are expressions, so they can be used in variable assignments; just don't put a semicolon after the final expression of each arm/branch:
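A small sketch of an if used as an expression. The function name describe is hypothetical:

```rust
/// Classifies a number, using `if` as an expression.
fn describe(n: i32) -> &'static str {
    // Each arm ends without a semicolon, so the arm's value becomes the value
    // of the whole `if`. All arms must have the same type.
    let label = if n < 0 {
        "negative"
    } else if n == 0 {
        "zero"
    } else {
        "positive"
    };
    label
}

fn main() {
    println!("{}", describe(5));
}
```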
Iterating
You have loop, while, and for.
loop
loop iterates forever unless you break out of it. loop is an expression and returns the argument you pass to break, if you want to:
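A sketch of a loop that produces a value through break. The helper name is made up, and it does not guard against overflow for very large limits:

```rust
/// Finds the first power of two that is greater than or equal to `limit`.
fn first_power_of_two_at_least(limit: u32) -> u32 {
    let mut value = 1;
    // The whole `loop` is an expression; `break value` gives it its result.
    loop {
        if value >= limit {
            break value;
        }
        value *= 2;
    }
}

fn main() {
    println!("{}", first_power_of_two_at_least(100));
}
```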
for ... in ...
The for loops are like the Python ones or the for ... of loops of JavaScript. Example:
while
Use a while
loop if you want to manage counters yourself.
Loop labels
Loops (loop, for, while) can be labeled to disambiguate break and continue statements in nested loops. Loop labels always start with a single quote ':
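A sketch of a labeled break leaving two nested loops at once. The function find_pair and the label name 'outer are illustrative choices:

```rust
/// Returns the first (row, col) pair in 1..=max whose product equals `target`.
fn find_pair(target: u32, max: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for row in 1..=max {
        for col in 1..=max {
            if row * col == target {
                found = Some((row, col));
                break 'outer; // leaves both loops, not just the inner one
            }
        }
    }
    found
}

fn main() {
    println!("{:?}", find_pair(6, 5));
}
```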
String VS string literal
A string literal has a fixed size defined at compile time. A String has a size that is not known at compile time and its contents live on the heap. To create a String from a string literal use:
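The usual conversions, as a minimal sketch. The helper shout is hypothetical:

```rust
/// Builds an owned, growable String from a borrowed string slice.
fn shout(text: &str) -> String {
    let mut owned = String::from(text); // heap-allocated copy of the text
    owned.push('!'); // a String can grow; a literal cannot
    owned
}

fn main() {
    let literal = "hello"; // type: &'static str
    let a = String::from(literal);
    let b = literal.to_string();
    let c: String = literal.into();
    println!("{a} {b} {c} {}", shout(literal));
}
```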
A String is handled as a pointer to heap data (plus a length and a capacity), not as a plain value.
Stack and Heap
The Stack
The stack is for fixed size elements. The stack memory access is last in, first out. Pointers to the heap can be stored in the stack. Writing to the stack is faster because there is no scanning of the heap for a gap of sufficient size; the location is always the top of the stack.
When you pass arguments to a function, those are pushed to the stack, and they are popped off when the function finishes.
The Heap
The Heap allows elements whose size is not known at compile time or whose size changes. Requesting space on the heap returns a pointer to that space. This is called allocating on the heap or allocating. Accessing the heap is slower because it has to follow a pointer.
Ownership
The goal of ownership is to help you manage the stack and the heap. Ownership rules:
Each value in Rust has an owner.
There can only be one owner at a time.
When the owner goes out of scope, the value will be dropped.
Allocating and deallocating
An allocation lives until the end of its owner's scope. Rust calls drop automatically at the end of the scope to free the memory.
Ownership: moving vs copying
Moving (or transferring ownership)
When we assign a heap-backed value (such as a String) to another variable, the initial variable becomes invalid and cannot be accessed. If the type implements the Copy trait, a copy of the contents is created instead and both variables stay valid. For Strings or other heap objects you can .clone() them explicitly. This process of assigning a value to another variable and thus losing access to the initial variable is called moving:
Rust by design will always make you use .clone() when you want to make copies of information stored in the heap, in order to make you aware of expensive operations.
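A sketch of move, clone and Copy side by side. The function name move_then_clone is made up:

```rust
/// Moves a String between bindings, then makes an explicit copy of it.
fn move_then_clone() -> (String, String) {
    let s1 = String::from("hello");
    let s2 = s1; // ownership moves to s2; s1 can no longer be used
    // println!("{s1}"); // would not compile: borrow of moved value
    let s3 = s2.clone(); // explicit deep copy of the heap data
    (s2, s3)
}

fn main() {
    let n1 = 5;
    let n2 = n1; // i32 implements Copy, so n1 stays valid
    let (a, b) = move_then_clone();
    println!("{n1} {n2} {a} {b}");
}
```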
Copy trait
It is reserved for stack-only types for performance reasons. Adding it to a type that implements the Drop trait is a compilation error.
It is implemented mostly for scalar types and tuples of types that implement the Copy trait.
Calling functions
Calling functions either moves or copies the variables to the function. Moving a variable into a function is called transferring ownership.
Function return values are also moved out of the scope of the function and up to the caller.
Borrowing
We can borrow variables many times in a read-only manner, but in write mode (mut) the borrowing is exclusive. This prevents data races.
Create scopes to create multiple non-simultaneous mut references
This is ok because the read-only references are not used again after the mutable reference is created:
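Both situations in one sketch: a scoped mutable borrow, then read-only borrows whose last use comes before a new mutable borrow. The function name borrow_demo is hypothetical:

```rust
/// Shows non-overlapping mutable and read-only borrows of the same String.
fn borrow_demo() -> String {
    let mut s = String::from("hello");

    {
        let first = &mut s; // exclusive borrow, limited to this scope
        first.push('!');
    } // `first` ends here, so `s` can be borrowed again

    let r1 = &s;
    let r2 = &s; // many read-only borrows at once are fine
    let shared_len = r1.len() + r2.len(); // last use of r1 and r2

    let r3 = &mut s; // accepted: r1 and r2 are never used after this point
    r3.push_str(" world");

    assert_eq!(shared_len, 12);
    s
}

fn main() {
    println!("{}", borrow_demo());
}
```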
Slices
Slices are references to parts of other elements (slice of a string or an array):
With slices, we are guaranteed that the source cannot be modified until we are done working with its slices.
Functions that return slices
A function can return a slice of a reference that has been lent to it:
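A minimal sketch along the lines of the classic first-word example. The function name first_word is an illustrative choice:

```rust
/// Returns the part of `text` before the first space, or all of it.
fn first_word(text: &str) -> &str {
    match text.find(' ') {
        Some(end) => &text[..end], // a slice borrowing from `text`
        None => text,
    }
}

fn main() {
    let sentence = String::from("hello world");
    let word = first_word(&sentence);

    // Slices also work on arrays.
    let numbers = [1, 2, 3, 4, 5];
    let middle = &numbers[1..3];

    println!("{word} {middle:?}");
}
```

While word is still in use, the compiler rejects any attempt to mutate sentence.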
If a function borrows several arguments and returns a reference, you need to use lifetimes. A lifetime is a label prefixed with a single quote that is placed between the & and the type. It needs to be declared after the function name between < and >. Lifetimes are a way of saying:
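To illustrate the annotation syntax, a sketch in which the returned reference is tied to both inputs. The name longest and the label 'a are conventional choices, not requirements:

```rust
/// Returns the longer of two string slices.
/// The result is valid only as long as both `x` and `y` are valid.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let a = String::from("apple");
    let b = String::from("fig");
    println!("{}", longest(&a, &b));
}
```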
Return statements
Return statements can short-circuit functions anywhere:
Structs
Types of structs
Tuple structs
Unit-like structs
Useless alone, but traits and enums will add more juice to the recipe.
Traditional structs
Like C structs or TypeScript types.
Struct update syntax
You can take the remaining values from another struct of the same type:
Relevant points:
It has to appear last
It moves ownership to the new struct!
Borrows need lifetimes
Mix & Match
With a generic:
With lifetime related to scope but not to returned value.
Lifetime elision
Obvious cases don't require lifetime annotation. As the compiler evolves, more obvious cases might not need lifetime annotation.
Lifetimes on methods
By default (lifetime elision), references returned by a method get the lifetime of &self.
'static lifetime
The 'static lifetime denotes that the reference can live for the whole duration of the program. String literals are implicitly 'static, because they are hardcoded in the binary.
println! for structs
Structs don't implement the std::fmt::Display trait, so we cannot send them right away to println!. println! can debug structs using the specifiers {:?} and {:#?} for pretty print. Unfortunately, structs also don't implement the Debug trait by default, so that won't work either. But there is a quick trick: we can annotate the structs with #[derive(Debug)] and voilà!
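A sketch with a made-up Rectangle struct:

```rust
// `derive(Debug)` generates the Debug implementation for us.
#[derive(Debug)]
struct Rectangle {
    width: u32,
    height: u32,
}

fn main() {
    let rect = Rectangle { width: 30, height: 50 };
    println!("{rect:?}"); // compact, single line
    println!("{rect:#?}"); // pretty-printed, one field per line
    println!("area: {}", rect.width * rect.height);
}
```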
Output:
dbg!
dbg! allows you to log expressions to stderr. It either moves its argument or accepts a borrow, and returns whatever was passed in. Examples:
outputs:
Methods
You can create methods for structs, enums or trait objects. Create additional impl
blocks. There can be more than one for a given type. Example impl
:
The first parameter must be called self and have type Self (or a reference to it). Self is an alias for the type the impl block belongs to. We can avoid writing the Self type though, by using the shorthands &self, &mut self or self. The latter is a move and it is a rare use case; this technique is usually used when the method transforms self into something else and you want to prevent the caller from using the original instance after the transformation.
Methods and fields can have the same name; the invocation parentheses disclose which one you mean.
Associated functions
impl blocks can contain functions with no self parameter. These are just functions, not methods. Both methods and functions inside impl are called associated functions. The ones without self are invoked as <Type>::<function>. Example:
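A sketch showing an associated function next to the three kinds of self receivers. The Rectangle type and its method names are made up for illustration:

```rust
struct Rectangle {
    width: u32,
    height: u32,
}

impl Rectangle {
    /// Associated function (no `self`): called as `Rectangle::square(3)`.
    fn square(size: u32) -> Self {
        Self { width: size, height: size }
    }

    /// Method borrowing the instance read-only.
    fn area(&self) -> u32 {
        self.width * self.height
    }

    /// Method borrowing the instance mutably.
    fn scale(&mut self, factor: u32) {
        self.width *= factor;
        self.height *= factor;
    }

    /// Method taking ownership: the rectangle cannot be used afterwards.
    fn into_dimensions(self) -> (u32, u32) {
        (self.width, self.height)
    }
}

fn main() {
    let mut rect = Rectangle::square(3);
    rect.scale(2);
    println!("area: {}", rect.area());
    let (w, h) = rect.into_dimensions();
    println!("{w} x {h}");
}
```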
Enums
Pattern matching
EVIL SHADOWING!!!! 🤬
ifs to the rescue!
@ Bindings
The at operator @ lets us create a variable that holds a value at the same time as we’re testing that value for a pattern match:
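A small sketch. The function classify and the chosen ranges are arbitrary:

```rust
/// Tests `id` against ranges and, where needed, keeps the matched value.
fn classify(id: u32) -> String {
    match id {
        // `small` holds the value that matched the range 3..=7.
        small @ 3..=7 => format!("in range: {small}"),
        // Without `@` the range is tested but the value is not bound.
        10..=12 => String::from("in another range"),
        other => format!("other: {other}"),
    }
}

fn main() {
    println!("{}", classify(5));
}
```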
Destructuring
if ... let
Crates, modules and packages
Crate
It is the minimum compilation unit.
There can be many application (binary) crates, but only one library crate.
The root application crate is src/main.rs.
The library crate is src/lib.rs.
Additional application crates are in src/bin/<application crate>.rs.
Module
Modules are declared inside root files (src/main.rs or src/lib.rs) with mod whatever_module;.
The code of the modules can live:
In a block of code placed instead of the semicolon right after mod whatever_module.
In a file named src/whatever_module.rs.
In a file named src/whatever_module/mod.rs. Old style; avoid.
Submodules can be declared inside other modules, following the same 3 patterns above. Examples:
Inside code:
In files:
Example with nesting in one file:
Referencing modules inside the crate
Absolute path (crate == root)
Relative path
use
example:
Relative paths
super::
: Parent module.
Visibility
RULE 1: Everything is private by default
If a module is declared with pub (pub mod whatever_module) then it is visible to the parent of the module where it was defined. The same applies to modules' items.
RULE 2: struct fields are also private by default:
RULE 3: Submodules see everything from their super modules
All submodules are defined within the context of the parent module and they have full access to it.
Modules good practice
A package can contain both a src/main.rs and src/lib.rs. This means that it is an executable that also exposes its logic as a library. In this case all the module tree is defined under src/lib.rs and we import it from src/main.rs by using paths starting with the package name. Example: whatever_package::module::submodule::Item.
use
Use use
to bring a module or item into a scope. Also, aliases:
pub use
use an export... re-export
Not much to say. Expose in an upper level something that is nested.
Importing package
Also:
Collections
Vector
Vectors store elements in memory next to each other. New memory is allocated and elements might be copied to a new location as needed in order to make them be next to each other.
Creation
vec! is a macro to create a populated vector. It allows creating a read-only vec right away.
Otherwise:
Reading vector
Iterating vectors
Use for ... in
because it guarantees immutability of the vector during the iteration.
String
String indexing &s[1]
Not possible. Internally strings are Vec<u8>
... bytes... big trouble with UTF-8... so not possible. Period.
Also, the only way to reach character N in a UTF-8 string is to scan it from the start, which is slow and takes unpredictable time, so it is better to use functions that remind you of that.
String slicing &s[a..b]
This is allowed, but if you slice in the middle of a UTF-8 character, panic and game over... watch out!
Iterating strings
WARNING: In Unicode, a grapheme (what a reader sees as one character) might be composed of more than one char. Those are called grapheme clusters.
HashMap
Not included in the prelude, no macro to simplify its usage and can be iterated with a for in
loop:
HashMaps move
Insert if not present and read:
We don't need to re-insert; we can update the value through a reference, because .or_insert returns a mutable reference &mut V
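A word-counting sketch of that pattern. The function name word_count is hypothetical:

```rust
use std::collections::HashMap;

/// Counts how many times each whitespace-separated word appears.
fn word_count(text: &str) -> HashMap<String, u32> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // Inserts 0 if the key is missing, then hands back `&mut u32`.
        let count = counts.entry(word.to_string()).or_insert(0);
        *count += 1; // update in place, no second insert needed
    }
    counts
}

fn main() {
    for (word, count) in &word_count("hello world wonderful world") {
        println!("{word}: {count}");
    }
}
```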
HashMap's hash function can be customized
The default one, SipHash, is slow but safe against DoS attacks.
Error handling
panic!
macro
Unrecoverable error. Behavior:
Show error message or show backtrace (AKA stacktrace) with the environment variable RUST_BACKTRACE=1 and compiled with debug symbols.
Unwind the stack or quit right away (use the latter when a small binary is top priority). Cargo.toml:
Result<T, E>
match is considered too verbose, so Rustaceans prefer unwrap_or_else.
If you feel like using unwrap, use expect instead, because it gives a meaningful error message.
?
operator in Result
The ? operator is almost equivalent to "unwrap Ok or return Err". The difference is that ? calls the from function of the From trait, so the error received by ? is converted into the error type of the enclosing function. These 2 pieces of code are equivalent:
This is how the from
is created to transform io::Error
into OurError
:
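A sketch of both points. To keep it runnable without touching the file system it converts a ParseIntError instead of an io::Error; that swap, the shape of OurError and the function names are assumptions made for illustration:

```rust
use std::num::ParseIntError;

/// Our own error type; it wraps the library error.
#[derive(Debug)]
enum OurError {
    Parse(ParseIntError),
}

/// This conversion is what `?` calls behind the scenes.
impl From<ParseIntError> for OurError {
    fn from(error: ParseIntError) -> Self {
        OurError::Parse(error)
    }
}

/// Version using `?`.
fn double_with_question_mark(input: &str) -> Result<i32, OurError> {
    let n: i32 = input.trim().parse()?;
    Ok(n * 2)
}

/// Roughly what the compiler does for the version above.
fn double_with_match(input: &str) -> Result<i32, OurError> {
    let n = match input.trim().parse::<i32>() {
        Ok(value) => value,
        Err(error) => return Err(OurError::from(error)),
    };
    Ok(n * 2)
}

fn main() {
    println!("{:?}", double_with_question_mark("21"));
    println!("{:?}", double_with_match("not a number"));
}
```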
And, obviously, it is chainable (but, for peace of mind, try to avoid one-liners):
But, in the future, just use a function that does it all if available, like fs::read_to_string("hello.txt")
for this case.
?
operator in Option
Same but, instead of Err
you get None
.
main
can return a Result
The main function may return any type that implements the std::process::Termination trait, which contains a function report that returns an ExitCode.
When to panic!
or return Result
Examples, Prototype Code, and Tests
Prototyping is more comfortable with unwrap
or expect
. Just don't leave it there.
In tests, panic!
, unwrap
or expect
.
When explaining by example, avoid boilerplate too.
Cases in Which You Have More Information Than the Compiler
When you know it is never going to fail. In this case, use expect to provide documentation on the decision. Here, mentioning the assumption that this IP address is hardcoded will prompt us to change expect to better error handling code if, in the future, we need to get the IP address from some other source instead.
Guidelines for Error Handling (from official docs, I do not fully agree)
panic! if:
You receive values that don't make sense. (User input might not make sense, but that is ok; don't panic! here.)
After a certain point you need a specific state in order to ensure security/safety.
Types are not enough to handle correctness in your code.
An external library returns something that it should not have.
Panicking is a way of stating that a developer messed up and so the developer has to go back to the code and fix something.
When failure is expected, use Result
. For example, an HTTP request might fail, but it is ok.
Use types to avoid runtime checks
The type system is powerful enough (and sometimes more powerful than the human mind) to keep code safe at compile time without the need of runtime checks + panic! calls.
Example where pure types are not enough. This type, Guess
is trusted to contain a value between 1 and 100, so you can use it blindly if you need numbers between 1 and 100, and you only need to pay attention when instantiating it from user input (or any other side effect).
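A sketch of such a type, assuming the panicking constructor used in the official book; the exact message text is made up:

```rust
/// A number guaranteed to be between 1 and 100.
pub struct Guess {
    value: i32, // private: the only way in is through `Guess::new`
}

impl Guess {
    /// The single place where the range is checked.
    pub fn new(value: i32) -> Guess {
        if !(1..=100).contains(&value) {
            panic!("Guess value must be between 1 and 100, got {value}.");
        }
        Guess { value }
    }

    /// Read-only access; callers can rely on the range without re-checking.
    pub fn value(&self) -> i32 {
        self.value
    }
}

fn main() {
    let guess = Guess::new(42);
    println!("{}", guess.value());
}
```

A constructor that returns a Result instead of panicking would fit better when the value comes straight from user input.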
Generics
In Rust, generics are resolved at compile time; no runtime overhead.
Methods can be implemented for one specific concrete type of a generic. For example, the following Point implementation only has distance_from_origin for the type f32:
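A sketch of that Point. The generic accessor x is an extra added for contrast:

```rust
struct Point<T> {
    x: T,
    y: T,
}

/// Available for every `Point<T>`.
impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

/// Available only when `T` is `f32`.
impl Point<f32> {
    fn distance_from_origin(&self) -> f32 {
        (self.x.powi(2) + self.y.powi(2)).sqrt()
    }
}

fn main() {
    let integers = Point { x: 1, y: 2 };
    let floats = Point { x: 3.0_f32, y: 4.0 };
    println!("{} {}", integers.x(), integers.y);
    println!("{}", floats.distance_from_origin());
    // integers.distance_from_origin(); // would not compile: no such method for Point<i32>
}
```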
Traits
A trait defines functionality a particular type has and can share with other types. They are similar to OOP interfaces; they are a collection of functions that need to be implemented to satisfy the given trait. This allows to narrow down generic types.
In order to use the functions of a trait we have to bring the trait into scope:
Trait restriction
In order to implement a trait for a type, either the trait or the type must be local to the crate. This ensures that no 2 different crates implement the same trait for the same type, so there is no need for disambiguation.
Default trait implementation
Traits can have a default implementation that can be overridden:
Then we don't implement the method to keep default behavior:
NOTE: Once overridden, there is no way to access the default one.
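A sketch of a default method that is kept by one type and overridden by another. The trait and type names are made up:

```rust
trait Summary {
    /// Required: every implementor must provide this.
    fn author(&self) -> String;

    /// Default implementation; it can call other trait methods.
    fn summarize(&self) -> String {
        format!("(Read more from {}...)", self.author())
    }
}

struct Tweet {
    username: String,
}

struct Article {
    title: String,
    writer: String,
}

/// Keeps the default `summarize`.
impl Summary for Tweet {
    fn author(&self) -> String {
        format!("@{}", self.username)
    }
}

/// Overrides `summarize`.
impl Summary for Article {
    fn author(&self) -> String {
        self.writer.clone()
    }

    fn summarize(&self) -> String {
        format!("{}, by {}", self.title, self.author())
    }
}

fn main() {
    let tweet = Tweet { username: String::from("rust") };
    let article = Article { title: String::from("Traits"), writer: String::from("Ann") };
    println!("{}", tweet.summarize());
    println!("{}", article.summarize());
}
```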
Traits as parameters
Syntactic sugar follows:
Many traits with +
Alternate syntax
Returning traits
Watch out! Even if your function returns a "generic", only one type can be returned. Error ahead:
Implement traits on generics
Blanket implementations: traits on traits
For example, the standard library implements the ToString
trait for any type that implements the Display
trait:
Implementors is the heading under which blanket implementations are listed in the documentation.
Closures
Closures are functions that capture their enclosing scope. Syntax is |<args>| code.
Closures usually don't need type annotations for parameters or return type. Example of annotated closure:
These are equivalent:
The types of a closure are inferred only once:
Capturing References or Moving Ownership
Closures can capture values from their environment in three ways, which directly map to the three ways a function can take a parameter: borrowing immutably, borrowing mutably, and taking ownership. The closure will decide which of these to use based on what the body of the function does with the captured values.
Capturing immutable reference
Capturing mutable reference
move
: giving ownership to the closure (useful to pass data to threads)
Moving Captured Values Out of Closures and the Fn Traits
Closures will automatically implement one, two, or all three of these Fn traits, in an additive fashion, depending on how the closure’s body handles the values:
FnOnce applies to closures that can be called once. All closures implement at least this trait, because all closures can be called. A closure that moves captured values out of its body will only implement FnOnce and none of the other Fn traits, because it can only be called once.
FnMut applies to closures that don’t move captured values out of their body, but that might mutate the captured values. These closures can be called more than once.
Fn applies to closures that don’t move captured values out of their body and that don’t mutate captured values, as well as closures that capture nothing from their environment.
Note: Functions can implement all three of the Fn
traits too. If what we want to do doesn’t require capturing a value from the environment, we can use the name of a function rather than a closure where we need something that implements one of the Fn
traits. For example, on an Option<Vec<T>>
value, we could call unwrap_or_else(Vec::new)
to get a new, empty vector if the value is None.
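A sketch of plain functions and closures being accepted through the Fn traits. All helper names here are hypothetical:

```rust
/// `Vec::new` is a plain function, yet it satisfies the `FnOnce` bound.
fn or_empty(maybe: Option<Vec<i32>>) -> Vec<i32> {
    maybe.unwrap_or_else(Vec::new)
}

/// Accepts anything callable as `Fn(i32) -> i32`: a closure or a function.
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

fn add_three(x: i32) -> i32 {
    x + 3
}

/// `FnMut` allows the callee to mutate what it captured.
fn call_n_times<F: FnMut()>(mut f: F, n: usize) {
    for _ in 0..n {
        f();
    }
}

fn main() {
    println!("{:?}", or_empty(None));
    println!("{}", apply_twice(add_three, 1)); // named function
    println!("{}", apply_twice(|x| x * 2, 3)); // closure

    let mut calls = 0;
    call_n_times(|| calls += 1, 3); // captures `calls` by mutable reference
    println!("{calls}");
}
```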
BOOM!
Ok
Iterators
Iterators are lazy. For example, calling .map does nothing until something consumes the iterator, such as .next() or .collect().
To transform an Iterator
into a Vec
we can call .collect()
Iterators get "consumed": they can be iterated only once.
Iterators implement the Iterator
trait:
The iter
method of Vec
produces an iterator over immutable references. If we want to create an iterator that takes ownership of v1 and returns owned values, we can call into_iter
instead of iter
. Similarly, if we want to iterate over mutable references, we can call iter_mut
instead of iter
.
Some methods of the iterator, like sum
, consume the iterator.
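A sketch of laziness, collect and a consuming method. The function squares_of_evens is made up:

```rust
/// Keeps the even numbers and squares them.
fn squares_of_evens(values: &[i32]) -> Vec<i32> {
    values
        .iter() // yields &i32
        .filter(|v| **v % 2 == 0) // lazy adaptor
        .map(|v| v * v) // lazy adaptor
        .collect() // consumes the chain and builds the Vec
}

fn main() {
    let v1 = vec![1, 2, 3, 4];

    let lazy = v1.iter().map(|x| x + 1); // nothing has run yet
    let bumped: Vec<i32> = lazy.collect(); // now the closure runs

    let total: i32 = v1.iter().sum(); // `sum` consumes the iterator

    println!("{bumped:?} {total} {:?}", squares_of_evens(&v1));

    // `into_iter` takes ownership of v1 and yields owned values.
    for value in v1.into_iter() {
        println!("{value}");
    }
}
```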
Zero cost abstractions
The Rust compiler is smart:
Rust knows that there are 12 iterations, so it “unrolls” the loop. Unrolling is an optimization that removes the overhead of the loop controlling code and instead generates repetitive code for each iteration of the loop.
All the coefficients get stored in registers, which means accessing the values is very fast. There are no bounds checks on the array access at runtime. All these optimizations that Rust is able to apply make the resulting code extremely efficient.
Smart pointer
Smart pointers are data structures that act like a pointer but also have additional metadata and capabilities.
String and Vec<T> are smart pointers: they own some memory, they allow you to manipulate it, and they also have metadata and extra capabilities or guarantees.
Smart pointers are usually implemented using structs, and they also implement the Deref
and Drop
traits. The Deref
trait allows an instance of the smart pointer struct to behave like a reference. The Drop
trait allows you to customize the code that’s run when an instance of the smart pointer goes out of scope.
Common smart pointers
Box<T> for allocating values on the heap
Rc<T>, a reference counting type that enables multiple ownership
Ref<T> and RefMut<T>, accessed through RefCell<T>, a type that enforces the borrowing rules at runtime instead of compile time
Using Box<T>
to Point to Data on the Heap
Boxes allow you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data. Boxes don’t have performance overhead, other than storing their data on the heap instead of on the stack. Mostly used when:
A dynamically sized type needs to be used where a statically known size is required.
Transfer ownership of large data without copying it.
When you want to own a value that implements a specific trait.
Example of a useless box, because a pointer to an i32 has no advantage over the value itself. When a box is passed, the pointer gets copied anyway:
In other words, it is like a pointer.
This does not compile because the compiler cannot know the size of List.
This compiles because here the size of List is always size of i32
+ size of usize
(pointer size):
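A sketch of the recursive list that compiles thanks to Box. The sum function is an addition for illustration:

```rust
/// A cons list. Without the Box, `List` would contain itself directly
/// and its size could not be computed.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

/// Adds up all the values in the list.
fn sum(list: &List) -> i32 {
    match list {
        // `rest` is a &Box<List>; deref coercion turns it into &List.
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

fn main() {
    let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    println!("{}", sum(&list));
}
```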
Smart Pointers
Pointers/references and de-references
Deref
trait: treating a type like a reference
Rust will call .deref()
automatically when dereferencing a type that implements the Deref
trait, so we can simply do *customTypeValue
.
Deref coercion
We can pass a reference to a type that dereferences to another type to a function that expects a reference to this other type. This is recursive.
DerefMut
Same but with mutable references. Equivalences:
From &T to &U when T: Deref<Target=U>
From &mut T to &mut U when T: DerefMut<Target=U>
From &mut T to &U when T: Deref<Target=U>
A 4th case (from &T to &mut U) is not allowed, because it would break the borrowing rules.
Drop trait (AKA destroy or destructor)
Code to be executed when a smart pointer goes out of scope. Used to close file handles and connections, free RAM, etc.
Dropping a Value Early with std::mem::drop
If you need to drop something before it goes out of scope, call drop(value) (std::mem::drop, which is in the prelude).
Rc<T>
: single thread reference counting
Rc::new: create an Rc.
Rc::clone(&): create another reference to an Rc.
Rc::strong_count
Rc::weak_count
RefCell<T>
: interior mutability pattern
When the Rust compiler fails to accept correct code you can use unsafe. Wrapping that unsafe in a safe API to mutate data behind an immutable reference is the RefCell<T> use case.
When you need to mutate an immutable value, you store the value inside a RefCell::new(xxx) and then you can do .borrow_mut() to get a mutable reference to the value or .borrow() to get an immutable reference to the value. The borrowing rules are then checked at runtime instead of compile time: code that requests 2 mutable references at the same time, or a mutable reference while immutable references exist, still compiles, but it panics at runtime.
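A sketch of interior mutability. The Logger type is made up, and try_borrow_mut is used in place of borrow_mut in the second function so that the failed runtime check can be observed without a panic:

```rust
use std::cell::RefCell;

/// Collects messages through a shared (`&self`) reference.
struct Logger {
    messages: RefCell<Vec<String>>,
}

impl Logger {
    fn new() -> Self {
        Logger { messages: RefCell::new(Vec::new()) }
    }

    /// Mutates the inner Vec even though `self` is borrowed immutably.
    fn log(&self, message: &str) {
        self.messages.borrow_mut().push(message.to_string());
    }

    fn count(&self) -> usize {
        self.messages.borrow().len()
    }
}

/// Returns true when a second simultaneous mutable borrow is refused.
fn second_mutable_borrow_fails() -> bool {
    let cell = RefCell::new(0);
    let _first = cell.borrow_mut(); // still alive on the next line
    // `borrow_mut()` would panic here; `try_borrow_mut()` reports an error instead.
    let refused = cell.try_borrow_mut().is_err();
    refused
}

fn main() {
    let logger = Logger::new();
    logger.log("first");
    logger.log("second");
    println!("{} {}", logger.count(), second_mutable_borrow_fails());
}
```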
Preventing Reference Cycles: Turning an Rc<T>
into a Weak<T>
Rc::downgrade returns a Weak<T>. Weak references are counted in weak_count instead of strong_count, so reference cycles made with them are cleaned up as soon as the strong_count reaches zero. Weak references could be gone, so upgrade() -> Option<Rc<T>> needs to be called in order to check if they still exist. Example
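A minimal sketch without an actual cycle, only to show the counts and the behavior of upgrade. The function name observe_weak is hypothetical:

```rust
use std::rc::{Rc, Weak};

/// Returns (strong count, weak count, upgrade while alive, upgrade after drop).
fn observe_weak() -> (usize, usize, Option<i32>, Option<i32>) {
    let strong = Rc::new(5);
    let weak: Weak<i32> = Rc::downgrade(&strong);

    let strong_count = Rc::strong_count(&strong);
    let weak_count = Rc::weak_count(&strong);

    // While the value is alive, upgrade() gives back an Rc.
    let while_alive = weak.upgrade().map(|rc| *rc);

    drop(strong); // last strong reference gone: the value is freed

    // The weak reference does not keep the value alive.
    let after_drop = weak.upgrade().map(|rc| *rc);

    (strong_count, weak_count, while_alive, after_drop)
}

fn main() {
    println!("{:?}", observe_weak());
}
```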
Threads and concurrency
Threads
Passing messages (moving values)
Sharing state
Arc<T>
and Mutex<T>
. Mutex provides a thread safe lock, Arc is "atomic reference counter", which allows many threads to share the Mutex... watch out for deadlocks! (threads waiting for each other's lock)
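A sketch of a counter shared between threads. The function name parallel_count is made up:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` threads that each add 1 to a shared counter.
fn parallel_count(threads: usize) -> usize {
    let counter = Arc::new(Mutex::new(0_usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter); // one more owner of the Mutex
            thread::spawn(move || {
                let mut guard = counter.lock().unwrap(); // blocks until the lock is free
                *guard += 1;
            }) // the guard is dropped here, releasing the lock
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(8));
}
```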
Send
and Sync
marker traits
NOTE: marker traits are language features, not library traits.
The Send
marker trait indicates that ownership of values of the type implementing Send can be transferred between threads. Almost every Rust type is Send, but not all, like Rc<T>
.
The Sync
marker trait indicates that it is safe for the type implementing Sync to be referenced from multiple threads. In other words, any type T is Sync if &T (an immutable reference to T) is Send, meaning the reference can be sent safely to another thread. Similar to Send, primitive types are Sync, and types composed entirely of types that are Sync are also Sync.
Pattern matching round 2
Refutable patterns
if let PATTERN = EXPRESSION { ... }
while let PATTERN = EXPRESSION { ... }
Example:
Irrefutable patterns
for (x,y,z) in vec loops (for tuples and defined known structures)
let (x,y,z) = val
Function parameters: fn print_coordinates(&(x, y): &(i32, i32)) {
Pattern syntax
Ignoring variables
Use an underscore _ in pattern matching
Prefix the name with an underscore
_x
Use 2 dots .. to ignore remaining parts of value
Match guards
if conditions that apply to an arm in pattern matching:
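A small sketch. The function name parity is arbitrary:

```rust
/// Uses a match guard to split the `Some` case in two.
fn parity(value: Option<i32>) -> &'static str {
    match value {
        Some(x) if x % 2 == 0 => "even", // arm taken only when the guard is true
        Some(_) => "odd",
        None => "nothing",
    }
}

fn main() {
    println!("{}", parity(Some(4)));
}
```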
Unsafe rust
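It enables operations that the compiler cannot verify, covered in the sections below. One of them is dereferencing raw pointers, which, unlike references and smart pointers: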
Are allowed to ignore the borrowing rules by having both immutable and mutable pointers or multiple mutable pointers to the same location
Aren’t guaranteed to point to valid memory
Are allowed to be null
Don’t implement any automatic cleanup
Raw pointers
Raw pointers can be created in safe code, but they cannot be dereferenced unless we create an unsafe block.
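A sketch of that split between creating and dereferencing. The function name is made up:

```rust
/// Adds 1 to `n` through a raw pointer and reads it back through another.
fn increment_via_raw_pointer(mut n: i32) -> i32 {
    // Creating raw pointers is allowed in safe code.
    let writer = &mut n as *mut i32;
    let reader = writer as *const i32;

    // Dereferencing them requires an unsafe block.
    unsafe {
        *writer += 1;
        *reader
    }
}

fn main() {
    println!("{}", increment_via_raw_pointer(5));
}
```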
Unsafe functions
They need to be called within unsafe blocks. The whole body of the functions is considered unsafe. Safe functions can have unsafe blocks!
Unsafe calls are kept to the bare minimum, with an assert!() call to make sure that we do the right thing:
extern
functions
The "C"
application binary interface (ABI) is the most common and follows the C programming language’s ABI
Calling Rust Functions from Other Languages
We add the extern keyword and specify the ABI to use just before the fn keyword for the relevant function. We also need to add a #[no_mangle] annotation to tell the Rust compiler not to mangle the name of this function.
Accessing or Modifying a Mutable Static Variable
static variable = global variable.
Use SCREAMING_SNAKE_CASE.
Static variables can only store references with the 'static lifetime.
A subtle difference between constants and immutable static variables is that values in a static variable have a fixed address in memory. Using the value will always access the same data. Constants, on the other hand, are allowed to duplicate their data whenever they’re used. Another difference is that static variables can be mutable. Accessing and modifying mutable static variables is unsafe.
Implementing an Unsafe Trait
A trait is unsafe when at least one of its methods has some invariant that the compiler can’t verify.
Accessing Fields of a Union
Unions are primarily used to interface with unions in C code. We cannot access their values outside of unsafe blocks.
Read more about unions in The Rust Reference
Advanced traits
Placeholder types
The implementor of a trait will specify the concrete type to be used instead of the placeholder type for the particular implementation.
Default Generic Type Parameters and Operator Overloading
Rust doesn’t allow you to create your own operators or overload arbitrary operators. But you can overload the operations and corresponding traits listed in std::ops by implementing the traits associated with the operator.
This is possible because there is a default type parameter <Rhs=Self>:
This allows us to overload for related types:
You’ll use default type parameters in two main ways:
To extend a type without breaking existing code
To allow customization in specific cases most users won’t need
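A sketch of both uses of Add: the default Rhs=Self and a custom right-hand side type. The Millimeters and Meters newtypes are illustrative:

```rust
use std::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Millimeters(u32);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(u32);

/// Default case: `Rhs` is `Self`, so this is Millimeters + Millimeters.
impl Add for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Millimeters) -> Millimeters {
        Millimeters(self.0 + other.0)
    }
}

/// Custom `Rhs`: Millimeters + Meters.
impl Add<Meters> for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Meters) -> Millimeters {
        Millimeters(self.0 + other.0 * 1000)
    }
}

fn main() {
    println!("{:?}", Millimeters(500) + Millimeters(250));
    println!("{:?}", Millimeters(500) + Meters(2));
}
```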
Trait method disambiguation
This is the generic definition:
With Self
How to:
With no Self
How to:
SuperTraits that require specific traits:
Using the Newtype Pattern to Implement External Traits on External Types
The implementation of Display uses self.0 to access the inner Vec, because Wrapper is a tuple struct and Vec is the item at index 0 in the tuple. Then we can use the functionality of the Display trait on Wrapper.
The downside of using this technique is that Wrapper is a new type, so it doesn’t have the methods of the value it’s holding. We would have to implement all the methods of Vec directly on Wrapper such that the methods delegate to self.0, which would allow us to treat Wrapper exactly like a Vec. If we wanted the new type to have every method the inner type has, implementing the Deref trait (discussed in Chapter 15 in the “Treating Smart Pointers Like Regular References with the Deref Trait” section) on the Wrapper to return the inner type would be a solution. If we don’t want the Wrapper type to have all the methods of the inner type—for example, to restrict the Wrapper type’s behavior—we would have to implement just the methods we do want manually.
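A sketch of the Wrapper described above. The output format with square brackets is an arbitrary choice:

```rust
use std::fmt;

/// Newtype around Vec<String>: a local type, so we may implement Display for it.
struct Wrapper(Vec<String>);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // self.0 is the inner Vec<String>.
        write!(f, "[{}]", self.0.join(", "))
    }
}

fn main() {
    let w = Wrapper(vec![String::from("hello"), String::from("world")]);
    println!("w = {w}");
}
```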
Advanced types
Type Aliases
The Never Type !
The type of expressions that never produce a value. In other words, an expression that diverges: continue, panic!, an endless loop, etc.
Dynamically Sized Types (DST)
https://doc.rust-lang.org/book/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait
Advanced Functions and Closures
Function Pointers
As an example of where you could use either a closure defined inline or a named function, let’s look at a use of the map method provided by the Iterator trait in the standard library. To use the map function to turn a vector of numbers into a vector of strings, we could use a closure, like this:
Or we could name a function as the argument to map instead of the closure, like this:
Or initialize enums:
Returning Closures
Macros
This is a quick overview, see a more thorough explanation here: Rust Macros
Rust has 2 families of macros:
Declarative macro_rules!
They are simpler to write and implement, as they do pattern matching.
Procedural macros
They receive the code and work on it. There are 3 kinds:
Derive macros
Simplify a bit writing macros that automatically implement traits.
Attribute macros
Similar to derive macros, but they receive a parameter
Function like macros
They look like function calls.