Advanced Data Types
Strings
-
Strings appear in three forms:
- String: A smart pointer.
- &String: A reference to a String.
- &str: A string slice.
-
Defining a String
#![allow(unused)]
fn main() {
    let string = String::from("127.0.0.1:8080"); // A String lives on the heap; it can grow and shrink
    let string_literal = "1234"; // A literal is stored in the binary; its size is fixed at compile time
}
-
Slicing a string
#![allow(unused)]
fn main() {
    let string = String::from("127.0.0.1:8080");
    let string_slice = &string[10..14]; // "8080"
    let string_slice = &string[10..]; // Everything from byte 10 onward (bytes, not characters)
    let string_slice = &string[..12]; // Everything up to, but not including, byte 12 (bytes, not characters)
}
-
Rust strings are UTF-8 encoded, so be careful when slicing with integer ranges: the range is interpreted in bytes, not characters.
#![allow(unused)] fn main() { let string = String::from("😀😃😄😁"); let string_slice = &string[..4]; }
This slice does not return 4 emojis. It returns 1 emoji, because each of these emojis takes 4 bytes to store.
string_slice = "😀"
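When the goal is "the first n characters" rather than "the first n bytes", one option is to look up the byte offset of the n-th character first. The sketch below assumes a hypothetical helper named first_n_chars (it is not part of the standard library); it relies only on str::char_indices.

```rust
/// Hypothetical helper: returns the prefix of `s` holding its first `n` characters.
fn first_n_chars(s: &str, n: usize) -> &str {
    // char_indices yields (byte_offset, char); the n-th entry marks where to cut.
    match s.char_indices().nth(n) {
        Some((byte_offset, _)) => &s[..byte_offset],
        None => s, // fewer than n characters: return the whole string
    }
}

fn main() {
    let string = String::from("😀😃😄😁");
    println!("{}", first_n_chars(&string, 1)); // prints 😀
    println!("{}", first_n_chars(&string, 2)); // prints 😀😃
}
```

Because the cut position comes from char_indices, it always falls on a character boundary, so the slice cannot panic.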
-
A String in Rust can grow or shrink dynamically.
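A small sketch of growing and shrinking, assuming a hypothetical helper named grow_and_shrink that reports the length in bytes at each step. It uses the standard String methods push_str and truncate.

```rust
/// Hypothetical helper: appends a port to a host, then removes it again.
/// Returns the length in bytes before growing, after growing, and after shrinking.
fn grow_and_shrink() -> (usize, usize, usize) {
    let mut host = String::from("127.0.0.1");
    let before = host.len();

    host.push_str(":8080"); // grow: append a string slice
    let grown = host.len();

    host.truncate(9); // shrink: keep only the first 9 bytes
    let shrunk = host.len();

    (before, grown, shrunk)
}

fn main() {
    let (before, grown, shrunk) = grow_and_shrink();
    println!("{} -> {} -> {}", before, grown, shrunk); // prints 9 -> 14 -> 9
}
```

Note that truncate, like slicing, works on byte positions and panics if the position is not on a character boundary.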
-
We can borrow an entire string by using this syntax
#![allow(unused)] fn main() { let string = String::from("127.0.0.1:8080"); let string_borrow: &str = &string; }
-
-
This is how let s1 = String::from("hello"); is stored in Rust:
- Left - Parts of the String that are stored on the stack:
- Pointer: Points to the memory that holds the contents of the string.
- Length: How much memory, in bytes, the contents of the String is currently using.
- Capacity: The total amount of memory, in bytes, that the String has received from the allocator.
- Right - The memory on the heap that holds the contents.
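The three stack-side parts can be inspected directly. This is a minimal sketch, assuming a hypothetical helper named string_parts; the methods as_ptr, len and capacity are standard. The printed size of the String value is typically three machine words (24 bytes on 64-bit targets), but that figure is an observation about current implementations, not a documented guarantee.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns the (pointer, length, capacity) parts of a String.
fn string_parts(s: &String) -> (*const u8, usize, usize) {
    (s.as_ptr(), s.len(), s.capacity())
}

fn main() {
    let s1 = String::from("hello");
    let (ptr, len, cap) = string_parts(&s1);

    println!("pointer  = {:p}", ptr); // address of the heap buffer
    println!("length   = {}", len);   // bytes currently in use
    println!("capacity = {}", cap);   // bytes received from the allocator
    println!("size of the String value itself = {} bytes", size_of::<String>());
}
```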
String Slicing
-
Slices let you reference a contiguous sequence of elements in a collection rather than the whole collection.
-
A slice is a kind of reference, so it does not have ownership.
-
String slices have the type &str and are immutable.
#![allow(unused)]
fn main() {
    let s = String::from("hello world");
    let hello = &s[0..5]; // or you can use &s[..5]
    let world = &s[6..11]; // or you can use &s[6..]
}
-
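This will produce a compile-time error:
// FAIL: s cannot be cleared while a reference into it is still in use
// first_word is not defined in these notes; this is a minimal stand-in
fn first_word(s: &str) -> &str {
    s.split(' ').next().unwrap_or("") // Everything before the first space
}

fn main() {
    let mut s = String::from("hello world");
    let word = first_word(&s); // Returns a &str, which is a reference into s
    s.clear(); // error! clear() needs a mutable borrow while word still borrows s
    println!("the first word is: {}", word);
}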
-
Thus, string slices help us write safe code: a string cannot be modified or cleared while a slice still refers to it.
-
Also, string literals such as let string_literal = "hello"; are string slices (&str) and are immutable.
Note: The slicing examples above assume the String contains only ASCII characters. If a range boundary falls in the middle of a multibyte UTF-8 character, the slice operation panics at runtime.
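If the byte positions are not known to be safe, the standard method str::get returns an Option instead of panicking. The sketch below wraps it in a hypothetical helper named safe_prefix and also shows str::is_char_boundary.

```rust
/// Hypothetical helper: like &s[..end], but returns None instead of panicking.
fn safe_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    let s = "😀😃"; // two characters, 4 bytes each

    println!("{:?}", safe_prefix(s, 4)); // prints Some("😀")
    println!("{:?}", safe_prefix(s, 2)); // prints None: byte 2 is inside the first emoji

    println!("{}", s.is_char_boundary(4)); // prints true
    println!("{}", s.is_char_boundary(2)); // prints false
}
```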
-
The correct way to use references is discussed in the section "Which is better &String or &str?".
-
Difference between String, String Literal, String Slice
| Property | String | String Literal | String Slice | Reference to String |
|---|---|---|---|---|
| Definition | let string = String::from("some_string"); | let string_literal = "1234"; | let string_slice = &string[1..3]; | let string_reference = &string; |
| Representation | String | &str | &str | &String |
| Mutable | :white_check_mark: | :x: | :x: | :x: |
| Memory Management | Heap (deallocated when it goes out of scope) | Program binary | Borrows data stored elsewhere (heap or binary) | Borrows a String (contents on the heap) |
| Use Cases | Taking input, or any string manipulation | Defining constant strings | Slicing and borrowing | Borrowing |
Strings and UTF-8 encoding
-
Characters (char) are written in single quotes and take 4 bytes of storage. For example, '😀'.
-
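The 4 bytes refer to a char value in memory; inside a string, the same character is stored as 1 to 4 UTF-8 bytes. A quick sketch, assuming a hypothetical helper named utf8_len built on the standard char::len_utf8:

```rust
use std::mem::size_of;

/// Hypothetical helper: number of bytes `c` occupies when stored in a string.
fn utf8_len(c: char) -> usize {
    c.len_utf8()
}

fn main() {
    // A char value is always 4 bytes wide.
    println!("size_of::<char>() = {}", size_of::<char>());

    // Inside a string, the encoded width depends on the character.
    for c in "aЗन😀".chars() {
        println!("{} needs {} byte(s) in UTF-8", c, utf8_len(c));
    }
}
```

For the four characters above, the encoded widths are 1, 2, 3 and 4 bytes respectively.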
A String is not a collection of characters but a collection of bytes.
-
Rust has only one string type in the core language, which is the string slice str, usually seen in its borrowed form &str.
-
String Slices are the references to some UTF-8 data stored somewhere else.
-
String literals are string slices that refer to data stored in the program's binary.
-
The String type, which is provided by Rust's standard library rather than coded into the core language, is a growable, mutable, owned, UTF-8 encoded string type.
-
When Rustaceans refer to "strings" in Rust, they usually mean either String or &str.
-
Both String and &str are UTF-8 encoded.
-
Creating a new, empty String:
#![allow(unused)]
fn main() {
    let mut s = String::new();
}
-
To create a String from some starting string:
#![allow(unused)]
fn main() {
    let s = "initial contents".to_string(); // to_string() works on any type that implements the Display trait
    let s = String::from("initial contents");
}
-
It is possible to store any properly encoded data:
#![allow(unused)] fn main() { let hello = String::from("नमस्ते"); let hello = String::from("안녕하세요"); let hello = String::from("Здравствуйте"); }
-
Updating the String:
#![allow(unused)]
fn main() {
    let mut s = String::from("foo");
    s.push_str("bar"); // Takes a string slice, so it doesn't take ownership of its argument
    s.push('!'); // Takes a single char as its argument
    // s is now "foobar!"
}
-
Concatenating two strings with the + operator:
#![allow(unused)]
fn main() {
    // '+' calls a method with a signature like: fn add(self, s: &str) -> String
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");
    let s3 = s1 + &s2; // note s1 has been moved here and can no longer be used
}
Note: In Rust, a function parameter of type &str accepts both &String and &str. Rust uses a deref coercion, which here turns &s2 into &s2[..].
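The deref coercion from the note can be seen with any function that takes &str. The sketch below assumes a hypothetical function named byte_len; the three calls pass a &String, a string literal, and an explicit full slice.

```rust
/// Hypothetical function: takes &str, so it accepts both &String and &str.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn main() {
    let owned = String::from("world!");
    let literal = "world!";

    println!("{}", byte_len(&owned));      // &String is coerced to &str
    println!("{}", byte_len(literal));     // already a &str
    println!("{}", byte_len(&owned[..]));  // the explicit form of the coercion
}
```

All three calls print 6, and owned is still usable afterwards because it was only borrowed.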
-
Combining multiple strings or formatting them:
#![allow(unused)]
fn main() {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");
    // Method 1: + moves its left operand, so s1 is cloned here to stay usable for Method 2
    let s = s1.clone() + "-" + &s2 + "-" + &s3;
    // Method 2: format! works like println!() but returns a String and takes no ownership
    let s = format!("{}-{}-{}", s1, s2, s3);
}
-
Indexing into a String is not possible and results in a compile-time error:
#![allow(unused)]
fn main() {
    // FAIL: Strings cannot be indexed in Rust
    let s1 = String::from("hello");
    let h = s1[0]; // Won't work: a String cannot be indexed by an integer
}
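Instead of indexing, the caller has to say whether a character or a byte is wanted. A minimal sketch, assuming two hypothetical helpers named first_char and first_byte that use the standard chars and bytes iterators:

```rust
/// Hypothetical helper: first Unicode scalar value, if any.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

/// Hypothetical helper: first raw byte, if any.
fn first_byte(s: &str) -> Option<u8> {
    s.bytes().next()
}

fn main() {
    let s1 = String::from("hello");
    println!("{:?} {:?}", first_char(&s1), first_byte(&s1)); // prints Some('h') Some(104)

    let s2 = String::from("Здравствуйте");
    println!("{:?} {:?}", first_char(&s2), first_byte(&s2)); // prints Some('З') Some(208)
}
```

The second line shows why s[0] would be ambiguous: the first byte of "Здравствуйте" is 208, which on its own is not a valid character.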
-
How values are stored in a String.
-
A String is just a wrapper over Vec<u8>, which means each element of the vector takes 1 byte of space. Hence, a special character may need more than one element to store its value.
-
Let's consider the following examples:
#![allow(unused)]
fn main() {
    let hello = String::from("Hola"); // Each character takes 1 byte of storage
    let hello = String::from("Здравствуйте"); // Each character takes 2 bytes of storage
}
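The difference between bytes and characters can be checked by comparing len(), which counts bytes, with chars().count(), which counts Unicode scalar values. The helper name byte_and_char_counts below is hypothetical.

```rust
/// Hypothetical helper: returns (number of bytes, number of chars) for `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    for word in ["Hola", "Здравствуйте"].iter() {
        let (bytes, chars) = byte_and_char_counts(word);
        println!("{}: {} bytes, {} chars", word, bytes, chars);
    }
    // Hola: 4 bytes, 4 chars
    // Здравствуйте: 24 bytes, 12 chars
}
```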
-
Let's understand using the Hindi word “नमस्ते”:
-
As Bytes (the way String stores it, using u8, which ranges from 0 to 255):
[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, 224, 165, 135]
-
As Unicode Scalar Values (the way char does):
['न', 'म', 'स', '्', 'त', 'े']
-
As Grapheme Clusters (the way a Hindi speaker might do):
["न", "म", "स्", "ते"]
-
-
-
Slicing Strings:
-
You need to provide the range of bytes to be sliced out of the String. Again, not characters but bytes.
#![allow(unused)]
fn main() {
    let hello = "Здравствуйте"; // Each character here is composed of 2 bytes
    let s = &hello[0..4]; // It'll take the first 4 bytes, `Зд`
    let will_panic = &hello[0..1]; // It'll panic at runtime, as if an invalid index were accessed in a vector: byte 1 is inside a character
}
-
-
Iterating over strings:
-
You can iterate over the Unicode scalar values, that is, what chars would store:
#![allow(unused)]
fn main() {
    for c in "नमस्ते".chars() {
        println!("{}", c);
    }
}
This is what it'll print:
न
म
स
्
त
े
-
You can also iterate over the bytes, the way a String is stored in Vec<u8> format:
#![allow(unused)]
fn main() {
    for b in "नमस्ते".bytes() {
        println!("{}", b);
    }
}
The output will be like:
224
164
// --snip--
165
135
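The two views can be combined with the standard str::char_indices, which yields each character together with the byte offset where it starts. The sketch below assumes a hypothetical helper named char_offsets; the offsets it returns are exactly the positions at which slicing is safe.

```rust
/// Hypothetical helper: pairs every char with the byte offset where it begins.
fn char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

fn main() {
    // Each character of this word takes 3 bytes, so the offsets step by 3.
    for (offset, c) in char_offsets("नमस्ते") {
        println!("byte {:>2}: {}", offset, c);
    }
}
```

For this word the offsets are 0, 3, 6, 9, 12 and 15, which matches the 18 bytes listed earlier.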
-