Frank Casanova

Understanding HashMaps

Arrays vs. HashMaps

Arrays are collections of data stored in sequential order. Each element is assigned a numerical index, which allows for quick access. For example:

If you know the index of a value, you can immediately retrieve it.
Arrays are efficient for accessing elements by index but are limited when you only know the value or want to use a key.

graph LR
    Array["Array"]
    Index0["Index: 0"]
    Index1["Index: 1"]
    Index2["Index: 2"]
    Value0["Value: 10"]
    Value1["Value: 20"]
    Value2["Value: 30"]

    Array -->|points to| Index0
    Array -->|points to| Index1
    Array -->|points to| Index2
    Index0 -->|points to| Value0
    Index1 -->|points to| Value1
    Index2 -->|points to| Value2

HashMaps solve this limitation by allowing data to be stored and accessed using keys instead of indices. For example:

You could use a person’s name as a key to retrieve their phone number.
HashMaps offer flexibility and power compared to arrays.

graph TD
    HashMap["HashMap"]
    Key1["Key: 'Alice'"]
    Key2["Key: 'Bob'"]
    Key3["Key: 'Charlie'"]
    HashFunction["Hash Function"]
    Index0["Index: 0"]
    Index1["Index: 1"]
    Index2["Index: 2"]
    Value1["Value: 123456"]
    Value2["Value: 789012"]
    Value3["Value: 345678"]

    HashMap -->|stores| Key1
    HashMap -->|stores| Key2
    HashMap -->|stores| Key3
    Key1 -->|hashed by| HashFunction
    Key2 -->|hashed by| HashFunction
    Key3 -->|hashed by| HashFunction
    HashFunction -->|maps to| Index0
    HashFunction -->|maps to| Index1
    HashFunction -->|maps to| Index2
    Index0 -->|points to| Value1
    Index1 -->|points to| Value2
    Index2 -->|points to| Value3

How HashMaps Work

At their core, HashMaps are arrays, but they use a hash function to convert keys into indices. Here’s how it works:

A hash function takes a key and returns an integer.
This integer is used as an index in the underlying array.

A simple hash function in Rust might look like this:

fn simple_hash(string: &str) -> u32 {
    let mut total = 0;
    for c in string.chars() {
        total += c as u32;
    }
    total
}

The modulus operator (%) ensures the index falls within the array bounds.

Properties of a Good Hash Function

A "perfect" hash function should:

Be deterministic: Always produce the same output for the same input.
Distribute keys uniformly across the array.
Be fast to compute.
Be resistant to collisions.

However, achieving all these properties simultaneously is often impossible, so trade-offs are made.

Handling Collisions

Collisions occur when two keys hash to the same index. A common solution is to use a linked list at each index. When a collision happens:

The new key-value pair is appended to the linked list.
When searching, the linked list is traversed until the key is found.

graph LR
    Index0["Index: 0"]
    KeyValue1["Key: 'Alice', Value: 123456"]
    KeyValue2["Key: 'Eve', Value: 567890"]
    KeyValue3["Key: 'Grace', Value: 901234"]
    None["None"]

    Index0 -->|points to| KeyValue1
    KeyValue1 -->|next| KeyValue2
    KeyValue2 -->|next| KeyValue3
    KeyValue3 -->|next| None

Implementing the HashMap

Basic Operations

We’ll implement four core operations:

HashMap<T, V>.get(key: T)
HashMap<T, V>.push(key: T, value: V)
HashMap<T, V>.delete(key: T)
HashMap<T, V>.clear()

Step 1: Setting Up the Project

Initialize a new Rust library project:
```
cargo init --lib
```

Replace the contents of src/lib.rs with the following:

const DEFAULT_MAX_SIZE: u64 = 256;

pub struct HashMap<T, V> {
    curr_size: usize,
    arr: [Option<KeyValue<T, V>>; DEFAULT_MAX_SIZE as usize],
}

pub struct KeyValue<T, V> {
    key: T,
    value: V,
    next: Option<Box<KeyValue<T, V>>>,
}

Step 2: Implementing `put`

The put function inserts a key-value pair into the HashMap:

Hash the key to determine the index.
If the index is empty, insert the new key-value pair.
If the index is occupied, handle collisions by appending to the linked list.

impl<T: Clone + Hash + PartialEq, V: Copy> HashMap<T, V> {
    pub fn put(&mut self, key: T, val: V) -> Option<V> {
        let hash_val: u64 = hash_key(key.clone());
        let position = hash_val % DEFAULT_MAX_SIZE;
        match &self.arr[position as usize] {
            Some(_) => self.update_or_link_new_val(key, val, position as usize),
            None => {
                self.insert_new_value(key, val, position as usize);
                None
            }
        }
    }
}

graph LR
    HashMap["HashMap"]
    Key["Key: 'Dave'"]
    HashFunction["Hash Function"]
    Index3["Index: 3"]
    KeyValue["Key: 'Dave', Value: 456789"]
    None["None"]

    HashMap -->|inserts| Key
    Key -->|hashed by| HashFunction
    HashFunction -->|maps to| Index3
    Index3 -->|points to| KeyValue
    KeyValue -->|next| None

Step 3: Implementing `get`

The get function retrieves a value by its key:

Hash the key to find the index.
Traverse the linked list at the index to find the key.

pub fn get(&self, key: T) -> Option<V> {
    let hash_val: u64 = hash_key(key.clone());
    let position = hash_val % DEFAULT_MAX_SIZE;
    match &self.arr[position as usize] {
        Some(_) => self.check_list_for_key(key, position as usize),
        None => None,
    }
}

graph LR
    HashMap["HashMap"]
    Key["Key: 'Bob'"]
    HashFunction["Hash Function"]
    Index1["Index: 1"]
    KeyValue["Key: 'Bob', Value: 789012"]

    HashMap -->|searches for| Key
    Key -->|hashed by| HashFunction
    HashFunction -->|maps to| Index1
    Index1 -->|points to| KeyValue

Step 4: Implementing `remove`

The remove function deletes a key-value pair:

Hash the key to find the index.
Traverse the linked list to find and remove the key.

pub fn remove(&mut self, key: T) -> Option<V> {
    let hash_val: u64 = hash_key(key.clone());
    let position: u64 = hash_val % DEFAULT_MAX_SIZE;
    match &self.arr[position as usize] {
        Some(_) => self.check_item_in_list_and_remove(key, position as usize),
        None => None,
    }
}

graph LR
    HashMap["HashMap"]
    Key["Key: 'Charlie'"]
    HashFunction["Hash Function"]
    Index2["Index: 2"]
    KeyValue["Key: 'Charlie', Value: 345678"]
    None["None"]

    HashMap -->|removes| Key
    Key -->|hashed by| HashFunction
    HashFunction -->|maps to| Index2
    Index2 -->|updates to| None

Step 5: Implementing `clear`

The clear function resets the HashMap to its initial state:

pub fn clear(&mut self) {
    self.curr_size = 0;
    self.arr = [Self::INIT; DEFAULT_MAX_SIZE as usize];
}

graph LR
    HashMap["HashMap"]
    Array["Array"]
    EmptyArray["Empty Array"]

    HashMap -->|clears| Array
    Array -->|becomes| EmptyArray

Conclusion

Congratulations! You’ve implemented a HashMap in Rust from scratch. You now understand:

How HashMaps work under the hood.
How to handle collisions using linked lists.
How to implement core operations like put, get, remove, and clear.