The expedition is finally underway, but the communication system I’ve been given is broken as it can’t pick up the signal from the noise. I need to add a way for the device to identify the communication headers in a stream of seemingly random characters.
Part 1 - Start of packet header
The start of packet header will be four different characters, so I need to detect the first time that happens in the
stream. I can use Itertools::tuple_windows
to get a moving window of four
characters over the stream, add an index, and return the offset of the first one I find where all the characters are
unique.
fn find_start_of_packet(data_stream: &String) -> usize {
let (i, _) =
data_stream
.chars()
.tuple_windows()
.enumerate()
.find(|(_, (a, b, c, d))| is_unique(a, b, c, d))
.unwrap();
i + 4
}
fn is_unique(a: &char, b: &char, c: &char, d: &char) -> bool {
let mut set = BTreeSet::new();
set.insert(a);
set.insert(b);
set.insert(c);
set.insert(d);
set.len() == 4
}
The examples in the puzzle can be used as test cases.
#[cfg(test)]
mod tests {
use crate::day_6::find_start_of_packet;
#[test]
fn can_parse() {
assert_eq!(
find_start_of_packet(&"mjqjpqmgbljsphdztnvjfqwrcgsmlb".to_string()),
7
);
assert_eq!(
find_start_of_packet(&"bvwbjplbgvbhsrlpgdmjqwftvncz".to_string()),
5
);
assert_eq!(
find_start_of_packet(&"nppdvjthqldpwncqszvftbrmjlhg".to_string()),
6
);
assert_eq!(
find_start_of_packet(&"nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg".to_string()),
10
);
assert_eq!(
find_start_of_packet(&"zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw".to_string()),
11
);
}
}
Those are passing, so I can apply it to the puzzle input.
pub fn run() {
let _contents =
fs::read_to_string("res/day-6-input").expect("Failed to read file");
let start_of_packet = find_start_of_packet(&_contents);
println!(
"The start of packet is detected after {} characters",
start_of_packet
);
}
// The start of packet is detected after 1920 characters
Part two - Start of message header
In the elves’ protocol, the start of message header will be the first substring of 14 unique characters. The part 1
moving window as a tuple gets a bit unwieldy at a window size of 14, so I’ll refactor to use Vec::windows
. I can
then take the window size as a parameter as the window is now a slice of arbitrary length. I also update the filter to
use a built-in way of getting the unique characters in an iterator.
fn find_non_repeating_string_of_length(
data_stream: &String,
window_size: usize
) -> usize {
let chars: Vec<char> = data_stream.chars().collect();
let (i, _) =
chars.windows(window_size)
.enumerate()
.find(|(_, window)| window.iter().unique().count() == window_size)
.unwrap();
i + window_size
}
I can add tests to check for the window of 14 characters examples. They also have a lot of boilerplate that makes them harder to read, so I’ll clean that up a bit too.
#[test]
fn can_find_start_of_packet() {
let examples = vec![
("mjqjpqmgbljsphdztnvjfqwrcgsmlb", 7),
("bvwbjplbgvbhsrlpgdmjqwftvncz", 5),
("nppdvjthqldpwncqszvftbrmjlhg", 6),
("nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg", 10),
("zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw", 11),
];
for (data_stream, expected) in examples {
assert_eq!(
find_non_repeating_string_of_length(&data_stream.to_string(), 4),
expected
)
}
}
#[test]
fn can_find_start_of_message() {
let examples = vec![
("mjqjpqmgbljsphdztnvjfqwrcgsmlb", 19),
("bvwbjplbgvbhsrlpgdmjqwftvncz", 23),
("nppdvjthqldpwncqszvftbrmjlhg", 23),
("nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg", 29),
("zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw", 26),
];
for (data_stream, expected) in examples {
assert_eq!(
find_non_repeating_string_of_length(&data_stream.to_string(), 14),
expected
)
}
}
The run method can similarly be updated to run for both window lengths
pub fn run() {
let _contents =
fs::read_to_string("res/day-6-input").expect("Failed to read file");
let start_of_packet = find_non_repeating_string_of_length(&_contents, 4);
println!(
"The start of packet is detected after {} characters",
start_of_packet
);
let start_of_message = find_non_repeating_string_of_length(&_contents, 14);
println!(
"The start of packet is detected after {} characters",
start_of_message
);
}
// The start of packet is detected after 1920 characters
// The start of packet is detected after 2334 characters
Performance enhancement
I’m still not fully happy with my solution. Recalculating the unique characters for each window is quite inefficient. I have the idea of keeping track of the current count for each character in the window, and then for each step I can increment the count for the new character, and decrement the count for the one leaving the window.
I first encode this behaviour in a custom struct + implementation. It needs to:
- Initialise its state with the first
<window_size>
characters. - Allow a character to be added, and another removed in one step as the window is advanced.
- Adding needs to increment the count for that character - adding a count of one if the character is new
- Removing should decrement the count for that character - removing that count if it goes to 0
- Expose the number of unique characters, i.e. its length.
struct Counts {
counts: HashMap<char, usize>,
}
impl Counts {
fn new(init: &str) -> Self {
Self {
counts: init.chars().counts_by(|c| c)
}
}
fn add_and_remove(&mut self, to_add: &char, to_remove: &char) {
if to_add == to_remove {
return;
}
self.add(to_add);
self.remove(to_remove);
}
fn add(&mut self, to_add: &char) {
let new_to_add_count = self.counts.get(to_add).unwrap_or(&0) + 1;
self.counts.insert(*to_add, new_to_add_count);
}
fn remove(&mut self, to_remove: &char) {
let new_to_remove_count = self.counts.get(to_remove).unwrap() - 1;
if new_to_remove_count == 0 {
self.counts.remove(to_remove);
} else {
self.counts.insert(*to_remove, new_to_remove_count);
}
}
fn len(&self) -> usize {
self.counts.len()
}
}
The implementation of find_non_repeating_string_of_length
can then be refactored to use this structure. The
existing tests will verify Counts
works as expected and that the behavior of find_non_repeating_string_of_length
is the same.
fn find_non_repeating_string_of_length(
data_stream: &String,
window_size: usize
) -> usize {
let (init, rest) = data_stream.split_at(window_size);
let mut counts = Counts::new(init);
for (i, (to_add, to_remove))
in rest.chars()
.zip(data_stream.chars())
.enumerate()
{
counts.add_and_remove(&to_add, &to_remove);
if counts.len() == window_size {
return i + window_size + 1;
}
}
unreachable!()
}
The original version was running in 2.5 - 3ms, and the updated one in 1.5-2ms, so it does seem to be an improvement.