Day 4 - Ceres Search

2024-12-04

Seems like these challanges mostly don’t involve searching for this missing historian(does anyone even notice the lore for these?), today we’ve got an elf looking for help with her word search.

Link To The Code On GitHub

Part 1

The task for part 1 is simple: given a grid of letters, find all occurrences of XMAS, forwards, backwards, up, down, and diagonal. I learned from yesterday, and I’ll start with a naive solution:

RUST

fn find_surrounding_mas(input: &[u8], i: usize, line_len: usize) -> u32 {
    // LEFT
    (i>=3 && &input[i-3..i]==b"SAM") as u32+
    // RIGHT
    (i<=input.len()-4 && &input[i+1..i+4]==b"MAS") as u32+
    // UP
    (i>=3*line_len
        && input[i-3*line_len] == b'S'
        && input[i-2*line_len] == b'A'
        && input[i-line_len] == b'M') as u32 +
    // UP+RIGHT
    (i+3>=3*line_len
    && input[i+3-3*line_len] == b'S'
    && input[i+2-2*line_len] == b'A'
    && input[i+1-line_len] == b'M') as u32 +
    // UP+LEFT
    (i>=3*line_len+3
        && input[i-3*line_len-3] == b'S'
        && input[i-2*line_len-2] == b'A'
        && input[i-line_len-1] == b'M') as u32 +
    //DOWN
    (i+3*line_len<input.len()
        && input[i+3*line_len] == b'S'
        && input[i+2*line_len] == b'A'
        && input[i+line_len] == b'M') as u32 +
    //DOWN+RIGHT
    (i+3*line_len+3<input.len()
        && input[i+3*line_len+3] == b'S'
        && input[i+2*line_len+2] == b'A'
        && input[i+line_len+1] == b'M') as u32 +
    // DOWN+LEFT
    (i+3*line_len-3<input.len()
        && input[i+3*line_len-3] == b'S'
        && input[i+2*line_len-2] == b'A'
        && input[i+line_len-1] == b'M') as u32
}

pub fn part1(input: &[u8]) -> u32 {
    let line_len = memchr::memchr(b'\n', input).unwrap() + 1;
    memchr::memchr_iter(b'X', input)
        .map(|i| find_surrounding_mas(input, i, line_len))
        .sum::<u32>()
}

I’m just looking all the Xs in the input(using memchr which I introduced yesterday, could have also used a simple position), and then checking their surroundings.

The only issues I had was bounds checking mistakes, I first tried fixing them with checked input.get(), but the index could underflow anyway, so I didn’t use it.

This solution solves part 1 and now it’s time for part 2.

Part 2

Of course, the instructions given in part 1 were wrong, this is not an XMAS search, it’s an X-MAS search, meaning I need to find X patterns of the word MAS, for example:

M.M
.A.
S.S

The . are other irrelevant letters, of course each MAS can be forwards or backwards.
To me this seems even easier than part 1:

RUST

fn is_x(input: &[u8], i: usize, line_len: usize) -> bool {
    // UPLEFT+DOWNRIGHT
    ((input.get(i - line_len - 1) == Some(&b'M') && input.get(i + line_len + 1) == Some(&b'S'))
        || (input.get(i - line_len - 1) == Some(&b'S') && input.get(i + line_len + 1) == Some(&b'M'))) &&
    // DOWNLEFT+UPRIGHT
    ((input.get(i + line_len - 1) == Some(&b'M') && input.get(i - line_len + 1) == Some(&b'S'))
        || (input.get(i + line_len - 1) == Some(&b'S') && input.get(i - line_len + 1) == Some(&b'M')))
}
pub fn part2(input: &[u8]) -> u32 {
    let line_len = memchr::memchr(b'\n', input).unwrap() + 1;
    memchr::memchr_iter(b'A', input)
        .filter(|&i| is_x(input, i, line_len))
        .count() as u32
}

Find all the As, check their surroundings, and part 2 is done.

Failed Optimization

The initial times:

Day4 - Part1/naive time:   [79.028 µs 79.312 µs 79.645 µs]
Day4 - Part2/naive time:   [57.325 µs 57.440 µs 57.577 µs]

The only optimization I can think of is using memchr::memmem to replace the right and left checks:

RUST

let forwards = find_iter(input, "XMAS").count() as u32;
let backwards = find_iter(input, "SAMX").count() as u32;
let line_len = memchr::memchr(b'\n', input).unwrap() + 1;
let other: u32 = memchr::memchr_iter(b'X', input[line_len..input.len() - line_len])
    .map(|i| find_surrounding_mas(input, i+lin, line_len))
    .sum();
forwards + backwards + other

The right and left checks were removed from find_surrounding_mas.

Day4 - Part1/naive  time:   [79.028 µs 79.312 µs 79.645 µs]
Day4 - Part1/memmem time:   [102.43 µs 102.75 µs 103.08 µs]

Turns out its slower…

I also tried replacing all the indexing inside find_surrounding_mas with unsafe get_unchecked but it was also slower (~86us).

Multithreading For The Win?

As a last resort, I tried going multithreaded.
Using a simple rayon iter_bridge gave an 8x slowdown, not good.
Using standard threads and chunking the input properly was not as bad:

RUST

#[aoc(day4, part1, mt)]
pub fn part1_mt(input: &[u8]) -> u32 {
    const THREAD_COUNT: usize = 2usize;
    let line_len = memchr::memchr(b'\n', input).unwrap() + 1;
    let chunk_size = input.len() / THREAD_COUNT;
    thread::scope(|s| {
        let threads: Vec<ScopedJoinHandle<u32>> = (0..THREAD_COUNT)
            .map(|tid| s.spawn(move || part1_chunk(input, line_len, tid, chunk_size)))
            .collect();
        let local_res = memchr::memchr_iter(b'X', &input[THREAD_COUNT * chunk_size..])
            .map(|i| find_surrounding_mas(input, i + THREAD_COUNT * chunk_size, line_len))
            .sum::<u32>();
        local_res + threads.into_iter().map(|t| t.join().unwrap()).sum::<u32>()
    })
}
fn part1_chunk(input: &[u8], line_len: usize, tid: usize, chunk_size: usize) -> u32 {
    memchr::memchr_iter(b'X', &input[tid * chunk_size..(tid + 1) * chunk_size])
        .map(|i| find_surrounding_mas(input, i + tid * chunk_size, line_len))
        .sum::<u32>()
}

But also not great:

threads	time
2	80us
4	85us
6	100us

This is not it either, I’ll stick to the multithreaded solution.

Optimizing Part 2

While I could not improve part 1, I have a couple ideas for part 2: I can search for the A only in the middle lines(not first or last), since in the outer lines there can’t possibly the required X shape around it, and the same for the outer columns.

RUST

pub fn part2_opt(input: &[u8]) -> u32 {
    let line_len = memchr::memchr(b'\n', input).unwrap() + 1;
    // no point searching in the first and last line and column and helps with bounds checking
    memchr::memchr_iter(b'A', &input[line_len..input.len() - line_len])
        .filter(|&i| is_x(input, i + line_len, line_len))
        .count() as u32
}
fn is_x(input: &[u8], i: usize, line_len: usize) -> bool {
    let column = i % line_len;
    if column == 0 || column == line_len - 1 {
        return false;
    }
    // UPLEFT+DOWNRIGHT
    ((input.get(i - line_len - 1) == Some(&b'M') && input.get(i + line_len + 1) == Some(&b'S'))
        || (input.get(i - line_len - 1) == Some(&b'S') && input.get(i + line_len + 1) == Some(&b'M'))) &&
    // DOWNLEFT+UPRIGHT
    ((input.get(i + line_len - 1) == Some(&b'M') && input.get(i - line_len + 1) == Some(&b'S'))
        || (input.get(i + line_len - 1) == Some(&b'S') && input.get(i - line_len + 1) == Some(&b'M')))
}

But it’s slower..

Day4 - Part2/naive      time:   [57.325 µs 57.440 µs 57.577 µs]
Day4 - Part2/opt        time:   [63.710 µs 63.891 µs 64.066 µs]

What I didn’t do yet, is utilise this knowledge that the X is always in the center area - the get() methods will never go out of bounds, this means that can be replaced by normal indexing, and more than that, since I know it will never go out of bounds and the compiler doesn’t, I can replace it with unsafe get_unchecked().

Day4 - Part2/opt        time:   [55.863 µs 55.969 µs 56.093 µs]

A little better.
One final attempt:
Instead of checking both the M and the S in both directions forwards and backwards, it is possible to check if the sum matches M+S in both diagonals:

RUST

// UPLEFT+DOWNRIGHT
(input.get_unchecked(i - line_len - 1)+input.get_unchecked(i + line_len + 1) == (b'S'+b'M')) &&
// DOWNLEFT+UPRIGHT
(input.get_unchecked(i + line_len - 1) + input.get_unchecked(i - line_len + 1) == (b'S'+b'M'))

Day4 - Part2/opt        time:   [51.502 µs 52.007 µs 52.690 µs]

And even faster than before.

End of Day 4

I guess this is it for the day, I could not improve part 1 at all, but at least I improved part 2 a little.

Up: Advent Of Code 2024

Previous: Day 3 - Mull It Over Next: Day 5 - Print Queue