Tim Giroux

CS 102 pt 0: Finding Substring Concatenations

This is not real course work, and I'm not taking any CS 102 class. I thought it might be worthwhile to post notes on basic CS concepts once in a while. Like most devs I already know this sort of stuff by heart, but I do LeetCode problems sometimes to stay sharp in different programming languages. I did this simple problem today and decided to write this explainer for a general audience.

Assuming you know what these words mean: string, substring, word, map, index, and pop.

Problem Statement

Given a string and a list of equal length words, find all substrings that are a combination of all words.

Algorithm

A naive approach might check a lot of potential combinations. A more standard approach would use comparison between maps.

Naive Approach

String: barfoothefoobarman

Words: ["foo", "bar"]

One starting index at a time, scan to the right while comparing with candidate available words from the list. If a word is found, eliminate it and keep scanning. If all words are found, we found an overall match at the starting index, otherwise we found nothing.

index 0

[bar] foothefoobarman ✓ match → pop "bar"

remaining words: {"foo"}; continue at next 3-letter section

bar [foo] thefoobarman ✓ match → pop "foo"

all words used → match at index 0

index 1

b [arf] oothefoobarman ✗ not in word list → advance

index 2

ba [rfo] othefoobarman ✗ not in word list → advance

index 3

bar [foo] thefoobarman ✓ match → pop "foo"

remaining words: {"bar"}; continue

barfoo [the] foobarman ✗ not in word list → fail at this start

...continue.

Standard Approach

This approach is much faster to compute because there are less steps.

String: barfoothefoobarman

Words: ["foo", "bar"]

Word length L: 3

First, count how many times each word appears in the word list and put it in a map:

{"foo": 1, "bar": 1}

We split the string into words of length L:

bar foo the foo bar man

We slide a window of size 2 (number of words) from left to right, checking each window to see if it has the same word frequency as our word list:

[bar foo] the foo bar man | {"bar": 1, "foo": 1} == {"foo": 1, "bar": 1}Match at index 0

bar [foo the] foo bar man | {"foo": 1, "the": 1} != {"foo": 1, "bar": 1} ✗ No match

bar foo [the foo] bar man | {"the": 1, "foo": 1} != {"foo": 1, "bar": 1} ✗ No match

bar foo the [foo bar] man | {"foo": 1, "bar": 1} == {"foo": 1, "bar": 1}Match at index 9

bar foo the foo [bar man] | {"bar": 1, "man": 1} != {"foo": 1, "bar": 1} ✗ No match

We also have to check at other starting points like so:

b arf oot hef oob arm an

and

ba rfo oth efo oba rma n

Checking windows on each of these.

Vocabulary

Note that these are dumbed down and contextual to this post.