Search⌘ K
AI Features

Keyword Search

Explore how to implement keyword search using trie data structures to efficiently detect whether words from one list exist as complete words or prefixes in another list. Understand the insertion and search algorithms with clear examples and complexity analysis to optimize your solutions for coding interviews and real-world applications.

Problem statement

Given two lists of words (strings), for each word in the second list, identify if the word appears as either a complete word or a prefix of any word in the first list. If it's present as a complete word, print "word". If it's present as a prefix, print "prefix". If it isn't present as either a complete word or a prefix, print "not-found".

Example 1

Sample input

list1: ["app", "apple", "apply", "ape"]
list2: ["app", "appl", "apex"]

Sample output

["word", "prefix", "not-found"]

Explanation

"app" is present as a complete word in the list1.
"appl" is a "prefix" of the word "apple", which is present in list1.
"apex" is not present in list1, neither as a prefix nor as a complete word.

Example 2

Sample input

list1: ["pay", "paid", "pad", "pat"]
list2: ["pa", "pai", "pat"]

Sample output

["prefix", "prefix", "word"]

Explanation

"pa" is a "prefix" of the words "pay", "paid", "pad" and "pat", present in list1.
"pai" is a "prefix" of the word "paid" present in list1.
"pat" is present as a complete word in list1.

Try it yourself

Try to solve the following problem yourself before reading the solution.

C++
Java
#include <iostream>
#include <vector>
using namespace std;
vector<string> findIsWordOrIsPrefix(vector<string>& list1, vector<string>& list2){
// your code goes here
vector<string> answer;
return answer;
}
Solve keyword search in C++

Intuition

For every word of list2, this problem requires us to check two conditions. First, check if the word is present in the list1. If not, then determine if the word is present as a prefix of any string of list1.

It's visible that the problem is related to strings and prefix search. Although we can adopt the brute-force approach of comparing each word of the list2 with each word of the list1 to solve the problem, it would lead to higher time complexity. Using tries instead is an efficient data structure for prefix search-related problems. Hence, we'll use a trie data structure to solve this problem.

Algorithm

Step 1: Insertion in a trie

Insert all the words from list1 into a trie. Set the value of isEndOfWord as true for the trie node associated with the last character of the word.

Step 2: Searching in a trie

Iterate over the words of list2 to check if they're present in the trie. While searching for a word in the trie, we can encounter the following three conditions:

  • Found as a complete word: If we reach the last character of the word and isEndOfWord is set as true for the trie node, it means we have found a complete word.

  • Found as a prefix: If we reach the last character of the word and isEndOfWord is false for the trie node, it implies the current word is a prefix for some word of list1.

  • Not found: If we find that a letter of the provided word is not found in the trie before the entire length of the provided word was iterated, it indicates that the current word is not present in the trie and hence does not exist in list1—either as a complete word or as a prefix.

Visualization

The following picture explains the construction of a trie using the keys given in the example below:

It can be observed from the above illustration that while going over the words of list2:

  • For app, we'll encounter isEndOfWord as true for the last letter p, which means that it's present as a complete word in list1.

  • For appl, we'll encounter isEndOfWord as false for the last letter l, which means that it's a prefix of a word in list1.

  • For apex, we'll reach the end of the path a -> p -> e before we encounter an x, and therefore we can say that the word is not found in list1.

Complexity analysis

Variables

  • Number of words in list1 = N1N_1.

  • Number of words in list2 = N2N_2.

  • The average length of given words = WW.

Time complexity

Operation-wise time complexity

Operation

Time Complexity

Insertion of a word in the trie

O(W)O(W)

Insertion of all words of list1 in the trie

O(N1W)O(N_1W)

Searching for a given word of list2 in trie

O(W)O(W)

Searching for all the words of list2 in the trie

O(N2W)O(N_2W)

Explanation

The solution can be divided into two parts—that is, insertion and searching in the trie.

  1. Inserting a single word in the trie takes time equal to the length of the word. In the worst case, no letter of the current word exists in the trie, and we need to create new trie nodes for all letters. This adds a time complexity of O(W)O(W). So the creation of a trie of words takes time proportional to the number of words times the average length of the word, which is O(N×W)O(N \times W).

  2. While searching for a prefix or a word in the trie, we traverse the trie downward node by node until either the word length or the trie path is exhausted, which adds a time complexity of O(W).O(W).

Considering the fact that the lengths of list1 and list2 are almost similar, the time complexity becomes O(N1W+N2W)O(NW).O(N_1W + N_2W) \approx O(NW).

Space complexity

Operation-wise space complexity

Operation

Space Complexity

Insertion of a word in the trie

O(W)O(W)

Insertion of all words of list1 in the trie

O(N1W)O(N_1W)

Explanation

In the worst case, all the letters of all the words can be different. This implies that there will be no common prefixes, and therefore new trie nodes will be created for all the letters of all the words. Inserting a string of length WW in the trie adds a space complexity of O(W)O(W). We insert all the NN strings of list1 in the trie. Hence, the overall space complexity becomes O(NW)O(NW).

While searching for a prefix in the trie, we traverse the trie downward until either the word length or the trie path is exhausted. No new trie nodes are created in this process, so it doesn't add space complexity.