Keyword Search
Explore how to implement keyword search using trie data structures to efficiently detect whether words from one list exist as complete words or prefixes in another list. Understand the insertion and search algorithms with clear examples and complexity analysis to optimize your solutions for coding interviews and real-world applications.
Problem statement
Given two lists of words (strings), for each word in the second list, identify if the word appears as either a complete word or a prefix of any word in the first list. If it's present as a complete word, print "word". If it's present as a prefix, print "prefix". If it isn't present as either a complete word or a prefix, print "not-found".
Example 1
Sample input
list1: ["app", "apple", "apply", "ape"]list2: ["app", "appl", "apex"]
Sample output
["word", "prefix", "not-found"]
Explanation
"app" is present as a complete word in the list1. "appl" is a "prefix" of the word "apple", which is present in list1."apex" is not present in list1, neither as a prefix nor as a complete word.
Example 2
Sample input
list1: ["pay", "paid", "pad", "pat"]list2: ["pa", "pai", "pat"]
Sample output
["prefix", "prefix", "word"]
Explanation
"pa" is a "prefix" of the words "pay", "paid", "pad" and "pat", present in list1."pai" is a "prefix" of the word "paid" present in list1."pat" is present as a complete word in list1.
Try it yourself
Try to solve the following problem yourself before reading the solution.
Intuition
For every word of list2, this problem requires us to check two conditions. First, check if the word is present in the list1. If not, then determine if the word is present as a prefix of any string of list1.
It's visible that the problem is related to strings and prefix search. Although we can adopt the brute-force approach of comparing each word of the list2 with each word of the list1 to solve the problem, it would lead to higher time complexity. Using tries instead is an efficient data structure for prefix search-related problems. Hence, we'll use a trie data structure to solve this problem.
Algorithm
Step 1: Insertion in a trie
Insert all the words from list1 into a trie. Set the value of isEndOfWord as true for the trie node associated with the last character of the word.
Step 2: Searching in a trie
Iterate over the words of list2 to check if they're present in the trie. While searching for a word in the trie, we can encounter the following three conditions:
Found as a complete word: If we reach the last character of the word and
isEndOfWordis set astruefor the trie node, it means we have found a complete word.Found as a prefix: If we reach the last character of the word and
isEndOfWordisfalsefor the trie node, it implies the current word is a prefix for some word oflist1.Not found: If we find that a letter of the provided word is not found in the trie before the entire length of the provided word was iterated, it indicates that the current word is not present in the trie and hence does not exist in
list1—either as a complete word or as a prefix.
Visualization
The following picture explains the construction of a trie using the keys given in the example below:
It can be observed from the above illustration that while going over the words of list2:
For
app, we'll encounterisEndOfWordastruefor the last letterp, which means that it's present as a complete word inlist1.For
appl, we'll encounterisEndOfWordasfalsefor the last letterl, which means that it's a prefix of a word inlist1.For
apex, we'll reach the end of the patha -> p -> ebefore we encounter anx, and therefore we can say that the word is not found inlist1.
Complexity analysis
Variables
Number of words in
list1=. Number of words in
list2=. The average length of given words =
.
Time complexity
Operation-wise time complexity
Operation | Time Complexity |
Insertion of a word in the trie | |
Insertion of all words of | |
Searching for a given word of | |
Searching for all the words of |
Explanation
The solution can be divided into two parts—that is, insertion and searching in the trie.
Inserting a single word in the trie takes time equal to the length of the word. In the worst case, no letter of the current word exists in the trie, and we need to create new trie nodes for all letters. This adds a time complexity of
. So the creation of a trie of words takes time proportional to the number of words times the average length of the word, which is . While searching for a prefix or a word in the trie, we traverse the trie downward node by node until either the word length or the trie path is exhausted, which adds a time complexity of
Considering the fact that the lengths of list1 and list2 are almost similar, the time complexity becomes
Space complexity
Operation-wise space complexity
Operation | Space Complexity |
Insertion of a word in the trie | |
Insertion of all words of |
Explanation
In the worst case, all the letters of all the words can be different. This implies that there will be no common prefixes, and therefore new trie nodes will be created for all the letters of all the words. Inserting a string of length list1 in the trie. Hence, the overall space complexity becomes
While searching for a prefix in the trie, we traverse the trie downward until either the word length or the trie path is exhausted. No new trie nodes are created in this process, so it doesn't add space complexity.