Scaling Search and Indexing

We'll cover the following...

Problems with the proposed design
Solution
Separate the indexing and search
Indexing explained
Summary

Problems with the proposed design

While the design from the previous lesson is functional, it has significant drawbacks regarding resource usage and scalability:

Colocated indexing and searching: Running both operations on the same node causes resource contention. Since both indexing and searching are resource-intensive, they degrade each other’s performance. This design also prevents independent scaling of search and indexing resources based on load.
Index recomputation: Computing the index independently on every replica wastes CPU. Index construction is a heavy pipeline involving hundreds of operations. Recomputing the same index on multiple machines is inefficient.

To address these issues, we need an alternative approach that decouples these operations.

Solution

Instead of recomputing the index on every replica, the system computes the inverted index once on the primary node. The resulting index file is then distributed to the replicas. This approach reduces CPU and memory usage by avoiding redundant ...
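
The build-once, distribute-to-replicas idea can be sketched in a few lines of Python. This is a minimal illustration, not the lesson's actual implementation: the names `build_inverted_index`, `serialize_index`, and `Replica` are hypothetical, the JSON encoding is an assumed stand-in for the index file, and passing bytes in memory stands in for the network transfer to each replica.

```python
import json
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def serialize_index(index):
    """Encode the index as bytes (an assumed stand-in for the index file)."""
    return json.dumps(index, sort_keys=True).encode("utf-8")


class Replica:
    """Hypothetical search replica: loads a prebuilt index, never builds one."""

    def __init__(self):
        self.index = {}

    def load(self, blob):
        # Deserializing is cheap compared with rebuilding from documents.
        self.index = json.loads(blob.decode("utf-8"))

    def search(self, term):
        return self.index.get(term.lower(), [])


docs = {
    1: "Search engines build an inverted index",
    2: "Replicas serve search queries",
}

# Primary node: run the expensive build exactly once.
blob = serialize_index(build_inverted_index(docs))

# Replicas: receive the same bytes and load them without recomputing.
replicas = [Replica() for _ in range(3)]
for replica in replicas:
    replica.load(blob)

print(replicas[0].search("search"))  # [1, 2]
```

In this sketch the build cost is paid once regardless of the number of replicas; each replica only pays for loading the serialized index.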