Scaling Search and Indexing
Problems with the proposed design
While the design from the previous lesson is functional, it has significant drawbacks regarding resource usage and scalability:
Colocated indexing and searching: Running both operations on the same node causes resource contention. Since both indexing and searching are resource-intensive, they degrade each other’s performance. This design also prevents independent scaling of search and indexing resources based on load.
Index recomputation: Index construction is a heavy pipeline involving hundreds of operations, so computing the same index independently on every replica wastes CPU.
To address these issues, we need an alternative approach that decouples these operations.
Solution
Instead of recomputing the index on every replica, the system computes the inverted index once on the primary node. The resulting index file is then distributed to the replicas. This approach reduces CPU and memory usage by avoiding redundant ...
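The build-once, distribute-everywhere idea can be sketched in a few lines of Python. This is a minimal illustration, not the lesson's actual implementation: the names `build_inverted_index`, `serialize_index`, and `Replica` are hypothetical, JSON is an assumed stand-in for the index file format, and an in-memory byte string stands in for the network transfer to each replica.

```python
import json
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Hypothetical tokenizer: lowercase alphanumeric runs only.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def serialize_index(index):
    """Primary node: encode the finished index as bytes for distribution."""
    return json.dumps(index, sort_keys=True).encode("utf-8")


class Replica:
    """Replica node: loads a prebuilt index and never rebuilds it."""

    def __init__(self):
        self.index = {}

    def load(self, blob):
        # Decoding is cheap compared with re-running the indexing pipeline.
        self.index = json.loads(blob.decode("utf-8"))

    def search(self, term):
        return self.index.get(term.lower(), [])


docs = {1: "Search engines build indexes", 2: "Indexes speed up search"}

# The expensive build happens exactly once, on the primary.
blob = serialize_index(build_inverted_index(docs))

# Each replica only loads the bytes (assumed to arrive over the network).
replicas = [Replica() for _ in range(3)]
for replica in replicas:
    replica.load(blob)
```

In this sketch every replica answers queries from an identical index, for example `replicas[0].search("engines")`, while the tokenizing and posting-list construction run only on the primary.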