Solution: H-Index

Let’s solve the H-Index problem using the Sort and Search pattern.

Statement

You are given an array of integers citations, where each element citations[i] represents the number of citations received for the ithi^{th} publication of a researcher.

Your task is to find the researcher’s h-index and return the value of hh.

Note: The h-index is defined as the highest number hh such that the given researcher has published at least hh papers, each of which has been cited at least hh times.

Constraints:

  • n==n == citations.length

  • 1n10001 \leq n \leq 1000

  • 00 \leq citations[i] 1000\leq 1000

Solution

The core idea of this solution is to calculate the h-index of a researcher by using a counting sort-based strategy rather than sorting the citation array directly. Since the h-index is defined as the maximum value hh such that the researcher has at least hh papers with at least hh citations, we can reframe the problem into counting how many papers have a certain number of citations. The observation is that citations above nn (the total number of papers) don’t increase the h-index beyond nn. So, all citation counts are capped at nn, allowing us to operate within a fixed-size frequency array.

The algorithm builds a bucket array to count how many papers have exactly 0,1,...,n0, 1, ..., n citations, where any citation value above nn is counted in the last bucket. Then, starting from the highest possible h-index candidate (nn), the algorithm accumulates the number of papers that have at least kk citations. The first point where this count meets or exceeds kk gives us the correct h-index.

This approach avoids the overhead of full sorting and leverages counting sort principles along with a reverse accumulation search.

Now, let’s look at the solution steps below:

  1. We initialize the following:

    1. n to represent the number of papers (length of the citations list).

    2. A list papers of size n + 1 to count how many papers have a specific number of citations, with papers[i] representing the count of papers cited exactly i times.

    3. All citation values greater than n are capped and grouped into papers[n], since the h-index cannot exceed n.

  2. Next, we iterate through the citations list:

    1. For each citation count, c, we increment papers[min(n, c)].

  3. We also initialize k = n as our candidate h-index, and s = papers[n], the number of papers with at least n citations.

  4. We then iterate through the papers array, while k > s, and search for the h-index by:

    1. We decrement k and add papers[k] to s, computing the total number of papers with at least k citations.

  5. The iteration continues until we find the highest k such that there are at least k papers with k or more citations.

  6. After iterating, the current k is returned as the h-index.

Let’s look at the following illustration to get a better understanding of the solution:

Level up your interview prep. Join Educative to access 70+ hands-on prep courses.