Finding the most frequent words by Shakespeare

Given a text, what are the most frequent words?

Finding the most frequent words in a given text (e.g., Knight_of_the_Burning_Pestle) is easy: we can build a function, toptokens(), which is essentially the topcrimes() function developed in our previous project. Let's watch the following video lecture first:

Video lecture: Finding the most frequent words by Shakespeare (complex)

For example, to get the most frequent words in the play Romeo and Juliet, we can run the following:

Shell
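# toptokens FILE COLUMN: print the 20 most frequent tokens for one work
toptokens() {
  csvcut -c "tokens,$2" "$1" | {
    IFS= read -r header; printf '%s\n' "$header"   # keep the header row for csvlook
    sort -t "," -k2,2nr | head -n 20               # highest counts first
  }
}
toptokens plays_and_poems_stat.csv "Romeo_and_Juliet___play___Shakespeare" | csvlook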
The top 20 frequent words in the work "Romeo and Juliet"
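
To see what the pipeline does step by step, here is a minimal sketch that runs on a tiny made-up table and does not need csvkit. The function name toptokens_awk, the sample file, and the column names Work_A and Work_B are hypothetical, and the sketch assumes a simple CSV with no quoted commas inside fields. It is an illustration of the same idea (pick one column by name, sort by count, keep the top N), not the course's own implementation.

```shell
# Hypothetical toy data: one row per token, one column per work.
sample=$(mktemp)
cat > "$sample" <<'EOF'
tokens,Work_A,Work_B
the,12,7
and,9,11
thou,4,0
love,15,2
EOF

# toptokens_awk FILE COLUMN [N]: csvkit-free sketch of toptokens().
# Assumption: fields contain no quoted commas, so splitting on "," is safe.
toptokens_awk() {
  awk -F',' -v col="$2" '
    # Header row: find the index of the requested column, then skip the row.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
    # Data rows: print the token and its count, if the column was found.
    c       { print $1 "," $c }
  ' "$1" | sort -t ',' -k2,2nr | head -n "${3:-20}"
}

toptokens_awk "$sample" "Work_A" 3
```

With the toy data above, the last command prints love,15, then the,12, then and,9. The optional third argument sets how many rows to keep and defaults to 20, matching the original function.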

Given an author, what are the ...