Overview: Core Operations with spaCy
Explore the essential operations in spaCy, including creating language processing pipelines, tokenization, and lemmatization. Understand spaCy's key classes and data structures to build a solid foundation for using spaCy on NLP tasks in real-world projects.
We will learn the core operations in spaCy, such as creating a language pipeline, tokenizing text, and splitting text into sentences.
First, we'll learn what a language processing pipeline is and what its components are. We'll continue with general spaCy conventions (the important classes and how they are organized) to build a solid understanding of how the library itself is structured.
We'll then learn about the first pipeline component, the Tokenizer. We'll also learn about an important linguistic concept, lemmatization, along with its applications in natural language understanding (NLU). Following that, we will cover container classes and spaCy data structures in detail. We will finish the chapter with useful spaCy features that we'll use in everyday NLP development.
We're going to cover the following main topics:
Overview of spaCy conventions
Introducing tokenization
Understanding lemmatization
spaCy container objects
More spaCy features
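As a preview of these operations, here is a minimal sketch that creates a pipeline, tokenizes a text, and splits it into sentences. It assumes spaCy v3 is installed and uses a blank English pipeline with the rule-based `sentencizer` component, so no trained model has to be downloaded. The sample sentence is made up for illustration. Lemmatization is left out here because it generally needs a trained pipeline such as `en_core_web_sm`.

```python
import spacy

# Create a blank English pipeline: it contains only the tokenizer.
nlp = spacy.blank("en")

# Add a rule-based sentence splitter (assumes spaCy v3's string-based add_pipe).
nlp.add_pipe("sentencizer")

# Calling the pipeline on a string returns a Doc container object.
doc = nlp("I went to the store. It was closed.")

# Iterating over a Doc yields Token objects.
tokens = [token.text for token in doc]

# Doc.sents yields one Span per sentence.
sentences = [sent.text for sent in doc.sents]

print(nlp.pipe_names)
print(tokens)
print(sentences)
```

The `Doc`, `Token`, and `Span` objects that appear here are the container classes covered later in the chapter.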