From Pandas to PySpark DataFrame

Solution: Data Transformation

Let's walk through the solution to the data transformation challenge.

We'll cover the following...

Task
Solution
Explanation

Task

Perform summary statistics on the ...