Introduction

Explore how to create effective linear regression and classification models using scikit-learn. Understand the process of hyperparameter tuning and model evaluation through cross-validation to optimize your data analysis results.

We'll cover the following...

A. Creating models for data

A. Creating models for data

The main job of a data scientist is analyzing data and creating models for obtaining results from the data. Oftentimes, data scientists will use simple statistical models for their data, rather than machine learning models like neural networks. This is because data scientists tend to work with smaller datasets than machine learning engineers, so they can quickly extract good results using statistical models.

The scikit-learn library provides many statistical models for linear regression. It also provides a few good models for classifying data, which will be introduced in later chapters.

When creating these models, data scientists need to figure out the optimal hyperparameters to use. Hyperparameters are values that we set when creating a model, e.g. certain constant coefficients used in the model's calculations. We'll talk more about hyperparameter tuning, the process of finding the optimal hyperparameter settings, in later chapters.

1.What you'll learn from this course

2.Data Manipulation with NumPy

3.Data Analysis with pandas

4.Data Preprocessing with scikit-learn

5.Data Modeling with scikit-learn

6.Clustering with scikit-learn

7.Gradient Boosting with XGBoost

8.Deep Learning with TensorFlow

9.Deep Learning with Keras

Introduction

A. Creating models for data