Data Engineering - Fundamentals

Welcome! This course assumes you have a reasonable amount of programming experience. We'll be using Python throughout, but experience in any language should be enough to follow along.

Setup: Installing Python and pip

Before we can start using all the cool tools like Jupyter, NumPy, and pandas, we need to make sure Python and pip are installed. These two are the foundation for everything else. Don't worry, it’s pretty straightforward, and I’ll guide you through the steps.

Step 1: Installing Python

First things first, we need to install Python. Python is the programming language we’ll be using throughout the course.

Windows

1. Head over to the official Python downloads page (https://www.python.org/downloads/).
2. Download the latest version of Python for Windows (the site detects your operating system automatically).
3. Important: during installation, make sure to check the box that says Add Python to PATH. This will save you a lot of headaches later on.
4. Follow the installation instructions, clicking through the prompts until it’s done.
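If you want to confirm that the PATH option took effect, here's a minimal sketch using the standard library's `shutil.which`, which searches the directories on PATH the same way your command prompt does. The helper name `find_on_path` is just an illustration, not part of any tool. You'll need a working Python to run it (for example, by launching the interpreter from the Start menu).

```python
import shutil
from typing import Optional


def find_on_path(command: str) -> Optional[str]:
    """Return the full path of `command` if it can be found on PATH, else None."""
    # shutil.which looks through every directory listed in the PATH
    # environment variable, just like the terminal does when you type a command.
    return shutil.which(command)


if __name__ == "__main__":
    for cmd in ("python", "python3", "pip", "pip3"):
        location = find_on_path(cmd)
        if location:
            print(f"{cmd:8} -> {location}")
        else:
            print(f"{cmd:8} -> not found on PATH")
```

If `python` shows up as "not found on PATH", the checkbox probably wasn't ticked, and re-running the installer is usually the quickest fix.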

macOS

If you’re on a Mac, you may already have a version of Python available, but it’s often an older one, so let’s install a current release.

1. Again, head to the Python website and download the latest version for macOS.
2. Run the installer and follow the instructions.
That’s it – simple!

Linux

Most Linux distributions come with Python pre-installed, but to check, open your terminal and type:

python3 --version

If it shows a version number (e.g., Python 3.8.5), you’re good to go. If not, install it with your distribution’s package manager. On Debian or Ubuntu, that’s:

sudo apt-get update
sudo apt-get install python3

Step 2: Installing pip

pip is Python’s package manager, and it lets us install libraries like NumPy and pandas. It usually comes with Python, but we’ll double-check just to be sure.

Check if pip is already installed

Open your terminal (or command prompt on Windows) and type the following (on macOS and Linux, the command may be pip3 instead):

pip --version

If you see something like this, you’re all set:

pip 21.0.1 from /usr/local/lib/python3.9/site-packages (python 3.9)
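One thing to watch out for: `pip --version` checks whichever pip comes first on your PATH, which isn't always the one that belongs to the Python you just installed. A common way around this is to run pip as a module of a specific interpreter (`python -m pip`). The sketch below does exactly that from inside Python; `pip_version` is a hypothetical helper name.

```python
import subprocess
import sys
from typing import Optional


def pip_version() -> Optional[str]:
    """Return the output of `python -m pip --version`, or None if pip is missing."""
    # sys.executable is the interpreter running this script, so this checks the
    # pip tied to *this* Python rather than whichever pip is first on PATH.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Typically "No module named pip" when pip isn't installed.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = pip_version()
    print(version if version else "pip is not installed for this Python")
```

The printed line has the same shape as the example above: the pip version, where it's installed, and which Python it belongs to.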

Installing pip (if it’s missing)

If pip isn’t installed, here’s how to get it sorted.

Windows/macOS: pip should come with Python. If it’s missing, try reinstalling Python (on Windows, make sure you check the Add Python to PATH box).
Linux (Debian/Ubuntu): Run the following command in your terminal:

sudo apt-get install python3-pip

Step 3: Verifying the Installation

Let’s make sure everything is working properly.

Open your terminal or command prompt and type the following to check your Python installation (on macOS and Linux, you may need to type python3 instead):

python --version

You should see something like this:

Python 3.9.1

If you see something similar, that means Python is installed correctly.
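You can run the same check from inside Python, which is handy if you ever have several versions installed and aren't sure which one a script is using. This is a minimal sketch; the minimum version of 3.8 in `meets_minimum` is our own assumption for illustration, since the course doesn't pin a specific release.

```python
import platform
import sys


def meets_minimum(minimum: tuple = (3, 8)) -> bool:
    """Return True if the running interpreter is at least `minimum`.

    The default of (3, 8) is an assumed minimum, not an official requirement.
    """
    # sys.version_info compares like a tuple, e.g. (3, 9, 1, 'final', 0) >= (3, 8)
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python", platform.python_version())
    print("Interpreter:", sys.executable)
    print("OK" if meets_minimum() else "Please install a newer Python 3 release")
```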

Now check if pip is working:

pip --version

You should get something like:

pip 21.0.1

That’s it!

You’re all set now. Python and pip are installed, and you’re ready to dive into the lessons. If you run into any issues, check the official Python documentation for troubleshooting tips.

Now, whenever we mention running commands like pip install pandas, you’ll know exactly what to do.
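Once you start installing libraries, a quick way to see what's actually there is the standard library's `importlib.metadata` module (available since Python 3.8). Below is a sketch of a hypothetical helper, `installed_versions`, that reports the installed version of each package, or `None` if it's missing. We're assuming the distribution names `numpy`, `pandas`, and `jupyter`, which are the names you'd normally pass to `pip install`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            # version() reads the package's installed metadata without importing it
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Libraries used in this course (distribution names assumed, see above)
    for name, ver in installed_versions(["numpy", "pandas", "jupyter"]).items():
        status = ver if ver else f"not installed (try: pip install {name})"
        print(f"{name:10} {status}")
```

Checking metadata instead of importing each library keeps the check fast, and it won't fail halfway through if one package is broken.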


Contents

Chapter 1: Introduction to Jupyter, NumPy, and pandas

Chapter 2: Data Manipulation with pandas and NumPy

Chapter 3: Aggregating, Grouping, and Merging Data
  Lesson 1: Aggregation and GroupBy
  Lesson 2: Merging and Joining DataFrames
  Lesson 3: Working with Large Datasets

Chapter 4: Data Engineering Pipelines with pandas
  Lesson 1: Loading and Saving Data in Various Formats
  Lesson 2: Data Engineering Workflow with pandas
  Lesson 3: Handling Data Quality Issues

Chapter 5: Advanced Techniques in NumPy and pandas for Data Engineering
  Lesson 1: Vectorization and Broadcasting in NumPy
  Lesson 2: Applying Functions with pandas (apply() and map())
  Lesson 3: Window Functions for Time Series and Rolling Data

Chapter 6: pandas and NumPy in Real-world Data Engineering
  Lesson 1: Data Engineering in Big Data Environments
  Lesson 2: Integrating pandas with Databases

Chapter 7: Final Project and Assessment
  Lesson 1: Capstone Project Setup
  Lesson 2: Capstone Project Implementation
  Lesson 3: Capstone Project Review


Lesson 2: Applying Functions with pandas (`apply()` and `map()`)