Pandas for Beginners: A Practical Introduction

Published in Level Up Coding · 3 min read · Aug 31, 2023


Pandas is one of the most popular and powerful data analysis libraries for Python. It provides easy-to-use data structures and tools for working with structured data. In this post, we'll go through a practical introduction to using Pandas for data analysis.

Importing Pandas:

To start using Pandas, we first need to import it:

import pandas as pd

The convention is to import Pandas using `pd` as the shorthand name.

Creating a Pandas DataFrame:

A Pandas DataFrame is a 2-dimensional labeled data structure whose columns can each hold a different data type (strings, numbers, booleans, etc.). It is similar to a spreadsheet or SQL table.

Let's create a simple DataFrame from a dictionary:

data = {'Name': ['John', 'Mary', 'Peter', 'Jeff', 'Bill'],
        'Age': [28, 32, 47, 19, 55],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Male']}
df = pd.DataFrame(data)
print(df)
    Name  Age  Gender
0   John   28    Male
1   Mary   32  Female
2  Peter   47    Male
3   Jeff   19    Male
4   Bill   55    Male

The dictionary keys become the column names and the values become the data in columns.
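To check how the dictionary was turned into a table, a short sketch like the following can help. It rebuilds the same sample data and inspects the shape, the column names, and the per-column data types.

```python
import pandas as pd

# Same sample data as in the example above
data = {
    'Name': ['John', 'Mary', 'Peter', 'Jeff', 'Bill'],
    'Age': [28, 32, 47, 19, 55],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Male'],
}
df = pd.DataFrame(data)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names, taken from the dictionary keys
print(df.dtypes)         # one data type per column
```

Each list in the dictionary must have the same length; otherwise `pd.DataFrame` raises a `ValueError`.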

Selecting Columns:

We can select a column in Pandas using the column name like a dictionary key:

ages = df['Age']
print(ages)
0    28
1    32
2    47
3    19
4    55
Name: Age, dtype: int64

This returns a Pandas Series containing just the 'Age' column data.
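As a small illustration of the difference between the two return types, the sketch below (using a shortened version of the sample data) selects one column with a single label and two columns with a list of labels. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter'],
    'Age': [28, 32, 47],
})

ages = df['Age']              # a single label returns a Series
subset = df[['Name', 'Age']]  # a list of labels returns a DataFrame

print(type(ages).__name__)
print(type(subset).__name__)
```

Passing a list, even a list with one column name, always gives back a DataFrame.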

Selecting Rows:

We can select rows by integer location or boolean indexing. Let’s get the first 3 rows:

print(df.iloc[:3])
    Name  Age  Gender
0   John   28    Male
1   Mary   32  Female
2  Peter   47    Male

And rows where 'Age' is greater than 30:

print(df[df['Age'] > 30])
    Name  Age  Gender
1   Mary   32  Female
2  Peter   47    Male
4   Bill   55    Male
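The two row-selection styles can also be combined with column selection. The following is a minimal sketch on the same sample data; the variable names are illustrative, and the parentheses around each condition are required when joining conditions with `&`.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Jeff', 'Bill'],
    'Age': [28, 32, 47, 19, 55],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Male'],
})

# Integer-location slicing: rows at positions 0, 1 and 2
first_three = df.iloc[:3]

# Boolean indexing with two conditions joined by & (element-wise "and")
older_men = df[(df['Age'] > 30) & (df['Gender'] == 'Male')]

# .loc takes a row filter and a column label together
names_over_30 = df.loc[df['Age'] > 30, 'Name']

print(first_three)
print(older_men)
print(names_over_30)
```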

Loading Data from CSV:

We can easily load data into a DataFrame from a CSV file using `read_csv()`:

df = pd.read_csv('data.csv')

This will load the 'data.csv' file into a Pandas DataFrame.

`read_csv()` also accepts many optional arguments, such as `parse_dates` for parsing date columns and `na_values` for marking which entries count as missing.
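Here is a hedged sketch of two such options, `parse_dates` and `na_values`. To keep it self-contained, it reads from an in-memory string instead of a file on disk; the CSV content, the `Joined` column, and the `unknown` marker are all made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real 'data.csv' file
csv_text = """Name,Age,Joined
John,28,2021-03-15
Mary,unknown,2020-11-02
Peter,47,2019-07-30
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['Joined'],  # convert this column to datetime values
    na_values=['unknown'],   # treat this marker as a missing value
)

print(df.dtypes)
print(df['Age'].isna().sum())  # number of missing ages
```

Because one age is missing, Pandas stores the `Age` column as floats rather than integers.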

Basic Data Cleaning:

Pandas makes it easy to get rid of missing data and tidy up messy data:

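# Drop rows with missing values
df.dropna()

# Fill missing values (0 is just an example fill value)
df.fillna(0)

# Change column names
df.rename(columns={'old_name': 'new_name'})

Each of these methods returns a new DataFrame rather than modifying `df` in place, so assign the result back (for example, `df = df.dropna()`) to keep the change.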
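To see these cleaning steps on actual data, here is a minimal sketch using a small, made-up DataFrame with one missing age. Filling with the column mean is just one possible choice, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value
raw = pd.DataFrame({
    'name': ['John', 'Mary', 'Peter'],
    'age': [28, np.nan, 47],
})

dropped = raw.dropna()                           # keep only complete rows
filled = raw.fillna({'age': raw['age'].mean()})  # fill 'age' with its mean
renamed = raw.rename(columns={'name': 'Name', 'age': 'Age'})

print(dropped)
print(filled)
print(renamed.columns.tolist())
```

Note that `raw` itself is unchanged after all three calls, since each one returns a new DataFrame.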

Useful Operations:

Pandas includes a lot of vectorized functions that make data munging fast:

# Calculate sum of Age column
df['Age'].sum()

# Get mean of Age
df['Age'].mean()

# Get max value of Age
df['Age'].max()

# Sort by Age column
df.sort_values('Age')

There are many more functions for aggregations, slicing, transforming, combining, and visualizing data.
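As one example of the aggregation functions mentioned here, the sketch below groups the sample data by `Gender` and computes the mean age per group, then sorts the rows from oldest to youngest. `groupby` is not covered elsewhere in this post, so treat this as a brief preview rather than a full explanation.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Jeff', 'Bill'],
    'Age': [28, 32, 47, 19, 55],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Male'],
})

# Mean age for each gender (returns a Series indexed by Gender)
mean_age = df.groupby('Gender')['Age'].mean()

# Rows sorted from oldest to youngest
oldest_first = df.sort_values('Age', ascending=False)

print(mean_age)
print(oldest_first)
```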


This covers some of the basics of using Pandas for practical data analysis in Python. Key takeaways:

- DataFrame for storing tabular data
- Read/write data from CSV files
- Column selection, row slicing, boolean indexing
- Built-in methods for cleaning, munging and transforming
- Vectorized operations for fast data analysis
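Writing a DataFrame back out to CSV is the counterpart of `read_csv()`. As a rough sketch, the round trip below writes to an in-memory buffer instead of a real file so that it runs anywhere; with a file you would pass a path such as `'output.csv'` (a hypothetical name) to `to_csv()`.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter'],
    'Age': [28, 32, 47],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the row labels
buffer.seek(0)                  # rewind so the buffer can be read again

restored = pd.read_csv(buffer)
print(restored)
```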

Pandas combines ease of use with performance, making it indispensable for data science workflows. Happy Learning!


Part 2: Mastering Pandas: Advanced Techniques for Data Manipulation Excellence


