Pandas for Beginners: A Practical Introduction

Published in Level Up Coding · 3 min read · Aug 31, 2023

Pandas is one of the most popular and powerful data analysis libraries for Python. It provides easy-to-use data structures and tools for working with structured data. In this post, we'll go through a practical introduction to using Pandas for data analysis.

Importing Pandas:

To start using Pandas, we first need to import it:

import pandas as pd

The convention is to import Pandas using `pd` as the shorthand name.

Creating a Pandas DataFrame:

A Pandas DataFrame is a 2-dimensional labeled data structure that can store different data types (strings, numbers, booleans etc.) in columns. It is similar to a spreadsheet or SQL table.

Let's create a simple DataFrame from a dictionary:

data = {'Name': ['John', 'Mary', 'Peter', 'Jeff', 'Bill'],
        'Age': [28, 32, 47, 19, 55],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Male']}

df = pd.DataFrame(data)
print(df)

    Name  Age  Gender
0   John   28    Male
1   Mary   32  Female
2  Peter   47    Male
3   Jeff   19    Male
4   Bill   55    Male

The dictionary keys become the column names and the values become the data in columns.
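To check how the dictionary was interpreted, it can help to look at the DataFrame's shape, column names, and column types. The snippet below is a minimal sketch that rebuilds the same `df` so it runs on its own:

```python
import pandas as pd

data = {
    'Name': ['John', 'Mary', 'Peter', 'Jeff', 'Bill'],
    'Age': [28, 32, 47, 19, 55],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Male'],
}
df = pd.DataFrame(data)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names, in the order of the dictionary keys
print(df.dtypes)         # the data type pandas inferred for each column
```

Each list in the dictionary must have the same length, otherwise `pd.DataFrame` raises an error.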

Selecting Columns:

We can select a column in Pandas using the column name like a dictionary key:

ages = df['Age']
print(ages)

0    28
1    32 
2    47
3    19
4    55
Name: Age, dtype: int64

This returns a Pandas Series containing just the 'Age' column data.
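Selecting several columns works in a similar way, except that we pass a list of names and get a DataFrame back instead of a Series. A small sketch, rebuilding the same `df` so it is self-contained (the variable name `subset` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Jeff', 'Bill'],
    'Age': [28, 32, 47, 19, 55],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Male'],
})

# A single column name gives a Series
ages = df['Age']
print(type(ages).__name__)    # Series

# A list of column names gives a DataFrame with just those columns
subset = df[['Name', 'Age']]
print(type(subset).__name__)  # DataFrame
```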

Selecting Rows:

We can select rows by position, using a slice, or by a boolean condition. Let's get the first 3 rows with a slice:

print(df[0:3])

    Name  Age  Gender
0   John   28    Male
1   Mary   32  Female
2  Peter   47    Male

And rows where 'Age' is greater than 30:

print(df[df['Age'] > 30])

    Name  Age  Gender
1   Mary   32  Female
2  Peter   47    Male
4   Bill   55    Male
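The same selections can also be written with the explicit indexers `iloc` (by position) and `loc` (by label or boolean mask), and conditions can be combined with `&` and `|`. This is a sketch using the same sample data; the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Jeff', 'Bill'],
    'Age': [28, 32, 47, 19, 55],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Male'],
})

# First 3 rows by integer position
first_three = df.iloc[0:3]

# Combine conditions; each condition needs its own parentheses
males_over_30 = df[(df['Age'] > 30) & (df['Gender'] == 'Male')]

# Filter rows and pick a column in one step
names_over_30 = df.loc[df['Age'] > 30, 'Name']

print(first_three)
print(males_over_30)
print(names_over_30.tolist())  # ['Mary', 'Peter', 'Bill']
```

Note that Python's `and`/`or` do not work element-wise on a Series, which is why `&` and `|` are used here.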

Loading Data from CSV:

We can easily load data into a DataFrame from a CSV file using `read_csv()`:

df = pd.read_csv('data.csv')

This will load the 'data.csv' file into a Pandas DataFrame.

There are many additional options like parsing dates and handling missing values that can be specified.
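As a rough illustration of two of those options, the sketch below uses `parse_dates` and `na_values`. To keep it runnable without a file on disk, it reads from an in-memory string through `io.StringIO`; the column names and the `'missing'` marker are made up for this example:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """name,signup_date,score
Ann,2023-01-15,88
Ben,2023-02-03,NA
Cara,2023-02-20,missing
"""

loaded = pd.read_csv(
    io.StringIO(csv_text),        # a file path such as 'data.csv' works here too
    parse_dates=['signup_date'],  # convert this column to datetimes
    na_values=['missing'],        # treat this extra string as a missing value
)

print(loaded.dtypes)
print(loaded['score'].isna().sum())  # 2 ('NA' is recognized by default)
```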

Basic Data Cleaning:

Pandas makes it easy to get rid of missing data and tidy up messy data:

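# Drop rows with missing values (returns a new DataFrame; df itself is unchanged)
cleaned = df.dropna()

# Fill missing values with a replacement of your choice (0 here)
filled = df.fillna(0)

# Change column names
renamed = df.rename(columns={'old_name': 'new_name'})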
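Here is a small worked example of these three methods on a DataFrame that actually contains a missing value. The data and variable names are invented for illustration. Each method returns a new DataFrame and leaves the original untouched:

```python
import pandas as pd

raw = pd.DataFrame({
    'name': ['John', 'Mary', 'Peter'],
    'age': [28, float('nan'), 47],  # Mary's age is missing
})

# Remove every row that has at least one missing value
dropped = raw.dropna()

# Replace missing ages with the mean of the known ages
filled = raw.fillna({'age': raw['age'].mean()})

# Rename a column
renamed = raw.rename(columns={'age': 'Age'})

print(len(dropped))               # 2
print(filled['age'].tolist())     # [28.0, 37.5, 47.0]
print(list(renamed.columns))      # ['name', 'Age']
print(raw['age'].isna().sum())    # 1, the original is unchanged
```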

Useful Operations:

Pandas includes a lot of vectorized functions that make data munging fast:

# Calculate sum of Age column
df['Age'].sum()

# Get mean of Age 
df['Age'].mean()

# Get max value of Age
df['Age'].max()

# Sort by Age column
df.sort_values('Age')

There are many more functions for aggregations, slicing, transforming, combining, and visualizing data.
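As one example of an aggregation, `groupby` splits the rows into groups and computes a statistic per group. The sketch below reuses the sample data from earlier and also shows sorting in descending order:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Jeff', 'Bill'],
    'Age': [28, 32, 47, 19, 55],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Male'],
})

# Mean age per gender
mean_age_by_gender = df.groupby('Gender')['Age'].mean()
print(mean_age_by_gender)

# Sort from oldest to youngest
oldest_first = df.sort_values('Age', ascending=False)
print(oldest_first['Name'].tolist())  # ['Bill', 'Peter', 'Mary', 'John', 'Jeff']
```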

Conclusion:

This covers some of the basics of using Pandas for practical data analysis in Python. Key takeaways:

- DataFrame for storing tabular data
- Reading data from CSV files with `read_csv()`
- Column selection, row slicing, boolean indexing
- Built-in methods for cleaning, munging and transforming
- Vectorized operations for fast data analysis

Pandas combines ease of use with performance, making it indispensable for data science workflows. Happy Learning!

Part 2: Mastering Pandas: Advanced Techniques for Data Manipulation Excellence