Pandas is a powerful and versatile Python library that provides data structures and functions designed to make data analysis fast and easy. It is built on top of NumPy and integrates well with many other data science and machine learning libraries.
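1. Core Components of Pandas:
Series: a one-dimensional labeled array that can hold any data type.
DataFrame: a two-dimensional labeled table whose columns can each have a different type.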
2. Installing Pandas:
pip install pandas
3. Basics:
# Importing Pandas library
import pandas as pd

# Creating a Series
s = pd.Series([1, 2, 3, 4])
print(s)

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)
print(df)
4. Loading Data:
Pandas provides functions to load various types of files, including CSV, Excel, SQL databases, and more.
# Loading a CSV file
df = pd.read_csv('path_to_file.csv')

# Loading an Excel file (requires an Excel engine such as openpyxl to be installed)
df = pd.read_excel('path_to_file.xlsx')
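As a minimal sketch of `pd.read_csv`, the example below reads from an in-memory string instead of a file on disk, so it runs without any external data. The CSV content is made up for illustration and reuses the column names from the Basics section.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,San Francisco
Charlie,35,Los Angeles
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # column types inferred while parsing
```

Printing `dtypes` right after loading is a quick way to confirm that numeric columns were parsed as numbers rather than text.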
5. Data Exploration:
# Display the first 5 rows
df.head()

# Display the last 5 rows
df.tail()

# Describe the dataset (statistics)
df.describe()

# Information about the DataFrame
df.info()
6. Indexing and Selection:
# Selecting a column
df['Name']

# Selecting multiple columns
df[['Name', 'Age']]

# Row selection using loc and iloc
df.loc[0]   # Selects the row whose index label is 0
df.iloc[0]  # Selects the first row by integer position
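With the default index, `loc[0]` and `iloc[0]` happen to return the same row, which hides the difference between them. The sketch below uses a non-default index (the labels 10, 20, 30 are an assumption chosen for illustration) to show label-based versus position-based selection.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],  # illustrative labels, not positions
)

by_label = df.loc[10]     # row whose index label is 10
by_position = df.iloc[0]  # row at integer position 0 (the same row here)

# Label slices include the end label; position slices exclude the end position
label_slice = df.loc[10:20]
position_slice = df.iloc[0:2]

# loc also takes a column label to pick a single cell
cell = df.loc[20, "Age"]
```

Here `df.loc[0]` would raise a `KeyError`, because no row carries the label 0.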
7. Filtering:
# Filter rows based on conditions
adults = df[df['Age'] > 18]
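Filters are often built from more than one condition. The sketch below, which reuses the sample data from the Basics section, shows three common patterns: combining boolean masks, `isin`, and `query`. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "San Francisco", "Los Angeles"],
})

# Combine masks with & (and) or | (or); each condition needs its own parentheses
over_28_not_la = df[(df["Age"] > 28) & (df["City"] != "Los Angeles")]

# Keep rows whose value appears in a list of allowed values
selected_cities = df[df["City"].isin(["New York", "Los Angeles"])]

# The same kind of filter written as a query string
under_30 = df.query("Age < 30")
```

Python's `and` and `or` keywords do not work on boolean Series, which is why the element-wise operators `&` and `|` are used instead.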
8. Modifying DataFrames:
# Adding a new column
df['Salary'] = [50000, 60000, 70000]

# Dropping a column
df.drop('Salary', axis=1, inplace=True)

# Renaming columns
df.rename(columns={'Name': 'First Name'}, inplace=True)
9. Handling Missing Data:
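# Checking for missing values (returns a DataFrame of True/False flags)
df.isnull()

# Dropping rows that contain missing values (returns a new DataFrame; df is unchanged)
df.dropna()

# Filling missing values (returns a new DataFrame; df is unchanged)
df.fillna(value=0)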
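To make these calls concrete, the sketch below builds a small DataFrame with one missing age (made-up data) and applies each of them. Filling with the column mean is only one possible strategy, chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, np.nan, 35],  # Bob's age is missing
})

# Count missing values per column
missing_per_column = df.isnull().sum()

# Drop every row that has at least one missing value
dropped = df.dropna()

# Fill the missing age with the mean of the known ages
filled = df.fillna({"Age": df["Age"].mean()})
```

Neither `dropna` nor `fillna` changes `df` itself, so the result has to be assigned to a variable to be kept.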
10. Grouping and Aggregation:
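# Grouping by a column and averaging the numeric columns
# (numeric_only=True skips text columns such as 'Name', which raise a TypeError in pandas 2.0 and later)
df.groupby('City').mean(numeric_only=True)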
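A single `mean()` is rarely the whole story. The sketch below uses made-up data in which two people share a city, and shows both a single-column aggregation and named aggregation with `agg`. The output column names (`mean_age`, `max_salary`, `people`) are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["New York", "New York", "Los Angeles"],
    "Age": [25, 35, 40],
    "Salary": [50000, 70000, 60000],
})

# Aggregate one column per group
mean_age = df.groupby("City")["Age"].mean()

# Several aggregations at once: output_name=(input_column, function)
summary = df.groupby("City").agg(
    mean_age=("Age", "mean"),
    max_salary=("Salary", "max"),
    people=("Age", "size"),
)

print(summary)
```

Selecting the column before aggregating, as in `mean_age`, also avoids errors caused by non-numeric columns.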
11. Merging, Joining, and Concatenating:
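# Concatenating two DataFrames (df1 and df2 are two existing DataFrames)
result = pd.concat([df1, df2])

# Merging two DataFrames on a shared column
merged_df = pd.merge(df1, df2, on='key_column')

# Joining two DataFrames on their index (pass lsuffix/rsuffix if column names overlap)
joined_df = df1.join(df2, how='inner')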
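The three operations behave quite differently on the same inputs. The sketch below defines two small DataFrames (the contents of `df1` and `df2` are assumptions made for illustration) and applies each operation to them.

```python
import pandas as pd

df1 = pd.DataFrame({
    "key_column": [1, 2, 3],
    "Name": ["Alice", "Bob", "Charlie"],
})
df2 = pd.DataFrame({
    "key_column": [2, 3, 4],
    "City": ["San Francisco", "Los Angeles", "Chicago"],
})

# concat stacks rows; columns missing from one frame are filled with NaN
stacked = pd.concat([df1, df2], ignore_index=True)

# merge matches rows on a column; the default is an inner join (keys 2 and 3)
merged = pd.merge(df1, df2, on="key_column")

# how='left' keeps every row of df1, with NaN where df2 has no match
left = pd.merge(df1, df2, on="key_column", how="left")

# join matches on the index, so the key is moved into the index first
joined = df1.set_index("key_column").join(df2.set_index("key_column"), how="inner")
```

Moving the key into the index before calling `join` also avoids the overlapping-column error that `df1.join(df2)` raises when both frames contain `key_column`.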
12. Saving Data:
# Saving to a CSV file
df.to_csv('output.csv', index=False)

# Saving to an Excel file
df.to_excel('output.xlsx', index=False)
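One way to check that a file was saved as expected is to read it back. The sketch below writes to a temporary directory (an assumption made so that the example leaves no files behind) and reloads the result.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.csv")

    # index=False leaves the row index out of the file
    df.to_csv(path, index=False)

    # Read the file back while the temporary directory still exists
    restored = pd.read_csv(path)

print(restored)
```

Without `index=False`, the index is written as an extra unnamed column, which then shows up as `Unnamed: 0` when the file is read back.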
This is a very basic introduction to the powerful Pandas library. There's a lot more you can do with it, including time series analysis, pivot tables, multi-level indexing, and more.