Indexing and Selection
Most data manipulation in Pandas starts with isolating the rows and columns you need. This chapter shows how to select columns by name, rows by position or by label, and rows that satisfy a condition.
Select Column
To select a single column from a DataFrame and return it as a Series:
import pandas as pd
# Sample DataFrame
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35]}
df = pd.DataFrame(data)
# Selecting a single column
selected_column = df['name']
print(selected_column)
Result:
0      Alice
1        Bob
2    Charlie
Name: name, dtype: object
Select Multiple Columns
To select multiple columns, use a list of column names. The result is a new DataFrame:
# Selecting multiple columns
selected_columns = df[['name', 'age']]
print(selected_columns)
Result:
      name  age
0    Alice   25
1      Bob   30
2  Charlie   35
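The difference between single and double brackets is easy to miss, so here is a minimal sketch of it. It rebuilds the chapter's sample DataFrame; the names `as_series` and `as_frame` are illustrative only.

```python
import pandas as pd

# Same sample data as used throughout this chapter
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35]})

as_series = df['name']    # single brackets with a string: a Series
as_frame = df[['name']]   # a list with one name: a one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```

Passing a list, even a list with one name, keeps the two-dimensional DataFrame shape, which matters when later code expects columns rather than a single Series.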
Select Rows by Position
You can select rows by position with iloc, which is integer-position based. Slices follow Python conventions, so the end position is excluded:
# Selecting rows by position
selected_rows = df.iloc[0:2]
print(selected_rows)
Result:
    name  age
0  Alice   25
1    Bob   30
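Beyond row slices, iloc also accepts a single integer, a negative position, or a row and column pair. The sketch below shows a few of these forms on the same sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35]})

first_row = df.iloc[0]       # single integer: that row as a Series
last_row = df.iloc[-1]       # negative positions count from the end
cell = df.iloc[1, 0]         # row position 1, column position 0
block = df.iloc[0:2, 0:1]    # first two rows, first column (ends excluded)

print(cell)         # Bob
print(block.shape)  # (2, 1)
```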
Select Rows by Label
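To select rows by index label, use loc. Unlike iloc, a loc slice includes both endpoints, so 0:1 returns the rows labeled 0 and 1. With the default integer index, labels happen to coincide with positions: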
# Selecting rows by label
selected_rows_by_label = df.loc[0:1]
print(selected_rows_by_label)
Result:
    name  age
0  Alice   25
1    Bob   30
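With the default integer index, loc and iloc can look interchangeable. A string index makes the difference visible. The following sketch assumes a hypothetical DataFrame `people` indexed by lowercase names, which is not part of the original example.

```python
import pandas as pd

# Hypothetical frame with string labels instead of the default 0, 1, 2
people = pd.DataFrame({'age': [25, 30, 35]},
                      index=['alice', 'bob', 'charlie'])

by_label = people.loc['alice':'bob']   # label slice: both endpoints included
by_position = people.iloc[0:1]         # position slice: end excluded
bob_age = people.loc['bob', 'age']     # one cell by row label and column name

print(len(by_label))     # 2
print(len(by_position))  # 1
print(bob_age)           # 30
```

Label slices include the end label because there is no general notion of "one past" an arbitrary label, so the inclusive form is the only one that can be written unambiguously.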
Conditional Selection
To filter rows by column values, put a boolean condition inside the brackets. Only the rows where the condition is True are kept:
# Conditional selection
condition_selected = df[df['age'] > 30]
print(condition_selected)
Result:
      name  age
2  Charlie   35
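Conditions can also be combined or paired with a column selection. The sketch below shows three common variations on the same sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35]})

# Combine conditions with & (and) or | (or); each condition needs parentheses
over_25_not_bob = df[(df['age'] > 25) & (df['name'] != 'Bob')]

# Filter rows and pick a column in one step with loc
names_over_25 = df.loc[df['age'] > 25, 'name']

# Match against a list of allowed values with isin
alice_or_charlie = df[df['name'].isin(['Alice', 'Charlie'])]

print(over_25_not_bob['name'].tolist())  # ['Charlie']
print(names_over_25.tolist())            # ['Bob', 'Charlie']
```

The parentheses are required because & and | bind more tightly than comparison operators in Python. Python's own `and` and `or` keywords do not work element-wise on a Series.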
These selection patterns (columns by name, rows by position with iloc, rows by label with loc, and rows by condition) form the basis of most data operations you'll perform in Pandas.