Indexing and Selection#

Effective data manipulation in Pandas often involves precise indexing and selection to isolate specific data segments. This chapter demonstrates several methods to select columns and rows in a DataFrame, enabling refined data analysis.

Select Column#

To select a single column from a DataFrame and return it as a Series:

import pandas as pd

# Sample DataFrame
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35]}
df = pd.DataFrame(data)

# Selecting a single column
selected_column = df['name']
print(selected_column)

Result:

0     Alice
1       Bob
2    Charlie
Name: name, dtype: object

Select Multiple Columns#

To select multiple columns, use a list of column names. The result is a new DataFrame:

# Selecting multiple columns
selected_columns = df[['name', 'age']]
print(selected_columns)

Result:

      name  age
0    Alice   25
1      Bob   30
2  Charlie   35

Select Rows by Position#

You can select rows based on their position using iloc, which is primarily integer position based:

# Selecting rows by position
selected_rows = df.iloc[0:2]
print(selected_rows)

Result:

    name  age
0  Alice   25
1    Bob   30

Select Rows by Label#

To select rows by label index, use loc, which uses labels in the index:

# Selecting rows by label
selected_rows_by_label = df.loc[0:1]
print(selected_rows_by_label)

Result:

    name  age
0  Alice   25
1    Bob   30

Conditional Selection#

For conditional selection, use a condition within brackets to filter data based on column values:

# Conditional selection
condition_selected = df[df['age'] > 30]
print(condition_selected)

Result:

      name  age
2  Charlie   35

This selection and indexing functionality in Pandas allows for flexible and efficient data manipulations, forming the basis of many data operations you'll perform.