Advanced File Handling#

Handling files with various configurations and formats is a common necessity in data analysis. Pandas provides extensive capabilities for reading from and writing to different file types with varying delimiters and formats. This chapter will explore reading CSV files with specific delimiters and writing DataFrames to JSON files.

Read CSV with Specific Delimiter#

CSV files can come with different delimiters like commas (,), semicolons (;), or tabs (\t). Pandas allows you to specify the delimiter when reading these files, which is crucial for correctly parsing the data.

Reading CSV with Semicolon Delimiter#

Suppose you have a CSV file filename.csv with the following content:

Name;Age;City
Alice;30;New York
Bob;25;Los Angeles
Charlie;35;Chicago

To read this CSV file into a DataFrame using Pandas, specify the semicolon as the delimiter:

import pandas as pd

# Reading a CSV file with semicolon delimiter
df = pd.read_csv('filename.csv', delimiter = ';')
print(df)

Result:

      Name  Age         City
0    Alice   30     New York
1      Bob   25  Los Angeles
2  Charlie   35      Chicago

Reading CSV with Tab Delimiter#

If the CSV file uses tabs as delimiters, here’s how you might see the file and read it:

File content (filename_tab.csv):

Name    Age City
Alice   30  New York
Bob 25  Los Angeles
Charlie 35  Chicago

To read this file:

# Reading a CSV file with tab delimiter
df_tab = pd.read_csv('filename_tab.csv', delimiter = '\t')
print(df_tab)

Result:

      Name  Age         City
0    Alice   30     New York
1      Bob   25  Los Angeles
2  Charlie   35      Chicago

Writing to JSON#

Writing data to JSON format can be useful for web applications and APIs. Here's how to write a DataFrame to a JSON file:

# DataFrame to write to JSON
df.to_json('filename.json')

Assuming df contains the previous data, the JSON file filename.json would look like this:

{"Name":{"0":"Alice","1":"Bob","2":"Charlie"},"Age":{"0":30,"1":25,"2":35},"City":{"0":"New York","1":"Los Angeles","2":"Chicago"}}

This format is known as 'column-oriented' JSON. Pandas also supports other JSON orientations which can be specified using the orient parameter.

These advanced file handling techniques ensure that you can work with a wide range of file formats and configurations, facilitating data sharing and integration across different systems and applications.