Reading and Writing Data with Pandas
This lesson explains how to read and write data in different formats using the Python Pandas library. Pandas provides simple functions to load data from CSV, Excel, JSON files, and SQL databases, as well as to export data to these formats.
Remember that to use Pandas, it is usually imported with the alias pd:
import pandas as pdReading Files
Pandas provides functions to load data from various formats. All return a DataFrame.
CSV
The read_csv function reads comma-separated value files:
df = pd.read_csv('data.csv')Some useful parameters:
sep: separator character (default,)header: row number with column names (default0)index_col: column to use as indexusecols: list of columns to readdtype: dictionary with data types of columnsna_values: values to interpret as NaN
df = pd.read_csv('data.csv', sep=';', index_col=0, usecols=['name', 'age'])Excel
The read_excel function reads Excel files:
df = pd.read_excel('data.xlsx')Relevant parameters:
sheet_name: name or index of the sheet to read (default is the first)header,index_col,usecols: same as in CSV
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')JSON
The read_json function reads JSON files:
df = pd.read_json('data.json')The orient parameter specifies the JSON structure:
'split': dictionary with keysindex,columns, anddata'records': list of dictionaries (one per row)'index': dictionary of dictionaries indexed by rows'columns': dictionary of dictionaries indexed by columns'values': only the values as a 2D array
df = pd.read_json('data.json', orient='records')SQL
The read_sql function reads data from an SQL database (e.g., sqlite3):
import sqlite3
conn = sqlite3.connect('database.db')
df = pd.read_sql('SELECT * FROM table', conn)
conn.close()You can pass an SQL query or the name of a table. A prior connection to the database is required.
Writing Files
DataFrames can be exported to different formats using specific methods.
Export to CSV
The to_csv method writes a DataFrame in CSV format:
df.to_csv('output.csv')Common parameters:
sep: separator characterindex: whether to include the index (defaultTrue)header: whether to include column names (defaultTrue)columns: list of columns to exportna_rep: string to represent NaN values
df.to_csv('output.csv', sep=';', index=False)Export to Excel
The to_excel method writes a DataFrame in Excel format:
df.to_excel('output.xlsx')Relevant parameters:
sheet_name: sheet name (default'Sheet1')index: whether to include the indexcolumns: list of columns to export
df.to_excel('output.xlsx', sheet_name='Results', index=False)Export to JSON
The to_json method writes a DataFrame in JSON format:
df.to_json('output.json')The orient parameter controls the structure (same as in read_json):
df.to_json('output.json', orient='records', indent=2)The indent parameter adds indentation to improve readability.
General Considerations
All export methods accept the index parameter to control whether the DataFrame index is written (the index is the label identifying each row of the DataFrame and allows direct access). By default, it is True, but it is often convenient to disable it with index=False.
df.to_csv('data.csv', index=False)
df.to_excel('data.xlsx', index=False)
df.to_json('data.json', orient='records')