Using Gspread-Pandas¶
There are two main objects you will interact with in gspread-pandas
: the Client
and the Spread
objects. The goal of these objects is to make it easy to work with a variety of concepts in Google Sheets and Pandas DataFrames. A lot of care has gone into documenting functions in code to make code introspection tools useful (displaying documentation, code completion, etc).
The target audience are Data Analysts and Data Scientists, but this can also be used by Data Engineers or anyone trying to automate workflows with Google Sheets and Pandas.
Client¶
The Client
extends the Client
object from gspread
to add some functionality. I try to contribute back to the upstream project, but some things don’t make it in, and others don’t belong.
The main things that are added by the Client
are:
- Handling credentials and authentication to reduce boilerplate code.
- Store file paths within drive for more working with files in a more intuitive manner (requires passing
load_dirs=True
or callingClient.refresh_directories()
if you’ve already instantiated aClient
) - A variety of functions to query for and work with Spreadsheets in your Google Drive, mainly:
list_spreadsheet_files
list_spreadsheet_files_in_folder
find_folders
find_spreadssheet_files_in_folders
create_folder
move_file
- Monkey patch the request to automatically retry when there is a 100 second quota exhausted error.
You can read more in the docs for the Client object
.
Spread¶
The Spread
object represents an open Google Spreadsheet. A Spread
object has multiple Worksheets, and only one can be open at any one time. Any function you call will act on the currently open Worksheet, unless you pass sheet=<worksheet_name_or_index>
when you call the function, in which case it will first open that Worksheet and then perform the action. A Spread
object internally aso holds an instance of a Client
to do the majority of the work.
The Spread
object does a lot of stuff to make it easier for working with Google Spreadsheets. For example, it can handle merged cells, frozen rows/columns, data filters, multi-level column headers, permissions, and more. Some things can be called individually, others can be passed in as function parameters. It can also work with tuples (for example (1, 1)
) or A1 notation for specifying cells.
Some of the most important properties of a Spread
object are:
spread
: The currently open Spreadsheet (this is agspread
object)sheet
: The currently open Worksheet (this is agspread
object)client
: TheClient
object. This will be automatically created if one is not passed in, but you can also share the sameClient
instance among multipleSpread
objects if you pass it in.sheets
: The list of all available Worksheets_sheet_metadata
: We store metadata about the sheet, which includes stuff like merged cells, frozen columns, and frozen rows. This is a private property, but you can refresh this withrefresh_spread_metadata()
Some of the most useful functions are:
sheet_to_df
: Create a Pandas DataFrame from a Worksheetclear_sheet
: Clear out all values and resize a Worksheetdelete_sheet
: Delete a Worksheetdf_to_sheet
: Create a Worksheet from a Pandas DataFramefreeze
: Freeze a given number of rows and/or columnsadd_filter
: Add a filter to the Worksheet for the given range of datamerge_cells
: Merge cells in a Worksheetunmerge_cells
: Unmerge cells within a rangeadd_permission
: Add a permission to a Spreadsheetadd_permissions
: Add multiple permissions to a Spreadsheetlist_permissions
: Show all current permissions on the Spreadsheetmove
: Move the Spreadsheet to a different location
You can read more in the docs for the Spread object
.