Getting Started¶
author: Diego Fernandez
Links:
Attention
Upgrading from < 2.0
If you are upgrading, the user
is now an optional param that
uses default
as the default. If you’re a single user, you might
want to re-name your credentials to default
so you can stop
specifying it:
mv ~/.config/gspread_pandas/creds{<old_name>,default}
Overview¶
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames. It enables you to easily pull data from Google spreadsheets into DataFrames as well as push data into spreadsheets from DataFrames. It leverages gspread in the backend for most of the heavylifting, but it has a lot of added functionality to handle things specific to working with DataFrames as well as some extra nice to have features.
The target audience are Data Analysts and Data Scientists, but it can also be used by Data Engineers or anyone trying to automate workflows with Google Sheets and Pandas.
Some key goals/features:
- Be easy to use interactively, with good docstrings and auto-completion
- Nicely handle headers and indexes (including multi-level headers and merged cells)
- Run on Jupyter, headless server, and/or scripts
- Allow storing different user credentials or using Service Accounts
- Automatically handle token refreshes
- Enable handling of frozen rows and columns
- Enable filling in all merged cells when pulling data
- Nicely handle large data sets and auto-retries
- Enable creation of filters
- Handle retries when exceeding 100 second user quota
- When pushing DataFrames with MultiIndex columns, allow merging or flattening headers
- Ability to handle Spreadsheet permissions
- Ability to specify
ValueInputOption
andValueRenderOption
for specific columns
Installation / Usage¶
To install use pip:
$ pip install gspread-pandas
Or clone the repo:
$ git clone https://github.com/aiguofer/gspread-pandas.git
$ python setup.py install
Before using, you will need to download Google client credentials for your app.
Client Credentials¶
To allow a script to use Google Drive API we need to authenticate our self towards Google. To do so, we need to create a project, describing the tool and generate credentials. Please use your web browser and go to Google console and :
- Choose Create Project in popup menu on the top.
- A dialog box appears, so give your project a name and click on Create button.
- On the left-side menu click on API Manager.
- A table of available APIs is shown. Switch Drive API and click on Enable API button. Do the same for Sheets API. Other APIs might be switched off, for our purpose.
- On the left-side menu click on Credentials.
- In section OAuth consent screen select your email address and give your product a name. Then click on Save button.
- In section Credentials click on Add credentials and switch OAuth client ID (if you want to use your own account or enable the use of multiple accounts) or Service account key (if you prefer to have a service account interacting with spreadsheets).
- If you select OAuth client ID:
- Select Application type item as Desktop app and give it a name.
- Click on Create button.
- Click on Download JSON icon on the right side of created OAuth client IDs and store the downloaded file on your file system.
- If you select Service account key
- Click on Service account dropdown and select New service account
- Give it a Service account name and ignore the Role dropdown (unless you know you need this for something else, it’s not necessary for working with spreadsheets)
- Note the Service account ID as you might need to give that user permission to interact with your spreadsheets
- Leave Key type as JSON
- Click Create and store the downloaded file on your file system.
- Please be aware, the file contains your private credentials, so take
care of the file in the same way you care of your private SSH key;
Move the downloaded JSON to
~/.config/gspread_pandas/google_secret.json
(or you can configure the directory and file name by directly callinggspread_pandas.conf.get_config
Thanks to similar project df2gspread for this great description of how to get the client credentials.
You can read more about it in the configuration docs including how to change the default behavior.
Example¶
import pandas as pd
from gspread_pandas import Spread, Client
file_name = "http://stats.idre.ucla.edu/stat/data/binary.csv"
df = pd.read_csv(file_name)
# 'Example Spreadsheet' needs to already exist and your user must have access to it
spread = Spread('Example Spreadsheet')
# This will ask to authenticate if you haven't done so before
# Display available worksheets
spread.sheets
# Save DataFrame to worksheet 'New Test Sheet', create it first if it doesn't exist
spread.df_to_sheet(df, index=False, sheet='New Test Sheet', start='A2', replace=True)
spread.update_cells('A1', 'B1', ['Created by:', spread.email])
print(spread)
# <gspread_pandas.client.Spread - User: '<example_user>@gmail.com', Spread: 'Example Spreadsheet', Sheet: 'New Test Sheet'>
# You can now first instanciate a Client separately and query folders and
# instanciate other Spread objects by passing in the Client
client = Client()
# Assumming you have a dir called 'example dir' with sheets in it
available_sheets = client.find_spreadsheet_files_in_folders('example dir')
spreads = []
for sheet in available_sheets.get('example dir', []):
spreads.append(Spread(sheet['id'], client=client))
Troubleshooting¶
EOFError in Rodeo¶
If you’re trying to use gspread_pandas
from within
Rodeo you might get an
EOFError: EOF when reading a line
error when trying to pass in the verification
code. The workaround for this is to first verify your account in a regular shell.
Since you’re just doing this to get your Oauth token, the spreadsheet doesn’t need
to be valid. Just run this in shell:
python -c "from gspread_pandas import Spread; Spread('<user_key>','')"
Then follow the instructions to create and store the OAuth creds.
This action would increase the number of cells in the workbook above the limit of 10000000 cells.¶
IMO, Google sheets is not the right tool for large datasets. However, there’s probably good reaons you might have to use it in such cases. When uploading a large DataFrame, you might run into this error.
By default, Spread.df_to_sheet
will add rows and/or columns needed to accomodate the DataFrame.
Since a new sheet contains a fairly large number of columns, if you’re uploading a DF with lots of
rows you might exceed the max number of cells in a worksheet even if your data does not. In order
to fix this you have 2 options:
- The easiest is to pass
replace=True
, which will first resize the worksheet and clear out all values. - Another option is to first resize to 1x1 using
Spread.sheet.resize(1, 1)
and then dodf_to_sheet
There’s a strange caveat with resizing, so going to 1x1 first is recommended (replace=True
already does this). To read more see this issue
Configuration¶
By default, the configuration will be in $HOME/.config/gspread_pandas
on Nix systems and
%APPDATA%\gspread_pandas
on Windows. Under the default behavior, you must have your Google
client credentials stored in google_secret.json
in that directory. If you’re not using a
Service Account, the user credentials will be stored in a subdirectory called creds
.
App Credentials¶
There’s 2 main types of app credentials: OAuth client and Service Account. In order to act as your own Google user, you will need the OAuth client app credentials. With this type of credentials, each user will need to grant permissions to your app. When they grant permissions, their credentials will be stored as described below.
As a Service Account, the used credentials will be for the service account itself. This means that you’ll be using the service account’s e-mail and Google drive. Additionally, it will only be able to work withSpreadsheets that it has permissions for. Although Service Accounts can be useful for batch processes, you might generally prefer to work as your own user.
User Credentials¶
Once you have your client credentials, you can have multiple user
credentials stored in the same machine. This can be useful when you have
a shared server (for example with a Jupyter notebook server) with
multiple people that may want to use the library. The user
parameter to
Spread
must be the key identifying a user’s credentials, by default it
will store the creds using default
as the key. The first
get_creds
is called for a specific key, you will have to authenticate
through a text based OAuth prompt; this makes it possible to run on a headless
server through ssh or through a Jupyter notebook. After this, the
credentials for that user will be stored in the creds
subdirectory and the
tokens will berefreshed automatically any time the tool is used.
Users will only be able to interact with Spreadsheets that they have access to.
Authentication¶
In the backend, the library is leveraging Google’s google-auth to handle authentication. It conveniently stores everything as described above so that you don’t have to worry about boiler plate code to handle auth.
When a Client
is instanciated, an AuthorizedSession
is created using the
credentials and this is what’s used to make requests to the API. This takes care
of handling token refreshes and retries for you.
Alternate Workflows¶
There’s a variety of ways to change the default behavior of how/where configuration is stored.
The easiest way to change the default location is to set the GSPREAD_PANDAS_CONFIG_DIR
env variable to the directory where you want to store everything. If you use this, the
client creds will still need to be named google_secret.json
and user creds will still
be stored in the creds
subdirectory.
If you have different client credentials, you could load them passing in conf_dir
and/or
file_name
to gspread_pandas.conf.get_config
. Alternatively, you could pull these from
elsewhere, like a database. Once you have the config, you could then pass that to a
Client
or Spread
instance, or you could get credentials by passing it to
gspread_pandas.conf.get_creds
.
When using a Service Account, the user
param will be ignored in Client
, Spread
and
get_creds
. Otherwise, this param will be used to store the OAuth2 credentials for each user in the
creds subdirectory. If you generate your credentials elsewhere, you can pass them in to a Client
or Spread
. You can also run through the flow to get OAuth2 and avoid saving them by calling
get_creds
directly. You can also override the creds_dir
if you call this function.
Using Gspread-Pandas¶
There are two main objects you will interact with in gspread-pandas
: the Client
and the Spread
objects. The goal of these objects is to make it easy to work with a variety of concepts in Google Sheets and Pandas DataFrames. A lot of care has gone into documenting functions in code to make code introspection tools useful (displaying documentation, code completion, etc).
The target audience are Data Analysts and Data Scientists, but this can also be used by Data Engineers or anyone trying to automate workflows with Google Sheets and Pandas.
Client¶
The Client
extends the Client
object from gspread
to add some functionality. I try to contribute back to the upstream project, but some things don’t make it in, and others don’t belong.
The main things that are added by the Client
are:
- Handling credentials and authentication to reduce boilerplate code.
- Store file paths within drive for more working with files in a more intuitive manner (requires passing
load_dirs=True
or callingClient.refresh_directories()
if you’ve already instantiated aClient
) - A variety of functions to query for and work with Spreadsheets in your Google Drive, mainly:
list_spreadsheet_files
list_spreadsheet_files_in_folder
find_folders
find_spreadssheet_files_in_folders
create_folder
move_file
- Monkey patch the request to automatically retry when there is a 100 second quota exhausted error.
You can read more in the docs for the Client object
.
Spread¶
The Spread
object represents an open Google Spreadsheet. A Spread
object has multiple Worksheets, and only one can be open at any one time. Any function you call will act on the currently open Worksheet, unless you pass sheet=<worksheet_name_or_index>
when you call the function, in which case it will first open that Worksheet and then perform the action. A Spread
object internally aso holds an instance of a Client
to do the majority of the work.
The Spread
object does a lot of stuff to make it easier for working with Google Spreadsheets. For example, it can handle merged cells, frozen rows/columns, data filters, multi-level column headers, permissions, and more. Some things can be called individually, others can be passed in as function parameters. It can also work with tuples (for example (1, 1)
) or A1 notation for specifying cells.
Some of the most important properties of a Spread
object are:
spread
: The currently open Spreadsheet (this is agspread
object)sheet
: The currently open Worksheet (this is agspread
object)client
: TheClient
object. This will be automatically created if one is not passed in, but you can also share the sameClient
instance among multipleSpread
objects if you pass it in.sheets
: The list of all available Worksheets_sheet_metadata
: We store metadata about the sheet, which includes stuff like merged cells, frozen columns, and frozen rows. This is a private property, but you can refresh this withrefresh_spread_metadata()
Some of the most useful functions are:
sheet_to_df
: Create a Pandas DataFrame from a Worksheetclear_sheet
: Clear out all values and resize a Worksheetdelete_sheet
: Delete a Worksheetdf_to_sheet
: Create a Worksheet from a Pandas DataFramefreeze
: Freeze a given number of rows and/or columnsadd_filter
: Add a filter to the Worksheet for the given range of datamerge_cells
: Merge cells in a Worksheetunmerge_cells
: Unmerge cells within a rangeadd_permission
: Add a permission to a Spreadsheetadd_permissions
: Add multiple permissions to a Spreadsheetlist_permissions
: Show all current permissions on the Spreadsheetmove
: Move the Spreadsheet to a different location
You can read more in the docs for the Spread object
.
Modules¶
gspread_pandas¶
Submodules¶
gspread_pandas.spread module¶
-
class
gspread_pandas.spread.
Spread
(spread, sheet=0, config=None, create_spread=False, create_sheet=False, scope=['openid', 'https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/userinfo.email', 'https://www.googleapis.com/auth/spreadsheets'], user='default', creds=None, client=None, permissions=None)¶ Bases:
object
Simple wrapper for gspread to interact with Pandas. It holds an instance of an ‘open’ spreadsheet, an ‘open’ worksheet, and a list of available worksheets.
Each user will be associated with specific OAuth credentials. The authenticated user will need the appropriate permissions to the Spreadsheet in order to interact with it.
Parameters: - spread (str) – name, url, or id of the spreadsheet; must have read access by
the authenticated user,
see
open_spread
- sheet (str,int) – optional, name or index of Worksheet,
see
open_sheet
(default 0) - config (dict) – optional, if you want to provide an alternate configuration,
see
get_config
(default None) - create_sheet (bool) – whether to create the worksheet if it doesn’t exist,
it wil use the
spread
value as the sheet title (default False) - create_spread (bool) – whether to create the spreadsheet if it doesn’t exist,
it wil use the
spread
value as the sheet title (default False) - scope (list) – optional, if you’d like to provide your own scope (default default_scope)
- user (str) – string indicating the key to a users credentials,
which will be stored in a file (by default they will be stored in
~/.config/gspread_pandas/creds/<user>
but can be modified withcreds_dir
property in config). If using a Service Account, this will be ignored. (default “default”) - creds (google.auth.credentials.Credentials) – optional, pass credentials if you have those already (default None)
- client (Client) – optionall, if you’ve already instanciated a Client, you can just pass that and it’ll be used instead (default None)
- permissions (list) – a list of strings. See
add_permissions
for the expected format
-
add_filter
(start=None, end=None, sheet=None)¶ Add filters to data in the open worksheet.
Parameters: - start (tuple,str) – Tuple indicating (row, col) or string like ‘A1’ (default ‘A1’)
- end (tuple, str) – Tuple indicating (row, col) or string like ‘A1’ (default last cell in sheet)
- sheet (str,int,Worksheet) – optional, if you want to open or create a
different sheet before adding the filter,
see
open_sheet
(default None)
Returns: Return type: None
-
add_permission
(permission)¶ Add a permission to the current spreadsheet.
The format should be:
<id>|(<group>)|(<role>)|(<notify>)|(<require_link>)
where:<id> - email address of group or individual, domain, or ‘anyone’ <group> - optional, if the id is a group e-mail, this needs to be ‘group’ or
‘grp’- <role> - optional, one of ‘owner’, ‘writer’, or ‘reader’. If ommited, ‘reader’
- will be used
<notify> - optional, if you don’t want to notify the user, pass ‘no’ or ‘false’ <require_link> - optional, if you want to require the user to have the link,
pass ‘link’For example, to allow anyone with a link in the group admins@example.com to write when they have a link, but without sending a notification to the group:
admins@example.com|grp|owner|false|link
Or if you want to give user@example.com reader permissions without a notification:
user@example.com|no
Or to give anyone read access:
anyone
Parameters: permissions (string) – A strings meeting the above mentioned format. Returns: Return type: None
-
add_permissions
(permissions)¶ Add permissions to the current spreadsheet. See.
add_permission
for format.Parameters: permissions (list) – A list of strings meeting the above mentioned format. Returns: Return type: None
-
clear_sheet
(rows=1, cols=1, sheet=None)¶ Reset open worksheet to a blank sheet with given dimensions.
Parameters: - rows (int) – number of rows (default 1)
- cols (int) – number of columns (default 1)
- sheet (str,int,Worksheet) – optional; name, index, or Worksheet,
see
open_sheet
(default None)
Returns: Return type: None
-
create_sheet
(name, rows=1, cols=1)¶ Create a new worksheet with the given number of rows and cols.
Automatically opens that sheet after it’s created.
Parameters: - name (str) – name of new Worksheet
- rows (int) – number of rows (default 1)
- cols (int) – number of columns (default 1)
Returns: Return type: None
-
delete_sheet
(sheet)¶ Delete a worksheet by title. Returns whether the sheet was deleted or not. If current sheet is deleted, the
sheet
property will be set to None.Parameters: sheet (str,Worksheet) – name or Worksheet Returns: True if deleted successfully, else False Return type: bool
-
df_to_sheet
(df, index=True, headers=True, start=(1, 1), replace=False, sheet=None, raw_columns=None, freeze_index=False, freeze_headers=False, fill_value='', add_filter=False, merge_headers=False, flatten_headers_sep=None, merge_index=False)¶ Save a DataFrame into a worksheet.
Parameters: - df (DataFrame) – the DataFrame to save
- index (bool) – whether to include the index in worksheet (default True)
- headers (bool) – whether to include the headers in the worksheet (default True)
- start (tuple,str) – tuple indicating (row, col) or string like ‘A1’ for top left cell (default (1,1))
- replace (bool) – whether to remove everything in the sheet first (default False)
- sheet (str,int,Worksheet) – optional, if you want to open or create a different sheet
before saving,
see
open_sheet
(default None) - raw_columns (list, str) – optional, list of columns from your dataframe that you want interpreted as RAW input in google sheets. This can be column names or column numbers.
- freeze_index (bool) – whether to freeze the index columns (default False)
- freeze_headers (bool) – whether to freeze the header rows (default False)
- fill_value (str) – value to fill nulls with (default ‘’)
- add_filter (bool) – whether to add a filter to the uploaded sheet (default False)
- merge_headers (bool) – whether to merge cells in the header that have the same value (default False)
- flatten_headers_sep (str) – if you want to flatten your multi-headers to a single row, you can pass the string that you’d like to use to concatenate the levels, for example, ‘: ‘ (default None)
- merge_index (bool) – whether to merge cells in the index that have the same value (default False)
Returns: Return type: None
-
find_sheet
(sheet)¶ Find a given worksheet by title or by object comparison.
Parameters: sheet (str,Worksheet) – name of Worksheet or Worksheet object Returns: the Worksheet by the given name or None if not found Return type: Worksheet
-
freeze
(rows=None, cols=None, sheet=None)¶ Freeze rows and/or columns for the open worksheet.
Parameters: - rows (int) – number of rows to freeze, use 0 to ‘unfreeze’ (default None)
- cols (int) – number of columns to freeze, use 0 to ‘unfreeze’ (default None)
- sheet (str,int,Worksheet) – optional, if you want to open or create a
different sheet before freezing,
see
open_sheet
(default None)
Returns: Return type: None
-
get_sheet_dims
(sheet=None)¶ Get the dimensions of the currently open Worksheet.
Parameters: sheet (str,int,Worksheet) – optional, if you want to open a different sheet first, see open_sheet
(default None)Returns: a tuple containing (num_rows,num_cols) Return type: tuple
-
list_permissions
()¶ List all permissions for this Spreadsheet.
Returns: a list of dicts indicating the permissions on this spreadsheet Return type: list
-
merge_cells
(start, end, merge_type='MERGE_ALL', sheet=None)¶ Merge cells between the start and end cells. Use merge_type if you want to change the behavior of the merge.
Parameters: - start (tuple,str) – Tuple indicating (row, col) or string like ‘A1’
- end (tuple, str) – Tuple indicating (row, col) or string like ‘A1’
- merge_type (str) – One of MERGE_ALL, MERGE_ROWS, or MERGE_COLUMNS (default “MERGE_ALL”)
- sheet (str,int,Worksheet) – optional, if you want to open or create a
different sheet before adding the filter,
see
open_sheet
(default None)
Returns: Return type: None
-
move
(path='/', create=True)¶ Move the current spreadsheet to the specified path in your Google drive. If the file is not currently in you drive, it will be added.
Parameters: - path (str) – folder path (Default value = “/”)
- create (bool) – if true, create folders as needed (Default value = True)
-
open
(spread, sheet=None, create_sheet=False, create_spread=False)¶ Open a spreadsheet, and optionally a worksheet. See.
open_spread
andopen_sheet
.Parameters: - spread (str) – name, url, or id of Spreadsheet
- sheet (str,int) – name or index of Worksheet (default None)
- create_sheet (bool) – whether to create the worksheet if it doesn’t exist,
it wil use the
spread
value as the sheet title (default False) - create_spread (bool) – whether to create the spreadsheet if it doesn’t exist,
it wil use the
spread
value as the sheet title (default False)
Returns: Return type: None
-
open_sheet
(sheet, create=False)¶ Open a worksheet. Optionally, if the sheet doesn’t exist then create it first (only when
sheet
is a str).Parameters: - sheet (str,int,Worksheet) – name, index, or Worksheet object
- create (bool) – whether to create the sheet if it doesn’t exist,
see
create_sheet
(default False)
Returns: Return type: None
-
open_spread
(spread, create=False)¶ Open a spreadsheet. Authorized user must already have read access.
Parameters: - spread (str) – name, url, or id of Spreadsheet
- create (bool) – whether to create the spreadsheet if it doesn’t exist,
it wil use the
spread
value as the sheet title (default False)
Returns: Return type: None
-
refresh_spread_metadata
()¶ Refresh spreadsheet metadata.
-
sheet_to_df
(index=1, header_rows=1, start_row=1, unformatted_columns=None, formula_columns=None, sheet=None)¶ Pull a worksheet into a DataFrame.
Parameters: - index (int) – col number of index column, 0 or None for no index (default 1)
- header_rows (int) – number of rows that represent headers (default 1)
- start_row (int) – row number for first row of headers or data (default 1)
- unformatted_columns (list) – column numbers or names for columns you’d like to pull in as unformatted values (defaul [])
- formula_columns (list) – column numbers or names for columns you’d like to pull in as actual formulas (defaul [])
- sheet (str,int) – optional, if you want to open a different sheet first,
see
open_sheet
(default None)
Returns: DataFrame with the data from the Worksheet
Return type: DataFrame
-
unmerge_cells
(start='A1', end=None, sheet=None)¶ Unmerge all cells between the start and end cells. Use defaults to unmerge all cells in the sheet.
Parameters: - start (tuple,str) – Tuple indicating (row, col) or string like ‘A1’ (default A1)
- end (tuple,str) – Tuple indicating (row, col) or string like ‘A1’ (default last cell in sheet)
- sheet (str,int,Worksheet) – optional, if you want to open or create a
different sheet before adding the filter,
see
open_sheet
(default None)
Returns: Return type: None
-
update_cells
(start, end, vals, sheet=None, raw_columns=None)¶ Update the values in a given range. The values should be listed in order from left to right across rows.
Parameters: - start (tuple,str) – tuple indicating (row, col) or string like ‘A1’
- end (tuple,str) – tuple indicating (row, col) or string like ‘Z20’
- vals (list) – array of values to populate
- sheet (str,int,Worksheet) – optional, if you want to open a different sheet first,
see
open_sheet
(default None) - raw_columns (list, int) – optional, list of column numbers in the google sheet that should be interpreted as “RAW” input
Returns: Return type: None
-
email
¶ (str) - E-mail for the currently authenticated user
-
sheet
= None¶ (gspread.worksheet.Worksheet) - Currently open Worksheet
-
sheets
¶ (list) - List of available Worksheets
-
spread
= None¶ (gspread.spreadsheet.Spreadsheet) - Currently open Spreadsheet
-
url
¶ (str) - Url for this spreadsheet
- spread (str) – name, url, or id of the spreadsheet; must have read access by
the authenticated user,
see
gspread_pandas.client module¶
-
class
gspread_pandas.client.
Client
(user='default', config=None, scope=['openid', 'https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/userinfo.email', 'https://www.googleapis.com/auth/spreadsheets'], creds=None, session=None, load_dirs=False)¶ Bases:
gspread.client.Client
The gspread_pandas
Client
extendsClient
and authenticates using credentials stored ingspread_pandas
config.This class also adds a few convenience methods to explore the user’s google drive for spreadsheets.
Parameters: - user (str) – optional, string indicating the key to a users credentials,
which will be stored in a file (by default they will be stored in
~/.config/gspread_pandas/creds/<user>
but can be modified withcreds_dir
property in config). If using a Service Account, this will be ignored. (default “default”) - config (dict) – optional, if you want to provide an alternate configuration,
see
get_config
(default None) - scope (list) – optional, if you’d like to provide your own scope (default default_scope)
- creds (google.auth.credentials.Credentials) – optional, pass credentials if you have those already (default None)
- session (google.auth.transport.requests.AuthorizedSession) – optional, pass a google.auth.transport.requests.AuthorizedSession or a requests.Session and creds (default None)
- load_dirs (bool) – optional, whether you want to load directories and paths on instanciation. if you refresh directories later or perform an action that requires them, they will be loaded at that time. For speed, this is disabled by default (default False)
-
create_folder
(path, parents=True)¶ Create a new folder in your Google drive.
Parameters: - path (str) – folder path
- parents (bool) – if True, create parent folders as needed (Default value = True)
Returns: information for the created directory
Return type: dict
-
find_folders
(folder_name_query='')¶ Return all folders that the user has access to containing
folder_name_query
in the name.Parameters: folder_name_query (str) – Case insensitive string to search in folder name. If empty, it will return all folders. Returns: List of folders. Each folder is a dict with the following keys: id, kind, mimeType, and name. Return type: list
-
find_spreadsheet_files_in_folders
(folder_name_query)¶ Return all spreadsheets that the user has access to in all the folders that contain
folder_name_query
in the name. Returns as a dict with each key being the folder name and the value being a list of spreadsheet files.Parameters: folder_name_query (str) – Case insensitive string to search in folder name Returns: Spreadsheets in each folder. Each entry is a dict with the folder name as the key and a list of spreadsheets as the value. Each spreadsheet is a dict with the following keys: id, kind, mimeType, and name. Return type: dict
-
list_spreadsheet_files
(title=None)¶ Return all spreadsheets that the user has access to.
Parameters: title (str) – name of the spreadsheet, if none is passed it’ll return every file (default None) Returns: List of spreadsheets. Each spreadsheet is a dict with the following keys: id, kind, mimeType, and name. Return type: list
-
list_spreadsheet_files_in_folder
(folder_id)¶ Return all spreadsheets that the user has access to in a sepcific folder.
Parameters: folder_id (str) – ID of a folder, see find_folders
Returns: List of spreadsheets. Each spreadsheet is a dict with the following keys: id, kind, mimeType, and name. Return type: list
-
login
()¶ Override login since AuthorizedSession now takes care of automatically refreshing tokens when needed.
-
move_file
(file_id, path, create=False)¶ Move a file to the given path.
Parameters: - file_id (str) – file id
- path (str) – folder path. A path starting with / will use your drive, for shared drives the path will start with <Shared Drive Name>/ (no leading /)
- create (bool) – whether to create any missing folders (Default value = False)
-
open
(title)¶ Opens a spreadsheet.
Parameters: title (str) – A title of a spreadsheet. Returns: a Spreadsheet
instance.If there’s more than one spreadsheet with same title the first one will be opened.
Raises: gspread.SpreadsheetNotFound – if no spreadsheet with specified title is found. >>> c = gspread.authorize(credentials) >>> c.open('My fancy spreadsheet')
-
refresh_directories
()¶ Refresh list of directories for the current user.
-
directories
¶ (list) - list of dicts for all avaliable directories for the current user
-
email
¶ (str) - E-mail for the currently authenticated user
-
root
¶ (dict) - the info for the top level Drive directory for current user
-
scope
= None¶ (list) - Feeds included for the OAuth2 scope
- user (str) – optional, string indicating the key to a users credentials,
which will be stored in a file (by default they will be stored in
gspread_pandas.conf module¶
-
gspread_pandas.conf.
get_config
(conf_dir=None, file_name='google_secret.json')¶ Get config for Google client. Looks in ~/.config/gspread_pandas/google_secret.json by default but you can override it with conf_dir and file_name. The creds_dir value will be set to conf_dir/creds and the directory will be created if it doesn’t exist; if you’d like to override that you can do so by changing the ‘creds_dir’ value in the dict returned by this function.
Download json from https://console.developers.google.com/apis/credentials
Parameters: - conf_dir (str) – Full path to config dir (Default value = get_config_dir())
- file_name (str) – (Default value = “google_secret.json”)
Returns: Dict with necessary contents of google_secret.json
Return type: dict
-
gspread_pandas.conf.
get_creds
(user='default', config=None, scope=['openid', 'https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/userinfo.email', 'https://www.googleapis.com/auth/spreadsheets'], creds_dir=None, save=True)¶ Get google google.auth.credentials.Credentials for the given user. If the user doesn’t have previous creds, they will go through the OAuth flow to get new credentials which will be saved for later use. Credentials will be saved in config[‘creds_dir’], if this value is not set, then they will be stored in a folder named
creds
in the default config dir (either ~/.config/gspread_pandas or.$GSPREAD_PANDAS_CONFIG_DIR)
Alternatively, it will get credentials from a service account.
Parameters: - user (str) – Unique key indicating user’s credentials. This is not necessary when using a ServiceAccount and will be ignored (Default value = “default”)
- config (dict) – Optional, dict with “client_id”, “client_secret”, and “redirect_uris” keys for
OAuth or “type”, “client_email”, “private_key”, “private_key_id”, and
“client_id” for a Service Account. If None is passed, it will call
get_config()
(Default value = None) - creds_dir (str, Path) – Optional, directory to load and store creds from/in. If None, it will use the
creds
subdirectory in the default config location. (Default value = None) - scope (list) – Optional, scope to use for Google Auth (Default value = default_scope)
Returns: Google credentials that can be used with gspread
Return type: google.auth.credentials.Credentials
Contributing¶
Code should be run through black, isort, and flake8 before being merged. Pre-commit takes care of it for you, but you need to have Python 3 installed to be able to run black. To contribute, please fork the repo, create a feature branch, push it to your repo, then create a pull request.
To install and set up the environment after you fork it (replace aiguofer with your username):
$ git clone https://github.com/aiguofer/gspread-pandas.git && cd gspread-pandas
$ pip install -e ".[dev]"
$ pre-commit install
Testing¶
Our tests levarage betamax to remember HTTP
interactions with the API. In order to add new tests that change the requests to
betamax, you’ll need to have Service Account credentials stored as google_secret.json
in the root project directory. You can then re-record tests by deleting the necessary
cassetes in tests/cassettes
then running:
$ GSPREAD_RECORD=true pytest <path_to_test>
NOTE: Currently, the tests don’t do any setup and teardown of expected directories/files in the Google Drive. My main concern in implementing this is that somehow it might mistakenly use a specific user’s credentials and delete important stuff. If you have any ideas here I’d be happy to discuss.
Versions¶
In order to bump versions, we use bump2version. This will take care of adding an entry in the CHANGELOG for the new version and bumping the version everywhere it needs to. This will also create a git tag for the specific version.
CI¶
CI is managed by Github Actions: python-package.yml - workflow for testing and linting for each python version on every push tagged-release.yml - workflow that does ^ and then creates a github release only on tagged push python-publish.yml - workflow that publishes package to PyPi when a github release is created
Change Log¶
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
[Unreleased]¶
Added¶
- Add
merge_index
option forSpread.df_to_sheet
[3.2.2] - 2022-06-29¶
Changed¶
- Update CI and its documentation
[3.2.1] - 2022-06-29¶
Fixed¶
- Don’t use the deprecated pandas option
inplace = True
- Fix hanging auth test
- Example in docs
Removed¶
- Fix for early pandas 1.0 is no longer necessary
[3.2.0] - 2022-03-19¶
[3.1.0] - 2022-03-19¶
Fixed¶
- OOB auth flow was removed from google-auth, see here. Fixed it by running local server to listen for token.
[3.0.4] - 2022-01-18¶
Fixed¶
- Frozen column field in
gridProperties
changed fromfrozenColCount
tofrozenColumnCount
[3.0.3] - 2022-01-04¶
Fixed¶
- Potential issue with resizing in
clear_sheet
Added¶
- Section about resizing issues to README
[3.0.2] - 2021-12-29¶
Removed¶
- Remove deprecated param raw_column_names
[3.0.1] - 2021-12-28¶
Changed¶
- Pipeline changes
[3.0.0] - 2021-12-28¶
Changed¶
- Upgrade gspread to >= 5
Removed¶
- BREAKING: Removed Python 2.7 support
- Removed Travis pipeline
Added¶
- CI pipeline using Github Actions
[2.3.1] - 2021-11-29¶
Fixed¶
- Fix gspread 5 incompatibility by blocking requirements
[2.3.0] - 2021-03-21¶
Changed¶
- Switch to pycryptodome instead of pycrypto to fix tests
- Switch from future to six for python 2 compat
Fixed¶
- Fix docstring about default sheet param for open
- Fix numpy deprecation warning about numeric-style types
- Fix google quota exceed retry
Added¶
- Added ability to use shared drives
[2.2.4] - 2021-01-24¶
Fixed¶
- Fix dtype issues for numeric+null values for pandas 1.0
- Fix auth within Colab
[2.2.3] - 2020-03-26¶
Fixed¶
- Ensure we respect XDG_CONFIG_HOME
- Use current dir if no $HOME env var is set (in some cloud environments and Colab)
[2.2.2] - 2020-03-21¶
Changed¶
- Use Path for various paths in conf.py
[2.2.1] - 2020-01-05¶
Added¶
- Added a section explaining the main concepts of
Gspread-Pandas
to the docs.
Fixed¶
- Fixed the auto-retry when encountering the 100 second quota limit (thanks @rodelrod).
- All
Client
objects should now be monkeypatched to retry on 100 s quota error. Before it was only those within aSpread
instance.
Changed¶
- When a
Spread
is opened, it will now open the first worksheet by default.
[2.2.0] - 2019-11-17¶
Added¶
- Added
formula_columns
andunformatted_columns
as options toSpread.sheet_to_df
to allow usage of ValueRenderOption for specific columns. - Added test for python 3.7 and 3.8 in travis
Fixed¶
- Bug with certain functions when index=False
- Bug when all columns are
raw_columns
- Fixed travis tests
Deprecated¶
raw_column_names
is deprecated in favor ofraw_columns
. It will be removed in v3.
Changed¶
- bumpversion config now in setup.cfg
- Docstrings now use docformatter format
[2.1.3] - 2019-08-25¶
Added¶
- Added
Spread.add_permission
to differentiate adding a single permission vs adding multiple permissions.
Changed¶
- Ensure license is packaged with sdist
- Unified and simplified project and test configuration settings
[2.1.2] - 2019-07-10¶
Added¶
- Added test for open_sheet
Fixed¶
- Client directories property now works as expected
- Various fixes for tests
Changed¶
- Tox must run correctly before being able to push a new tag
[2.1.1] - 2019-07-10¶
Fixed¶
- Fixed import in Spread after refactor, oops! should have ran tests
[2.1.0] - 2019-07-10¶
Added¶
- Client now has an optional
load_dirs
param which default toFalse
Changed¶
- BREAKING: Refactored
Spread
into its own file. If you were importing likefrom gspread_pandas.client import Spread
you will need to change tofrom gspread_pandas.spread import Spread
. - Directories and paths are no longer loaded by default. However, if you try to use any functionality that requires it, it’ll load it at that point.
Fixed¶
- If a file doesn’t have
parents
it’ll no longer break (thanks @shredding) (#29) - $XDG_CONFIG_HOME should now be respected
- If you don’t have Drive API access in the scope, it should now still work and print a message instead
[2.0.0] - 2019-06-12¶
Added¶
- Test python 3.7, Windows, and MacOS
- You can now iterate over worksheets like:
for sheet in spread
Spread.df_to_sheet
can now flatten multi-level headers using theflatten_headers_sep
param- Add ability to set permissions on spreadsheets
- Add ability to create and move folders and spreadsheets
- A session can now be passed directly to a
Client
- A
raw_column_names
param toSpread.df_to_sheet
to force specific columns to be sent to the Google Sheets API as RAW input so it doesn’t get interpreted as a number, date, etc.
Removed¶
- BREAKING: Removed
start/end_row/col
params from add_filter - BREAKING: Removed
user_creds_or_client
param from Spread - BREAKING: Removed
user_or_creds
param from Client
Changed¶
- The
credentials
param is now calledcreds
everywhere - Test suite is now a lot more robust
- Use google-auth instead of the now deprecated oauth2client library. This moves the retry code into that library.
- Default config will now be in
C:\Users\<user>\AppData\gspread_pandas
on Windows
Fixed¶
- Things should now work when passing a
Worksheet
object toSpread.open
[1.3.1] - 2019-05-17¶
Fixed¶
- Passing 0 to
sheet`
param in any function should work now - When using multi-row column headers in a spreadsheet, the index name should now be better identified
- Spread;update_cells should now work when passing a single cell value
- When start != ‘A1’, freeze_rows/headers should now correctly freeze the right amount of rows/headers so the index and columns are frozen
[1.3.0] - 2019-04-30¶
Added¶
- Function to merge_cells
- Function to unmerge_cells
- Option to merge_headers in df_to_sheet
- Retry when exceeding the 100s quota
Fixed¶
- Fix passing 0 for freeze_index or freeze_headers. This essentially “unfreezes”
- When the index has no name and you have a multi-level header, it will no longer fill in “index” as the index header
Deprecated¶
- Spread will no longer use the ‘user_creds_or_client’ param in v2. Instead, it will have optional ‘credentials’, ‘client’, and ‘user’ params. If creds or a client are passed, the user will be ignored. Otherwise, it’ll use the user, which will default to “default”
- Client will no longer use the ‘user_or_creds’ param in v2. Instead, it will have optional ‘credentials’ and ‘user’ params. If creds passed, the user will be ignored. Otherwise, it’ll use the user, which will default to “default”
- Spread.add_filter will be standardized to use ‘start’ and ‘end’ like other functions and the start/end_row/col are deprecated and will be removed in v2
Changed¶
- Exceptions are no longer raised while handling another exception. This should prevent the “During handling of the above exception, another exception occurred” message
- When opening a new Spreadsheet, the SpreadsheetNotFound exception will no longer be a “catchall” for any errors. If an error other than actually not finding the Spreadsheet occurs, it’ll be raised.
- Default value for the user param in util.get_config was changed to “default”
[1.2.0] - 2018-08-30¶
Added¶
- Add config files and pre-commit hooks for isort, black, and flake8
- Add config files for isort, black, and flake8
Fixed¶
- Fixed clear_sheet when there are frozen rows/cols
- Small fixes in README
Changed¶
- Changed from reST docstrings to numpy docstrings
- Updated README to include more in contributing section
[1.1.3] - 2018-07-07¶
Added¶
- Added unit tests for util
Fixed¶
- Fix parse_df_col_names when df has a multi-index
- Fix parse_sheet_index when using last column as index
- Fix fillna when using categorical variables
[1.1.2] - 2018-06-23¶
Fixed¶
- Fix issue with basestring usage
Changed¶
- Remove Python 3.4 from travis tests
[1.1.1] - 2018-06-13¶
Changed¶
Spread.clear_sheet
now doesn’t resize to 0 since V4 is much more efficient at making batch updates. This should help prevent formulas that point to these sheets from breaking.
[1.1.0] - 2018-06-02¶
Fixed¶
- Now works with gspread 3.0
- Spread.freeze is working again
Changed¶
- Moved a lot of the credential handling into functions in gspread_pandas.conf
- New
get_creds
function allows you to getOAuth2Credentials
and pass them in to aClient
orSpread
- Some functions were moved to
gspread_pandas.util
Added¶
- New function
Spread.add_filter
created so that you can add filters to worksheets - New param
add_filter
added toSpread.df_to_sheet
to add a filter to uploaded data
[1.0.4] - 2018-04-08¶
Fixed¶
- Change ValueInputOption to USER_ENTERED so dates and numbers are parsed correctly in Google Sheets
[1.0.2] - 2018-04-02¶
Changed¶
- Some dependency changes
- Travis deploy will only happen on python 3.6
- Changes to reduce number of fetch_sheet_metadata calls
[1.0.1] - 2018-03-26¶
Changed¶
- Replace pypi-publisher with twine in dev reqs
- Change download url, now it should match the tags from bumpversion
[1.0.0] - 2018-03-26¶
Added¶
- There is now a separate
Client
class that extends the gspread v4 Client class and adds some functionalty. This includes a monkeypatche and hacky workarounds for gspread 2.0 issues. Once they get fixed upstream I need to remove these.
Changed¶
- Now supports gspread 2.0 which uses Spreadsheets V4 API, this provides much better performance and reliability. Some APIs might have changed.
- No longer need to chunk update requests, and range requests can use larger chunks
- Some code improvements enabled by gspread 2.0
- Removed deprecated params and functions
[0.16.0] - 2018-03-24¶
Added¶
- Test on multiple versions using tox
- Enable travis-ci
Fixed¶
- Remove dir accidentally pushed by build
Changed¶
- Moved dev requirements into requirements_dev.txt
- Now using bumpversion for version management
- Minor updates to README
- Documentation now at Read The Docs
- Minor code changes to please flake8
- Deleted update_pypi.sh as releases are now handled by travis
[0.15.5] - 2018-03-12¶
Fixed¶
- Added dependency version limit for gspread; will remove in next version
[0.15.4] - 2018-02-13¶
Fixed¶
- README example now points to the correct URL (thanks @lionel)
- Calling parse_sheet_headers on an empty sheet doesn’t break anymore (thanks @taewookim)
Added¶
- You can now use service account credentials in the config (thanks @marcojetson)
[0.15.2] - 2017-11-18¶
Fixed¶
- Fix sheet_to_df when headers are present with no data
Changed¶
- Minimum Pandas version .20 now required
[0.15.1] - 2017-10-05¶
Fixed¶
- When there are merged cells outside the data range, an exception is no longer thrown.
- Cast keys() to a list to fix Python 3 compat
[0.15.0] - 2017-09-11¶
Changed¶
- Added
fill_value
option to df_to_sheet
Fixed¶
- Different application type credentials can be used now
- Some safeguards to prevent certain exceptions
- df_to_sheet won’t fail when categorical columns have nulls
[0.14.3] - 2017-06-22¶
Changed¶
- Force gspread sheets refresh when refreshing sheets
- Worksheet object can now be passed it to most functions with
sheet
param
[0.14.2] - 2017-06-18¶
Added¶
- Added
url
property for easy linking
Fixed¶
- Fixed retry for _retry_get_all_values
[0.14.1] - 2017-06-05¶
Changed¶
- Ensure sheet matadata is refreshed after sheet changing activitiesthrough use of a decorator
- Retry when calling
get_all_values
- More robust way to get index when a new sheet is created
[0.14.0] - 2017-05-25¶
Added¶
- Added function to freeze rows/columns to
Spread
- Added
freeze_index
andfreeze_headers
flags todf_to_sheet
Changed¶
- Don’t re-size again when using
replace=True
- Switch away from deprecated
gspread
functions - Make functions in
util
non-private
Fixed¶
- Prevent error when index > number of columns in
sheet_to_df
[0.13.0] - 2017-04-28¶
Added¶
- Added
create_spread
andcreate_sheet
params forSpread
class. This enables creating a spreadsheet or a worksheet during opening. This will require re-authenticating in order to use it
[0.12.1] - 2017-04-25¶
Changed¶
- If using multi-level headings, heading will be shifted up so the top level is not a blank string
- Some functions that don’t depend on
self
were moved intoutil.py
- The
headers
param insheet_to_df
was deprecated in favor ofheader_rows
Fixed¶
- I introduced some small bugs with the v4 api changes when a sheet is not found, they now work as expected even when a new sheet is created
- The list of sheets is now refreshed when one is deleted
[0.12.0] - 2017-03-31¶
Added¶
- Add Sheets API v4 client to
self.clientv4
Fixed¶
- Merged cells now all get the right value in
sheet_to_df
- You can now pass
replace=True
when a sheet has frozen rows/cols
[0.11.1] - 2017-03-22¶
Added¶
- Added note about
EOFError
when verifying Oauth inRodeo
Changed¶
- Add retry method for
sheet.range
to work around ‘Connection Broken’ error
Fixed¶
- Fixed clearing only rows with
clear_sheet
[0.11.0] - 2017-02-14¶
Changed¶
- Only clear up to first row in
clear_sheet
so that data filters will persist - Moved default config from
~/.google/
to~/.config/gspread_pandas
Fixed¶
- Allow passing index
0
toopen
- Fixed changelog
[0.10.1] - 2017-01-26¶
Added¶
- Added troubleshooting for
certifi
issue inREADME
Changed¶
- Only catch
SpreadsheetNotFound
exceptions when opening a spreadsheet
[0.10.0] - 2017-01-18¶
Added¶
- Added optional
create
param toopen_sheet
to create it if it doesn’t exist - Added optional
start
param todf_to_sheet
, will take tuple or address as str
Changed¶
- Improved docs, changed to
rst
- Made some variables private
- Improved
__str__
output - Switch to using exceptions from
gspread
spread
param is now required foropen
- When current sheet is deleted,
self.sheet
is set toNone
- Improved versioning, switched to Semantic Versioning
Fixed¶
- Fixed chunk calculation in Python 3
- Sheet names are case insensitive, fixed
find_sheet
Deprecated¶
- Deprecate
open_or_create_sheet
function in favor ofcreate=True
param foropen_sheet
- Deprecate
start_row
andstart_col
indf_to_sheet
in favor ofstart
param
[0.9] - 2016-12-07¶
Added¶
- Add
__repr__
and__str__
to show the active - Add user’s email as a property to Spread. I recommend deleting existing Oauth credentials and re-creating them with new permissions
- Allow importing with:
from gspread_pandas import Spread
- Added
CHANGELOG.md
Changed¶
- Restrict scope to only necessary endpoints
- Add retry for updating cells in case an error occurrs
- Minor changes to
README.md
Fixed¶
- Fixed the use of
start_row
> 1
[0.8] - 2016-11-11¶
Added¶
- Add python 3 build to
update_pypi.sh
script
Fixed¶
- Oauth flow now uses correct properties
[0.6] - 2016-10-27¶
Changed¶
- Change defaults in
sheet_to_df
to include index and header - Raise error when missing google client config file
Changed¶
- Improve decorators using
functools.wraps