| Title: | Get Spanish Origin-Destination Data |
|---|---|
| Description: | Gain seamless access to origin-destination (OD) data from the Spanish Ministry of Transport, hosted at <https://www.transportes.gob.es/ministerio/proyectos-singulares/estudios-de-movilidad-con-big-data/opendata-movilidad>. This package simplifies the management of these large datasets by providing tools to download zone boundaries, handle associated origin-destination data, and process it efficiently with the 'duckdb' database interface. Local caching minimizes repeated downloads, streamlining workflows for researchers and analysts. Methods described in Kotov et al. (2026) <doi:10.1177/23998083251415040>. Extensive documentation is available at <https://ropenspain.github.io/spanishoddata/index.html>, offering guides on creating static and dynamic mobility flow visualizations and transforming large datasets into analysis-ready formats. |
| Authors: | Egor Kotov [aut, cre] (ORCID: <https://orcid.org/0000-0001-6690-5345>), Robin Lovelace [aut] (ORCID: <https://orcid.org/0000-0001-5679-6536>), Eugeni Vidal-Tortosa [ctb] (ORCID: <https://orcid.org/0000-0001-5199-4103>) |
| Maintainer: | Egor Kotov <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.5 |
| Built: | 2026-06-05 19:51:42 UTC |
| Source: | https://github.com/rOpenSpain/spanishoddata |
Get a table with links to available data files for the specified data version. Optionally check (see arguments) the file size and availability of data files previously downloaded into the cache directory specified with the SPANISH_OD_DATA_DIR environment variable (set by spod_set_data_dir()) or a custom path specified with the data_dir argument. By default, the data is fetched from the Amazon S3 bucket where the data is stored. If that fails, the function falls back to downloading an XML file from the Spanish Ministry of Transport website. You can also control this behaviour with the use_s3 argument.
For detailed data descriptions, see package vignettes using spod_codebook(ver = 1) and spod_codebook(ver = 2) and official methodology documents in References section.
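The S3-first, XML-fallback behaviour described above follows a common "try the primary source, fall back on error" pattern. The following is only a minimal sketch of that pattern in base R, not the package's actual implementation: `fetch_with_fallback()`, `from_s3()`, `from_xml()`, and the example URL are hypothetical stand-ins introduced for illustration.

```r
# Hypothetical helper: try `primary()`, and call `fallback()` if it errors.
fetch_with_fallback <- function(primary, fallback, quiet = FALSE) {
  tryCatch(
    primary(),
    error = function(e) {
      if (!quiet) {
        message("Primary source failed (", conditionMessage(e), "); using fallback.")
      }
      fallback()
    }
  )
}

# Stand-ins for the two sources (assumptions, not real package functions).
from_s3 <- function() stop("bucket unreachable")
from_xml <- function() data.frame(target_url = "https://example.org/file.csv.gz")

# The primary source fails here, so the fallback result is returned.
result <- fetch_with_fallback(from_s3, from_xml, quiet = TRUE)
```

In the package itself, the use_s3 argument is the documented way to choose between the two sources explicitly.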
spod_available_data(
  ver = 2,
  check_local_files = FALSE,
  quiet = FALSE,
  data_dir = spod_get_data_dir(),
  use_s3 = if (ver == 1) FALSE else TRUE,
  force = FALSE
)
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with spod_codebook(ver = 1) and spod_codebook(ver = 2). |
check_local_files |
Logical. Whether to check if the local files exist and get the file size. Defaults to FALSE. |
quiet |
A |
data_dir |
The directory where the data is stored. Defaults to the value returned by spod_get_data_dir(). |
use_s3 |
|
force |
Logical. If |
A tibble with links, release dates of files in the data, dates of data coverage, local paths to files, and the download status.
character. The URL link to the data file.
POSIXct. The timestamp of when the file was published.
character. The file extension of the data file (e.g., 'tar', 'gz').
Date. The year and month of the data coverage, if available.
Date. The specific date of the data coverage, if available.
factor. Study category derived from the URL (e.g., 'basic', 'complete', 'routes').
factor. Data type category derived from the URL (e.g., 'number_of_trips', 'origin-destination', 'overnight_stays', 'data_quality', 'metadata').
factor. Temporal granularity category derived from the URL (e.g., 'day', 'month').
factor. Geographic zone classification derived from the URL (e.g., 'districts', 'municipalities', 'large_urban_areas').
character. The local file path where the data is (or going to be) stored.
logical. Indicator of whether the data file has been downloaded locally. This is only available if check_local_files is TRUE.
For the official website of the mobility study: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). “Estudio de la movilidad con Big Data (Study of mobility with Big Data).” Data License: https://www.transportes.gob.es/el-ministerio/buen-gobierno/licencia_datos, https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data.
For v1 data methodology: Ministerio de Transportes, Movilidad y Agenda Urbana (MITMA) (2021). Análisis de la movilidad en España con tecnología Big Data durante el estado de alarma para la gestión de la crisis del COVID-19 (Analysis of mobility in Spain with Big Data technology during the state of alarm for COVID-19 crisis management). https://cdn.mitma.gob.es/portal-web-drupal/covid-19/bigdata/mitma_-_estudio_movilidad_covid-19_informe_metodologico_v3.pdf.
For v2 data methodology: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). Estudio de movilidad de viajeros de ámbito nacional aplicando la tecnología Big Data. Informe metodológico (Study of National Traveler mobility Using Big Data Technology. Methodological Report). https://www.transportes.gob.es/recursos_mfom/paginabasica/recursos/a3_informe_metodologico_estudio_movilidad_mitms_v8.pdf.
For the spanishoddata R package: Kotov E, Vidal-Tortosa E, Cantú-Ros OG, Burrieza-Galán J, Herranz R, Gullón Muñoz-Repiso T, Lovelace R (2026). “spanishoddata: A package for accessing and working with Spanish Open Mobility Big Data.” Environment and Planning B: Urban Analytics and City Science. ISSN 2399-8083. doi:10.1177/23998083251415040.
Use spod_cite() to cite the package and the data with correct plain text, markdown, or BibTeX formats.
# Set data dir for file downloads
spod_set_data_dir(tempdir())

# Get available data list for v1 (2020-2021) data
spod_available_data(ver = 1)

# Get available data list for v2 (2022 onwards) data
spod_available_data(ver = 2)

# Get available data list for v2 (2022 onwards) data
# while also checking for local files that are already downloaded
spod_available_data(ver = 2, check_local_files = TRUE)
WARNING: The checks may fail for May 2022 data and for some 2025 data, because the remote checksums used to verify file consistency are incorrect. We are working on a fix for future updates; for now, kindly rely on the built-in file size checks of spod_download, spod_get, and spod_convert. This function checks whether downloaded data files are consistent with their checksums in Amazon S3 by computing an ETag for each file. This involves computing the MD5 of each part of the file, concatenating those digests, and computing the MD5 again over the concatenated MD5s. Checking all files may take a very long time, so use with caution.
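The ETag computation described above (MD5 of each part, then MD5 of the concatenated digests) can be sketched in base R as follows. This is an illustrative sketch only, not the package's code: the helper names are hypothetical, the part size is an assumption (the real value depends on how each file was uploaded), and the "-<number of parts>" suffix follows the usual format of multipart S3 ETags.

```r
# Hex MD5 of a raw vector, computed by writing the bytes to a temporary file
# (tools::md5sum() works on files).
md5_of_raw <- function(bytes) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(bytes, tmp)
  unname(tools::md5sum(tmp))
}

# Convert a hex string such as "d41d8c..." to its raw bytes.
hex_to_raw <- function(hex) {
  starts <- seq(1, nchar(hex), by = 2)
  as.raw(strtoi(substring(hex, starts, starts + 1), base = 16L))
}

# Hypothetical sketch of a multipart ETag; `part_size` is in bytes.
multipart_etag <- function(path, part_size) {
  con <- file(path, "rb")
  on.exit(close(con))
  part_md5s <- character()
  repeat {
    chunk <- readBin(con, what = "raw", n = part_size)
    if (length(chunk) == 0) break
    part_md5s <- c(part_md5s, md5_of_raw(chunk))
  }
  if (length(part_md5s) == 0) stop("File is empty: ", path)
  # Concatenate the binary digests and hash them again.
  combined <- unlist(lapply(part_md5s, hex_to_raw))
  paste0(md5_of_raw(combined), "-", length(part_md5s))
}

# Toy usage: a 10240-byte file split into 4096-byte parts gives 3 parts.
demo_file <- tempfile()
writeBin(as.raw(rep(0:255, 40)), demo_file)
etag <- multipart_etag(demo_file, part_size = 4096)
```

Because every byte of every file is read and hashed, the cost grows with the total size of the files checked, which is why checking all files can be slow.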
spod_check_files(
  type = c("od", "origin-destination", "os", "overnight_stays", "nt", "number_of_trips"),
  zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni", "municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
  dates = NULL,
  data_dir = spod_get_data_dir(),
  quiet = FALSE,
  ignore_missing_dates = FALSE,
  n_threads = 1
)
type |
The type of data to download. Can be "origin-destination" (or "od"), "overnight_stays" (or "os"), or "number_of_trips" (or "nt"). |
zones |
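The zones for which to download the data. Can be "districts" (or "dist", "distr", "distritos"), "municipalities" (or "muni", "municip", "municipios"), or "large_urban_areas" (or "lua", "gau", "grandes_areas_urbanas"). |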
dates |
A The possible values can be any of the following:
|
data_dir |
The directory where the data is stored. Defaults to the value returned by spod_get_data_dir(). |
quiet |
A |
ignore_missing_dates |
Logical. If |
n_threads |
Numeric. Number of threads to use for file verification. Defaults to 1. When set to 2 or more threads, uses |
A tibble similar to the output of spod_available_data, but with an extra column local_file_consistent, where TRUE indicates that the file checksum matches the expected checksum in Amazon S3. Note: some v1 (2020-2021) files were not stored correctly on S3 and their ETag checksums are incorrectly reported by Amazon S3, so their true file sizes and ETag checksums were cached inside the spanishoddata package.
spod_set_data_dir(tempdir())
spod_download(
  type = "number_of_trips",
  zones = "distr",
  dates = "2020-03-14"
)

# now check the consistency
check_results <- spod_check_files(
  type = "number_of_trips",
  zones = "distr",
  dates = "2020-03-14"
)
all(check_results$local_file_consistent)
Cite the package and the data
spod_cite(what = "all", format = "all")
what |
Character vector specifying what to cite. Can include "package", "data", "methodology_v1", "methodology_v2", or "all". Default is "all". |
format |
Character vector specifying output format(s). Can include "text", "markdown", "bibtex", or "all". Default is "all". |
Nothing. Prints citation in plain text, markdown, BibTeX, or all formats at once to console.
# Cite everything in all formats
## Not run:
spod_cite()
## End(Not run)

# Cite just the package in BibTeX format
## Not run:
spod_cite(what = "package", format = "bibtex")
## End(Not run)

# Cite both methodologies in plain text
## Not run:
spod_cite(what = c("methodology_v1", "methodology_v2"), format = "text")
## End(Not run)
Opens the relevant vignette with a codebook for v1 (2020-2021) or v2 (2022 onwards) data, or provides a link to a webpage if the vignette is missing.
spod_codebook(ver = 1)
ver |
An |
Nothing, opens vignette if it is installed. If vignette is missing, prints a message with a link to a webpage with the codebook.
# View codebook for v1 (2020-2021) data
spod_codebook(ver = 1)

# View codebook for v2 (2022 onwards) data
spod_codebook(ver = 2)
DuckDB or hive-style parquet files
This function allows the user to quickly connect to the data converted to DuckDB with the spod_convert function. This function simplifies the connection process. The user is free to use the DBI and DuckDB packages to connect to the data manually, or to use the arrow package to connect to the parquet files folder.
spod_connect(
  data_path,
  target_table_name = NULL,
  quiet = FALSE,
  max_mem_gb = NULL,
  max_n_cpu = max(1, parallelly::availableCores() - 1),
  temp_path = spod_get_temp_dir()
)
data_path |
a path to the |
target_table_name |
Default is |
quiet |
A |
max_mem_gb |
|
max_n_cpu |
The maximum number of threads to use. Defaults to the number of available cores minus 1. |
temp_path |
The path to the temp folder for DuckDB for intermediate spilling in case the set memory limit and/or physical memory of the computer is too low to perform the query. By default this is set to the |
a DuckDB table connection object.
# Set data dir for file downloads
spod_set_data_dir(tempdir())

# download and convert data
dates_1 <- c(start = "2020-02-17", end = "2020-02-18")
db_2 <- spod_convert(
  type = "number_of_trips",
  zones = "distr",
  dates = dates_1,
  overwrite = TRUE
)

# now connect to the converted data
my_od_data_2 <- spod_connect(db_2)

# disconnect from the database
spod_disconnect(my_od_data_2)
Converts data for faster analysis into either a DuckDB file or parquet files in a hive-style directory structure. Running analysis on these files is sometimes 100 times faster than working with raw CSV files, especially when these are in gzip archives. To connect to converted data, please use 'mydata <- spod_connect(data_path = path_returned_by_spod_convert)', passing the path to where the data was saved. The connected mydata can be analysed using dplyr functions such as select, filter, mutate, group_by, summarise, etc. At the end of any sequence of commands you will need to add collect to execute the whole chain of data manipulations and load the results into memory in an R data.frame/tibble. For more in-depth usage of such data, please refer to DuckDB documentation and examples at https://duckdb.org/docs/api/r#dbplyr . Some more useful examples can be found here https://arrow-user2022.netlify.app/data-wrangling#combining-arrow-with-duckdb . You may also use the arrow package to work with parquet files https://arrow.apache.org/docs/r/.
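The "chain dplyr verbs, then collect" workflow described above can be tried without downloading anything. The sketch below uses a small in-memory DuckDB table in place of real converted data; the table name `toy_od` and its columns are hypothetical and chosen only for illustration, and the code assumes the DBI, duckdb, dplyr, and dbplyr packages are installed.

```r
library(DBI)
library(dplyr)

# In-memory DuckDB database standing in for a converted data file.
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")

# Toy table; column names are illustrative assumptions.
toy <- data.frame(
  date = as.Date(c("2020-02-17", "2020-02-17", "2020-02-18")),
  id_origin = c("A", "A", "B"),
  id_destination = c("B", "C", "A"),
  n_trips = c(10.5, 4, 7.25)
)
DBI::dbWriteTable(con, "toy_od", toy)

# A lazy table: nothing is computed until collect() is called.
lazy_tbl <- dplyr::tbl(con, "toy_od")

daily <- lazy_tbl |>
  group_by(date) |>
  summarise(total_trips = sum(n_trips, na.rm = TRUE)) |>
  arrange(date) |>
  collect() # executes the query in DuckDB and returns a tibble

DBI::dbDisconnect(con)
```

With real data, the object returned by spod_connect() takes the place of `lazy_tbl`, and spod_disconnect() takes the place of the manual disconnect.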
For detailed data descriptions, see package vignettes using spod_codebook(ver = 1) and spod_codebook(ver = 2) and official methodology documents in References section.
spod_convert(
  type = c("od", "origin-destination", "os", "overnight_stays", "nt", "number_of_trips"),
  zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni", "municip", "municipios"),
  dates = NULL,
  save_format = "duckdb",
  save_path = NULL,
  overwrite = FALSE,
  data_dir = spod_get_data_dir(),
  quiet = FALSE,
  max_mem_gb = NULL,
  max_n_cpu = max(1, parallelly::availableCores() - 1),
  max_download_size_gb = 1,
  ignore_missing_dates = FALSE
)
type |
The type of data to download. Can be "origin-destination" (or "od"), "overnight_stays" (or "os"), or "number_of_trips" (or "nt"). |
zones |
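The zones for which to download the data. Can be "districts" (or "dist", "distr", "distritos") or "municipalities" (or "muni", "municip", "municipios"). |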
dates |
A The possible values can be any of the following:
|
save_format |
A You can also set |
save_path |
A
|
overwrite |
A |
data_dir |
The directory where the data is stored. Defaults to the value returned by spod_get_data_dir(). |
quiet |
A |
max_mem_gb |
|
max_n_cpu |
The maximum number of threads to use. Defaults to the number of available cores minus 1. |
max_download_size_gb |
The maximum download size in gigabytes. Defaults to 1. |
ignore_missing_dates |
Logical. If |
Path to saved DuckDB database file or to a folder with parquet files in hive-style directory structure.
For the official website of the mobility study: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). “Estudio de la movilidad con Big Data (Study of mobility with Big Data).” Data License: https://www.transportes.gob.es/el-ministerio/buen-gobierno/licencia_datos, https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data.
For v1 data methodology: Ministerio de Transportes, Movilidad y Agenda Urbana (MITMA) (2021). Análisis de la movilidad en España con tecnología Big Data durante el estado de alarma para la gestión de la crisis del COVID-19 (Analysis of mobility in Spain with Big Data technology during the state of alarm for COVID-19 crisis management). https://cdn.mitma.gob.es/portal-web-drupal/covid-19/bigdata/mitma_-_estudio_movilidad_covid-19_informe_metodologico_v3.pdf.
For v2 data methodology: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). Estudio de movilidad de viajeros de ámbito nacional aplicando la tecnología Big Data. Informe metodológico (Study of National Traveler mobility Using Big Data Technology. Methodological Report). https://www.transportes.gob.es/recursos_mfom/paginabasica/recursos/a3_informe_metodologico_estudio_movilidad_mitms_v8.pdf.
For the spanishoddata R package: Kotov E, Vidal-Tortosa E, Cantú-Ros OG, Burrieza-Galán J, Herranz R, Gullón Muñoz-Repiso T, Lovelace R (2026). “spanishoddata: A package for accessing and working with Spanish Open Mobility Big Data.” Environment and Planning B: Urban Analytics and City Science. ISSN 2399-8083. doi:10.1177/23998083251415040.
Use spod_cite() to cite the package and the data with correct plain text, markdown, or BibTeX formats.
# Set data dir for file downloads
spod_set_data_dir(tempdir())

# download and convert data
dates_1 <- c(start = "2020-02-17", end = "2020-02-18")
db_2 <- spod_convert(
  type = "number_of_trips",
  zones = "distr",
  dates = dates_1,
  overwrite = TRUE
)

# now connect to the converted data
my_od_data_2 <- spod_connect(db_2)

# disconnect from the database
spod_disconnect(my_od_data_2)
This function ensures that DuckDB connections to CSV.gz files (created via spod_get()), as well as to DuckDB files or folders of parquet files (created via spod_convert()), are closed properly to prevent conflicting connections. Essentially, this is just a wrapper around DBI::dbDisconnect() that reaches into the .$src$con object of the tbl_duckdb_connection connection object that is returned to the user via spod_get() and spod_connect(). After disconnecting from the database, it also frees up memory by running gc().
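The wrapper idea described above can be sketched in a few lines. This is a hedged illustration, not the package's source: `disconnect_sketch()` is a hypothetical name, the toy table exists only for the demonstration, and the code assumes the DBI, duckdb, dplyr, and dbplyr packages are installed.

```r
library(DBI)
library(dplyr)

# Hypothetical stand-in for a disconnect wrapper: close the DBI connection
# stored inside the lazy table object, then optionally run gc().
disconnect_sketch <- function(tbl_con, free_mem = TRUE) {
  DBI::dbDisconnect(tbl_con$src$con)
  if (free_mem) {
    gc()
  }
  invisible(NULL)
}

# Toy usage with an in-memory DuckDB table.
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
DBI::dbWriteTable(con, "toy", data.frame(x = 1:3))
toy_tbl <- dplyr::tbl(con, "toy")

disconnect_sketch(toy_tbl)

# The underlying connection should no longer be usable.
still_valid <- DBI::dbIsValid(con)
```

The point of the wrapper is convenience: users holding only the lazy table do not need to know where the connection object lives.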
spod_disconnect(tbl_con, free_mem = TRUE)
tbl_con |
A |
free_mem |
A |
No return value, called for side effect of disconnecting from the database and freeing up memory.
# Set data dir for file downloads
spod_set_data_dir(tempdir())

# basic example
# create a connection to the v1 data without converting
# this creates a duckdb database connection to CSV files
od_distr <- spod_get(
  "od",
  zones = "distr",
  dates = c("2020-03-01", "2020-03-02")
)

# disconnect from the database connection
spod_disconnect(od_distr)

# Advanced example
# download and convert data
dates_1 <- c(start = "2020-02-17", end = "2020-02-19")
db_2 <- spod_convert(
  type = "od",
  zones = "distr",
  dates = dates_1,
  overwrite = TRUE
)

# now connect to the converted data
my_od_data_2 <- spod_connect(db_2)

# disconnect from the database
spod_disconnect(my_od_data_2)
This function downloads the data files of the specified type, zones, dates and data version.
For detailed data descriptions, see package vignettes using spod_codebook(ver = 1) and spod_codebook(ver = 2) and official methodology documents in References section.
spod_download(
  type = c("od", "origin-destination", "os", "overnight_stays", "nt", "number_of_trips"),
  zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni", "municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
  dates = NULL,
  max_download_size_gb = 1,
  data_dir = spod_get_data_dir(),
  quiet = FALSE,
  return_local_file_paths = FALSE,
  ignore_missing_dates = FALSE,
  check_local_files = TRUE
)
type |
The type of data to download. Can be "origin-destination" (or "od"), "overnight_stays" (or "os"), or "number_of_trips" (or "nt"). |
zones |
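The zones for which to download the data. Can be "districts" (or "dist", "distr", "distritos"), "municipalities" (or "muni", "municip", "municipios"), or "large_urban_areas" (or "lua", "gau", "grandes_areas_urbanas"). |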
dates |
A The possible values can be any of the following:
|
max_download_size_gb |
The maximum download size in gigabytes. Defaults to 1. |
data_dir |
The directory where the data is stored. Defaults to the value returned by spod_get_data_dir(). |
quiet |
A |
return_local_file_paths |
Logical. If |
ignore_missing_dates |
Logical. If |
check_local_files |
Logical. Whether to check the file size of local files against known remote file sizes on the Amazon S3 storage. Defaults to TRUE. |
Download the data files of specified type, zones, and dates
Nothing. If return_local_file_paths = TRUE, a character vector of the paths to the downloaded files.
For the official website of the mobility study: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). “Estudio de la movilidad con Big Data (Study of mobility with Big Data).” Data License: https://www.transportes.gob.es/el-ministerio/buen-gobierno/licencia_datos, https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data.
For v1 data methodology: Ministerio de Transportes, Movilidad y Agenda Urbana (MITMA) (2021). Análisis de la movilidad en España con tecnología Big Data durante el estado de alarma para la gestión de la crisis del COVID-19 (Analysis of mobility in Spain with Big Data technology during the state of alarm for COVID-19 crisis management). https://cdn.mitma.gob.es/portal-web-drupal/covid-19/bigdata/mitma_-_estudio_movilidad_covid-19_informe_metodologico_v3.pdf.
For v2 data methodology: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). Estudio de movilidad de viajeros de ámbito nacional aplicando la tecnología Big Data. Informe metodológico (Study of National Traveler mobility Using Big Data Technology. Methodological Report). https://www.transportes.gob.es/recursos_mfom/paginabasica/recursos/a3_informe_metodologico_estudio_movilidad_mitms_v8.pdf.
For the spanishoddata R package: Kotov E, Vidal-Tortosa E, Cantú-Ros OG, Burrieza-Galán J, Herranz R, Gullón Muñoz-Repiso T, Lovelace R (2026). “spanishoddata: A package for accessing and working with Spanish Open Mobility Big Data.” Environment and Planning B: Urban Analytics and City Science. ISSN 2399-8083. doi:10.1177/23998083251415040.
Use spod_cite() to cite the package and the data with correct plain text, markdown, or BibTeX formats.
# Set data dir for file downloads
spod_set_data_dir(tempdir())

# Download the number of trips on district level for a date range in March 2020
spod_download(
  type = "number_of_trips",
  zones = "districts",
  dates = c(start = "2020-03-20", end = "2020-03-21")
)

# Download the number of trips on district level for select dates in 2020 and 2021
spod_download(
  type = "number_of_trips",
  zones = "dist",
  dates = c("2020-03-20", "2020-03-24", "2021-03-20", "2021-03-24")
)

# Download the number of trips on municipality level using regex for a date range in March 2020
# (the regex will capture the dates 2020-03-20 to 2020-03-24)
spod_download(
  type = "number_of_trips",
  zones = "municip",
  dates = "2020032[0-4]"
)
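To see which dates the regular expression in the last example above would select, the pattern can be tested against candidate dates in base R. This sketch assumes the pattern is matched against dates written as YYYYMMDD strings, which is how the example pattern is written; it does not call the package.

```r
# Candidate dates around the range of interest.
all_dates <- seq(as.Date("2020-03-18"), as.Date("2020-03-26"), by = "day")

# Compact YYYYMMDD representation to match the pattern against.
compact <- format(all_dates, "%Y%m%d")

# Keep only the dates whose compact form matches the regular expression.
matched <- all_dates[grepl("2020032[0-4]", compact)]
matched
```

The character class `[0-4]` restricts the last digit of the day, so only 20 to 24 March 2020 are kept.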
This function creates a DuckDB lazy table connection object from the specified type and zones. It checks for missing data and downloads it if necessary. The connection is made to the raw CSV files in gzip archives, so analysing the data through this connection may be slow if you select more than a few days. You can manipulate this object using dplyr functions such as select, filter, mutate, group_by, summarise, etc. At the end of any sequence of commands you will need to add collect to execute the whole chain of data manipulations and load the results into memory in an R data.frame/tibble. See codebooks for v1 and v2 data in vignettes with spod_codebook(1) and spod_codebook(2).
If you want to analyse longer periods of time (especially several months or even the whole data over several years), consider using spod_convert and then spod_connect.
If you want to quickly get the origin-destination data with flows aggregated for a single day at municipal level and without any extra socio-economic variables, consider using the spod_quick_get_od function.
For detailed data descriptions, see package vignettes using spod_codebook(ver = 1) and spod_codebook(ver = 2) and official methodology documents in References section.
spod_get(
  type = c("od", "origin-destination", "os", "overnight_stays", "nt", "number_of_trips"),
  zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni", "municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
  dates = NULL,
  data_dir = spod_get_data_dir(),
  quiet = FALSE,
  max_mem_gb = NULL,
  max_n_cpu = max(1, parallelly::availableCores() - 1),
  max_download_size_gb = 1,
  duckdb_target = ":memory:",
  temp_path = spod_get_temp_dir(),
  ignore_missing_dates = FALSE
)
type |
The type of data to download. Can be "origin-destination" (or "od"), "overnight_stays" (or "os"), or "number_of_trips" (or "nt"). |
zones |
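The zones for which to download the data. Can be "districts" (or "dist", "distr", "distritos"), "municipalities" (or "muni", "municip", "municipios"), or "large_urban_areas" (or "lua", "gau", "grandes_areas_urbanas"). |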
dates |
A The possible values can be any of the following:
|
data_dir |
The directory where the data is stored. Defaults to the value returned by spod_get_data_dir(). |
quiet |
A |
max_mem_gb |
|
max_n_cpu |
The maximum number of threads to use. Defaults to the number of available cores minus 1. |
max_download_size_gb |
The maximum download size in gigabytes. Defaults to 1. |
duckdb_target |
(Optional) The path to the duckdb file to save the data to, if a conversion from CSV is requested by the |
temp_path |
The path to the temp folder for DuckDB for intermediate spilling in case the set memory limit and/or physical memory of the computer is too low to perform the query. By default this is set to the |
ignore_missing_dates |
Logical. If |
A DuckDB lazy table connection object of class tbl_duckdb_connection.
For the official website of the mobility study: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). “Estudio de la movilidad con Big Data (Study of mobility with Big Data).” Data License: https://www.transportes.gob.es/el-ministerio/buen-gobierno/licencia_datos, https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data.
For v1 data methodology: Ministerio de Transportes, Movilidad y Agenda Urbana (MITMA) (2021). Análisis de la movilidad en España con tecnología Big Data durante el estado de alarma para la gestión de la crisis del COVID-19 (Analysis of mobility in Spain with Big Data technology during the state of alarm for COVID-19 crisis management). https://cdn.mitma.gob.es/portal-web-drupal/covid-19/bigdata/mitma_-_estudio_movilidad_covid-19_informe_metodologico_v3.pdf.
For v2 data methodology: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). Estudio de movilidad de viajeros de ámbito nacional aplicando la tecnología Big Data. Informe metodológico (Study of National Traveler mobility Using Big Data Technology. Methodological Report). https://www.transportes.gob.es/recursos_mfom/paginabasica/recursos/a3_informe_metodologico_estudio_movilidad_mitms_v8.pdf.
For the spanishoddata R package: Kotov E, Vidal-Tortosa E, Cantú-Ros OG, Burrieza-Galán J, Herranz R, Gullón Muñoz-Repiso T, Lovelace R (2026). “spanishoddata: A package for accessing and working with Spanish Open Mobility Big Data.” Environment and Planning B: Urban Analytics and City Science. ISSN 2399-8083. doi:10.1177/23998083251415040.
Use spod_cite() to cite the package and the data with correct plain text, markdown, or BibTeX formats.
# create a connection to the v1 data
spod_set_data_dir(tempdir())
dates <- c("2020-02-14", "2020-03-14", "2021-02-14", "2021-02-14", "2021-02-15")
nt_dist <- spod_get(type = "number_of_trips", zones = "distr", dates = dates)

# nt_dist is a table view filtered to the specified dates

# for advanced users only
# access the source connection with all dates
# list tables
DBI::dbListTables(nt_dist$src$con)

# disconnect
spod_disconnect(nt_dist)
This function retrieves the data directory from the environment variable SPANISH_OD_DATA_DIR (and previously set by spod_set_data_dir()).
If the environment variable is not set, it returns the temporary directory.
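The lookup order described above (environment variable first, temporary directory otherwise) can be sketched in base R. `get_data_dir_sketch()` is a hypothetical illustration of that logic, not the package's implementation, and the "spod" subfolder is an arbitrary example path.

```r
# Hypothetical sketch: read SPANISH_OD_DATA_DIR, fall back to tempdir().
get_data_dir_sketch <- function() {
  data_dir <- Sys.getenv("SPANISH_OD_DATA_DIR", unset = "")
  if (nzchar(data_dir)) data_dir else tempdir()
}

# Remember the current value so the session can be restored afterwards.
old <- Sys.getenv("SPANISH_OD_DATA_DIR", unset = NA)

# Case 1: variable not set, so the temporary directory is returned.
Sys.unsetenv("SPANISH_OD_DATA_DIR")
dir_unset <- get_data_dir_sketch()

# Case 2: variable set, so its value is returned.
Sys.setenv(SPANISH_OD_DATA_DIR = file.path(tempdir(), "spod"))
dir_set <- get_data_dir_sketch()

# Restore the original state of the environment variable.
if (is.na(old)) {
  Sys.unsetenv("SPANISH_OD_DATA_DIR")
} else {
  Sys.setenv(SPANISH_OD_DATA_DIR = old)
}
```

Because the temporary directory is cleared when the R session ends, setting a persistent directory with spod_set_data_dir() avoids repeated downloads across sessions.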
spod_get_data_dir(quiet = FALSE)
quiet |
A |
A character vector of length 1 containing the path to the data directory where the package will download and convert the data.
spod_set_data_dir(tempdir())
spod_get_data_dir()
Get all metadata for requested data version and identify all dates available for download.
spod_get_valid_dates(ver = NULL)
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with spod_codebook(ver = 1) and spod_codebook(ver = 2). |
A vector of type Date with all possible valid dates for the specified data version (v1 for 2020-2021 and v2 for 2022 onwards).
# Get all valid dates for v1 (2020-2021) data
spod_get_valid_dates(ver = 1)

# Get all valid dates for v2 (2022 onwards) data
spod_get_valid_dates(ver = 2)
Get spatial zones for the specified data version. Supports both v1 (2020-2021) and v2 (2022 onwards) data.
For detailed data descriptions, see package vignettes using spod_codebook(ver = 1) and spod_codebook(ver = 2) and official methodology documents in References section.
    spod_get_zones(
      zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
        "municip", "municipios", "lua", "large_urban_areas", "gau",
        "grandes_areas_urbanas"),
      ver = NULL,
      data_dir = spod_get_data_dir(),
      quiet = FALSE
    )
zones |
The zones for which to download the data. Can be |
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with |
data_dir |
The directory where the data is stored. Defaults to the value returned by |
quiet |
A |
An sf object (Simple Feature collection).
The columns for v1 (2020-2021) data include:
A character vector containing the unique identifier for each district, assigned by the data provider. This id matches the id_origin, id_destination, and id in district-level origin-destination and number of trips data.
A string with semicolon-separated identifiers of census districts classified by the Spanish Statistical Office (INE) that are spatially bound within the polygons for each id.
A string with semicolon-separated municipality identifiers (as assigned by the data provider) corresponding to each district id.
A string with semicolon-separated municipality identifiers classified by the Spanish Statistical Office (INE) corresponding to each id.
A string with semicolon-separated district names (from the v2 version of this data) corresponding to each district id in v1.
A string with semicolon-separated district identifiers (from the v2 version of this data) corresponding to each district id in v1.
A MULTIPOLYGON column containing the spatial geometry of each district, stored as an sf object. The geometry is projected in the ETRS89 / UTM zone 30N coordinate reference system (CRS), with XY dimensions.
The columns for v2 (2022 onwards) data include:
A character vector containing the unique identifier for each zone, assigned by the data provider.
A character vector with the name of each district.
A numeric vector representing the population of each district (as of 2022).
A string with semicolon-separated identifiers of census sections corresponding to each district.
A string with semicolon-separated identifiers of census districts as classified by the Spanish Statistical Office (INE) corresponding to each district.
A string with semicolon-separated identifiers of municipalities classified by the Spanish Statistical Office (INE) corresponding to each district.
A string with semicolon-separated identifiers of municipalities, as assigned by the data provider, that correspond to each district.
A string with semicolon-separated identifiers of LUAs (large urban areas) from the provider, associated with each district.
A string with semicolon-separated district identifiers from v1 data corresponding to each district in v2. If no match exists, it is marked as NA.
A MULTIPOLYGON column containing the spatial geometry of each district, stored as an sf object. The geometry is projected in the ETRS89 / UTM zone 30N coordinate reference system (CRS), with XY dimensions.
For the official website of the mobility study: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). “Estudio de la movilidad con Big Data (Study of mobility with Big Data).” Data License: https://www.transportes.gob.es/el-ministerio/buen-gobierno/licencia_datos, https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data.
For v1 data methodology: Ministerio de Transportes, Movilidad y Agenda Urbana (MITMA) (2021). Análisis de la movilidad en España con tecnología Big Data durante el estado de alarma para la gestión de la crisis del COVID-19 (Analysis of mobility in Spain with Big Data technology during the state of alarm for COVID-19 crisis management). https://cdn.mitma.gob.es/portal-web-drupal/covid-19/bigdata/mitma_-_estudio_movilidad_covid-19_informe_metodologico_v3.pdf.
For v2 data methodology: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). Estudio de movilidad de viajeros de ámbito nacional aplicando la tecnología Big Data. Informe metodológico (Study of National Traveler mobility Using Big Data Technology. Methodological Report). https://www.transportes.gob.es/recursos_mfom/paginabasica/recursos/a3_informe_metodologico_estudio_movilidad_mitms_v8.pdf.
For the spanishoddata R package: Kotov E, Vidal-Tortosa E, Cantú-Ros OG, Burrieza-Galán J, Herranz R, Gullón Muñoz-Repiso T, Lovelace R (2026). “spanishoddata: A package for accessing and working with Spanish Open Mobility Big Data.” Environment and Planning B: Urban Analytics and City Science. ISSN 2399-8083. doi:10.1177/23998083251415040.
Use spod_cite() to cite the package and the data with correct plain text, markdown, or BibTeX formats.
    # get polygons for municipalities for the v2 data
    municip_v2 <- spod_get_zones(zones = "municipalities", ver = 2)

    # get polygons for the districts for the v1 data
    distr_v1 <- spod_get_zones(zones = "districts", ver = 1)
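Several columns of the returned object store multiple identifiers as a single semicolon-separated string. The sketch below shows one way to expand such a column into a long lookup table using base R. It runs on a toy data frame, and the column names `id` and `municipality_ids` as well as the values are assumptions made for illustration; substitute the actual column names from the object returned by spod_get_zones().

```r
# Toy stand-in for a zones table; column names and values are hypothetical.
zones_toy <- data.frame(
  id = c("A", "B"),
  municipality_ids = c("28079", "08019;08101"),
  stringsAsFactors = FALSE
)

# Split each string on ";" to get one character vector per zone
ids_list <- strsplit(zones_toy$municipality_ids, ";", fixed = TRUE)

# Build a long lookup table with one row per (zone, municipality) pair
lookup <- data.frame(
  id = rep(zones_toy$id, lengths(ids_list)),
  municipality_id = unlist(ids_list),
  stringsAsFactors = FALSE
)
```

The long format makes it straightforward to join zone identifiers from one classification to another with `merge()`.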
WARNING: this function may stop working at any time, as the API may change.

This function provides a quick way to get daily aggregated (no hourly data) trip counts per origin-destination municipality pair from v2 data (2022 onwards). Unlike spod_get(), which downloads large CSV files, it retrieves the data directly from the GraphQL API. An interactive web map with this data is available at https://mapas-movilidad.transportes.gob.es/.

No data aggregation is performed on your computer (unlike in spod_get()), so memory usage is not a concern and you do not need a powerful computer with multiple CPU cores to get this simple data. Only about 1 MB of data is downloaded for a single day. The limitation is that the function can retrieve data for only a single day at a time, and only the total number of trips and total kilometres travelled, so none of the extra variables available in the full dataset via spod_get() can be obtained.
For detailed data descriptions, see package vignettes using spod_codebook(ver = 1) and spod_codebook(ver = 2) and official methodology documents in References section.
    spod_quick_get_od(
      date = NA,
      min_trips = 100,
      distances = c("500m-2km", "2-10km", "10-50km", "50+km"),
      id_origin = NA,
      id_destination = NA
    )
date |
A character or Date object specifying the date for which to retrieve the data. If date is a character, the date must be in "YYYY-MM-DD" or "YYYYMMDD" format. |
min_trips |
A numeric value specifying the minimum number of journeys per origin-destination pair to retrieve. Defaults to 100 to reduce the amount of data returned. Can be set to 0 to retrieve all data. |
distances |
A character vector specifying the distances to retrieve. Valid values are "500m-2km", "2-10km", "10-50km", and "50+km". Defaults to |
id_origin |
A character vector specifying the origin municipalities to retrieve. If not provided, all origin municipalities will be included. Valid municipality IDs can be found in the dataset returned by |
id_destination |
A character vector specifying the target municipalities to retrieve. If not provided, all target municipalities will be included. Valid municipality IDs can be found in the dataset returned by |
A tibble containing the flows for the specified date, minimum number of journeys, distances and origin-destination pairs if specified. The columns are:
The date of the trips.
The origin municipality ID.
The target municipality ID.
The number of trips between the origin and target municipality.
The total length of trips in kilometers.
For the official website of the mobility study: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). “Estudio de la movilidad con Big Data (Study of mobility with Big Data).” Data License: https://www.transportes.gob.es/el-ministerio/buen-gobierno/licencia_datos, https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data.
For v1 data methodology: Ministerio de Transportes, Movilidad y Agenda Urbana (MITMA) (2021). Análisis de la movilidad en España con tecnología Big Data durante el estado de alarma para la gestión de la crisis del COVID-19 (Analysis of mobility in Spain with Big Data technology during the state of alarm for COVID-19 crisis management). https://cdn.mitma.gob.es/portal-web-drupal/covid-19/bigdata/mitma_-_estudio_movilidad_covid-19_informe_metodologico_v3.pdf.
For v2 data methodology: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). Estudio de movilidad de viajeros de ámbito nacional aplicando la tecnología Big Data. Informe metodológico (Study of National Traveler mobility Using Big Data Technology. Methodological Report). https://www.transportes.gob.es/recursos_mfom/paginabasica/recursos/a3_informe_metodologico_estudio_movilidad_mitms_v8.pdf.
For the spanishoddata R package: Kotov E, Vidal-Tortosa E, Cantú-Ros OG, Burrieza-Galán J, Herranz R, Gullón Muñoz-Repiso T, Lovelace R (2026). “spanishoddata: A package for accessing and working with Spanish Open Mobility Big Data.” Environment and Planning B: Urban Analytics and City Science. ISSN 2399-8083. doi:10.1177/23998083251415040.
Use spod_cite() to cite the package and the data with correct plain text, markdown, or BibTeX formats.
    od_1000 <- spod_quick_get_od(
      date = "2022-01-01",
      min_trips = 1000
    )
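The `date` argument accepts a Date object or a string in either "YYYY-MM-DD" or "YYYYMMDD" format. When dates come from user input, it can help to normalise them before calling the function. The sketch below is a hypothetical base R helper, `parse_date_sketch()`, written for illustration; it is not how the package parses dates internally.

```r
# Hypothetical helper; not part of the package.
# Accepts a single Date, "YYYY-MM-DD" string, or "YYYYMMDD" string.
parse_date_sketch <- function(x) {
  if (inherits(x, "Date")) {
    return(x)
  }
  # Choose the format based on whether the string contains dashes
  fmt <- if (grepl("-", x, fixed = TRUE)) "%Y-%m-%d" else "%Y%m%d"
  parsed <- as.Date(x, format = fmt)
  if (is.na(parsed)) {
    stop("date must be in 'YYYY-MM-DD' or 'YYYYMMDD' format: ", x)
  }
  parsed
}

# Both spellings resolve to the same Date
d1 <- parse_date_sketch("2022-01-01")
d2 <- parse_date_sketch("20220101")
```

Since the function retrieves a single day per call, the helper deliberately handles one value at a time; looping over several days with `lapply()` is left to the caller.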
This function fetches the municipality geometries (currently the only option) from the mapas-movilidad website and returns them as an sf object. It is intended for use with the flows data retrieved by the spod_quick_get_od() function. An interactive web map with this data is available at https://mapas-movilidad.transportes.gob.es/. These geometries include only Spanish municipalities (not the NUTS3 regions in Portugal and France) and do not contain the extra columns that you can get with the spod_get_zones() function. The function caches the retrieved geometries in the memory of the current R session to reduce the number of requests to the mapas-movilidad website.
For detailed zone definitions and methodology, see the official methodology reports. For v1 data: Ministerio de Transportes, Movilidad y Agenda Urbana (MITMA) (2021). Análisis de la movilidad en España con tecnología Big Data durante el estado de alarma para la gestión de la crisis del COVID-19 (Analysis of mobility in Spain with Big Data technology during the state of alarm for COVID-19 crisis management). https://cdn.mitma.gob.es/portal-web-drupal/covid-19/bigdata/mitma_-_estudio_movilidad_covid-19_informe_metodologico_v3.pdf. For v2 data: Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). Estudio de movilidad de viajeros de ámbito nacional aplicando la tecnología Big Data. Informe metodológico (Study of National Traveler mobility Using Big Data Technology. Methodological Report). https://www.transportes.gob.es/recursos_mfom/paginabasica/recursos/a3_informe_metodologico_estudio_movilidad_mitms_v8.pdf.
    spod_quick_get_zones(zones = "municipalities")
zones |
A character string specifying the zones to retrieve. Valid values are "municipalities", "muni", "municip", and "municipios". Defaults to "municipalities". |
An sf object with the municipality geometries to match with the data retrieved with spod_quick_get_od().
Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024). “Estudio de la movilidad con Big Data (Study of mobility with Big Data).” Data License: https://www.transportes.gob.es/el-ministerio/buen-gobierno/licencia_datos, https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data.
    municipalities_sf <- spod_quick_get_zones()
This function sets the data directory in the environment variable SPANISH_OD_DATA_DIR, so that all other functions in the package can access the data. It also creates the directory if it doesn't exist.
    spod_set_data_dir(data_dir, quiet = FALSE)
data_dir |
The data directory to set. |
quiet |
A |
Nothing. If quiet is FALSE, prints a message with the path and confirmation that the path exists.
    spod_set_data_dir(tempdir())
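The two documented effects of this function, creating the directory if needed and storing its path in SPANISH_OD_DATA_DIR, can be illustrated with base R. The sketch below is an approximation under those stated assumptions, not the package's code; `set_data_dir_sketch()` and the `spod_sketch/cache` subfolder are hypothetical names chosen for the example.

```r
# Hypothetical helper mirroring the documented behaviour; not package code.
set_data_dir_sketch <- function(data_dir) {
  # Create the directory (and any missing parents) if it does not exist
  if (!dir.exists(data_dir)) {
    dir.create(data_dir, recursive = TRUE)
  }
  # Store the path where other functions can look it up
  Sys.setenv(SPANISH_OD_DATA_DIR = data_dir)
  invisible(NULL)
}

# Remember the current value so the session can be restored afterwards
old_value <- Sys.getenv("SPANISH_OD_DATA_DIR", unset = NA_character_)

# Use a nested folder inside the temporary directory for the demonstration
target <- file.path(tempdir(), "spod_sketch", "cache")
set_data_dir_sketch(target)

dir_created <- dir.exists(target)
stored <- Sys.getenv("SPANISH_OD_DATA_DIR")

# Restore the previous state of the environment variable
if (is.na(old_value)) {
  Sys.unsetenv("SPANISH_OD_DATA_DIR")
} else {
  Sys.setenv(SPANISH_OD_DATA_DIR = old_value)
}
```

An environment variable set this way lasts only for the current R session, so a persistent setup would need the variable defined in a startup file such as `.Renviron`.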