---
title: "Use ecocomDP Data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Use ecocomDP Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
```

A set of "use" functions help you search, read, manipulate, plot, and save ecocomDP data.

```{r, eval=FALSE, setup}
library(ecocomDP)
```

```{r include=FALSE}
library(ecocomDP)
```

## Search

Find data of interest by searching on one or more query parameters.

All available datasets are returned with an empty call to `search_data()`:

```{r eval=FALSE}
search_data()
#> # A tibble: 82 x 11
#> source id title description abstract years sampling_interv~ sites url
#>
#> 1 EDI edi.3~ Fores~ "This da~ 8 0.01 https~
#> 2 EDI edi.2~ Zoopl~ "This da~ 9 0.19 https~
#> 3 EDI edi.3~ Long ~ "This da~ 21 0.02 https~
#> 4 EDI edi.3~ Dune ~ "This da~ 7 0.15 https~
#> # ... with 78 more rows, and 2 more variables: source_id ,
#> # source_id_url
```

The "taxa" parameter enables searching on any taxonomic rank value. Find datasets about insects:

```{r eval=FALSE}
search_data(taxa = "Insecta")
#> # A tibble: 13 x 11
#> source id title description abstract years sampling_interv~ sites url
#>
#> 1 EDI edi.2~ Moss ~ "This da~ 23 821.61 https~
#> 2 EDI edi.2~ Bonan~ "This da~ 38 0.41 https~
#> 3 EDI edi.3~ Aphid~ "This da~ 4 0 https~
#> 4 EDI edi.3~ CGR02~ "This da~ 30 0.06 https~
#> # ... with 9 more rows, and 2 more variables: source_id ,
#> # source_id_url
```

The "num_years" parameter lets you find datasets within a range of sampling years. Datasets with 20 to 40 years of sampling:

```{r eval=FALSE}
search_data(num_years = c(20, 40))
#> # A tibble: 22 x 11
#> source id title description abstract years sampling_interv~ sites url
#>
#> 1 EDI edi.3~ Long ~ NA "This da~ 21 0.02 https~
#> 2 EDI edi.2~ Moss ~ NA "This da~ 23 822. https~
#> 3 EDI edi.3~ Annua~ NA "This da~ 20 0 https~
#> 4 EDI edi.3~ SGS-L~ NA "This da~ 26 0.05 https~
#> # ... with 18 more rows, and 2 more variables: source_id ,
#> # source_id_url
```

The "area" parameter facilitates searches by geographic regions. Datasets originating from the Northeast United States:

```{r eval=FALSE}
search_data(area = c(47, -70, 38, -90))
#> # A tibble: 55 x 11
#> source id title description abstract years sampling_interv~ sites url
#>
#> 1 EDI edi.3~ Fores~ "This da~ 8 0.01 https~
#> 2 EDI edi.3~ Long ~ "This da~ 21 0.02 https~
#> 3 EDI edi.3~ Dune ~ "This da~ 7 0.15 https~
#> 4 EDI edi.3~ SGS-L~ "This da~ 7 0 https~
#> # ... with 51 more rows, and 2 more variables: source_id ,
#> # source_id_url
```

Combine query parameters to narrow results:

```{r eval=FALSE}
search_data(
  taxa = "Insecta",
  num_years = c(20, 100),
  area = c(47, -70, 38, -90))
#> # A tibble: 4 x 11
#> source id title description abstract years sampling_interv~ sites url
#>
#> 1 EDI edi.3~ CGR02~ NA "This da~ 30 0.06 https~
#> 2 EDI edi.2~ North~ NA "This da~ 36 10.5 https~
#> 3 EDI edi.2~ North~ NA "This da~ 35 10.8 https~
#> 4 EDI edi.2~ North~ NA "This da~ 36 10.5 https~
#> # ... with 2 more variables: source_id , source_id_url
```

Assign results to a variable and learn more:

```{r eval=FALSE}
r <- search_data(
  taxa = "Insecta",
  num_years = c(20, 100),
  area = c(47, -70, 38, -90))

r$title
#> [1] "CGR02 Sweep Sampling of Grasshoppers on Konza Prairie LTER watersheds (Reformatted to ecocomDP Design Pattern)"
#> [2] "North Temperate Lakes LTER: Pelagic Macroinvertebrate Summary 1983 - current (Reformatted to ecocomDP Design Pattern)"
#> [3] "North Temperate Lakes LTER: Benthic Macroinvertebrates 1981 - current (Reformatted to ecocomDP Design Pattern)"
#> [4] "North Temperate Lakes LTER: Pelagic Macroinvertebrate Abundance 1983 - current (Reformatted to ecocomDP Design Pattern)"
```

See `?search_data` for additional query parameters.
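
Search results are returned as a tibble, so ordinary data frame operations apply to them. The sketch below uses a small stand-in data frame (hypothetical, with shortened titles and `years` values copied from the combined query output above rather than from a live call to `search_data()`) to show one way of narrowing results further:

```{r}
# Stand-in for the tibble returned by search_data(); only two columns are
# mocked here, and the object name `results` is chosen for illustration.
results <- data.frame(
  title = c(
    "CGR02 Sweep Sampling of Grasshoppers on Konza Prairie LTER watersheds",
    "North Temperate Lakes LTER: Pelagic Macroinvertebrate Summary",
    "North Temperate Lakes LTER: Benthic Macroinvertebrates",
    "North Temperate Lakes LTER: Pelagic Macroinvertebrate Abundance"),
  years = c(30, 36, 35, 36),
  stringsAsFactors = FALSE
)

# Keep only datasets with at least 36 years of sampling
long_running <- results[results$years >= 36, ]
long_running$title
```

The same subsetting should work on a real result, assuming it carries the `years` column shown in the printed output.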

## Read

Read data from host APIs to get the newest authoritative version.

Read from the host API:

```{r eval=FALSE}
dataset_1 <- read_data("edi.193.5")
#> Reading edi.193.5
#> [0%] Downloaded 0 bytes...
#> [0%] Downloaded 0 bytes...
#> [0%] Downloaded 0 bytes...
#> [0%] Downloaded 0 bytes...
#> [0%] Downloaded 0 bytes...
#> [0%] Downloaded 0 bytes...
#> [0%] Downloaded 0 bytes...
#> [0%] Downloaded 0 bytes...
#>
#> Validating edi.193.5:
#>   Required tables
#>   Column names
#>   Required columns
#>   Column classes
#>   Datetime formats
#>   Primary keys
#>   Composite keys
#>   Referential integrity
#>   Latitude and longitude format
#>   Latitude and longitude range
#>   Elevation
#>   variable_mapping
```

Read from the host API with filters when datasets are large (currently only for NEON datasets):

```{r eval=FALSE}
dataset_2 <- read_data(
  id = "neon.ecocomdp.20120.001.001",
  site = c("COMO", "LECO", "SUGG"),
  startdate = "2017-06",
  enddate = "2019-09",
  check.size = FALSE)
#> Finding available files
#> |==================================================================| 100%
#>
#> Downloading files totaling approximately 1.588594 MB
#> Downloading 20 files
#> |====================================================================| 100%
#>
#> Unpacking zip files using 1 cores.
#> |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
#> Stacking operation across a single core.
#> Stacking table inv_fieldData
#> |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
#> Stacking table inv_persample
#> |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
#> Stacking table inv_pervial
#> |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
#> Stacking table inv_taxonomyProcessed
#> |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02s
#> Stacking table inv_taxonomyRaw
#> |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02s
#> Copied the most recent publication of validation file to /stackedFiles
#> Copied the most recent publication of categoricalCodes file to /stackedFiles
#> Copied the most recent publication of variable definition file to /stackedFiles
#> Finished: Stacked 5 data tables and 3 metadata tables!
#> Stacking took 4.732454 secs
#> Joining, by = c("uid", "sampleID")
#> Joining, by = "sampleID"
#>
#> Validating neon.ecocomdp.20120.001.001:
#>   Required tables
#>   Column names
#>   Required columns
#>   Column classes
#>   Datetime formats
#>   Primary keys
#>   Composite keys
#>   Referential integrity
#>   Latitude and longitude format
#>   Latitude and longitude range
#>   Elevation
#>   variable_mapping
```

Data are returned as a list of metadata, tables, and validation issues (if there are any). The dataset identifier is at the top of this list.

```{r include=FALSE}
dataset_1 <- ants_L1
```

```{r}
str(dataset_1)
```

## Manipulate

Working with a "flattened" dataset simplifies common tasks (select, filter, arrange, group, summarize). A "flat" version is one in which all tables have been joined and spread wide, except for the core observation variables, which remain in long form.

```{r}
flat <- flatten_data(dataset_1)
flat
```

## Plot

Visually explore data with the plotting functions:

```{r fig.height=4, fig.width=7}
plot_taxa_accum_time(flat)

plot_taxa_diversity(flat)

plot_taxa_diversity(flat, time_window_size = "month")

plot_taxa_diversity(flat, time_window_size = "year")

plot_sample_space_time(flat)

plot_taxa_shared_sites(flat)

plot_taxa_rank(flat, facet_var = "location_id")

plot_taxa_occur_freq(
  data = flat,
  facet_var = "location_id",
  color_var = "taxon_rank")

plot_taxa_abund(
  data = flat,
  facet_var = "location_id",
  color_var = "taxon_rank",
  trans = "log10")

plot_sites(flat)
```

## Save

Save local copies and read them back in later.

Save a local copy as .rds:

```{r include=FALSE}
datasets <- c(ants_L1, ants_L1)
```

```{r, eval=FALSE}
datasets <- list(dataset_1, dataset_2)

mypath <- paste0(tempdir(), "/data")
dir.create(mypath)

save_data(datasets, mypath)
```

Read a local copy from .rds:

```{r, eval=FALSE}
datasets <- read_data(from = paste0(mypath, "/datasets.rds"))
#> Validating edi.193.5:
#>   Required tables
#>   Column names
#>   Required columns
#>   Column classes
#>   Datetime formats
#>   Primary keys
#>   Composite keys
#>   Referential integrity
#>   Latitude and longitude format
#>   Latitude and longitude range
#>   Elevation
#>   variable_mapping
#> Validating neon.ecocomdp.20120.001.001:
#>   Required tables
#>   Column names
#>   Required columns
#>   Column classes
#>   Datetime formats
#>   Primary keys
#>   Composite keys
#>   Referential integrity
#>   Latitude and longitude format
#>   Latitude and longitude range
#>   Elevation
#>   variable_mapping
```

Save a local copy as .csv:

```{r, eval=FALSE}
save_data(datasets, mypath, type = ".csv")
```

Read a local copy from .csv:

```{r, eval=FALSE}
datasets <- read_data(from = mypath)
#> Validating edi.193.5:
#>   Required tables
#>   Column names
#>   Required columns
#>   Column classes
#>   Datetime formats
#>   Primary keys
#>   Composite keys
#>   Referential integrity
#>   Latitude and longitude format
#>   Latitude and longitude range
#>   Elevation
#>   variable_mapping
#> Validating neon.ecocomdp.20120.001.001:
#>   Required tables
#>   Column names
#>   Required columns
#>   Column classes
#>   Datetime formats
#>   Primary keys
#>   Composite keys
#>   Referential integrity
#>   Latitude and longitude format
#>   Latitude and longitude range
#>   Elevation
#>   variable_mapping
```
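
The save examples above build the target directory with `paste0()` and `dir.create()`. As a minimal base R sketch (not part of ecocomDP, and only one of several reasonable ways to do this), `file.path()` assembles the path with the platform's separator, and a `dir.exists()` guard avoids the warning that `dir.create()` emits when the directory is already there:

```{r}
# Build the path to the save directory inside the session's temp folder
mypath <- file.path(tempdir(), "data")

# Create the directory only if it does not exist yet, so the chunk can be
# re-run without warnings
if (!dir.exists(mypath)) {
  dir.create(mypath, recursive = TRUE)
}

dir.exists(mypath)
```

Because the directory lives under `tempdir()`, anything saved there is removed when the R session ends; a permanent location would be needed to keep copies across sessions.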