Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Handling of Query Limit Issues #13

Open
Nova-Scotia opened this issue Jan 27, 2020 · 6 comments
Open

Improve Handling of Query Limit Issues #13

Nova-Scotia opened this issue Jan 27, 2020 · 6 comments
Assignees
Labels
bug todo Priority features to implement

Comments

@Nova-Scotia
Copy link

I just updated kiwisR to take advantage of some newer functionality (ability to query other fields with ki_timeseries_values - thanks for adding this)!

Previously, I could run this code without errors:

library(tidyverse)
library(kiwisR)

# List all stations in swmc hub that match "SNOW-WLF" 
mystations <- ki_station_list(hub = myhub) %>% 
  filter(str_sub(station_no, 1, 8) == "SNOW-WLF" &
           !station_no %in% c("SNOW-WLF-XX", "SNOW-WLF-TEMP"))

# List all codes that connect the station codes with their available time series.
my_station_ids <- unique(mystations$station_id)

# List all timeseries associated with each code.
available_ts <- ki_timeseries_list(
  hub = myhub, 
  station_id = my_station_ids
)

Now, when I run the last line of code (list all timeseries associated with each code), I get the error, "Error: object of type 'closure' is not subsettable".

I tried putting the last line in a loop and it works ok - I thought I could find the problem station by doing this here, but I didn't.

for (i in seq_along(my_station_ids)) {
available_ts <- ki_timeseries_list(
  hub = myhub, 
  station_id = my_station_ids[i]
)
}

Can you reproduce this issue? If so, any idea on how to fix it?

@Nova-Scotia
Copy link
Author

Nova-Scotia commented Jan 28, 2020

The code works this morning for test db. Haven't retried it on prod. Maybe it's related to the recent issues in production db?

@rywhale
Copy link
Owner

rywhale commented Jan 28, 2020

@Nova-Scotia Could be server-related but I think this might also be due to passing too many station_ids to ki_timeseries_list. If I run your example with hub="swmc" I get 200+ unique station_ids:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)
library(kiwisR)
#> Warning: package 'kiwisR' was built under R version 3.6.2

# List all stations in swmc hub that match "SNOW-WLF"
mystations <- ki_station_list(
  hub = "swmc"
  ) %>%
  filter(
    str_sub(station_no, 1, 8) == "SNOW-WLF" &
    !station_no %in% c(
      "SNOW-WLF-XX", 
      "SNOW-WLF-TEMP"
      )
    )

# List all codes that connect the station codes with their available time series.
my_station_ids <- unique(mystations$station_id)

print(length(my_station_ids))
#> [1] 218

Created on 2020-01-28 by the reprex package (v0.3.0)

You can get around this by chunking your query (e.g.- by filtering ts_name when you call ki_timeseries_list instead of getting every time series). Alternatively, you can loop and build the table with purrr::map_df. As a bonus, here's a method that will add a progress bar:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)
library(kiwisR)
#> Warning: package 'kiwisR' was built under R version 3.6.2
library(purrr)

# List all stations in swmc hub that match "SNOW-WLF"
mystations <- ki_station_list(
  hub = "swmc"
  ) %>%
  filter(
    str_sub(station_no, 1, 8) == "SNOW-WLF" &
    !station_no %in% c(
      "SNOW-WLF-XX", 
      "SNOW-WLF-TEMP"
      )
    )

# List all codes that connect the station codes with their available time series.
my_station_ids <- unique(mystations$station_id)

# Init progress bar
pb <- progress_estimated(length(my_station_ids))

station_ts <- my_station_ids %>%
  purrr::map_df(
    function(stn_id){
      # Add tick to bar
      pb$tick()$print()
      
      ki_timeseries_list(
        hub = "swmc",
        station_id = stn_id
      )
    }
  )

head(station_ts)
#> # A tibble: 6 x 6
#>   station_name station_id ts_id  ts_name from                to                 
#>   <chr>        <chr>      <chr>  <chr>   <dttm>              <dttm>             
#> 1 ALEXANDRIA   967631     12280~ SPG.Av~ NA                  NA                 
#> 2 ALEXANDRIA   967631     12156~ SPG.9.P NA                  NA                 
#> 3 ALEXANDRIA   967631     12156~ SPG.10~ NA                  NA                 
#> 4 ALEXANDRIA   967631     12156~ SPG.5.P NA                  NA                 
#> 5 ALEXANDRIA   967631     12156~ SPG.Av~ NA                  NA                 
#> 6 ALEXANDRIA   967631     12280~ SPG.Me~ NA                  NA

Created on 2020-01-28 by the reprex package (v0.3.0)

Open to suggestions on long-term fixes for the query limit issues. Two potential options:

  • detect the number of ids provided by the user and chunk the requests in the background (like above with added rate limiting)
  • add a warning to the docs and return a more informative error when this happens

@Nova-Scotia
Copy link
Author

Nova-Scotia commented Jan 28, 2020

Nice ideas - thanks so much for taking the time to look into this. I'll try out your posted solution and I LOVE progress bars :). The server limit is definitely an annoying barrier.

Just wanted to note though - my previous script did work before I updated from an older version of kiwisR to the newer version. Did anything change? Test also has about the same amount of data as prod.

@Nova-Scotia
Copy link
Author

Also, I like the first option if you're going to add a perma-fix - including the progress bar!

@rywhale
Copy link
Owner

rywhale commented Jan 28, 2020

@Nova-Scotia There are some changes coming down in R 4.0 related to classes/inheritance and I had to adjust some of the internal type checks to reflect this (#11, commit). More information on this here if you're interested in learning more.

I'm guessing that we're seeing this message when kiwisR fails to properly present the error message returned by the server upon reaching/surpassing the query limit. As to why you didn't hit the limit before- have no idea tbh. Perhaps the query limit is flexible/linked to how busy the server is 🤷‍♂ ?

@rywhale rywhale changed the title Error with ki_timeseries_list after updating kiwisR: "Error: object of type 'closure' is not subsettable" Improve Handling of Query Limit Issues Jan 28, 2020
@rywhale rywhale added bug todo Priority features to implement and removed needs investigation labels Jan 28, 2020
@rywhale rywhale self-assigned this Feb 14, 2020
@Nova-Scotia
Copy link
Author

Nova-Scotia commented Mar 6, 2020

Hey Ryan, just a quick FYI - it seems like the behaviour of the server is unpredictable. I just tried a minor variation on the code you supplied and got an error:

# Init progress bar
pb <- progress_estimated(length(stations$station_id))

# Grab data in chunks so you don't hit the server limits
station_ts <- stations$station_id %>%
  purrr::map_df(
    function(stn_id){
      # Add tick to bar
      pb$tick()$print()
      
      ki_timeseries_list(
        hub = myhub,
        station_id = stn_id
      )
    }
  )

| | 0% ~11 s remaining Error: object of type 'closure' is not subsettable

I dislike these server limits very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug todo Priority features to implement
Projects
None yet
Development

No branches or pull requests

2 participants