Error for categories released over time #217

Open
robjhyndman opened this issue Aug 31, 2022 · 7 comments

Comments

@robjhyndman

readabs::read_abs("8501.0", tables = 11)
#> Finding URLs for tables corresponding to ABS catalogue 8501.0
#> Attempting to download files from catalogue 8501.0, Retail Trade, Australia
#> Downloading https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release/8501011.xlsx
#> Warning in utils::download.file(url = url, destfile = destfile, mode = "wb", :
#> cannot open URL 'https://www.abs.gov.au/statistics/industry/retail-and-
#> wholesale-trade/retail-trade-australia/jul-2022/8501011.xlsx': HTTP status was
#> '404 Not Found'
#> Error in utils::download.file(url = url, destfile = destfile, mode = "wb", : cannot open URL 'https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release/8501011.xlsx'

Created on 2022-08-31 with reprex v2.0.2

This occurs when some tables are released earlier than other tables. In this example, it will error until 5 September when the remaining tables are released.

@MattCowgill
Owner

Ah this is frustrating. Thanks for bringing it to my attention, @robjhyndman.

@MattCowgill
Owner

The problem is on the ABS side. Their Time Series Directory API returns https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release/8501011.xlsx as the URL for this table. However, as the table wasn't included in the latest release, that URL is invalid.

Two options:

  1. Ask the ABS to amend this, so that the URL returned by the TSD is valid.
  2. Amend readabs's behaviour. I think the best option here is to first try the URL returned by the ABS TSD API, and then try a modified URL if that fails. The modified URL would replace latest-release in the URL with a dated path segment (for example, jun-2022). We can infer the latest month from the ProductIssue field in the XML returned by the TSD API.

I will try (1) first, and will do (2) if the ABS is unable to assist.
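
A minimal sketch of the URL rewrite described in option 2, in R. The helper name `dated_release_url()` is hypothetical and not part of readabs. It assumes the issue date has already been parsed into a `Date`; how `ProductIssue` is formatted in the TSD XML is not shown in this thread. It uses `month.abb` rather than `format(x, "%b")` so the month abbreviation does not depend on the session locale.

```r
# Hypothetical helper (not part of readabs): swap "latest-release" in an
# ABS URL for a dated path segment such as "jun-2022".
dated_release_url <- function(url, issue_date) {
  # Assumption: issue_date is a single Date, already parsed from the TSD response
  stopifnot(inherits(issue_date, "Date"), length(issue_date) == 1)

  # month.abb is always English, unlike format(x, "%b"), which follows the locale
  month <- tolower(month.abb[as.integer(format(issue_date, "%m"))])
  year <- format(issue_date, "%Y")

  # fixed = TRUE treats "latest-release" as a literal string, not a regex
  sub("latest-release", paste0(month, "-", year), url, fixed = TRUE)
}
```

For example, `dated_release_url(url, as.Date("2022-06-01"))` replaces `latest-release` with `jun-2022` and leaves the rest of the URL untouched.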

@MattCowgill
Owner

Note to self: contacted ABS on 2022-09-01 at 11:16am

@MattCowgill
Owner

If the ABS responds that they are unable to help here, the changes made in #218 will allow me to address this problem

@MattCowgill
Owner

Hi @robjhyndman, I received the following response from the ABS:

In this scenario, the latest issue of a topic has been released, but a scheduled release of additional time series tables is pending, the ‘latest-release’ URL for the pending files gives a '404 not found'.  We believe that directing users to the tables that are in the previous release of Retail Trade could potentially be misleading and so would prefer to leave the behaviour as it is.

As I see it, that leaves me three options with read_abs():

  1. Leave the current behaviour as is (requesting a table that hasn’t yet been released returns a generic error)
  2. Add a message (requesting a table that hasn’t yet been released returns an error with an informative message)
  3. Download the previous month’s spreadsheet (for releases like Retail Trade with this staggered release design, first try to download the latest release using the TSD; if that fails, try downloading the previous month’s table).

I am leaning towards 3. Any thoughts?

@robjhyndman
Author

I agree. 3 sounds like the least pain for users. Perhaps issue a message explaining that the table returned is from the previous month.
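
A rough sketch of what option 3 plus the suggested message could look like, in R. The function name `download_with_fallback()` and its arguments are hypothetical, not the readabs implementation; the `download_fn` argument exists only so the logic can be exercised without network access, and it is assumed to behave like `utils::download.file()` (return 0 on success, signal an error on failure).

```r
# Hypothetical sketch (not the readabs implementation): try the primary URL,
# then a fallback URL, and tell the user when the fallback was used.
download_with_fallback <- function(url, fallback_url, destfile,
                                   download_fn = utils::download.file) {
  # Returns TRUE if the download succeeded, FALSE if it errored or
  # returned a non-zero status
  attempt <- function(u) {
    tryCatch(
      {
        status <- suppressWarnings(
          download_fn(url = u, destfile = destfile, mode = "wb")
        )
        isTRUE(status == 0)
      },
      error = function(e) FALSE
    )
  }

  if (attempt(url)) {
    return(invisible(url))
  }

  if (attempt(fallback_url)) {
    message(
      "The latest release does not yet include this table; ",
      "returning the table from an earlier release: ", fallback_url
    )
    return(invisible(fallback_url))
  }

  stop("Could not download ", url, " or ", fallback_url, call. = FALSE)
}
```

Returning the URL that actually succeeded lets the caller record which release the data came from.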

@MattCowgill
Owner

Note to self: if download fails with latest, try again by modifying the url - replace latest-release with tolower(format(as.Date(check_latest_date(cat_no, tables)), "%b-%Y")) ?

This is fiddlier than it should be
