I've noticed that fetching a large table from LabKey via Rlabkey is significantly slower than with other API clients such as the JavaScript one, so I profiled a labkey.selectRows call for a large table (182779 rows) from DataSpace.
profvis::profvis(Rlabkey::labkey.selectRows(
  baseUrl = "https://dataspace.cavd.org",
  folderPath = "/CAVD",
  schemaName = "study",
  queryName = "ICS" # 182779 rows
))
As the profile shows, the actual fetching of data via POST takes only a fraction of the time in the labkey.selectRows call; the majority of the time is spent processing the response (processResponse) and creating a data.frame object (makeDF).
We can break the call down into 5 steps:
1. Fetch the raw data via POST.
2. Parse the JSON (with simplification to data.frame) to a list to check the status.
3. Parse text from the raw response.
4. Parse the JSON (without simplification to data.frame) from the text to a list.
5. Make a data.frame from the list via C++ code.
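To make the cost of each step easier to see without hitting the server, here is a rough sketch that walks through steps 2 to 5 on a synthetic payload. The payload shape (a `rows` array plus `rowCount`), the table size, and the `list_to_df` helper are all assumptions for illustration; `list_to_df` is a pure-R stand-in and not the C++ code Rlabkey actually uses. Step 1 (the POST) is skipped.

```r
library(jsonlite)

set.seed(1)
n <- 20000

# Synthetic stand-in for a selectRows response body (assumed shape).
fake_rows <- data.frame(
  id = seq_len(n),
  value = round(rnorm(n), 3),
  label = sample(letters, n, replace = TRUE),
  stringsAsFactors = FALSE
)
raw_body <- charToRaw(as.character(
  toJSON(list(rowCount = n, rows = fake_rows), auto_unbox = TRUE)
))

# Hypothetical pure-R replacement for the list -> data.frame conversion.
list_to_df <- function(rows) {
  cols <- names(rows[[1]])
  out <- lapply(cols, function(col) unlist(lapply(rows, `[[`, col)))
  names(out) <- cols
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Step 2: parse with simplification, only to inspect the response.
t2 <- system.time(checked <- fromJSON(rawToChar(raw_body), simplifyDataFrame = TRUE))
# Step 3: raw bytes to text.
t3 <- system.time(txt <- rawToChar(raw_body))
# Step 4: parse again, this time without any simplification.
t4 <- system.time(lst <- fromJSON(txt, simplifyVector = FALSE))
# Step 5: build the data.frame from the nested list.
t5 <- system.time(df <- list_to_df(lst$rows))

print(c(
  parse_for_status = t2[["elapsed"]],
  raw_to_text      = t3[["elapsed"]],
  parse_to_list    = t4[["elapsed"]],
  list_to_df       = t5[["elapsed"]]
))
```

The absolute numbers depend on the machine and on the table shape, so they only indicate where the time goes, not how the real ICS table behaves.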
We can see that there are redundancies in this process:
- We are parsing the JSON twice (step 2 and step 4).
- We are creating a data.frame twice (step 2 and step 5).
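One way to remove both redundancies would be to parse the body once and reuse the result for the status check and for the rows. The sketch below is only an illustration: `parse_select_rows` is a hypothetical helper, not an Rlabkey function, and it assumes that an error response carries an `exception` field and that the data lives in a `rows` array.

```r
library(jsonlite)

# Hypothetical helper: a single parse serves both the status check and the data.
parse_select_rows <- function(raw_body) {
  parsed <- fromJSON(rawToChar(raw_body), simplifyDataFrame = TRUE)
  # Assumed error convention: failed requests include an `exception` message.
  if (!is.null(parsed[["exception"]])) {
    stop("Server returned an error: ", parsed[["exception"]])
  }
  # `rows` is already a data.frame because of simplifyDataFrame = TRUE.
  parsed[["rows"]]
}

# Small hand-written payloads standing in for real responses.
ok_body  <- charToRaw('{"rowCount":2,"rows":[{"id":1,"label":"a"},{"id":2,"label":"b"}]}')
err_body <- charToRaw('{"exception":"Query not found"}')

df <- parse_select_rows(ok_body)
print(df)

# The error path stops with the server message instead of returning data.
err <- tryCatch(parse_select_rows(err_body), error = function(e) conditionMessage(e))
print(err)
```

Whether the simplified result matches what makeDF currently returns (column names, types, handling of missing values) would still need to be checked against real responses.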
Another thing we should note is that jsonlite::fromJSON(simplifyDataFrame=TRUE) is more efficient in creating a data.frame than Rlabkey:::listToMatrix.
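A comparison along these lines can be set up on synthetic data. The sketch below does not call Rlabkey:::listToMatrix, since its signature is not shown here; instead it uses a hypothetical pure-R converter, `list_to_df`, as a stand-in for the "parse to list, then convert" route, so the timings say nothing definitive about Rlabkey itself.

```r
library(jsonlite)

set.seed(1)
n <- 20000
rows <- data.frame(
  id = seq_len(n),
  label = sample(letters, n, replace = TRUE),
  stringsAsFactors = FALSE
)
txt <- as.character(toJSON(rows))

# Route 1: let jsonlite build the data.frame during parsing.
t_direct <- system.time(df_direct <- fromJSON(txt, simplifyDataFrame = TRUE))

# Route 2: parse to a nested list, then convert in a second pass.
# `list_to_df` is a hypothetical stand-in for the existing conversion code.
list_to_df <- function(rows) {
  cols <- names(rows[[1]])
  out <- lapply(cols, function(col) unlist(lapply(rows, `[[`, col)))
  names(out) <- cols
  as.data.frame(out, stringsAsFactors = FALSE)
}
t_manual <- system.time({
  lst <- fromJSON(txt, simplifyVector = FALSE)
  df_manual <- list_to_df(lst)
})

print(c(direct = t_direct[["elapsed"]], manual = t_manual[["elapsed"]]))
# Sanity check that both routes yield the same columns.
print(all.equal(df_direct$id, df_manual$id))
print(all.equal(df_direct$label, df_manual$label))
```

Swapping the real conversion function into route 2 and running it on an actual response would give the numbers that matter for this issue.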
Could you please look into this and make changes accordingly?