Profiling of labkey.post #36

Open
@juyeongkim

I've noticed that fetching a large table from LabKey via Rlabkey is significantly slower than with other API clients such as the JavaScript one, so I profiled the labkey.selectRows call for a large table (182779 rows) from DataSpace.

profvis::profvis(Rlabkey::labkey.selectRows(
  baseUrl = "https://dataspace.cavd.org",
  folderPath = "/CAVD",
  schemaName = "study",
  queryName = "ICS" # 182779 rows
))

[image: profvis output for the labkey.selectRows call above]

As we can see, the actual fetching of data via POST takes only a fraction of the time in the labkey.selectRows call; the majority of the time is spent processing the response (processResponse) and creating a data.frame object (makeDF).

We can break it down into 5 steps:

  1. fetch raw data via POST
  2. parse json (with simplifying to data.frame) to a list to check status
  3. parse text from raw
  4. parse json (without simplifying to data.frame) to a list from text
  5. make a data.frame from list via c++ code

We can see that there are redundancies in this process.

  • We are parsing json twice (step 2 and step 4)
  • We are creating data.frame twice (step 2 and step 5)
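
To illustrate what removing those redundancies could look like, here is a minimal sketch that converts the raw body to text once, parses it once, and reuses the same parsed object for both the status check and the data.frame. This is not Rlabkey's actual code: `raw_body` is a hypothetical stand-in for the POST response, and the `rows` and `exception` field names are assumptions made for illustration.

```r
library(jsonlite)

# Hypothetical stand-in for the raw body returned by the POST request.
# The field names ("rowCount", "rows") are assumed for illustration only.
raw_body <- charToRaw(
  '{"rowCount": 2, "rows": [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]}'
)

# Convert raw to text once, then parse the JSON once.
txt <- rawToChar(raw_body)
parsed <- jsonlite::fromJSON(txt, simplifyDataFrame = TRUE)

# The status check reuses the already-parsed object.
# Assumption: an error response carries an "exception" field.
if (!is.null(parsed$exception)) {
  stop(parsed$exception)
}

# The rows were already simplified to a data.frame by the single parse.
rows <- parsed$rows
```

A real response also carries column metadata and types, so any such change would still need to map those onto the resulting data.frame.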

Another thing we should note is that jsonlite::fromJSON(simplifyDataFrame=TRUE) is more efficient in creating a data.frame than Rlabkey:::listToMatrix.
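
For anyone who wants to measure this without hitting the server, the sketch below times the two jsonlite parse modes on a synthetic payload. The payload shape (a top-level `rows` array) and the row count `n` are assumptions, and it does not call Rlabkey:::listToMatrix, so it only approximates steps 2 and 4 above. The list returned by the non-simplifying parse would still need a separate conversion to a data.frame.

```r
library(jsonlite)

# Synthetic payload; the size and the "rows" field name are assumptions.
n <- 10000
df <- data.frame(
  id = seq_len(n),
  value = sample(letters, n, replace = TRUE),
  stringsAsFactors = FALSE
)
txt <- as.character(jsonlite::toJSON(list(rows = df)))

# Parse with simplification straight to a data.frame.
t_df <- system.time(
  as_df <- jsonlite::fromJSON(txt, simplifyDataFrame = TRUE)$rows
)

# Parse without simplification: the result is a list with one element per row.
t_list <- system.time(
  as_list <- jsonlite::fromJSON(txt, simplifyVector = FALSE)$rows
)

print(t_df)
print(t_list)
```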

Can you please take a look into this and make changes accordingly?
