Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stat_steplineribbon() for step curves? #249

Closed
ASKurz opened this issue Apr 30, 2020 · 8 comments
Closed

stat_steplineribbon() for step curves? #249

ASKurz opened this issue Apr 30, 2020 · 8 comments
Milestone

Comments

@ASKurz
Copy link

ASKurz commented Apr 30, 2020

Would you consider support for step functions as in a Kaplan-Meier (KM) survival curve? Say I fit a KM-type model with brms::brm() on continuous-time-to-event data with t cases. The output from posterior_samples() would look something like this.

library(tidyverse)
library(brms)
library(tidybayes)
library(pammtools)  # for geom_stepribbon()

t <- 30

# make the samples
posterior_samples <-
  tibble(parameter = str_c("b_time", 1:t),
         Estimate = seq(from = -4, to = 0, length.out = t),
         Est.Error = seq(from = 0.9, to = 1.8, length.out = t),
         samples = map2(Estimate, Est.Error, rnorm, n = 4000)) %>% 
  unnest(samples) %>% 
  mutate(iter = rep(1:4000, times = t)) %>% 
  select(iter, parameter, samples) %>% 
  pivot_wider(names_from = parameter,
              values_from = samples)

Using posterior_samples() as the starting point, the workflow to get those samples in a format ready for plotting a survival curve with interval bands is something like this.

# wrangle
posterior_samples %>% 
  mutate(b_time0 = -Inf) %>% 
  select(b_time0, b_time1:str_c("b_time", t), iter) %>% 
  set_names(c(0:t, "iter")) %>% 
  pivot_longer(-iter,
               names_to = "interval",
               names_ptypes = list(interval = double())) %>% 
  group_by(iter) %>% 
  mutate(survival = cumprod(1 - inv_logit_scaled(value))) %>% 
  group_by(interval) %>% 
  # I can finally summarize!
  median_qi(survival) %>% 
  
  # now plot
  ggplot(aes(x = interval)) +
  geom_stepribbon(aes(ymin = .lower, ymax = .upper),
                  alpha = 1/3) +
  geom_step(aes(y = survival))

It’d be sweet if tidybayes offered something that allowed me to skip the median_qi() step and pump those MCMC draws into a geom that combined the sensibilities of my geom_stepribbon() + geom_step() combo. Maybe call it stat_steplineribbon().

@mjskay mjskay added this to the Next release milestone Apr 30, 2020
@mjskay
Copy link
Owner

mjskay commented Apr 30, 2020

cool yeah, good idea! I think I would probably make it an option to the existing lineribbon geom (like step = TRUE). Would be an easy thing to add... possibly I don't have time until further into May when I switch over to summer mode, but I'll add this issue to the next release milestone so I remember to look at it this summer :)

(also would totally accept a PR that does this)

@ASKurz
Copy link
Author

ASKurz commented Apr 30, 2020

A simple step = TRUE option would be a dream! As to the PR, I'd love to but I don't think my programming skills are ready to help package development, yet. When I look at this, I'm baffled.

@mjskay
Copy link
Owner

mjskay commented May 1, 2020

No worries! Yeah, the internals for geoms are especially intimidating, I don't blame you --- it took me a long time to work up the courage to try building one.

I needed a break from some other stuff and was feeling code-y so I put together a solution. Basically there's now a step argument that's default is FALSE, but can be TRUE, "mid", "hv", or "vh" (where the last three work the same as the direction argument to geom_step). TRUE is just an alias for "mid", as when you use "hv" or "vh" either the leftmost or rightmost interval on the ribbon is not visible (it ends up with a width of 0). This departs from the geom_step default (which is "hv"), but it seems to me that a default that removes an uncertainty interval is not great. As someone actually using these things I would be curious your thoughts though.

You can see the difference here (points shown for reference):

step = TRUE / step = "mid"
image

step = "hv" (what geom_step does by default)
image

step = "vh"
image

Anyway, this change means all the lineribbon geoms have a "step" argument now (geom_lineribbon, stat_lineribbon, and stat_dist_lineribbon). It's on the dev branch if you wanna check it out.

@ASKurz
Copy link
Author

ASKurz commented May 1, 2020

Dude. This looks sweet.

I see what you mean about not wanting to lose an uncertainty interval with hv or vh. I’ve become accustomed to using default hv and tacking on an extra level for x that duplicates the corresponding y-values.

Hey, do I execute something other than devtools::install_github("mjskay/tidybayes") to get the dev version? I’ve reinstalled tidybayes using that code and am getting the warning: “Ignoring unknown parameters: step” when I try to play with step = TRUE.

@mjskay
Copy link
Owner

mjskay commented May 1, 2020

Yeah it's on the dev branch currently (haven't merged to master because of Travis issues but it's the only change on there and the tests pass, so we're not talking bleeding edge really). I think the syntax is install_github("mjskay/tidybayes@dev")?

@ASKurz
Copy link
Author

ASKurz commented May 1, 2020

Works like a dream!

@mjskay
Copy link
Owner

mjskay commented May 1, 2020

Yay! 🙂

@ASKurz
Copy link
Author

ASKurz commented May 1, 2020

First use out in the wild: https://twitter.com/SolomonKurz/status/1256236803952164864

@mjskay mjskay closed this as completed in 4406bfe May 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants