Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory and mbind problems in convGDX2MIF / reportFE #549

Closed
orichters opened this issue Feb 27, 2024 · 24 comments
Closed

memory and mbind problems in convGDX2MIF / reportFE #549

orichters opened this issue Feb 27, 2024 · 24 comments
Assignees

Comments

@orichters
Copy link
Contributor

orichters commented Feb 27, 2024

  • Running output.R -> single -> reporting.R runs out of memory with standard --mem=8000 setting, even --mem=12000 fails.
  • Reason seems to be that reportFE requires more memory
  • Using output.R -> single -> reporting, @orichters started output generation for several AMT testOneRegi runs. Result: Until 2023-12-22, it works, afterwards, it fails, see cd /p/projects/remind/modeltests/remind/output; grep oom-kill testOne*/log_output.txt
  • @0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q found out it is gdx::readGDX(), triggered by vm_demFENonEnergySector:
> foo <- gdx::readGDX('./output/testOneRegi-Base-AMT_2024-02-16_22.08.20/fulldata.gdx', 'vm_demFENonEnergySector', field = 'l', restore_zeros = TRUE, react = 'silent')
> format(object.size(foo), units = 'auto')
[1] "3 Gb"
> bar <- gdx::readGDX('./output/testOneRegi-Base-AMT_2024-02-16_22.08.20/fulldata.gdx', 'vm_demFENonEnergySector', field = 'l', restore_zeros = FALSE, react = 'silent')
> format(object.size(bar), units = 'auto')
[1] "138.9 Kb"
  • According to @0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q, one cannot simply set restore_zeros = FALSE because we declare all variables over the general supersets (e.g., tall, all_enty), but usually define them only over subsets (t, entyFE). restore_zeros = FALSE reads in only data that is not zero, which is usually what we want. But if some axis of the data (e.g. all values in 2010, or everything fehos) happens to be entirely zero (usually that happens for specific gdxes only), that dimension of the resulting magpie object differs (e.g. c(2005, 2015, …) instead of c(2005, 2010, 2015, …)) and it is incompatible with all the rest of the objects we have. What would be needed is a function to expands a variable read in with restore_zeros = FALSE to the actual sets it should have. But that is a moving target, as it depends on all the other variables/parameters used in conjunction with it.
  • magpie_expand is not the solution

Some code to start with:

library(magclass)
t <- c(1995, 2005, 2015)
s <- c('AFR', 'CPA')
A <- dimSums(maxample('pop')[s,t,'A2'], 3)
B <- dimSums(maxample('pop')[s,t[-2],'B1'], 3)

The question is how to be able to sum, mbind etc. A and B.

The problem is that the dimensions each variable is defined (not declared) over in the GAMS code (e.g. t instead of tall, se2fe(entySE,entyFE,te) instead of all all_enty/all_enty/all_te combinations) is not pre-defined, and the problem depends on which other variables the variables is combined with (in magpie operations) and what data those variables have.

@orichters
Copy link
Contributor Author

orichters commented Feb 27, 2024

  1. code suggestion by @0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q to expand a to the joint size of a and b:
somebody_name_this_function <- function(a, b, fill = 0) {
    r <- new.magpie(cells_and_regions = union(getRegions(a), getRegions(b)),
                    years = sort(union(getYears(a, as.integer = TRUE), 
                                       getYears(b, as.integer = TRUE))),
                    names = union(getNames(a), getNames(b)),
                    fill = fill,
                    sets = names(dimnames(b)))
   
    r[getRegions(a),getYears(a),getNames(a)] <- a
    return(r)
}
B_ <- B[,t[-2],] %>% somebody_name_this_function(A)
A + B_

which returns:

An object of class "magpie"
, , 1
     t
i        y1995   y2005   y2015
  AFR 1105.333  696.44 1821.22
  CPA 2561.270 1429.53 3018.20
> system.time({
+     x <- vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector",
+                                             field = "l", restore_zeros = TRUE,
+                                             react = "silent")[,t,]})
   user  system elapsed 
 12.134   4.380  16.553 
> format(object.size(x), units = 'auto')
[1] "1.5 Gb"
> system.time({
+     ref <- new.magpie(
+         cells_and_regions = readGDX(gdx, 'regi'),
+         years = readGDX(gdx, 't'),
+         names = quitte::cartesian(
+             readGDX(gdx, 'se2fe') %>% 
+                 mutate(x = paste(.data$all_enty, .data$all_enty1,
+                                  sep = '.')) %>% 
+                 pull('x'),
+             'indst',
+             readGDX(gdx, 'all_emiMkt')),
+         fill = 0,
+         sets = c('t', 'regi', 'entySE', 'entyFE', 'sector', 'emiMkt'))
+     
+     
+     
+     y <- vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector",
+                                             field = "l", restore_zeros = FALSE,
+                                             react = "silent") %>% 
+         somebody_name_this_function(ref)})
   user  system elapsed 
  0.133   0.000   0.129 
> format(object.size(y), units = 'auto')
[1] "257.8 Kb"

@orichters
Copy link
Contributor Author

I rather think of a one-sided function that restricts and expands x to the size of ref:

somebody_name_this_function <- function(x, ref, fill = 0) {
  r <- new.magpie(cells_and_regions = getRegions(ref),
                  years = getYears(ref),
                  names = getNames(x),
                  fill = fill,
                  sets = names(dimnames(ref)))
  r[intersect(getRegions(x), getRegions(r)), intersect(getYears(x), getYears(r)), getNames(x)] <- x
  return(r)
}

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

"Nicht mein Zirkus, nicht meine Affen."

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

region <- c("AAA", "BBB", "CCC")
t      <- paste0("y", c(2000, 2005, 2010))
name   <- c("foo", "bar", "bazz")

A <- new.magpie(cells_and_regions = region, years = t, names = name, fill = 1,
                sets = c("region", "t", "name"))

# spatial ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[region[-2],,], A),
             `mselect<-`(A, region = region[2], value = 0))

## x larger then ref ----
expect_equal(somebody_name_this_function(A, A[region[-2],,]),
             A[region[-2],,])

# temporal ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[,t[-2],], A),
             `mselect<-`(A, t = t[2], value = 0))

## x larger then ref ----
expect_equal(somebody_name_this_function(x = A, ref = A[,t[-2],]),
             A[,t[-2],])

# names ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[,,name[-2]], A),
             `mselect<-`(A, name = name[2], value = 0))

## x larger then ref ----
expect_equal(somebody_name_this_function(A, A[,,name[-2]]),
             A[,,name[-2]])

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

I found a (somewhat) reliable way to assess the size of the problem.

/p/projects/remind/modeltests/remind/output/SSP2EU-PkBudg1050-AMT_2024-02-24_01.16.37 (develop|?? M)$ /usr/bin/time -v Rscript -e 'format(object.size(gdx::readGDX("fulldata.gdx", "vm_demFENonEnergySector", field = "l", restore_zeros = FALSE)), units = "auto")' 2>&1 | grep -E '(\[|Maximum resident set size)'
[1] "66 Kb"
	Maximum resident set size (kbytes): 131152
/p/projects/remind/modeltests/remind/output/SSP2EU-PkBudg1050-AMT_2024-02-24_01.16.37 (develop|?? M)$ /usr/bin/time -v Rscript -e 'format(object.size(gdx::readGDX("fulldata.gdx", "vm_demFENonEnergySector", field = "l", restore_zeros = TRUE)), units = "auto")' 2>&1 | grep 
-E '(\[|Maximum resident set size)'
[1] "3 Gb"
	Maximum resident set size (kbytes): 9496012

So the returned object goes from 66 kb to 3 GB (with filtering for t it is 1.5 GB I believe), while peak memory demand increases by 9.3 GB.

Peak memory demand of convGDX2MIF() seems to be 16 GB currently:

$ sbatch --wrap "/usr/bin/time -v Rscript -e 'x <- remind2::convGDX2MIF(\"/p/projects/remind/modeltests/remind/output/SSP2EU-
PkBudg1050-AMT_2024-02-24_01.16.37/fulldata.gdx\")' 2>&1 | grep 'Maximum resident set size'" --output=foo.log --qos="priority" --mem=480
00 --mail-type=END
$ cat foo.log 
	Maximum resident set size (kbytes): 15919132

@orichters
Copy link
Contributor Author

Peak memory demand of convGDX2MIF() seems to be 16 GB currently:

That fits my experience that using --mem=16000 was sufficient.

@fbenke-pik
Copy link
Contributor

Will work on this next week (KW10)

@fbenke-pik
Copy link
Contributor

fbenke-pik commented Mar 1, 2024

I rather think of a one-sided function that restricts and expands x to the size of ref:

Keeping the use case of this helper in remind2 reporting in mind, I'd assume it should

  • expand x to ref, but not restrict it to ref
  • only be applied to dim 1 and 2 (region and year) Nvm, thinking about it more, I guess it won't hurt to expand to dim 3

@fbenke-pik
Copy link
Contributor

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q could you point me to gdxes with irregularities that could make reportEmi and reportFE crash if I don't restore zeros?

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

Nope.
@mellamoSimon worked on this, I only surmised why he did what he did. I have no idea if that is a problem that crops up regularly, or if he encountered it at all, or if he just did as the Romans do.
But in theory you should be able to test this with every instance of readGDX(restore_zeros = TRUE):

$ ag 'restore_zeros *= *T'
R/reportFE.R
82:  vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector", field = "l", restore_zeros = T, react = "silent")[,t,]*TWa_2_EJ

R/reportSE.R
77:    storLoss <- readGDX(gdx, name = c("v32_storloss", "v_storloss"), field = "l", restore_zeros = TRUE, format = "first_found") * pm_conv_TWa_EJ
434:  pm_fuExtrOwnCons <- readGDX(gdx, "pm_fuExtrOwnCons", restore_zeros = T)[,,getNames(pm_fuExtrOwnCons_reduced)]

R/reportPrices.R
991:  o01_CESderivatives <- readGDX(gdx, "o01_CESderivatives", restore_zeros = T, react = "silent")

R/reportEmi.R
149:  # note: this has to be read in with restore_zeros=T because sometimes it contains only non-zero values for "ETS" emiMkt
151:  pm_IndstCO2Captured <- readGDX(gdx, "pm_IndstCO2Captured", restore_zeros = T,
196:  vm_emiIndCCS <- readGDX(gdx, "vm_emiIndCCS", field = "l", restore_zeros = T)[, t, ]
332:  vm_plasticsCarbon <- readGDX(gdx, "vm_plasticsCarbon", field = "l", restore_zeros = T, react = "silent")[,t,]
338:    vm_feedstockEmiUnknownFate  <- readGDX(gdx, "vm_feedstockEmiUnknownFate", field = "l", restore_zeros = T, react = "silent")[,t,]
339:    vm_incinerationEmi          <- readGDX(gdx, "vm_incinerationEmi", field = "l", restore_zeros = T, react = "silent")[,t,]
340:    vm_nonIncineratedPlastics   <- readGDX(gdx, "vm_nonIncineratedPlastics", field = "l", restore_zeros = T, react = "silent")[,t,]
2704:  p47_LULUCFEmi_GrassiShift <- readGDX(gdx, "p47_LULUCFEmi_GrassiShift", restore_zeros = T, react = "silent")[getRegions(out), getYears(out),]

@mellamoSimon
Copy link
Contributor

hey, yeah, that's a good assumption, I eventually set restore_zeros = T because the script would fail otherwise. I did that long ago though so I don't recall the exact mismatch of dimensions that would happen. I can try to reproduce it and let you know? @fbenke-pik

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

Keeping the use case of this helper in remind2 reporting in mind, I'd assume it should

  • expand x to ref, but not restrict it to ref

But the use case in remind2 does always include also a restriction. All the examples above have something like [,t,] attached to the readGDX() call (except for storLoss, which does it in a funky way a couple of lines below).
I agree with @orichters on this one.

@fbenke-pik
Copy link
Contributor

fbenke-pik commented Mar 6, 2024

Thanks Simón. Right now I favour the following approach:

  • Avoid restoring zeros whenever I see no immediate need to do so
  • Run adjusted reporting on all the latest AMT gdxes
  • Deal with the crashes coming out of this individually
  • Merge and keep an eye on the reporting

Let me know in case there are objections. If not, then nothing to do on your end, Simón.

@mellamoSimon
Copy link
Contributor

Hi Falk, that sounds good, thank you!

@fbenke-pik
Copy link
Contributor

Looks like this PR should solve the problems:
#557

Could solve the problems mostly by introducing new options to readGDX (pik-piam/gdx#8).

As of now, I don't see a need for somebody_name_this_function.

Validation:

  • I reran validation on the latest REMIND runs that actually produced a gdx /p/tmp/benke/validation
  • They all succeeded with mem=8000
  • Summation problems seem to be the same as /p/tmp/benke/validation/validate_remind2-29209181.out
  • Some gaps have gotten smaller actually (does not have to be due to this PR though, but other changes since original reporting ran), none have increased from what I can see

@fbenke-pik
Copy link
Contributor

Leaving this open, as it will probably become relevant in the future again. (see #524)

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

The remindmodel/remind#1609 problem comes down to the input.gdx, which is from the current set of CES parameters sports a whole lot of zeros for SE/FE technologies that do not actually exist.

readGDX(gdx, 'vm_demFENonEnergySector', field = 'l', restore_zeros = FALSE)[,t,] %>%
    magclass_to_tibble() %>% 
    anti_join(readGDX(gdx, 'sefe')) %>% 
    filter(!is.na(value)) %>% 
    distinct(all_enty, all_enty1, .keep_all = TRUE)
Joining with `by = join_by(all_enty, all_enty1)`
# A tibble: 19 × 7
   all_regi  ttot all_enty all_enty1 emi_sectors all_emiMkt value
   <chr>    <int> <chr>    <chr>     <chr>       <chr>      <dbl>
 1 EUR       2005 seliqbio fegas     indst       ETS            0
 2 EUR       2005 seliqbio fesos     indst       ETS            0
 3 EUR       2005 seliqsyn fegas     indst       ETS            0
 4 EUR       2005 seliqsyn fesos     indst       ETS            0
 5 EUR       2005 sesobio  fegas     indst       ETS            0
 6 EUR       2005 sesobio  fehos     indst       ETS            0
 7 EUR       2005 seel     fegas     indst       ETS            0
 8 EUR       2005 seel     fehos     indst       ETS            0
 9 EUR       2005 seel     fesos     indst       ETS            0
10 EUR       2005 seh2     fegas     indst       ETS            0
11 EUR       2005 seh2     fehos     indst       ETS            0
12 EUR       2005 seh2     fesos     indst       ETS            0
13 EUR       2005 segabio  fehos     indst       ETS            0
14 EUR       2005 segabio  fesos     indst       ETS            0
15 EUR       2005 segasyn  fehos     indst       ETS            0
16 EUR       2005 segasyn  fesos     indst       ETS            0
17 EUR       2005 sehe     fegas     indst       ETS            0
18 EUR       2005 sehe     fehos     indst       ETS            0
19 EUR       2005 sehe     fesos     indst       ETS            0

Probably some REMIND/GAMS carry-over.

@ricardarosemann
Copy link
Contributor

I also came across the issue of the non-existing SE/FE technologies today, when trying to do reporting for a Remind Base run, which failed due to the vm_demFeSector/vm_demFENonEnergySector mismatch. Falk's merge fortunately fixed that.
I tried to trace down the origin of this error and I would guess that the non-existing SE/FE technologies might come from (in Remind)

q37_FossilFeedstock_Base(t,regi,entyFe,emiMkt)$(
                         entyFE2sector2emiMkt_NonEn(entyFe,"indst",emiMkt)
                     AND cm_emiscen eq 1                                   ) ..
  sum(entySe, vm_demFENonEnergySector(t,regi,entySe,entyFe,"indst",emiMkt))
  =e=
  sum(entySeFos,
    vm_demFENonEnergySector(t,regi,entySeFos,entyFe,"indst",emiMkt)
  )
;

as there is no sefe mapping active here (and this equation is only active for Base runs). Maybe it's worth looking into this.

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

Thanks Ricarda.
@orichters, was remindmodel/remind#1609 a Base run?

@orichters
Copy link
Contributor Author

The error that occured? That was in a testOneRegi.

@ricardarosemann
Copy link
Contributor

testOneRegi runs should also be subject to the equation I posted - as it also has cm_emiscen = 1 (the default).

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

No, you can run testOneRegi for any configuration you want. Baseline, NPi, Policy, calibration.
q37_FossilFeedstock_Base is standing next to the Guillotine anyhow, so …

@orichters
Copy link
Contributor Author

Well, it was a testOneRegi as called by make test, so with default settings.

@fbenke-pik
Copy link
Contributor

Closing, as this seems to be stable now, not causing any further issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants