diff --git a/DESCRIPTION b/DESCRIPTION index 1e66cb25..5938a3a3 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -2,7 +2,7 @@ Package: vtreat Type: Package Title: A Statistically Sound 'data.frame' Processor/Conditioner Version: 1.6.0 -Date: 2020-03-07 +Date: 2020-03-10 Authors@R: c( person("John", "Mount", email = "jmount@win-vector.com", role = c("aut", "cre")), person("Nina", "Zumel", email = "nzumel@win-vector.com", role = c("aut")), @@ -25,8 +25,8 @@ Imports: stats, digest Suggests: - rquery (>= 1.4.3), - rqdatatable (>= 1.2.6), + rquery (>= 1.4.4), + rqdatatable (>= 1.2.7), data.table (>= 1.12.2), isotone, lme4, diff --git a/NEWS.md b/NEWS.md index 2c7211c1..8dc16ac7 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,5 @@ -# vtreat 1.6.0 2020/03/07 +# vtreat 1.6.0 2020/03/10 * More S3 methods. * Back-port pyvtreat recommendation code to Rvtreat. diff --git a/cran-comments.md b/cran-comments.md index 5bbd7adb..da98a0b0 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -11,46 +11,35 @@ * checking for file ‘vtreat/DESCRIPTION’ ... OK * checking extension type ... Package * this is package ‘vtreat’ version ‘1.6.0’ - * checking CRAN incoming feasibility ... NOTE + * checking CRAN incoming feasibility ... 
Note_to_CRAN_maintainers Maintainer: ‘John Mount ’ - Number of updates in past 6 months: 7 - Status: 1 NOTE ### Windows rhub::check_for_cran() - 715#> setting _R_CHECK_FORCE_SUGGESTS_ to false - 716#> setting R_COMPILE_AND_INSTALL_PACKAGES to never - 717#> setting R_REMOTES_STANDALONE to true - 718#> setting R_REMOTES_NO_ERRORS_FROM_WARNINGS to true - 719#> setting _R_CHECK_FORCE_SUGGESTS_ to true - 720#> setting _R_CHECK_CRAN_INCOMING_USE_ASPELL_ to true - 721#> * using log directory 'C:/Users/USERtgqFjUHpvO/vtreat.Rcheck' - 722#> * using R Under development (unstable) (2020-01-22 r77697) - 723#> * using platform: x86_64-w64-mingw32 (64-bit) - 724#> * using session charset: ISO8859-1 - 725#> * using option '--as-cran' - 726#> * checking for file 'vtreat/DESCRIPTION' ... OK - 727#> * checking extension type ... Package - 728#> * this is package 'vtreat' version '1.6.0' - 729#> * checking CRAN incoming feasibility ... NOTE - 730#> Maintainer: 'John Mount ' - 731#> Number of updates in past 6 months: 7 - 743#> * checking for future file timestamps ... NOTE - 744#> unable to verify current time - 774#> * checking sizes of PDF files under 'inst/doc' ... NOTE - 775#> Unable to find GhostScript executable to run checks on size reduction - 790#> Status: 3 NOTEs - Ghostscript and time notes are a property of the testing facility, not of the package. 
 - -### Linux - - rhub::check_for_cran() - + 582#> setting _R_CHECK_FORCE_SUGGESTS_ to false + 583#> setting R_COMPILE_AND_INSTALL_PACKAGES to never + 584#> setting R_REMOTES_STANDALONE to true + 585#> setting R_REMOTES_NO_ERRORS_FROM_WARNINGS to true + 586#> setting _R_CHECK_FORCE_SUGGESTS_ to true + 587#> setting _R_CHECK_CRAN_INCOMING_USE_ASPELL_ to true + 588#> * using log directory 'C:/Users/USERGKUapBOMCz/vtreat.Rcheck' + 589#> * using R Under development (unstable) (2020-03-08 r77917) + 590#> * using platform: x86_64-w64-mingw32 (64-bit) + 591#> * using session charset: ISO8859-1 + 592#> * using option '--as-cran' + 593#> * checking for file 'vtreat/DESCRIPTION' ... OK + 594#> * checking extension type ... Package + 595#> * this is package 'vtreat' version '1.6.0' + 596#> * checking CRAN incoming feasibility ... Note_to_CRAN_maintainers + 597#> Maintainer: 'John Mount ' + 639#> * checking sizes of PDF files under 'inst/doc' ... NOTE + 640#> Unable to find GhostScript executable to run checks on size reduction + 655#> Status: 1 NOTE + Ghostscript note is a property of the check infrastructure, not the package. ## Downstream dependencies No declared reverse dependencies ( https://github.com/WinVector/vtreat/blob/master/extras/check_reverse_dependencies.md ). Mount and Zumel are not mis-spellings. - diff --git a/docs/articles/MultiClassVtreat.html b/docs/articles/MultiClassVtreat.html index 4cc678e1..9fd668ef 100644 --- a/docs/articles/MultiClassVtreat.html +++ b/docs/articles/MultiClassVtreat.html @@ -115,7 +115,7 @@

Multi Class vtreat

John Mount

-

2020-02-28

+

2020-03-10

Source: vignettes/MultiClassVtreat.Rmd diff --git a/docs/articles/SavingTreamentPlans.html b/docs/articles/SavingTreamentPlans.html index 65f73717..ba0ffb32 100644 --- a/docs/articles/SavingTreamentPlans.html +++ b/docs/articles/SavingTreamentPlans.html @@ -115,7 +115,7 @@

Saving Treatment Plans

John Mount

-

2020-02-28

+

2020-03-10

Source: vignettes/SavingTreamentPlans.Rmd diff --git a/docs/articles/VariableImportance.html b/docs/articles/VariableImportance.html index d5057fed..7a337067 100644 --- a/docs/articles/VariableImportance.html +++ b/docs/articles/VariableImportance.html @@ -115,7 +115,7 @@

vtreat Variable Importance

John Mount

-

2020-02-28

+

2020-03-10

Source: vignettes/VariableImportance.Rmd @@ -144,9 +144,9 @@

2020-02-28

d, varlist = c("x", "x_noise"), outcomename = "y") -
## [1] "vtreat 1.6.0 start initial treatment design Fri Feb 28 14:21:50 2020"
-## [1] " start cross frame work Fri Feb 28 14:21:51 2020"
-## [1] " vtreat::mkCrossFrameNExperiment done Fri Feb 28 14:21:51 2020"
+
## [1] "vtreat 1.6.0 start initial treatment design Tue Mar 10 16:03:13 2020"
+## [1] " start cross frame work Tue Mar 10 16:03:13 2020"
+## [1] " vtreat::mkCrossFrameNExperiment done Tue Mar 10 16:03:13 2020"
sf <- cfe$treatments$scoreFrame
 knitr::kable(sf[, c("varName", "rsq", "sig")])
diff --git a/docs/articles/vtreat.html b/docs/articles/vtreat.html index cf73280b..02b22780 100644 --- a/docs/articles/vtreat.html +++ b/docs/articles/vtreat.html @@ -115,7 +115,7 @@

vtreat package

John Mount, Nina Zumel

-

2020-02-28

+

2020-03-10

Source: vignettes/vtreat.Rmd @@ -241,8 +241,8 @@

2020-02-28

verbose=FALSE)print(treatmentsC$scoreFrame[, c('origName', 'varName', 'code', 'rsq', 'sig', 'extraModelDegrees')])
##   origName   varName  code         rsq        sig extraModelDegrees
-## 1        x    x_catP  catP 0.057741424 0.45748159                 2
-## 2        x    x_catB  catB 0.019483838 0.66603146                 2
+## 1        x    x_catP  catP 0.111456141 0.30194137                 2
+## 2        x    x_catB  catB 0.033761011 0.56994212                 2
 ## 3        z         z clean 0.237601767 0.13176020                 0
 ## 4        z   z_isBAD isBAD 0.296065432 0.09248399                 0
 ## 5        x  x_lev_NA   lev 0.296065432 0.09248399                 0
@@ -283,9 +283,9 @@ 

2020-02-28

verbose=FALSE) print(treatmentsN$scoreFrame[, c('origName', 'varName', 'code', 'rsq', 'sig', 'extraModelDegrees')])
##   origName   varName  code          rsq       sig extraModelDegrees
-## 1        x    x_catP  catP 2.105263e-01 0.2528101                 2
-## 2        x    x_catN  catN 4.310345e-03 0.8772535                 2
-## 3        x    x_catD  catD 2.302479e-01 0.2288609                 2
+## 1        x    x_catP  catP 3.700306e-01 0.1095637                 2
+## 2        x    x_catN  catN 1.088889e-01 0.4247287                 2
+## 3        x    x_catD  catD 3.743113e-01 0.1069707                 2
 ## 4        z         z clean 2.880952e-01 0.1701892                 0
 ## 5        z   z_isBAD isBAD 3.333333e-01 0.1339746                 0
 ## 6        x  x_lev_NA   lev 3.333333e-01 0.1339746                 0
diff --git a/docs/articles/vtreatCrossFrames.html b/docs/articles/vtreatCrossFrames.html
index e0c3929c..7817bd20 100644
--- a/docs/articles/vtreatCrossFrames.html
+++ b/docs/articles/vtreatCrossFrames.html
@@ -115,7 +115,7 @@
       

vtreat cross frames

John Mount, Nina Zumel

-

2020-02-28

+

2020-03-10

Source: vignettes/vtreatCrossFrames.Rmd @@ -168,13 +168,13 @@

'y',TRUE, rareCount=0 # Note: usually want rareCount>0, setting to zero to illustrate problem )

-
## [1] "vtreat 1.6.0 inspecting inputs Fri Feb 28 14:21:57 2020"
-## [1] "designing treatments Fri Feb 28 14:21:57 2020"
-## [1] " have initial level statistics Fri Feb 28 14:21:57 2020"
-## [1] " scoring treatments Fri Feb 28 14:21:58 2020"
-## [1] "have treatment plan Fri Feb 28 14:21:58 2020"
-## [1] "rescoring complex variables Fri Feb 28 14:21:58 2020"
-## [1] "done rescoring complex variables Fri Feb 28 14:21:58 2020"
+
## [1] "vtreat 1.6.0 inspecting inputs Tue Mar 10 16:03:18 2020"
+## [1] "designing treatments Tue Mar 10 16:03:18 2020"
+## [1] " have initial level statistics Tue Mar 10 16:03:18 2020"
+## [1] " scoring treatments Tue Mar 10 16:03:18 2020"
+## [1] "have treatment plan Tue Mar 10 16:03:18 2020"
+## [1] "rescoring complex variables Tue Mar 10 16:03:18 2020"
+## [1] "done rescoring complex variables Tue Mar 10 16:03:19 2020"
dTrainTreated <- vtreat::prepare(treatments,dTrain,
   pruneSig=c() # Note: usually want pruneSig to be a small fraction, setting to null to illustrate problems
 )
@@ -255,13 +255,13 @@

'y',TRUE, rareCount=0 # Note: usually want rareCount>0, setting to zero to illustrate problem ) -
## [1] "vtreat 1.6.0 inspecting inputs Fri Feb 28 14:21:58 2020"
-## [1] "designing treatments Fri Feb 28 14:21:58 2020"
-## [1] " have initial level statistics Fri Feb 28 14:21:58 2020"
-## [1] " scoring treatments Fri Feb 28 14:21:58 2020"
-## [1] "have treatment plan Fri Feb 28 14:21:59 2020"
-## [1] "rescoring complex variables Fri Feb 28 14:21:59 2020"
-## [1] "done rescoring complex variables Fri Feb 28 14:21:59 2020"
+
## [1] "vtreat 1.6.0 inspecting inputs Tue Mar 10 16:03:19 2020"
+## [1] "designing treatments Tue Mar 10 16:03:19 2020"
+## [1] " have initial level statistics Tue Mar 10 16:03:19 2020"
+## [1] " scoring treatments Tue Mar 10 16:03:19 2020"
+## [1] "have treatment plan Tue Mar 10 16:03:19 2020"
+## [1] "rescoring complex variables Tue Mar 10 16:03:19 2020"
+## [1] "done rescoring complex variables Tue Mar 10 16:03:19 2020"
-
## [1] "vtreat 1.6.0 start initial treatment design Fri Feb 28 14:21:59 2020"
-## [1] " start cross frame work Fri Feb 28 14:22:00 2020"
-## [1] " vtreat::mkCrossFrameCExperiment done Fri Feb 28 14:22:00 2020"
+
## [1] "vtreat 1.6.0 start initial treatment design Tue Mar 10 16:03:19 2020"
+## [1] " start cross frame work Tue Mar 10 16:03:20 2020"
+## [1] " vtreat::mkCrossFrameCExperiment done Tue Mar 10 16:03:20 2020"
treatments <- prep$treatments
 
 knitr::kable(treatments$scoreFrame[,c('varName','sig')])
diff --git a/docs/articles/vtreatGrouping.html b/docs/articles/vtreatGrouping.html index a03ec889..5b01cd9c 100644 --- a/docs/articles/vtreatGrouping.html +++ b/docs/articles/vtreatGrouping.html @@ -115,7 +115,7 @@

Grouping Example

Nina Zumel, Nate Sutton

-

2020-02-28

+

2020-03-10

Source: vignettes/vtreatGrouping.Rmd diff --git a/docs/articles/vtreatOverfit.html b/docs/articles/vtreatOverfit.html index fa372626..e47f53b3 100644 --- a/docs/articles/vtreatOverfit.html +++ b/docs/articles/vtreatOverfit.html @@ -115,7 +115,7 @@

vtreat overfit

John Mount, Nina Zumel

-

2020-02-28

+

2020-03-10

Source: vignettes/vtreatOverfit.Rmd @@ -148,13 +148,13 @@

-
## [1] "vtreat 1.6.0 inspecting inputs Fri Feb 28 14:22:05 2020"
-## [1] "designing treatments Fri Feb 28 14:22:05 2020"
-## [1] " have initial level statistics Fri Feb 28 14:22:05 2020"
-## [1] " scoring treatments Fri Feb 28 14:22:05 2020"
-## [1] "have treatment plan Fri Feb 28 14:22:05 2020"
-## [1] "rescoring complex variables Fri Feb 28 14:22:05 2020"
-## [1] "done rescoring complex variables Fri Feb 28 14:22:06 2020"
+
## [1] "vtreat 1.6.0 inspecting inputs Tue Mar 10 16:03:25 2020"
+## [1] "designing treatments Tue Mar 10 16:03:25 2020"
+## [1] " have initial level statistics Tue Mar 10 16:03:25 2020"
+## [1] " scoring treatments Tue Mar 10 16:03:25 2020"
+## [1] "have treatment plan Tue Mar 10 16:03:25 2020"
+## [1] "rescoring complex variables Tue Mar 10 16:03:25 2020"
+## [1] "done rescoring complex variables Tue Mar 10 16:03:25 2020"
dTrainTreated <- vtreat::prepare(treatments,dTrain,
   pruneSig=c() # Note: usually want pruneSig to be a small fraction, setting to null to illustrate problem
 )
@@ -252,13 +252,13 @@

rareCount=0, # Note set this to something larger, like 5 rareSig=c() # Note set this to something like 0.3 ) -
## [1] "vtreat 1.6.0 inspecting inputs Fri Feb 28 14:22:06 2020"
-## [1] "designing treatments Fri Feb 28 14:22:06 2020"
-## [1] " have initial level statistics Fri Feb 28 14:22:06 2020"
-## [1] " scoring treatments Fri Feb 28 14:22:06 2020"
-## [1] "have treatment plan Fri Feb 28 14:22:06 2020"
-## [1] "rescoring complex variables Fri Feb 28 14:22:06 2020"
-## [1] "done rescoring complex variables Fri Feb 28 14:22:06 2020"
+
## [1] "vtreat 1.6.0 inspecting inputs Tue Mar 10 16:03:25 2020"
+## [1] "designing treatments Tue Mar 10 16:03:25 2020"
+## [1] " have initial level statistics Tue Mar 10 16:03:25 2020"
+## [1] " scoring treatments Tue Mar 10 16:03:25 2020"
+## [1] "have treatment plan Tue Mar 10 16:03:25 2020"
+## [1] "rescoring complex variables Tue Mar 10 16:03:25 2020"
+## [1] "done rescoring complex variables Tue Mar 10 16:03:25 2020"
dTrainTreated <- vtreat::prepare(treatments,dTrain,
                                  pruneSig=c() # Note: set this to filter, like 0.05 or 1/nvars
 )
@@ -331,9 +331,9 @@ 

xdat <- vtreat::mkCrossFrameCExperiment(dTrain,'x','y',TRUE, rareCount=0, # Note set this to something larger, like 5 rareSig=c())

-
## [1] "vtreat 1.6.0 start initial treatment design Fri Feb 28 14:22:06 2020"
-## [1] " start cross frame work Fri Feb 28 14:22:06 2020"
-## [1] " vtreat::mkCrossFrameCExperiment done Fri Feb 28 14:22:06 2020"
+
## [1] "vtreat 1.6.0 start initial treatment design Tue Mar 10 16:03:25 2020"
+## [1] " start cross frame work Tue Mar 10 16:03:25 2020"
+## [1] " vtreat::mkCrossFrameCExperiment done Tue Mar 10 16:03:25 2020"
treatments <- xdat$treatments
 print(treatments$scoreFrame)
##   varName varMoves          rsq          sig needsSplit extraModelDegrees
diff --git a/docs/articles/vtreatRareLevels.html b/docs/articles/vtreatRareLevels.html
index 3460ea36..50ae0e65 100644
--- a/docs/articles/vtreatRareLevels.html
+++ b/docs/articles/vtreatRareLevels.html
@@ -115,7 +115,7 @@
       

vtreat Rare Levels

John Mount

-

2020-02-28

+

2020-03-10

Source: vignettes/vtreatRareLevels.Rmd diff --git a/docs/articles/vtreatScaleMode.html b/docs/articles/vtreatScaleMode.html index 46235d8d..29cab651 100644 --- a/docs/articles/vtreatScaleMode.html +++ b/docs/articles/vtreatScaleMode.html @@ -115,7 +115,7 @@

vtreat scale mode

Win-Vector LLC

-

2020-02-28

+

2020-03-10

Source: vignettes/vtreatScaleMode.Rmd @@ -194,12 +194,12 @@

2020-02-28

slopeFrame$badSlope <- ifelse(is.na(slopeFrame$slope), TRUE, abs(slopeFrame$slope - 1) > 1.e-8) print(slopeFrame)
-
##     varName          mean slope       sig badSlope
-## 1    x_catP  1.850372e-17     1 0.3289524    FALSE
-## 2    x_catB  1.387779e-17     1 0.3161341    FALSE
-## 3  x_lev_NA -6.938894e-18     1 0.2076623    FALSE
-## 4 x_lev_x_a  0.000000e+00     1 0.4097258    FALSE
-## 5 x_lev_x_b  0.000000e+00    NA 1.0000000     TRUE
+
##     varName          mean slope        sig badSlope
+## 1    x_catP  1.850372e-17     1 0.03392101    FALSE
+## 2    x_catB  1.387779e-17     1 1.00000000    FALSE
+## 3  x_lev_NA -6.938894e-18     1 0.20766228    FALSE
+## 4 x_lev_x_a  0.000000e+00     1 0.40972582    FALSE
+## 5 x_lev_x_b  0.000000e+00    NA 1.00000000     TRUE

The above claims are true with the exception of the derived variable x_lev_x_b. This is because the outcome variable y has an identical distribution when the original variable x=='b' and when x!='b' (the outcome is on half the time in both cases). This means y is perfectly independent of x=='b', so the regression slope must be zero (and thus cannot be 1). vtreat now treats this as needing to scale by a multiplicative factor of zero. Note also that the significance level associated with x_lev_x_b is large, making this variable easy to prune. The varMoves and significance facts in treatmentsC$scoreFrame are about the un-scaled frame (where x_lev_x_b does in fact move).

For a good discussion of the application of y-aware scaling to Principal Components Analysis please see here.

Previous versions of vtreat (0.5.22 and earlier) would copy variables that could not be sensibly scaled into the treated frame unaltered. This was considered the “most faithful” thing to do. However, we now feel that this practice was not safe for many downstream procedures, such as principal components analysis and geometric clustering.

@@ -276,9 +276,9 @@

cEraw <- vtreat::mkCrossFrameNExperiment(dTrainN, c('x1','x2','x3'),'y', scale=TRUE) -
## [1] "vtreat 1.6.0 start initial treatment design Fri Feb 28 14:22:13 2020"
-## [1] " start cross frame work Fri Feb 28 14:22:13 2020"
-## [1] " vtreat::mkCrossFrameNExperiment done Fri Feb 28 14:22:13 2020"
+
## [1] "vtreat 1.6.0 start initial treatment design Tue Mar 10 16:03:31 2020"
+## [1] " start cross frame work Tue Mar 10 16:03:31 2020"
+## [1] " vtreat::mkCrossFrameNExperiment done Tue Mar 10 16:03:31 2020"
## [1] "x1" "x2" "x3"
@@ -298,9 +298,9 @@

cEscaled <- vtreat::mkCrossFrameNExperiment(dTrainN, c('x1','x2','x3'),'yScaled', scale=TRUE) -
## [1] "vtreat 1.6.0 start initial treatment design Fri Feb 28 14:22:13 2020"
-## [1] " start cross frame work Fri Feb 28 14:22:13 2020"
-## [1] " vtreat::mkCrossFrameNExperiment done Fri Feb 28 14:22:14 2020"
+
## [1] "vtreat 1.6.0 start initial treatment design Tue Mar 10 16:03:31 2020"
+## [1] " start cross frame work Tue Mar 10 16:03:31 2020"
+## [1] " vtreat::mkCrossFrameNExperiment done Tue Mar 10 16:03:32 2020"
## [1] "x1" "x2" "x3"
diff --git a/docs/articles/vtreatSignificance.html b/docs/articles/vtreatSignificance.html index c55b2266..1235ffb8 100644 --- a/docs/articles/vtreatSignificance.html +++ b/docs/articles/vtreatSignificance.html @@ -115,7 +115,7 @@

vtreat significance

John Mount, Nina Zumel

-

2020-02-28

+

2020-03-10

Source: vignettes/vtreatSignificance.Rmd @@ -169,13 +169,13 @@

2020-02-28

## 2 FALSE lev002 lev002F ## 252 FALSE lev002 lev002F
treatmentsC <- vtreat::designTreatmentsC(d,c('catVarNoise','catVarPerfect'),'y',TRUE)
-
## [1] "vtreat 1.6.0 inspecting inputs Fri Feb 28 14:22:16 2020"
-## [1] "designing treatments Fri Feb 28 14:22:16 2020"
-## [1] " have initial level statistics Fri Feb 28 14:22:16 2020"
-## [1] " scoring treatments Fri Feb 28 14:22:16 2020"
-## [1] "have treatment plan Fri Feb 28 14:22:17 2020"
-## [1] "rescoring complex variables Fri Feb 28 14:22:17 2020"
-## [1] "done rescoring complex variables Fri Feb 28 14:22:17 2020"
+
## [1] "vtreat 1.6.0 inspecting inputs Tue Mar 10 16:03:34 2020"
+## [1] "designing treatments Tue Mar 10 16:03:34 2020"
+## [1] " have initial level statistics Tue Mar 10 16:03:34 2020"
+## [1] " scoring treatments Tue Mar 10 16:03:34 2020"
+## [1] "have treatment plan Tue Mar 10 16:03:34 2020"
+## [1] "rescoring complex variables Tue Mar 10 16:03:34 2020"
+## [1] "done rescoring complex variables Tue Mar 10 16:03:34 2020"
# Estimate effect significance (not coefficient significance).
 estSigGLM <- function(xVar,yVar,numberOfHiddenDegrees=0) {
   d <- data.frame(x=xVar,y=yVar,stringsAsFactors = FALSE)
diff --git a/docs/articles/vtreatSplitting.html b/docs/articles/vtreatSplitting.html
index 9e9f4783..70197797 100644
--- a/docs/articles/vtreatSplitting.html
+++ b/docs/articles/vtreatSplitting.html
@@ -115,7 +115,7 @@
       

vtreat splitting

John Mount, Nina Zumel

-

2020-02-28

+

2020-03-10

Source: vignettes/vtreatSplitting.Rmd @@ -192,7 +192,7 @@

## [1] 1 ## ## [[2]]$app -## [1] 2 3 +## [1] 3 2 ## ## ## attr(,"splitmethod") @@ -272,18 +272,18 @@

## ## [[2]] ## [[2]]$train -## [1] 2 3 5 +## [1] 3 4 5 ## ## [[2]]$app -## [1] 1 4 +## [1] 2 1 ## ## ## [[3]] ## [[3]]$train -## [1] 1 4 5 +## [1] 1 2 5 ## ## [[3]]$app -## [1] 2 3 +## [1] 3 4 ## ## ## attr(,"splitmethod") diff --git a/docs/articles/vtreatVariableTypes.html b/docs/articles/vtreatVariableTypes.html index e140ff71..eac83dd1 100644 --- a/docs/articles/vtreatVariableTypes.html +++ b/docs/articles/vtreatVariableTypes.html @@ -115,7 +115,7 @@

Variable Types

Win-Vector LLC

-

2020-02-28

+

2020-03-10

Source: vignettes/vtreatVariableTypes.Rmd @@ -148,13 +148,13 @@

z=c(1,2,3,4,NA,6),y=c(FALSE,FALSE,TRUE,FALSE,TRUE,TRUE), stringsAsFactors = FALSE) treatmentsC <- designTreatmentsC(dTrainC,colnames(dTrainC),'y',TRUE)

-
## [1] "vtreat 1.6.0 inspecting inputs Fri Feb 28 14:22:22 2020"
-## [1] "designing treatments Fri Feb 28 14:22:22 2020"
-## [1] " have initial level statistics Fri Feb 28 14:22:22 2020"
-## [1] " scoring treatments Fri Feb 28 14:22:22 2020"
-## [1] "have treatment plan Fri Feb 28 14:22:22 2020"
-## [1] "rescoring complex variables Fri Feb 28 14:22:22 2020"
-## [1] "done rescoring complex variables Fri Feb 28 14:22:22 2020"
+
## [1] "vtreat 1.6.0 inspecting inputs Tue Mar 10 16:03:38 2020"
+## [1] "designing treatments Tue Mar 10 16:03:38 2020"
+## [1] " have initial level statistics Tue Mar 10 16:03:38 2020"
+## [1] " scoring treatments Tue Mar 10 16:03:39 2020"
+## [1] "have treatment plan Tue Mar 10 16:03:39 2020"
+## [1] "rescoring complex variables Tue Mar 10 16:03:39 2020"
+## [1] "done rescoring complex variables Tue Mar 10 16:03:39 2020"
scoreColsToPrint <- c('origName','varName','code','rsq','sig','extraModelDegrees')
 print(treatmentsC$scoreFrame[,scoreColsToPrint])
##   origName   varName  code        rsq        sig extraModelDegrees
@@ -202,23 +202,23 @@ 

z=c(1,2,3,4,NA,6),y=as.numeric(c(FALSE,FALSE,TRUE,FALSE,TRUE,TRUE)), stringsAsFactors = FALSE) treatmentsN <- designTreatmentsN(dTrainN,colnames(dTrainN),'y')

-
## [1] "vtreat 1.6.0 inspecting inputs Fri Feb 28 14:22:22 2020"
-## [1] "designing treatments Fri Feb 28 14:22:22 2020"
-## [1] " have initial level statistics Fri Feb 28 14:22:22 2020"
-## [1] " scoring treatments Fri Feb 28 14:22:22 2020"
-## [1] "have treatment plan Fri Feb 28 14:22:22 2020"
-## [1] "rescoring complex variables Fri Feb 28 14:22:22 2020"
-## [1] "done rescoring complex variables Fri Feb 28 14:22:22 2020"
+
## [1] "vtreat 1.6.0 inspecting inputs Tue Mar 10 16:03:39 2020"
+## [1] "designing treatments Tue Mar 10 16:03:39 2020"
+## [1] " have initial level statistics Tue Mar 10 16:03:39 2020"
+## [1] " scoring treatments Tue Mar 10 16:03:39 2020"
+## [1] "have treatment plan Tue Mar 10 16:03:39 2020"
+## [1] "rescoring complex variables Tue Mar 10 16:03:39 2020"
+## [1] "done rescoring complex variables Tue Mar 10 16:03:39 2020"
print(treatmentsN$scoreFrame[,scoreColsToPrint])
-
##   origName   varName  code       rsq       sig extraModelDegrees
-## 1        x    x_catP  catP 0.2857143 0.2745766                 2
-## 2        x    x_catN  catN 0.1052632 0.5304117                 2
-## 3        x    x_catD  catD 0.1111111 0.5185185                 2
-## 4        z         z clean 0.3045045 0.2562868                 0
-## 5        z   z_isBAD isBAD 0.2000000 0.3739010                 0
-## 6        x  x_lev_NA   lev 0.2000000 0.3739010                 0
-## 7        x x_lev_x_a   lev 0.1111111 0.5185185                 0
-## 8        x x_lev_x_b   lev 0.0000000 1.0000000                 0
+
##   origName   varName  code          rsq       sig extraModelDegrees
+## 1        x    x_catP  catP 4.385965e-01 0.1518345                 2
+## 2        x    x_catN  catN 1.110223e-16 1.0000000                 2
+## 3        x    x_catD  catD 1.111111e-01 0.5185185                 2
+## 4        z         z clean 3.045045e-01 0.2562868                 0
+## 5        z   z_isBAD isBAD 2.000000e-01 0.3739010                 0
+## 6        x  x_lev_NA   lev 2.000000e-01 0.3739010                 0
+## 7        x x_lev_x_a   lev 1.111111e-01 0.5185185                 0
+## 8        x x_lev_x_b   lev 0.000000e+00 1.0000000                 0

The treatment of numeric targets is similar to that of categorical targets. In the numeric case the possible derived variable types are: