danbeutler/r-server-introduction

DAT213x - Analyzing Big Data with Microsoft R Server

Introduction to Microsoft R Server using the RevoScaleR package and some NYC taxi trip datasets.

For detailed information about this edX course by Microsoft, visit:
👉 https://courses.edx.org/courses/course-v1:Microsoft+DAT213x+3T2016/info

Topics covered

  • File handling
  • Simple and complex transformations
  • Examining datasets
  • Examining outliers by filtering
  • Create sample datasets
  • Visualize data using tables, plots and maps
  • Showing trends and distribution using plots
  • Reorder factor levels
  • Creating clusters
  • Split files in order to create training data
  • Create and examine linear model predictions
  • Compare predictions
  • Judging the performance of predictions

Basic functionality of package RevoScaleR

  • rxImport, RxXdfData, RxTextData -> reading files
  • rxGetInfo -> display basic information about XDF files
  • rxSummary -> univariate summaries of objects within XDF files
  • rxDataStep -> data transformation
  • rxSplit -> split dataset into multiple sets
  • rxCube, rxCrossTabs -> contingency tables
  • rxHistogram -> histogram plots
  • rxFactors -> factor variable recoding
  • rxKmeans -> k-means clustering
  • rxLinMod -> linear models
  • rxDTree -> parallel external memory algorithm for classification and regression trees
  • rxDForest -> parallel external memory algorithm for classification and regression decision forests
  • rxPredict -> compute predicted values and residuals
  • rxQuantile -> approximate quantiles
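
As a rough illustration of how several of the functions above fit together, here is a minimal sketch. It is not taken from the course material. It assumes a Microsoft R Server or R Client installation that provides RevoScaleR. The function name `taxi_sketch`, the CSV path, and the column names `fare_amount`, `tip_amount` and `trip_distance` are placeholders chosen for illustration, so adjust them to the dataset actually used.

```r
# Hypothetical end-to-end sketch; needs RevoScaleR (Microsoft R Server / R Client).
# Column names and file paths are assumptions, not taken from the repository.
taxi_sketch <- function(csv_path,
                        xdf_path = tempfile(fileext = ".xdf"),
                        pred_path = tempfile(fileext = ".xdf")) {
  # Read the CSV file and store it as an XDF file on disk
  taxi_xdf <- RevoScaleR::rxImport(inData = csv_path, outFile = xdf_path,
                                   overwrite = TRUE)

  # Basic information: variable types plus the first five rows
  print(RevoScaleR::rxGetInfo(taxi_xdf, getVarInfo = TRUE, numRows = 5))

  # Univariate summaries of two (assumed) numeric columns
  print(RevoScaleR::rxSummary(~ fare_amount + trip_distance, data = taxi_xdf))

  # Transformation: add a tip percentage column, written back to the same XDF
  RevoScaleR::rxDataStep(
    inData = taxi_xdf, outFile = taxi_xdf, overwrite = TRUE,
    transforms = list(
      tip_percent = ifelse(fare_amount > 0, 100 * tip_amount / fare_amount, NA)
    )
  )

  # Approximate median and 90th percentile of the new column
  print(RevoScaleR::rxQuantile("tip_percent", taxi_xdf, probs = c(0.5, 0.9)))

  # Linear model, then predictions written to a separate XDF file
  model <- RevoScaleR::rxLinMod(tip_percent ~ trip_distance, data = taxi_xdf)
  RevoScaleR::rxPredict(modelObject = model, data = taxi_xdf,
                        outData = pred_path, overwrite = TRUE)

  invisible(model)
}
```

Calling `taxi_sketch("path/to/trips.csv")` (path hypothetical) would run the whole chain. Nothing executes until the function is called, so the definition can be sourced on a machine without RevoScaleR.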
