From 6363ee1a71960eb2008256a77bf8bb01b9ec3a77 Mon Sep 17 00:00:00 2001
From: kierbn
Date: Sat, 1 Mar 2025 16:29:23 +0000
Subject: [PATCH 1/2] Update Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md

---
 Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md b/Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md
index 8b3eb22..59b34d9 100644
--- a/Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md
+++ b/Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md
@@ -530,8 +530,8 @@ Starting from this lab, you need to use *as many DataFrame functions as possible

 Load the Aug95 NASA access log data in Lab 1 and create a DataFrame with FIVE columns by **specifying** the schema according to the description in the downloaded html file. Use this DataFrame for the following questions.

-2. Find out the number of **unique** hosts in total (i.e. in August 1995)?
-3. Find out the most frequent visitor, i.e. the host with the largest number of visits.
+2. Find out the number of **unique** hosts in total (i.e. in August 1995)? [Answer: 75060 unique hosts]
+3. Find out the most frequent visitor, i.e. the host with the largest number of visits. [Answer: edams.ksc.nasa.gov]

 ### Linear regression for advertising

From 25c6e088e3ec98e68602812274f08002658790b2 Mon Sep 17 00:00:00 2001
From: kierbn
Date: Sat, 1 Mar 2025 16:48:15 +0000
Subject: [PATCH 2/2] Update Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md

---
 Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md b/Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md
index 59b34d9..8c42b4b 100644
--- a/Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md
+++ b/Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md
@@ -536,6 +536,7 @@ Starting from this lab, you need to use *as many DataFrame functions as possible
 ### Linear regression for advertising

 4. Add regularization to the [linear regression for advertising example](#example-linear-regression-for-advertising) and evaluate the prediction performance against the performance without any regularization. Study at least three different regularization settings.
+[Answer: Increasing the regularisation parameter (0.1, 0.2, 0.5) increases each of the predictions each time. Not really sure what else to put here without just copy-pasting it all in.]

 ### Logistic regression for document classification