From 9dc4714125f95f621936290544c3b0a4d24b5a19 Mon Sep 17 00:00:00 2001
From: evalevanto <evatom86@gmail.com>
Date: Wed, 23 May 2018 03:05:30 +0300
Subject: [PATCH] my submission

---
 README.md | 40 +++++++++++++++-------------------------
 1 file changed, 15 insertions(+), 25 deletions(-)

diff --git a/README.md b/README.md
index 7e8fd89..7b22316 100644
--- a/README.md
+++ b/README.md
@@ -2,40 +2,30 @@
 ## Due: 23rd 9:00am May or Earlier
 
 
-##### There are six tasks in this challenge, complete 1, 2 and 3 and write your answers in a ReadMe. Then choose between 4, 5, 6 and complete one of those. The submission instructions are specified at the bottom of the challenge page.
-
 ## Task 1
 ###### If you copy paste a set of steps more than 3 times, it’s time to write a what?
-
+It is time to write a function. It enables code reuse, preventing code redundancy.
 
 ## Task 2
 ###### Given a dataset on any one of Africa’s Talking products: Voice, SMS, Payments and USSD. Discuss the steps you would take to analyse the data to reach a conclusion.
+The first step is to identify and understand the problem to solve and the business need behind it. This drives the analysis and shapes the project.
+
+Next comes obtaining the data. Relevant data can be collected by connecting to a database, plugging into a data lake or data store, or using APIs, for instance to source payments data.
+ 
+The data might be messy, with missing values or inconsistencies, so it has to be cleaned to get quality results. This involves converting the data to a suitable format, defining the extent and sample of data to use, and fixing or removing records with missing values.
+
+Next comes understanding the data: exploration. Here, I would visualise the various fields and dig into the features, for example by clustering, to gain insight into what the data looks like and to extract further features from it.
+ 
+After this, I would model the data with machine learning algorithms to get predictive and interpretable results.
+Depending on the problem and the behaviour of the data, choose an algorithm that scores well on a suitable metric (for example, F-score for classification) and generalises well to unseen data.
+
+These results are then interpreted and used to build new solutions or improve existing ones.
 
 ## Task 3
 ###### Give an example explaining how K-means clustering works.
-
-## Task 4
-###### Given a Gigabyte of weather data, how would you go about calculating the mean temperature of a particular place and plotting a graph to show change in variation of daily temperature.
+K-means is a clustering algorithm. It takes an unlabelled dataset and groups it into clusters. It starts by initialising points called centroids; this is done randomly, and the number of centroids equals the number of groups the data will be clustered into. The algorithm then iterates through two steps: assigning each example to a cluster, then moving the centroids. In the first step, it goes through the dataset and assigns each data point to the centroid it is closest to; in the second, each centroid is moved to the average position of the data points assigned to it. These two steps are repeated until the centroids no longer move, that is, each one already sits at the average of its assigned data points. At this point the algorithm has converged and has found the clusters in the data. For example, with the one-dimensional points 1, 2, 9, 10 and initial centroids 1 and 2, the first pass assigns 1 to the first centroid and 2, 9, 10 to the second, so the centroids move to 1 and 7; the second pass reassigns 2 to the first centroid, so the centroids move to 1.5 and 9.5; a third pass changes nothing, leaving the clusters {1, 2} and {9, 10}.
 
 ## Task 5
 ###### Suppose there’s a contest in which people each pick a number between 0 to 100. The winner of this contest will be the person who picks the number closest to ⅕ of the average. What’s the winning number.
+100 cannot be the winning number, because 1/5 of the average can never exceed 20 (the average is at most 100). A rational player will therefore not choose a number greater than 20. If all players reason this way and assume all other players do too, then the average is at most 20 and the winning number is no greater than 4. Repeating this reasoning shrinks the bound by a factor of 5 each time, so the number chosen by fully rational players converges to zero. This is, however, not realistic, as not every player will base their answer on "rationality", so the actual winning number is hard to infer.
 
-## Task 6
-###### Uber recently released data that you can find on uber movement website. Download and analyse interesting data that you find. State the insights that you got from analysing the data. Please send us your work as notebook format you prefer.
-
-<br><br>
-
-## How to submit
-1. Check out the format for submitting your code [here](http://atdevoutreach.viewdocs.io/DataInternshipCodeChallengeMay2018/CodeChallengeSteps/)
-
-2.  Make sure when creating a branch to use your correct phone Number, as this is what we will use to get back to you.
-
-> NB: As a branch-name you can also use your email.
-> See you on the other side, and best of luck!
-
-
-## Join Slack
-In case you have any questions, join our Slack [here](https://slackin-africastalking.now.sh/) and join the #internship-challenge channel.
-
-## About Africa's Talking Code Challenges
-Please read the overview [here.](http://atdevoutreach.viewdocs.io/DataInternshipCodeChallengeMay2018/)