+254701935216 Data challenge submission #7

58 changes: 38 additions & 20 deletions README.md
@@ -1,37 +1,55 @@
# Data Internship Code Challenge - 22nd May 2018
## Due: 23rd May 9:00am or Earlier


##### There are six tasks in this challenge. Complete 1, 2 and 3 and write your answers in a ReadMe, then choose one of 4, 5 and 6 and complete it. The submission instructions are specified at the bottom of the challenge page.

## Task 1
###### If you copy paste a set of steps more than 3 times, it’s time to write a what?

## Data redundancy function
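
As a small illustration of this answer, the sketch below wraps a repeated set of steps in one function and reuses it. The function name `normalise` and the sample values are made up for the example and do not come from the challenge.

```python
def normalise(values):
    """Scale a list of numbers to the 0-1 range (hypothetical helper)."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Made-up sample columns: the same steps are reused instead of copy-pasted.
sms_counts = [120, 80, 200]
voice_minutes = [30, 45, 15]

print(normalise(sms_counts))
print(normalise(voice_minutes))
```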

## Task 2
###### Given a dataset on any one of Africa’s Talking products (Voice, SMS, Payments and USSD), discuss the steps you would take to analyse the data to reach a conclusion.

## Task 3
###### Give an example explaining how K-means clustering works.

## Task 4
###### Given a Gigabyte of weather data, how would you go about calculating the mean temperature of a particular place and plotting a graph to show change in variation of daily temperature?
## Step i
###### Manipulate the data by creating a pivot table using Excel or any other statistical package such as Stata or SPSS.
A pivot table helps us filter and sort the data by different variables and calculate the standard deviation and mean of the data.
## Step ii
###### Check for outliers, trends, correlations and variations. This will help focus the analysis on the questions and any other objectives.
## Step iii
###### Interpret the results by either rejecting or failing to reject the hypothesis. In our interpretation we should ask ourselves: How does the data defend us against objections? How does the data answer the original question? Are there any limitations in our conclusion?
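
The outlier check in Step ii can be sketched in a few lines of Python. This is only one possible approach: it assumes the common 1.5 × IQR rule, and the `call_durations` values are invented for illustration rather than taken from any real dataset.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample: call durations in seconds with one suspicious value.
call_durations = [42, 45, 47, 44, 46, 43, 300]

print("mean:", statistics.mean(call_durations))
print("standard deviation:", statistics.stdev(call_durations))
print("outliers:", iqr_outliers(call_durations))
```

Flagged values are candidates for a closer look, not automatic deletions; whether to keep them depends on the question being asked.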

## Task 5
###### Suppose there’s a contest in which people each pick a number between 0 and 100. The winner of this contest will be the person who picks the number closest to ⅕ of the average. What’s the winning number?

## Task 6
###### Uber recently released data that you can find on the Uber Movement website. Download and analyse interesting data that you find. State the insights that you got from analysing the data. Please send us your work in the notebook format you prefer.
## Task 3
#### Give an example explaining how K-means clustering works.
### K-means clustering is an exploratory data analysis technique that partitions a dataset into K clusters by assigning each point to the nearest cluster centroid.
## Example. Apply K-means clustering to the following dataset with two clusters.

| Point | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| x | 10 | 15 | 13 | 16 |
| y | 8 | 12 | 10 | 14 |

K = 2

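Take the first two points as the initial centroids: $C_1 = (10, 8)$ and $C_2 = (15, 12)$. The Euclidean distance between two points $(x, y)$ and $(a, b)$ is

$$d\big((x, y), (a, b)\big) = \sqrt{(x - a)^2 + (y - b)^2}$$

For example, the distance between the two initial centroids is $\sqrt{(10 - 15)^2 + (8 - 12)^2} = \sqrt{41} \approx 6.403$.

Assign the point $(13, 10)$:

Euclidean distance from cluster one: $\sqrt{(10 - 13)^2 + (8 - 10)^2} = \sqrt{13} \approx 3.606$

Euclidean distance from cluster two: $\sqrt{(15 - 13)^2 + (12 - 10)^2} = \sqrt{8} \approx 2.828$

Since $2.828 < 3.606$, the point $(13, 10)$ joins cluster two.

Updated centroid of cluster two: $\left(\frac{15 + 13}{2}, \frac{12 + 10}{2}\right) = (14, 11)$

Assign the point $(16, 14)$:

Euclidean distance from cluster one: $\sqrt{(16 - 10)^2 + (14 - 8)^2} = \sqrt{72} \approx 8.485$

Euclidean distance from cluster two: $\sqrt{(16 - 14)^2 + (14 - 11)^2} = \sqrt{13} \approx 3.606$

Since $3.606 < 8.485$, the point $(16, 14)$ also joins cluster two, whose centroid becomes $\left(\frac{15 + 13 + 16}{3}, \frac{12 + 10 + 14}{3}\right) \approx (14.67, 12)$. Re-checking every point against the updated centroids changes no assignment, so the clusters are final:

$C_1$: $(10, 8)$

$C_2$: $(15, 12)$, $(13, 10)$, $(16, 14)$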
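
The same procedure can be sketched in plain Python. This is a minimal batch version of K-means, assuming the first two points are used as starting centroids; the function names are chosen for the example, and the result depends on the starting centroids and on whether centroids are updated after every point or after a full pass.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def centroid(points):
    """Mean position of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def kmeans(points, centroids, max_iter=100):
    """Assign points to the nearest centroid, recompute, repeat until stable."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [euclidean(p, c) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [
            centroid(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids


points = [(10, 8), (15, 12), (13, 10), (16, 14)]
clusters, centroids = kmeans(points, centroids=[(10, 8), (15, 12)])
print(clusters)
print(centroids)
```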

<br><br>

## How to submit
1. Check out the format for submitting your code [here](http://atdevoutreach.viewdocs.io/DataInternshipCodeChallengeMay2018/CodeChallengeSteps/)
## Task 4
###### Given a Gigabyte of weather data, how would you go about calculating the mean temperature of a particular place and plotting a graph to show change in variation of daily temperature?

2. Make sure when creating a branch to use your correct phone number, as this is what we will use to get back to you.
### Using MS Excel, create a pivot table for the data and filter it to obtain the data for the particular place of interest. Use the mean function to calculate the mean temperature for that place. Finally, using a pivot chart, plot a bar graph of daily temperature against time.
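
A gigabyte of data may be too large to open comfortably in a spreadsheet, so the same filter, mean and plot steps can also be sketched in Python by streaming the file row by row. This is an illustration under stated assumptions, not part of the challenge: the CSV layout and the column names `place`, `date` and `temperature` are hypothetical, and the plotting helper relies on the third-party matplotlib package.

```python
import csv
from collections import defaultdict


def read_rows(path):
    """Yield CSV rows one at a time so the whole file never sits in memory."""
    with open(path, newline="") as handle:
        yield from csv.DictReader(handle)


def summarise_place(rows, place):
    """Return (overall mean, {date: daily mean}) for one place.

    Assumes each row has 'place', 'date' and 'temperature' keys
    (hypothetical column names).
    """
    total, count = 0.0, 0
    day_sums = defaultdict(float)
    day_counts = defaultdict(int)
    for row in rows:
        if row["place"] != place:
            continue  # the filter step of the pivot table
        temp = float(row["temperature"])
        total += temp
        count += 1
        day_sums[row["date"]] += temp
        day_counts[row["date"]] += 1
    if count == 0:
        raise ValueError(f"no readings found for {place!r}")
    daily = {day: day_sums[day] / day_counts[day] for day in sorted(day_sums)}
    return total / count, daily


def plot_daily_means(daily_means, out_path="daily_temperature.png"):
    """Plot daily mean temperature against date (needs matplotlib)."""
    import matplotlib.pyplot as plt  # third-party, imported only when plotting

    days = list(daily_means)
    fig, ax = plt.subplots()
    ax.plot(days, [daily_means[day] for day in days], marker="o")
    ax.set_xlabel("Date")
    ax.set_ylabel("Mean temperature")
    fig.autofmt_xdate()
    fig.savefig(out_path)


# Tiny made-up sample standing in for read_rows("weather.csv").
sample_rows = [
    {"place": "Nairobi", "date": "2018-05-01", "temperature": "20"},
    {"place": "Nairobi", "date": "2018-05-01", "temperature": "24"},
    {"place": "Mombasa", "date": "2018-05-01", "temperature": "30"},
    {"place": "Nairobi", "date": "2018-05-02", "temperature": "18"},
]
mean_temp, daily_means = summarise_place(sample_rows, "Nairobi")
print(mean_temp, daily_means)
```

Only running totals are kept per day, so memory use grows with the number of days rather than with the size of the file.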

> NB: As a branch-name you can also use your email.
> See you on the other side, and best of luck!


## Join Slack