The My Mr Clean project finds the most frequently repeated words in a Wikipedia article and presents them in a visually appealing way.

abd1bayev/My-Mr-Clean

My-Mr-Clean

What we are going to do:

1. Get an article from Wikipedia to work with.

2. Extract meaningful and usable content from this article.

3. Clean up and filter the data to narrow the scope to relevant words.

4. Build a simple frequency model.

5. Analyse the article based on this model.
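Steps 2 to 4 can be sketched roughly as follows. This is only an illustration, not the project's actual code: the function names, the tiny stop-word list, and the hardcoded sample text (standing in for the article fetched in step 1) are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "it", "to", "from", "that", "this", "by"}


def extract_words(text):
    """Step 2: lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_words(words, stop_words=STOP_WORDS, min_length=3):
    """Step 3: drop stop words and very short tokens."""
    return [w for w in words if w not in stop_words and len(w) >= min_length]


def build_frequency_model(words):
    """Step 4: count how often each remaining word occurs."""
    return Counter(words)


# Sample text standing in for the downloaded article (an assumption for this sketch).
sample = (
    "The ozone layer is a region of the stratosphere. "
    "The ozone layer absorbs most of the ultraviolet radiation from the Sun."
)

model = build_frequency_model(filter_words(extract_words(sample)))
top_words = model.most_common(2)  # the two most frequent words with their counts
print(top_words)
```

Step 5 then amounts to reading the top of this model: the words with the highest counts are the ones we expect to describe the article best.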

The most frequent words should be the most relevant to the content of this Ozone layer article. If a user's search consists of the words "Ozone" and "gas", we can represent each article as a vector of size 2 whose coefficients are the frequencies of the search words in that article. Vectorizing all articles this way lets us define a distance between articles. Another data engineering technique we could have explored is stemming, which groups inflected forms of a word (for example, "gases" and "gas") so they are counted together, giving even more insight into the article.
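The vectorization idea can be sketched like this. It is a minimal example under stated assumptions: the two short "articles" are made up, raw counts are used as frequencies, and Euclidean distance is just one possible choice of distance. The helper names are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text, search_terms):
    """Return one coefficient per search term: how often it occurs in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[term.lower()] for term in search_terms]


def euclidean_distance(u, v):
    """Distance between two article vectors of the same size."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


search_terms = ["Ozone", "gas"]

# Made-up article snippets, used only for illustration.
article_a = "Ozone is a gas. Ozone absorbs ultraviolet light."
article_b = "Nitrogen is the most common gas in the atmosphere."

vector_a = vectorize(article_a, search_terms)
vector_b = vectorize(article_b, search_terms)
distance = euclidean_distance(vector_a, vector_b)
print(vector_a, vector_b, distance)
```

Raw counts favour long articles, so dividing each count by the article's total word count before comparing would be a reasonable refinement.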

Here ends our little journey into data exploration and information retrieval. If the database engineers on our team are ready, we can use some of the techniques we have seen to build a small proof of concept!
