My-Mr-Clean

What we are going to do is:

1. Get an article from Wikipedia to work with.

2. Extract meaningful and usable content from this article.

3. Clean up and filter the data to narrow the scope to relevant words.

4. Build a simple frequency model.

5. Analyse the article based on this model.
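
Steps 3 to 5 above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names, the tiny `STOP_WORDS` set, and the `sample` text are all assumptions made for the example, and the article text is assumed to have been fetched and extracted already.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "from", "that", "by"}


def clean(text):
    """Lowercase the text and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words and tokens shorter than three characters."""
    return [t for t in tokens if t not in stop_words and len(t) > 2]


def word_frequencies(text):
    """Build the frequency model: a mapping from word to occurrence count."""
    return Counter(filter_words(clean(text)))


# Stand-in for the extracted article content (assumed sample, not the real article).
sample = (
    "The ozone layer is a region of the stratosphere. "
    "The ozone layer absorbs ultraviolet radiation from the Sun. "
    "Ozone is a gas."
)

freq = word_frequencies(sample)
print(freq.most_common(2))
```

On this sample, "ozone" and "layer" come out on top, which is the behaviour the frequency model relies on: the most frequent remaining words hint at what the article is about.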

The most frequent words should be the most relevant to the content of this Ozone layer article. If the user's search consists of the words "Ozone" and "gas", we can vectorize each article and represent it as a vector of size 2. The coefficients of the vectorized article would be the frequencies of the search words in that article. By vectorizing all articles this way, we can define a distance between articles. Another data engineering technique we could have explored is stemming, to get even more insight into the article.
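
To make the vectorization idea concrete, here is a small sketch under stated assumptions: the coefficients are taken to be relative frequencies (count divided by total tokens), and the distance is plain Euclidean distance. The text above does not fix either choice, and the function names and the two sample "articles" are made up for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text, query_words):
    """Represent a text as one relative frequency per query word (assumed weighting)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[w.lower()] / total for w in query_words]


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


query = ["Ozone", "gas"]

# Made-up sample texts standing in for real articles.
article_a = "Ozone is a gas. Ozone absorbs ultraviolet light."
article_b = "Football is a team sport played with a ball."

vec_a = vectorize(article_a, query)
vec_b = vectorize(article_b, query)

print(vec_a, vec_b, euclidean(vec_a, vec_b))
```

An article that never mentions the search words maps to the zero vector, so its distance from an article that does mention them grows with how often those words appear there.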

Here ends our little journey into data exploration and information retrieval. If the database engineers on our team are ready, we can use some of the techniques we have seen to build a small proof of concept!

abd1bayev/My-Mr-Clean

Folders and files

Latest commit

History

Repository files navigation

My-Mr-Clean

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages