- Multicollinearity check in R
- Data transformations
- Outlier
- Missing value
- Skewness vs Kurtosis
- Sampling for imbalanced sample
- Degrees of freedom
- t-test
- ANOVA
- AUC
- Levene's Test
- Introduction to Principal Component Analysis
- Mahalanobis distance
- 11 dimensionality reduction techniques
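Two of the checks listed above (multicollinearity and Mahalanobis distance) can be tried directly in base R. The following is a minimal sketch, not taken from the linked articles: it assumes the built-in `mtcars` dataset as stand-in data, and the choice of columns and the 97.5% cutoff are illustrative assumptions.

```r
# Illustrative only: mtcars and the chosen columns are stand-ins,
# not data used by any of the linked articles.
X <- mtcars[, c("disp", "hp", "wt")]

# Multicollinearity: variance inflation factor of each predictor,
# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on the remaining predictors.
vif <- sapply(names(X), function(v) {
  fit <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})

# Mahalanobis distance: squared distance of each row from the column
# means, scaled by the sample covariance matrix.
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Flag rows beyond the 97.5% chi-squared quantile (df = number of
# variables); the cutoff is a common convention, not a fixed rule.
outliers <- rownames(X)[d2 > qchisq(0.975, df = ncol(X))]

print(round(vif, 2))
print(outliers)
```

A VIF well above 5 to 10 is often read as a sign of problematic collinearity, though the threshold depends on the application.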
Microsoft has some free #datascience and #machinelearning courses on GitHub
- Handling High Cardinality
- What is nn.Embedding
- Categorical Embedder: Encoding Categorical Variables via Neural Networks
- Want to Become a Data Scientist
- Difference of Data Science, Machine Learning and Data Mining
- Data Science, Machine Learning, BI Explained in a Amazing Paragraphs
- What is Predictive Analytics
- Numerical Optimization (book) by Jorge Nocedal and Stephen J. Wright
- Generalized Linear Models by P. McCullagh and J.A. Nelder
- Introduction to Econometrics with R
- Getting started with TensorFlow
- How to implement deep learning in R using Keras and TensorFlow
- Deep Learning Research
- Trending DS deep learning methods
- explaining BERT simply using sketches
- Data science ecosystem: R vs Python vs Substitutes
- R for ML
- PyCaret
- AutoML frameworks
- Task scheduling with python
- RStudio Blog: R vs Python
- Quora Blog: R vs Python
- Why R for data science and not Python
- XanderHorn autoML
- How can R users learn Python for data science
- r numpy
- pandas vs datatable
- Equivalents in R, Python and Perl
- Exploratory data analysis with R
- Exploratory data analysis (EDA)
- Python standard env
- Python Read Write tables
- psycopg2 tutorial
- ggplot2 package
- How to add a background image to ggplot2 graphs
- 7 visualizations you should learn in R
- Analyzing the 8 best visualization techniques
- Data Visualization Techniques
- Plotly
- Matplotlib
- Matplotlib org
- Shiny vs Dash: a side-by-side comparison
R Shiny
Optimization
Decision Tree
Random Forest
- machine learning random forest from scratch with python
- Random forests from scratch
- An implementation and explanation of the random forest in python
- Understanding random forest and hyper parameter tuning
- Bootstrapping and oob samples in random forests
- Hyperparameter tuning the random forest
- Random forests h2o
GBM
XGBoost
- XGBoost read the docs
- XGB github
- Introduction to XGBoost
- Ensemble r machine learning
- Ensemble learning
- XGBoost tuning of regularization
- XGBoost algorithm
- Fraud Detection
LightGBM
- LGBM - Laurae
- Lightgbm Parameters Guide
- Talkingdata adtracking fraud detection
- L1 and L2 Regularisation
- Microsoft malware prediction
- titanic voting pipeline stack and guide
- Text Classification
- Sentiment analysis countvectorizer TF IDF
- Sentiment analysis TF IDF
- Transformers bert roberta
- Roberta fastai huggingface transformers
- Transformer with LSTM
- How to: preprocessing for GloVe, part 1 (EDA)
- How to: preprocessing for GloVe, part 2 (usage)
- How to: preprocessing when using embeddings
- What is cohort analysis and how should i use
- A beginners guide to cohort analysis
- Cohort and multi touch attribution
- What can you do with a cohort analysis
- How to use cohort data to analyze user behavior
- Benefits of performing a cohort analysis
- RFM segmentation
- Behavioral Cohorts
- Cohorts git
- Uber Orbit python library
- Facebook releases Prophet, its free forecasting tool for Python and R
- Weather forecast with regression models
- ARIMA model statsmodels python
- Time series analysis
- sklearn model selection TimeSeriesSplit
- Ensemble of trees for forecasting time series
- TSrepr time series representations
- Time series analysis using ARIMA model in R
- ARIMA model time series forecasting python
- Time series model of forecasting future power demand
- Timeseries classification
- Statsmodels tsa ARIMA
- Prediction task with Multivariate Time Series and VAR model
- Neural networks for algorithmic trading
- Awesome deep trading
- PAA
- Web traffic time series forecast
- MultiQuantile Neural Hierarchical Interpolation for Time Series
Useful R packages for learning time series (credits to Matt Dancho)
- #timetk time series data wrangling + visualization
- #modeltime time series forecasting
- #modeltime.ensemble make ensemble forecasts
- #modeltime.h2o use AutoML for forecasting
- #modeltime.gluonts use deep learning for forecasting
- #modeltime.resample backtest your forecast
- #boostime forecasting with lightgbm and catboost
- ML understanding using R
- Gitpage - Shapwaterfall
- python - Shapwaterfall
- SHAP
- SHAP example
- SHAP from Scratch - part1
- black box models are actually more explainable than a logistic regression
- shap waterfall
Links are copied from a LinkedIn post by Ian Johnson, Silicon Valley, CA, United States
- LightGBM
- Gradient boosting
- Support vector machine (SVM)
- TensorFlow Multiple linear regression
- Neural network
- TensorFlow neural network
- XGBoost
- Multiple linear regression, decision trees, random forest
- Clustering: a tutorial for cluster analysis with R
- The 5 clustering algorithms data scientists need to know
- isolation forest from scratch
- Anomaly Detection LSTM AutoEncoder
- Extended Isolation Forest
- Paper: Isolation-based Anomaly Detection
- Isolation Forest - A simple walkthrough video
- AzureML
- anomaly detection and mahalanobis distance
- Multivariate Time-series Anomaly Detection via Graph Attention Network
- Handling Incomplete Heterogeneous Data using VAEs
- VAEs in the Presence of Missing Data
- Dimensionality Reduction Using Deep Learning: Autoencoder
- Why churn
- Customer churn prediction for subscription businesses using machine learning
- Customer churn Logistic Regression with R
- Churn classification
- Survival prediction using cost sensitive learning
- 6 factors to consider before building a predictive model for life insurance
- Predicting customer churn
- Project - detecting early alzheimer
- What drives b2b customer attrition
- Define customer churn b2b
- Churn Rate
- Identify churn at risk customers
- Customer retention metrics
- New customers vs return customers
- Customer purchase behaviour analytics
- Estimating customer churn based on usage data
Deep Learning Method
Search Distance Methods
- Web Scraping Machine Learning using python
- Web Scraping Hackmageddon
- Web Scraping Data Preprocessing machine learning model
- Recommender systems in python
- Content based filtering
📌 SQL Fundamentals, CRUD Operations & Setting Environment - https://lnkd.in/ekBxGU2c
📌 Primary Key vs Unique Key, Auto Increment Values - https://lnkd.in/eXSugBVX
📌 DDL vs DML, Truncate vs Delete - https://lnkd.in/eCEj6NHc
📌 Foreign Key Constraint - https://lnkd.in/ebfYyM2b
📌 Distinct, Order By, Limit, Like Keyword - https://lnkd.in/ec-McKnC
📌 Order of execution in SQL - https://lnkd.in/eShPzDCJ
📌 Aggregate Functions in SQL - https://lnkd.in/e2HQQZj3
📌 Datatypes in SQL - https://lnkd.in/eJ7prXMR
📌 Logical Operators in SQL - https://lnkd.in/eubjUHeD
📌 Joins in SQL - https://lnkd.in/e63jvjec
📌 Difference between where and having in mysql - https://lnkd.in/eTwb9pcJ
📌 Over Clause & Partition By Clause - https://lnkd.in/ewspqCVS
📌 Row Number Function in MySQL - https://lnkd.in/eK9-Ef4P
📌 Rank & Dense Rank - https://lnkd.in/en83Pr5V
SQL Advanced (2 videos)
📌 CTE in SQL - https://lnkd.in/e-cKsd89
📌 SQL internals - https://lnkd.in/erwxZY8J
SQL Leetcode (8 problems)
📌 LeetCode 175 - combine 2 tables https://lnkd.in/eMmX8DQa
📌 Leetcode 176 - Second highest salary (3 approaches) https://lnkd.in/eaDRxzSd
📌 Leetcode 178,180,181 - Rank Scores, Consecutive Numbers, Employees Earning More Than Their Managers https://lnkd.in/efibXrXG
📌 Leetcode 182,183 - Duplicate Emails, Customers Who Never Order https://lnkd.in/e9ZSr9s2
📌 LeetCode 184 - Department Highest Salary https://lnkd.in/evpeZrJh
- Arun Jagota - Published in Towards Data Science
- List of awesome ML Learning
- BERT
- PDF table extractions
- SQL
- data storytelling
- quantide - Chapter 6 | Creating R packages
- analyticsvidhya - How I created a package in R & published it on CRAN / GitHub
- hvitfeldt - usethis-workflow-for-package-development
- kbroman - Writing vignettes
- r-pkgs - Releasing a package
- r-bio - An introduction to Git and how to use it with RStudio
- stackoverflow - R CMD check --as-cran warning
- mjdenny - R Package Development Pictorial
Build package
devtools::document(roclets = c('rd', 'collate', 'namespace', 'vignette'))
devtools::build()
usethis::use_news_md()
usethis::use_code_of_conduct(contact = "maintainer@example.com")  # placeholder contact; recent usethis versions require this argument
usethis::use_cran_badge()
usethis::use_cran_comments()
Upload it to CRAN
devtools::submit_cran()
Build the package's GitHub web page with pkgdown
pkgdown::build_site()
- basic writing and formatting syntax
- pdf document format
- authoring basics
- kableExtra
- knitr-markdown
- rmarkdown.rstudio.com
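library(hexSticker)
library(magick)  # provides image_read()
library(UCSCXenaTools)
library(extrafont)
font_import()  # one-time import of system fonts; can take several minutes
loadfonts(device = "win")  # Windows only
library(showtext)
## Loading Google fonts (http://www.google.com/fonts)
font_add_google("Archivo Black", "arb")
showtext_auto()  # turn on showtext rendering so the "arb" family can be drawn
imgurl <- image_read("man/figures/dml_icon3.png")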
## Design Stickers
sticker(imgurl,
s_width = 1,
s_height = 1.2,
package="MyRpackage", p_size=20, s_x=1, s_y=.75,
url = "https://CRAN.R-project.org/package=MyRpackage",
u_color = "white", u_size = 3,
h_fill="dodgerblue4", h_color='blue',
p_family ="arb",
filename="man/figures/dml_logo.png")
For example, to check the number of downloads of the 'data.table' R package: