# Part-time data science masters

Why and where to pursue a master's in data science part-time.

# Probability and discrete distributions

Revision of probability and discrete distributions.

# Linear Algebra – Matrices

Notes on matrices: matrix multiplication, transpose, range, rank, null space, and determinant.

# Linear Algebra – Vectors

Notes on vectors and their properties. Explains spans, subspaces, dot products, bases, norms, etc.

# Handling imbalanced classes

This post covers handling class imbalance in classification data using different sampling methods.

# IIMB Business Analytics and Intelligence

A review of the BAI executive course at IIM Bangalore.

# The math behind ANN (ANN- Part 2)

Discussing the theory behind ANN and the math behind back-propagation.

# Artificial Neural Network – Part 1

First in a series of posts on ANNs. Implements AND-gate logic as an ANN from scratch.

# CART classification

Step-by-step explanation of CART decision tree classification using the Titanic dataset.

# Curse of dimensionality

Explaining the curse of dimensionality using a relevant example.

# CHAID decision trees

Step-by-step explanation of CHAID decision trees using the Titanic dataset.

# K means clustering

K-means clustering explained using customer segmentation in R. Touches on the silhouette statistic, the Calinski-Harabasz index, and the elbow curve.

# Exploratory factor analysis

After a brief introduction to PCA and CFA, the KMO measure and Bartlett's test of sphericity are introduced. For PCA, the scree plot, eigenvalues, validation, and interpretation of the factors are discussed.

# Stationarity tests

The Dickey-Fuller unit root test and the Ljung-Box independence test are discussed using an attendance dataset.

# Hierarchical Clustering

A post on hierarchical clustering, using a dendrogram for beer customer segmentation.

# Stationarity

Discussion of stationarity, random walks, deterministic drift, and other time series vocabulary.

# ARIMA

ARIMA using the Box-Jenkins approach. Discusses the Dickey-Fuller, Ljung-Box, and KPSS tests. Builds and validates a forecast of in-time from attendance data.

# Analytic Hierarchy Process

A multi-criteria decision, selecting a phone, is explained using AHP.

Forecasting sales of new products using the Bass model. Estimates p, q, and m for iPhone sales using gradient descent. Cool visualizations and code are provided.

Customer lifetime value and steady-state retention probability using Markov chains. Markov chains, steady state, homogeneity, the Anderson-Goodman test, and CLT are explained. Uses data from the UCI Machine Learning Repository.

# Inventory planning model

A multi-period integer programming model is used to decide when, and in what quantity, a Kirana store should make a new purchase.

# Linear Programming

Linear programming in R along with sensitivity analysis and cool visualizations.

# Linear regression

A complete analytical journey through linear regression: EDA, model building, model diagnostics, residual plots, outlier treatment, collinearity effects, variable transformation, model rebuilding, and validation for the Boston housing price prediction problem.

# Part and partial correlation

Understanding part (semi-partial) and partial correlation coefficients in a multiple regression model. Derives the multiple R-squared and beta coefficients from basics. Inspired by Business Analytics: The Science of Data-Driven Decision Making by Dinesh Kumar.

# Logistic Regression

A complete walkthrough of logistic regression, from EDA to model diagnostics, with cool plots.

# KNN imputation

Handling missing values in the original mtcars dataset by imputation using the KNN algorithm.

# Recommendation systems

Recommendation systems using association rule mining.

# Chi Square test of independence

This post deals with the chi-square test of independence. Along with the R code, a contingency table and a mosaic plot are presented.

# Chi-Square goodness of fit test

A brief introduction to the chi-square goodness-of-fit test using attendance data. Code for chi-square plots and the post hoc Cramér's V is available.

# Analysis of variance (Anova)

ANOVA hypothesis test on unemployment data. Post hoc analysis and visualisations are presented.

# Hypothesis test for population parameters

Discussion of hypothesis testing. Introduces the z-test and t-test, with code for visualizing both.

# Why are basics important?

Explains why basics are important using a simple example.

# Time series EDA

Tutorial on time series EDA. Contains time plots, seasonal plots, and correlograms (ACF) for the in-time problem, with reusable R code.

# Multicollinear analysis

Tutorial on multicollinearity, the third part of EDA. Plots the correlation matrix and network for the in-time problem, with reusable code.

# Multivariate Analysis

Tutorial on multivariate analysis, the second part of EDA. Explained using the in-time problem, with reusable R code.

# Handling Google maps location data

Getting traffic, vehicle used, location, and journey time from Google Maps. Integrates these factors into the in-time problem.

# Get started with Python | Notebook for Beginner’s level

Explanation of the class size paradox using Amrita University placement data. Contains reusable R code for web scraping.

# Univariate Analysis on in-time

Tutorial on univariate analysis, the first part of EDA. Explained using the in-time problem, with reusable R code.