K-means clustering

K-means clustering explained through customer segmentation in R. Touches on the silhouette statistic, the Calinski-Harabasz index, and the elbow curve.

Exploratory factor analysis

After a brief introduction to PCA and CFA, the KMO measure of sampling adequacy and Bartlett's test of sphericity are introduced. For PCA, the scree plot, eigenvalues, validation, and interpretation of the factors are discussed.

Stationarity

A discussion of stationarity, random walks, deterministic drift, and other vocabulary related to time series.

ARIMA

ARIMA using the Box-Jenkins approach. Discusses the Dickey-Fuller, Ljung-Box, and KPSS tests. Builds and validates a forecast for the in-time data in an attendance dataset.

Adoption of new product

Forecasting sales of new products using the Bass model. Estimates p, q, and m for iPhone sales using gradient descent. Cool visualizations and code provided.
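
As a rough illustration of what p, q, and m control, here is a minimal Python sketch of the standard Bass cumulative adoption curve. It is not the post's code and does not fit anything by gradient descent. The function names and the parameter values (p = 0.03, q = 0.38, m = 1000) are illustrative assumptions, not the iPhone estimates.

```python
import math


def bass_cumulative_fraction(t, p, q):
    """Fraction of the eventual market that has adopted by time t.

    p is the coefficient of innovation, q the coefficient of imitation.
    """
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_period_sales(periods, p, q, m):
    """Sales in each period 1..periods for a market of total size m."""
    return [
        m * (bass_cumulative_fraction(t, p, q) - bass_cumulative_fraction(t - 1, p, q))
        for t in range(1, periods + 1)
    ]


# Illustrative (assumed) parameters, not fitted to any real data.
sales = bass_period_sales(periods=10, p=0.03, q=0.38, m=1000)
for period, units in enumerate(sales, start=1):
    print(period, round(units, 1))
```

Fitting the model means choosing p, q, and m so that these per-period values track observed sales as closely as possible.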

Customer Lifetime Value

Customer lifetime value and steady-state retention probability using Markov chains. Markov chains, steady state, homogeneity, the Anderson-Goodman test, and CLT are explained. Uses data from the UCI Machine Learning Repository.
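
To make "steady state" concrete, here is a minimal Python sketch that finds the stationary distribution of a Markov chain by repeatedly applying the transition matrix. The two-state matrix is a made-up example (an assumption for illustration, not the post's data), and `steady_state` is a hypothetical helper name.

```python
def steady_state(P, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution pi with pi = pi * P.

    P is a row-stochastic transition matrix given as a list of rows.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical two-state chain: state 0 = retained, state 1 = lapsed.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]
print(steady_state(P))
```

For this matrix the balance equation 0.1 * pi_0 = 0.5 * pi_1 gives a long-run share of 5/6 in state 0 and 1/6 in state 1, which the iteration approaches.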

Inventory planning model

A multi-period integer programming model is used to determine when, and in what quantity, a new purchase should be made for a Kirana store.

Linear regression

A complete analytical journey through linear regression: EDA, model building, model diagnostics, residual plots, outlier treatment, collinearity effects, transformation of variables, model rebuilding, and validation for the Boston housing price prediction problem.

Part and partial correlation

Understanding part (semi-partial) and partial correlation coefficients in a multiple regression model. Derives the multiple R-squared and beta coefficients from the basics. Inspired by Business Analytics: The Science of Data-Driven Decision Making by Dinesh Kumar.

Time series EDA

Tutorial on time series EDA. Contains time plots, seasonal plots, and correlogram (ACF) plots for the in-time problem, with reusable R code.

Multicollinear analysis

Tutorial on multicollinearity, the third part of EDA. Plots the correlation matrix and correlation network for the in-time problem, with reusable code.

Multivariate Analysis

Tutorial on multivariate analysis, the second part of EDA. Explained using the in-time problem, with reusable R code.

Class size paradox

Explanation of the class size paradox using Amrita University placement data. Contains reusable R code for web scraping.

Extracting data from mechanical models

Recently I ran into problems while trying to automate simple engineering routines. The first step in a large number of them is extracting data from existing models for further analysis.

The boy girl paradox

Imagine that a family has two children, one of whom we know to be a boy. What then is the probability that the other child is a boy? The obvious answer is to say that the probability is 1/2—after all, the other child can only be either a boy or a girl, and the chances of a baby being born a boy or a girl are (essentially) equal. In a two-child family, however, there are actually four possible combinations of children: two boys (MM), two girls (FF), an older boy and a younger girl (MF), and an older girl and a younger boy (FM). We already know that one of the children is a boy, meaning we can eliminate the combination FF, but that leaves us with three equally possible combinations of children in which at least one is a boy—namely MM, MF, and FM. This means that the probability that the other child is a boy—MM—must be 1/3, not 1/2.
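
The counting argument above can be checked by brute-force enumeration. This is a minimal Python sketch that assumes, as the paragraph does, that the four birth-order combinations are equally likely and that "one of whom we know to be a boy" means "at least one child is a boy". The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_both_boys_given_at_least_one_boy():
    # All (older, younger) combinations: MM, MF, FM, FF.
    families = list(product("MF", repeat=2))
    # Condition on at least one boy, which eliminates FF.
    with_a_boy = [family for family in families if "M" in family]
    both_boys = [family for family in with_a_boy if family == ("M", "M")]
    return Fraction(len(both_boys), len(with_a_boy))


print(prob_both_boys_given_at_least_one_boy())  # 1/3
```

Only one of the three remaining combinations is MM, which is where the 1/3 comes from.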