Linear Classification#


Logistic Regression#

In logistic regression, all we need to know is the sigmoid function, which scales our predictions to (0, 1), and the objective function, which uses maximum likelihood estimation (MLE) to find the parameters. The log-likelihood objective is also the negative cross entropy. So we either use gradient ascent to maximize the likelihood or gradient descent to minimize the cross entropy. People tend to use the latter.
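The gradient descent variant can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the one-dimensional toy data, the learning rate `lr`, the step count, and the function names `sigmoid`, `cross_entropy`, and `fit` are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    # Two branches keep exp() from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cross_entropy(w, b, xs, ys):
    """Mean cross entropy, i.e. the negative mean log-likelihood."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def fit(xs, ys, lr=0.1, steps=2000):
    """Gradient descent on the mean cross entropy for a 1-D input."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # The gradient of the cross entropy is (prediction - label) * input.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: larger x tends to mean label 1, with some overlap.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
w, b = fit(xs, ys)
print("w =", w, "b =", b, "loss =", cross_entropy(w, b, xs, ys))
```

Running gradient ascent on the log-likelihood instead would flip the sign of both the objective and the update, and would arrive at the same parameters.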



In statistics, the odds for (or odds of) an event reflect the likelihood that the event will take place, while the odds against reflect the likelihood that it will not.

The odds of an event are defined as the ratio of the probability that it happens to the probability that it does not happen.


z = \log\left(\frac{y}{1-y}\right)

Here z is the log-odds (logit): the log of the probability of label 1 divided by the probability of label 0.
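A small numeric sketch may help tie the odds, the log-odds, and the sigmoid together. The helper names `odds`, `logit`, and `sigmoid` and the example probability 0.8 are assumptions chosen for illustration.

```python
import math

def odds(p):
    """Ratio of the probability of the event to the probability of its complement."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds: z = log(p / (1 - p))."""
    return math.log(odds(p))

def sigmoid(z):
    """Inverse of the logit: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8
z = logit(p)            # odds are about 4 to 1, so z is about log(4)
recovered = sigmoid(z)  # applying the sigmoid recovers p up to rounding
print(odds(p), z, recovered)
```

The round trip shows why the sigmoid appears in logistic regression: the model is linear in the log-odds, and the sigmoid converts that value back to a probability.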

Loss function for Logistic Regression#




Decision Tree#

Random Forest#


Linearly Separable#

Q & A#

What is classification? Which models would you use to solve a classification problem?#

Classification is the process of predicting the class of given data points.

What is logistic regression? When do we need to use it?#

Logistic regression was used in the biological sciences in the early twentieth century.

The reason we don't use linear regression here is that the output of linear regression is unbounded, whereas the output of logistic regression lies strictly between 0 and 1, so it can be read as a probability.


What is overfitting?#

Variance measures how much our learned function differs from the expected classifier when it is trained on different datasets. Variance is the error caused by high model complexity (for example, high dimensionality). We try too hard to fit the given training dataset, so some noise (different from the noise mentioned above; think of exceptions or outliers) gets built into the model. We put too much faith in the training data. This is overfitting.
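One way to see this effect is with a model that memorizes its training set. The sketch below uses a 1-nearest-neighbor classifier purely as a stand-in for a high-variance model; it is not a method from these notes. The synthetic data, the 20% label-noise rate, and all function names are assumptions made for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_data(n, noise=0.2):
    """Points in [-1, 1]; the true label is 1 when x > 0, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        label = 1 if x > 0 else 0
        if rng.random() < noise:
            label = 1 - label  # a noisy (outlier) label
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the closest training point: pure memorization."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(train, data):
    hits = sum(predict_1nn(train, x) == y for x, y in data)
    return hits / len(data)

train = make_data(50)
test = make_data(200)
train_acc = accuracy(train, train)  # each point is its own nearest neighbor
test_acc = accuracy(train, test)    # noise memorized from train hurts here
print("train accuracy:", train_acc, "test accuracy:", test_acc)
```

The model reproduces every training label, including the flipped ones, so its training accuracy says little about how it behaves on fresh data.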

How to validate your models?#

Use a validation dataset to validate models.

Why do we need to split our data into three parts: train, validation, and test?#

Good performance on the training dataset does not mean the model will perform well on unseen data. The ability to generalize is important, which is why we always use a validation dataset to validate our model before we apply it to the untouched test dataset to measure real performance.
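A three-way split can be sketched with the standard library alone. The function name `train_val_test_split`, the 60/20/20 proportions, and the fixed seed are assumptions for illustration; real projects often use a library helper instead.

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut the data into train, validation, and test parts."""
    items = list(data)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

# Hypothetical dataset: 100 example ids.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The training part is used to fit parameters, the validation part to compare models or tune hyperparameters, and the test part only once at the end, so that its score is an honest estimate of performance on unseen data.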

Can you explain how cross-validation works?#

  • Exhaustive cross-validation
  • Leave-p-out cross-validation
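Leave-p-out is exhaustive because it tries every possible way of holding out p samples. A minimal sketch, assuming samples are addressed by index and using a hypothetical generator named `leave_p_out`:

```python
from itertools import combinations

def leave_p_out(n_samples, p):
    """Yield (train_indices, validation_indices) for every choice of p held-out samples."""
    indices = range(n_samples)
    for held_out in combinations(indices, p):
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, list(held_out)

# With 4 samples and p = 2 there are C(4, 2) = 6 splits.
splits = list(leave_p_out(4, 2))
for train_idx, val_idx in splits:
    print("train:", train_idx, "validate:", val_idx)
```

The number of splits grows as C(n, p), which is why exhaustive schemes are usually practical only for small datasets or small p.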

What is K-fold cross-validation?#

How do we choose K in K-fold cross-validation? What’s your favorite K?#