In logistic regression, all we need to know is the sigmoid function, which squashes our predictions into (0, 1), and the objective function, whose parameters are found by maximum likelihood estimation (MLE). The log-likelihood is the negative of the cross-entropy, so we can either use gradient ascent to maximize the log-likelihood or gradient descent to minimize the cross-entropy. People tend to use the latter.
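The two pieces above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the helper names (`sigmoid`, `cross_entropy_grad_step`) and the toy dataset are made up for this example, and the learning rate and step count are arbitrary.

```python
import math

def sigmoid(z):
    # squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy_grad_step(w, b, X, y, lr=0.1):
    # one gradient-descent step on the cross-entropy loss
    # (equivalently, one gradient-ascent step on the log-likelihood)
    n = len(X)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        err = p - yi  # derivative of cross-entropy w.r.t. the logit
        for j, xj in enumerate(xi):
            grad_w[j] += err * xj / n
        grad_b += err / n
    return [wj - lr * g for wj, g in zip(w, grad_w)], b - lr * grad_b

# hypothetical toy data: one feature, label 1 when x > 0
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = [0.0], 0.0
for _ in range(1000):
    w, b = cross_entropy_grad_step(w, b, X, y)
```

After training, `sigmoid(w[0] * x + b)` gives the predicted probability of the 1 label for a new input `x`.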
In statistics, the odds for (or odds of) an event reflect the likelihood that the event will take place, while the odds against reflect the likelihood that it will not.
The odds of an event are defined as the ratio of the probability that it happens to the probability that it doesn't happen. The log-odds (or logit) is the log of the probability of the 1 label divided by the probability of the 0 label.
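These two definitions translate directly into code. A minimal sketch (the function names `odds` and `log_odds` are chosen here for illustration):

```python
import math

def odds(p):
    # odds = P(event) / P(not event)
    return p / (1.0 - p)

def log_odds(p):
    # the logit: log of the odds; this is the quantity
    # logistic regression models as a linear function of the inputs
    return math.log(odds(p))
```

For example, a probability of 0.8 corresponds to odds of 4 (i.e. 4 to 1), and a probability of 0.5 corresponds to log-odds of exactly 0.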
Classification is the process of predicting the class of given data points.
Logistic regression was used in the biological sciences in the early twentieth century.
The reason we don't use linear regression here is that linear regression's output is unbounded, whereas logistic regression's output lies strictly between 0 and 1.
Variance: it measures how different our learned function is from the expected classifier across different training datasets. Variance is an error caused by excessive model complexity: we try too hard to fit the given training dataset, so some noise (different from the noise mentioned above; think of exceptions/outliers) is absorbed into the model, and we put too much faith in it. This is overfitting.
Use a validation dataset to validate models
Good performance on the training dataset doesn't mean the model will perform well on unseen data. The ability to generalize is what matters, which is why we always use a validation dataset to validate our model before applying it to the untouched test dataset to measure real performance.
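A simple way to get such a validation set is a random hold-out split. The sketch below is one possible implementation; the helper name `train_val_split`, the 25% split fraction, and the fixed seed are all illustrative choices, not a standard API.

```python
import random

def train_val_split(data, val_frac=0.25, seed=0):
    # shuffle a copy of the data, then hold out a fraction for validation
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    return shuffled[n_val:], shuffled[:n_val]

# hypothetical dataset of 100 examples
train, val = train_val_split(list(range(100)))
```

The model is fit on `train` only; its score on `val` then estimates how it will behave on data it has never touched.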