A confusion matrix is a table that summarizes a classification model's predicted values against the actual values.
It has four entries, namely true positives, true negatives, false positives, and false negatives.
From these entries, metrics such as accuracy, precision, and recall can be calculated.
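The metrics above can be sketched in a few lines of plain Python; the labels below are made-up illustrative data, not from any real model.

```python
# Made-up actual labels and model predictions for illustration.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# The four entries of the confusion matrix.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy  = (tp + tn) / len(actual)  # correct predictions over all predictions
precision = tp / (tp + fp)           # of the predicted positives, how many were right
recall    = tp / (tp + fn)           # of the actual positives, how many were found
```

Libraries such as scikit-learn provide these metrics directly, but the hand computation makes the definitions explicit.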
While performing an experiment, hypothesis testing is used to analyze the various factors that are assumed to have an impact on the outcome. A hypothesis is an assumption, and hypothesis testing determines whether the stated hypothesis is supported by the data.
The initial assumption is called the null hypothesis; its opposite is the alternate hypothesis.
In hypothesis testing, the p-value helps to arrive at a conclusion. When the p-value is very small (below the chosen significance level), the null hypothesis is rejected and the alternate is accepted. When the p-value is large, we fail to reject the null hypothesis.
A Type-I error is rejecting the null hypothesis when it is actually true; it corresponds to a false positive.
A Type-II error is failing to reject the null hypothesis when it is actually false; it corresponds to a false negative.
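The p-value decision rule can be sketched with a simple one-sample z-test (a hedged example: it assumes the population standard deviation is known, and the sample numbers are purely illustrative).

```python
import math

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: the population mean equals mu0."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    # Standard normal CDF evaluated via the error function.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

# Illustrative measurements; H0 says the true mean is 5.0.
sample = [5.1, 5.3, 4.9, 5.4, 5.2, 5.0, 5.3, 5.2]
p = z_test_p_value(sample, mu0=5.0, sigma=0.2)

alpha = 0.05
reject_null = p < alpha  # small p-value -> reject H0, accept the alternate
```

Here the p-value falls below the 0.05 significance level, so the null hypothesis is rejected; with a larger p-value we would fail to reject it.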
Common techniques for handling missing values:
- Deletion of rows or columns containing missing values
- Guessing the value from domain knowledge
- Average (mean) substitution
- Regression-based substitution
- Multiple imputation
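Two of the simpler strategies above, deletion and average substitution, can be sketched on a toy column (`None` marks a missing entry; the numbers are made up for illustration):

```python
# A toy numeric column with missing entries marked as None.
values = [10.0, None, 14.0, None, 12.0, 16.0]

# Deletion: simply drop the missing entries.
deleted = [v for v in values if v is not None]

# Average substitution: replace each missing entry with the column mean.
mean = sum(deleted) / len(deleted)
imputed = [v if v is not None else mean for v in values]
```

Deletion loses rows, while mean substitution keeps them at the cost of shrinking the column's variance, which is one reason regression-based substitution and multiple imputation are often preferred.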
When building a statistical model, the objective is to minimize the cost function associated with the model. Gradient descent is an iterative optimization technique used to find a minimum of the cost function.
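A minimal gradient-descent sketch on a one-dimensional quadratic cost J(w) = (w - 3)^2, whose minimum is at w = 3; the learning rate and starting point are arbitrary illustrative choices.

```python
def grad(w):
    """Derivative of the cost J(w) = (w - 3)^2."""
    return 2 * (w - 3)

w = 0.0              # arbitrary starting point
learning_rate = 0.1  # step size, chosen for illustration
for _ in range(100):
    w -= learning_rate * grad(w)  # step against the gradient
```

Each iteration moves `w` a fraction of the way toward the minimum, so after enough steps it converges to 3 to within floating-point tolerance.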
Supervised learning is the class of algorithms in which the model is trained on explicitly labelled outcomes, e.g. regression and classification.
In unsupervised learning no output labels are given, and the algorithm learns the structure of the data implicitly, e.g. association and clustering.
Regularization penalizes model complexity during training. It predominantly helps in solving the overfitting problem.
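As a hedged sketch of how regularization penalizes the model, the cost below is mean squared error plus an L2 (ridge-style) penalty that grows with the weights; the data and the penalty strength `lam` are illustrative choices, not from any real model.

```python
def ridge_cost(weights, X, y, lam):
    """Mean squared error plus an L2 penalty on the weights."""
    n = len(y)
    preds = [sum(w * x for w, x in zip(weights, row)) for row in X]
    mse = sum((p, t) == (p, t) and (p - t) ** 2 for p, t in zip(preds, y)) / n
    penalty = lam * sum(w ** 2 for w in weights)
    return mse + penalty

# Illustrative data that the weights [1.0, 2.0] fit exactly.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
y = [5.0, 4.0, 9.0]

unregularized = ridge_cost([1.0, 2.0], X, y, lam=0.0)
regularized   = ridge_cost([1.0, 2.0], X, y, lam=0.1)
```

With the penalty switched on, the same weights incur a higher cost, so the optimizer is nudged toward smaller weights even when larger ones fit the training data perfectly.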
High bias is an underlying error caused by wrong assumptions that makes the model underfit. High variance means the model has taken noise in the data too seriously, which results in overfitting.
Typically we would like a model with both low bias and low variance.
Ways to reduce overfitting:
- Introduce regularization
- Perform cross-validation
- Reduce the number of features
- Increase the number of training examples
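Cross-validation, one of the remedies above, splits the data into k folds so each fold serves once as validation data while the rest trains the model. A minimal index-splitting sketch (assuming n is divisible by k, purely for illustration):

```python
def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k folds over n samples."""
    fold_size = n // k
    indices = list(range(n))
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, val

# 10 samples, 5 folds: each fold holds out 2 samples for validation.
folds = list(k_fold_indices(n=10, k=5))
```

Averaging the validation score across folds gives a more honest estimate of generalization than a single train/test split, which is why cross-validation helps detect overfitting.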