Top 500 Data Science Questions and Answers

What is a confusion matrix?

A confusion matrix is a table that compares the predicted values of a classification model with the actual values.

For a binary classifier it has four cells: true positives, true negatives, false positives and false negatives.

It can be used to calculate metrics such as accuracy, precision and recall.
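
As a minimal sketch of how the four cells lead to those metrics, the example below counts them for a binary classifier and derives accuracy, precision and recall. The function names and the sample labels are made up for illustration and are not from any particular library.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four confusion-matrix cells for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Derive accuracy, precision and recall from the four counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical labels, chosen only to exercise every cell.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)            # (3, 1, 3, 1) as (tp, fp, tn, fn)
print(metrics(*counts))  # (0.75, 0.75, 0.75)
```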

What is hypothesis testing?

When performing an experiment, hypothesis testing is used to analyze the factors that are assumed to have an impact on the outcome of the experiment. A hypothesis is an assumption, and hypothesis testing is used to determine whether the observed data provide enough evidence to reject it.

The initial assumption is called the null hypothesis, and its opposite is called the alternative hypothesis.

What is a p-value in statistics?

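In hypothesis testing, the p-value helps to arrive at a conclusion. It is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. When the p-value is below a chosen significance level (commonly 0.05), the null hypothesis is rejected in favor of the alternative. When the p-value is large, we fail to reject the null hypothesis; this does not prove that the null hypothesis is true.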
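
To make the idea concrete, here is a small sketch of an exact two-sided binomial test using only the standard library. The scenario (16 heads in 20 coin flips, null hypothesis "the coin is fair") and the 0.05 significance level are assumptions chosen for illustration.

```python
from math import comb


def binomial_two_sided_p_value(successes, trials, p_null=0.5):
    """Exact two-sided p-value: total probability, under the null,
    of every outcome that is no more likely than the observed one."""
    def pmf(k):
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(trials + 1)
               if pmf(k) <= observed * (1 + 1e-9))


alpha = 0.05  # assumed significance level
p_value = binomial_two_sided_p_value(successes=16, trials=20)

print(round(p_value, 4))  # 0.0118
if p_value < alpha:
    print("Reject the null hypothesis (the coin looks biased).")
else:
    print("Fail to reject the null hypothesis.")
```

With these numbers the p-value falls below 0.05, so the null hypothesis of a fair coin is rejected. With 12 heads in 20 flips the same function returns a large p-value and the test fails to reject.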

What is gradient descent?

When building a statistical model, the objective is to reduce the value of the cost function associated with the model. Gradient descent is an iterative optimization technique that searches for a minimum of the cost function by repeatedly moving the parameters a small step in the direction opposite to the gradient.
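
The sketch below applies gradient descent to one simple case: fitting a line y = w*x + b by minimizing the mean squared error. The data, learning rate and step count are assumptions picked so that the example converges; they are not general recommendations.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for each training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2).
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

If the learning rate is too large the updates overshoot and diverge; if it is too small, convergence takes many more steps.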

What is the bias-variance tradeoff?

High bias is error caused by wrong or overly simple assumptions, which makes the model underfit. High variance means the model has taken the noise in the training data too seriously, which results in overfitting.

Typically we would like a model with both low bias and low variance. In practice, reducing one tends to increase the other, hence the tradeoff.
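
As a rough illustration, the sketch below compares three models on the same small dataset: a constant predictor (high bias), a 1-nearest-neighbor predictor that memorizes the training set (high variance), and a least-squares line in between. The data and the fixed "noise" values are invented for the example.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data: true signal y = 2x plus fixed noise.
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [0.5, -0.4, 0.3, -0.6, 0.4, -0.3, 0.6, -0.5]
train_y = [2 * x + e for x, e in zip(train_x, noise)]

# Test data: unseen points on the noise-free signal.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
test_y = [2 * x for x in test_x]

# High bias: always predict the training mean, ignoring x.
mean_y = sum(train_y) / len(train_y)
constant_model = lambda x: mean_y

# High variance: return the label of the closest training point,
# which reproduces the training noise exactly.
def nearest_neighbor_model(x):
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

# In between: closed-form least-squares line.
mean_x = sum(train_x) / len(train_x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x
line_model = lambda x: slope * x + intercept

for name, model in [("constant", constant_model),
                    ("1-NN", nearest_neighbor_model),
                    ("line", line_model)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

On this data the constant model has a large error on both sets (underfitting), the 1-NN model has zero training error but a clearly higher test error (overfitting), and the line has the lowest test error of the three.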

How can overfitting be addressed?

  • Introduce regularization
  • Perform cross-validation
  • Reduce the number of features
  • Increase the number of training examples
  • Use ensembling
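
Of the options above, cross-validation is mainly a way to detect overfitting and to compare models fairly, since every sample is used for validation exactly once. A minimal sketch of k-fold splitting follows; the helper name k_fold_indices is hypothetical, and real projects would usually shuffle the data first or use a library implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Folds are contiguous and unshuffled; the first n_samples % k folds
    get one extra sample so that every index is validated exactly once.
    """
    indices = list(range(n_samples))
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_indices(n_samples=10, k=3):
    print(train, validation)
```

A model would be trained on each `train` split and scored on the matching `validation` split; a large gap between training and validation scores suggests overfitting.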