Data Science Interview Questions and Answers
Q1. What is a feature vector?
A feature vector is an n-dimensional vector of numerical values that represents some object. In machine learning, feature vectors express the numeric or symbolic characteristics of an object, called features, in a mathematical form that is easy to analyze.
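As a minimal sketch, an object can be turned into a feature vector by reading its attributes in a fixed order. The feature names and the helper `to_feature_vector` below are hypothetical and chosen only for illustration.

```python
# Hypothetical example: a house described by three features.
FEATURE_ORDER = ["area_m2", "bedrooms", "has_garden"]

def to_feature_vector(record):
    """Map a dict of raw attributes to a fixed-order list of floats."""
    return [float(record[name]) for name in FEATURE_ORDER]

house = {"area_m2": 120, "bedrooms": 3, "has_garden": True}
vector = to_feature_vector(house)
print(vector)  # [120.0, 3.0, 1.0]
```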
Q2. How do you build a decision tree?
A decision tree is built with the following steps:
- Take the whole data set as input.
- Look for a split that maximizes the separation of the classes. A split is any test that divides the data into two sets.
- Apply that split to the input data (the divide step).
- Re-apply steps 1 and 2 to each of the divided subsets.
- Stop when a stopping criterion is met.
- Finally, clean up the tree if the splitting went too far. This step is known as pruning.
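The split search in step 2 can be sketched as follows. This is an illustrative version that assumes a single numeric feature and uses Gini impurity as the separation measure; the source does not name a criterion, and the helpers `gini` and `best_split` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure set)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best test `value <= threshold`."""
    best = (None, gini(labels))  # baseline: no split at all
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Toy data: one numeric feature and a binary class label.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(values, labels)
print(threshold, impurity)  # 3.0 0.0
```

A full tree builder would call `best_split` recursively on each subset until a stopping criterion is met.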
Q3. Explain root cause analysis.
Root cause analysis was originally developed to analyze industrial accidents but is now widely used in other areas as well. It is a problem-solving technique for isolating the root causes of faults or problems. A factor is called a root cause if removing it from the problem sequence prevents the final undesirable event from recurring.
Q4. What is logistic regression?
Logistic regression is also known as the logit model. It is a technique for predicting a binary outcome from a linear combination of predictor variables.
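A minimal sketch of the prediction step: the linear combination of the predictors is passed through the sigmoid function to obtain a probability, which is then thresholded into a binary outcome. The coefficients below are hand-picked assumptions, not fitted values.

```python
import math

def predict_proba(weights, bias, features):
    """P(y = 1 | x): the sigmoid of a linear combination of the predictors."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features, threshold=0.5):
    """Turn the probability into a binary outcome (0 or 1)."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0

# Hypothetical, hand-picked coefficients (not fitted to any data).
weights, bias = [2.0, -1.0], 0.0
print(predict_proba(weights, bias, [0.0, 0.0]))  # 0.5, since z = 0
print(predict(weights, bias, [3.0, 1.0]))        # 1, since z = 5 > 0
```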
Q5. What do you mean by Recommender Systems?
Recommender systems are a subclass of information filtering systems that aim to predict the preferences or ratings that a user would give to a particular product.
Q6. What is cross-validation?
It is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the objective is prediction and one wants to estimate how accurately a model will perform in practice. The aim of cross-validation is to set aside part of the data to test the model during the training phase, in order to limit problems like overfitting and to gain insight into how the model will generalize to an independent data set.
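One common form is k-fold cross-validation, sketched below: the data is cut into k folds, and each fold is held out once for validation while the rest is used for training. The helper `k_fold_indices` is a hypothetical name, and the sketch assumes the samples have already been shuffled.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print(validation)
# [0, 1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
```

A model would be trained on each `train` set and scored on the matching `validation` set, and the k scores averaged.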
Q7. What is collaborative filtering?
Collaborative filtering is the filtering process used by many recommender systems to find patterns or information by combining multiple viewpoints, data sources, and agents.
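One simple variant is user-based collaborative filtering, sketched here under stated assumptions: a user's unknown rating is predicted as a similarity-weighted average of other users' ratings. The ratings data, the choice of cosine similarity, and the function names are all hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"film_a": 5, "film_b": 4, "film_c": 1},
    "bob": {"film_a": 5, "film_b": 5, "film_c": 1, "film_d": 5},
    "cat": {"film_a": 1, "film_b": 2, "film_c": 5, "film_d": 1},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted, total = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted += sim * other_ratings[item]
        total += sim
    return weighted / total if total else None

prediction = predict_rating("ann", "film_d")
print(prediction)
```

Because ann's tastes resemble bob's far more than cat's, the prediction is pulled toward bob's high rating for `film_d`.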
Q8. Do gradient descent methods always converge to the same point?
No, they do not always converge to the same point. In some cases they reach a local minimum (a local optimum) and never reach the global optimum. Where they end up depends on the data and on the starting conditions.
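The dependence on the starting point can be seen in a small sketch. The function f(x) = x^4 - 3x^2 + x is an assumed example with two minima; the learning rate and step count are arbitrary choices.

```python
def f(x):
    """Example objective with one global and one local minimum."""
    return x ** 4 - 3 * x ** 2 + x

def grad_f(x):
    """Derivative of f."""
    return 4 * x ** 3 - 6 * x + 1

def gradient_descent(start, learning_rate=0.01, steps=1000):
    """Plain gradient descent on f from a given starting point."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x

x_left = gradient_descent(start=-2.0)   # ends near x = -1.30 (global minimum)
x_right = gradient_descent(start=2.0)   # ends near x = 1.13 (local minimum)
print(x_left, f(x_left))
print(x_right, f(x_right))
```

Both runs use the same algorithm and settings; only the starting point differs, yet the second run stops at a point with a higher objective value.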
Q9. What is the objective of A/B Testing?
A/B testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. The goal of A/B testing is to identify changes to a web page that maximize or increase the outcome of interest.
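A sketch of how such a test might be evaluated, assuming the outcome is a conversion rate and using a two-proportion z-test. The source does not prescribe a specific test, and the visitor counts below are made up.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: conversions and visitors for page variants A and B.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(round(z, 2), round(p_value, 4))
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone.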
Q10. What are some disadvantages of the linear model?
Some notable downsides of the linear model are:
- It assumes linearity of the errors.
- It cannot be used for count outcomes or binary outcomes.
- It has overfitting problems that it cannot solve on its own.
Q11. What is the Law of Large Numbers?
It is a theorem that describes the result of performing the same experiment a large number of times. This theorem forms the basis of frequency-style thinking. It states that the sample mean, the sample variance and the sample standard deviation converge to the quantities they are trying to estimate.
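A short simulation illustrates the idea, using rolls of a fair die as an assumed example. The expected value of a roll is 3.5, and the sample mean moves closer to it as the number of rolls grows.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean_of_die_rolls(n_rolls):
    """Average of n_rolls throws of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n_rolls)) / n_rolls

for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_die_rolls(n))
```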
Q12. What are confounding variables?
These are extraneous variables in a statistical model that correlate, directly or inversely, with both the dependent and the independent variable. The estimate fails to account for the confounding factor.
Q13. What is star schema?
It is a traditional database schema with a central fact table. Satellite tables map IDs to physical names or descriptions and can be connected to the central fact table using the ID fields. These tables are known as lookup tables and are mainly useful in real-time applications, as they save a lot of memory. Sometimes star schemas involve several layers of summarization to retrieve information faster.
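A minimal sketch of a star schema, using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`sales`, `product`, `store`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup tables map IDs to names.
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, city TEXT);
    -- The central fact table stores only IDs and measures.
    CREATE TABLE sales (
        product_id INTEGER REFERENCES product(product_id),
        store_id   INTEGER REFERENCES store(store_id),
        amount     REAL
    );
    INSERT INTO product VALUES (1, 'laptop'), (2, 'phone');
    INSERT INTO store   VALUES (1, 'Oslo'), (2, 'Lima');
    INSERT INTO sales   VALUES (1, 1, 900.0), (2, 1, 500.0), (1, 2, 950.0);
""")

# Join the fact table to a lookup table to report by product name.
rows = conn.execute("""
    SELECT p.name, SUM(s.amount)
    FROM sales AS s
    JOIN product AS p ON p.product_id = s.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)  # [('laptop', 1850.0), ('phone', 500.0)]
```

The fact table repeats only small integer IDs, while each name is stored once in its lookup table, which is where the memory saving comes from.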
Q14. How often should an algorithm be updated?
An algorithm should be updated when:
- You want the model to evolve as data streams through the infrastructure.
- The underlying data source is changing.
- There is a case of non-stationarity.
Q15. What are eigenvalues and eigenvectors?
Eigenvectors are used for understanding linear transformations. In data analysis, we usually calculate the eigenvectors of a correlation or covariance matrix. Eigenvectors are the directions along which a particular linear transformation acts by flipping, compressing or stretching. An eigenvalue is the factor by which the transformation stretches or compresses along its eigenvector.
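The defining relation A v = λ v can be checked by hand on a small matrix. The sketch below assumes a 2x2 symmetric matrix, so the eigenvalues are real, and uses hypothetical helper names.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial.

    Assumes the eigenvalues are real (always true for symmetric matrices).
    """
    trace, det = a + d, a * d - b * c
    root = math.sqrt(trace ** 2 - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def mat_vec(matrix, vector):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

A = [[2.0, 1.0], [1.0, 2.0]]  # e.g. a small covariance matrix
lam_big, lam_small = eigenvalues_2x2(2.0, 1.0, 1.0, 2.0)

# For this matrix the eigenvectors are (1, 1) and (1, -1).
v_big, v_small = [1.0, 1.0], [1.0, -1.0]
print(lam_big, mat_vec(A, v_big))      # 3.0 [3.0, 3.0]
print(lam_small, mat_vec(A, v_small))  # 1.0 [1.0, -1.0]
```

Multiplying by A stretches (1, 1) by a factor of 3 and leaves (1, -1) unchanged in length; neither vector changes direction.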
Q16. What are the purposes of resampling?
- Estimating the accuracy of sample statistics by using subsets of the available data or by drawing randomly with replacement from a set of data points.
- Exchanging labels on data points when performing significance tests.
- Validating models by using random subsets (bootstrapping, cross-validation).
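The first purpose can be sketched with a bootstrap: the standard error of the mean is estimated by repeatedly drawing with replacement from the sample. The data values, the number of resamples, and the function name are assumptions for illustration.

```python
import random
import statistics

def bootstrap_std_error(data, n_resamples=2000, seed=0):
    """Estimate the standard error of the mean by resampling with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))  # draw with replacement
        means.append(statistics.mean(resample))
    return statistics.stdev(means)

# Hypothetical sample of measurements.
data = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 3.0, 2.2, 2.7, 2.6]
std_error = bootstrap_std_error(data)
print(std_error)
```

The result should be close to the textbook estimate, the sample standard deviation divided by the square root of the sample size.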
Q17. What is selection bias?
Selection bias, in general, is a problematic situation in which error is introduced because the population sample is not random.
Q18. What are the three types of biases that can happen during sampling?
There are three main types of bias that can occur during sampling:
- Selection bias
- Undercoverage bias
- Survivorship bias
Q19. What is survivorship bias?
It is the logical error of focusing on the things that survived some process while overlooking those that did not, because the latter are less visible. This can lead to wrong conclusions in many different ways.