## Machine Learning

- What is the bias-variance tradeoff?
- How is KNN different from k-means?
- How would you implement the k-means algorithm?
- How do you choose the k in k-means clustering?
- What are the pros and cons of the k-means algorithm?
- What does an ROC curve show?
- What is the difference between a Type I and a Type II error?
- Define precision and recall.
- What is k-fold cross-validation?
- Explain what a false positive and a false negative are. Give one example where a false positive is more costly than a false negative, and one where a false negative is more costly than a false positive.
- When would you use random forests vs. SVM and why?
- Why is dimensionality reduction important?
- What is principal component analysis? Explain the sort of problems you would use PCA for.
- What are the assumptions required for linear regression? What if some of these assumptions are violated?
- What are some of the steps for data wrangling and data cleaning before applying machine learning algorithms?
- What is multicollinearity and how do we deal with it?
- You are given a dataset on cancer detection. You have built a classification model and achieved an accuracy of 98%. Is this model ready to be used in production?
- How would you evaluate an algorithm on unbalanced data?
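
For the k-means implementation question above, one possible starting point is the minimal sketch below. It assumes Lloyd's algorithm with Euclidean distance, centroids initialised by sampling `k` data points, and points given as tuples of floats. The function name `kmeans` and the parameters `max_iters` and `seed` are illustrative choices, not taken from any library.

```python
import random
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, k, max_iters=100, seed=0):
    """Sketch of Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Assumption: initialise centroids as k points sampled from the data.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids would too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```

A fuller answer would probably also discuss smarter initialisation (such as k-means++), multiple restarts, and how empty clusters should be handled, since this sketch simply leaves an empty cluster's centroid where it was.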