Mungeol Heo: ML

Friday, December 30, 2016

Recommended approach
- Start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data.
- Plot learning curves to decide whether more data, more features, etc. are likely to help.
- Error analysis
  - Manually examine the examples (in the cross-validation set) that your algorithm made errors on, and look for any systematic trend in the types of examples it gets wrong.
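
The learning-curve step can be sketched in a few lines. This is a minimal illustration, not a definitive recipe: it assumes a one-feature linear regression with the squared-error cost, and the toy data and the helper names (`fit_line`, `half_mse`, `learning_curve`) are made up for the example.

```python
# Learning-curve sketch: fit y = slope*x + intercept on the first m training
# examples, then record the training error and the cross-validation error.
# The data and all helper names are illustrative assumptions.

def fit_line(xs, ys):
    """Closed-form least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def half_mse(params, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    slope, intercept = params
    total = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * len(xs))

def learning_curve(train_x, train_y, cv_x, cv_y):
    """Return a list of (m, train_error, cv_error) for m = 2 .. len(train_x)."""
    rows = []
    for m in range(2, len(train_x) + 1):
        params = fit_line(train_x[:m], train_y[:m])
        rows.append((m,
                     half_mse(params, train_x[:m], train_y[:m]),
                     half_mse(params, cv_x, cv_y)))
    return rows

# Toy data: the target is quadratic, so a straight line underfits (high bias).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
train_y = [x * x for x in train_x]
cv_x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
cv_y = [x * x for x in cv_x]

curve = learning_curve(train_x, train_y, cv_x, cv_y)
for m, train_err, cv_err in curve:
    print(f"m={m}  train={train_err:.3f}  cv={cv_err:.3f}")
```

Plotting the two error columns against m gives the learning curve. In this toy case the training error starts at zero (two points always fit a line) and grows as more examples are added, because a straight line cannot follow the quadratic target; more data alone would not fix that.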
How to decrease test error
- Get more training examples (fixes high variance)
- Try smaller sets of features (fixes high variance)
- Try getting additional features (fixes high bias)
- Try adding polynomial features (fixes high bias)
- Try tuning parameters
  - Regularization
    - Try decreasing lambda (fixes high bias)
    - Try increasing lambda (fixes high variance)
  - Other
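
The list above can be turned into a rough rule of thumb. The sketch below is only an assumption about how one might encode it: `diagnose`, `REMEDIES`, and the `target_error` threshold are hypothetical names and values, and a real model can suffer from both problems at once.

```python
# Rule-of-thumb diagnosis from training and cross-validation error.
# target_error is an assumed "acceptable" error level chosen by the user.

REMEDIES = {
    "high bias": [
        "get additional features",
        "add polynomial features",
        "decrease lambda",
    ],
    "high variance": [
        "get more training examples",
        "try a smaller set of features",
        "increase lambda",
    ],
    "ok": [],
}

def diagnose(train_error, cv_error, target_error):
    """Classify a model as 'high bias', 'high variance', or 'ok'."""
    if train_error > target_error:
        # The model cannot even fit the training set: underfitting.
        return "high bias"
    if cv_error > target_error:
        # Fits the training set but does not generalize: overfitting.
        return "high variance"
    return "ok"

label = diagnose(train_error=0.02, cv_error=0.25, target_error=0.10)
print(label, "->", REMEDIES[label])
```

With a low training error and a much higher cross-validation error, the example call reports high variance and lists the matching remedies.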
Solutions for overfitting (high variance)
- Reduce the number of features
  - Model selection
- Regularization
  - Keep all the features, but reduce the magnitude of the parameters theta_j
  - Works well when we have a lot of features, each of which contributes a bit to predicting y
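
To see how regularization reduces the magnitude of a parameter, here is a small sketch. It assumes a single feature, no intercept term, and the regularized squared-error cost J(theta) = 1/(2m) * (sum((theta*x - y)^2) + lambda * theta^2), whose minimizer has a closed form. The function name `ridge_theta` and the data are made up for illustration.

```python
# Shrinkage sketch: one feature, no intercept (an assumption for simplicity).
# Minimizing 1/(2m) * (sum((theta*x - y)^2) + lam * theta^2) over theta gives
# theta = sum(x*y) / (sum(x^2) + lam).

def ridge_theta(xs, ys, lam):
    """Closed-form regularized least-squares parameter for a single feature."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

lambdas = [0.0, 1.0, 10.0, 100.0]
thetas = [ridge_theta(xs, ys, lam) for lam in lambdas]
for lam, theta in zip(lambdas, thetas):
    print(f"lambda={lam:6.1f}  theta={theta:.4f}")
```

As lambda grows, theta moves toward zero: a small lambda leaves the fit almost unchanged, while a very large lambda pushes the model toward underfitting.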
Terminology
- Underfit = high bias
- Overfit = high variance
