Wednesday, August 22, 2018

AI - Tuning

  • Optimal probability cutoff point
    • 2017 Mastering Machine Learning with Python in Six Steps
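
A common way to choose the cutoff for a binary classifier is to maximize Youden's J statistic (TPR - FPR) over the ROC curve. A minimal scikit-learn sketch on synthetic data (the book covers the topic; the code below is only an illustration, not taken from it):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Toy imbalanced binary problem as a stand-in for real data.
X, y = make_classification(n_samples=2000, weights=[0.8], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_val)[:, 1]

# roc_curve returns one (fpr, tpr) pair per candidate threshold;
# Youden's J picks the threshold that maximizes TPR - FPR.
fpr, tpr, thresholds = roc_curve(y_val, probs)
cutoff = thresholds[np.argmax(tpr - fpr)]
print(f"optimal cutoff: {cutoff:.3f}")

# Classify with the tuned cutoff instead of the default 0.5.
y_pred = (probs >= cutoff).astype(int)
```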
  • Bias and Variance

    • High variance
    • High bias
    • Bias = training error = optimal error rate (“unavoidable bias”) + avoidable bias
    • Optimal error rate / unavoidable bias
      • Use human-level performance to estimate the optimal error rate and also set achievable “desired error rate.”
    • Avoidable bias
      • More complex model
        • DL: increase the model size, such as the number of neurons or layers
      • More features
      • More polynomial features
      • Reduce or eliminate regularization
    • Variance = dev error - training error
      • More training data
        • Collect more
        • Data augmentation
      • Regularization
        • Works well when we have a lot of features, each of which contributes a bit to predicting y
      • Early stopping (see the callback sketch after this list)
      • Fewer features
        • Model selection or feature selection
        • Dimension reduction
      • Noise robustness
      • Sparse representation

      • Simpler model
        • Try the other remedies first
        • DL: decrease the model size, such as the number of neurons or layers
      • If you find that your dev set performance is much better than your test set performance, it is a sign that you have overfitted to the dev set. 

        • In this case, get a fresh dev set or collect more dev set data
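
A minimal sketch of the early-stopping item above, using the Keras EarlyStopping callback (tf.keras assumed; the data here is a random stand-in):

```python
import numpy as np
import tensorflow as tf

# Random stand-in data; substitute the real training and dev sets.
X_tr, y_tr = np.random.rand(500, 20), np.random.randint(0, 2, 500)
X_dev, y_dev = np.random.rand(100, 20), np.random.randint(0, 2, 100)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Stop when dev loss has not improved for 5 epochs and keep the best
# weights, halting training before the model overfits.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(X_tr, y_tr, validation_data=(X_dev, y_dev),
          epochs=100, callbacks=[early_stop])
```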

    • Both bias and variance
      • Choosing the right model parameters
        • Regularization (see the validation-curve sketch after this list)
          • Try decreasing lambda (fixes high bias)
          • Try increasing lambda (fixes high variance)
      • Modify input features based on insights from error analysis
      • Modify model architecture
        • Such as neural network architecture, so that it is more suitable for your problem
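
A hedged sketch of the lambda tuning above with scikit-learn's validation_curve (Ridge's alpha plays the role of lambda; the data is synthetic): a large train/dev gap suggests high variance (increase alpha), while low scores on both suggest high bias (decrease alpha).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

# Synthetic regression data as a stand-in.
X, y = make_regression(n_samples=500, n_features=30, noise=10.0,
                       random_state=0)

# Sweep the regularization strength and compare train vs dev scores.
alphas = np.logspace(-3, 3, 7)
train_scores, val_scores = validation_curve(
    Ridge(), X, y, param_name="alpha", param_range=alphas, cv=5)

for a, tr, va in zip(alphas, train_scores.mean(1), val_scores.mean(1)):
    # Big train/val gap -> high variance (increase alpha);
    # both scores low -> high bias (decrease alpha).
    print(f"alpha={a:g}  train R^2={tr:.3f}  val R^2={va:.3f}")
```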
  • Data Mismatch


    • Try to understand what properties of the data differ between the training and the dev set distributions.
    • Try to find more training data that better matches the dev set examples that your algorithm has trouble with.
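
One way to probe for mismatch, not from these notes, is the technique often called adversarial validation: train a classifier to distinguish training examples from dev examples. An AUC near 0.5 means the splits look alike; a high AUC means they differ, and the classifier's most informative features hint at which properties differ. A sketch with random stand-in features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# X_train / X_dev stand in for the two splits' feature matrices,
# with a deliberate shift injected into one feature.
X_train = np.random.rand(500, 10)
X_dev = np.random.rand(200, 10)
X_dev[:, 0] += 0.5  # simulated mismatch in feature 0

# Label each example by which split it came from and see how well a
# classifier separates them; ~0.5 AUC means the splits look alike.
X = np.vstack([X_train, X_dev])
y = np.r_[np.zeros(len(X_train)), np.ones(len(X_dev))]
clf = RandomForestClassifier(n_estimators=100, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
print(f"train-vs-dev AUC: {auc:.2f}")
```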
  • Regularization
    • L1 or lasso
    • L2 or ridge
      • When to use L1 and L2

        • In practice, the L2 norm usually yields more generalizable models than the L1 norm, but also much heavier ones. Features often have a high correlation with each other, and L1 regularization will use one of them and throw the others away, whereas L2 regularization will keep all of them and keep their weight magnitudes small. So with L1 you can end up with a smaller model, but it may be less predictive.

    • Elastic Net
      • The elastic net is just a linear combination of the L1 and L2 regularization penalties. You get the benefit of sparsity for weakly predictive features while also keeping decent and great features with smaller weights for good generalization. The trade-off is that there are now two hyperparameters to tune instead of one, the two lambda regularization parameters.
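
A small scikit-learn sketch on synthetic data illustrating the trade-off: L1 zeroes out weakly predictive features, L2 keeps every feature with small weights, and the elastic net sits in between (alpha and l1_ratio are its two hyperparameters in scikit-learn's parameterization):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge, ElasticNet

# Synthetic data where only a few features are truly informative.
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

for model in (Lasso(alpha=1.0),              # L1: sparse weights
              Ridge(alpha=1.0),              # L2: small but dense weights
              ElasticNet(alpha=1.0, l1_ratio=0.5)):  # mix of both
    model.fit(X, y)
    nonzero = np.sum(model.coef_ != 0)
    print(f"{type(model).__name__}: {nonzero}/20 non-zero coefficients")
```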
    • Dropout
      • When to use dropout

        • Dropout works best on larger networks, because there is more capacity for the model to learn independent representations; in other words, there are more possible paths for the network to try. The more you drop out (and therefore the less you keep), the stronger the regularization.
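
A minimal tf.keras sketch of dropout (layer sizes and rates are placeholders); the rate is the fraction of activations dropped, so a higher rate regularizes more strongly:

```python
import tensorflow as tf

# Dropout layers randomly zero activations during training only;
# at inference time they are a no-op.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dropout(0.5),   # drop half the activations
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.2),   # weaker regularization here
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```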

  • Hyperparameter
    • Tune Hyperparameters When Comparing Models
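
A hedged sketch of that rule with scikit-learn's GridSearchCV: tune each candidate model before comparing, so no model is judged on untuned defaults (the models and grids below are arbitrary examples):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Tune each candidate's hyperparameters, then compare the models on
# their best cross-validated scores.
candidates = [
    (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}),
    (SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}),
]
for model, grid in candidates:
    search = GridSearchCV(model, grid, cv=5).fit(X, y)
    print(type(model).__name__, search.best_params_,
          f"best CV accuracy: {search.best_score_:.3f}")
```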
  • Neurons stop learning
    • Lower the learning rate
      • Learning becomes slower, so increase the number of epochs or steps
    • Use another activation function, such as leaky ReLU (see the sketch after this list)
    • Use dropout
      • Limits the network's ability to learn
    • Batch normalization
      • Alternatives: weight normalization, layer normalization, self-normalizing networks
      • Also speeds up training
    • Redesign the network
      • Identity shortcuts
      • Auxiliary outputs at intermediate layers of the network
      • Alternate routes through the network that are shorter

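A minimal tf.keras sketch combining several of the remedies above: leaky ReLU keeps a small gradient for negative inputs so units do not die, batch normalization keeps activations in a range where gradients flow (and speeds up training), and the learning rate is lowered. Layer sizes and rates are placeholders:

```python
import tensorflow as tf

# Leaky ReLU avoids dead units; batch norm stabilizes activations.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, input_shape=(50,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.LeakyReLU(alpha=0.1),  # small slope for x < 0
    tf.keras.layers.Dense(128),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# A lowered learning rate: learning is slower, so plan for more epochs.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy")
```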
  • Visualization
    • TensorBoard (see the callback sketch after this list)
    • TFDV 

      • TensorFlow Data Validation
      • Monitor differences between the training, validation, and test datasets

    • TFMA

      • TensorFlow Model Analysis
      • Check the ROC curve for each class

      • Check hourly performance
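
A minimal sketch of wiring TensorBoard into tf.keras training via the built-in callback (log directory, model, and data are placeholders):

```python
import numpy as np
import tensorflow as tf

# Random stand-in data; substitute the real dataset.
X, y = np.random.rand(200, 10), np.random.randint(0, 2, 200)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Write training/validation curves and the graph to ./logs, then run
# `tensorboard --logdir logs` to inspect them in the browser.
tb = tf.keras.callbacks.TensorBoard(log_dir="logs")
model.fit(X, y, validation_split=0.2, epochs=10, callbacks=[tb])
```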

  • Error analysis
      • 2018 Machine Learning Yearning
      • P30, P32, P52
  • The Optimization Verification test
      • 2018 Machine Learning Yearning
      • P85
