- Optimal probability cutoff point
- 2017 Mastering Machine Learning with Python in Six Steps
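One common recipe (a hedged sketch, not taken from the book) is to pick the threshold on the ROC curve that maximizes Youden's J, i.e. TPR minus FPR; the data and model below are synthetic placeholders.

```python
# Sketch: choose the probability cutoff that maximizes Youden's J (TPR - FPR).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced placeholder data.
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_dev)[:, 1]                 # predicted P(y = 1)

fpr, tpr, thresholds = roc_curve(y_dev, probs)
best_cutoff = thresholds[np.argmax(tpr - fpr)]         # Youden's J statistic
print("optimal probability cutoff:", best_cutoff)
```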
- Bias and Variance
- High variance
- High bias
-
- Bias = training error = optimal error rate (“unavoidable bias”) + avoidable bias (see the worked sketch after this list)
- Optimal error rate / unavoidable bias
- Use human-level performance to estimate the optimal error rate and to set an achievable “desired error rate.”
- Avoidable bias
- More complex model
- DL: increase the model size, such as the number of neurons/layers
- More features
- More polynomial features
- Reduce or eliminate regularization
- Variance = dev error - training error
- More training data
- Collect more
- Data augmentation
- Regularization
- Works well when we have a lot of features, each of which contributes a bit to predicting y
- Early stopping
- Fewer features
- Model selection or feature selection
- Dimension reduction
- Noise robustness
- Sparse representation
- Simpler model
- Try others first
- DL: decrease the model size, such as the number of neurons/layers
If you find that your dev set performance is much better than your test set performance, it is a sign that you have overfitted to the dev set.
In this case, get a fresh dev set or collect more dev set data.
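A toy worked example of the bias/variance decomposition above, with made-up error rates:

```python
# Made-up error rates to illustrate the decomposition above.
human_error = 0.01   # proxy for the optimal ("unavoidable") error rate
train_error = 0.08
dev_error = 0.10

avoidable_bias = train_error - human_error   # 0.07 -> bias dominates here
variance = dev_error - train_error           # 0.02

if avoidable_bias > variance:
    print("Prioritize bias fixes: bigger model, more features, less regularization")
else:
    print("Prioritize variance fixes: more data, regularization, simpler model")
```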
- Both
- Choosing the right model parameters
- Regularization
- Try decreasing lambda (fixes high bias)
- Try increasing lambda (fixes high variance); see the validation-curve sketch after this list
- Modify input features based on insights from error analysis
- Modify model architecture
- Such as neural network architecture, so that it is more suitable for your problem
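A minimal scikit-learn sketch of the lambda tuning above, sweeping Ridge's alpha (which plays the role of lambda) and comparing training vs. validation scores; the data and grid are arbitrary.

```python
# Sketch: sweep the regularization strength and compare train vs. validation scores.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)

alphas = np.logspace(-3, 3, 7)   # alpha plays the role of lambda here
train_scores, val_scores = validation_curve(
    Ridge(), X, y, param_name="alpha", param_range=alphas, cv=5)

for a, tr, va in zip(alphas, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"alpha={a:g}  train R2={tr:.3f}  val R2={va:.3f}")
# High train score but low validation score -> high variance: increase alpha.
# Both scores low -> high bias: decrease alpha (or use a richer model).
```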
- Data Mismatch
- Try to understand what properties of the data differ between the training and the dev set distributions.
- Try to find more training data that better matches the dev set examples that your algorithm has trouble with.
- Regularization
- L1 or lasso
- L2 or ridge
When to use L1 and L2
In practice, the L2 norm usually gives more generalizable models than the L1 norm, but it also gives heavier, more complex models. This happens because features often correlate strongly with each other: L1 regularization will use one of a correlated pair and throw the other away, whereas L2 regularization keeps both features and just keeps their weight magnitudes small. So with L1 you can end up with a smaller model, but it may be less predictive.
- Elastic Net
- The elastic net is just a linear combination of the L1 and L2 regularization penalties. This way you get the benefit of sparsity for really poor predictive features while also keeping decent and great features with smaller weights, which gives good generalization. The only trade-off is that there are now two hyperparameters to tune instead of one, the two different lambda regularization parameters.
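A minimal scikit-learn sketch contrasting the three penalties on correlated synthetic features; the alpha values are arbitrary, and the point is only that the L1 penalty zeroes out weights while L2 shrinks all of them.

```python
# Sketch: compare L1 (Lasso), L2 (Ridge), and Elastic Net on correlated features.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# effective_rank makes the 20 features strongly correlated with each other.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       effective_rank=8, noise=5.0, random_state=0)

models = [("L1 / Lasso", Lasso(alpha=1.0)),
          ("L2 / Ridge", Ridge(alpha=1.0)),
          ("Elastic Net", ElasticNet(alpha=1.0, l1_ratio=0.5))]

for name, model in models:
    coefs = model.fit(X, y).coef_
    print(f"{name}: {np.sum(coefs != 0)} non-zero weights, "
          f"max |w| = {np.abs(coefs).max():.2f}")
```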
- Dropout
When to use dropout
You also want to use dropout on larger networks, because there is more capacity for the model to learn independent representations; in other words, there are more possible paths for the network to try. The more you drop out (and therefore the less you keep), the stronger the regularization.
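A minimal Keras sketch of dropout between dense layers; the layer sizes and the 0.5 rate are illustrative, not recommendations.

```python
# Sketch: dropout between dense layers; active only during training.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # randomly zero 50% of activations each step
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```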
- Hyperparameter
- Tune Hyperparameters When Comparing Models
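A minimal sketch of tuning each candidate before comparing, using scikit-learn's GridSearchCV; the models, grids, and data are placeholders.

```python
# Sketch: tune each candidate's hyperparameters with cross-validation, then compare.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}),
    "forest": (RandomForestClassifier(random_state=0), {"max_depth": [3, 5, None]}),
}

for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5).fit(X, y)
    print(name, search.best_params_, round(search.best_score_, 3))
```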
- Neurons stop learning
Lower the learning rate
Increase the number of epochs or steps
Learning becomes slow
- Use another activation function, such as leaky ReLU (see the sketch after this list)
Use dropout
Limit the ability to learn
Batch normalization
weight normalization, layer normalization, self-normalizing networks
Redesign the network
Identity shortcut
have auxiliary outputs at intermediate layers in the network
have alternate routes through the network that are shorter
Train faster
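A minimal Keras sketch combining a few of the fixes above (a lowered learning rate, leaky ReLU activations, and batch normalization); the layer sizes and learning rate are illustrative.

```python
# Sketch: lowered learning rate + LeakyReLU + batch normalization.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(128),
    tf.keras.layers.BatchNormalization(),   # keeps activations well-scaled
    tf.keras.layers.LeakyReLU(),            # avoids "dead" ReLU units
    tf.keras.layers.Dense(64),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # lowered LR
              loss="binary_crossentropy")
```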
- Visualization
- TensorBoard (see the sketch after this list)
TFDV
- TensorFlow Data Validation
Monitor the differences between the training, validation and test datasets
TFMA
- TensorFlow Model Analysis
Check the ROC curve for each class
Check hourly performance
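For the TensorBoard point above, a minimal Keras sketch that logs training curves for TensorBoard (the model, data, and log directory are placeholders); TFDV and TFMA have their own APIs and are not shown here.

```python
# Sketch: log training/validation curves to TensorBoard.
# View them with: tensorboard --logdir logs/
import numpy as np
import tensorflow as tf

X = np.random.rand(256, 10).astype("float32")       # placeholder data
y = (X.sum(axis=1) > 5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

tb = tf.keras.callbacks.TensorBoard(log_dir="logs/run1")   # arbitrary log dir
model.fit(X, y, epochs=5, validation_split=0.2, callbacks=[tb], verbose=0)
```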
- Error analysis
- 2018 Machine learning yearning
- P30, P32, P52
- 2018 Machine learning yearning
- The Optimization Verification test
- 2018 Machine learning yearning
- P85
- 2018 Machine learning yearning
Wednesday, August 22, 2018
AI - Tuning