Wednesday, August 22, 2018

AI - Tuning

  • Optimal probability cutoff point
    • 2017 Mastering Machine Learning with Python in Six Steps
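
A common way to choose the cutoff for a binary classifier is to maximize Youden's J statistic (TPR - FPR) over the ROC curve. A minimal scikit-learn sketch on synthetic data (the book covers the topic; the code below is only an illustration, not taken from it):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Toy imbalanced binary problem as a stand-in for real data.
X, y = make_classification(n_samples=2000, weights=[0.8], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_val)[:, 1]

# roc_curve returns one (fpr, tpr) pair per candidate threshold;
# Youden's J picks the threshold that maximizes TPR - FPR.
fpr, tpr, thresholds = roc_curve(y_val, probs)
cutoff = thresholds[np.argmax(tpr - fpr)]
print(f"optimal cutoff: {cutoff:.3f}")

# Classify with the tuned cutoff instead of the default 0.5.
y_pred = (probs >= cutoff).astype(int)
```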
  • Bias and Variance

    • High variance
    • High bias
    • Bias = training error = optimal error rate (“unavoidable bias”) + avoidable bias
    • Optimal error rate / unavoidable bias
      • Use human-level performance to estimate the optimal error rate and also set achievable “desired error rate.”
    • Avoidable bias
      • More complex model
        • DL: increase the model size, such as the number of neurons or layers
      • More features
      • More polynomial features
      • Reduce or eliminate regularization
    • Variance = dev error - training error
      • More training data
        • Collect more
        • Data augmentation
      • Regularization
        • Works well when we have a lot of features, each of which contributes a bit to predicting y
      • Early stopping (see the callback sketch after this list)
      • Fewer features
        • Model selection or feature selection
        • Dimension reduction
      • Noise robustness
      • Sparse representation

      • Simpler model
        • Try the other remedies first
        • DL: decrease the model size, such as the number of neurons or layers
      • If you find that your dev set performance is much better than your test set performance, it is a sign that you have overfitted to the dev set. 

        • In this case, get a fresh dev set or collect more dev set data
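
A minimal sketch of the early-stopping item above, using the Keras EarlyStopping callback (tf.keras assumed; the data here is a random stand-in):

```python
import numpy as np
import tensorflow as tf

# Random stand-in data; substitute the real training and dev sets.
X_tr, y_tr = np.random.rand(500, 20), np.random.randint(0, 2, 500)
X_dev, y_dev = np.random.rand(100, 20), np.random.randint(0, 2, 100)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Stop when dev loss has not improved for 5 epochs and keep the best
# weights, halting training before the model overfits.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(X_tr, y_tr, validation_data=(X_dev, y_dev),
          epochs=100, callbacks=[early_stop])
```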

    • Both bias and variance
      • Choosing the right model parameters
        • Regularization (see the validation-curve sketch after this list)
          • Try decreasing lambda (fixes high bias)
          • Try increasing lambda (fixes high variance)
      • Modify input features based on insights from error analysis
      • Modify model architecture
        • Such as neural network architecture, so that it is more suitable for your problem
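
A hedged sketch of the lambda tuning above with scikit-learn's validation_curve (Ridge's alpha plays the role of lambda; the data is synthetic): a large train/dev gap suggests high variance (increase alpha), while low scores on both suggest high bias (decrease alpha).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

# Synthetic regression data as a stand-in.
X, y = make_regression(n_samples=500, n_features=30, noise=10.0,
                       random_state=0)

# Sweep the regularization strength and compare train vs dev scores.
alphas = np.logspace(-3, 3, 7)
train_scores, val_scores = validation_curve(
    Ridge(), X, y, param_name="alpha", param_range=alphas, cv=5)

for a, tr, va in zip(alphas, train_scores.mean(1), val_scores.mean(1)):
    # Big train/val gap -> high variance (increase alpha);
    # both scores low -> high bias (decrease alpha).
    print(f"alpha={a:g}  train R^2={tr:.3f}  val R^2={va:.3f}")
```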
  • Data Mismatch


    • Try to understand what properties of the data differ between the training and the dev set distributions.
    • Try to find more training data that better matches the dev set examples that your algorithm has trouble with.
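
One way to probe for mismatch, not from these notes, is the technique often called adversarial validation: train a classifier to distinguish training examples from dev examples. An AUC near 0.5 means the splits look alike; a high AUC means they differ, and the classifier's most informative features hint at which properties differ. A sketch with random stand-in features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# X_train / X_dev stand in for the two splits' feature matrices,
# with a deliberate shift injected into one feature.
X_train = np.random.rand(500, 10)
X_dev = np.random.rand(200, 10)
X_dev[:, 0] += 0.5  # simulated mismatch in feature 0

# Label each example by which split it came from and see how well a
# classifier separates them; ~0.5 AUC means the splits look alike.
X = np.vstack([X_train, X_dev])
y = np.r_[np.zeros(len(X_train)), np.ones(len(X_dev))]
clf = RandomForestClassifier(n_estimators=100, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
print(f"train-vs-dev AUC: {auc:.2f}")
```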
  • Regularization
    • L1 or lasso
    • L2 or ridge
      • When to use L1 and L2

        • In practice, the L2 norm usually yields more generalizable models than the L1 norm, but also much heavier ones. Features often have a high correlation with each other, and L1 regularization will use one of them and throw the others away, whereas L2 regularization will keep all of them and keep their weight magnitudes small. So with L1 you can end up with a smaller model, but it may be less predictive.

    • Elastic Net
      • The elastic net is just a linear combination of the L1 and L2 regularization penalties. You get the benefit of sparsity for weakly predictive features while also keeping decent and great features with smaller weights for good generalization. The trade-off is that there are now two hyperparameters to tune instead of one, the two lambda regularization parameters.
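
A small scikit-learn sketch on synthetic data illustrating the trade-off: L1 zeroes out weakly predictive features, L2 keeps every feature with small weights, and the elastic net sits in between (alpha and l1_ratio are its two hyperparameters in scikit-learn's parameterization):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge, ElasticNet

# Synthetic data where only a few features are truly informative.
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

for model in (Lasso(alpha=1.0),              # L1: sparse weights
              Ridge(alpha=1.0),              # L2: small but dense weights
              ElasticNet(alpha=1.0, l1_ratio=0.5)):  # mix of both
    model.fit(X, y)
    nonzero = np.sum(model.coef_ != 0)
    print(f"{type(model).__name__}: {nonzero}/20 non-zero coefficients")
```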
    • Dropout
      • When to use dropout

        • Dropout works best on larger networks, because there is more capacity for the model to learn independent representations; in other words, there are more possible paths for the network to try. The more you drop out (and therefore the less you keep), the stronger the regularization.
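
A minimal tf.keras sketch of dropout (layer sizes and rates are placeholders); the rate is the fraction of activations dropped, so a higher rate regularizes more strongly:

```python
import tensorflow as tf

# Dropout layers randomly zero activations during training only;
# at inference time they are a no-op.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dropout(0.5),   # drop half the activations
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.2),   # weaker regularization here
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```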

  • Hyperparameter
    • Tune Hyperparameters When Comparing Models
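
A hedged sketch of that rule with scikit-learn's GridSearchCV: tune each candidate model before comparing, so no model is judged on untuned defaults (the models and grids below are arbitrary examples):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Tune each candidate's hyperparameters, then compare the models on
# their best cross-validated scores.
candidates = [
    (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}),
    (SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}),
]
for model, grid in candidates:
    search = GridSearchCV(model, grid, cv=5).fit(X, y)
    print(type(model).__name__, search.best_params_,
          f"best CV accuracy: {search.best_score_:.3f}")
```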
  • Neurons stop learning
    • Lower the learning rate
      • Learning becomes slower, so increase the number of epochs or steps
    • Use another activation function, such as leaky ReLU (see the sketch after this list)
    • Use dropout
      • Limits the network's ability to learn
    • Batch normalization
      • Alternatives: weight normalization, layer normalization, self-normalizing networks
      • Also speeds up training
    • Redesign the network
      • Identity shortcuts
      • Auxiliary outputs at intermediate layers of the network
      • Alternate routes through the network that are shorter

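A minimal tf.keras sketch combining several of the remedies above: leaky ReLU keeps a small gradient for negative inputs so units do not die, batch normalization keeps activations in a range where gradients flow (and speeds up training), and the learning rate is lowered. Layer sizes and rates are placeholders:

```python
import tensorflow as tf

# Leaky ReLU avoids dead units; batch norm stabilizes activations.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, input_shape=(50,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.LeakyReLU(alpha=0.1),  # small slope for x < 0
    tf.keras.layers.Dense(128),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# A lowered learning rate: learning is slower, so plan for more epochs.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy")
```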
  • Visualization
    • TensorBoard (see the callback sketch after this list)
    • TFDV 

      • TensorFlow Data Validation
      • Monitor differences between the training, validation, and test datasets

    • TFMA

      • TensorFlow Model Analysis
      • Check the ROC curve for each class

      • Check hourly performance
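
A minimal sketch of wiring TensorBoard into tf.keras training via the built-in callback (log directory, model, and data are placeholders):

```python
import numpy as np
import tensorflow as tf

# Random stand-in data; substitute the real dataset.
X, y = np.random.rand(200, 10), np.random.randint(0, 2, 200)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Write training/validation curves and the graph to ./logs, then run
# `tensorboard --logdir logs` to inspect them in the browser.
tb = tf.keras.callbacks.TensorBoard(log_dir="logs")
model.fit(X, y, validation_split=0.2, epochs=10, callbacks=[tb])
```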

  • Error analysis
      • 2018 Machine Learning Yearning
      • P30, P32, P52
  • The Optimization Verification test
      • 2018 Machine Learning Yearning
      • P85
