- The recommended approach
- ML
- Start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data.
- Plot learning curves to decide whether more data, more features, etc., are likely to help (see the sketch after this section)
- Error analysis
- Manually examine the examples (in the cross-validation set) that your algorithm made errors on. See if you spot any systematic trend in the types of examples it gets wrong.
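A minimal sketch of both ideas, assuming a scikit-learn-style workflow; the classifier and the random data are placeholders standing in for the real problem:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve, train_test_split

# Placeholder data standing in for the real problem.
X, y = np.random.rand(500, 20), np.random.randint(0, 2, 500)

# Learning curves: training vs. cross-validation score as the training
# set grows, to judge whether more data is likely to help.
train_sizes, train_scores, cv_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5))
print("train:", train_scores.mean(axis=1))
print("cv:   ", cv_scores.mean(axis=1))

# Error analysis: inspect the cross-validation examples the model gets
# wrong and look for systematic patterns.
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
wrong = model.predict(X_cv) != y_cv
print("misclassified CV examples:", int(wrong.sum()))
```

Roughly, a large gap between the training and cross-validation curves points to variance (more data or regularization may help), while two low curves point to bias (more or better features may help).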
- DL
- Try to have the same number of hidden units in every layer
- Usually, more units are better, but at greater computational cost
- ReLU variants
- Softplus, Leaky ReLU, PReLU, ReLU6, ELU (see the sketch after this list)
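A minimal sketch, assuming TensorFlow/Keras; the layer count and width are placeholders:

```python
import tensorflow as tf

# Placeholder sizes; the point is a constant width per layer plus a ReLU variant.
NUM_FEATURES, HIDDEN_UNITS, NUM_LAYERS = 20, 128, 3

model = tf.keras.Sequential([tf.keras.Input(shape=(NUM_FEATURES,))])
for _ in range(NUM_LAYERS):
    # Same number of hidden units in every layer; swap "elu" for another
    # variant such as "softplus", or use tf.keras.layers.LeakyReLU().
    model.add(tf.keras.layers.Dense(HIDDEN_UNITS, activation="elu"))
model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```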
- Common
- Learning rate
- Keep it below 1/sqrt(num_features) as a rough heuristic (see the sketch after this list)
- Learning rate automation, e.g. decay schedules or reducing the rate when validation loss plateaus
- Continue to verify and monitor your data, since it may change for many reasons in production
- Stop learning under particular circumstances
- E.g. a service failure -> only a few users can access the service -> incomplete/incorrect data
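A minimal sketch of both learning-rate points, assuming TensorFlow/Keras; num_features and the decay settings are placeholders:

```python
import numpy as np
import tensorflow as tf

num_features = 20  # placeholder

# Heuristic upper bound from the note above, capped at a typical default.
initial_lr = min(0.01, 1.0 / np.sqrt(num_features))

# "Learning rate automation": let a schedule decay the rate over time
# instead of hand-tuning it for every run.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=initial_lr, decay_steps=1000, decay_rate=0.96)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

# Alternatively, reduce the rate when the validation loss plateaus:
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=3)
# model.fit(..., callbacks=[reduce_lr])
```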
- Transfer learning
- Where to cut?
- By convention, we cut the source network after the convolutional layers and append a number of fully connected layers of our own. This is consistent with the view that convolutional layers are excellent feature extractors for the image domain.
- Do I make the source model's weights trainable, allowing their values to change during subsequent training, or do I keep them constant?
- Leaving them constant effectively treats the source model as a feature extractor. If your new dataset is small, this is the recommended approach, since fine-tuning the weights would risk overfitting your data.
- The larger your dataset is, the more confident you can be that letting the source network continue to train will not result in overfitting (see the sketch after this list).
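A minimal sketch, assuming TensorFlow/Keras and an image task; MobileNetV2, the input size, and the dense head are placeholder choices:

```python
import tensorflow as tf

NUM_CLASSES = 10  # placeholder

# Cut the source network after its convolutional layers (include_top=False)
# and treat it as a feature extractor.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze: recommended when the new dataset is small

# Append a number of fully connected layers of our own.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# With a larger dataset, fine-tuning becomes safer:
# set base.trainable = True and use a much smaller learning rate.
```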
- Whether to make the pre-trained embeddings trainable or not.
- The primary factor to consider when making this decision is dataset size: the larger your dataset, the less likely it is that making the embeddings trainable will result in overfitting (see the sketch after this list).
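A minimal sketch, assuming TensorFlow/Keras; the vocabulary size, embedding dimension, and pretrained_vectors matrix are placeholders standing in for embeddings such as GloVe or word2vec:

```python
import numpy as np
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM = 10_000, 100  # placeholders
# Placeholder matrix standing in for vectors loaded from e.g. GloVe.
pretrained_vectors = np.random.rand(VOCAB_SIZE, EMBED_DIM).astype("float32")

embedding = tf.keras.layers.Embedding(
    VOCAB_SIZE, EMBED_DIM,
    embeddings_initializer=tf.keras.initializers.Constant(pretrained_vectors),
    trainable=False)  # small dataset: freeze; large dataset: trainable=True

model = tf.keras.Sequential([
    embedding,
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```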
- Pretrained embeddings
- Pretrained models
- TensorFlow Hub
- Keras Applications (a loading sketch follows this list)
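A minimal sketch of loading a pretrained text-embedding module from TensorFlow Hub; the module handle is just an example:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Example handle for a pretrained sentence-embedding module on tfhub.dev.
handle = "https://tfhub.dev/google/nnlm-en-dim50/2"

model = tf.keras.Sequential([
    hub.KerasLayer(handle, input_shape=[], dtype=tf.string, trainable=False),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```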
- Rules of Machine Learning: Best Practices for ML Engineering
- See also
- Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
- Choosing the right estimator