Friday, December 30, 2016

ML - Algorithm

  • Correlation
  • Regularization (sketch below)
    • L1 or lasso
    • L2 or ridge
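
A minimal sketch of the two penalties above, using scikit-learn's Ridge and Lasso; the toy data and the alpha values are made up purely for illustration:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.RandomState(0)
    X = rng.randn(100, 10)
    # only the first three features actually matter in this toy data
    y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + 0.1 * rng.randn(100)

    ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks all coefficients toward 0
    lasso = Lasso(alpha=0.1).fit(X, y)   # L1: drives some coefficients to exactly 0

    print("ridge:", np.round(ridge.coef_, 3))
    print("lasso:", np.round(lasso.coef_, 3))
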
  • gradient descent (parameter learning)
    • (batch) gradient descent
      • small data
      • choose learning rate
    • stochastic gradient descent (sketch below)
      • big data
      • online / streaming learning
      • data shuffling
      • step size
        • learning curve
        • a step size that decreases with iterations is very important
      • coefficient
        • never use the latest learned coefficients
          • they may never converge
        • use the average of the coefficients over iterations instead
      • regularization
      • mini batch (suggested)
        • batch size = 100 (general)
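
A rough numpy sketch of the mini-batch SGD recipe above (shuffle each pass, decrease the step size with the iteration count, report the averaged coefficients rather than the latest ones); the batch size, step-size schedule, and toy data are arbitrary illustrative choices:

    import numpy as np

    def minibatch_sgd(X, y, batch_size=100, passes=10, step0=0.1):
        n, d = X.shape
        w = np.zeros(d)
        w_sum = np.zeros(d)
        t = 0
        for _ in range(passes):
            order = np.random.permutation(n)      # shuffle the data each pass
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
                t += 1
                w -= (step0 / np.sqrt(t)) * grad  # step size decreases with iterations
                w_sum += w
        return w_sum / t                          # averaged coefficients, not the latest

    # toy data for illustration
    X = np.random.randn(1000, 3)
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * np.random.randn(1000)
    print(minibatch_sgd(X, y))
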
    • conjugate gradient, BFGS, and L-BFGS vs. gradient descent (sketch below)
      • advantages
        • no need to manually pick alpha (the learning rate)
        • often faster than gradient descent
      • disadvantage
        • more complex
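
These optimizers are usually called through a library rather than written by hand; a minimal sketch with SciPy's L-BFGS, using a plain least-squares loss chosen only for illustration:

    import numpy as np
    from scipy.optimize import minimize

    X = np.random.randn(200, 3)                   # toy data for illustration
    y = X @ np.array([1.0, -2.0, 0.5])

    def loss(w):
        r = X @ w - y
        return r @ r

    def grad(w):
        return 2 * X.T @ (X @ w - y)

    # no learning rate (alpha) to pick; the line search is handled internally
    res = minimize(loss, x0=np.zeros(3), jac=grad, method="L-BFGS-B")
    print(res.x)
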
  • normal equation (sketch below)
    • versus gradient descent
      • no need to choose alpha
      • do not need to iterate
      • slow if the number of features is very large
    • feature scaling is not actually necessary
    • when will X transpose X be non-invertible?
      • redundant features
        • e.g. x1 = size in feet^2, x2 = size in m^2 -> x1 = (3.28)^2 * x2
      • too many features
        • e.g. 100 features but only 10 training examples
      • solutions
        • delete some features
        • use regularization
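
A minimal sketch of the normal equation and its regularized variant, which keeps the matrix invertible even with redundant features or more features than training examples; the toy data and lambda value are made up:

    import numpy as np

    X = np.random.randn(50, 4)                    # toy data for illustration
    y = X @ np.array([2.0, -1.0, 0.5, 3.0])

    # normal equation: theta = (X^T X)^(-1) X^T y -- no alpha, no iterations,
    # but solving the d x d system is roughly O(d^3), slow for very large d
    theta = np.linalg.solve(X.T @ X, X.T @ y)

    # regularized version: X^T X + lambda * I is always invertible
    lam = 1.0
    theta_reg = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    print(theta, theta_reg)
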
  • linear regression
    • y = continuous value (e.g. house price)
  • logistic regression
    • classification
      • y = 1, 0
    • multiclass classification
      • y = 1, 2, 3 ...
      • one-vs-all (one-vs-rest); see the sketch below
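
A small sketch of one-vs-all: fit one binary classifier per class and predict the class whose classifier is most confident. scikit-learn's LogisticRegression and make_blobs are used only as convenient stand-ins for the classifier and the data:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.linear_model import LogisticRegression

    X, y = make_blobs(n_samples=300, centers=3, random_state=0)  # toy 3-class data

    classifiers = []
    for k in np.unique(y):
        clf = LogisticRegression()
        clf.fit(X, (y == k).astype(int))          # class k vs. the rest
        classifiers.append(clf)

    # predict the class with the highest "is class k" probability
    scores = np.column_stack([c.predict_proba(X)[:, 1] for c in classifiers])
    print("training accuracy:", np.mean(np.argmax(scores, axis=1) == y))
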
  • boosting (sketch below)
    • face detection, malware classification, credit fraud detection, ads click-through rate estimation, sales forecasting, ranking web pages for search, Higgs boson detection
      • Netflix
    • ensemble methods
      • AdaBoost
        • basic classification
      • gradient boosting
        • beyond basic classification
      • random forests
        • bagging
        • simpler
        • easier to parallelize
        • typically higher error
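
All three ensemble methods above are available in scikit-learn; a quick comparison sketch on a synthetic dataset with default hyperparameters (nothing tuned here):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    for model in (AdaBoostClassifier(), GradientBoostingClassifier(),
                  RandomForestClassifier()):
        score = cross_val_score(model, X, y, cv=5).mean()
        print(type(model).__name__, round(score, 3))
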
  • neural networks (sketch below)
    • try to have the same number of hidden units in every layer
      • usually the more the better, but computationally more expensive
    • random initialization
    • forward propagation
    • compute cost function
    • back propagation
    • gradient checking
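
A compact numpy sketch of the steps listed above for a one-hidden-layer network (random initialization, forward propagation, cost, back propagation, gradient checking); the layer sizes, squared-error cost, and toy data are arbitrary choices for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.RandomState(0)
    X = rng.randn(20, 3)                        # toy inputs
    y = rng.rand(20, 1)                         # toy targets in [0, 1]

    # random initialization: small values to break symmetry between units
    W1 = 0.01 * rng.randn(3, 5)
    W2 = 0.01 * rng.randn(5, 1)

    def cost(W1, W2):
        hidden = sigmoid(X @ W1)                # forward propagation
        out = sigmoid(hidden @ W2)
        return 0.5 * np.mean((out - y) ** 2)    # cost function (squared error)

    # back propagation
    hidden = sigmoid(X @ W1)
    out = sigmoid(hidden @ W2)
    d_out = (out - y) * out * (1 - out) / len(X)
    grad_W2 = hidden.T @ d_out
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)
    grad_W1 = X.T @ d_hidden

    # gradient checking: compare one backprop entry with a finite difference
    eps = 1e-5
    W1_plus, W1_minus = W1.copy(), W1.copy()
    W1_plus[0, 0] += eps
    W1_minus[0, 0] -= eps
    numeric = (cost(W1_plus, W2) - cost(W1_minus, W2)) / (2 * eps)
    print("backprop:", grad_W1[0, 0], "numeric:", numeric)
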
  • stemming (sketch below)
    • for text mining / classifiers
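
Stemming is normally done with an existing library; a tiny sketch using NLTK's Porter stemmer (any stemmer would work, this one is just a common choice):

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["running", "runs", "classification", "classifiers", "mining"]:
        print(word, "->", stemmer.stem(word))
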
