- Correlation
- Spark
- Pearson
- val houses = spark.createDataFrame(Seq( (1620000d,2100), (1690000d,2300), (1400000d,2046), (2000000d,4314), (1060000d,1244), (3830000d,4608), (1230000d,2173), (2400000d,2750), (3380000d,4010), (1480000d,1959) )).toDF("price","size")
- houses.stat.corr("price","size")
- Or
- https://spark.apache.org/docs/latest/mllib-statistics.html#correlations
- Spearman
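The DataFrame corr above computes Pearson; for Spearman, the linked MLlib page uses the RDD-based Statistics API. A minimal sketch reusing the houses columns from the example above:

    import org.apache.spark.mllib.stat.Statistics

    // pull the two columns out as RDD[Double] (the size column is stored as Int above)
    val prices = houses.select("price").rdd.map(_.getDouble(0))
    val sizes  = houses.select("size").rdd.map(_.getInt(0).toDouble)

    // method can be "pearson" or "spearman"
    val spearman = Statistics.corr(prices, sizes, "spearman")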
- Regularization
- L1 or lasso
- L2 or ridge
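As a sketch of how L1/L2 regularization looks in Spark ML (not in the original notes): LinearRegression exposes regParam for the regularization strength and elasticNetParam to choose lasso (1.0), ridge (0.0), or a mix. The DataFrame `train` with "label" and "features" columns is an assumption.

    import org.apache.spark.ml.regression.LinearRegression

    // ridge (L2): elasticNetParam = 0.0; lasso (L1): elasticNetParam = 1.0
    val ridge = new LinearRegression().setRegParam(0.1).setElasticNetParam(0.0)
    val lasso = new LinearRegression().setRegParam(0.1).setElasticNetParam(1.0)

    // val ridgeModel = ridge.fit(train)   // train: hypothetical labeled DataFrame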
- gradient descent (parameter learning)
- (batch) gradient descent
- small data
- choose learning rate
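A minimal batch gradient descent sketch for one-feature linear regression in plain Scala, on made-up toy data; the learning rate alpha is the value you have to pick by hand:

    // toy data: (x, y) pairs
    val data = Seq((1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2))
    val alpha = 0.05            // learning rate, chosen manually
    var (w, b) = (0.0, 0.0)

    for (_ <- 1 to 1000) {
      // gradients of mean squared error over the FULL data set (hence "batch")
      val grads = data.map { case (x, y) =>
        val err = w * x + b - y
        (err * x, err)
      }
      val n = data.size
      val (gw, gb) = (grads.map(_._1).sum / n, grads.map(_._2).sum / n)
      w -= alpha * gw
      b -= alpha * gb
    }
    // w, b end up roughly at 2.0 and 0.0 for this toy data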
- stochastic gradient descent
- big data
- online / streaming learning
- data shuffling
- step size
- learning curve
- step size that decreases with iterations is very important
- coefficients
- do not use only the latest learned coefficients
- they never fully converge
- use the average of the coefficients instead
- regularization
- mini batch (suggested)
- batch size = 100 (general)
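A sketch of the stochastic/mini-batch variant on the same kind of toy problem, illustrating the points above: shuffling each pass, a step size that decreases with iterations, and reporting the averaged coefficients rather than the latest ones (toy data and constants are assumptions):

    import scala.util.Random

    val data = Seq((1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2))
    val batchSize = 2           // tiny here; ~100 is a common default
    var (w, b) = (0.0, 0.0)
    var (wSum, bSum, updates) = (0.0, 0.0, 0)

    for (epoch <- 1 to 200) {
      val shuffled = Random.shuffle(data)               // shuffle the data each pass
      for (batch <- shuffled.grouped(batchSize)) {
        val step = 0.1 / math.sqrt(updates + 1)         // step size decreases over time
        val n = batch.size
        val gw = batch.map { case (x, y) => (w * x + b - y) * x }.sum / n
        val gb = batch.map { case (x, y) => (w * x + b - y) }.sum / n
        w -= step * gw
        b -= step * gb
        wSum += w; bSum += b; updates += 1
      }
    }
    // report the averaged coefficients, not the last (noisy) ones
    val (wAvg, bAvg) = (wSum / updates, bSum / updates)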
- conjugate gradient, BFGS, and L-BFGS vs. gradient descent
- advantages
- no need to manually pick alpha (the learning rate)
- often faster than gradient descent
- disadvantage
- more complex
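For example, Spark's RDD-based API exposes an L-BFGS-backed logistic regression, where there is no learning rate to tune, only things like the number of iterations and convergence tolerance. A sketch on made-up toy data:

    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // tiny hypothetical training set: label, features
    val training = spark.sparkContext.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(1.0, 0.5)),
      LabeledPoint(0.0, Vectors.dense(1.2, 0.4)),
      LabeledPoint(1.0, Vectors.dense(3.0, 2.5)),
      LabeledPoint(1.0, Vectors.dense(2.8, 2.9))
    ))

    // no alpha / learning rate to pick, unlike plain gradient descent
    val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)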
- normal equation
- versus gradient descent
- no need to choose alpha
- do not need to iterate
- slow if the number of features is very large
- feature scaling is not actually necessary
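For reference, the normal equation is theta = (X^T X)^-1 X^T y. A minimal sketch with Breeze (the linear-algebra library Spark uses underneath), on a design matrix whose first column is the bias term, reusing a few of the house price/size pairs from the correlation example:

    import breeze.linalg.{DenseMatrix, DenseVector}

    // design matrix (bias column + one feature) and target vector
    val X = DenseMatrix(
      (1.0, 2100.0),
      (1.0, 2300.0),
      (1.0, 1244.0),
      (1.0, 4314.0))
    val y = DenseVector(1620000.0, 1690000.0, 1060000.0, 2000000.0)

    // theta = (X^T X)^-1 X^T y; the \ operator solves the linear system directly
    val theta = (X.t * X) \ (X.t * y)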
- when will X transpose X be non-invertible?
- redundant features
- e.g. x1 = size in feet^2, x2 = size in m^2 -> x1 = (3.28)^2 * x2
- too many features
- e.g. 100 features but only 10 training examples
- solutions
- delete some features
- use regularization
- linear regression
- y = continuous value (e.g. house price)
- logistic regression
- classification
- y = 1, 0
- multiclass classification
- y = 1, 2, 3 ...
- one-vs-all (one-vs-rest)
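A sketch of one-vs-all in Spark ML, which wraps any binary classifier; the `train` DataFrame with "label" and "features" columns is an assumption:

    import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}

    // one binary logistic regression is trained per class, then combined
    val lr  = new LogisticRegression().setMaxIter(100)
    val ovr = new OneVsRest().setClassifier(lr)
    // val ovrModel = ovr.fit(train)   // train: hypothetical labeled DataFrame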
- boosting
- face detection, malware classification, credit fraud detection, ads click through rate estimation, sales forecasting, ranking webpages for search, higgs boson detection
- Netflix
- ensemble methods
- AdaBoost
- basic classification
- gradient boosting
- beyond basic classification
- random forests
- bagging
- simpler
- easier to parallelize
- typically higher error
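Spark ML does not ship AdaBoost, but it does have gradient-boosted trees and random forests; a sketch, again assuming a hypothetical `train` DataFrame with "label" and "features" columns:

    import org.apache.spark.ml.classification.{GBTClassifier, RandomForestClassifier}

    // gradient boosting: trees are added sequentially, each correcting the previous errors
    val gbt = new GBTClassifier().setMaxIter(50).setMaxDepth(5)

    // bagging-style ensemble: trees are trained independently, so easy to parallelize
    val rf = new RandomForestClassifier().setNumTrees(100)

    // val gbtModel = gbt.fit(train)
    // val rfModel  = rf.fit(train)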
- neural networks
- try to have the same number of hidden units in every layer
- usually the more the better, but computationally more expensive
- random initialization
- forward propagation
- compute cost function
- back propagation
- gradient checking
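A sketch of the listed steps with Spark ML's feed-forward network (MultilayerPerceptronClassifier), which handles random initialization, forward propagation, and backpropagation internally; the layer sizes and the `train` DataFrame are assumptions:

    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

    // 4 input features, two hidden layers with the same number of units, 3 output classes
    val layers = Array(4, 8, 8, 3)

    val mlp = new MultilayerPerceptronClassifier()
      .setLayers(layers)
      .setMaxIter(200)
      .setSeed(42L)          // controls the random weight initialization
    // val mlpModel = mlp.fit(train)   // train: hypothetical DataFrame with "label"/"features"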
- stemming
- for text mining / classification
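A toy suffix-stripping stemmer just to show the idea (real text-mining pipelines use a proper Porter/Snowball stemmer, which Spark itself does not include):

    // naive illustration only: strip a few common English suffixes
    def stem(word: String): String = {
      val suffixes = Seq("ing", "ed", "es", "s")
      suffixes.find(s => word.toLowerCase.endsWith(s)) match {
        case Some(s) if word.length > s.length + 2 => word.dropRight(s.length)
        case _                                     => word
      }
    }

    Seq("running", "classified", "classes").map(stem)   // List(runn, classifi, class)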