Friday, November 27, 2015

Machine learning tools

  1. confusion matrix
    1. https://en.wikipedia.org/wiki/Confusion_matrix
    2. Python functions
      1. http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
      2. http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
  2. cross-validation
    1. https://en.wikipedia.org/wiki/Cross-validation_(statistics)
    2. Python function
      1. http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.cross_val_score.html
    3. k-fold
      1. https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation
      2. Python function
        1. http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html
    4. leave-one-out
      1. https://en.wikipedia.org/wiki/Cross-validation_(statistics)#Leave-one-out_cross-validation
      2. Python function
        1. http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LeaveOneOut.html
  3. standard error of the mean
    1. https://en.wikipedia.org/wiki/Standard_error#Standard_error_of_the_mean
    2. Python function
      1. http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.sem.html
  4. one-hot encoding
    1. https://en.wikipedia.org/wiki/One-hot
    2. Python function
      1. http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
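
To make the four tools above concrete, here is a minimal standard-library sketch of what each one computes. These are hand-rolled illustrations, not the scikit-learn or SciPy functions linked above; the function names, the toy labels, and the choice of contiguous unshuffled folds are all assumptions made for the example.

```python
import math
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns predicted."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous, unshuffled folds.

    With n_folds == n_samples this is leave-one-out cross-validation.
    """
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def standard_error_of_mean(values):
    """Sample standard deviation (n - 1 denominator) divided by sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)


def one_hot(values):
    """Return the sorted categories and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Toy data, made up for illustration.
y_true = ["cat", "dog", "cat", "dog", "cat"]
y_pred = ["cat", "cat", "cat", "dog", "dog"]
cm = confusion_matrix(y_true, y_pred, labels=["cat", "dog"])

folds = list(k_fold_indices(n_samples=5, n_folds=2))
loo_folds = list(k_fold_indices(n_samples=4, n_folds=4))

sem = standard_error_of_mean([1, 2, 3, 4, 5])

categories, encoded = one_hot(["red", "green", "red"])
```

In practice the per-fold scores from cross-validation are what you would pass to the standard error of the mean, to get an error bar on the average score.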

How to choose a statistical test


C -> Q (categorical explanatory variable, quantitative response variable)
  1. ANOVA
    1. https://en.wikipedia.org/wiki/Analysis_of_variance
    2. Python function
  2. Tukey's honestly significant difference test
    1. https://en.wikipedia.org/wiki/Tukey%27s_range_test
    2. Python function
      1. http://statsmodels.sourceforge.net/devel/generated/statsmodels.sandbox.stats.multicomp.MultiComparison.html
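
As a rough illustration of the first test in this list, here is a standard-library sketch of the one-way ANOVA F statistic: the ratio of between-group to within-group mean squares. The function name and the three toy groups are assumptions for the example. It stops at the F statistic and does not compute a p-value, which is what a library routine such as `scipy.stats.f_oneway` would add. Tukey's test is the usual follow-up for finding which pairs of groups differ and is not sketched here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: one quantitative response measured in three categories.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom suggests that at least one group mean differs from the others.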

Graphing decisions flowchart