Friday, 23 January 2015

If you are wondering what a Decision Tree (a supervised machine learning method) is, here is a good video to start with. Random Forest is covered too.

One thing though: you have to use a bit of imagination when the lecturer walks through a specific example, since you cannot see the whiteboard!

Do let me know if you have found a better one. Thanks.



What's interesting in this video:

  • build the tree by choosing split attributes based on entropy (information gain)
  • avoid over-fitting by:
    • stopping growth when a split is statistically insignificant
    • pruning the tree after it is grown
  • how to deal with continuous attributes? Pick a threshold and split on it
  • beware of singleton splits (e.g. ID-like attributes that send each example down its own branch)
    • this is a pitfall of plain entropy/information gain; try Gain Ratio instead
  • Pros:
    • Interpretable
    • Good at handling noise
    • Good at handling missing data
    • fast and compact
  • Cons:
    • can only separate data with straight lines (each split is an axis-parallel boundary on one attribute)
    • greedy, which can lead to over-fitting; mitigate by pruning or by stopping growth when the information gain is insignificant
  • Random Forest:
    • a group of trees, each grown like a Decision Tree, with two differences:
      • each tree is grown on a random subset (bootstrap sample) of the training data
      • each split considers only a random subset of the attributes
    • to classify, every tree votes and the majority wins
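The entropy-based attribute selection and the Gain Ratio fix for singleton-style splits can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the video; the function names are my own:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in label entropy from splitting by attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the split itself;
    this penalises attributes that shatter the data into many tiny groups."""
    split_info = entropy(values)  # entropy of the attribute's own distribution
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0
```

An ID-like attribute with one value per example gets maximal information gain but a large split entropy, so its gain ratio drops below that of a genuinely informative two-valued attribute.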
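The thresholding idea for continuous attributes can be sketched the same way: try the midpoints between adjacent sorted values and keep the one whose binary split leaves the least entropy behind (again a hedged sketch in my own naming, not the video's code):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(xs, labels):
    """Scan midpoints between adjacent sorted values of a continuous
    attribute; return the threshold whose <=/> split has the lowest
    weighted entropy (equivalently, the highest information gain)."""
    pairs = sorted(zip(xs, labels))
    n = len(pairs)
    best_t, best_rem = None, float('inf')
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue  # no boundary between equal values
        t = (a + b) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        rem = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if rem < best_rem:
            best_t, best_rem = t, rem
    return best_t
```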
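The three Random Forest ingredients in the notes above (bootstrap sample of the training data, random attribute subset per split, majority vote) can each be shown in isolation. A real forest grows full trees on top of these pieces; this is just a sketch of the ingredients, with names I made up:

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Sample n rows with replacement: the per-tree subset of training data."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def feature_subset(n_features, k, rng):
    """Pick k candidate attributes to consider at a split."""
    return rng.sample(range(n_features), k)

def majority_vote(predictions):
    """Aggregate the per-tree predictions for one example: most common label wins."""
    return Counter(predictions).most_common(1)[0][0]
```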
