Wednesday, 28 January 2015

A Mini Comparison of C++ and Java.



C++ versus Java (each point is given for C++ first, then for Java):

C++: Write once, compile anywhere.
  • Compiles into native machine code, so it can take advantage of platform-specific features.
  • Needs to be recompiled for each target platform.

Java: Write once, compile once, run anywhere - portability.
  • Compiles into bytecode that runs on the Java Virtual Machine, which is part of the Java Runtime Environment.
  • Independent of the operating system.

C++: Manual memory management.
  • Programmer controlled.
    • Low-level approach: explicit new and delete.
    • High-level approach: RAII and smart pointers (garbage collection is available only through add-on libraries).

Java: Automatic garbage collection.
  • The runtime automatically finds and reclaims memory that is no longer required.
  • Eliminates most bugs caused by dangling pointers and forgotten deallocations (an object that stays reachable by mistake can still leak).
  • Overhead:
    • Can run more slowly.
    • Can over-allocate memory.
    • May not free memory at the best possible time.
  • Mechanisms:
    • reference counting: tracks how many references point to each object and frees the object when the count drops to zero (won't work for circular references).
    • mark and sweep: marks every object that can still be reached, then sweeps away (de-allocates) the unmarked ones (will work for circular references, but is generally less efficient).
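
The difference between the two mechanisms can be seen in a small sketch. It is written in Python only because CPython happens to combine reference counting with a separate cycle detector; that detector is not a textbook mark and sweep, but it plays the same role here (finding objects that are no longer reachable). The `Node` class and all variable names are made up for illustration, and the behaviour shown is specific to CPython.

```python
import gc
import weakref


class Node:
    """Hypothetical object used only to build a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.partner = None


gc.disable()  # switch off the cycle detector so only reference counting is active

# Case 1: no cycle. Dropping the last reference frees the object at once.
a = Node("a")
a_alive = weakref.ref(a)  # a weak reference does not keep the object alive
del a
no_cycle_freed = a_alive() is None

# Case 2: a cycle. Each node keeps the other's count above zero.
b, c = Node("b"), Node("c")
b.partner, c.partner = c, b
b_alive = weakref.ref(b)
del b, c
cycle_freed_by_refcount = b_alive() is None

# The cycle detector works from reachability, so it can reclaim the pair.
gc.collect()
cycle_freed_by_tracing = b_alive() is None

gc.enable()
```

With reference counting alone, the pair in case 2 stays in memory even though nothing in the program can reach it any more; only the reachability-based pass reclaims it.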

C++: Multiple inheritance.

Java: Single inheritance.
  • Multiple inheritance can be simulated via interfaces.

C++: Methods need to be explicitly declared virtual [1].

Java: Non-static [2] methods are virtual [1] by default (private and final methods cannot be overridden).

C++: Data type sizes are implementation-dependent.

Java: Primitive data types have defined sizes.

C++: A linked list is available in the standard library (std::list); only in plain C does it need to be implemented by hand.

Java: A linked list is implemented in the standard library (java.util.LinkedList).

C++: A string is a character array (C-style) or an object of the std::string class.

Java: A string is an object of the String class.

  1. A virtual method is a method whose implementation is determined at run time, depending on the actual type (class) of the object it is invoked on.
  2. A static method in Java is a method that can be called without instantiating an object of the class. An example: public static void main(String[] args)
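
The two footnotes above can be illustrated with a short sketch. It uses Python rather than C++ or Java purely as a neutral sketch language: in Python every ordinary method behaves like a virtual method, much like Java's non-static methods. The `Shape` and `Square` classes are hypothetical.

```python
class Shape:
    def area(self):
        return 0.0

    def describe(self):
        # self.area() is resolved at run time from the actual class of self.
        return f"{type(self).__name__} with area {self.area():.1f}"

    @staticmethod
    def unit_square_area():
        # A static method: callable on the class, no instance required.
        return 1.0


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        # Overrides Shape.area; describe() picks this version for Square objects.
        return float(self.side ** 2)


shapes = [Shape(), Square(3)]
descriptions = [shape.describe() for shape in shapes]
static_result = Shape.unit_square_area()
```

The same `describe()` call produces a different result for each object because `area()` is looked up on the object's real class, while `unit_square_area()` is called on the class itself without creating any object.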




Tuesday, 27 January 2015

How Many is a Yottabyte?


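1 byte = 2^0 bytes
1 kilobyte = 1024 bytes = 2^10 bytes
1 megabyte = 1024 kilobytes = 2^20 bytes
1 gigabyte = 1024 megabytes = 2^30 bytes
1 terabyte = 1024 gigabytes = 2^40 bytes
1 petabyte = 1024 terabytes = 2^50 bytes
1 exabyte = 1024 petabytes = 2^60 bytes
1 zettabyte = 1024 exabytes = 2^70 bytes
1 yottabyte = 1024 zettabytes = 2^80 bytes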
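
The ladder above is just repeated multiplication by 1024, so each step adds 10 to the exponent. A minimal Python sketch (the names `UNITS` and `size_in_bytes` are made up for illustration) that spells the numbers out:

```python
# Units in ascending order; each one is 1024 times the previous one.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def size_in_bytes(unit):
    """Return how many bytes make up one of the given unit (1024-based)."""
    return 2 ** (10 * UNITS.index(unit))


for unit in UNITS:
    exponent = 10 * UNITS.index(unit)
    print(f"1 {unit} = 2^{exponent} bytes = {size_in_bytes(unit):,} bytes")
```

So a yottabyte, at 2^80 bytes, is 1,208,925,819,614,629,174,706,176 bytes.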

Friday, 23 January 2015

If you are wondering what a decision tree (a supervised machine learning method) is, here is a good video for starters. Random forests are included.

One thing though: you have to use a bit of imagination when the lecturer talks through a specific example, because you cannot see the whiteboard!

Do let me know if you have found a better one. Thanks.



What's interesting in this video:

  • build the tree by choosing attributes based on entropy
  • avoid over-fitting, by
    • stopping the growth of the tree when a split is statistically insignificant
    • pruning
  • how to deal with continuous attributes? Use a threshold
  • beware of singletons (attributes whose values are nearly unique per example, such as an ID)
    • a pitfall of entropy - try Gain Ratio
  • Pros:
    • Interpretable
    • Good at handling noise
    • Good at handling missing data
    • Fast and compact
  • Cons:
    • can only separate data with straight lines (each split is a threshold on a single attribute)
    • greedy, which can lead to over-fitting, but you can prune or stop growing when the information gain is insignificant
  • Random Forest:
    • a group of trees, grown like a Decision Tree, with two differences:
      • each tree is grown with a random subset of the training data
      • each tree is grown with a random subset of attributes
    • vote - majority wins
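
To make a few of these points concrete, here is a minimal Python sketch of entropy, information gain, Gain Ratio, threshold selection for a continuous attribute, and majority voting. It is not taken from the video: the function names and the tiny data set (`row_id`, `windy`, `temperature`) are invented for illustration, and the formulas are the usual textbook definitions.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of labels or attribute values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(attribute, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(attribute, labels):
    """Information gain divided by the entropy of the split itself."""
    split_info = entropy(attribute)
    return information_gain(attribute, labels) / split_info if split_info > 0 else 0.0


def best_threshold(numbers, labels):
    """Try the midpoints between sorted distinct values; keep the best split."""
    distinct = sorted(set(numbers))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(midpoints,
               key=lambda t: information_gain([x <= t for x in numbers], labels))


def majority_vote(predictions):
    """Random-forest style vote: the most common prediction wins."""
    return Counter(predictions).most_common(1)[0][0]


# Invented toy data: should we play outside?
labels = ["yes", "yes", "no", "no", "yes", "no"]
row_id = [1, 2, 3, 4, 5, 6]                         # singleton: unique per row
windy = [False, False, True, True, False, False]    # useful but imperfect
temperature = [15, 18, 25, 30, 17, 28]              # continuous attribute

gain_id, gain_windy = information_gain(row_id, labels), information_gain(windy, labels)
ratio_id, ratio_windy = gain_ratio(row_id, labels), gain_ratio(windy, labels)
threshold = best_threshold(temperature, labels)
forest_answer = majority_vote(["yes", "no", "yes"])
```

On this toy data the plain information gain prefers the useless `row_id` attribute, because splitting on it leaves one example per branch, while Gain Ratio penalises that many-way split and prefers `windy`. The threshold search turns the continuous `temperature` column into a yes/no test, and `majority_vote` shows how three hypothetical trees would be combined.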