Tuesday, February 2, 2016

Confusion Matrix

A confusion matrix is a table used to summarize the results of a supervised classification algorithm (such as the deep learning algorithm being worked on). It is composed of rows and columns, where the columns represent the classes the algorithm predicted and the rows represent the actual classes of the objects tested.


                Predicted
                Dog   Cat   Fish
Actual   Dog      6     2      0
         Cat      3     7      0
         Fish     0     0     11


In the above example there is a group of 8 dogs, 10 cats, and 11 fish. The algorithm has accurately identified 6 dogs as dogs, 7 cats as cats, and 11 fish as fish. The correct results are easy to spot because they form the diagonal running from the top left of the table to the bottom right. The algorithm's mistakes are just as easy to identify: it mistook 2 dogs for cats and 3 cats for dogs, while all the fish were accurately identified and no dogs or cats were mistaken for fish. From this information, one can see that the algorithm was extremely accurate at identifying fish but made some errors when it came to distinguishing cats from dogs and vice versa.
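
To make this concrete, here is a minimal Python sketch that reproduces the matrix above, assuming scikit-learn is available. The confusion_matrix function is a real scikit-learn utility, but the actual and predicted label lists are invented here purely to match the counts in the example.

    # Reproduce the confusion matrix above with scikit-learn.
    # The label lists are invented to match the example's counts.
    from sklearn.metrics import confusion_matrix

    # Actual classes of the 29 animals tested.
    actual = ["dog"] * 8 + ["cat"] * 10 + ["fish"] * 11

    # Predictions: 6 dogs correct and 2 called cats; 7 cats correct
    # and 3 called dogs; all 11 fish correct.
    predicted = (["dog"] * 6 + ["cat"] * 2     # the 8 actual dogs
                 + ["dog"] * 3 + ["cat"] * 7   # the 10 actual cats
                 + ["fish"] * 11)              # the 11 actual fish

    labels = ["dog", "cat", "fish"]
    print(confusion_matrix(actual, predicted, labels=labels))
    # [[ 6  2  0]
    #  [ 3  7  0]
    #  [ 0  0 11]]

Note that rows of the printed matrix are the actual classes and columns are the predicted classes, matching the convention used above.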


Table of Confusion


A table of confusion is a table with 2 rows and 2 columns that reports the number of true positives, false positives, true negatives, and false negatives for a single class. It extracts more from a confusion matrix than overall accuracy does, because it allows a more detailed analysis than a simple proportion of correct guesses. Accuracy alone can be unreliable because, if the data set is unbalanced, the results will be misleading. For example, if there were 95 cats and only 5 dogs in the data set, the classifier could easily be biased into classifying all the samples as cats. The overall accuracy would be 95%, yet the classifier would have a 100% recognition rate for the cat class and a 0% recognition rate for the dog class.
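
The following Python 3 sketch illustrates that pitfall with made-up labels matching the 95-cat/5-dog example: a classifier that answers "cat" for every sample scores 95% accuracy while never recognizing a single dog.

    # The unbalanced-data pitfall: with 95 cats and 5 dogs, always
    # answering "cat" gives 95% accuracy but finds no dogs at all.
    actual    = ["cat"] * 95 + ["dog"] * 5
    predicted = ["cat"] * 100          # the biased classifier's output

    accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

    def recall(cls):
        # Recognition rate for one class: correct predictions of that
        # class divided by how many samples truly belong to it.
        hits  = sum(1 for a, p in zip(actual, predicted) if a == cls and p == cls)
        total = sum(1 for a in actual if a == cls)
        return hits / total

    print(accuracy)        # 0.95
    print(recall("cat"))   # 1.0
    print(recall("dog"))   # 0.0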


The table of confusion for the dog class, derived from the confusion matrix above, would be:


                  Predicted dog          Predicted not dog
Actual dog        6 true positives       2 false negatives
Actual not dog    3 false positives      18 true negatives

6 true positives: the 6 dogs correctly identified as dogs
2 false negatives: the 2 dogs incorrectly identified as cats
3 false positives: the 3 cats incorrectly identified as dogs
18 true negatives: the 18 non-dog animals (7 cats and 11 fish) correctly not identified as dogs
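
These four numbers can be read mechanically off the full confusion matrix. Below is a short Python sketch that reduces the 3x3 matrix to the 2x2 counts for any chosen class; the table_of_confusion helper is hypothetical, written only for this illustration.

    # Reduce the 3x3 confusion matrix above to the 2x2 table of
    # confusion for a single class. Rows are actual classes and
    # columns are predicted classes, matching the matrix above.
    matrix = [[6, 2, 0],    # actual dogs
              [3, 7, 0],    # actual cats
              [0, 0, 11]]   # actual fish
    labels = ["dog", "cat", "fish"]

    def table_of_confusion(matrix, labels, positive):
        # Hypothetical helper; returns (TP, FP, FN, TN) for the
        # chosen positive class.
        i = labels.index(positive)
        total = sum(sum(row) for row in matrix)
        tp = matrix[i][i]                        # correctly predicted positives
        fp = sum(row[i] for row in matrix) - tp  # others predicted as positive
        fn = sum(matrix[i]) - tp                 # positives predicted as others
        tn = total - tp - fp - fn                # everything else
        return tp, fp, fn, tn

    print(table_of_confusion(matrix, labels, "dog"))  # (6, 3, 2, 18)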