Data Mining Portfolio

Similarity Techniques

Definition

Similarity can be roughly described as a measure of how much two or more objects are alike. It can also be seen as a numerical measure of closeness between data objects, typically represented as a value between 0 (not similar at all) and 1 (completely similar): the smaller the distance between two objects, the greater their similarity. Depending on the similarity measure used, the triangle inequality may or may not hold. More generally, two properties must be maintained for similarities: the similarity value must fall within the range of 0 to 1, and the measure must be symmetric. Symmetry is the property that, for all x and y, the similarity of x and y is the same as the similarity of y and x.
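As an illustration of the two required properties, the following is a minimal sketch. It assumes one common way of turning a distance into a similarity, s = 1 / (1 + d), where d is the Euclidean distance. The function names (`euclidean_distance`, `similarity`, `check_properties`) and the sample points are hypothetical and chosen only for this example. The sketch then checks range and symmetry over every pair of points.

```python
import math
from itertools import product


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def similarity(x, y):
    """Hypothetical conversion: map a distance in [0, inf) to a similarity in (0, 1].

    Identical objects (distance 0) get similarity 1, and similarity shrinks
    toward 0 as the distance grows.
    """
    return 1.0 / (1.0 + euclidean_distance(x, y))


def check_properties(objects, sim):
    """Return (in_range, symmetric) for sim evaluated over every ordered pair."""
    pairs = list(product(objects, repeat=2))
    in_range = all(0.0 <= sim(x, y) <= 1.0 for x, y in pairs)
    symmetric = all(math.isclose(sim(x, y), sim(y, x)) for x, y in pairs)
    return in_range, symmetric


points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(similarity(points[0], points[1]))  # distance is 5, so similarity is 1 / 6
print(check_properties(points, similarity))  # (True, True)
```

Because Euclidean distance is itself symmetric and non-negative, any similarity built from it this way inherits symmetry and stays within the 0 to 1 range. Other conversions (for example 1 - d / d_max) would need their own check.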

Similarity Metrics