# Discussion of Similarity Metrics

## Jaccard / Tanimoto Coefficient

__Analysis__

In some case, each attribute is binary such that each bit represents the absence of presence of a characteristic, thus, it is better to determine the similarity via the overlap, or intersection, of the sets.

Simply put, the Tanimoto Coefficient uses the ratio of the intersecting set to the union set as the measure of similarity. Represented as a mathematical equation:

In this equation, N represents the number of attributes in each object (a,b). C in this case is the intersection set.

__Python Implementation__

# Inputs: two lists
# Output: the Tanimoto Coefficient
def tanimoto (list1, list2):
intersection = [common_item for common_item in list1 if common_item in list2]
return float(len(c))/(len(a) + len(b) - len(c))

__References__

The previous content is based on Chapter 3 of the following book:

Segaran, Toby. Programming Collective Intelligence: Building Smart Web 2.0 Applications. Sebastopol, CA: O'Reilly Media, 2007.

Next: Cosine Similarity

Back to: Pearson Correlation