Cosine similarity



Mathematics

Release date:2023/9/8         

Prerequisite knowledge
 ・Inner product of vectors
 ・Covariance


■What is cosine similarity?

Cosine similarity is a measure of how closely two vectors point in the same direction, computed from their inner product. By representing sample data as vectors and calculating the cosine similarity, you can measure how similar the data are. Typical uses include measuring the similarity of words and sentences that have been vectorized in natural language processing, and regression analysis based on the similarity of sample data (e.g., kernel regression).

Once you understand the relationship between the inner product and covariance, it becomes clear why the inner product can be used to measure vector similarity.

■Relationship between inner product of vectors and covariance

<Inner product of vectors>

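For two n-dimensional vectors a and b with components a_i and b_i, and with angle θ between them, the inner product is defined as follows.

$$\vec{a}\cdot\vec{b} = |\vec{a}|\,|\vec{b}|\cos\theta \tag{1}$$

$$\vec{a}\cdot\vec{b} = \sum_{i=1}^{n} a_i b_i \tag{2}$$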



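Here, the formula is rearranged for cos θ as follows, where a_i and b_i are the components of the n-dimensional vectors a and b.

$$\cos\theta = \frac{\vec{a}\cdot\vec{b}}{|\vec{a}|\,|\vec{b}|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}} \tag{3}$$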
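
The cosine of the angle between two vectors (the inner product divided by the product of the vector lengths) can be sketched in Python as follows. This is a minimal illustration using only the standard library. The function name `cosine_similarity` and the sample vectors are assumptions made for this example and do not come from the original article.

```python
import math


def cosine_similarity(a, b):
    """Return cos(theta) for two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Inner product: sum of the products of matching components.
    dot = sum(x * y for x, y in zip(a, b))
    # Vector lengths (Euclidean norms).
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical sample vectors, chosen only for illustration.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction: close to 1
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal: 0
print(cosine_similarity([1, 0], [-1, 0]))       # opposite direction: -1
```

The zero-vector check is a design choice for this sketch: the angle is undefined when either vector has zero length, so raising an error is safer than returning an arbitrary value.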



<Covariance>

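The covariance s_xy of samples x and y, each with n values and with means x̄ and ȳ, is defined as follows.

$$s_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) \tag{4}$$

The correlation coefficient r is defined using covariance as follows, where s_x and s_y are the standard deviations of x and y. The closer r is to 1, the more similar samples x and y are.

$$r = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{5}$$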
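
As a rough illustration, the correlation coefficient can be computed from the covariance and the standard deviations as sketched below. The function names `covariance` and `correlation` and the sample data are assumptions for this example. The sketch divides by n (population covariance); dividing by n - 1 instead would give the same r, because the factor cancels in the ratio.

```python
import math


def covariance(x, y):
    """Population covariance (divides by n) of two equal-length samples."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("samples must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Average product of the deviations from each mean.
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def correlation(x, y):
    """Correlation coefficient r = s_xy / (s_x * s_y)."""
    s_x = math.sqrt(covariance(x, x))  # standard deviation of x
    s_y = math.sqrt(covariance(y, y))  # standard deviation of y
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a sample has zero variance")
    return covariance(x, y) / (s_x * s_y)


# Hypothetical samples, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
print(correlation(x, y))  # close to 1, since y rises almost linearly with x
```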



<Cosine similarity>

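Comparing equations (3) and (5), we can see that they have the same form: if each vector's components are replaced by the deviations of the sample values from their mean, cos θ becomes the correlation coefficient r. In other words, cos θ corresponds to r and has the following relationship with similarity, so the similarity of data can be measured by computing cos θ.

 ・cos θ close to 1: the vectors point in nearly the same direction (high similarity)
 ・cos θ close to 0: the vectors are nearly orthogonal (little or no similarity)
 ・cos θ close to -1: the vectors point in nearly opposite directions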
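
The correspondence between cos θ and r can be checked numerically. The sketch below subtracts the mean from each sample, computes the cosine similarity of the resulting deviation vectors, and compares it with r computed from covariance and standard deviations. All function names and the sample data are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """cos(theta): inner product divided by the product of the lengths."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def center(v):
    """Subtract the mean from every component (deviation vector)."""
    mean = sum(v) / len(v)
    return [p - mean for p in v]


def correlation(x, y):
    """r = s_xy / (s_x * s_y), using population (1/n) statistics."""
    n = len(x)
    dx, dy = center(x), center(y)
    s_xy = sum(p * q for p, q in zip(dx, dy)) / n
    s_x = math.sqrt(sum(p * p for p in dx) / n)
    s_y = math.sqrt(sum(q * q for q in dy) / n)
    return s_xy / (s_x * s_y)


# Hypothetical samples, chosen only for illustration.
x = [2.0, 4.0, 5.0, 9.0]
y = [1.0, 3.0, 2.0, 8.0]

r = correlation(x, y)
cos_centered = cosine_similarity(center(x), center(y))
cos_raw = cosine_similarity(x, y)

print(r, cos_centered)  # these two agree up to floating-point rounding
print(cos_raw)          # the uncentered cosine generally differs from r
```

Note that the two quantities coincide only after centering. Cosine similarity applied to the raw vectors is a related but different measure, because it also depends on where the mean of each sample lies.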










