How do Machine Learning algorithms understand the text?

Intuition behind Natural Language Processing


Machine Learning algorithms don't understand text. So we need to convert text into numbers, or vectors (like [0.3, 0.6, ..., 0.9]).

Words have meaning in the real world, and we want that meaning to be retained when we convert text to vectors.

How do we know a vector representation of words retains its real-world meaning? 🤔

For example, take the words Mango, Banana, and Table, with respective vector representations Vector M, Vector B, and Vector T.

Mango and Banana are both fruits, so they are more related to each other than Mango and Table are.

There are different ways to measure similarity between vectors, such as cosine similarity or the dot product of two vectors.

Suppose the vector representations of Banana, Mango, and Table retain meaning. In that case, the dot product of Vectors M and B should be higher than that of Vectors M and T.
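This check can be sketched in a few lines. The 3-dimensional vectors below are made-up toy embeddings (real embeddings have hundreds of dimensions), chosen only to illustrate how cosine similarity compares the fruit words against Table:

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the vectors' lengths (norms).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-d embeddings: Mango and Banana point in similar directions.
vec_mango = [0.9, 0.8, 0.1]
vec_banana = [0.8, 0.9, 0.2]
vec_table = [0.1, 0.2, 0.9]

print(cosine_similarity(vec_mango, vec_banana))  # high: both fruits
print(cosine_similarity(vec_mango, vec_table))   # low: unrelated words
```

If the embedding is good, the first similarity comes out higher than the second, matching our real-world intuition.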

There are many techniques to convert text to numbers, like

  • TF-IDF,
  • Word2Vec,
  • BERT embeddings, etc.
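To make the first of these concrete, here is a minimal TF-IDF sketch in plain Python (the three sample sentences are invented for illustration; production code would use a library like scikit-learn):

```python
import math
from collections import Counter

docs = [
    "mango is a sweet fruit",
    "banana is a yellow fruit",
    "the table has four legs",
]

def tfidf(docs):
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for doc in tokenized for w in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)  # term frequency within this document
        # TF-IDF weight: frequent in this doc, rare across the corpus.
        vectors.append(
            {w: (tf[w] / len(doc)) * math.log(n / df[w]) for w in tf}
        )
    return vectors

vectors = tfidf(docs)
print(vectors[0])  # weights for "mango is a sweet fruit"
```

Words that appear in many documents (like "is") get a low weight, while distinctive words (like "mango") get a higher one.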

Some techniques retain not only meaning but also analogies. For example, Word2Vec converts words into vectors that preserve real-world analogies like

King - Man + Woman ~= Queen

in vector form as well:

Vector King - Vector Man + Vector Woman ~= Vector Queen

Good embeddings help machine learning algorithms capture the full meaning of a sentence, which in turn helps with tasks like text classification or question answering.
