The closest search engines have come to an actual application of this technology
so far is known as “Associative Indexing,” which is put into effect through stemming:
the indexing of words on the basis of their uninflected roots (plurals, adverbs,
and adjectival forms are reduced to simple noun and verb forms before indexing).
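As a rough illustration, a stemming-based index might look like the following Python sketch, which uses NLTK's PorterStemmer; the sample documents and the inverted-index layout are assumptions made for the example, not any particular engine's implementation.

```python
from collections import defaultdict
from nltk.stem import PorterStemmer  # classic suffix-stripping stemmer

# Illustrative documents; the ids and text are hypothetical sample data.
docs = {
    1: "the runner was running through the gardens",
    2: "gardeners run sprinklers in the garden daily",
}

stemmer = PorterStemmer()
index = defaultdict(set)  # maps a stem to the ids of documents containing it

for doc_id, text in docs.items():
    for word in text.lower().split():
        index[stemmer.stem(word)].add(doc_id)

# A query for "gardening" reduces to the stem "garden", so it matches
# both documents despite the differing inflections.
print(index[stemmer.stem("gardening")])  # {1, 2}
```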
Latent Semantic Analysis (LSA) is a technique in natural language processing,
in particular in vectorial semantics, invented in 1990 [1] by Scott Deerwester,
Susan Dumais, George Furnas, Thomas Landauer, and Richard Harshman.
In the context of its application to information retrieval, it is sometimes
called Latent Semantic Indexing (LSI).
Here are some quick facts about Latent Semantic Indexing:
1. In evaluations, LSI has been found to be up to 30% more effective than popular word-matching methods.
2. LSI uses a fully automatic statistical method, Singular Value Decomposition (see the sketch at the end of this section).
3. It is very effective in cross-language retrieval.
4. LSI can retrieve relevant documents that do not contain any of the query words.
5. It generally finds more relevant information than literal keyword matching.
Latent Semantic Indexing adds an important step to the document indexing
process. In addition to recording which keywords a document contains, the
method examines the document collection as a whole, to see which other
documents contain some of the same words. LSI considers documents that have
many words in common to be semantically close, and documents that have few
words in common to be semantically distant. This method correlates surprisingly
well with how a human being, looking at the content, would classify the same
documents.
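To make the mechanics concrete, here is a minimal sketch in Python with NumPy of the SVD step mentioned in the list above and the resulting notion of semantic closeness; the toy term-document matrix and the choice of k = 2 latent dimensions are assumptions made purely for illustration.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# The vocabulary and counts are hypothetical sample data.
# d0: "car car"       d1: "auto engine"
# d2: "car engine"    d3: "flower flower petal petal"
A = np.array([
    [2, 0, 1, 0],  # car
    [0, 1, 0, 0],  # auto
    [0, 1, 1, 0],  # engine
    [0, 0, 0, 2],  # flower
    [0, 0, 0, 2],  # petal
], dtype=float)

# Singular Value Decomposition: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncate to the k largest singular values; these are the
# "latent" semantic dimensions.
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T  # one row per document

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# d0 ("car") and d1 ("auto engine") share no words at all, yet they
# land close together in the latent space because "car" and "engine"
# co-occur in d2; d3, about flowers, stays distant.
print(cosine(doc_vectors[0], doc_vectors[1]))  # close to 1.0
print(cosine(doc_vectors[0], doc_vectors[3]))  # close to 0.0
```

In a real system the matrix would have many thousands of terms and documents, and k would typically be on the order of a few hundred rather than 2.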