By Jose Nuñez
Singular Value Decomposition (SVD) is a powerful and fully automatic statistical method utilized by Latent Semantic Analysis (LSA). The SVD algorithm is O(N²k³), where N is the number of terms plus documents and k is the number of dimensions in the concept space. This makes SVD hard to apply to a huge, dynamic collection, especially since the appropriate number of dimensions is difficult to determine in advance.
Latent Semantic Indexing (LSI) is slow largely because it relies on this SVD approach to build its concept space. LSI assumes that there is some underlying or latent structure in word usage that is partially obscured by variability in word choice. A truncated Singular Value Decomposition (SVD) is therefore employed to estimate the structure in word usage across documents, and retrieval is performed against the database of singular values and vectors obtained from the truncated SVD, as sketched below. Empirical results show that these statistically derived vectors are more robust indicators of meaning than individual terms.
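The following is a minimal sketch of that indexing step on a toy corpus. The documents, the choice of k = 2, and the variable names (`term_index`, `doc_vectors`) are illustrative assumptions, not part of any particular LSI implementation.

```python
# A minimal sketch of LSI indexing with a truncated SVD on a toy corpus.
import numpy as np

docs = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "the generation of random binary trees",
    "the intersection graph of paths in trees",
]

# Build a simple term-document count matrix A (terms x documents).
vocab = sorted({w for d in docs for w in d.split()})
term_index = {t: i for i, t in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[term_index[w], j] += 1.0

# Full SVD, then keep only the k largest singular values and their vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # number of dimensions in the concept space (chosen by hand here)
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Each column of S_k @ Vt_k is a document represented in the k-dimensional
# concept space; these, together with U_k, are what the index stores.
doc_vectors = (S_k @ Vt_k).T
print(doc_vectors.shape)  # (num_documents, k)
```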
SVD and LSI are least-squares methods. The projection into the latent semantic space is chosen so that the representations in the original space are changed as little as possible, as measured by the sum of squared differences. The projection transforms a document’s vector in n-dimensional word space into a vector in the k-dimensional reduced space, as in the sketch that follows.
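Below is a hedged sketch of that projection ("folding in") for a new query, reusing the `U_k`, `S_k`, `term_index`, and `doc_vectors` variables assumed in the previous sketch; the `project` helper is hypothetical.

```python
# Project a query vector from n-dimensional term space into the
# k-dimensional latent space: q_k = Sigma_k^{-1} U_k^T q.
def project(query_terms, U_k, S_k, term_index):
    q = np.zeros(U_k.shape[0])          # query as a term-count vector
    for w in query_terms:
        if w in term_index:
            q[term_index[w]] += 1.0
    return np.linalg.inv(S_k) @ U_k.T @ q

q_k = project("human computer interaction".split(), U_k, S_k, term_index)

# Cosine similarity in the concept space ranks documents for retrieval.
sims = doc_vectors @ q_k / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_k) + 1e-12
)
print(np.argsort(-sims))  # document indices, most similar first
```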
One can prove that the SVD is essentially unique: there is only one such decomposition of a given matrix, up to trivial sign and ordering ambiguities. Since SVD finds an optimal projection into a low-dimensional space, it is well suited to capturing word co-occurrence patterns; this optimality is the key property LSI relies on.
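As a small numerical illustration of that least-squares optimality (the Eckart-Young property), the snippet below, which again assumes the variables from the earlier sketches, checks that the rank-k truncation reconstructs A with no larger Frobenius error than an arbitrary rank-k matrix of the same shape.

```python
# Compare the truncated-SVD reconstruction against a random rank-k matrix.
A_k = U_k @ S_k @ Vt_k
rng = np.random.default_rng(0)
B = rng.standard_normal((A.shape[0], k)) @ rng.standard_normal((k, A.shape[1]))

err_svd = np.linalg.norm(A - A_k)   # Frobenius error of the rank-k SVD
err_rand = np.linalg.norm(A - B)    # Frobenius error of a random rank-k matrix
print(err_svd <= err_rand)          # True: the truncated SVD is the optimal rank-k fit
```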