A Taxonomy based Semantic Similarity of Documents using the Cosine Measure


Madylova A., Öğüdücü Ş.

24th International Symposium on Computer and Information Sciences, Güzelyurt, Kıbrıs (Kktc), 14 - 16 Eylül 2009, ss.129-134 identifier identifier

  • Yayın Türü: Bildiri / Tam Metin Bildiri
  • Doi Numarası: 10.1109/iscis.2009.5291865
  • Basıldığı Şehir: Güzelyurt
  • Basıldığı Ülke: Kıbrıs (Kktc)
  • Sayfa Sayıları: ss.129-134
  • İstanbul Teknik Üniversitesi Adresli: Evet

Özet

In this paper, we present a new method for calculating semantic similarities between documents. This method is based on cosine similarity calculation between concept vectors of documents obtained from a taxonomy of words that captures IS-A relations. The calculation of semantic similarities between documents is a very time consuming task, since it is necessary first to calculate semantic similarities between each pair of words that appear on different documents. In this paper, we present a new method to calculate semantic similarities between documents which results in faster computational time. Both a taxonomy based semantic similarity and cosine similarity are employed. First, the concept vectors of documents are obtained by extending the terms in the document vectors with their corresponding IS-A concepts. Cosine similarity is then calculated between those concept vectors of documents. Thus, the overall similarity between documents is a combination of cosine similarity and semantic similarity. The proposed semantic similarity is tested in document clustering problem. The experimental results show that our method achieves a good performance.