Concept based semantic web mining

Özışık, Alper

URI: http://hdl.handle.net/123456789/993

Date: 2008

Abstract:

Current web search technologies are good at finding similar pages based on their content and link structure. However, they fall short at finding pages that are similar in dictionary word meaning or cross-linguistic meaning. This thesis focuses on finding similar pages on the web by combining known techniques. Link gathering and semantic web metadata parsing are required for web content and structure mining. The thesis differs from other web mining methods in its use of dictionary word meanings and cross-linguistic meanings. All of this information is collected by web crawlers and indexed for web mining. The indexed data is cleaned of non-useful words and misleading web sites, such as advertisement sites. The clean data is then processed by clustering, a data mining technique. Data processing includes enriching page relations with link distance levels and content word joint values. For the web mining process, the K-means and EM clustering algorithms are compared to decide which one gives better results. The chosen method lists pages similar to the page the user selected at the starting point of the process.
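The cleaning and page-relation steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The stop-word list, the function names, and the use of Jaccard overlap as a stand-in for the "content word joint value" are all assumptions made for illustration, and the dictionary, cross-linguistic, and clustering steps are not covered.

```python
import re
from collections import deque

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def content_words(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def word_overlap(words_a, words_b):
    """Jaccard overlap of two word sets (assumed stand-in for the
    'content word joint value' mentioned in the abstract)."""
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)


def link_distance(links, start, goal):
    """Number of link hops from start to goal, found by breadth-first
    search over outgoing links. Returns None if goal is unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def page_relations(pages, links, start):
    """For every other page, pair its word overlap with the start page
    and its link distance from the start page."""
    start_words = content_words(pages[start])
    return {
        name: (word_overlap(start_words, content_words(text)),
               link_distance(links, start, name))
        for name, text in pages.items()
        if name != start
    }


# Toy data, invented for this example.
pages = {
    "a": "The semantic web and mining",
    "b": "Mining of the semantic web data",
    "c": "Cheap flights",
}
links = {"a": ["b"], "b": ["c"]}
relations = page_relations(pages, links, "a")
```

Each entry in `relations` is an (overlap, link distance) pair for one page. Pairs of this kind are the sort of per-page features that a clustering method such as K-means or EM could then group.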
