Residential College: false
Status: Published
Detecting the content related parts of web pages
Yong Li (1); Zhiguo Gong (1); Ke Qi (2)
2005-08-29
Conference Name: International Conference on Service Systems and Service Management
Source Publication: 2005 INTERNATIONAL CONFERENCE ON SERVICES SYSTEMS AND SERVICES MANAGEMENT, VOLS 1 AND 2, PROCEEDINGS
Pages: 1071-1074
Conference Date: 13-15 June 2005
Conference Place: Chongqing, China
Abstract

Many web pages are semantically diverse; that is, the content of a page does not consistently address a single topic. Current search engines, however, are page-oriented rather than topic-oriented, while most web users retrieve their target information by topic. How to partition web pages by semantics is therefore an interesting research problem. In this paper, we first build a tree (called the Semantic Tree, ST) that partitions a web page into content parts (called Semantic Parts, SPs) based on the page's tags. We then analyze the characteristics of the words (or terms) appearing on the page in order to build a term-weighting formula. Based on these term weights, we apply a similarity formula to calculate the degree of semantic similarity between each pair of SPs. Finally, we take the balance point of precision and recall as the reference value for the similarity threshold. With these steps we can find the content-related parts (or segments) of a web page, and we achieved satisfactory results.
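
The pipeline described in the abstract (tag-based partitioning, term weighting, pairwise similarity, thresholding) can be sketched roughly as follows. This is a minimal illustration and not the authors' method: the abstract does not give the Semantic Tree rules, the term-weighting formula, or the similarity formula, so the sketch assumes a flat split at a hypothetical set of block-level tags (`BLOCK_TAGS`), length-normalised term frequency as the weight, and cosine similarity. The names `split_parts`, `term_weights`, `cosine`, `related_pairs`, and the default threshold of 0.3 are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags delimit candidate parts; the paper's real rules are not stated.
BLOCK_TAGS = {"div", "p", "td", "li", "h1", "h2", "h3", "section", "article"}


class BlockSplitter(HTMLParser):
    """Collect the text between block-level tag boundaries as separate parts."""

    def __init__(self):
        super().__init__()
        self.parts = []    # finished text parts
        self._buffer = []  # text fragments of the part currently being read

    def flush(self):
        text = " ".join(self._buffer).strip()
        if text:
            self.parts.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if data.strip():
            self._buffer.append(data.strip())


def split_parts(html):
    """Return the page text split into parts (a flat stand-in for the ST leaves)."""
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    splitter.flush()
    return splitter.parts


def term_weights(text):
    """Length-normalised term frequency (assumed; not the paper's formula)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def related_pairs(html, threshold=0.3):
    """Return the parts and every pair (i, j, similarity) at or above the threshold."""
    parts = split_parts(html)
    vectors = [term_weights(p) for p in parts]
    pairs = []
    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                pairs.append((i, j, sim))
    return parts, pairs


# Hypothetical page with two travel-related parts and one unrelated part.
page = ("<div><p>cheap flights to Macau</p>"
        "<p>Macau flights and hotels</p>"
        "<p>football league results</p></div>")
parts, pairs = related_pairs(page)
```

On this toy page only the first two parts share terms, so they are the only pair reported as related; a real implementation would keep the tree structure so that similarity can also be evaluated between nested parts.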

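Choosing the similarity threshold at the balance point of precision and recall can be illustrated with a small sweep over candidate thresholds. The sketch below assumes a set of SP pairs that have been scored and hand-labelled as related or unrelated; the scores, the labels, and the function names `precision_recall` and `balance_threshold` are hypothetical, and the abstract does not say how the authors located the balance point in practice.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when pairs scoring >= threshold are predicted related."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Convention (an assumption): an empty denominator yields 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def balance_threshold(scores, labels):
    """Return (threshold, precision, recall) where |precision - recall| is smallest."""
    if not scores:
        raise ValueError("at least one scored pair is required")
    best = None
    for t in sorted(set(scores)):  # each observed score is a candidate threshold
        p, r = precision_recall(scores, labels, t)
        gap = abs(p - r)
        if best is None or gap < best[0]:
            best = (gap, t, p, r)
    return best[1], best[2], best[3]


# Hypothetical similarity scores and relatedness labels for five SP pairs.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False]
threshold, precision, recall = balance_threshold(scores, labels)
```

For this toy data the sweep settles on a threshold of 0.6, where precision and recall are both 2/3. Raising the threshold favours precision and lowering it favours recall, which is why the crossing point is a natural reference value.
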
Keywords: Web Mining; Term Weighting; Similarity
DOI: 10.1109/ICSSSM.2005.1500159
Indexed By: SCIE; CPCI-S
Language: English
WOS Research Area: Business & Economics; Computer Science; Operations Research & Management Science
WOS Subject: Business; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Interdisciplinary
WOS ID: WOS:000231534000219
Scopus ID: 2-s2.0-33745241081
Document Type: Conference paper
Collection: Faculty of Science and Technology; Department of Computer and Information Science
Affiliation: 1. Faculty of Science and Technology, University of Macau, China
2. System Engineering Department, Beijing Jiaotong University, Beijing, 1'75#, 100044 China
First Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Yong Li, Zhiguo Gong, Ke Qi. Detecting the content related parts of web pages[C], 2005, 1071-1074.
APA: Yong Li, Zhiguo Gong, & Ke Qi (2005). Detecting the content related parts of web pages. 2005 INTERNATIONAL CONFERENCE ON SERVICES SYSTEMS AND SERVICES MANAGEMENT, VOLS 1 AND 2, PROCEEDINGS, 1071-1074.