DocumentCode :
231015
Title :
Stanford parser based approach for extraction of Link-Context from non-descriptive Anchor-Text
Author :
Kumar, Narendra ; Singh, Monika
Author_Institution :
AIIT, Amity Univ., Noida, India
fYear :
2014
fDate :
8-10 Oct. 2014
Firstpage :
1
Lastpage :
6
Abstract :
Link context analysis has been widely explored for determining the context of a target web page. However, most researchers have considered only descriptive, meaningful anchor text and have ignored non-descriptive anchor text. An examination of the World Wide Web shows that a good percentage of web pages can be reached by following non-descriptive anchor text. This paper therefore proposes and implements an algorithm for link context determination (LCD) that determines the context of non-descriptive anchor text, which is the main focus of this work. A corpus of web pages belonging to a common domain was considered first. The pages were then manually analyzed to discover the relations between the anchor text and the words in its vicinity. Based on these relationships, a number of rules were formed and represented as a tree. The proposed and implemented architecture for LCD uses three components: (1) the Stanford parser, (2) rules, and (3) link context determination. An input sentence is given to the Stanford parser, which creates a parse tree for it. The link context determiner then uses this parse tree, together with the appropriate rules tree, to determine the link context. The approach was implemented and validated on a limited sample of non-descriptive anchor texts (ATs). The results show that the proposed LCD extracted the actual link context of every non-descriptive AT considered (100%).
Keywords :
Web sites; grammars; natural language processing; semantic Web; text analysis; LCD; Stanford parser based approach; Web page; World Wide Web; link context analysis; link context determination; link context extraction; nondescriptive anchor text; undiscriptive anchor text; Databases; Indium phosphide; Knowledge discovery; Organizations; Focused-Crawling; Information Extraction; Link Context Determiner (LCD); NLP; Semantic web crawling; Semantic-Web; Stanford Parser;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), 2014 3rd International Conference on
Conference_Location :
Noida
Print_ISBN :
978-1-4799-6895-4
Type :
conf
DOI :
10.1109/ICRITO.2014.7014751
Filename :
7014751