DocumentCode :
3221529
Title :
BibPro: A Citation Parser Based on Sequence Alignment Techniques
Author :
Chen, Chien-Chih ; Yang, Kai-Hsiang ; Kao, Hung-Yu ; Ho, Jan-Ming
Author_Institution :
Nat. Taiwan Univ., Taipei
fYear :
2008
fDate :
25-28 March 2008
Firstpage :
1175
Lastpage :
1180
Abstract :
The dramatic increase in the number of academic publications has led to a growing demand for efficient organization of these resources to meet researchers' specific needs. As a result, a number of network services have compiled databases from public resources scattered over the Internet. However, publications in different conferences and journals follow different citation formats, so the problem of accurately extracting metadata from a publication string has also attracted a great deal of attention in recent years. In this paper, we extend our previous work and propose a new tool called BibPro for extracting metadata from citation strings by using a gene sequence alignment tool. BibPro improves on our previous tool in two ways. First, it does not need knowledge databases (e.g., an author name database) to generate feature indices for citation strings; instead, only the order of punctuation marks in a citation string is used to represent its format. Second, BibPro employs the basic local alignment search tool (BLAST) to find the most similar citation formats in a database and then uses the Needleman-Wunsch algorithm to choose the best-fit citation format as the extraction template. Our experimental results show that, in terms of precision and recall, BibPro outperforms other existing systems (e.g., INFOMAP and ParaCite) and scales well.
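The two ideas named in the abstract (representing a citation's format by the order of its punctuation marks, and picking the best-fit template by Needleman-Wunsch global alignment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the scoring values (match +1, mismatch -1, gap -1), and the example templates are assumptions made for illustration, and the BLAST candidate-retrieval step is replaced here by a brute-force scan over a small template list.

```python
import string


def punctuation_signature(citation: str) -> str:
    """Return the punctuation marks of a citation string, in order of appearance."""
    return "".join(ch for ch in citation if ch in string.punctuation)


def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of a and b (scoring values are assumed, not from the paper)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_template(citation: str, templates: list[str]) -> str:
    """Pick the template whose punctuation signature aligns best with the citation's.

    Brute-force stand-in for the BLAST retrieval step described in the abstract.
    """
    sig = punctuation_signature(citation)
    return max(
        templates,
        key=lambda t: needleman_wunsch_score(sig, punctuation_signature(t)),
    )


if __name__ == "__main__":
    # Hypothetical templates, invented for this example.
    templates = ["Author, A.: Title. Year.", "Author (Year) Title; Venue"]
    print(best_template("Chen, C.: BibPro. 2008.", templates))
```

For a large template database, scoring every template with a full global alignment would be costly, which is presumably why the abstract describes a fast local-alignment search first and reserves Needleman-Wunsch for the final choice among the retrieved candidates.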
Keywords :
Internet; citation analysis; meta data; text analysis; BibPro; Internet; Needleman-Wunsch algorithm; academic publications; basic local alignment search tool; citation format; citation parser; citation strings; feature index; gene sequence alignment tool; metadata extraction; network services; public resources; publication string; punctuation marks; resource organization; Citation analysis; Data mining; Hidden Markov models; IP networks; Information analysis; Scattering; Spatial databases; Support vector machine classification; Support vector machines; Web and internet services; Citation Parser; Digital Library; Sequence Alignment;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Information Networking and Applications - Workshops, 2008. AINAW 2008. 22nd International Conference on
Conference_Location :
Okinawa
Print_ISBN :
978-0-7695-3096-3
Type :
conf
DOI :
10.1109/WAINA.2008.125
Filename :
4483078