Title of article :
Finding Nuggets in Documents: A Machine
Learning Approach
Author/Authors :
Yi-fang Brook Wu, Quanzhi Li, Razvan Stefan Bot, and Xin Chen
Issue Information :
Monthly, with serial issue number, year 2006
Abstract :
Document keyphrases provide a concise summary of
a document’s content, offering semantic metadata summarizing
a document. They can be used in many applications
related to knowledge management and text
mining, such as automatic text summarization, development
of search engines, document clustering, document
classification, thesaurus construction, and browsing
interfaces. Because only a small portion of documents
have keyphrases assigned by authors, and it is time-consuming
and costly to manually assign keyphrases to
documents, it is necessary to develop an algorithm to
automatically generate keyphrases for documents. This
paper describes a Keyphrase Identification Program
(KIP), which extracts document keyphrases by using
prior positive samples of human-identified phrases to
assign weights to the candidate keyphrases. The logic
of our algorithm is: The more keywords a candidate
keyphrase contains and the more significant these keywords
are, the more likely this candidate phrase is a
keyphrase. KIP’s learning function can enrich the glossary
database by automatically adding new identified
keyphrases to the database. KIP’s personalization feature
will let the user build a glossary database specifically
suitable for the area of his/her interest. The evaluation
results show that KIP performs better than the
systems we compared it with and that the learning function
is effective.
Journal title :
Journal of the American Society for Information Science and Technology