DocumentCode
570692
Title
Comparing methods to extract technical content for technological intelligence
Author
Newman, Nils C. ; Porter, Alan L. ; Newman, David ; Courseault, Cherie ; Bolan, Stephanie D.
Author_Institution
IISC, Atlanta, GA, USA
fYear
2012
fDate
July 29 2012-Aug. 2 2012
Firstpage
1279
Lastpage
1285
Abstract
We are developing indicators for the emergence of science and technology (S&T) topics. We are targeting various S&T information resources, including metadata (i.e., bibliographic information) and full text. We explore alternative text analysis approaches - principal components analysis (PCA) and topic modeling - to extract technical topic information. We analyze the topical content to pursue potential applications and innovation pathways. In this presentation we compare alternative ways of consolidating messy sets of key terms [e.g., using Natural Language Processing (NLP) on abstracts and titles, together with various keyword sets]. Our process includes combinations of stopword removal, fuzzy term matching, association rules, and tf-idf weighting. We compare PCA results to topic modeling results. Our key test set consists of 4104 Web of Science records on Dye-Sensitized Solar Cells (DSSCs). Results suggest good potential to enhance our technical intelligence payoffs from database searches on topics of interest.
Keywords
content-based retrieval; data mining; meta data; principal component analysis; scientific information systems; PCA; S&T information resources; alternative text analysis; association rules; bibliographic information; fuzzy term matching; metadata; principal components analysis; science and technology topics; stopword removal; technical content extraction; technological intelligence; tf-idf weighting; topic modeling; Abstracts; Clustering algorithms; Decision support systems; Electrodes; Films; Photovoltaic cells; Principal component analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
2012 Proceedings of PICMET '12: Technology Management for Emerging Technologies (PICMET)
Conference_Location
Vancouver, BC
Print_ISBN
978-1-4673-2853-1
Type
conf
Filename
6304150