  • DocumentCode
    555334
  • Title
    Experiences with text mining large collections of unstructured systems development artifacts at JPL
  • Author
    Port, Dan; Nikora, Allen; Hihn, Jairus; Huang, LiGuo
  • Author_Institution
    Shidler Coll. of Bus., Univ. of Hawaii, Honolulu, HI, USA
  • fYear
    2011
  • fDate
    21-28 May 2011
  • Firstpage
    701
  • Lastpage
    710
  • Abstract
    Repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are often so large and poorly structured that they have outgrown our capability to process their contents manually and extract useful information. Sophisticated text mining methods and tools appear to offer a quick, low-effort way to automate our limited manual efforts. Our experience exploring such methods in three main areas at JPL (historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies) has shown that obtaining useful results requires substantial, unanticipated effort, from preprocessing the data to transforming the output for practical applications. We have not observed any quick 'wins' or realized any benefit from short-term effort avoidance through automation in this area. Surprisingly, however, we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper describes some of these benefits and the important lessons we learned from preparing and applying text mining to large, unstructured system artifacts at JPL, with the aim of benefiting future text mining (TM) applications in similar problem domains and, we hope, in broader application areas.
  • Keywords
    aerospace industry; data mining; formal verification; information retrieval; systems analysis; text analysis; JPL; Jet Propulsion Laboratory; NASA; defect identification; historical risk analysis; information extraction; large collections; requirements analysis; sophisticated text mining methods; system anomaly over-time analysis; systems engineering artifacts; unstructured systems development artifacts; Association rules; Manuals; Risk management; Software; Text mining; assurance; experience; requirements assurance; risk; risk assurance; system repository mining; systems development artifact; text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Software Engineering (ICSE), 2011 33rd International Conference on
  • Conference_Location
    Honolulu, HI
  • ISSN
    0270-5257
  • Print_ISBN
    978-1-4503-0445-0
  • Electronic_ISBN
    0270-5257
  • Type
    conf
  • DOI
    10.1145/1985793.1985891
  • Filename
    6032511