  • DocumentCode
    633112
  • Title
    Machine learning of engineering diagnostic knowledge from unstructured verbatim text descriptions
  • Author
    Huang, Yinghao; Murphey, Yi L.; Ge, Yao
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Michigan, Dearborn, MI, USA
  • fYear
    2013
  • fDate
    16-19 April 2013
  • Firstpage
    46
  • Lastpage
    52
  • Abstract
    This paper presents research in text mining for discovering important engineering fault diagnostic knowledge from unstructured, verbatim text descriptions. In particular, we focus on developing machine learning algorithms for detecting documents that describe systematic failures and the root causes of faults. We developed three components: a machine learning algorithm based on entropy analysis that extracts an A-word list (a list of words that are important for characterizing the documents of interest), a vector space model that represents the features of important documents, and a constraint-based k-means clustering algorithm that generates high-purity clusters for use in detecting important documents. We applied the algorithms to automotive diagnostic text data, which consist of unstructured, verbatim descriptions written by customers and technicians and contain many typos and self-invented terms. We were able to reduce a list of 2183 words to a list of 137 important words. The classification system generated by these machine learning algorithms showed high recall and accuracy in detecting important diagnostic descriptions.
  • Keywords
    automotive engineering; data mining; entropy; failure analysis; fault diagnosis; learning (artificial intelligence); pattern clustering; text analysis; word processing; A-word list extraction; automotive diagnostic text data; classification system; constraint based k-means clustering algorithm; document detection; engineering diagnostic knowledge; engineering fault diagnostic knowledge; entropy analysis; machine algorithm; machine learning algorithms; systematic failures; text mining; unstructured verbatim text descriptions; vector space model; Automotive engineering; Clustering algorithms; Entropy; Machine learning algorithms; Training data; Vectors; Vehicles; engineering diagnostics; important document detection; machine learning; text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Data Mining (CIDM), 2013 IEEE Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/CIDM.2013.6597216
  • Filename
    6597216
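
The abstract describes selecting important words by entropy analysis and representing documents in a vector space model. The Python sketch below illustrates one plausible reading of those two steps; it is not the authors' method. The scoring rule (entropy of the class distribution of the documents containing a word), the thresholds `max_entropy` and `min_docs`, the function names, and the toy documents are all assumptions made for illustration. The constraint-based k-means step is omitted because the abstract does not specify its constraints.

```python
import math
from collections import Counter, defaultdict


def word_entropy(docs, labels):
    """Map each word to the entropy (in bits) of the class labels
    of the documents that contain it. Low entropy means the word
    occurs mostly in one class. (Assumed scoring rule.)"""
    per_word = defaultdict(Counter)
    for text, label in zip(docs, labels):
        for word in set(text.lower().split()):
            per_word[word][label] += 1
    scores = {}
    for word, counts in per_word.items():
        total = sum(counts.values())
        scores[word] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return scores


def select_words(docs, labels, max_entropy=0.5, min_docs=2):
    """Keep words with low class entropy that occur in at least
    min_docs documents. Both thresholds are hypothetical."""
    scores = word_entropy(docs, labels)
    doc_freq = Counter(w for text in docs for w in set(text.lower().split()))
    return sorted(
        w for w, h in scores.items()
        if h <= max_entropy and doc_freq[w] >= min_docs
    )


def to_vector(text, vocabulary):
    """Binary vector space representation over the selected words."""
    tokens = set(text.lower().split())
    return [1 if w in tokens else 0 for w in vocabulary]


# Toy data, invented for this example only.
docs = [
    "engine stalls due to faulty fuel pump",
    "replaced fuel pump engine ok",
    "customer states radio noise",
    "customer states engine noise",
]
labels = ["important", "important", "other", "other"]

vocab = select_words(docs, labels)
vectors = [to_vector(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

In this toy data, "engine" appears in both classes, so its entropy is high and it is dropped, while words confined to one class are kept. Vectors of this kind could then be passed to a clustering step.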