Conference Record Number:
4650
Paper Title:
SAZEH: A Wide Coverage Persian Constituency Tree Bank and Parser
Authors:
Tabatabayi Seifi, Shohreh (RCDAT: Research Center for Development of Advanced Technologies); Sarraf Rezaee, Iman (RCDAT: Research Center for Development of Advanced Technologies)
Keywords:
Constituency Treebank, Constituency Parser, Natural Language Processing
Conference Title:
19th International Conference on Artificial Intelligence and Signal Processing
Abstract:
Constituency parsing is a basic operation in many NLP tasks, such as machine translation, information extraction, and abstractive summarization. Training a probabilistic parser requires a wide-coverage constituency treebank. SAZEH is the first large-scale Persian constituency treebank, with more than 21,000 parsed trees and 627,000 tokens; its sentences average 30 words in length. The sentences were selected from the Peykare corpus, which is already annotated with POS tags. The Berkeley Lexical Parser was trained on the SAZEH corpus, and the best F-measure attained on the test portion of the corpus is 81.65% using gold POS tags.
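The abstract reports an F-measure on the test portion of SAZEH without defining it. Constituency parsers are conventionally scored with PARSEVAL-style labeled bracketing precision, recall, and F-measure, as implemented in the EVALB tool. The sketch below is a simplified version of that metric, not the authors' evaluation code. It assumes Penn-style bracketed trees with no outer empty bracket, counts labeled spans of phrasal nodes only (POS preterminals are skipped, which suits a gold-POS setting), and omits EVALB's punctuation and label-equivalence rules. The function names `labeled_spans` and `corpus_prf`, the labels, and the placeholder tokens `w1`..`w3` are hypothetical and are not taken from SAZEH.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tokenize(tree: str) -> List[str]:
    """Split a bracketed tree into '(', ')' and atom tokens."""
    return tree.replace("(", " ( ").replace(")", " ) ").split()


def labeled_spans(tree: str) -> Counter:
    """Return a multiset of (label, start, end) spans for phrasal nodes.

    Preterminals (nodes that directly dominate a word, i.e. POS tags) are
    skipped. Assumption: a single root node with no outer empty bracket.
    """
    tokens = tokenize(tree)
    spans: Counter = Counter()
    pos = 0         # current index into tokens
    word_index = 0  # number of terminal words consumed so far

    def parse() -> None:
        nonlocal pos, word_index
        pos += 1                  # skip '('
        label = tokens[pos]
        pos += 1
        start = word_index
        is_preterminal = False
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                parse()           # child constituent
            else:
                word_index += 1   # bare atom: a terminal word
                pos += 1
                is_preterminal = True
        pos += 1                  # skip ')'
        if not is_preterminal:
            spans[(label, start, word_index)] += 1

    parse()
    return spans


def corpus_prf(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float, float]:
    """Micro-averaged labeled precision, recall and F1 over (gold, test) pairs."""
    matched = gold_total = test_total = 0
    for gold, test in pairs:
        g, t = labeled_spans(gold), labeled_spans(test)
        matched += sum((g & t).values())  # multiset intersection of spans
        gold_total += sum(g.values())
        test_total += sum(t.values())
    precision = matched / test_total if test_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy trees; only the S span over all three words matches.
    gold = "(S (NP (N w1)) (VP (NP (N w2)) (V w3)))"
    test = "(S (NP (N w1) (N w2)) (VP (V w3)))"
    print(corpus_prf([(gold, test)]))  # precision 1/3, recall 1/4, F1 2/7
```

Counts are summed over all sentences before dividing, so the corpus score is a micro-average rather than a mean of per-sentence F1 values; EVALB's summary bracketing scores are computed the same way.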