ETNLP: A visual-aided systematic approach to select pre-trained embeddings for a downstream task
conference contribution
posted on 2023-05-23, 14:36 authored by Vu, X-S; Vu, T; Son Tran; Jiang, L
© 2019 Association for Computational Linguistics (ACL). All rights reserved.
Given the many advanced embedding models released recently, selecting the pre-trained word embedding (a.k.a. word representation) models best suited to a specific downstream task is non-trivial. In this paper, we propose a systematic approach, called ETNLP, for extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. We demonstrate the effectiveness of the proposed approach on our pre-trained Vietnamese word embedding models by selecting which of them are suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then use the selected embeddings for the NER task and achieve new state-of-the-art results on the task's benchmark dataset. We also apply the approach to another downstream task, privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we create an open-source system based on the proposed approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp.
Publication title
Proceedings of the 2019 Recent Advances in Natural Language Processing International Conference
Editors
G Angelova, R Mitkov, I Nikolova and I Temnikova
Pagination
1285-1294
ISSN
1313-8502
Department/School
School of Information and Communication Technology
Publisher
INCOMA
Place of publication
Bulgaria
Event title
Recent Advances in Natural Language Processing International Conference
Event Venue
Varna, Bulgaria
Date of Event (Start Date)
2019-09-02
Date of Event (End Date)
2019-09-04
Rights statement
Copyright unknown
Repository Status
- Restricted
Socio-economic Objectives
Information systems, technologies and services not elsewhere classified