University of Tasmania

ETNLP: A visual-aided systematic approach to select pre-trained embeddings for a downstream task

conference contribution
posted on 2023-05-23, 14:36 authored by Vu, X-S, Vu, T, Son Tran, Jiang, L
© 2019 Association for Computational Linguistics (ACL). All rights reserved.

Given the many recent advanced embedding models, selecting the pre-trained word embedding (a.k.a. word representation) models best suited to a specific downstream task is non-trivial. In this paper, we propose a systematic approach, called ETNLP, for extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. We demonstrate the effectiveness of the proposed approach on our pre-trained Vietnamese word embedding models by selecting those suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then use the selected embeddings for the NER task and achieve new state-of-the-art results on the task's benchmark dataset. We also apply the approach to another downstream task, privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we create an open-source system based on the proposed approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp.

Publication title

Proceedings of the 2019 Recent Advances in Natural Language Processing International Conference

Editors

G Angelova, R Mitkov, I Nikolova and I Temnikova

Pagination

1285-1294

ISSN

1313-8502

Department/School

School of Information and Communication Technology

Publisher

INCOMA

Place of publication

Bulgaria

Event title

Recent Advances in Natural Language Processing International Conference

Event Venue

Varna, Bulgaria

Date of Event (Start Date)

2019-09-02

Date of Event (End Date)

2019-09-04

Rights statement

Copyright unknown

Repository Status

  • Restricted

Socio-economic Objectives

Information systems, technologies and services not elsewhere classified
