Combining dictionary- and rule-based approximate entity linking with tuned BioBERT

Abstract:

Chemical named entity recognition (NER) is an important step for many downstream applications in the chemical text-mining pipeline, such as entity linking. However, identifying chemical entities in biomedical text is challenging because of their diverse morphology and the different types of chemical nomenclature. In this work, we describe the approach we submitted to Track 2 of the BioCreative VII challenge, focusing on the ‘Chemical Identification’ task: identifying chemical entities and linking them to MeSH. For this purpose, we applied a two-stage approach: (a) a fine-tuned BioBERT model identifies chemical entities, and (b) a semantic approximate search in the MeSH and PubChem databases links them. There was some friction between the two stages, as our rule-based approach did not harmonise well with the partially recognized words forwarded by the BERT component. In future work, we aim to resolve the artefacts arising from BERT tokenizers, to develop joint learning of chemical named entity recognition and entity linking using pre-trained transformer-based models, and to compare the performance of these models with our preliminary approach. We will also improve the efficiency of our approximate search in reference databases during entity linking. This task is non-trivial, as it entails determining similarity scores of large sets of trees with respect to a query tree. Ideally, this will enable flexible parametrization and rule selection for the entity linking search.
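
The friction described in the abstract, where partially recognized words from the BERT stage reach the linking stage, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the `##` WordPiece continuation convention, the `B-Chemical`/`I-Chemical`/`O` label names, the toy dictionary with placeholder identifiers (not real MeSH IDs), and the helper names `merge_wordpieces` and `link`. The abstract describes a semantic approximate search over trees; the sketch swaps in plain character-level similarity from Python's standard `difflib` as a stand-in.

```python
from difflib import SequenceMatcher

# Hypothetical toy dictionary: the identifiers are placeholders, not real MeSH IDs.
TOY_DICTIONARY = {
    "aspirin": "TOY:0001",
    "acetaminophen": "TOY:0002",
    "ibuprofen": "TOY:0003",
}


def merge_wordpieces(tokens, labels):
    """Rejoin '##' continuation pieces into whole words.

    A word is treated as a chemical mention if any of its pieces was tagged,
    which repairs words the tagger recognized only partially.
    """
    words = []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and words:
            text, is_chem = words[-1]
            words[-1] = (text + token[2:], is_chem or label != "O")
        else:
            words.append((token, label != "O"))
    return words


def link(mention, dictionary, threshold=0.8):
    """Return (identifier, score) for the most similar dictionary name,
    or None if no name reaches the similarity threshold."""
    query = mention.lower().strip()
    best_name, best_score = None, 0.0
    for name in dictionary:
        score = SequenceMatcher(None, query, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None
    return dictionary[best_name], best_score


# Simulated tagger output in which the first piece of the word was missed.
tokens = ["patients", "received", "ace", "##tam", "##ino", "##phen", "daily"]
labels = ["O", "O", "O", "B-Chemical", "I-Chemical", "I-Chemical", "O"]

words = merge_wordpieces(tokens, labels)
mentions = [text for text, is_chem in words if is_chem]
links = {mention: link(mention, TOY_DICTIONARY) for mention in mentions}
print(links)
```

Without the merging step, the linking stage would receive the fragment "taminophen" instead of the full word. The 0.8 threshold is an arbitrary choice for this sketch; a linear scan with `SequenceMatcher` would also be too slow for dictionaries the size of MeSH or PubChem, which is the efficiency concern the abstract raises.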

SEEK ID: https://fairdomhub.org/publications/633

DOI: 10.1101/2021.11.09.467905

Projects: BioCreative VII

Publication type: Conference Proceeding

Citation: bioRxiv 2021.11.09.467905v1 [Preprint]

Date Published: 11th Nov 2021

Registered Mode: by DOI

Authors: Ghadeer Mobasher, Lukrécia Mertová, Sucheta Ghosh, Olga Krebs, Bettina Heinlein, Wolfgang Müller

Submitter

Lukrécia Mertová

Citation

Mobasher, G., Mertová, L., Ghosh, S., Krebs, O., Heinlein, B., & Müller, W. (2021). Combining dictionary- and rule-based approximate entity linking with tuned BioBERT. openRxiv. https://doi.org/10.1101/2021.11.09.467905

Activity

Views: 2052

Created: 17th Nov 2021 at 19:44

Last updated: 8th Dec 2022 at 17:26
