CS598 Final Project implementing the paper:
1) Fully‑connected LSTM–CRF on medical concept extraction. Jie Ji, Bairui Chen, Hongcheng Jiang. Received: 14 December 2018 / Accepted: 12 February 2020 / Published online: 24 February 2020. © Springer-Verlag GmbH Germany, part of Springer Nature 2020.
2) There is no original repository provided by the above-mentioned authors of the paper.
3) Dependencies: Python 3.x, PyTorch 1.9, GloVe 6B 300d word embeddings (.txt), matplotlib, conlleval (pip install conlleval)
4) Data download instructions
i2b2 2010 dataset for medical concepts: register for an account at https://portal.dbmi.hms.harvard.edu/ . The i2b2 NLP data sets previously released on i2b2.org are now hosted on the DBMI Data Portal under their new name, n2c2 (National NLP Clinical Challenges): n2c2 NLP Research Data Sets, https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ . From the 2010 relation challenge datasets, download the concept and txt files.
- Preprocessing code + command (if applicable)
Please run the notebook Transform_i2b2_CoNLL_data.ipynb in Jupyter or Google Colab. This notebook converts the concept and txt input files into train, dev (validation), and test files at an approximate 80/10/10 split ratio.
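For orientation, here is a minimal sketch of the kind of conversion the notebook performs. It assumes the standard i2b2 2010 concept annotation format (c="..." line:token line:token||t="type") and uses illustrative file names; tokenization details, the train/dev/test split, and edge cases (e.g. quotes inside concept text) are handled by the actual notebook.

```python
import re

# Illustrative i2b2 2010 concept line (assumed format):
#   c="chest pain" 12:3 12:4||t="problem"
CON_RE = re.compile(r'c="(.+?)" (\d+):(\d+) (\d+):(\d+)\|\|t="(\w+)"')

def to_conll(txt_path, con_path, out_path):
    """Convert one record (txt + concept file) into CoNLL-style BIO lines."""
    # Assumption: one sentence per line, whitespace tokens,
    # 1-indexed line numbers, 0-indexed token offsets.
    sentences = [line.split() for line in open(txt_path, encoding="utf-8")]
    tags = [["O"] * len(sent) for sent in sentences]

    for raw in open(con_path, encoding="utf-8"):
        m = CON_RE.match(raw.strip())
        if not m:
            continue
        _, line_s, tok_s, _, tok_e, ctype = m.groups()
        line_idx = int(line_s) - 1           # concept spans stay within one line
        start, end = int(tok_s), int(tok_e)
        tags[line_idx][start] = f"B-{ctype}"
        for t in range(start + 1, end + 1):
            tags[line_idx][t] = f"I-{ctype}"

    with open(out_path, "w", encoding="utf-8") as out:
        for sent, sent_tags in zip(sentences, tags):
            for tok, tag in zip(sent, sent_tags):
                out.write(f"{tok} {tag}\n")
            out.write("\n")                   # blank line between sentences

# Hypothetical usage:
# to_conll("record-13.txt", "record-13.con", "record-13.conll")
```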
-
Training, Model and Test
a) Please run the Jupyter notebook BIDIR_LSTM_CRF_FC.ipynb to train and test on the concept and txt data with the fully connected LSTM-CRF proposed by the original authors.
b) Please run the Jupyter notebook CS598BIDIR-LISTM-CRF-STACKED-Project.ipynb to train and test on the concept and txt data with a 4-layer stacked bidirectional LSTM-CRF.
c) Please run the Jupyter notebook BI-LISTM-CRF.ipynb to train and test on the concept and txt data with a bidirectional LSTM-CRF.
d) Please run the Jupyter notebook UniLSTM-CRF.ipynb to train and test on the concept and txt data with the fully connected unidirectional LSTM-CRF.
A sketch of the shared BiLSTM-CRF building blocks is shown after this list.
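The following is a minimal, illustrative PyTorch sketch of the building blocks these notebooks share (embeddings -> BiLSTM -> emission scores -> CRF loss). It is not the authors' exact architecture: the fully connected and stacked variants, GloVe initialization, batching, and Viterbi decoding are implemented in the notebooks (see the LSTM-CRF references at the end of this README).

```python
import torch
import torch.nn as nn

class BiLSTMCRF(nn.Module):
    """Minimal BiLSTM-CRF tagger sketch (illustrative only)."""
    def __init__(self, vocab_size, tagset_size, embed_dim=300, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        self.emit = nn.Linear(hidden_dim, tagset_size)
        # CRF transition scores: transitions[i, j] = score of moving from tag j to tag i
        self.transitions = nn.Parameter(torch.randn(tagset_size, tagset_size))

    def _emissions(self, tokens):
        out, _ = self.lstm(self.embed(tokens))   # (batch, seq, hidden)
        return self.emit(out)                    # (batch, seq, tags)

    def _score_sentence(self, emissions, tags):
        # Score of the gold tag path (single sentence, no batching, for clarity).
        score = emissions[0, tags[0]]
        for t in range(1, len(tags)):
            score = score + self.transitions[tags[t], tags[t - 1]] + emissions[t, tags[t]]
        return score

    def _log_partition(self, emissions):
        # Forward algorithm in log space over all tag paths.
        alpha = emissions[0]                     # (tags,)
        for t in range(1, emissions.size(0)):
            alpha = torch.logsumexp(alpha.unsqueeze(0) + self.transitions, dim=1) + emissions[t]
        return torch.logsumexp(alpha, dim=0)

    def nll(self, tokens, tags):
        # Negative log-likelihood for one sentence: log Z - score(gold path).
        emissions = self._emissions(tokens.unsqueeze(0)).squeeze(0)
        return self._log_partition(emissions) - self._score_sentence(emissions, tags)
```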
-
Evaluation code + command
Please run pip install conlleval (credit: https://github.com/kaniblu/conlleval) to obtain F1, precision, recall, and accuracy scores. The last code cell in each notebook shows how to write that information to an output file and extract the best F1 score.
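For reference, conlleval-style scoring expects lines of the form "token gold_tag predicted_tag" with a blank line between sentences. A minimal sketch of producing such a file (names are illustrative, not taken from the notebooks):

```python
def write_conll_output(sentences, gold_tags, pred_tags, path="predictions.txt"):
    """Write 'token gold predicted' lines, one block per sentence, for conlleval scoring."""
    with open(path, "w", encoding="utf-8") as f:
        for tokens, gold, pred in zip(sentences, gold_tags, pred_tags):
            for tok, g, p in zip(tokens, gold, pred):
                f.write(f"{tok} {g} {p}\n")
            f.write("\n")  # blank line separates sentences

# Hypothetical usage:
# write_conll_output([["chest", "pain"]],
#                    [["B-problem", "I-problem"]],
#                    [["B-problem", "O"]])
```

The resulting file can then be scored with the conlleval package (or the original conlleval.pl script).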
-
No pretrained models were used.
-
Table of results
The fully connected LSTM-CRF (model 4) performed the best, but due to limited data availability I could not reach the performance that the original authors achieved.
We ran these notebooks in Google Colab using a GPU runtime.
If you cannot use Google Colab, run the notebooks in Jupyter and remove all the .cuda() calls from the code to run on a CPU.
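Alternatively, a common device-agnostic pattern (an illustrative sketch, not code from the notebooks) avoids editing .cuda() calls by hand:

```python
import torch

# Use the GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# model = BiLSTMCRF(...).to(device)   # hypothetical model from the earlier sketch
# tokens = tokens.to(device)          # move each input batch to the same device
```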
- References:
LSTM-CRF code credits: https://github.com/huangxt39/LSTM-CRF-pytorch
PyTorch tutorial "Advanced: Making Dynamic Decisions and the Bi-LSTM CRF": https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html
Preprocessing code credits: https://github.com/raghavchalapathy/Bidirectional-LSTM-CRF-for-Clinical-Concept-Extraction (Theano)