kuteykin/Natural-Language-Processing

project Natural-language-processing

Coding Exercises for NLP

Disease_recognition_in_Spanish.ipynb Recognition and tagging of diseases in Spanish-language medical records and scientific literature. Fine-tunes a pretrained Hugging Face model of the RoBERTa family on a token-classification task, customising a NER tagging model to recognise a new entity type using spaCy.
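
Token-classification training data is commonly expressed as BIO tags, one per token. The following is a minimal sketch of that labelling step, not the notebook's actual code: the `DISEASE` label name, the whitespace tokenisation, and the example sentence are all assumptions made for illustration.

```python
import re

def spans_to_bio(text, spans, label="DISEASE"):
    """Convert character-level entity spans into token-level BIO tags.

    `spans` is a list of (start, end) character offsets, end exclusive.
    Tokens are runs of non-whitespace characters (a simplification;
    a real pipeline would use the model's or spaCy's tokenizer).
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for start, end in spans:
            # A token belongs to an entity if it overlaps the span.
            if tok_start < end and tok_end > start:
                # The first token of the span gets B-, later ones get I-.
                tag = ("B-" if tok_start <= start else "I-") + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags

# Made-up example sentence with one annotated disease mention.
text = "El paciente presenta diabetes mellitus tipo 2 desde 2015."
start = text.index("diabetes")
end = start + len("diabetes mellitus tipo 2")
tokens, tags = spans_to_bio(text, [(start, end)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Here the four tokens of the mention receive `B-DISEASE` followed by three `I-DISEASE` tags, and every other token is tagged `O`.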

NER_Finetuning_HuggingFace_model.ipynb Fine-tuning of a RoBERTa-family model (from Hugging Face) for Named Entity Recognition: training on a specific dataset
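
A recurring step when fine-tuning a subword model for NER is aligning word-level labels with subword tokens. The sketch below assumes a `word_ids` list of the kind returned by the `word_ids()` method of Hugging Face fast tokenizers, where `None` marks special tokens. The function name, the `label_all_subwords` option, and the example values are hypothetical, and the notebook may handle this step differently.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def align_labels(word_ids, word_labels, label_all_subwords=False):
    """Expand word-level label ids to one label per subword token.

    Special tokens (word id None) are masked out of the loss. By default
    only the first subword of each word keeps the word's label.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subword of a new word.
            aligned.append(word_labels[word_id])
        else:
            # Continuation subword of the same word.
            aligned.append(word_labels[word_id] if label_all_subwords else IGNORE_INDEX)
        previous = word_id
    return aligned

# Hypothetical example: three words, the second split into three subwords,
# wrapped in two special tokens.
word_ids = [None, 0, 1, 1, 1, 2, None]
word_labels = [0, 1, 0]
print(align_labels(word_ids, word_labels))
# prints [-100, 0, 1, -100, -100, 0, -100]
```

Labelling only the first subword keeps the evaluation at word level. Passing `label_all_subwords=True` is the usual alternative when every subword should contribute to the loss.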

Extract_text_from_PDF_to_JSON.ipynb Exploring various methods for extracting text from large PDF files

swear_words_filter_testing.ipynb Evaluation of various approaches to profanity detection / swear-word filtering (five libraries tested)
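
As a point of comparison for library-based filters, the simplest baseline is a word-list filter. The sketch below is an illustrative assumption and is not one of the libraries evaluated in the notebook. The function names are hypothetical and the blocked words are mild placeholders.

```python
import re

def build_filter(blocked_words):
    """Return (censor, contains) functions for a case-insensitive word list."""
    # Match whole words only; re.escape treats each entry literally.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in blocked_words) + r")\b",
        re.IGNORECASE,
    )

    def censor(text):
        # Replace each match with asterisks of the same length.
        return pattern.sub(lambda m: "*" * len(m.group()), text)

    def contains(text):
        return pattern.search(text) is not None

    return censor, contains

censor, contains = build_filter(["darn", "heck"])
print(censor("Well, darn it. Heck!"))
# prints Well, **** it. ****!
print(contains("The heckler left"))
# prints False
```

A plain word list like this misses obfuscated spellings and cannot judge context, which is one reason to compare it against dedicated libraries.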

word_frequency_barchart_wordcloud.py Word Frequency Bar Chart and Word Cloud (from Shakespeare’s Hamlet)
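
The counting step behind a word-frequency chart can be sketched with the standard library alone. This is not the script's code: the function name, the tokenisation regex, and the stopword set are assumptions, and the text bar chart stands in for a real plot. The resulting counts could then be passed to a plotting call such as matplotlib's `bar` or the `wordcloud` package's `generate_from_frequencies`.

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset(), top_n=10):
    """Return the top_n (word, count) pairs, ignoring case and stopwords."""
    # Lowercase words, allowing one internal apostrophe (e.g. "o'er").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(top_n)

sample = "To be, or not to be, that is the question."
top = word_frequencies(sample, stopwords={"the", "is"}, top_n=3)

# Crude text bar chart: one '#' per occurrence.
for word, count in top:
    print(f"{word:<10}{'#' * count}")
```

For the sample line, "to" and "be" each occur twice and lead the list.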

Input/ .* Texts for analysis, datasets and masks for WordCloud

Output/ .* Generated datasets and wordclouds
