Saama Word Guesser ❓

This project treats word-guessing games in the style of hangman or dumb charades as an NLP problem. Given an intentionally obscured (masked) word and a description of it, the model predicts the complete word.

Example:

Input masked word = DEM_G_A_HY

Description = the statistical study of populations, especially human beings.

Model prediction/output = DEMOGRAPHY
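The masking convention in the example can be illustrated with a small sketch. It is not taken from the project's code: the helper names `mask_word` and `matches_mask` are hypothetical, and the sketch assumes that every hidden letter is replaced by exactly one underscore, as in DEM_G_A_HY.

```python
def mask_word(word, hidden_positions, blank="_"):
    """Replace the characters at the given 0-based positions with `blank`."""
    return "".join(
        blank if i in hidden_positions else ch for i, ch in enumerate(word)
    )


def matches_mask(masked_word, candidate, blank="_"):
    """Return True if `candidate` agrees with every revealed letter of the mask."""
    if len(masked_word) != len(candidate):
        return False
    return all(
        m == blank or m.upper() == c.upper()
        for m, c in zip(masked_word, candidate)
    )


if __name__ == "__main__":
    print(mask_word("DEMOGRAPHY", {3, 5, 7}))        # DEM_G_A_HY
    print(matches_mask("DEM_G_A_HY", "DEMOGRAPHY"))  # True
    print(matches_mask("DEM_G_A_HY", "DEMOCRATIC"))  # False
```

A check like `matches_mask` only verifies that a guess is consistent with the revealed letters; picking the right word among consistent candidates is what the description and the trained model are for.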

Method to Execute the Project

Install the dependencies with the command: pip install -r requirements.txt
Run the app_gradio.py file: python3 app_gradio.py

Training a model from Scratch

Run python3 train.py --dataset_csv_path path/to/dataset.csv. The CSV must contain words in the first column and their meanings in the second column.

For example, run python3 train.py --dataset_csv_path ./data/train.csv to train on English words (dataset included).
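Before training on a custom dataset, it can help to confirm that the file has the expected two-column layout. The following is a minimal sketch using only the standard library; the function name `read_word_meaning_rows` is hypothetical, and whether train.py expects or tolerates a header row is not stated here, so the sketch does not treat the first row specially.

```python
import csv
import io


def read_word_meaning_rows(csv_file):
    """Yield (word, meaning) pairs from an open CSV file, skipping malformed rows."""
    for row in csv.reader(csv_file):
        # Expect at least two columns: the word first, its meaning second.
        if len(row) < 2:
            continue
        word, meaning = row[0].strip(), row[1].strip()
        if word and meaning:
            yield word, meaning


if __name__ == "__main__":
    # In-memory sample; for a real file use open(path, newline="", encoding="utf-8").
    sample = io.StringIO(
        'demography,"the statistical study of populations, especially human beings"\n'
        "row_without_meaning\n"
    )
    for word, meaning in read_word_meaning_rows(sample):
        print(word, "->", meaning)
```

Meanings that contain commas must be quoted, as in the sample row, so that they stay in the second column.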

OR

Build the Docker image with python3 build_container.py train
Run sudo docker run train path/to/dataset.csv

Gradio app

Run python3 app_gradio.py to launch the Gradio-based web app.

OR

Build the Docker image with python3 build_container.py inference_gradio
Run sudo docker run inference_gradio

REST API

Run python3 app_rest.py to launch the Flask REST server.

OR

Build the Docker image with python3 build_container.py inference_rest
Run sudo docker run inference_rest

Example request

curl --header "Content-Type: application/json" \
--request POST \
--data '{"masked_word":"DEM_G_A_HY","description":"the statistical study of populations."}' \
http://127.0.0.1:5000/guess_word
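The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call: same URL, same header, same JSON fields. The function names `build_guess_request` and `send_guess` are hypothetical, and because the response format is not documented here, the raw response body is returned as text rather than parsed.

```python
import json
import urllib.request


def build_guess_request(masked_word, description, base_url="http://127.0.0.1:5000"):
    """Build a POST request for the /guess_word endpoint shown in the curl example."""
    payload = json.dumps(
        {"masked_word": masked_word, "description": description}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/guess_word",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_guess(masked_word, description, timeout=10):
    """Send the request and return the raw response body as text.

    Requires the server started by app_rest.py to be running locally.
    """
    request = build_guess_request(masked_word, description)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(send_guess("DEM_G_A_HY", "the statistical study of populations."))` prints whatever the endpoint returns.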

Project Details:

train.py is the training script. It uses PyTorch Lightning and takes a --dataset_csv_path argument (this works for the Docker container as well).
tests contains unit tests for the modules in core.
core contains implementations of the dataset, model, tokenizer, etc.
data contains CSV files for training and testing, the dumped tokenizer, etc.
build_tokenizer.py builds a character-level tokenizer for a given CSV.
clean_data.py cleans the open-source dataset used by the project.
app_gradio.py is a simple Gradio app for inference with a trained model (included).
app_rest.py is a simple REST API (Flask).
dockerization_app contains Dockerfiles for inference and training.
build_container.py builds Docker containers for inference_gradio, inference_rest, and train. Example: python3 build_container.py inference_gradio.
logs contains the model checkpoint.
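To make the idea of a character-level tokenizer concrete, here is a minimal sketch. It is not the implementation in core or build_tokenizer.py: the class name `CharTokenizer`, the `<pad>` and `<unk>` special tokens, and the id ordering are all assumptions made for illustration.

```python
class CharTokenizer:
    """Minimal character-level tokenizer: one integer id per distinct character."""

    def __init__(self, texts, pad_token="<pad>", unk_token="<unk>"):
        self.pad_token = pad_token
        self.unk_token = unk_token
        # Sort the characters so the vocabulary is reproducible across runs.
        chars = sorted({ch for text in texts for ch in text})
        self.id_to_token = [pad_token, unk_token] + chars
        self.token_to_id = {tok: i for i, tok in enumerate(self.id_to_token)}

    def encode(self, text):
        """Map each character to its id; unseen characters map to the unknown id."""
        unk_id = self.token_to_id[self.unk_token]
        return [self.token_to_id.get(ch, unk_id) for ch in text]

    def decode(self, ids):
        """Map ids back to characters, dropping the special tokens."""
        special = {self.pad_token, self.unk_token}
        return "".join(
            self.id_to_token[i] for i in ids if self.id_to_token[i] not in special
        )


if __name__ == "__main__":
    tokenizer = CharTokenizer(["demography", "dem_g_a_hy"])
    ids = tokenizer.encode("dem_g_a_hy")
    print(ids)
    print(tokenizer.decode(ids))
```

Working at the character level suits this task because the masked input reveals individual letters, and the underscore can be treated as just another character in the vocabulary.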

Repository: ayazkhan03/Word-Guesser-NLP
