
A Machine Learning Model for Identifying Low-Value Pages in Turkish-Language Professional Theses in PDF Format


EmirXK/TR-PDF-PageClassifier


TR-PDF-PageClassifier


Low-value pages are defined as the following page types: Table of Contents, Table of Figures, Table of Tables, References, and Appendices.

The model detects these pages and outputs a list of the page numbers of every page it labels as low-value.
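To make the output format concrete, here is a minimal sketch of the final step, assuming the input has already been split into one text string per page. The heading matcher below is a hypothetical, rule-based stand-in for the trained model (it is not the repository's method): it flags a page whose first non-empty line starts with a common Turkish thesis heading for one of the five low-value page types, then returns the 1-based page numbers of the flagged pages. The names LOW_VALUE_HEADINGS, turkish_upper, is_low_value, and low_value_page_numbers are illustrative only.

```python
# Hypothetical Turkish headings for the five low-value page types.
LOW_VALUE_HEADINGS = {
    "İÇİNDEKİLER": "table_of_contents",
    "ŞEKİLLER LİSTESİ": "table_of_figures",
    "TABLOLAR LİSTESİ": "table_of_tables",
    "KAYNAKLAR": "references",
    "KAYNAKÇA": "references",
    "EKLER": "appendices",
}


def turkish_upper(text: str) -> str:
    """Upper-case Turkish text: dotted i -> İ, dotless ı -> I.

    Plain str.upper() maps 'i' to 'I', which breaks Turkish headings.
    """
    return text.replace("i", "İ").replace("ı", "I").upper()


def is_low_value(page_text: str) -> bool:
    """Hypothetical stand-in for the classifier: check the first non-empty line."""
    for line in page_text.splitlines():
        line = turkish_upper(line.strip())
        if line:
            return line.startswith(tuple(LOW_VALUE_HEADINGS))
    return False  # blank pages are not flagged


def low_value_page_numbers(pages: list[str]) -> list[int]:
    """Return the 1-based page numbers of every page labeled low-value."""
    return [n for n, text in enumerate(pages, start=1) if is_low_value(text)]


pages = ["İÇİNDEKİLER\n1. Giriş", "1. GİRİŞ\nBu çalışmada", "Kaynaklar\n[1] Yazar"]
print(low_value_page_numbers(pages))  # [1, 3]
```

A keyword rule like this misses pages whose headings are missing or unusual, which is one reason a learned model is used instead; the sketch only illustrates the shape of the output list.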

Features

  • Reads input from CSV files, PDF files, and URLs
  • High accuracy (~98%)
  • Fast algorithm
  • Works with any type of Turkish-language text
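The README does not document the expected CSV layout. As a minimal sketch, assuming one row per page with hypothetical page and text columns, the standard-library csv module can load the pages in page order before classification. The column names and the function name read_pages_from_csv are assumptions, not the notebook's actual interface.

```python
import csv
import io
from typing import TextIO


def read_pages_from_csv(stream: TextIO) -> list[str]:
    """Read page texts from a CSV with hypothetical 'page' and 'text' columns.

    Rows are sorted by the integer page number, so the list index + 1
    matches the page number.
    """
    rows = csv.DictReader(stream)
    ordered = sorted(rows, key=lambda row: int(row["page"]))
    return [row["text"] for row in ordered]


# In-memory example; for a real file use
# open(path, newline="", encoding="utf-8") so Turkish characters decode correctly.
sample = io.StringIO('page,text\n2,"1. GİRİŞ"\n1,"İÇİNDEKİLER"\n')
print(read_pages_from_csv(sample))  # ['İÇİNDEKİLER', '1. GİRİŞ']
```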

Usage

Run the Last_version.ipynb notebook.

Acknowledgements

Authors

License

MIT
