xnought/bpe.cpp


bpe.cpp

Fast BPE training

  • Iterate over the given merge vocabulary
  • Insert strings into a vector kept sorted by length
  • Count byte pairs and select the most frequent one
  • Put it all together
  • Replicate the Python example and time it
  • Optimize the single-threaded version
  • Add multi-threading
  • Read input from a file and write the result to a file
  • Keep a global pair counter so that a new merge does not force a full recount: count once, then update only the affected pairs
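
The counting, selection, and merge steps listed above can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the code in this repository: the names `Token`, `Pair`, `count_pairs`, `most_frequent`, `merge`, and `train_bpe` are hypothetical, new token ids are assumed to start at 256 (just past the byte range), and ties are broken by taking the smallest pair. It recounts all pairs on every round, so it does not implement the incremental global counter from the last item.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type names, chosen for this sketch only.
using Token = std::uint32_t;
using Pair = std::pair<Token, Token>;

// Count every adjacent token pair in the sequence (overlapping pairs included).
std::map<Pair, std::size_t> count_pairs(const std::vector<Token>& ids) {
    std::map<Pair, std::size_t> counts;
    for (std::size_t i = 0; i + 1 < ids.size(); ++i) {
        ++counts[Pair{ids[i], ids[i + 1]}];
    }
    return counts;
}

// Return the pair with the highest count. std::map iterates in sorted order
// and the comparison is strict, so ties go to the smallest pair.
Pair most_frequent(const std::map<Pair, std::size_t>& counts) {
    Pair best{};
    std::size_t best_count = 0;
    for (const auto& entry : counts) {
        if (entry.second > best_count) {
            best = entry.first;
            best_count = entry.second;
        }
    }
    return best;
}

// Replace each non-overlapping occurrence of `pair`, scanning left to right,
// with the single token `new_id`.
std::vector<Token> merge(const std::vector<Token>& ids, Pair pair, Token new_id) {
    std::vector<Token> out;
    out.reserve(ids.size());
    for (std::size_t i = 0; i < ids.size();) {
        if (i + 1 < ids.size() && ids[i] == pair.first && ids[i + 1] == pair.second) {
            out.push_back(new_id);
            i += 2;
        } else {
            out.push_back(ids[i]);
            ++i;
        }
    }
    return out;
}

// Start from raw bytes and perform up to `num_merges` merges.
// Returns the merged pairs in the order they were learned.
std::vector<Pair> train_bpe(const std::string& text, std::size_t num_merges) {
    std::vector<Token> ids;
    ids.reserve(text.size());
    for (unsigned char c : text) {
        ids.push_back(c);  // byte values 0..255 are the initial tokens
    }

    std::vector<Pair> merges;
    for (std::size_t m = 0; m < num_merges; ++m) {
        const auto counts = count_pairs(ids);  // full recount each round
        if (counts.empty()) {
            break;  // fewer than two tokens left, nothing to merge
        }
        const Pair best = most_frequent(counts);
        ids = merge(ids, best, static_cast<Token>(256 + m));
        merges.push_back(best);
    }
    return merges;
}
```

With these assumptions, `train_bpe("aaabdaaabac", 3)` learns the pairs (97, 97), (97, 98), and (256, 257) in that order. A faster version would likely swap `std::map` for a hash map and update counts only around merged positions, which is what the later items in the list aim for.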

About

BPE training in C++
