locchh/synthflow

SynthFlow🍃

A tool for generating synthetic data.

Pipelines

  • Basic
| No | Pipeline | none | txt | pdf | Comment |
|----|----------|------|-----|-----|---------|
| 01 | Simple Q&A | | | | Based on what type of document? |
  • For Learning
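| No | Pipeline | none | txt | pdf | Comment |
|----|----------|------|-----|-----|---------|
| 01 | Definitions and Terminology | | | | |
| 02 | Troubleshooting | | | | |
| 03 | Command References | | | | |
| 04 | System Operations | | | | |
| 05 | Programming on Mainframes | | | | |
| 06 | System Configuration | | | | |
| 07 | Migration and Modernization | | | | |
| 08 | Performance Optimization | | | | |
| 09 | Integration | | | | |
| 10 | Error Analysis | | | | |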
  • For Coding
| No | Pipeline | none | txt | pdf | Comment |
|----|----------|------|-----|-----|---------|
| 01 | SQL Query Assistance | | | | Requirement |
| 02 | Code Generation | | | | Requirement |
| 03 | Code Completion | | | | Requirement + Incomplete Code |
| 04 | Code Debugging | | | | Requirement + Buggy Code |
| 07 | Error Explanation | | | | Requirement + Buggy Code + Error Message |
| 05 | Code Explanation | | | | Requirement + Completed Code |
| 06 | Code Review | | | | Requirement + Completed Code |
| 08 | Code Documentation | | | | Requirement + Completed Code |
| 09 | Code Optimization | | | | Requirement + Completed Code |
| 10 | Code Translation | | | | Requirement + Completed Code |
  • Other
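
As a rough illustration of how the inputs listed in the Comment column of the "For Coding" table might be combined into a single prompt, here is a minimal sketch. It assumes the Comment column names the inputs each pipeline needs. The names `CODING_PIPELINE_INPUTS` and `build_prompt` and the prompt layout are hypothetical and are not taken from SynthFlow's code.

```python
# Hypothetical mapping from "For Coding" pipeline names to the inputs
# listed in the Comment column of the table. Not SynthFlow's actual API.
CODING_PIPELINE_INPUTS = {
    "SQL Query Assistance": ["requirement"],
    "Code Generation": ["requirement"],
    "Code Completion": ["requirement", "incomplete_code"],
    "Code Debugging": ["requirement", "buggy_code"],
    "Error Explanation": ["requirement", "buggy_code", "error_message"],
    "Code Explanation": ["requirement", "completed_code"],
    "Code Review": ["requirement", "completed_code"],
    "Code Documentation": ["requirement", "completed_code"],
    "Code Optimization": ["requirement", "completed_code"],
    "Code Translation": ["requirement", "completed_code"],
}


def build_prompt(pipeline: str, **inputs: str) -> str:
    """Assemble a plain-text prompt from the inputs a pipeline expects."""
    required = CODING_PIPELINE_INPUTS[pipeline]
    missing = [name for name in required if name not in inputs]
    if missing:
        raise ValueError(f"{pipeline} is missing inputs: {', '.join(missing)}")
    # One labelled section per required input, in the table's order.
    sections = [f"Task: {pipeline}"]
    for name in required:
        label = name.replace("_", " ").capitalize()
        sections.append(f"{label}:\n{inputs[name]}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(
        build_prompt(
            "Code Debugging",
            requirement="Return the sum of a list of integers.",
            buggy_code="def total(xs):\n    return xs",
        )
    )
```

Keeping the required inputs in one table-like dictionary means a missing input fails early with a clear error instead of silently producing an incomplete prompt.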

TODOs

| No | Task | Status |
|----|------|--------|
| 01 | Improve data quality (self-instruct, evol-instruct, validator, eliminator, ...) | 🛠️ |
| 02 | Support locally hosted models | |
| 03 | Support multi-threaded execution | |
| 04 | Generate a synthetic dataset for training the model | |
| 05 | Publish the package and release a paper | |

References

gitingest

h2o-wizardlm

self-instruct

openai

openai-python

docling

docling.io

grammarinator

WizardLM: Empowering Large Language Models to Follow Complex Instructions

Self-Instruct: Aligning Language Models with Self-Generated Instructions

Instruction Induction: From Few Examples to Natural Language Task Descriptions

Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor