The world of Large Language Models (LLMs) is complex and varied. This resource collates the things that matter, helping to make sense of this increasingly important topic.
Everyone knows ChatGPT, but do you know these others?
- OpenAI ChatGPT
- Google Bard
- Anthropic Claude
- Inflection Pi
- Hugging Chat
- Microsoft Bing
- You
- Perplexity
- Chatsonic
A selection of interesting & noteworthy research papers related to LLMs.
- 2023: A Survey of Large Language Models
- 2023: A Comprehensive Overview of Large Language Models
- 2023: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
- 2023: Aligning Large Language Models with Human: A Survey
- 2023: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
- 2023: Generative Agents: Interactive Simulacra of Human Behavior
- 2023: QLoRA: Efficient Finetuning of Quantized LLMs
- 2022: Scaling Instruction-Finetuned Language Models
- 2022: Constitutional AI: Harmlessness from AI Feedback
- 2022: What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
- 2022: Training Compute-Optimal Large Language Models
- 2022: Training language models to follow instructions with human feedback
- 2022: Emergent Abilities of Large Language Models
- 2022: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- 2021: LoRA: Low-Rank Adaptation of Large Language Models
- 2021: The Power of Scale for Parameter-Efficient Prompt Tuning
- 2021: On the Opportunities and Risks of Foundation Models
- 2020: Language Models are Few-Shot Learners
- 2020: Scaling Laws for Neural Language Models
- 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- 2017: Attention is all you need
Get skilled up with these free and paid-for courses.
- Prompt Engineering Guide
- LLM Bootcamp
- Best practices for prompt engineering with OpenAI API
- Lil'Log: Prompt Engineering
- Prompt Engineering Guide
- Cohere LLM University
- DeepLearning.AI: ChatGPT Prompt Engineering for Developers
- DeepLearning.AI: Learn the fundamentals of generative AI for real-world applications
- DeepLearning.AI: LangChain for LLM Application Development
- Princeton: COS 597G - Understanding Large Language Models
- Stanford: CS324 - Large Language Models
These various benchmarks are commonly used to compare LLM performance.
These leaderboards show how LLMs compare relative to each other.
Open source models are generally understood to be free to use, but some have restrictive licenses that prohibit commercial use or constrain usage in other ways. Check the exact license for the model you want to use and make sure you understand exactly what is permissible.
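The license a model declares in its Hugging Face model card can be read programmatically as a first check (it is metadata supplied by the uploader, so always confirm against the full license text). A minimal sketch using the huggingface_hub library, with mistralai/Mistral-7B-v0.1 purely as an example repo id:

```python
# pip install huggingface_hub
from huggingface_hub import ModelCard

# Example repo id - substitute the model you are evaluating.
repo_id = "mistralai/Mistral-7B-v0.1"

# Load the model card (the repo's README.md plus its YAML metadata block).
card = ModelCard.load(repo_id)

# The declared license lives in the card metadata; it may be absent,
# so treat a missing value as "check manually", not as permissive.
license_id = card.data.to_dict().get("license")
if license_id:
    print(f"{repo_id} declares license: {license_id}")
else:
    print(f"{repo_id} declares no license in its model card - check manually.")
```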
Mistral
Parameters: 7B
Origin: Mistral
License: Apache 2.0
Release date: October 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/mistralai
Training cost:
Comment: Outperforms LLaMA2 13B
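Most of the open models in this list can be tried with the Hugging Face transformers library. The sketch below is illustrative only: it assumes transformers, accelerate and torch are installed and that enough GPU memory is available for a 7B model, and it uses Mistral 7B simply because it is the entry above.

```python
# pip install transformers accelerate torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # example open-weights model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # let accelerate place layers on available devices
)

prompt = "The three most important factors when choosing an LLM are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```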
LongChat
Parameters: 7B
Origin: UC Berkeley, CMU, Stanford, and UC San Diego
License: Apache 2.0
Release date: August 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/DachengLi1/LongChat
Training cost:
Comment: 32k context length!
Qwen
Parameters: 7B
Origin: Alibaba
License: Tongyi Qianwen
Release date: August 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/QwenLM/Qwen-7B
Training cost:
Vicuna 1.5
Parameters: 13B
Origin: UC Berkeley, CMU, Stanford, and UC San Diego
License: Apache 2.0
Release date: August 2023 (v1.5 is based on LLaMA2 rather than the LLaMA used in prior releases)
Paper:
Commercial use possible: NO (trained on https://sharegpt.com conversations, which potentially breaches OpenAI's license)
GitHub: https://github.com/lm-sys/FastChat
Training cost:
Stable Beluga
Parameters: 7B, 13B, 70B
Origin: Stability AI.
License: CC BY-NC-4.0
Release date: July 2023
Paper:
Commercial use possible: NO
GitHub: https://huggingface.co/stabilityai/StableBeluga2
Training cost:
LLaMA2
Parameters: 7B, 13B, 70B
Origin: Meta.
License: Llama 2 Community License Agreement
Release date: July 2023
Paper: https://arxiv.org/abs/2307.09288
Commercial use possible: YES
GitHub: https://huggingface.co/meta-llama
Training cost: A cumulative of 3.3M GPU hours of computation was performed on hardware of type A100-80GB (TDP of 400W or 350W). We estimate the total emissions for training to be 539 tCO2eq, of which 100% were directly offset by Meta's sustainability program.
Falcon
Parameters: 7B, 40B
Origin: UAE Technology Innovation Institute.
License: Apache 2.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/tiiuae/falcon-7b
GitHub: https://huggingface.co/tiiuae/falcon-7b-instruct
GitHub: https://huggingface.co/tiiuae/falcon-40b
GitHub: https://huggingface.co/tiiuae/falcon-40b-instruct
Training cost: Falcon-40B was trained on AWS SageMaker, on 384 A100 40GB GPUs in P4d instances.
MosaicML MPT-30B
Parameters: 30B
Origin: Open source from MosaicML.
License (MPT-30B Base, Instruct): Attribution-ShareAlike 3.0 Unported
License (MPT-30B Chat): Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Release date: June 2023
Paper:
Commercial use possible: YES (Base & Instruct), NO (Chat)
GitHub: Base: https://huggingface.co/mosaicml/mpt-30b
GitHub: Instruct: https://huggingface.co/mosaicml/mpt-30b-instruct
GitHub: Chat: https://huggingface.co/mosaicml/mpt-30b-chat
Training cost: From scratch: 512xA100-40GB, 28.3 days, ~$871,000.
Training cost: Finetuning the 30B base model: 16xA100-40GB, 21.8 hours, $871.
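A quick back-of-the-envelope check of the from-scratch figures above: 512 GPUs for 28.3 days is roughly 348,000 GPU-hours, so the quoted ~$871,000 implies a rate of about $2.50 per A100-40GB hour. The per-hour rate is derived here, not quoted by MosaicML.

```python
# Back-of-the-envelope check of the MPT-30B "from scratch" training figures.
gpus = 512
days = 28.3
quoted_cost_usd = 871_000

gpu_hours = gpus * days * 24                 # ~347,750 GPU-hours
implied_rate = quoted_cost_usd / gpu_hours   # derived, not a quoted price

print(f"GPU-hours: {gpu_hours:,.0f}")
print(f"Implied cost per A100-40GB hour: ${implied_rate:.2f}")  # ~$2.50
```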
MosaicML MPT-7B
Parameters: 7B
Origin: Open source from MosaicML. Claimed to be competitive with LLaMA-7B. Base, storywriter, instruct, and chat fine-tunings available.
License (MPT-7B Base): Apache 2.0
License (MPT-7B-StoryWriter-65k+): Apache 2.0
License (MPT-7B-Instruct): CC-By-SA-3.0
License (MPT-7B-Chat): CC-By-NC-SA-4.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/mosaicml/mpt-7b
Training cost: Nearly all of the training budget was spent on the base MPT-7B model, which took ~9.5 days to train on 440xA100-40GB GPUs, and cost ~$200k
RedPajama-INCITE
Parameters: 3B, 7B
Origin: "Official" version of the Open Source recreation of LLaMA + chat/instruction-tuned versions
License: Apache 2.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
Training cost: The training of the first collection of RedPajama-INCITE models was performed on 3,072 V100 GPUs provided as part of the INCITE compute grant on the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF).
OpenAlpaca
Parameters: 7B
Origin: An instruction tuned version of OpenLLaMA
License: Apache 2.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/yxuansu/OpenAlpaca
OpenLLaMA
Parameters: 7B
Origin: A claimed recreation of Meta's LLaMA without the licensing restrictions
License: Apache 2.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/openlm-research/open_llama
Camel
Parameters: 5B, (20B coming)
Origin: Writer
License: Apache 2.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/basetenlabs/camel-5b-truss
Palmyra
Parameters: 5B, (20B coming)
Origin: Writer
License: Apache 2.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub:
StableLM
Parameters: 3B, 7B, (15B, 65B coming)
Origin: Stability.ai
License: CC BY-SA-4.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/Stability-AI/StableLM
Databricks Dolly 2
Parameters: 12B
Origin: Databricks, an instruction tuned version of EleutherAI pythia
License: CC BY-SA-4.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/databrickslabs/dolly
Training cost: Databricks cite "for thousands of dollars and in a few hours, Dolly 2.0 was built by fine tuning a 12B parameter open-source model (EleutherAI's Pythia) on a human-generated dataset of 15K Q&A pairs". This, of course, covers only the fine-tuning; the cost of training the underlying Pythia model also needs to be taken into account when estimating the total training cost.
Vicuna
Parameters: 13B
Origin: UC Berkeley, CMU, Stanford, and UC San Diego
License: Requires access to LLaMA; trained on https://sharegpt.com conversations, which potentially breaches OpenAI's license
Release date: April 2023
Paper:
Commercial use possible: NO
GitHub: https://github.com/lm-sys/FastChat
Cerebras-GPT
Parameters: 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B
Origin: Cerebras
License: Apache 2.0
Release date: March 2023
Paper: https://arxiv.org/abs/2304.03208
Commercial use possible: YES
Stanford Alpaca
Parameters: 7B
Origin: Stanford, based on Meta's LLaMA
License: Requires access to LLaMA; trained on GPT-generated data, which is against OpenAI's terms of use
Release date: March 2023
Paper:
Commercial use possible: NO
GitHub: https://github.com/tatsu-lab/stanford_alpaca
Training cost: Replicate posted a blog post where they replicated the Alpaca fine-tuning process. They used 4x A100 80GB GPUs for 1.5 hours. For total training cost, the cost of training the underlying LLaMA model also needs to be taken into account.
EleutherAI pythia
Parameters: 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, 12B
Origin: EleutherAI
License: Apache 2.0
Release date: February 2023
Paper: https://arxiv.org/pdf/2304.01373.pdf
Commercial use possible: YES
LLaMA
Parameters: 7B, 33B, 65B
Origin: Meta
License: Model weights available for non-commercial use by application to Meta
Release date: February 2023
Paper: https://arxiv.org/abs/2302.13971
Commercial use possible: NO
Training cost: Meta cite "When training a 65B-parameter model, our code processes around 380 tokens/sec/GPU on 2048 A100 GPU with 80GB of RAM. This means that training over our dataset containing 1.4T tokens takes approximately 21 days... Finally, we estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models." However, that cost is for all the different model sizes combined. Separately in the LLaMA paper Meta cite 1,022,362 GPU hours on A100-80GB GPUs.
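The quoted figures are roughly self-consistent, as the short calculation below shows: at 380 tokens/sec/GPU on 2,048 GPUs, a 1.4T-token dataset takes about 21 days, and 2,048 GPUs for ~21 days comes to roughly 1.0M GPU-hours, close to the 1,022,362 GPU-hours cited in the paper.

```python
# Sanity-check the LLaMA 65B training figures quoted above.
tokens_per_sec_per_gpu = 380
gpus = 2048
dataset_tokens = 1.4e12

total_tokens_per_sec = tokens_per_sec_per_gpu * gpus
training_days = dataset_tokens / total_tokens_per_sec / 86_400
gpu_hours = gpus * training_days * 24

print(f"Estimated training time: {training_days:.1f} days")  # ~20.8 days
print(f"Estimated GPU-hours: {gpu_hours:,.0f}")              # ~1.02M, close to the 1,022,362 cited
```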
Bloom
Parameters: 176B
Origin: BigScience
License: BigScience Rail License
Release date: July 2022
Paper: https://arxiv.org/abs/2211.05100
Commercial use possible: YES
Google PaLM
Parameters: 540B
Origin: Google
License: Unknown - only announcement of intent to open
Release date: April 2022
Paper: https://arxiv.org/abs/2204.02311
Commercial use possible: Awaiting more information
GPT-NeoX-20B
Parameters: 20B
Origin: EleutherAI
License: Apache 2.0
Release date: January 2022
Paper: https://aclanthology.org/2022.bigscience-1.9/
Commercial use possible: YES
GitHub: https://github.com/EleutherAI/gpt-neox
GPT-J
Parameters: 6B
Origin: EleutherAI
License: Apache 2.0
Release date: June 2021
Paper:
Commercial use possible: YES
Google FLAN-T5
Parameters: 80M, 250M, 780M, 3B, 11B
Origin: Google
License: Apache 2.0
Release date: October 2022
Paper: https://arxiv.org/pdf/2210.11416.pdf
Commercial use possible: YES
GitHub: https://github.com/google-research/t5x
IBM Dromedary
Parameters: 7B, 13B, 33B and 65B
Origin: IBM, based on Meta's LLaMA
License: GNU General Public License v3.0
Release date:
Paper: https://arxiv.org/abs/2305.03047
Commercial use possible: NO
GitHub: https://github.com/IBM/Dromedary
These commercial models are generally available through some form of usage-based pricing model: the more you use, the more you pay.
GPT-4
Parameters: undeclared
Availability: Wait-list https://openai.com/waitlist/gpt-4-api
Fine-tuning: No fine-tuning yet available or announced.
Paper: https://arxiv.org/abs/2303.08774
Pricing: https://openai.com/pricing
Endpoints: Chat API endpoint, which also serves as a completions endpoint.
Privacy: Data from API calls not collected or used to train models https://openai.com/policies/api-data-usage-policies
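For orientation, a minimal call to the chat endpoint might look like the sketch below. It assumes the openai Python package (v1 or later) and an OPENAI_API_KEY environment variable; the model name is just an example, and pricing and model availability change over time, so check the links above.

```python
# pip install openai  (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

# The chat endpoint doubles as a completions endpoint: a single user
# message is effectively a prompt to be completed.
response = client.chat.completions.create(
    model="gpt-4",  # example model name
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise what an LLM is in one sentence."},
    ],
    max_tokens=60,
)
print(response.choices[0].message.content)
```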
GPT-3.5
Parameters: undeclared (GPT-3 had 175B)
Availability: GA
Fine-tuning: Yes, fine-tuning available through APIs.
Paper: https://arxiv.org/pdf/2005.14165.pdf
Pricing: https://openai.com/pricing
Endpoints: A variety of endpoints available, including: chat, embeddings, fine-tuning, moderation, completions.
Privacy: Data from API calls not collected or used to train models.
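Fine-tuning is driven through the same API: upload a JSONL file of chat-formatted examples, then start a job against a tunable base model. A minimal sketch, assuming the openai package (v1 or later); training_data.jsonl is a placeholder filename and gpt-3.5-turbo an example base model.

```python
# pip install openai  (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

# 1. Upload the training data (JSONL, one chat-formatted example per line).
#    "training_data.jsonl" is a placeholder for your own dataset.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start the fine-tuning job against a base model that supports tuning.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # example base model
)
print(f"Fine-tuning job started: {job.id}")
```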
ChatGPT
Parameters: undeclared (uses GPT-3.5 model)
Availability: GA
Fine-tuning: N/A - consumer web-based solution.
Paper:
Pricing: https://openai.com/pricing
Endpoints: N/A - consumer web-based solution.
Privacy: Data submitted on the web-based ChatGPT service is collected and used to train models https://openai.com/policies/api-data-usage-policies
Jurassic-2
Parameters: undeclared (jurassic-1 had 178B)
Availability: GA
Fine-tuning: Yes, fine-tuning available through APIs.
Paper:
Pricing: https://www.ai21.com/studio/pricing
Endpoints: A variety of task-specific endpoints, including paraphrase, grammatical error correction, text improvements, summarisation, text segmentation, and contextual answers.
Privacy:
Claude
Parameters: undeclared
Availability: Waitlist https://www.anthropic.com/product
Fine-tuning: Not standard; large enterprises may contact Anthropic via https://www.anthropic.com/earlyaccess to discuss.
Paper: https://arxiv.org/abs/2204.05862
Pricing: https://cdn2.assets-servd.host/anthropic-website/production/images/apr-pricing-tokens.pdf
Endpoints: Completions endpoint.
Privacy: Data sent to or from the API is not used to train models unless feedback is given - https://vault.pactsafe.io/s/9f502c93-cb5c-4571-b205-1e479da61794/legal.html#terms
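A minimal call to the completions endpoint might look like the sketch below. It assumes the anthropic Python package and an ANTHROPIC_API_KEY environment variable; the model name is an example, and the prompt must be wrapped in the Human/Assistant turn markers the API expects.

```python
# pip install anthropic  (assumes ANTHROPIC_API_KEY is set in the environment)
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()

# The completions endpoint expects alternating Human/Assistant turns,
# built here with the constants the SDK provides.
completion = client.completions.create(
    model="claude-2",  # example model name
    max_tokens_to_sample=100,
    prompt=f"{HUMAN_PROMPT} Explain the difference between fine-tuning and prompting.{AI_PROMPT}",
)
print(completion.completion)
```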
Google Bard
Parameters: 770M
Availability: Waitlist https://bard.google.com
Fine-tuning: No
Paper:
Pricing:
Endpoints: Consumer UI only, API via PaLM
Privacy:
Google PaLM API
Parameters: Up to 540B
Availability: Announced but not yet available - https://blog.google/technology/ai/ai-developers-google-cloud-workspace/
Fine-tuning: unknown
Paper: https://arxiv.org/abs/2204.02311
Pricing: unknown
Endpoints: unknown
Privacy: unknown
Amazon Titan
Parameters: unknown
Availability: Announced but not yet available - https://aws.amazon.com/bedrock/titan/
Fine-tuning: unknown
Paper:
Pricing: unknown
Endpoints: unknown
Privacy: unknown
Cohere
Parameters: 52B
Availability: GA
Fine-tuning:
Paper:
Pricing: https://cohere.com/pricing
Endpoints: A variety of endpoints including embedding, text completion, classification, summarisation, tokenisation, and language detection.
Privacy: Data submitted is used to train models - https://cohere.com/terms-of-use
Granite
Parameters: 13B, 20B
Availability: GA (granite.13b - .instruct, .chat; granite.20b.code - .ansible, .cobol; other variants on roadmap)
Fine-tuning: Currently Prompt Tuning via APIs
Paper: https://www.ibm.com/downloads/cas/X9W4O6BM
Pricing: https://www.ibm.com/products/watsonx-ai/pricing
Endpoints: Various endpoints - Q&A; Generate; Extract; Summarise; Classify
Privacy: IBM curated 6.48 TB of data before pre-processing, 2.07 TB after pre-processing. 1T tokens generated from a total of 14 datasets. Detail in paper. Prompt data is not saved by IBM for other training purposes. Users have complete control in their storage area of any saved prompts, prompt sessions, or prompt tuned models.
Training cost: granite.13b trained on 256 A100 for 1056 GPU hours.
Legal: IBM indemnifies customer use of these models on the watsonx platform