The world of Large Language Models (LLMs) is complex and varied. This resource collates the things that matter, helping to make sense of this increasingly important topic.
Everyone knows ChatGPT, but do you know these others?
- OpenAI ChatGPT
- Google Bard
- Anthropic Claude
- Inflection Pi
- Hugging Chat
- Microsoft Bing
- You
- Perplexity
- Chatsonic
A selection of interesting & noteworthy research papers related to LLMs.
- 2023: A Survey of Large Language Models
- 2023: A Comprehensive Overview of Large Language Models
- 2023: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
- 2023: Aligning Large Language Models with Human: A Survey
- 2023: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
- 2023: Generative Agents: Interactive Simulacra of Human Behavior
- 2023: QLoRA: Efficient Finetuning of Quantized LLMs
- 2022: Scaling Instruction-Finetuned Language Models
- 2022: Constitutional AI: Harmlessness from AI Feedback
- 2022: What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
- 2022: Training Compute-Optimal Large Language Models
- 2022: Training language models to follow instructions with human feedback
- 2022: Emergent Abilities of Large Language Models
- 2022: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- 2021: LoRA: Low-Rank Adaptation of Large Language Models
- 2021: The Power of Scale for Parameter-Efficient Prompt Tuning
- 2021: On the Opportunities and Risks of Foundation Models
- 2020: Language Models are Few-Shot Learners
- 2020: Scaling Laws for Neural Language Models
- 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- 2017: Attention is all you need
Get skilled up with these free and paid-for courses.
- Prompt Engineering Guide
- LLM Bootcamp
- Best practices for prompt engineering with OpenAI API
- Lil'Log: Prompt Engineering
- Prompt Engineering Guide
- Cohere LLM University
- DeepLearning.AI: ChatGPT Prompt Engineering for Developers
- DeepLearning.AI: Learn the fundamentals of generative AI for real-world applications
- DeepLearning.AI: LangChain for LLM Application Development
- Princeton: COS 597G - Understanding Large Language Models
- Stanford: CS324 - Large Language Models
These various benchmarks are commonly used to compare LLM performance.
These leaderboards show how LLMs compare relative to each other.
Open source models are generally understood to be free to use, but some have restrictive licenses that prohibit commercial use or constrain usage in other ways. Check the exact license for the model you want to use and make sure you understand exactly what is permissible.
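The license a model declares in its Hugging Face model card can be read programmatically as a first check (it is metadata supplied by the uploader, so always confirm against the full license text). A minimal sketch using the huggingface_hub library, with mistralai/Mistral-7B-v0.1 purely as an example repo id:

```python
# pip install huggingface_hub
from huggingface_hub import ModelCard

# Example repo id - substitute the model you are evaluating.
repo_id = "mistralai/Mistral-7B-v0.1"

# Load the model card (the repo's README.md plus its YAML metadata block).
card = ModelCard.load(repo_id)

# The declared license lives in the card metadata; it may be absent,
# so treat a missing value as "check manually", not as permissive.
license_id = card.data.to_dict().get("license")
if license_id:
    print(f"{repo_id} declares license: {license_id}")
else:
    print(f"{repo_id} declares no license in its model card - check manually.")
```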
Mistral
Parameters: 7B
Origin: Mistral
License: Apache 2.0
Release date: October 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/mistralai
Training cost:
Comment: Outperforms LLaMA2 13B
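Most of the open models in this list can be tried with the Hugging Face transformers library. The sketch below is illustrative only: it assumes transformers, accelerate and torch are installed and that enough GPU memory is available for a 7B model, and it uses Mistral 7B simply because it is the entry above.

```python
# pip install transformers accelerate torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # example open-weights model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # let accelerate place layers on available devices
)

prompt = "The three most important factors when choosing an LLM are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```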
LongChat
Parameters: 7B
Origin: UC Berkeley, CMU, Stanford, and UC San Diego
License: Apache 2.0
Release date: August 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/DachengLi1/LongChat
Training cost:
Comment: 32k context length!
Qwen
Parameters: 7B
Origin: Alibaba
License: Tongyi Qianwen
Release date: August 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/QwenLM/Qwen-7B
Training cost:
Vicuna 1.5
Parameters: 13B
Origin: UC Berkeley, CMU, Stanford, and UC San Diego
License: Apache 2.0
Release date: August 2023 (v1.5 is based on LLaMA2 rather than the LLaMA used in prior releases)
Paper:
Commercial use possible: NO (trained on https://sharegpt.com conversations, which potentially breaches OpenAI's license)
GitHub: https://github.com/lm-sys/FastChat
Training cost:
Stable Beluga
Parameters: 7B, 13B, 70B
Origin: Stability AI.
License: CC BY-NC-4.0
Release date: July 2023
Paper:
Commercial use possible: NO
GitHub: https://huggingface.co/stabilityai/StableBeluga2
Training cost:
LLaMA2
Parameters: 7B, 13B, 70B
Origin: Meta.
License: Llama 2 Community License Agreement
Release date: July 2023
Paper: https://arxiv.org/abs/2307.09288
Commercial use possible: YES
GitHub: https://huggingface.co/meta-llama
Training cost: A cumulative of 3.3M GPU hours of computation was performed on hardware of type A100-80GB (TDP of 400W or 350W). We estimate the total emissions for training to be 539 tCO2eq, of which 100% were directly offset by Meta's sustainability program.
Falcon
Parameters: 7B, 40B
Origin: UAE Technology Innovation Institute.
License: Apache 2.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/tiiuae/falcon-7b
GitHub: https://huggingface.co/tiiuae/falcon-7b-instruct
GitHub: https://huggingface.co/tiiuae/falcon-40b
GitHub: https://huggingface.co/tiiuae/falcon-40b-instruct
Training cost: Falcon-40B was trained on AWS SageMaker, on 384 A100 40GB GPUs in P4d instances.
MosaicML MPT-30B
Parameters: 30B
Origin: Open source from MosaicML.
License (MPT-30B Base, Instruct): Attribution-ShareAlike 3.0 Unported
License (MPT-30B Chat): Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Release date: June 2023
Paper:
Commercial use possible: YES (Base & Instruct), NO (Chat)
GitHub: Base: https://huggingface.co/mosaicml/mpt-30b
GitHub: Instruct: https://huggingface.co/mosaicml/mpt-30b-instruct
GitHub: Chat: https://huggingface.co/mosaicml/mpt-30b-chat
Training cost: From scratch: 512xA100-40GB, 28.3 days, ~$871,000.
Training cost: Finetuning the 30B base model: 16xA100-40GB, 21.8 hours, $871.
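A quick back-of-the-envelope check of the from-scratch figures above: 512 GPUs for 28.3 days is roughly 348,000 GPU-hours, so the quoted ~$871,000 implies a rate of about $2.50 per A100-40GB hour. The per-hour rate is derived here, not quoted by MosaicML.

```python
# Back-of-the-envelope check of the MPT-30B "from scratch" training figures.
gpus = 512
days = 28.3
quoted_cost_usd = 871_000

gpu_hours = gpus * days * 24                 # ~347,750 GPU-hours
implied_rate = quoted_cost_usd / gpu_hours   # derived, not a quoted price

print(f"GPU-hours: {gpu_hours:,.0f}")
print(f"Implied cost per A100-40GB hour: ${implied_rate:.2f}")  # ~$2.50
```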
MosaicML MPT-7B
Parameters: 7B
Origin: Open source from MosaicML. Claimed to be competitive with LLaMA-7B. Base, storywriter, instruct, and chat fine-tunings available.
License (MPT-7B Base): Apache 2.0
License (MPT-7B-StoryWriter-65k+): Apache 2.0
License (MPT-7B-Instruct): CC-By-SA-3.0
License (MPT-7B-Chat): CC-By-NC-SA-4.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/mosaicml/mpt-7b
Training cost: Nearly all of the training budget was spent on the base MPT-7B model, which took ~9.5 days to train on 440xA100-40GB GPUs, and cost ~$200k
RedPajama-INCITE
Parameters: 3B, 7B
Origin: "Official" version of the Open Source recreation of LLaMA + chat/instruction-tuned versions
License: Apache 2.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
Training cost: The training of the first collection of RedPajama-INCITE models was performed on 3,072 V100 GPUs provided as part of the INCITE compute grant on the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF).
OpenAlpaca
Parameters: 7B
Origin: An instruction tuned version of OpenLLaMA
License: Apache 2.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/yxuansu/OpenAlpaca
OpenLLaMA
Parameters: 7B
Origin: A claimed recreation of Meta's LLaMA without the licensing restrictions
License: Apache 2.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/openlm-research/open_llama
Camel
Parameters: 5B, (20B coming)
Origin: Writer
License: Apache 2.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/basetenlabs/camel-5b-truss
Palmyra
Parameters: 5B, (20B coming)
Origin: Writer
License: Apache 2.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub:
StableLM
Parameters: 3B, 7B, (15B, 65B coming)
Origin: Stability.ai
License: CC BY-SA-4.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/Stability-AI/StableLM
Databricks Dolly 2
Parameters: 12B
Origin: Databricks, an instruction tuned version of EleutherAI pythia
License: CC BY-SA-4.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/databrickslabs/dolly
Training cost: Databricks cite "for thousands of dollars and in a few hours, Dolly 2.0 was built by fine tuning a 12B parameter open-source model (EleutherAI's Pythia) on a human-generated dataset of 15K Q&A pairs". This, of course, covers only the fine-tuning; the cost of training the underlying Pythia model also needs to be taken into account when estimating the total training cost.
Vicuna
Parameters: 13B
Origin: UC Berkeley, CMU, Stanford, and UC San Diego
License: Requires access to LLaMA; trained on https://sharegpt.com conversations, which potentially breaches OpenAI's license
Release date: April 2023
Paper:
Commercial use possible: NO
GitHub: https://github.com/lm-sys/FastChat
Cerebras-GPT
Parameters: 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B
Origin: Cerebras
License: Apache 2.0
Release date: March 2023
Paper: https://arxiv.org/abs/2304.03208
Commercial use possible: YES
Stanford Alpaca
Parameters: 7B
Origin: Stanford, based on Meta's LLaMA
License: Requires access to LLaMA; trained on GPT-generated data, which is against OpenAI's terms of use
Release date: March 2023
Paper:
Commercial use possible: NO
GitHub: https://github.com/tatsu-lab/stanford_alpaca
Training cost: Replicate posted a blog post where they replicated the Alpaca fine-tuning process. They used 4x A100 80GB GPUs for 1.5 hours. For total training cost, the cost of training the underlying LLaMA model also needs to be taken into account.
EleutherAI pythia
Parameters: 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, 12B
Origin: EleutherAI
License: Apache 2.0
Release date: February 2023
Paper: https://arxiv.org/pdf/2304.01373.pdf
Commercial use possible: YES
LLaMA
Parameters: 7B, 33B, 65B
Origin: Meta
License: Model weights available for non-commercial use by application to Meta
Release date: February 2023
Paper: https://arxiv.org/abs/2302.13971
Commercial use possible: NO
Training cost: Meta cite "When training a 65B-parameter model, our code processes around 380 tokens/sec/GPU on 2048 A100 GPU with 80GB of RAM. This means that training over our dataset containing 1.4T tokens takes approximately 21 days... Finally, we estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models." However, that cost is for all the different model sizes combined. Separately in the LLaMA paper Meta cite 1,022,362 GPU hours on A100-80GB GPUs.
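The quoted figures are roughly self-consistent, as the short calculation below shows: at 380 tokens/sec/GPU on 2,048 GPUs, a 1.4T-token dataset takes about 21 days, and 2,048 GPUs for ~21 days comes to roughly 1.0M GPU-hours, close to the 1,022,362 GPU-hours cited in the paper.

```python
# Sanity-check the LLaMA 65B training figures quoted above.
tokens_per_sec_per_gpu = 380
gpus = 2048
dataset_tokens = 1.4e12

total_tokens_per_sec = tokens_per_sec_per_gpu * gpus
training_days = dataset_tokens / total_tokens_per_sec / 86_400
gpu_hours = gpus * training_days * 24

print(f"Estimated training time: {training_days:.1f} days")  # ~20.8 days
print(f"Estimated GPU-hours: {gpu_hours:,.0f}")              # ~1.02M, close to the 1,022,362 cited
```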
Bloom
Parameters: 176B
Origin: BigScience
License: BigScience Rail License
Release date: July 2022
Paper: https://arxiv.org/abs/2211.05100
Commercial use possible: YES
Google PaLM
Parameters: 540B
Origin: Google
License: Unknown - only announcement of intent to open
Release date: April 2022
Paper: https://arxiv.org/abs/2204.02311
Commercial use possible: Awaiting more information
GPT-NeoX-20B
Parameters: 20B
Origin: EleutherAI
License: Apache 2.0
Release date: January 2022
Paper: https://aclanthology.org/2022.bigscience-1.9/
Commercial use possible: YES
GitHub: https://github.com/EleutherAI/gpt-neox
GPT-J
Parameters: 6B
Origin: EleutherAI
License: Apache 2.0
Release date: June 2021
Paper:
Commercial use possible: YES
Google FLAN-T5
Parameters: 80M, 250M, 780M, 3B, 11B
Origin: Google
License: Apache 2.0
Release date: October 2022
Paper: https://arxiv.org/pdf/2210.11416.pdf
Commercial use possible: YES
GitHub: https://github.com/google-research/t5x
IBM Dromedary
Parameters: 7B, 13B, 33B and 65B
Origin: IBM, based on Meta's LLaMA
License: GNU General Public License v3.0
Release date:
Paper: https://arxiv.org/abs/2305.03047
Commercial use possible: NO
GitHub: https://github.com/IBM/Dromedary
These commercial models are generally available through some form of usage-based pricing model: the more you use, the more you pay.
GPT-4
Parameters: undeclared
Availability: Wait-list https://openai.com/waitlist/gpt-4-api
Fine-tuning: No fine-tuning yet available or announced.
Paper: https://arxiv.org/abs/2303.08774
Pricing: https://openai.com/pricing
Endpoints: Chat API endpoint, which also serves as a completions endpoint.
Privacy: Data from API calls not collected or used to train models https://openai.com/policies/api-data-usage-policies
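For orientation, a minimal call to the chat endpoint might look like the sketch below. It assumes the openai Python package (v1 or later) and an OPENAI_API_KEY environment variable; the model name is just an example, and pricing and model availability change over time, so check the links above.

```python
# pip install openai  (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

# The chat endpoint doubles as a completions endpoint: a single user
# message is effectively a prompt to be completed.
response = client.chat.completions.create(
    model="gpt-4",  # example model name
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise what an LLM is in one sentence."},
    ],
    max_tokens=60,
)
print(response.choices[0].message.content)
```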
GPT-3.5
Parameters: undeclared (GPT-3 had 175B)
Availability: GA
Fine-tuning: Yes, fine-tuning available through APIs.
Paper: https://arxiv.org/pdf/2005.14165.pdf
Pricing: https://openai.com/pricing
Endpoints: A variety of endpoints available, including: chat, embeddings, fine-tuning, moderation, completions.
Privacy: Data from API calls not collected or used to train models.
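Fine-tuning is driven through the same API: upload a JSONL file of chat-formatted examples, then start a job against a tunable base model. A minimal sketch, assuming the openai package (v1 or later); training_data.jsonl is a placeholder filename and gpt-3.5-turbo an example base model.

```python
# pip install openai  (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

# 1. Upload the training data (JSONL, one chat-formatted example per line).
#    "training_data.jsonl" is a placeholder for your own dataset.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start the fine-tuning job against a base model that supports tuning.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # example base model
)
print(f"Fine-tuning job started: {job.id}")
```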
ChatGPT
Parameters: undeclared (uses GPT-3.5 model)
Availability: GA
Fine-tuning: N/A - consumer web-based solution.
Paper:
Pricing: https://openai.com/pricing
Endpoints: N/A - consumer web-based solution.
Privacy: Data submitted on the web-based ChatGPT service is collected and used to train models https://openai.com/policies/api-data-usage-policies
Jurassic-2
Parameters: undeclared (jurassic-1 had 178B)
Availability: GA
Fine-tuning: Yes, fine-tuning available through APIs.
Paper:
Pricing: https://www.ai21.com/studio/pricing
Endpoints: A variety of task-specific endpoints, including paraphrase, grammatical error correction, text improvements, summarisation, text segmentation, and contextual answers.
Privacy:
Claude
Parameters: undeclared
Availability: Waitlist https://www.anthropic.com/product
Fine-tuning: Not standard; large enterprises may contact Anthropic via https://www.anthropic.com/earlyaccess to discuss.
Paper: https://arxiv.org/abs/2204.05862
Pricing: https://cdn2.assets-servd.host/anthropic-website/production/images/apr-pricing-tokens.pdf
Endpoints: Completions endpoint.
Privacy: Data sent to or from the API is not used to train models unless feedback is given - https://vault.pactsafe.io/s/9f502c93-cb5c-4571-b205-1e479da61794/legal.html#terms
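A minimal call to the completions endpoint might look like the sketch below. It assumes the anthropic Python package and an ANTHROPIC_API_KEY environment variable; the model name is an example, and the prompt must be wrapped in the Human/Assistant turn markers the API expects.

```python
# pip install anthropic  (assumes ANTHROPIC_API_KEY is set in the environment)
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()

# The completions endpoint expects alternating Human/Assistant turns,
# built here with the constants the SDK provides.
completion = client.completions.create(
    model="claude-2",  # example model name
    max_tokens_to_sample=100,
    prompt=f"{HUMAN_PROMPT} Explain the difference between fine-tuning and prompting.{AI_PROMPT}",
)
print(completion.completion)
```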
Google Bard
Parameters: 770M
Availability: Waitlist https://bard.google.com
Fine-tuning: No
Paper:
Pricing:
Endpoints: Consumer UI only, API via PaLM
Privacy:
Google PaLM API
Parameters: Up to 540B
Availability: Announced but not yet available - https://blog.google/technology/ai/ai-developers-google-cloud-workspace/
Fine-tuning: unknown
Paper: https://arxiv.org/abs/2204.02311
Pricing: unknown
Endpoints: unknown
Privacy: unknown
Amazon Titan
Parameters: unknown
Availability: Announced but not yet available - https://aws.amazon.com/bedrock/titan/
Fine-tuning: unknown
Paper:
Pricing: unknown
Endpoints: unknown
Privacy: unknown
Cohere
Parameters: 52B
Availability: GA
Fine-tuning:
Paper:
Pricing: https://cohere.com/pricing
Endpoints: A variety of endpoints including embedding, text completion, classification, summarisation, tokenisation, and language detection.
Privacy: Data submitted is used to train models - https://cohere.com/terms-of-use
Granite
Parameters: 13B, 20B
Availability: GA (granite.13b - .instruct, .chat; granite.20b.code - .ansible, .cobol; other variants on roadmap)
Fine-tuning: Currently Prompt Tuning via APIs
Paper: https://www.ibm.com/downloads/cas/X9W4O6BM
Pricing: https://www.ibm.com/products/watsonx-ai/pricing
Endpoints: Various endpoints - Q&A; Generate; Extract; Summarise; Classify
Privacy: IBM curated 6.48 TB of data before pre-processing, 2.07 TB after pre-processing. 1T tokens generated from a total of 14 datasets. Detail in paper. Prompt data is not saved by IBM for other training purposes. Users have complete control in their storage area of any saved prompts, prompt sessions, or prompt tuned models.
Training cost: granite.13b trained on 256 A100 for 1056 GPU hours.
Legal: IBM indemnifies customer use of these models on the watsonx platform