- Node.js Express server app that faithfully mocks all important OpenAI endpoints (chat, embeddings, image, audio, and models).
- All existing SDKs and apps should work as-is by changing just the endpoint and, optionally, the API key (check `apiKeys` in `config.yaml`).
- Helpful for saving on LLM API call costs while developing/testing agents or building other apps. Also helpful for returning reproducible and configurable outputs during development/testing.
- All models, their configurations, sample responses, and server settings are configurable through `config.yaml`.
Supported endpoints:
- `/chat/completions`
- `/embeddings`
- `/images/generations`
- `/images/variations`
- `/images/edits`
- `/audio/speech`
- `/audio/transcriptions`
- `/audio/translations`
- `/models`
- `/models/{model}`
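For example, once the server is running (setup below), the models listing can be smoke-tested with a plain `curl`:

```bash
# Quick smoke test against the mock server's models endpoint
curl http://localhost:8080/v1/models
```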
You most probably are not using these (currently unsupported):
- Logits, modalities, prediction, and `response_format -> json_schema` in `/chat/completions`.
- Only the `aac` output format is NOT supported in the `/audio/speech` endpoint. All the rest ARE supported (this is `ffmpeg`-related, not a limitation of the code).
- Clone the repo (`git clone https://github.com/freakynit/mock-openai-server.git`), `cd` into it, and do `npm i`.
- Install the `ffmpeg` binary separately if the audio endpoints are not working (should not be needed, though).
- Start the server: `node src/server.js` or `npm run server`.
- Set the API base (and optionally the API key) in your client apps or OpenAI SDKs to `http://localhost:8080/v1` (see the sketch after this list).
- Check out `src/examples.js` for example usages of all supported endpoints (start looking from the bottom of the file).
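As a minimal sketch, here is how the official `openai` Node.js SDK can be pointed at the mock server; the model name and API key below are placeholders (use any key listed under `apiKeys`, or anything at all when `apiKeys` is empty):

```js
import OpenAI from "openai";

// Point the SDK at the mock server instead of api.openai.com.
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1", // note the trailing /v1
  apiKey: "dummy-key", // must match an entry in apiKeys, if any are configured
});

const completion = await client.chat.completions.create({
  model: "chatgpt-4o-latest", // any chat model declared in config.yaml
  messages: [{ role: "user", content: "Hello, mock server!" }],
});

console.log(completion.choices[0].message.content);
```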
This configuration file (`config.yaml`) defines server settings, model configurations, and supported functionalities for text, image, audio, and embeddings. The sections are detailed below.
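As a rough sketch of the shape this implies: the documented key names below are real, but the nesting and the delay-block key names are assumptions, not verified against the repo:

```yaml
# Illustrative only -- nesting and the responseDelay key names are assumed.
publicFilesDirectory: public
server:
  host: localhost
  port: 8080
organizationName: my sample org
apiKeys: []              # leave empty to allow anonymous access
responseDelay:
  enabled: true
  minMs: 100
  maxMs: 500
modelConfigs:
  chat: {}               # chat models, tools, and sample responses
  # ...image, audio, and embeddings sections follow
```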
- General Settings:
  - `publicFilesDirectory`: Directory for storing public/generated files (`public`).
  - `server`: Host and port configuration (`localhost:8080`).
  - `organizationName`: Name of the organization (`my sample org`).
  - `apiKeys`: List of API keys used for authorization. Leave it empty to allow anonymous access.
- Response Delay:
  - Option to enable/disable simulated response delays.
  - Configurable delay range in milliseconds.
- Model Configurations:
  - Chat Models: Multiple text-based models (e.g., `chatgpt-4o-latest`, `model-2`). Add more if needed. Maximum token limits and sample responses are defined.
  - Visual Language Models (VLM):
    - Image generation models (e.g., `dall-e-2`, `dall-e-3`) with configurations for resolution, quality, and styles.
    - Support for generating and editing images, with sample responses provided.
  - Audio Models:
    - Text-to-speech (TTS) models with voice options, supported formats, and duration limits.
    - Audio transcription and translation with sample responses and supported formats (e.g., `json`, `text`).
  - Embeddings:
    - Text embedding models with maximum token and dimension limits, supporting different encoding formats.
- Functionalities:
  - Text-based tools (e.g., summation, weather fetching, string reversal) using regex-based triggers.
  - Extensive support for generating sample responses in text, JSON, and media (images/audio).
  - Unlike other sample responses, tool use requires one small extra step: you just need to declare the regex that will be matched against the prompt when tool use is requested.
  - See `modelConfigs -> chat -> tools` in `config.yaml` for examples (they are simple enough). Sample (a client-side sketch follows it):
```yaml
tools:
  functions:
    - functionName: "calculate_sum"
      arguments: { "a": 10, "b": 20 }
      regexToMatchAgainstPrompt: |
        .*sum.*
    - functionName: "fetch_weather"
      arguments: { "location": "Boston, MA", "unit": "imperial" }
      regexToMatchAgainstPrompt: |
        .*weather.*
```
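And a minimal client-side sketch of triggering the `calculate_sum` tool above, assuming the mock follows OpenAI's standard `tool_calls` response shape for prompts that match the declared regex:

```js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "dummy-key",
});

// The prompt matches the .*sum.* regex declared in config.yaml,
// so the mock should answer with a calculate_sum tool call.
const response = await client.chat.completions.create({
  model: "chatgpt-4o-latest",
  messages: [{ role: "user", content: "What is the sum of 10 and 20?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "calculate_sum",
        parameters: {
          type: "object",
          properties: { a: { type: "number" }, b: { type: "number" } },
        },
      },
    },
  ],
});

console.log(response.choices[0].message.tool_calls);
```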
- Except for the `/embeddings` endpoint, all endpoints allow specifying sample responses in `config.yaml`.
- For `/embeddings`, responses are always generated on the fly, but the server makes sure to generate the same tokens for the same given input (see the sketch after this list).
- Media endpoints (`/audio/speech` and `/images/{generations,variations,edits}`) also support generating media dynamically. Use `generationFrom: generated` for these if needed.
- The `/models` endpoint simply traverses all models listed in `config.yaml` across the different modalities and returns them.
- For adding new response strategies, look into the relevant source files under the `src/generators` path. You just need to implement one simple method for all of them except `/chat`, which needs two.
- When using dynamically generated media, these files are generated in `src/public` (more precisely: `src/${publicFilesDirectory}`) by default and are NOT cleaned up.
- Do not forget to include `/v1` in your client SDKs or apps when configuring the mock server endpoint.
- The code could use some good refactoring... inputs and contributions are welcome.
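For example, a quick sketch verifying the deterministic `/embeddings` behavior described above; the model name and API key are placeholders:

```js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "dummy-key",
});

// Same input twice: the mock should return identical vectors each time.
const input = "reproducible embeddings";
const [first, second] = await Promise.all([
  client.embeddings.create({ model: "text-embedding-3-small", input }),
  client.embeddings.create({ model: "text-embedding-3-small", input }),
]);

const same = JSON.stringify(first.data[0].embedding) ===
             JSON.stringify(second.data[0].embedding);
console.log(`Identical embeddings for identical input: ${same}`);
```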
Make sure to keep the port the same as what's mentioned in `config.yaml`:

```bash
docker build -t mock-openai-server --build-arg PORT=8080 .
docker run -p 8080:8080 mock-openai-server
```
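If you change the port in `config.yaml`, pass the same value to the build and publish it accordingly, e.g. (9090 here is a hypothetical port):

```bash
docker build -t mock-openai-server --build-arg PORT=9090 .
docker run -p 9090:9090 mock-openai-server
```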