Added documentation and fixed sample for Windows.
Uralstech committed Jun 25, 2024
1 parent 574490d commit b688c7b
Showing 14 changed files with 291 additions and 112 deletions.
40 changes: 39 additions & 1 deletion README.md
@@ -1,3 +1,41 @@
## UBhashini

A C# wrapper for the ULCA Bhashini API.

### Installation

This *should* work on any reasonably modern Unity version. Built and tested in Unity 2022.3.29f1.

#### From OpenUPM Through Unity Package Manager

1. Open project settings
2. Select `Package Manager`
3. Add the OpenUPM package registry:
- Name: `OpenUPM`
- URL: `https://package.openupm.com`
- Scope(s)
- `com.uralstech`
- *`com.utilities`
4. Open the Unity Package Manager window (`Window` -> `Package Manager`)
5. Change the registry from `Unity` to `My Registries`
6. Add the `UBhashini`, *`Utilities.Encoder.Wav` and *`Utilities.Audio` packages

#### From GitHub Through Unity Package Manager

1. Open the Unity Package Manager window (`Window` -> `Package Manager`)
2. Select the `+` icon and `Add package from git URL...`
3. Paste the UPM branch URL and press enter:
- `https://github.com/Uralstech/UBhashini.git#upm`

*\*Adding additional dependencies:*<br/>
Follow the steps detailed in the OpenUPM installation method and only install the *`Utilities.Encoder.Wav` and *`Utilities.Audio` packages.

*Optional, but required if you don't want to encode your AudioClips into Base64 strings manually, or if you want to use the samples.

### Documentation

See <https://github.com/Uralstech/UBhashini/blob/master/UBhashini/Packages/com.uralstech.ubhashini/Documentation~/README.md>.

---

Made with the help of the [*great documentation by Himanshu Gupta!*](https://bhashini.gitbook.io/bhashini-apis)
9 changes: 0 additions & 9 deletions UBhashini/Packages/com.uralstech.ubhashini/CHANGELOG.md

This file was deleted.

116 changes: 114 additions & 2 deletions UBhashini/Packages/com.uralstech.ubhashini/Documentation~/README.md
@@ -1,3 +1,115 @@
## UBhashini Documentation

### Setup

Add an instance of `BhashiniApiManager` to your scene, and set it up with your ULCA user ID and API key, as detailed in the [*Bhashini documentation*](https://bhashini.gitbook.io/bhashini-apis/pre-requisites-and-onboarding).

### Pipelines

From the [*Bhashini documentation*](https://bhashini.gitbook.io/bhashini-apis):
> ULCA Pipeline is a set of tasks that any specific pipeline supports. For example, any specific pipeline (identified by unique pipeline ID) can support the following:
>
> - only ASR (Speech To Text)
> - only NMT (Translate)
> - only TTS
> - ASR + NMT
> - NMT + TTS
> - ASR + NMT + TTS
>
> Our R&D institutes can create pipelines using any of the available models on ULCA.

In short, computation (STT, Translate, TTS) is done on a "pipeline". A pipeline is configured to support a list of tasks in a defined order, like:

- (input: audio) STT -> Translate (output: text)
- (input: text) Translate -> TTS (output: audio)

In the given examples:

- Case 1 (STT -> Translate): From the given audio clip, the STT model computes text, which is sent automatically to the translate model, and text is returned.
- Case 2 (Translate -> TTS): From the given text, the translate model computes text, which is sent automatically to the TTS model, and audio is returned.

You can have any combination of these tasks, or just individual ones. You can even have tasks like:

- STT -> Translate -> TTS!

#### Code

So, before we do any computation, we have to set up our pipelines:

```csharp
using Uralstech.UBhashini;
using Uralstech.UBhashini.Data;

// This example shows a pipeline configured for a set of tasks which will receive spoken English audio
// as input, transcribe and translate it to Hindi, and finally convert the text to spoken Hindi audio.
BhashiniPipelineConfigResponse response = await BhashiniApiManager.Instance.ConfigurePipeline(new BhashiniPipelineTask[]
{
BhashiniPipelineTask.GetConfigurationTask(BhashiniPipelineTaskType.SpeechToText, "en"), // Here, "en" is the source language.
BhashiniPipelineTask.GetConfigurationTask(BhashiniPipelineTaskType.TextTranslation, "en", "hi"), // Here, "en" is still the source language, but "hi" is the target language.
BhashiniPipelineTask.GetConfigurationTask(BhashiniPipelineTaskType.TextToSpeech, "hi"), // Here, the source language is "hi".
});
```

The Bhashini API follows the [*ISO-639*](https://www.loc.gov/standards/iso639-2/php/code_list.php) standard for language codes.

The API wrapper class, `BhashiniApiManager`, usually returns `null` if a request fails. Check the debug console or logs for errors in such cases.
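
For example, a minimal sketch of guarding against a failed configuration request (`tasks` here stands in for the task array from the snippet above):

```csharp
// ConfigurePipeline returns null on failure; bail out early and check the
// console for the error that was logged.
BhashiniPipelineConfigResponse response = await BhashiniApiManager.Instance.ConfigurePipeline(tasks);
if (response == null)
{
    Debug.LogError("Pipeline configuration failed! See the previous logs for details.");
    return;
}
```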

Now, we store the computation inference data in variables:

```csharp
BhashiniPipelineInferenceData _inferenceData = response.PipelineEndpoint;

BhashiniPipelineData _sttData = response.PipelineResponseConfig[0].Data[0];
BhashiniPipelineData _translateData = response.PipelineResponseConfig[1].Data[0];
BhashiniPipelineData _ttsData = response.PipelineResponseConfig[2].Data[0];
```

Here, since we specified the expected source and target languages for each task in the pipeline, we know the order of the configurations in `PipelineResponseConfig`.
This may not always be the case, so it is recommended to check the array of configurations for the desired model(s).
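
As a defensive example, a hedged sketch that checks the configuration before indexing (assuming `Data` is an array, as the indexing above suggests):

```csharp
// Verify that the speech-to-text task actually returned at least one model
// before blindly taking index 0.
if (response.PipelineResponseConfig[0].Data.Length == 0)
{
    Debug.LogError("No speech-to-text model was returned for this pipeline!");
    return;
}

BhashiniPipelineData sttData = response.PipelineResponseConfig[0].Data[0];
```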

### Computation

Now that we have the inference data and pipelines configured, we can go straight into computation.

#### Code

```csharp
// _audioClip (the recorded speech input) and _audioSource (for playback)
// are assumed to be assigned elsewhere, e.g., through the Inspector.

BhashiniPipelineTask[] tasks = new BhashiniPipelineTask[]
{
_sttData.GetSpeechToTextTask(),
_translateData.GetTextTranslateTask(),
_ttsData.GetTextToSpeechTask(BhashiniVoiceType.Male),
};

BhashiniComputeResponse response = await BhashiniApiManager.Instance.ComputeOnPipeline(_inferenceData, tasks, audioSource: _audioClip);

AudioClip result = await response.GetTextToSpeechResult();
_audioSource.PlayOneShot(result);
```

`ComputeOnPipeline` accepts three optional parameters:
- `textSource` - This is for text-input-based tasks, like Translate or TTS.
- `audioSource` - This is for audio-input-based tasks, like STT. This parameter also requires the `Utilities.Encoder.Wav` and `Utilities.Audio` packages.
- `rawBase64AudioSource` - This is also for audio-input-based tasks, but takes the raw Base64-encoded audio data. You will have to encode your audio manually.

Provide only one of these parameters per call, based on the first task given to the pipeline.
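
For instance, a sketch of a text-input call for a Translate -> TTS pipeline, reusing the task data from earlier; whether the same configured pipeline can be invoked with just a subset of its tasks is an assumption here:

```csharp
// Translate -> TTS: the first task takes text, so only textSource is set.
BhashiniPipelineTask[] textTasks = new BhashiniPipelineTask[]
{
    _translateData.GetTextTranslateTask(),
    _ttsData.GetTextToSpeechTask(BhashiniVoiceType.Male),
};

BhashiniComputeResponse textResponse = await BhashiniApiManager.Instance.ComputeOnPipeline(_inferenceData, textTasks, textSource: "Hello!");
```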

Also, `GetSpeechToTextTask` takes an optional `sampleRate` argument. By default, it is 44100, but make sure it matches your audio data.
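
For example, to keep the rates in sync (`AudioClip.frequency` is standard Unity API):

```csharp
// Pass the clip's actual sample rate instead of relying on the 44100 default.
BhashiniPipelineTask sttTask = _sttData.GetSpeechToTextTask(sampleRate: _audioClip.frequency);
```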

`BhashiniComputeResponse` contains three utility functions to help extract the actual text or audio response:
- `GetSpeechToTextResult`
- `GetTextTranslateResult`
- `GetTextToSpeechResult`

You should call them based on the last task in the pipeline's task list: if your pipeline's last task is STT, use `GetSpeechToTextResult`; if the last task is translate, use `GetTextTranslateResult`; and so on.
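
For example, a hedged sketch for a pipeline that ends with a translate task; the exact return type of `GetTextTranslateResult` is an assumption here, not confirmed above:

```csharp
// Given a compute response whose pipeline ended with text translation,
// GetTextTranslateResult is the matching getter. Its string return type
// is assumed for illustration.
string translatedText = response.GetTextTranslateResult();
Debug.Log(translatedText);
```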

`ComputeOnPipeline` and `GetTextToSpeechResult` will throw a `BhashiniAudioIOException` if they encounter an unsupported audio format.

---

And that's it! You've learnt how to use the Bhashini API in Unity!


@@ -2633,7 +2633,7 @@ MonoBehaviour:
m_GameObject: {fileID: 1214495822}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 44079af7b9d52724eb845b2b44230d8f, type: 3}
m_Script: {fileID: 11500000, guid: 578eb5db91665c74d8f273702776cc17, type: 3}
m_Name:
m_EditorClassIdentifier:
_audioSource: {fileID: 1214495828}

