Issues with VAD #23

Open
hunterua opened this issue Mar 21, 2024 · 9 comments

Comments

@hunterua

Hello there!
First of all, thank you very much for such great work! This is a really cool project.
I'm currently working on a hobby project that requires a voice assistant. I'm using an esp32-s3-devkitc-1 board.
If I use this part of the config:

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main

    components: [ adf_pipeline, i2s_audio ]

The media_player works great, but the mic (INMP411) does not: VA hangs forever in the WAITING_FOR_VAD state.
I have to really SHOUT a few centimetres from the mic to get VA into the streaming state. VA then completes the pipeline successfully and goes back to the WAITING_FOR_VAD state.

The VA config option "vad_threshold: 3" is not allowed :(

If I change the config to the following:

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: 13-apply-idf_v44_freertospatch-at-build-time

    components: [ adf_pipeline, i2s_audio ]

Then VAD is deactivated entirely and VA works smoothly. But it constantly streams the sound from the mic, which is also not great in a quiet room.

Later, I realised that this repo also includes a voice_assistant component, so I tried this config:

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main
    components: [ adf_pipeline, i2s_audio, voice_assistant ]

voice_assistant:
  id: va
  microphone: adf_microphone 
  # speaker: external_speaker
  media_player: adf_media_player
  use_wake_word: true
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 3.0
  vad_threshold: 0


I still basically have to shout into the mic...

I don't know if this is a mic issue or something else, but other project examples that use VAD worked much more smoothly and reacted better when sound appeared in a quiet room.

Thanks again for the great job!

@gnumpi
Owner

gnumpi commented Mar 21, 2024

hey, thanks for reporting! Actually I ran into that problem earlier as well. I disabled the VAD again for now and will check why it is not working. Please try the latest version in the main branch.

@hunterua
Author

Hi!
Yes, I see the changes; it's working without VAD right now on the main branch.
Any idea what the issue with the VAD threshold is?

@gnumpi
Owner

gnumpi commented Mar 21, 2024

not yet ;)

@gnumpi
Owner

gnumpi commented Mar 23, 2024

I think I found the problem. Could you check whether my modification helps for you? It's in a separate branch called 'vad-test':

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: vad-test
    components: [ adf_pipeline, i2s_audio, voice_assistant ]

Also, if you are using the INMP411, please make sure to set bits_per_sample to 32bit:

adf_pipeline:
  - platform: i2s_audio
    type: sink
    id: adf_i2s_out
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO10

  - platform: i2s_audio
    type: source
    id: adf_i2s_in
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO4
    channel: left
    sample_rate: 16000
    bits_per_sample: 32bit


microphone:
  - platform: adf_pipeline
    id: adf_microphone
    pipeline:
      - adf_i2s_in
      - self
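
The snippet above refers to two I2S bus ids, i2s_in and i2s_out, that are not defined anywhere in this thread. A minimal sketch of what that block might look like, assuming the usual ESPHome i2s_audio keys (i2s_lrclk_pin, i2s_bclk_pin) also apply to this external component. The pin numbers are placeholders and must be replaced with the actual wiring:

```yaml
# Sketch only: the ids match the adf_pipeline config above.
# All pin numbers are placeholders (assumptions), not values from this thread.
i2s_audio:
  - id: i2s_out              # bus used by the speaker/DAC sink
    i2s_lrclk_pin: GPIO11    # placeholder
    i2s_bclk_pin: GPIO12     # placeholder
  - id: i2s_in               # bus used by the microphone source
    i2s_lrclk_pin: GPIO5     # placeholder
    i2s_bclk_pin: GPIO6      # placeholder
```

Using two separate buses keeps the microphone's 32bit setting independent of the output side.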

@hunterua
Author

Hi!
Thank you for the response!
Yes, I can confirm it's working much better. I would say "as expected".
However, in my config I also had:

micro_wake_word:
  model: github://esphome/micro-wake-word-models/models/okay_nabu.json
  on_wake_word_detected:
   - logger.log: ">>>>> WAKE DETECTED >>>>>>>"
   - media_player.stop

which has stopped working, as far as I can see ((
Maybe it's because of the 32bit setting on the microphone, or something else...

@gnumpi
Owner

gnumpi commented Mar 23, 2024

good to hear that VAD is working now, but MWW should work as well. Actually, both should work with 32 bit. I will have a look ...

@gnumpi
Owner

gnumpi commented Mar 24, 2024

I tried the VAD branch with MWW and 32bit mic settings and it works just fine. Can you give some more details about the problems you are running into?

@hunterua
Author

Hi there!
Yes, I realised that I was in the middle of project code changes and MWW was disabled. I can confirm that MWW works with the VAD branch. However, maybe I just didn't notice this before, but when VA is active and streaming the microphone to the server, MWW does not trigger. I'm not sure if that's the correct behaviour.
Thank you anyway!

@gnumpi
Owner

gnumpi commented Mar 25, 2024

yes, that's expected behaviour. As soon as MWW detects the wake word, it goes idle and waits for a start command before it listens for wake words again. Usually the VA is configured so that it restarts MWW when it is done:

micro_wake_word:
  model: okay_nabu
  on_wake_word_detected:
      - media_player.stop:
      - light.turn_on:
          id: led
          effect: "Slow Pulse"
      - voice_assistant.start:

voice_assistant:
  microphone: adf_microphone
  media_player: adf_media_player
  use_wake_word: false

  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 4.0

  on_client_connected:
    - lambda: id(init_in_progress) = false;
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start:
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};

  on_client_disconnected:
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - voice_assistant.stop
    - micro_wake_word.stop

  on_end:
      then:
        - light.turn_off:
            id: led
        - voice_assistant.stop
        - wait_until:
            not:
              voice_assistant.is_running:
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - micro_wake_word.start:
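
The example above relies on several names that are not defined in this thread: the substitutions `voice_assist_*_phase_id`, the globals `init_in_progress` and `voice_assistant_phase`, and a switch with the id `use_wake_word`. A rough sketch of how they could be declared, assuming standard ESPHome substitutions, globals and a template switch. The numeric phase values are arbitrary placeholders, and the `led` light with its "Slow Pulse" effect is left out because it depends on the hardware:

```yaml
# Sketch only: supporting definitions assumed by the example above.
# The phase numbers are placeholders; any distinct integers would do.
substitutions:
  voice_assist_idle_phase_id: "1"
  voice_assist_not_ready_phase_id: "10"
  voice_assist_muted_phase_id: "12"

globals:
  - id: init_in_progress        # set to false in on_client_connected
    type: bool
    restore_value: no
    initial_value: "true"
  - id: voice_assistant_phase   # holds one of the phase ids above
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}

switch:
  - platform: template
    id: use_wake_word           # checked before micro_wake_word.start
    name: Use wake word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
```

With the switch on, the on_end handler restarts micro_wake_word once voice_assistant has stopped running, which is what re-arms wake word detection after each interaction.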
