[Python] Recording Only Spoken Audio with PyAudio

September 12, 2024

PyAudio lets you record audio from Python.
In this article, we use the PyAudio library to write a program that records only the spoken part of the microphone input.

やすひら

This article shows how to record only spoken audio with PyAudio.

What you'll learn in this article

Recording audio with the PyAudio library
How to tell speech from silence

Installing the libraries

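Install the required libraries: PortAudio with apt, then the Python packages with pip.

Command line

$ sudo apt install portaudio19-dev
$ pip install pyaudio
$ pip install numpy

portaudio19-dev provides the PortAudio development files that PyAudio is compiled against. The wave module is part of Python's standard library, so it does not need to be installed with pip.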

Troubleshooting

Installing pyaudio on its own, without the PortAudio development package, can fail with a build error.

Example command output

$ pip install pyaudio
Defaulting to user installation because normal site-packages is not writeable
Collecting pyaudio
  Downloading PyAudio-0.2.14.tar.gz (47 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.1/47.1 KB 508.5 kB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: pyaudio
  Building wheel for pyaudio (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for pyaudio (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-aarch64-3.10
      creating build/lib.linux-aarch64-3.10/pyaudio
      copying src/pyaudio/__init__.py -> build/lib.linux-aarch64-3.10/pyaudio
      running build_ext
      creating build/temp.linux-aarch64-3.10
      creating build/temp.linux-aarch64-3.10/src
      creating build/temp.linux-aarch64-3.10/src/pyaudio
      aarch64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/local/include -I/usr/include -I/usr/include/python3.10 -c src/pyaudio/device_api.c -o build/temp.linux-aarch64-3.10/src/pyaudio/device_api.o
      src/pyaudio/device_api.c:9:10: fatal error: portaudio.h: No such file or directory
          9 | #include "portaudio.h"
            |          ^~~~~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/aarch64-linux-gnu-gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pyaudio
Failed to build pyaudio
ERROR: Could not build wheels for pyaudio, which is required to install pyproject.toml-based projects

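The log shows the cause: the compiler could not find the PortAudio header ("portaudio.h: No such file or directory"). In my environment, installing portaudio19-dev beforehand resolved the error.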
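Before running pip install pyaudio, you can check whether the prerequisites are in place. The following is a minimal sketch that uses only the standard library. check_pyaudio_prereqs is a hypothetical helper, and the header path /usr/include/portaudio.h is an assumption based on a Debian/Ubuntu system, where portaudio19-dev installs the header there.

```python
import ctypes.util
import importlib.util
import os

# Assumption: Debian/Ubuntu layout, where portaudio19-dev installs the header here.
PORTAUDIO_HEADER = "/usr/include/portaudio.h"


def check_pyaudio_prereqs(header_path=PORTAUDIO_HEADER):
    """Report whether the pieces needed to build and import PyAudio are present."""
    return {
        # A missing header causes "portaudio.h: No such file or directory".
        "portaudio_header": os.path.exists(header_path),
        # Shared library that the compiled PyAudio extension links against.
        "portaudio_library": ctypes.util.find_library("portaudio") is not None,
        # Whether the pyaudio package itself is already importable.
        "pyaudio_installed": importlib.util.find_spec("pyaudio") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_pyaudio_prereqs().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```

If portaudio_header reports "missing", installing portaudio19-dev before pip install pyaudio should avoid the build error above.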

Recording only spoken audio with PyAudio

Now let's record only the spoken audio with PyAudio.

Source code

import pyaudio
import wave
import numpy as np

# Recording settings
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 1024
SILENCE_THRESHOLD = 500  # silence threshold (mean absolute amplitude of a chunk)
SILENCE_DURATION = 30  # consecutive silent chunks before stopping (30 * 1024 / 44100, about 0.7 s)

# Record from the first detected voice until a run of silence
def record_audio():
    audio = pyaudio.PyAudio()

    stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                        input=True, frames_per_buffer=CHUNK)

    print("Recording started...")

    frames = []
    silence_frames = 0
    recording = False

    try:
        while True:
            # Don't raise if the input buffer overflows while we process a chunk
            data = stream.read(CHUNK, exception_on_overflow=False)
            audio_data = np.frombuffer(data, dtype=np.int16)

            # Decide whether the chunk is silent: widen to int32 first,
            # because abs(-32768) overflows in int16
            volume = np.abs(audio_data.astype(np.int32)).mean()

            if volume > SILENCE_THRESHOLD:
                if not recording:
                    print("Voice detected, recording...")
                    recording = True
                frames.append(data)
                silence_frames = 0
            elif recording:
                silence_frames += 1
                if silence_frames > SILENCE_DURATION:
                    print("Silence detected, stopping...")
                    break
                # Keep short pauses; silence before the first voice is discarded
                frames.append(data)
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()

    # Save the recording as a WAV file
    with wave.open("output.wav", "wb") as wf:
        wf.setnchannels(CHANNELS)
        # Module-level helper, so it does not depend on the terminated PyAudio instance
        wf.setsampwidth(pyaudio.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))

    print("Recording finished and saved as 'output.wav'")

if __name__ == "__main__":
    record_audio()
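The voice/silence logic can be tried without a microphone. The sketch below replays synthetic chunks through the rule used in record_audio: mean absolute amplitude compared with SILENCE_THRESHOLD, chunks before the first voiced chunk discarded, and a stop after more than SILENCE_DURATION consecutive silent chunks. The helpers gate_chunks, is_voiced, chunks_to_seconds, and seconds_to_chunks are hypothetical and are not part of PyAudio. The sketch also shows that SILENCE_DURATION is a chunk count, so the amount of silence needed to stop depends on RATE and CHUNK.

```python
import math

import numpy as np

# Values copied from the article's settings
RATE = 44100
CHUNK = 1024
SILENCE_THRESHOLD = 500
SILENCE_DURATION = 30


def chunks_to_seconds(n_chunks, rate=RATE, chunk=CHUNK):
    """Duration covered by n_chunks chunks of `chunk` samples at `rate` Hz."""
    return n_chunks * chunk / rate


def seconds_to_chunks(seconds, rate=RATE, chunk=CHUNK):
    """Smallest chunk count that covers at least `seconds` of audio."""
    return math.ceil(seconds * rate / chunk)


def is_voiced(samples, threshold=SILENCE_THRESHOLD):
    """True if the chunk's mean absolute amplitude exceeds the threshold."""
    # Widen to int32 first: in int16, abs(-32768) wraps back to -32768.
    return bool(np.abs(samples.astype(np.int32)).mean() > threshold)


def gate_chunks(chunks, threshold=SILENCE_THRESHOLD, max_silence=SILENCE_DURATION):
    """Return the indices of the chunks that would be kept (hypothetical helper)."""
    kept = []
    recording = False
    silence = 0
    for i, samples in enumerate(chunks):
        if is_voiced(samples, threshold):
            recording = True
            silence = 0
            kept.append(i)
        elif recording:
            silence += 1
            if silence > max_silence:
                break  # too much consecutive silence: stop
            kept.append(i)  # short pause inside or after speech
        # Silent chunks before the first voiced chunk are skipped.
    return kept


if __name__ == "__main__":
    quiet = np.zeros(CHUNK, dtype=np.int16)
    loud = np.full(CHUNK, 1000, dtype=np.int16)
    # 5 silent chunks, 10 voiced chunks, then 40 silent chunks
    kept = gate_chunks([quiet] * 5 + [loud] * 10 + [quiet] * 40)
    print(f"kept chunks {kept[0]}..{kept[-1]} ({len(kept)} chunks)")  # kept chunks 5..44 (40 chunks)
    print(f"{SILENCE_DURATION} chunks = {chunks_to_seconds(SILENCE_DURATION):.3f} s")  # 30 chunks = 0.697 s
    print(f"chunks for 1 s = {seconds_to_chunks(1.0)}")  # chunks for 1 s = 44
```

To wait for roughly one second of silence instead, SILENCE_DURATION could be set to seconds_to_chunks(1.0) under these settings.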

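numpy is used to decide, chunk by chunk, whether the input is silent or voiced: the mean absolute amplitude of each 1024-sample chunk is compared with SILENCE_THRESHOLD.
Once voice is detected, the chunks read by PyAudio are kept until SILENCE_DURATION consecutive silent chunks have passed.
The kept chunks are then written to a WAV file with the wave module.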
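To check the saved file, you can read it back with the same wave module. This is a minimal sketch with hypothetical helpers save_wav and describe_wav. It assumes 16-bit mono PCM, which is a sample width of 2 bytes and matches pyaudio.paInt16. It also uses a generated 440 Hz tone instead of microphone input, so it runs without PyAudio.

```python
import os
import tempfile
import wave

import numpy as np

CHANNELS = 1
RATE = 44100
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit PCM (pyaudio.paInt16)


def save_wav(path, frames, channels=CHANNELS, rate=RATE, sampwidth=SAMPLE_WIDTH):
    """Write a list of raw byte chunks as a PCM WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))


def describe_wav(path):
    """Read back the header fields and compute the duration."""
    with wave.open(path, "rb") as wf:
        n = wf.getnframes()
        return {
            "channels": wf.getnchannels(),
            "sampwidth": wf.getsampwidth(),
            "rate": wf.getframerate(),
            "frames": n,
            "seconds": n / wf.getframerate(),
        }


if __name__ == "__main__":
    # 0.5 s of a 440 Hz tone standing in for recorded speech
    t = np.arange(int(RATE * 0.5)) / RATE
    samples = (0.3 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
    # Split into 1024-sample byte chunks, like the frames list in record_audio()
    frames = [samples[i:i + 1024].tobytes() for i in range(0, len(samples), 1024)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tone.wav")
        save_wav(path, frames)
        print(describe_wav(path))
        # {'channels': 1, 'sampwidth': 2, 'rate': 44100, 'frames': 22050, 'seconds': 0.5}
```

The frame count divided by the frame rate gives the recorded duration, which is a quick way to confirm that only the spoken part was saved.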

Summary

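This article showed how to record only spoken audio with PyAudio.

To record only spoken audio:

numpy decides whether each chunk is silent or voiced
PyAudio captures the audio from the microphone
the wave module writes the result to an audio file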

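When building a voice assistant, you want the user's speech, and only the speech, as the input.
With speech/silence detection, you can build an application in which an AI responds to what the speaker said.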
