Common Python Audio Libraries

Published July 10, 2023

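Some time ago I worked on a Python project that involved audio: recording, plotting waveforms, and feature analysis. Along the way I tried several audio libraries, and this post is a short summary of them.

sounddevice and soundfile

The Python sounddevice library provides an interface for querying and configuring audio devices, and for audio stream input (recording) and output (playback). Its functions are simple and easy to use; see the official documentation for details. Under the hood, sounddevice uses the PortAudio library, and PortAudio itself supports the native audio interfaces of several platforms, such as ALSA on Linux, Core Audio on macOS, and several APIs on Windows.

sounddevice uses NumPy arrays for both playback and recording; an audio stream in sounddevice is really a stream of NumPy arrays. Working with audio files stored on disk therefore requires converting between NumPy arrays (or streams) and a storage format. This is where another library, soundfile, comes in: it makes it easy to save a NumPy array to an audio file or to load an audio file into a NumPy array. See the soundfile documentation for details. For this reason, sounddevice and soundfile are often used together.

Installation

Install sounddevice:

    pip install sounddevice

Install soundfile:

    pip install soundfile

Device management

Once sounddevice is installed, running the module directly lists the audio devices available on the system. This is also a quick way to check that sounddevice works. On my macOS machine, for example, the result is:

    $ python3 -m sounddevice
    > 0 Built-in Microphone, Core Audio (2 in, 0 out)
    < 1 Built-in Output, Core Audio (0 in, 2 out)

You can also list the audio devices from code by calling query_devices():

    #!/usr/bin/env python3
    import sounddevice as sd
    print(sd.query_devices())

Running the script prints the same list:

    > 0 Built-in Microphone, Core Audio (2 in, 0 out)
    < 1 Built-in Output, Core Audio (0 in, 2 out)

Settings

Before playing or recording audio with sounddevice, you can set the device parameters, including the number of channels, data type, sampling rate, and latency. These settings are stored in sounddevice's default object.

    >>> import sounddevice as sd
    >>> print(sd.default.device)
    [0, 1]
    >>> print(sd.default.samplerate)
    None
    >>> print(sd.default.channels)
    [None, None]
    >>> print(sd.default.dtype)
    ['float32', 'float32']
    >>> print(sd.default.latency)
    ['high', 'high']

Playback and recording

sounddevice offers three styles of playback and recording interface.

The simplest form is play(), rec(), and playrec(), which play or record data given as NumPy arrays.

    play(data, samplerate=None, mapping=None, blocking=False, loop=False, **kwargs)
        Play back a NumPy array containing audio data.

    rec(frames=None, samplerate=None, channels=None, dtype=None, out=None, mapping=None, blocking=False, **kwargs)
        Record audio data into a NumPy array.

    playrec(data, samplerate=None, channels=None, dtype=None, out=None, input_mapping=None, output_mapping=None, blocking=False, **kwargs)
        Simultaneous playback and recording of NumPy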
        arrays.

The second form is audio streams of NumPy arrays: Stream, InputStream, and OutputStream. Streams also support a callback-based mode.

The third form is raw streams that work on plain buffer data: RawStream, RawInputStream, and RawOutputStream.

The sounddevice documentation includes plenty of examples. Copying one and adapting it slightly is usually enough for a real project, so I will not give examples here.

PyAudio

PyAudio is another PortAudio-based audio I/O library for Python, with functionality similar to sounddevice. It does its work mainly through two classes, PyAudio and Stream.

Installation

    pip install pyaudio

Usage

The PyAudio class is mainly used to initialize the audio library, manage audio devices, and open and close audio streams.

    class PyAudio()
     |  Python interface to PortAudio. Provides methods to:
     |   - initialize and terminate PortAudio
     |   - open and close streams
     |   - query and inspect the available PortAudio Host APIs
     |   - query and inspect the available PortAudio audio
     |     devices
     |
     |  Use this class to open and close streams.
     |
     |  **Stream Management**
     |    :py:func:`open`, :py:func:`close`
     |
     |  **Host API**
     |    :py:func:`get_host_api_count`, :py:func:`get_default_host_api_info`,
     |    :py:func:`get_host_api_info_by_type`,
     |    :py:func:`get_host_api_info_by_index`,
     |    :py:func:`get_device_info_by_host_api_device_index`
     |
     |  **Device API**
     |    :py:func:`get_device_count`, :py:func:`is_format_supported`,
     |    :py:func:`get_default_input_device_info`,
     |    :py:func:`get_default_output_device_info`,
     |    :py:func:`get_device_info_by_index`
     |
     |  **Stream Format Conversion**
     |    :py:func:`get_sample_size`, :py:func:`get_format_from_width`

The Stream class represents a single stream and can be used to play, record, or stop an audio stream.

    class Stream()
     |
     |  PortAudio Stream Wrapper. Use :py:func:`PyAudio.open` to make a new
     |  :py:class:`Stream`.
     |
     |  **Opening and Closing**
     |    :py:func:`__init__`, :py:func:`close`
     |
     |  **Stream Info**
     |    :py:func:`get_input_latency`, :py:func:`get_output_latency`,
     |    :py:func:`get_time`, :py:func:`get_cpu_load`
     |
     |  **Stream Management**
     |    :py:func:`start_stream`, :py:func:`stop_stream`, :py:func:`is_active`,
     |    :py:func:`is_stopped`
     |
     |  **Input Output**
     |    :py:func:`write`, :py:func:`read`, :py:func:`get_read_available`,
     |    :py:func:`get_write_available`
     |
     |  Stream(PA_manager, rate, channels, format, input=False, output=False, input_device_index=None, output_device_index=None, frames_per_buffer=1024, star...

The PyAudio documentation also provides basic examples.

librosa

librosa is a powerful Python library for audio processing. It provides the basic building blocks of audio processing, both time-domain and frequency-domain, as well as audio feature extraction and visualization. Its documentation describes the API in detail and includes example code, so only a few commonly used APIs are listed here.

Installation

    pip install librosa

Audio I/O

Loading a file:

    librosa.load(path, sr=22050, mono=True, offset=0.0, duration=None, dtype=<class 'numpy.float32'>, res_type='kaiser_best')

Resampling:

    librosa.resample(y, orig_sr, target_sr, res_type='kaiser_best', fix=True, scale=False, **kwargs)
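As a rough illustration of how the two I/O calls above fit together, here is a minimal sketch. The helper names `load_native`, `to_rate`, and `duration_seconds` are hypothetical and not part of librosa. It assumes a librosa version that accepts `orig_sr` and `target_sr` as keyword arguments (newer releases require them to be passed that way).

```python
import numpy as np


def load_native(path):
    """Load an audio file as mono at its native sampling rate.

    Passing sr=None turns off librosa's default resampling to 22050 Hz.
    """
    import librosa  # imported lazily so the other helpers work without it

    y, sr = librosa.load(path, sr=None, mono=True)
    return y, sr


def to_rate(y, orig_sr, target_sr):
    """Resample y to target_sr; return y unchanged if the rates already match."""
    if orig_sr == target_sr:
        return y
    import librosa

    return librosa.resample(y, orig_sr=orig_sr, target_sr=target_sr)


def duration_seconds(y, sr):
    """Duration of a mono signal in seconds."""
    return len(y) / float(sr)
```

A typical call sequence would be `y, sr = load_native(path)` followed by `y16k = to_rate(y, sr, 16000)`, where `path` points to an audio file of your own.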

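Feature extraction

MFCC:

    librosa.feature.mfcc(y=None, sr=22050, S=None, n_mfcc=20, dct_type=2, norm='ortho', lifter=0, **kwargs)

Mel spectrogram:

    librosa.feature.melspectrogram(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, win_length=None, window='hann', center=True, pad_mode='re...

Zero-crossing rate:

    librosa.feature.zero_crossing_rate(y, frame_length=2048, hop_length=512, center=True, **kwargs)

Data visualization

Time-domain waveform:

    librosa.display.waveplot(y, sr=22050, max_points=50000.0, x_axis='time', offset=0.0, max_sr=1000, ax=None, **kwargs)

Spectrogram:

    librosa.display.specshow(data, x_coords=None, y_coords=None, x_axis=None, y_axis=None, sr=22050, hop_length=512, fmin=None, fmax=None, tuning=...

pyAudioAnalysis

pyAudioAnalysis is another audio analysis library. It can also extract a variety of audio features, and it overlaps with librosa in some functionality. See the project home page for details.

spafe

See the spafe project page for details about the library. The two previous libraries, librosa and pyAudioAnalysis, provide audio processing APIs; for a given piece of audio, the user has to call those APIs to split the audio into frames and then compute features for each frame. spafe, by contrast, is an easier-to-use feature extraction library: a single line of code is enough to extract a feature.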
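To make the "split into frames, then compute a feature per frame" workflow concrete, here is a minimal NumPy-only sketch. It is not the implementation used by librosa, pyAudioAnalysis, or spafe, and the function names `frame_signal` and `zero_crossing_rate` are chosen only for this illustration. Unlike librosa's default (`center=True`), it does not pad the signal, so the numbers will not match librosa's output exactly.

```python
import numpy as np


def frame_signal(y, frame_length=2048, hop_length=512):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Trailing samples that do not fill a whole frame are dropped.
    """
    y = np.asarray(y, dtype=np.float64)
    if len(y) < frame_length:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(y) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)
    # Row i holds samples starts[i] .. starts[i] + frame_length - 1.
    index = starts[:, None] + np.arange(frame_length)[None, :]
    return y[index]


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    negative = np.signbit(frames)
    crossings = negative[:, 1:] != negative[:, :-1]
    return crossings.mean(axis=1)


# One second of a 100 Hz sine sampled at 8 kHz. The small phase offset keeps
# samples away from exact zeros so the sign test is unambiguous.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 100 * t + 0.1)

frames = frame_signal(tone, frame_length=800, hop_length=400)
zcr = zero_crossing_rate(frames)
print(frames.shape, zcr[:3])
```

Each 800-sample frame spans ten periods of the tone, so every frame contains 20 sign changes among its 799 adjacent sample pairs. A library such as spafe wraps this kind of framing step so that the caller does not have to write it.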

Posted by admin. Please credit the source when reposting: http://www.yc00.com/news/1688931161a184769.html