Hearing Aid Research Data Set for Acoustic Environment Recognition (HEAR-DS)
Hearing Aid Research Data Set for Acoustic Environment Recognition
(Andreas Hüwel, Dr. Kamil Adiloğlu and Dr. Jörg-Hendrik Bach), published at ICASSP 2020
Download
HEAR-DS download link
Parts of HEAR-DS
HEAR-DS consists of the following parts; for the licensing of each part, see the LICENSE.txt in its subfolder:
- HEAR-DS/RawAudioCuts
- HEAR-DS/AudioSnippets
- HEAR-DS/Code
For further details, see
HEAR-DS README.txt
"Your browser may have problems with correctly showing the tree structure used in the readme.txt file. For this reason, please download the readme.txt and open it in an appropriate editor."
Acoustic Environments Overview
Background environment | Speech in background environment |
Cocktail party | |
Interfering speakers | |
In traffic | Speech in traffic |
In vehicle | Speech in vehicle |
Music | Speech in music |
Quiet indoors | Speech in quiet indoors |
Reverberant environment | Speech in reverberant environment |
Wind turbulence | Speech in wind turbulence |
Example of Speech in Background SNR Variations
Acoustic Environment | SNR variations |
Speech in vehicle | SNR -10 | SNR -5 | SNR 0 | SNR 5 | SNR 10 |
As described in the paper, some of the audio material comes from third parties and therefore cannot be provided here. However, all of the required data is accessible online, and with the provided scripts anyone can regenerate the whole data set.
Audio for interfering speakers comes from CHiME5, and the speech material mixed into the speech in background environments comes from CHiME2. For CHiME2 (2013) and CHiME5 (2018), please contact the organizers to get access to the data sets. Audio for music comes from GTZAN.
Data and Format
An acoustic environment holds audio from different recording situations. Each recording situation has a unique id (rec_id) and contains one or more recording sessions. From the raw audio of each recording session, we manually cut suitable audio pieces (the cuts) to fill the recording situation with audio material; each cut has a locally unique cut_id. To generate the actual data set for training machine learning systems, we performed a further processing step that produces the 10 s audio samples for every acoustic environment, as described in the section HEAR-DS Audio Samples.
HEAR-DS Raw Audio Cuts
For each recording situation, one folder holds all of its cut WAV files.
Folder structure of HEAR-DS samples:
For details, see
HEAR-DS README.txt
Here, <REC_ID> is a 3-digit number and <CUT_ID> is a 2-digit number. The <DESCRIPTION> could be, for example, "startengine_driveoff" for InVehicle or "bell" for ReverberantEnvironment. <TRACKNAME> stands for one of the hearing aid microphones used [Mic_BTE_L_front, Mic_BTE_L_rear, Mic_BTE_R_front, Mic_BTE_R_rear, Mic_ITC_L, Mic_ITC_R]. <EXPORTFORMAT> is the name of the audio exporter used, currently "raw_48kHz32bit".
HEAR-DS Audio Samples
In this processing step, the raw audio cuts were further sliced into 10 s snippets. These snippets are either used directly as background samples or mixed with random speech at multiple SNRs to create audio samples for the speech in background environments. The binaural speech source material comes from five different directions, from which we choose at random; the start and end times of the source speech and the start time of the background snippet are also randomized. These 10 s samples form the HEAR-DS audio material for training machine learning systems, e.g. as input for the feature extraction step of deep neural networks.
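The slicing and mixing described above can be illustrated with a minimal sketch. It is not the code shipped in HEAR-DS/Code: the function names are hypothetical, the signals are single-channel lists of floats rather than binaural recordings, the cut is split into consecutive non-overlapping snippets, the speech excerpt spans the whole snippet, and the SNR is assumed to be a ratio of mean powers in dB.

```python
import math
import random


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def slice_into_snippets(cut, sample_rate, seconds=10):
    """Split a cut into consecutive, non-overlapping snippets.

    Assumption: a trailing remainder shorter than one snippet is dropped.
    """
    length = seconds * sample_rate
    count = len(cut) // length
    return [cut[i * length:(i + 1) * length] for i in range(count)]


def mix_at_snr(background, speech, snr_db, rng):
    """Add a randomly positioned speech excerpt to a background snippet.

    Assumption: the speech is scaled so that the speech-to-background
    mean power ratio equals snr_db; the background stays unchanged.
    """
    max_start = len(speech) - len(background)
    if max_start < 0:
        raise ValueError("speech must be at least as long as the background")
    start = rng.randint(0, max_start)  # random start time, both ends inclusive
    excerpt = speech[start:start + len(background)]
    # Solve 10 * log10(gain^2 * P_speech / P_background) = snr_db for gain.
    gain = math.sqrt(
        mean_power(background) / mean_power(excerpt) * 10 ** (snr_db / 10)
    )
    return [b + gain * s for b, s in zip(background, excerpt)]


# Toy demonstration with synthetic noise instead of real recordings.
rng = random.Random(0)
sample_rate = 1000  # toy rate to keep the demo fast; HEAR-DS uses 48 kHz and 16 kHz
cut = [rng.gauss(0.0, 1.0) for _ in range(25 * sample_rate)]
speech = [rng.gauss(0.0, 0.5) for _ in range(30 * sample_rate)]

snippets = slice_into_snippets(cut, sample_rate)
mixtures = {
    snr: mix_at_snr(snippets[0], speech, snr, rng)
    for snr in (-10, -5, 0, 5, 10)
}
```

In practice the same arithmetic would be applied to arrays loaded from the WAV files, per microphone channel; plain lists are used here only to keep the sketch free of external dependencies.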
Audio Sample Snippet File Format
The naming scheme for snippets is:
<ENV_ID>_<REC_ID>_<CUT_ID>_<SNIP_ID>_<TRACKNAME>_<SAMPLERATE>.wav
- <ENV_ID>: 2-digit id of the acoustic environment; each speech in background environment has its own id, separate from that of the pure background environment.
- <REC_ID>: 3-digit id of the recording situation.
- <CUT_ID>: 2-digit id of the cut within the recording situation (unique across all sessions of that situation).
- <SNIP_ID>: 3-digit id of the snippet within this cut.
- <TRACKNAME>: as described above.
- <SAMPLERATE>: one of [48kHz, 16kHz]
For example, for ReverberantEnvironment, recording situation "Oldenburg Church", first cut, first snippet, 16 kHz version, the snippet filename is 06_005_00_000_BTE_L_front_16kHz.wav
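The naming scheme above can be split into its fields with a regular expression. The following is a minimal sketch; `SnippetName` and `parse_snippet_name` are hypothetical names and not part of the provided scripts. The ids are kept as strings to preserve leading zeros, and the track name is matched loosely because it contains underscores itself.

```python
import re
from typing import NamedTuple


class SnippetName(NamedTuple):
    """Fields of a snippet filename (hypothetical helper type)."""
    env_id: str
    rec_id: str
    cut_id: str
    snip_id: str
    trackname: str
    samplerate: str


# <ENV_ID>_<REC_ID>_<CUT_ID>_<SNIP_ID>_<TRACKNAME>_<SAMPLERATE>.wav
_SNIPPET_PATTERN = re.compile(
    r"(?P<env_id>\d{2})_(?P<rec_id>\d{3})_(?P<cut_id>\d{2})_(?P<snip_id>\d{3})"
    r"_(?P<trackname>.+)_(?P<samplerate>48kHz|16kHz)\.wav"
)


def parse_snippet_name(filename):
    """Return the fields of a snippet filename, or raise ValueError."""
    match = _SNIPPET_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a snippet filename: {filename!r}")
    return SnippetName(**match.groupdict())


info = parse_snippet_name("06_005_00_000_BTE_L_front_16kHz.wav")
print(info.env_id, info.rec_id, info.cut_id, info.snip_id)
print(info.trackname, info.samplerate)
```

Such a parser makes it straightforward to group snippets by environment or recording situation, for example to keep all snippets of one recording situation in the same train or test split.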
For details, see
HEAR-DS.README.txt
Acknowledgements
This work was supported by the German Federal Ministry of Education and Research (BMBF), FZK 02K16C202 AUDIO-PSS.
The authors would like to thank Marei Typlt and the partners in the AUDIO-PSS project for support in designing the acoustic environments and Audifon GmbH for providing the hearing aid dummies.