From: Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
 | No RIR | | | | | | | RIR | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SNR | −5 dB | 0 dB | 5 dB | 10 dB | 15 dB | 20 dB | \(\Delta\)Avg. | −5 dB | 0 dB | 5 dB | 10 dB | 15 dB | 20 dB | \(\Delta\)Avg. |
Unprocessed | 1.36 | 1.57 | 1.82 | 2.23 | 2.73 | 3.22 | - | 1.42 | 1.68 | 1.97 | 2.44 | 2.98 | 3.47 | - |
CRN | 1.74 | 2.06 | 2.32 | 2.65 | 2.95 | 3.23 | 0.33 | 1.84 | 2.20 | 2.49 | 2.85 | 3.17 | 3.45 | 0.34 |
MSTCN | 1.77 | 2.14 | 2.46 | 2.87 | 3.27 | 3.60 | 0.53 | 1.86 | 2.26 | 2.62 | 3.07 | 3.49 | 3.83 | 0.53 |
LSTM-IRM | 1.99 | 2.38 | 2.71 | 3.11 | 3.46 | 3.75 | 0.74 | 2.12 | 2.56 | 2.90 | 3.34 | 3.71 | 4.00 | 0.78 |
GCRN | 1.98 | 2.34 | 2.62 | 2.93 | 3.20 | 3.42 | 0.59 | 2.04 | 2.44 | 2.76 | 3.13 | 3.42 | 3.65 | 0.60 |
GaGNet | 1.93 | 2.30 | 2.59 | 2.94 | 3.23 | 3.49 | 0.59 | 2.03 | 2.44 | 2.76 | 3.14 | 3.47 | 3.74 | 0.60 |
Conv-TasNet | 2.13 | 2.52 | 2.81 | 3.15 | 3.46 | 3.70 | 0.80 | 2.17 | 2.58 | 2.92 | 3.32 | 3.65 | 3.91 | 0.76 |
DCCRN | 2.16 | 2.59 | 2.92 | 3.29 | 3.59 | 3.85 | 0.91 | 2.24 | 2.70 | 3.07 | 3.48 | 3.81 | 4.06 | 0.90 |
DPCRN | 2.17 | 2.58 | 2.89 | 3.24 | 3.54 | 3.79 | 0.88 | 2.37 | 2.83 | 3.18 | 3.56 | 3.87 | 4.10 | 0.99 |
SA-MSTCN\(^{1}\) | 2.40 | 2.84 | 3.15 | 3.49 | 3.77 | 3.98 | 1.11 | 2.63 | 3.10 | 3.40 | 3.75 | 4.01 | 4.21 | 1.19 |
SA-MSTCN\(^{2}\) | 2.43 | 2.87 | 3.18 | 3.51 | 3.78 | 3.99 | 1.13 | 2.63 | 3.10 | 3.43 | 3.77 | 4.03 | 4.22 | 1.20 |
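
The \(\Delta\)Avg. column is not defined in the table itself. The sketch below assumes it is the improvement over the unprocessed signal, averaged across the six SNR levels; that reading is inferred from the numbers, not stated by the source. The dictionary `NO_RIR` and the helper `delta_avg` are illustrative names, and only a few rows from the "No RIR" half are copied in. Because the inputs are the two-decimal table entries, recomputed values can differ from the printed column by about 0.01.

```python
# Scores copied from the "No RIR" half of the table, ordered by input SNR
# (-5, 0, 5, 10, 15, 20 dB). Only a subset of rows is included here.
NO_RIR = {
    "Unprocessed": [1.36, 1.57, 1.82, 2.23, 2.73, 3.22],
    "MSTCN": [1.77, 2.14, 2.46, 2.87, 3.27, 3.60],
    "DPCRN": [2.17, 2.58, 2.89, 3.24, 3.54, 3.79],
    "SA-MSTCN^2": [2.43, 2.87, 3.18, 3.51, 3.78, 3.99],
}


def delta_avg(scores, baseline):
    """Mean per-SNR improvement of `scores` over `baseline`.

    Assumed definition of the Delta-Avg. column (not confirmed by the source).
    """
    if len(scores) != len(baseline):
        raise ValueError("scores and baseline must cover the same SNR levels")
    return sum(s - b for s, b in zip(scores, baseline)) / len(scores)


baseline = NO_RIR["Unprocessed"]
for name, scores in NO_RIR.items():
    if name != "Unprocessed":
        # Three decimals make the rounding gap to the printed column visible.
        print(f"{name}: {delta_avg(scores, baseline):.3f}")
```

Under this assumption the recomputed values for the rows above agree with the printed \(\Delta\)Avg. entries to within 0.01, which suggests the column was derived from unrounded scores or truncated rather than rounded.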