From: Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
Test dataset | Method | PESQ (trained on English) | STOI (%) (trained on English) | SDR (trained on English) | PESQ (trained on Mix) | STOI (%) (trained on Mix) | SDR (trained on Mix) | \(\Delta\)PESQ (English\(\rightarrow\)Mix) | \(\Delta\)STOI (%) (English\(\rightarrow\)Mix) | \(\Delta\)SDR (English\(\rightarrow\)Mix) |
---|---|---|---|---|---|---|---|---|---|---|
French | Unprocessed | 2.19 | 92.4 | 15.53 | 2.19 | 92.4 | 15.53 | - | - | - |
 | CRN | 2.54 | 93.1 | 19.28 | 2.63 | 93.8 | 19.77 | 0.09 | 0.7 | 0.49 |
 | MSTCN | 2.91 | 94.2 | 17.51 | 2.89 | 94.3 | 17.54 | −0.02 | 0.1 | 0.03 |
 | LSTM-IRM | 2.97 | 94.9 | 20.26 | 2.98 | 94.9 | 20.35 | 0.01 | 0.0 | 0.09 |
 | GCRN | 2.93 | 94.2 | 20.98 | 2.93 | 94.2 | 20.98 | 0.00 | 0.0 | 0.00 |
 | GaGNet | 3.03 | 94.8 | 21.50 | 3.08 | 94.9 | 21.66 | 0.05 | 0.1 | 0.16 |
 | Conv-TasNet | 3.14 | 95.2 | 22.34 | 3.06 | 94.7 | 21.95 | −0.08 | −0.5 | −0.39 |
 | DCCRN | 3.26 | 95.6 | 22.42 | 3.28 | 95.8 | 22.56 | 0.02 | 0.2 | 0.14 |
 | DPCRN | 3.24 | 95.6 | 21.84 | 3.25 | 95.7 | 22.22 | 0.01 | 0.1 | 0.36 |
 | SA-MSTCN\(^{1}\) | 3.35 | 95.5 | 21.34 | 3.38 | 95.8 | 21.93 | 0.03 | 0.3 | 0.59 |
 | SA-MSTCN\(^{2}\) | 3.36 | 95.7 | 21.56 | 3.40 | 96.0 | 22.28 | 0.04 | 0.3 | 0.72 |
Spanish | Unprocessed | 2.24 | 93.6 | 15.48 | 2.24 | 93.6 | 15.48 | - | - | - |
 | CRN | 2.59 | 94.2 | 18.39 | 2.67 | 95.1 | 19.21 | 0.08 | 0.9 | 0.82 |
 | MSTCN | 2.84 | 94.2 | 16.35 | 2.85 | 95.2 | 16.69 | 0.01 | 1.0 | 0.34 |
 | LSTM-IRM | 2.91 | 95.5 | 19.17 | 2.95 | 95.7 | 19.68 | 0.04 | 0.2 | 0.51 |
 | GCRN | 2.87 | 95.5 | 20.37 | 2.87 | 95.5 | 20.37 | 0.00 | 0.0 | 0.00 |
 | GaGNet | 2.91 | 95.7 | 20.16 | 3.01 | 96.0 | 20.57 | 0.10 | 0.3 | 0.41 |
 | Conv-TasNet | 3.05 | 96.0 | 20.85 | 2.99 | 93.6 | 20.74 | −0.06 | −2.4 | −0.11 |
 | DCCRN | 3.18 | 96.4 | 21.07 | 3.23 | 96.6 | 21.55 | 0.05 | 0.2 | 0.48 |
 | DPCRN | 3.20 | 96.4 | 21.09 | 3.21 | 96.6 | 21.49 | 0.01 | 0.2 | 0.40 |
 | SA-MSTCN\(^{1}\) | 3.28 | 96.5 | 21.03 | 3.31 | 96.6 | 21.22 | 0.03 | 0.1 | 0.19 |
 | SA-MSTCN\(^{2}\) | 3.30 | 96.6 | 21.59 | 3.33 | 96.7 | 21.69 | 0.03 | 0.1 | 0.10 |
Japanese | Unprocessed | 1.96 | 92.3 | 13.75 | 1.96 | 92.3 | 13.75 | - | - | - |
 | CRN | 2.33 | 92.8 | 17.77 | 2.34 | 93.1 | 17.86 | 0.01 | 0.3 | 0.09 |
 | MSTCN | 2.42 | 93.0 | 15.60 | 2.49 | 93.2 | 15.75 | 0.07 | 0.2 | 0.15 |
 | LSTM-IRM | 2.63 | 94.1 | 18.28 | 2.65 | 94.3 | 18.37 | 0.02 | 0.2 | 0.09 |
 | GCRN | 2.52 | 93.5 | 18.88 | 2.57 | 93.5 | 18.89 | 0.05 | 0.0 | 0.01 |
 | GaGNet | 2.59 | 93.7 | 17.87 | 2.67 | 93.8 | 19.00 | 0.08 | 0.1 | 1.13 |
 | Conv-TasNet | 2.71 | 94.1 | 19.68 | 2.66 | 93.7 | 19.23 | −0.05 | −0.4 | −0.45 |
 | DCCRN | 2.83 | 94.2 | 19.70 | 2.88 | 94.4 | 19.93 | 0.05 | 0.2 | 0.23 |
 | DPCRN | 2.90 | 94.8 | 19.67 | 2.91 | 94.7 | 19.85 | 0.01 | 0.1 | 0.18 |
 | SA-MSTCN\(^{1}\) | 2.95 | 94.8 | 18.98 | 2.97 | 94.9 | 19.13 | 0.02 | 0.1 | 0.15 |
 | SA-MSTCN\(^{2}\) | 2.97 | 94.9 | 19.12 | 3.00 | 95.0 | 19.56 | 0.03 | 0.1 | 0.44 |
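
The English\(\rightarrow\)Mix columns appear to be the score obtained when training on Mix minus the score obtained when training on English, so a positive value means the mixed-language training set helped. A minimal sketch of that arithmetic follows, assuming this reading of the \(\Delta\) columns; the `delta` helper and the dictionary layout are hypothetical and not part of the paper, and the three rows are copied from the French block of the table.

```python
# Hypothetical helper (not from the paper): recompute the English->Mix gain
# columns, assuming delta = score(trained on Mix) - score(trained on English).
def delta(english, mix):
    """Return (dPESQ, dSTOI, dSDR), rounded to the table's precision."""
    return (
        round(mix[0] - english[0], 2),  # PESQ, two decimals
        round(mix[1] - english[1], 1),  # STOI (%), one decimal
        round(mix[2] - english[2], 2),  # SDR, two decimals
    )

# (PESQ, STOI %, SDR) on the French test set: (trained on English, trained on Mix).
french = {
    "CRN": ((2.54, 93.1, 19.28), (2.63, 93.8, 19.77)),
    "Conv-TasNet": ((3.14, 95.2, 22.34), (3.06, 94.7, 21.95)),
    "SA-MSTCN^2": ((3.36, 95.7, 21.56), (3.40, 96.0, 22.28)),
}

for method, (english, mix) in french.items():
    print(method, delta(english, mix))
```

For these three rows the recomputed values match the tabulated ones: gains for CRN and SA-MSTCN\(^{2}\), and a drop on all three metrics for Conv-TasNet.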