From: Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
Metrics | PESQ |  |  |  | STOI (%) |  |  |  | SDR (dB) |  |  |  | OUTE |  |  |  |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Duration | 100 | 500 | 1000 | 1500 | 100 | 500 | 1000 | 1500 | 100 | 500 | 1000 | 1500 | 100 | 500 | 1000 | 1500 |
Unprocessed | 2.08 | 2.08 | 2.08 | 2.08 | 91.7 | 91.7 | 91.7 | 91.7 | 14.81 | 14.81 | 14.81 | 14.81 | - | - | - | - |
CRN | 2.13 | 2.55 | 2.59 | 2.64 | 91.3 | 93.8 | 94.0 | 94.1 | 17.69 | 19.29 | 19.38 | 19.58 | 75 | 335 | 340 | 330 |
MSTCN | 2.67 | 2.77 | 2.78 | 2.80 | 93.7 | 94.3 | 94.5 | 94.4 | 15.87 | 16.82 | 17.16 | 17.07 | 59 | 230 | 540 | 690 |
LSTM-IRM | 2.57 | 2.90 | 2.93 | 3.01 | 93.8 | 95.2 | 95.2 | 95.5 | 18.42 | 19.90 | 20.03 | 20.22 | 34 | 120 | 150 | 285 |
GCRN | 2.55 | 2.85 | 2.91 | 2.96 | 92.9 | 94.4 | 94.6 | 94.9 | 18.84 | 20.83 | 20.99 | 21.40 | 93 | 355 | 400 | 525 |
GaGNet | 2.67 | 2.98 | 2.98 | 3.02 | 93.4 | 94.9 | 95.0 | 95.1 | 19.51 | 21.04 | 21.14 | 21.44 | 50 | 230 | 260 | 300 |
Conv-TasNet | 2.62 | 2.99 | 3.12 | 3.09 | 93.4 | 95.0 | 95.6 | 95.4 | 19.58 | 21.50 | 22.15 | 22.02 | 78 | 200 | 260 | 315 |
DCCRN | 3.06 | 3.22 | 3.28 | 3.25 | 95.1 | 95.7 | 95.8 | 95.8 | 20.73 | 21.48 | 21.75 | 21.56 | 66 | 145 | 210 | 300 |
DPCRN | 3.15 | 3.19 | 3.27 | 3.24 | 95.4 | 95.6 | 95.9 | 95.7 | 21.23 | 21.53 | 21.84 | 21.74 | 45 | 130 | 210 | 345 |
SA-MSTCN\(^{1}\) | 3.16 | 3.38 | 3.44 | 3.44 | 95.4 | 96.1 | 96.3 | 96.3 | 20.53 | 21.45 | 21.70 | 21.74 | 58 | 190 | 340 | 420 |
SA-MSTCN\(^{2}\) | 3.16 | 3.41 | 3.50 | 3.48 | 95.4 | 96.2 | 96.6 | 96.4 | 20.53 | 21.95 | 22.31 | 22.15 | 87 | 355 | 640 | 720 |
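To make the comparison easier to scan, the sketch below transcribes the PESQ, STOI and SDR columns and reports which system scores highest at each duration. It assumes that higher is better for these three metrics, which is the usual convention. OUTE is omitted because the table does not say which direction is better. The names `TABLE` and `best_systems` are hypothetical. `SA-MSTCN1` and `SA-MSTCN2` stand for the two superscripted SA-MSTCN variants.

```python
from typing import Dict, List, Tuple

# Column order of every tuple below, as in the "Duration" row of the table.
DURATIONS: Tuple[int, ...] = (100, 500, 1000, 1500)

# Values transcribed from the table (hypothetical container; OUTE omitted
# because the table does not state whether higher or lower is better).
TABLE: Dict[str, Dict[str, Tuple[float, ...]]] = {
    "PESQ": {
        "Unprocessed": (2.08, 2.08, 2.08, 2.08),
        "CRN": (2.13, 2.55, 2.59, 2.64),
        "MSTCN": (2.67, 2.77, 2.78, 2.80),
        "LSTM-IRM": (2.57, 2.90, 2.93, 3.01),
        "GCRN": (2.55, 2.85, 2.91, 2.96),
        "GaGNet": (2.67, 2.98, 2.98, 3.02),
        "Conv-TasNet": (2.62, 2.99, 3.12, 3.09),
        "DCCRN": (3.06, 3.22, 3.28, 3.25),
        "DPCRN": (3.15, 3.19, 3.27, 3.24),
        "SA-MSTCN1": (3.16, 3.38, 3.44, 3.44),
        "SA-MSTCN2": (3.16, 3.41, 3.50, 3.48),
    },
    "STOI (%)": {
        "Unprocessed": (91.7, 91.7, 91.7, 91.7),
        "CRN": (91.3, 93.8, 94.0, 94.1),
        "MSTCN": (93.7, 94.3, 94.5, 94.4),
        "LSTM-IRM": (93.8, 95.2, 95.2, 95.5),
        "GCRN": (92.9, 94.4, 94.6, 94.9),
        "GaGNet": (93.4, 94.9, 95.0, 95.1),
        "Conv-TasNet": (93.4, 95.0, 95.6, 95.4),
        "DCCRN": (95.1, 95.7, 95.8, 95.8),
        "DPCRN": (95.4, 95.6, 95.9, 95.7),
        "SA-MSTCN1": (95.4, 96.1, 96.3, 96.3),
        "SA-MSTCN2": (95.4, 96.2, 96.6, 96.4),
    },
    "SDR": {
        "Unprocessed": (14.81, 14.81, 14.81, 14.81),
        "CRN": (17.69, 19.29, 19.38, 19.58),
        "MSTCN": (15.873, 16.82, 17.16, 17.07),
        "LSTM-IRM": (18.42, 19.90, 20.03, 20.22),
        "GCRN": (18.84, 20.83, 20.99, 21.40),
        "GaGNet": (19.51, 21.04, 21.14, 21.44),
        "Conv-TasNet": (19.58, 21.50, 22.15, 22.02),
        "DCCRN": (20.73, 21.48, 21.75, 21.56),
        "DPCRN": (21.23, 21.53, 21.84, 21.74),
        "SA-MSTCN1": (20.53, 21.45, 21.70, 21.74),
        "SA-MSTCN2": (20.53, 21.95, 22.31, 22.15),
    },
}


def best_systems(metric: str, duration: int, tol: float = 1e-9) -> List[str]:
    """Return every system tied for the highest score (assumes higher is better)."""
    col = DURATIONS.index(duration)
    scores = {name: row[col] for name, row in TABLE[metric].items()}
    top = max(scores.values())
    # Keep all systems within `tol` of the best score so ties are reported.
    return [name for name, score in scores.items() if abs(score - top) <= tol]


if __name__ == "__main__":
    for metric in TABLE:
        for duration in DURATIONS:
            print(metric, duration, best_systems(metric, duration))
```

Run as a script, it shows that an SA-MSTCN variant leads every PESQ and STOI column (tied with DPCRN on STOI at duration 100), while DPCRN has the highest SDR at duration 100.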