Dual-Teacher Agreement for High-Precision Synthetic Data in Low-Resource MT
Abstract
Low-resource machine translation (MT) is limited by scarce parallel data. Synthetic bitext generated from monolingual corpora can help, but it is often noisy and can be harmful in low-resource regimes. We propose dual-teacher agreement for high-precision synthetic data construction: two independent multilingual MT teachers translate each source sentence, and an agreement-based filter retains only reliable pairs using surface consistency, cross-lingual semantic alignment, and target-side fluency. Experiments show that unfiltered synthetic augmentation is unstable and that single-teacher filtering yields smaller gains, whereas dual-teacher agreement consistently improves chrF++ and BLEU and increases robustness under distribution shift. Quality and error analyses confirm that agreement filtering produces cleaner synthetic corpora with fewer entity errors, reduced meaning drift, and improved adequacy.
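To make the filtering idea concrete, the following is a minimal sketch of an agreement-based filter, not the exact procedure used in the experiments. It assumes that surface consistency is measured as a symmetric character n-gram F1 between the two teacher outputs, that the three signals are combined as independent thresholds, and that the hypothesis better aligned with the source is the one retained. The names `agreement_filter`, `surface_agreement`, `semantic_sim`, `fluency`, and all threshold values are hypothetical; the teachers and the two scorers are passed in as plain callables so that any MT system, cross-lingual encoder, or language model could stand behind them.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, Tuple


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of a whitespace-normalized string."""
    s = " ".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def surface_agreement(hyp_a: str, hyp_b: str, max_n: int = 3) -> float:
    """Symmetric character n-gram F1, averaged over n = 1..max_n.

    This is an illustrative stand-in for a surface-consistency score,
    not a reimplementation of chrF++.
    """
    scores = []
    for n in range(1, max_n + 1):
        a, b = char_ngrams(hyp_a, n), char_ngrams(hyp_b, n)
        total_a, total_b = sum(a.values()), sum(b.values())
        if total_a == 0 or total_b == 0:
            continue  # string too short for this n-gram order
        overlap = sum((a & b).values())  # multiset intersection
        scores.append(2 * overlap / (total_a + total_b))
    return sum(scores) / len(scores) if scores else 0.0


def agreement_filter(
    sources: Iterable[str],
    teacher_a: Callable[[str], str],
    teacher_b: Callable[[str], str],
    semantic_sim: Callable[[str, str], float],  # hypothetical cross-lingual scorer
    fluency: Callable[[str], float],            # hypothetical target-side scorer
    min_surface: float = 0.6,    # assumed threshold
    min_semantic: float = 0.8,   # assumed threshold
    min_fluency: float = 0.5,    # assumed threshold
) -> Iterator[Tuple[str, str]]:
    """Yield (source, translation) pairs that pass all three checks."""
    for src in sources:
        hyp_a, hyp_b = teacher_a(src), teacher_b(src)
        # 1) The two teachers must agree at the surface level.
        if surface_agreement(hyp_a, hyp_b) < min_surface:
            continue
        # 2) Keep the hypothesis better aligned with the source (an assumption).
        best = max((hyp_a, hyp_b), key=lambda h: semantic_sim(src, h))
        if semantic_sim(src, best) < min_semantic:
            continue
        # 3) The retained target sentence must be fluent enough.
        if fluency(best) < min_fluency:
            continue
        yield src, best
```

In this sketch a pair is dropped as soon as any one check fails, which favors precision over recall, in keeping with the high-precision goal stated above. A weighted combination of the three scores would be an equally plausible design and would trade some precision for a larger retained corpus.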