To mitigate, we propose a novel scale propagation normalization (SP-Norm) method to propagate scales from input to output, and simultaneously preserve the normalization operator for easy convergence.