We consider the estimation of an audio source from multiple noisy observations, where the correlation between noise in the different observations is low. We propose a two-stage method for this estimation problem. The method does not require any information about noise and assumes that the signal of interest has a sparse time-frequency representation. The first stage uses this assumption to obtain the best linear combination of the observations. The second stage estimates the amount of remaining noise and applies a post-filter to further enhance the reconstruction. We discuss the optimality of this method under a specific model and demonstrate its usefulness on synthetic and real data.