Does anyone know what the scientific term is for when the brain “adds” audio?

I’m wondering if there’s a scientific term for this. Let me describe what I mean:

I’m mastering a song for a client (they produced and mixed the track) where they’ve used a sample of a speech in the intro, right before the first verse. The issue is that you really can’t make out what that clip of the speech is saying; to my ears, the words just sound tucked under the instrumentation.

He tells me he can hear it just fine (again, he produced and mixed the track, so he knows what that bit says, because he sampled it from the original speech himself).

We are moving on, and I’ve told him that it’s possible that because he has heard the isolated speech, his brain is able to pick out the words over the rest of the music, but that someone listening to the track for the first time might not, especially if they’ve never heard that particular speech. I thought it would be good to know if there’s a proper term for this effect; does anyone know?