In my opinion, after a very quick listen, the vocal is too loud overall, but because it's uneven in level (or the music is), there are moments where it sticks out as obviously too loud and moments where it's on the bubble, so to speak.
Usually, to fix a situation like this, you ultimately want to bring the vocal down just enough that you can understand the lyrics if you want to, but not leave it so loud that you hear the vocal above everything else. When a vocal, or the track it sits against, is uneven, you can of course add limiting, but depending on the circumstances a heavy-handed limiter can kill the liveliness of the vocal. So I sometimes like to try compression with a moderately slow attack (you have to listen to get the attack right). That way the initial transient of each vocal note can peek through the mix, and then the compressor clamps down on the body of the note to keep it from coming across as too loud.
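If it helps to see why attack time matters, here is a rough sketch of a basic feed-forward compressor in Python. It is only an illustration, not how any particular plugin works: the function name `compress`, the threshold, ratio, and time constants are all made-up assumptions, and the "note" is just a held constant level standing in for a sustained vocal. The point is that a slower attack applies less gain reduction in the first few milliseconds, which is what lets the front of the note through.

```python
import math

def compress(samples, sample_rate, threshold_db=-18.0, ratio=4.0,
             attack_ms=30.0, release_ms=150.0):
    """Illustrative feed-forward compressor (hypothetical settings).

    Returns (output_samples, gain_db), where gain_db holds the gain
    applied to each sample in dB (0 or negative).
    """
    # One-pole smoothing coefficients derived from the time constants.
    attack_coeff = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release_coeff = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))

    reduction_db = 0.0  # smoothed gain reduction, always >= 0
    out, gain_db = [], []
    for x in samples:
        level_db = 20.0 * math.log10(max(abs(x), 1e-9))
        over_db = max(level_db - threshold_db, 0.0)
        # Gain reduction the ratio asks for at this input level.
        target_db = over_db * (1.0 - 1.0 / ratio)
        # Move toward the target at the attack rate when more reduction
        # is needed, and at the release rate when less is needed.
        coeff = attack_coeff if target_db > reduction_db else release_coeff
        reduction_db = coeff * reduction_db + (1.0 - coeff) * target_db
        gain_db.append(-reduction_db)
        out.append(x * 10.0 ** (-reduction_db / 20.0))
    return out, gain_db

# A held level of 0.5 (about -6 dBFS) for 0.2 s at 8 kHz, as a crude
# stand-in for a sustained note.
note = [0.5] * 1600
_, slow = compress(note, 8000, attack_ms=30.0)  # moderately slow attack
_, fast = compress(note, 8000, attack_ms=1.0)   # very fast attack

# Compare the gain 1 ms into the note and at the end of the note.
print("after 1 ms:  slow %.2f dB, fast %.2f dB" % (slow[7], fast[7]))
print("end of note: slow %.2f dB, fast %.2f dB" % (slow[-1], fast[-1]))
```

With these made-up settings the two versions end up at nearly the same gain reduction on the body of the note, but the slow-attack version has barely touched the first millisecond. On real material the right values depend entirely on the vocal and the track, so treat the numbers as placeholders.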
Remember you'll still need to listen to get your release times just right. Also, there is no recipe, so what I suggest might not help. When it all comes down to it, you have to know what an in-the-pocket vocal sounds like in your space on your monitors. That way you just know when it's sitting "right". But give that compression strategy a try. Maybe it will help.
This is also a case where having an alternate set of monitors as a more "typical" playback system reference is very useful: Mixcubes, a boombox, whatever works for you. I do that because I find that when a vocal doesn't sit right on those types of systems, it doesn't sit right at all. Again, it comes back to knowing what "right" sounds like. To educate your ears, it's useful to have a commercial reference release in the style of music you're working on to play back through your systems, so you can hear what an in-the-pocket vocal sounds like in your space.