Easy... these are passive boxes. The headset would plug into a 7-pin XLR on the other side of the box, which carries the balanced mic and the L/R headphone o/p.
On the side you can see the connections, which would be "mic o/p" and "Talk-back out". The headset mic goes to a 3-position switch: TB / OFF / Program.
Inputs into the box would be "program" and "T/B" from the truck. Often these 2 signals are combined into 1 signal (as shown in the pic) called IFB (Interrupted Fold Back), which saves on cable runs. There might also be a loop-through to feed the second box. Often the different commentators will get different feeds to listen to, with the main host commentator getting the time cues and the guest commentator getting just program so as not to confuse them.
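If it helps to see the IFB idea as logic rather than wiring, here is a rough sketch. It is only an illustration under my own assumptions: the function names, the `duck_gain` parameter and the per-sample float levels are all made up for the example, and a real passive box does none of this in software (the combining happens back at the truck).

```python
def ifb_feed(program, talkback, talkback_active, duck_gain=0.0):
    """Return one IFB sample.

    Assumption: while the truck is talking, program is muted
    (duck_gain=0.0) or dipped (0 < duck_gain < 1) and talkback is added.
    """
    if talkback_active:
        return program * duck_gain + talkback
    return program


def commentator_feeds(program, talkback, talkback_active):
    """Hypothetical split: host hears IFB (cues included), guest hears program only."""
    return {
        "host": ifb_feed(program, talkback, talkback_active),
        "guest": program,
    }
```

Calling `commentator_feeds(0.5, 0.2, True)` gives the host the talkback cue in place of program while the guest keeps hearing plain program, which is the split described above.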
In the headphone mix, crowd FX is normally NOT added as it can become annoying. But if you want to get the commentators excited and hyping it up, add crowd FX and lower their own mics in the mix so they start yelling to get over the crowd noise.
Although the boxes in the pic are passive, there are active boxes around that require mains power, and some of the newer setups use a single Cat5 cable and do it all digitally.
I assume that this outside broadcast was done by a smaller TV facility, as that sort of setup is fairly antiquated.