Warren, it IS simple. Calibrate each device to a single, pre-recorded frequency-neutral tone. Use a split cable, not your ears. And by all means, don't bring out a bunch of songs to do it.
Matching volumes by ear is cognitive dissonance. Not only is it impossible, it wastes effort, time, and, when published, Head-Fi bandwidth. At worst, it is deceptive.
Myriad problems ensue even when matching via line outputs. The most insidious is current variability between devices, which can produce audible differences in volume even between tracks normalised to the same mean level.
Why? When not fed enough current, headphones exhibit various anomalies, including loss of contrast and reduced sound pressure at certain frequencies, all of which affect perceived volume.
Hire the best ears in the world. Give them the rest of their lives to match volumes. They will fail. It is neither possible nor an expedient use of someone's life. It is far quicker and more accurate to simply split the output between a single pair of earphones and a sound card. Match the volumes against a frequency-neutral calibration signal. Voilà!
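For anyone who wants to see what the sound card side of that looks like, here is a rough sketch, not a finished tool. It assumes a 1 kHz sine as the calibration signal (my stand-in, since the exact tone is up to you) and uses two generated tones in place of real captures from each device. The function names are made up for illustration. The idea is only this: measure the RMS level of each capture in dBFS, then adjust one device until the difference is as close to 0 dB as its volume steps allow.

```python
import math


def sine_tone(freq_hz, seconds, rate=48000, amplitude=1.0):
    """Generate a sine tone as a list of float samples in the range -1..1."""
    n = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def rms_dbfs(samples):
    """Return the RMS level of the samples in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)


def level_offset_db(capture_a, capture_b):
    """Return how many dB louder capture A is than capture B."""
    return rms_dbfs(capture_a) - rms_dbfs(capture_b)


# Stand-ins for sound card captures of the same calibration tone
# played by two devices (assumed amplitudes, not measured data).
device_a = sine_tone(1000, 1.0, amplitude=0.5)
device_b = sine_tone(1000, 1.0, amplitude=0.25)

offset = level_offset_db(device_a, device_b)
print(f"Device A is {offset:+.2f} dB relative to device B")
```

With real recordings you would load the captured samples instead of generating them. Halving the amplitude is roughly a 6 dB drop, so the two stand-in captures above should come out about 6 dB apart.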
Subsequent volume differences indicate output defaults. They cannot be normalised across devices. I won't even get into the problems of various stimuli that trick the ear into thinking it hears one thing when it hears another.
I wish you luck, Warren, but your test has already started off on the wrong foot.
Source: Mini Astell&Kern DAP Shoot-Out