written by
Austin Summers

How To Mix Vocals Like a Pro: Part 1


Ah, yes, vocal mixing.

It’s the task whose complexity is most often underestimated, and the skill engineers most want to master.

I’m embarking on a three-part journey with you, taking you through the entire process from start to finish and making sure you get everything right.

This series will be pure accumulated knowledge and careful, deep analysis from many years of experience.

To achieve greatness, one must first gain understanding.

There’s an important reason why vocal mixing is the hardest thing out there to get right.

The human voice covers a wide range of frequencies, and it differs drastically from individual to individual. It often behaves in ways so nuanced that the most critical components of a captivating performance are easily missed.

Have you always wondered why you can never quite match the vocal mixing of your favorite songs, despite having all the plugins, possibly even hardware, and having watched all the famous mixing engineer videos while scouring the internet trying to find the magic formula?

It’s simple, yet a little more complex at the same time.

In its simplicity, it’s related to the mindset and approach you take. In its complexity, it’s about the right balance of compression movement, the relation of the vocal’s frequencies to the rest of the instruments and voices, and the most nuanced automation imaginable (in volume, EQ, and compression, as well as INCREDIBLY detailed automation moves in different effects and send effects).

It’s how you create depth and tension with more than one vocal layer: crowd vocals, harmonies, vocal swells leading up to the climactic moment of a song, shouts, filler ad-libs, and creative processing of singular moments in the song (for example, one word or phrase at the end of a pre-chorus, heavily distorted and filtered to give the chorus impact when it hits).

I know that seems like a lot, but it’s essential to make what we hear at the top level.

Basic Approach

Before you even touch an EQ, there’s a vast amount of process that goes into preparing the file and performance to be mixed.

Yes, even before the pre-mixing stage, there’s a stage before that matters so much; it determines the quality of the entire outcome.

The recording phase, as well as the performance given.

The recording phase is complex in its own right, with mic placement, acoustic treatment, and many other factors: pop filters, off-axis or on-axis recording, outboard gear, conversion, preamps, and, importantly, the microphone itself.

The Microphone

The microphone matters a lot because of its frequency sensitivity: the way it reacts to the level of the input at different frequencies.

People tend to have the wrong idea. They focus on the frequency curve of a microphone but forget that frequency sensitivity matters just as much, if not more.

You could take a Rode NT1 and analyze how specific mid-range frequencies respond to the singer’s amplitude. Often, you’re going to get a much sharper and harsher peak in those mid-range frequencies in response to the singer’s performance than you would from a Neumann TLM 102.

It’s not that the frequency curve of the mic is so drastically different; it’s that the Rode reacts to input differently. Frequency sensitivity can get quite complex. If you want the absolute best product, your job is to test and determine which microphone works best with each vocalist. That way you avoid the problem up front and don’t end up tearing your hair out over the most sibilant vocal recording in existence from a client.

My recommendation for an all-around neutral microphone is the Neumann TLM 102. It’s highly mixable, suits many different vocalists and applications, and tends to react in a very balanced way to unexpected accents from vocalists. Other microphones do lovely jobs, but this one always stood out to me for its ability to deliver on every person I ever put in front of it. If you don’t have one, audition the mics you do have and determine which has the most inoffensive reaction to the vocalist.

The new TF51 from Telefunken sounds remarkably like a 251, and you’ll find that industry-standard mics like the 251, the C800G, the U87, the U67, and the U47 all sound exceptional. Some might not suit the vocalist, but they’re worth trying if you have access to them.

Performance, Off-Axis, and More

The proximity effect. It’s such a special thing when used on the right vocalist but terrible when used on the wrong one.

The proximity effect is the low-frequency boost that happens when the vocalist gets close to a directional microphone, making the voice sound full, direct, and present.

Depending on the song’s contents, the style of song, the space available in the song (as in how crowded and plentiful the instruments are), and the actual singer, you’re going to have to decide what suits that particular situation.

If you decide the singer will sound best close to the microphone, but you’re facing issues with S’s and P’s, angling the microphone slightly up or down can help eliminate the problem. Sometimes a highly dynamic vocalist leans in for a loud part instead of pulling away. In that case, all they have to do is turn slightly to the left or right of the mic, and their voice will attenuate, still giving a full sound but bringing the volume to a more manageable level.

You’re going to have to do multiple takes. Splitting your focus between good pitch, articulation of words, tonal quality, and desired accent generally goes a lot further than homing in on pitch alone. Remember, it’s about the performance and how it makes someone feel, not just about pitch-perfect vocals.

Once you’ve got a couple of takes for a section, go through each sentence and choose the ones that tick off the most of the attributes I just mentioned.

Build your take, and then go through each sentence, word, and phrase of that combined take and determine if a particular word or phrase can be done better. If you think the vocalist can do it better, record that part a few times.

If you find an improvement, add it to the combined take. If not, move on.
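
As a rough sketch of the comping logic described above: for each phrase, rate every take on the four attributes and keep the take with the best overall rating, not the one with the best pitch. The ratings, the equal weighting, and all names here are hypothetical; in practice this judgment is made by ear, not by numbers.

```python
from dataclasses import dataclass

# The four attributes named in the text. Equal weighting is an assumption.
ATTRIBUTES = ("pitch", "articulation", "tone", "accent")


@dataclass(frozen=True)
class TakeRating:
    take: int      # take number as recorded
    scores: dict   # attribute name -> rating from 0 to 10, judged by ear

    def overall(self) -> float:
        """Average across all attributes, so pitch alone cannot win."""
        return sum(self.scores[a] for a in ATTRIBUTES) / len(ATTRIBUTES)


def build_comp(ratings_by_phrase):
    """Return {phrase: take number} using the best overall take per phrase."""
    comp = {}
    for phrase, ratings in ratings_by_phrase.items():
        best = max(ratings, key=lambda r: r.overall())
        comp[phrase] = best.take
    return comp


# Hypothetical ratings for two phrases, two takes each.
ratings = {
    "line 1": [
        TakeRating(1, {"pitch": 9, "articulation": 5, "tone": 5, "accent": 5}),
        TakeRating(2, {"pitch": 7, "articulation": 8, "tone": 8, "accent": 7}),
    ],
    "line 2": [
        TakeRating(1, {"pitch": 8, "articulation": 8, "tone": 7, "accent": 8}),
        TakeRating(2, {"pitch": 8, "articulation": 6, "tone": 7, "accent": 6}),
    ],
}

comp = build_comp(ratings)
print(comp)
```

Note that in "line 1" the take with the best pitch loses to the take that is stronger across the board, which is the point of splitting your focus.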

Breaths and Silences

This is a tricky one.
Breaths often make a vocal performance feel alive, while too many can make it seem strained.

Some music calls for removing all breaths completely, and some calls for reducing the volume of the breaths and/or removing only some of them. Generally, the more EDM-focused tracks have fewer or no breaths, and the more natural-sounding tracks retain them. Often in commercial music, there’s a delicate balance between the two.
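
To make "reducing the volume of breaths" concrete, here is a minimal sketch that turns a reduction in dB into a linear gain and applies it to breath regions. The region boundaries are assumed to be marked by hand, the function names are hypothetical, and a real edit would add short fades at the region edges to avoid clicks.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def reduce_breaths(samples, breath_regions, reduction_db=-9.0):
    """Return a copy of `samples` with each breath region turned down.

    samples        -- list of float sample values
    breath_regions -- list of (start, end) sample indices, end exclusive,
                      assumed to be marked by hand
    reduction_db   -- negative number of dB; float("-inf") removes the breath
    """
    gain = db_to_gain(reduction_db)
    out = list(samples)
    for start, end in breath_regions:
        # Clamp the region to the length of the audio.
        for i in range(max(start, 0), min(end, len(out))):
            out[i] *= gain
    return out


# Hypothetical example: six samples, with a "breath" at indices 2 and 3.
audio = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
quieter = reduce_breaths(audio, [(2, 4)], reduction_db=-6.0)
removed = reduce_breaths(audio, [(2, 4)], reduction_db=float("-inf"))
print(quieter)
print(removed)
```

A reduction of around 6 dB roughly halves the amplitude of the breath while keeping it audible, which suits the more natural-sounding end of the spectrum; the starting value is a matter of taste, not a rule.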

Acoustic Treatment

As much as it hurts to hear this, the detail put into the science of your acoustic treatment goes a long way in changing how good your recordings will be. While this topic in itself is so vast that it can only be addressed in another article, it’s important to note a few things.

  • Foam is not good enough. It generally creates a bad imbalance in the room, doing more harm than good.
  • Egg boxes do NOT work. They’re not viable diffusion, and they’re WORSE than foam.
  • Rockwool, polyester fiber, and fiberglass are the main options when building acoustic panels. While most people use Rockwool and fiberglass, I have allergies, so I built my entire studio with a modest amount of polyester fiber and ended up with a decent RT60 (reverberation time). The absorption coefficient of polyester fiber is excellent, especially in more advanced panels with an air gap whose dimensions are calculated from the material’s absorption coefficient. It all depends heavily on which polyester fiber you get and where you get it from.

Hopefully, this is useful for the rare person who finds themselves allergic to Rockwool and fiberglass.
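
For a feel of how absorption coefficients relate to RT60, here is a minimal sketch using Sabine’s equation, RT60 = 0.161 × V / A, where V is the room volume in cubic meters and A is the sum of each surface area times its absorption coefficient. The room size, panel area, and coefficients below are hypothetical single-band values, not measurements from my studio, and Sabine’s equation is only a rough estimate in small rooms.

```python
def sabine_rt60(volume_m3, surfaces):
    """Estimate RT60 in seconds with Sabine's equation (metric units).

    volume_m3 -- room volume in cubic meters
    surfaces  -- list of (area_m2, absorption_coefficient) pairs
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption


# Hypothetical 4 m x 3 m x 2.5 m room: 30 cubic meters, 59 square meters
# of total surface (floor 12, ceiling 12, walls 35).
volume = 4.0 * 3.0 * 2.5

# Untreated: every surface assumed to be hard, coefficient 0.05.
untreated = sabine_rt60(volume, [(59.0, 0.05)])

# Treated: 10 square meters of wall covered by panels, with an assumed
# coefficient of 0.9 for the panel material.
treated = sabine_rt60(volume, [(49.0, 0.05), (10.0, 0.9)])

print(f"untreated: {untreated:.2f} s, treated: {treated:.2f} s")
```

Real absorption coefficients vary by frequency band and by product, so a proper calculation repeats this per band using the manufacturer’s data.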

Pop Filters

Sometimes, the artist might sound better with either a cloth or a metal pop filter. It’s not complicated: try both, and choose the one you feel sounds best for that artist.


Preamps

Most modern-day preamps are good enough to get a good recording, but different preamps add a bit of texture, tonal quality, and feeling to a performance.

The Neve 1073 is used on many hit songs because of how it makes the artist sound.

It adds a bit of pleasing distortion and mid-range presence, and it gives the perception of the familiar sound our brains are used to from years of listening to commercial music.

Some preamps, such as the Manley Core channel strip, don’t suit every vocalist. While this preamp is excellent on some vocalists, on others it changes the character of the voice too much, causing it to sound unnatural. You’ll have to test different preamps, but try not to oversaturate the recording before you get to the mixing stage, so you have room to make adjustments if needed. If the genre calls for heavy saturation, though, by all means, go for it.

AD/DA Conversion

Conversion impacts the fidelity of the signal coming in. While a high-end converter won’t double the audio quality compared to a cheap one, it does improve the naturalness, nuance, and quality of the sound. Prism, Burl, Lavry, Crane Song, Merging Anubis, Apogee, and, surprisingly, Audient (iD44) are considered very high-end in their conversion quality.

Antelope, UAD, and Lynx are considered just below that, with a few exceptions. Ranking conversion quality gets rather complicated once you leave the realm of Prism and its equivalents, because some of the units are so close that it becomes difficult to rank them accurately.

You should aim as high as possible with conversion to ensure there are no weak links in your signal chain.

Some of you might only have sound cards like the Focusrite Scarlett with built-in preamps (places to plug in your microphone), and that’s okay. That works just fine, and it’s possible to make a successful record without all the fancy conversion and outboard gear. It helps a lot to have it, though.

Outboard Gear

While not necessary, subtle outboard processing before hitting the converters saves you time when approaching the mix. It can help control extremely loud peaks, clean up some muddy frequencies, and do some light tonal shaping, which also reduces CPU load later on.

Using something like the Elysia xfilter (also found here at Mix: Analog) or the SSL 611 EQ and Dynamics goes a long way toward an easier time in the mixing stage.

I can’t change the recording phase: a client sent me the files, and I have to live with them. What do I do?

This is a common scenario, and there are ways around it. I’ll go into more detail on this in a later part of this series.

It often involves heavily targeted de-essing, understanding the limitations of the particular vocal, attempting to use proximity tools (don’t put much faith in them; they’ve only saved me a handful of times), and, most importantly, learning how to mask the issues you can’t solve with special techniques, whether that’s certain reverb settings, automated multiband compression with carefully chosen attack and release settings, or spectral dynamics.

This covers the basics of the recording phase, and due to the sheer magnitude of what I need to share in this topic, I’ll have to address the actual mixing process in the next part of this series.

The information shared today is vital, and the mixing stage will be much harder to get right without addressing it first.

I hope you guys enjoyed this, and be sure to keep an eye out for the next part of this series, where we get into the depths of processing vocals with clip gain, EQ, compression, saturation, automation, and a few other exciting tricks. The third part will explain what ties it all together, using complex reverb and delay effects, automation, and various other unique tactics.
