Hi @blabbs, welcome to the community.
Here is some of that extra insight into what you are trying to do…
There are a couple of things you need to understand in order to get ‘high quality’ pitch-shifting results…
First of all, most Western music uses a tuning system called ‘Equal Temperament’. In this system an octave is 12 half-steps (semitones), not 24. If you put ‘-24’ into Audacity’s ‘Change Pitch/Semitones/half-steps’ function (as in the video above that @Juanmapinker uploaded) and process it, your audio will be re-pitched down two octaves.
As far as I can remember, you don’t need to change the tempo/speed, because in this software the pitch change works independently of tempo. So use ‘-12’ for one octave down, ‘-24’ for two octaves down, and so on.
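In equal temperament each semitone multiplies the frequency by the twelfth root of two, so a shift of n semitones scales every frequency by 2^(n/12). Here is a quick Python sketch of that arithmetic (the function name is just made up for illustration, it is not anything inside Audacity):

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


# -12 halves every frequency (one octave down), -24 quarters it (two octaves down).
for shift in (-12, -24, -36):
    print(f"{shift:+d} semitones -> frequency x {semitones_to_ratio(shift)}")
```

So ‘-24’ in the semitones box means every frequency ends up at a quarter of where it started, which is exactly two octaves down.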
Next is the quality issue. This involves some simple math that shows what is happening and accounts for what we hear when we play back pitch-shifted audio material…
Two things to think about here, ‘Sample Rate’ and ‘Nyquist Frequency’.
Sample Rate controls how detailed the audio will be and how far up into the frequency range the audio can be captured/recorded.
Nyquist… You could spend some time reading about this on the web to fully understand it, but all we need to do here is divide our sample rate by two. That gives the Nyquist frequency, which is the top of the available audio bandwidth (frequency range).
For example, 48 kHz is a standard broadcast setting. Divide it by two and we get 24 kHz, the highest frequency your audio can contain at a sampling rate of 48 kHz. This is fine because (most) humans can only hear a narrower range than this, usually between 20 Hz and 20 kHz in our prime. That range narrows as we get older, mostly at the top end.
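The same check as a small Python sketch. The 20 kHz limit is the typical figure quoted above, and 44.1 kHz, 96 kHz and 192 kHz are just other common rates added for comparison:

```python
HEARING_LIMIT_HZ = 20_000  # assumed typical upper limit of hearing; varies with age


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a recording at this sample rate can contain."""
    return sample_rate_hz / 2.0


for rate in (44_100, 48_000, 96_000, 192_000):
    limit = nyquist_hz(rate)
    covers = limit >= HEARING_LIMIT_HZ
    print(f"{rate} Hz -> Nyquist {limit:.0f} Hz, covers hearing range: {covers}")
```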
Now, to save a long-winded explanation of the different processes we could use to pitch-shift audio, I’m going to show you a method using some simple math that I was taught for producing higher quality audio that sounds more ‘pro’ with any pitch-shifter. This is especially important when shifting more than an octave down, as we will see from what happens to the frequency response.
Take a recording at 48 kHz, divide by two and we get 24 kHz (the Nyquist frequency).
Now allow for pitch-shifting down one octave: divide by two again and we get 12 kHz, which is now the highest frequency available. Every frequency in the recording has been halved, so the shifted audio only contains content from 20 Hz to 12 kHz; we have lost the top 12 kHz of the signal.
On playback, the audio sounds duller than the original because our hearing extends well above 12 kHz, so we notice the missing top end. High-frequency sounds such as cymbals, cabasas, maracas, triangles, gongs and tubular bells (anything with lots of high-frequency content) start to lose the upper harmonics that give each instrument its individual brightness/crispness.
Take this down another octave, divide by two once again, and we get 6 kHz.
This is a significant cut in the upper frequency range. Speech, singing and instruments all lose most of their upper harmonics and sound extremely muffled, with a noticeable loss in audio quality.
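The 48 kHz walk-through above boils down to this rule of thumb. Here it is as a Python sketch, with a hypothetical helper name:

```python
def bandwidth_after_shift_down(sample_rate_hz: float, octaves_down: int) -> float:
    """Highest frequency left after shifting down by whole octaves (rule of thumb above)."""
    nyquist = sample_rate_hz / 2.0
    return nyquist / (2 ** octaves_down)


for octaves in range(3):
    top = bandwidth_after_shift_down(48_000, octaves)
    print(f"48 kHz, {octaves} octave(s) down -> top frequency {top:.0f} Hz")
```

This gives 24 kHz, 12 kHz and 6 kHz, the same figures as the steps above.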
So from the figures above I can tell that if I want or need to shift an audio file down two or more octaves and still get a good quality result, I really need to raise my sample rate to compensate for the losses in the higher frequency range.
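You can also run the math backwards to estimate the lowest sample rate that still reaches about 20 kHz after the shift: multiply 20 kHz by two (Nyquist), then by two again for every octave down. This is a sketch only. The 20 kHz target and the list of "common" rates are my assumptions, and the helper names are hypothetical:

```python
HEARING_LIMIT_HZ = 20_000  # assumed target: keep content up to ~20 kHz after the shift
COMMON_RATES_HZ = (44_100, 48_000, 88_200, 96_000, 176_400, 192_000)  # assumed list


def min_sample_rate(octaves_down: int, target_hz: float = HEARING_LIMIT_HZ) -> float:
    """Lowest sample rate whose Nyquist frequency still reaches target_hz after the shift."""
    return 2.0 * target_hz * (2 ** octaves_down)


def pick_common_rate(octaves_down: int):
    """Smallest rate from COMMON_RATES_HZ that is high enough, or None if none is."""
    needed = min_sample_rate(octaves_down)
    for rate in COMMON_RATES_HZ:
        if rate >= needed:
            return rate
    return None


for octaves in range(4):
    print(octaves, min_sample_rate(octaves), pick_common_rate(octaves))
```

For three octaves down it needs 320 kHz, so none of the listed rates is enough. That lines up with the ‘-36’ case below, where quality starts to suffer even at 192 kHz.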
Let’s look at 192 kHz. This is a high sample rate, and I only use it when I know I want to shift audio way beyond its normal range…
192 kHz divided by two gives 96 kHz (the Nyquist frequency).
Pitch-shift down an octave, so divide by two again, and we get 48 kHz.
This is good because the audio still has harmonics across the whole human hearing range, so we tend to perceive it as a high quality pitch-shift (assuming that the software uses a well written algorithm!).
Pitch-shift again (divide by two again) and we get 24 kHz. The audio is now two octaves down but still sounds good quality, because it still has harmonics across the whole human hearing range.
Only at ‘-36’ semitones (divide by two again, giving us 12 kHz at three octaves down) do we start to notice the quality suffering.
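The 192 kHz steps can be checked with the same rule of thumb. This Python sketch flags when the top frequency drops below the assumed 20 kHz limit of hearing. The verdict labels are only a rough guide, since how dull it actually sounds depends on the material:

```python
HEARING_LIMIT_HZ = 20_000  # assumed upper limit of hearing


def bandwidth_after_shift_down(sample_rate_hz: float, octaves_down: int) -> float:
    """Nyquist frequency, halved once per octave of downward shift."""
    return sample_rate_hz / 2.0 / (2 ** octaves_down)


for octaves in range(4):
    top = bandwidth_after_shift_down(192_000, octaves)
    verdict = "covers hearing range" if top >= HEARING_LIMIT_HZ else "likely audibly duller"
    print(f"192 kHz, {-12 * octaves:+d} semitones -> top {top:.0f} Hz ({verdict})")
```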
So a higher sampling rate is the better option when you want to produce high quality pitch-shifting…
To sum up: take your sample rate and divide by two to get the Nyquist frequency, then divide by two again for every octave you pitch-shift your audio down. What is left is the total audio bandwidth available.
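If you shift by something other than whole octaves, the same rule of thumb extends naturally. Instead of halving once per octave, multiply the Nyquist frequency by 2^(semitones/12). This Python sketch assumes downward shifts only, and the function name is hypothetical:

```python
def available_bandwidth_hz(sample_rate_hz: float, semitones: float) -> float:
    """Top frequency left after a downward shift of `semitones` (rule of thumb above)."""
    if semitones > 0:
        raise ValueError("this rule of thumb only covers downward shifts (semitones <= 0)")
    nyquist = sample_rate_hz / 2.0
    return nyquist * 2.0 ** (semitones / 12.0)


# A fifth down (-7 semitones) from a 48 kHz recording:
print(round(available_bandwidth_hz(48_000, -7)))
```

For whole octaves (-12, -24, -36) it gives exactly the same numbers as dividing by two once per octave.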
I’m still using this method to help me estimate how good an excessive pitch-shift will be when required.
There are some other factors…
All of the above really depends on your audio content. Take a bass drum, for example: would I use a high sample rate? In most cases, no. But if I were recording a scrapped car dropped twenty feet onto a solid concrete floor and wanted to pitch-shift it later, yes, without a doubt.
Another factor to consider is whether you recorded the original source audio at your chosen sample rate in the first place. If you are using audio from another party, you could be restricted to their original sample rate (if it is not high enough) and therefore to its frequency bandwidth, which in turn will affect the quality of your pitch-shifting results.
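The point about third-party audio can be put into the same arithmetic. Resampling a file up to a higher project rate does not bring back anything above the source's own Nyquist frequency, so the lower of the two rates is the one that counts. A Python sketch, with a hypothetical helper name:

```python
def effective_bandwidth_hz(source_rate_hz: float, project_rate_hz: float,
                           octaves_down: int) -> float:
    """Top frequency after the shift when the material was captured at source_rate_hz.

    Upsampling to project_rate_hz adds no content above the source's Nyquist
    frequency, so the lower of the two rates limits the result.
    """
    usable_rate = min(source_rate_hz, project_rate_hz)
    return usable_rate / 2.0 / (2 ** octaves_down)


# A 44.1 kHz file imported into a 192 kHz project, then shifted two octaves down:
print(effective_bandwidth_hz(44_100, 192_000, 2))
```

Under this rule of thumb that file would top out around 5.5 kHz, however high the project rate is set.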
Karaoke files are normally MP3s (plus some other data, or even a MIDI file with other data attached), so the quality has most likely already been compromised in some way before you have even touched the audio, even if the conversion was well done.
In a nutshell, if you want great sounding pitch-shifting, produce all the audio yourself at the correct sample rate and use some great software to manipulate/process it further afterwards…