Monday, August 14, 2017

What quality should my sounds be?


Sound and its quality are something like religion: a never-ending story.


People will say that 320kbps MP3 is lossless, while experts will say that any MP3 is crap, but in the end no one will hear a difference in the result... unless they've been in the music industry for 20+ years.


What would be the best format for a video game, and what properties should be enough (Hertz, bitrate, etc.), assuming that 320kbps MP3 is "heavy"?


Maybe an example or two of how some AAA titles work with their sounds.



Answer





If you are using Unity or another big engine that has an asset management system, don't request Ogg Vorbis from your sound designers and composers. Get WAVs or AIFFs.


Unity and Unreal are structured to work with high-quality bounces and then apply compression settings per platform. Having the source asset as Ogg or MP3 means you are double-compressing the audio and introducing additional artifacts for no benefit.


If you see that starting from Ogg or MP3 reduces your build size, that's not a good reason. It likely means you are pre-exporting with different compression settings than the ones you have applied in Unity/Unreal. Are there exceptions? Yes, but you wouldn't be looking up this answer if you knew when those exceptions were applicable.


If you're pre-compressing in order to reduce the size of your repo, use LFS, use a centralized version control system, or grin and bear it.






  1. Yes, MP3 320k sounds great. BUT . . .

  2. Don't use MP3, use Ogg Vorbis at 44.1kHz, quality 6 for music. For sound effects, just use 16/44.1 WAV unless you really feel you need to trim the fat.


  3. If you have to use MP3, such as in Flash, try to use 192k-256k (VBR 1 or 2), but you may have to settle for 128k. Don't go lower than 128k (VBR 6).
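To get a feel for what those bitrates mean in megabytes, here is a rough sketch in Python. It assumes a constant bitrate and ignores tags and container overhead, so VBR files will land somewhere around these figures rather than exactly on them. The function name and the 3-minute track length are just illustrative choices.

```python
def compressed_size_bytes(bitrate_kbps, seconds):
    """Approximate size of a constant-bitrate stream.

    Assumption: 1 kbps = 1000 bits per second, no metadata or container overhead.
    """
    return int(bitrate_kbps * 1000 * seconds / 8)


if __name__ == "__main__":
    track_seconds = 180  # hypothetical 3-minute music track
    for kbps in (128, 192, 256, 320):
        size_mb = compressed_size_bytes(kbps, track_seconds) / 1_000_000
        print(f"{kbps} kbps: about {size_mb:.1f} MB")
```

Going from 320k down to 192k saves roughly 40% of the space for the same track length, which is why the middle of that range is a common compromise.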




People will say that 320kbps MP3 is lossless, while experts will say that any MP3 is crap, but in the end no one will hear a difference in the result... unless they've been in the music industry for 20+ years.



Audio encoded in MP3, regardless of encoding quality, is always lossy. It's a perceptual codec: it encodes properties of the sound over time, in compressed form, in chunks of ~1152 samples, and the decoder reconstructs an approximation of the uncompressed samples from those chunks. Its goal is not to accurately recreate the original audio, just to provide a version that is "good enough".


That said, like you said, 320kbps sounds very good. Most listeners can't tell it apart from the CD-quality source. It is still not possible to perfectly recreate the original samples of an uncompressed WAV once it has been encoded as a 320kbps MP3.


Generally, Ogg Vorbis is a better format than MP3. It's generally agreed to give you better quality for the same file size, and unlike MP3, it can easily be looped seamlessly. Those 1152-sample chunks that MP3 uses to encode audio will often leave silence at the beginning and end of a sound. That's not as big a deal for basic sound effects, but it's a massive problem for music loops.
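As a rough illustration of where that trailing silence comes from, the Python sketch below rounds a loop length up to whole 1152-sample chunks and reports the leftover padding. It is a simplification: real encoders also add some delay at the start of the file, which is not modeled here, and the 4-second loop is a made-up example.

```python
MP3_FRAME_SAMPLES = 1152  # chunk size discussed above


def frame_padding(loop_samples, frame=MP3_FRAME_SAMPLES):
    """Return (frames needed, padding samples) for a loop of the given length."""
    frames = -(-loop_samples // frame)  # ceiling division
    padding = frames * frame - loop_samples
    return frames, padding


if __name__ == "__main__":
    sample_rate = 44100
    loop_samples = 4 * sample_rate  # hypothetical 4-second music loop
    frames, padding = frame_padding(loop_samples)
    gap_ms = padding / sample_rate * 1000
    print(f"{frames} frames, {padding} padding samples, about {gap_ms:.1f} ms of silence")
```

A gap of a few tens of milliseconds is easy to miss in a one-shot effect, but it is clearly audible as a hiccup every time a music loop wraps around.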


The Flash IDE gets around this during the .swf export by stripping out the silence for you. People using streaming audio (or pure mxmlc) achieve looping via the SampleDataEvent and manually dropping samples, or by preprocessing the MP3 file (see Andre Michelle's blog and the CompuPhase mp3loop utility).



Also, using an MP3 decoder technically requires you to acquire a patent license (since the MP3 patents are owned by Technicolor, Fraunhofer, and others). Obviously tons of people have released freeware games that used MP3, but it's best not to screw around with that.



What would be the best format for a video game, and what properties should be enough (Hertz, bitrate, etc.), assuming that 320kbps MP3 is "heavy"?


Maybe an example or two of how some AAA titles work with their sounds.



That depends: What are your target platforms, what other technologies are you using, how are you distributing your game, and what style are you going for? I'm going to break this down into a few categories based on platform, technology, and aesthetic.


High-End PC and Console Titles


AAA games are going for top-of-the-line production quality, so they're recording and producing assets as uncompressed 24bit/48kHz (also the standard for film postproduction). Titles with slightly lesser ambitions than, say, Battlefield 3 might record and produce in 16/44.1, which is the official standard for CD-quality audio.


Of course you can't ship a bunch of 24/48 uncompressed WAVs with a game; it'd be too big. So ultimately there has to be some sort of compression happening. Generally the rule of thumb is, if it's a quick sound effect like a gun sound (like the Portal 2 gun fizzle in Sprunth's answer), it's fine to leave it as a WAV, possibly reducing the sample rate depending on its frequency content (see the Nyquist theorem: sounds that are made up of low-frequency content can be encoded at lower sampling rates). For music, there's really no way around compression. Ogg Vorbis at CD quality is the way to go (44.1kHz, quality 5-6 or higher).
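The Nyquist rule of thumb is that the sample rate needs to be more than twice the highest frequency you want to keep. A minimal Python sketch of picking a rate that way follows. The list of candidate rates and the 10% headroom (to leave room for the anti-aliasing filter) are assumptions for illustration, not fixed rules.

```python
# Common sample rates to choose from (an illustrative list, not exhaustive).
STANDARD_RATES_HZ = (8000, 11025, 16000, 22050, 32000, 44100, 48000)


def lowest_safe_rate(max_frequency_hz, headroom=1.1):
    """Pick the lowest listed rate that is at least 2 * max frequency * headroom.

    Assumption: headroom of 1.1 is an arbitrary safety margin.
    """
    needed = 2 * max_frequency_hz * headroom
    for rate in STANDARD_RATES_HZ:
        if rate >= needed:
            return rate
    return STANDARD_RATES_HZ[-1]


if __name__ == "__main__":
    # Hypothetical effects and the highest frequency that matters in each.
    for name, top_hz in (("low rumble", 4000), ("muffled thud", 9000), ("cymbal", 20000)):
        print(f"{name}: {lowest_safe_rate(top_hz)} Hz")
```

Halving the sample rate halves the size of an uncompressed WAV, so this is worth checking for bass-heavy effects before reaching for a lossy codec.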


Also, AAA games will often use an intermediary tool for the compression, either an in-house tool or audio middleware like FMOD or Wwise. The way it works in FMOD and Wwise is that you import most things as 16/44.1 or 24/48 WAVs (or, if the sound is all low-frequency content, it may be imported with a lower sampling rate), then give FMOD a compression factor for each asset, choosing an encoding like ADPCM, MP3, or Ogg Vorbis.



FMOD actually recently dropped support for encoding assets in the soundbanks you export from FMOD Designer (.fsb files) as Ogg Vorbis in favor of a new codec from Xiph called CELT. Ogg Vorbis can be a little tough on the CPU, so CELT is being developed to provide an alternative. You can still load Ogg Vorbis files directly, but you can no longer use the format for encoding from the Designer application.


By the way, here's a cool link about Battlefield Bad Company's audio that also goes into surround a bit. DICE is pretty much at the forefront of audio technology in games, so it's a good series for study.


Also, related to surround is the issue of mono vs. stereo. Just in case you didn't know, all of your sound effects should be mono, unless some of them actually make use of panning effects. Stereo is awkward to spatialize into a 3D environment, and you can pan mono sounds in code to place them in a 2D environment.
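Panning a mono sound in code can be as simple as scaling it by two gains. The Python sketch below uses a constant-power pan law, which is one common choice (engines and middleware may use a different curve); the function names are made up for the example.

```python
import math


def pan_gains(pan):
    """Constant-power (left, right) gains for pan in [-1.0, 1.0]; -1.0 is hard left."""
    pan = max(-1.0, min(1.0, pan))
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def pan_mono(samples, pan):
    """Turn a list of mono float samples into (left, right) pairs."""
    left_gain, right_gain = pan_gains(pan)
    return [(s * left_gain, s * right_gain) for s in samples]


if __name__ == "__main__":
    mono = [0.0, 0.5, 1.0, 0.5, 0.0]
    print(pan_mono(mono, -0.5))  # sound placed left of center
```

With this law the left and right gains always satisfy left² + right² = 1, so the perceived loudness stays roughly the same as the sound moves across the screen.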


Slightly-Less-High-End Titles, Indie Games


Obviously this can vary widely. A quick glance shows that Frozen Synapse uses Ogg Vorbis files exclusively, for both sound effects and music. Dungeons of Dredmor, on the other hand, follows the scheme of Ogg Vorbis for music and 16/44.1 WAV for sound effects.


The Dungeons of Dredmor approach is preferable. Even stored as uncompressed WAVs, the sound effects are generally short enough that they don't take up that much space, and you save a lot of CPU cycles not having to decode them. You want to be able to quickly load a sound effect into memory and play it. If you encode your sound effects in Ogg Vorbis, there's the potential for a tiny amount of delay before a player hears your sound effect for the first time.
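To see why short uncompressed effects are affordable, here is a quick size estimate in Python. It counts raw sample data only and ignores the small file header; the effect names and durations are invented for the example.

```python
def pcm_size_bytes(seconds, sample_rate_hz=44100, bit_depth=16, channels=1):
    """Size of raw uncompressed PCM audio (sample data only, no file header)."""
    return int(seconds * sample_rate_hz * (bit_depth // 8) * channels)


if __name__ == "__main__":
    # Hypothetical mono sound effects and their lengths in seconds.
    effects = {"footstep": 0.25, "sword_hit": 0.5, "door_creak": 2.0}
    total = 0
    for name, seconds in effects.items():
        size = pcm_size_bytes(seconds)
        total += size
        print(f"{name}: {size / 1000:.1f} kB")
    print(f"total: {total / 1000:.1f} kB")
```

At 16/44.1 mono, one second of audio is 88,200 bytes, so even a hundred one-second effects come to under 9 MB.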


Browser Games, HTML5 and Flash (with a dash of Mobile)


HTML5 audio is a mess. You have to provide both Ogg and MP3 versions of your sounds. Encode at the highest quality you can without your users raging at the long load time. For MP3, don't go below 128k; it's bad enough at 128.


Flash only accepts 16bit/44.1kHz MP3 unless you go nuts and write your own decoder for some other format (like the experimental Ogg Vorbis decoder in the Alchemy labs). In the past, Flash had problems with Variable Bit Rate MP3s, but I've never had a problem. The quality setting you choose for your Flash game will depend on how large you want your final .swf to be.


Update: As Tetrad mentioned, mobile games have to be designed with memory and storage in mind. The way you encode your audio for mobile games is much like Flash: you want to retain as high a quality as you can, but ultimately you have to fit a memory and storage budget. Tracker music is especially good if you're on a tight storage budget for music. Tell your composer to limit their sample palette and you can fit a ton more music in the game.



"8-bit" or Chiptune Type Sound Effects and Music


Most games are just going to do what Frozen Synapse and Dungeons of Dredmor do. However, you can probably get away with reducing sampling rate and bit depth. Not only might it fit the aesthetic you're going for, but it could save you some space.
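For a sense of what reducing sampling rate and bit depth does to the data, here is a naive Python sketch that works on a list of float samples in the range -1.0 to 1.0. It is only an illustration: the decimation step skips the low-pass filter a proper resampler would apply, so it aliases (which may or may not be the crunchy sound you want), and the function names are made up.

```python
import math


def reduce_bit_depth(samples, bits):
    """Quantize float samples in [-1.0, 1.0] to roughly the given bit depth."""
    levels = 2 ** (bits - 1)
    return [max(-1.0, min(1.0, round(s * levels) / levels)) for s in samples]


def reduce_sample_rate(samples, factor):
    """Naive decimation: keep every factor-th sample, with no filtering."""
    return samples[::factor]


if __name__ == "__main__":
    # Hypothetical source: 100 ms of a 440 Hz sine at 44.1 kHz.
    source = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
    crushed = reduce_bit_depth(reduce_sample_rate(source, 4), 8)
    print(len(source), "samples in,", len(crushed), "samples out")
```

Dropping from 16-bit/44.1kHz to 8-bit at a quarter of the rate cuts the uncompressed size by a factor of eight.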


Also, tracker music generally stores its samples at low sample rates. Just let it happen.

