I first noticed blend shapes in video games, outside of cutscenes, in Dota 2: there were creeps that were breathing, and their chests were expanding and contracting in an incredibly smooth way that I had never noticed before. I began noticing it all over, in facial expressions that changed, objects that swelled up before exploding, little deformations in damaged items.
The way it works is surprisingly simple and elegant.
A character's "model" is made up of a huge number of triangles. A blend shape (aka shape key - there are many akas) is the same model, made up of the same triangles, except that some of them have been moved, rotated, or resized in some way. (In practice, this is stored as new positions for the affected vertices.) The combination of a position, rotation, and scale is called a "transform."
When you have a character with a neutral facial expression and you would like them to smile, you can gradually blend ("interpolate") between the vertex positions of the current model and those of the smiling blend shape.
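As a minimal sketch of that interpolation - assuming both meshes store the same vertices in the same order, and with all names and coordinates made up for illustration:

```python
# Blend between a neutral mesh and a "smile" blend shape by linearly
# interpolating each vertex position. Both meshes must share vertex order.

def blend(neutral, target, t):
    """Interpolate every vertex position by factor t in [0, 1]."""
    return [
        tuple(n + t * (g - n) for n, g in zip(nv, tv))
        for nv, tv in zip(neutral, target)
    ]

neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # two corner-of-mouth vertices
smile   = [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0)]  # same vertices, raised

halfway = blend(neutral, smile, 0.5)
# each vertex has moved halfway toward its smile position
```

Animating the smile is then just a matter of advancing t from 0 to 1 over a few frames.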
You don't always want to create a custom shape for every possible shape a face can make. Instead, you categorize the different ways that the face can move and create one blend shape for each of those categories. To create any desired shape, you blend towards a combination of blend shapes instead of just one. For example, if you wanted a contemptuous smile, and had one shape for smiling and one for contempt, you simply apply both - or some other equivalent combination depending on your setup.
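One common way to implement this combination - a sketch under the assumption that each shape is stored as per-vertex deltas from the neutral mesh, with the shape names invented for illustration:

```python
# Combine several blend shapes by scaling each shape's per-vertex deltas
# by its weight and summing them onto the neutral mesh.

def apply_shapes(neutral, shape_deltas, weights):
    """Return the mesh with each named shape's weighted deltas added in."""
    result = [list(v) for v in neutral]
    for name, weight in weights.items():
        for i, delta in enumerate(shape_deltas[name]):
            for axis in range(3):
                result[i][axis] += weight * delta[axis]
    return [tuple(v) for v in result]

neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]          # two mouth-corner vertices
shape_deltas = {
    "smile":    [(0.0, 0.2, 0.0), (0.0, 0.2, 0.0)],   # both corners up
    "contempt": [(0.0, 0.1, 0.0), (0.0, -0.1, 0.0)],  # one up, one down
}

# A contemptuous smile: both shapes fully applied at once.
mesh = apply_shapes(neutral, shape_deltas, {"smile": 1.0, "contempt": 1.0})
```

Storing deltas rather than full positions is what makes shapes composable: each one only touches the vertices it actually moves.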
If you are doing lip sync and the speaker pronounces a syllable, that syllable is made up of one or more parts called "phonemes." You use one blend shape per phoneme and apply all of the appropriate ones when the syllable is produced, in order to match the face to the sound. E.g., you can have somebody smile or frown as they say hello - both of which affect the mouth shape - by applying or averaging both.
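A toy sketch of that idea, assuming a hypothetical table mapping phonemes to shape weights (the phoneme symbols, shape names, and conflict-resolution rules are all invented for illustration, not from any real phoneme set or engine):

```python
# Map a syllable's phonemes to blend shape weights, taking the max where two
# phonemes drive the same shape, then layer an expression on top by averaging.

PHONEME_SHAPES = {
    "HH": {"jaw_open": 0.3},
    "EH": {"jaw_open": 0.6, "mouth_wide": 0.4},
    "L":  {"tongue_up": 0.8},
}

def syllable_weights(phonemes, expression=None):
    """Combine shape weights for a syllable, optionally blending in an
    expression (e.g. a smile) that also affects the mouth."""
    weights = {}
    for p in phonemes:
        for shape, w in PHONEME_SHAPES[p].items():
            weights[shape] = max(weights.get(shape, 0.0), w)
    if expression:
        for shape, w in expression.items():
            if shape in weights:
                weights[shape] = (weights[shape] + w) / 2  # average conflicts
            else:
                weights[shape] = w
    return weights

# "hel-" while smiling: mouth_wide is driven by both the phoneme and the smile.
w = syllable_weights(["HH", "EH", "L"], expression={"mouth_wide": 0.8})
```

Real pipelines resolve conflicting drivers in more sophisticated ways, but the core idea is the same: every sound and expression reduces to a set of weights over a shared pool of shapes.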
There are other ways to do this besides blend shapes. You can animate a face using "bones," similar to how joints are animated. I'm less familiar with that approach, unfortunately, so it's hard for me to compare the two. I suspect bones are more expensive because they take more runtime calculation - it's hard to compete with pure interpolation. I might be wrong, though.
Blend shapes are easy to create from a UI point of view, but a little tedious in practice. It can be tiresome to create the tracks that control the timing of the blend shape changes, as well as their playback.
To execute the playback, you can take a sequence of blend shape changes and "bake" them into an animation (or equivalent) to be played at an appropriate time. To create those tracks, you usually take one of two approaches: either you use no track at all and generate the timings dynamically (e.g., in response to an audio track for lip sync), or you record a person's movements and use software to translate them into blend shape changes, creating your baked track automatically. Epic has an iOS app called Live Link Face to help with this for facial expressions.
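As a sketch of what "baking" might look like, assuming a simple per-frame sampling model - the easing curve, frame rate, and function names are all made up for illustration:

```python
# "Bake" a continuous weight curve into per-frame keyframes that can be
# stored in an animation track and played back later.

import math

def bake(curve, fps, duration):
    """Sample curve(t) at every frame from t=0 to t=duration, inclusive."""
    frames = int(round(duration * fps))
    return [curve(i / fps) for i in range(frames + 1)]

# Ease a "smile" weight from 0 to 1 over one second (cosine ease-in-out).
smile_curve = lambda t: 0.5 * (1.0 - math.cos(math.pi * min(t, 1.0)))

keyframes = bake(smile_curve, fps=30, duration=1.0)
# 31 keyframes: weight 0.0 at frame 0, 1.0 at frame 30
```

Once baked, playback is cheap: the engine just looks up (or interpolates) the weight for the current frame and feeds it to the blend.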
The main question for me is just what the best workflow for creating blend shapes in Blender is. It's time consuming for a large number of models, and I'm sure somebody out there has an incredibly streamlined workflow written up in an article somewhere.