Appendix A: QuickTime Confidential
Structure of QuickTime movies
When you imagine a QuickTime movie, chances are good that you think of something similar to film: a single series of frames arranged linearly in time, with or without sound.
In fact, the structure of QuickTime movies is far more similar to the model you may be familiar with from MIDI and audio sequencers.
QuickTime movies are containers for tracks. Each track contains a particular type of media—such as audio or video—and defines the spatial and/or temporal boundaries of that media. For instance, a QuickTime movie might be 320x240 in size, and 3 minutes in length, while one of its video tracks is 80x60 in size, and only 30 seconds long. QuickTime movies are never longer than the end of their longest track.
Each track in a QuickTime movie might be of a different type—video, audio, sprites, MIDI (QuickTime calls these music tracks), and VR Panoramas are some of the available types—and might start at a different point within the movie. You can have several tracks of identical types in a single movie as well. Tracks are independent—they can be turned on and off individually, and they can all be of different lengths.
The visual priority of tracks is determined by the track's visibility layer. Track layers are numbered from –32768 to 32767. Tracks with low visibility layer values will be displayed in front of tracks with high visibility layer values. Audio tracks will automatically mix, regardless of their layer value.
The jit.movie object offers a number of track-centered functions. A brief list of some useful messages follows:
General Track Editing
: create a new track
: delete a track
: displays several pieces of track information (index, name, type, enabled status, visibility layer)
: displays a track's temporal offset (set using the message)
: displays a track's duration (set using the message)
: displays a track's spatial dimensions (set using the message)
: displays a track's type (type cannot be set—use to create a new track)
: display a track's visibility layer (set using the message)
: display a track's codec (codec cannot be set)
: displays a track's enabled status (set using the message)
: displays a track's drawing mode—the manner in which it interacts with other tracks (set using the message)
: set the pan for a VR movie
: set the tilt for a VR movie
: set the field of view for a VR movie
: set the current node for a multi-node VR movie (use the message to enumerate available nodes)
(see Tutorial 24: QuickTime Effects for more information)
: create a QuickTime Effects track set (1 to 3 actual tracks), by importing a .qfx file
: delete any and all QuickTime Effects tracks created by the message
: create a background color track
: delete a background color track created by Jitter.
Time in QuickTime
Unlike most hardware-based media, QuickTime does not use a frame-based time system. Instead, QuickTime uses a system based on timescale and time values. Timescale is an integer value that indicates how many time values make up one second of a movie. By default, a new QuickTime movie has a timescale of 600—meaning that there are 600 time values per second.
Frame rate is determined by the concept of interesting time—times where the movie changes. If a movie changes every 40 time values, then it has an effective frame rate of 15 frames per second (since 600 divided by 40 equals 15). When Jitter determines how many frames are in a movie with themessage, it scans the movie for interesting times, and reports their number. The and messages move the play head between different points of interesting time.
For more information on the relationship between timescale and frame rate, refer to Tutorial 4: Controlling Movie Playback.
In Jitter, recording operations permit you to specify a frame rate, and Jitter takes care of the calculations for you as it constructs a new movie. Editing operations, in contrast, use time values to determine their range of effect.
Optimizing Movies for Playback in Jitter
Although Jitter will gladly play any movie you throw at it, there are some guidelines you can follow to improve performance. Sadly, there is no precise recipe for perfection—performance in a real-time application such as Jitter is the result of the interaction between a movie's track codecs (which affect data bandwidth and processor load), movie dimensions, frame rate and, to some extent, the complexity of the media being processed.
Visual media, in particular, contain large amounts of data that must be read from disk and processed by your computer before they can be displayed. Codecs, or compressor/ decompressors, are used to encode and decode data. When encoding, the goal is generally to thin out the data so that less information has to be read from disk when the movie plays back. Reading data from disk is a major source of overhead when playing movies.
When decoding, the goal is to return the data to its pre-encoded state as quickly as possible. Codecs, by and large, are lossy, which means that some data is lost in the process. As users, our goal is to figure out which codec offers the greatest quality at the greatest speed for our particular application.
Codecs are available for both video and audio tracks. For online distribution, you might want to use an MPEG 2 Level 3 (.mp3) or AAC to create smaller files. In Jitter, however, if you are playing movies with video and audio tracks, you'll achieve the best results with uncompressed audio (PCM audio) simply because there will be no audio codec decompression overhead.
Video codecs may be handled in hardware—you may have a special video card that provides hardware compression and decompression of a particular codec, like MPEG or Motion-JPEG—or more typically in software. In Jitter, hardware codec support is only relevant to video output components. Movie playback will always use a software codec, with the important exception of the jit.movie object's direct to video output component feature (see Tutorial 22 Video Output Components and the Object Reference entry for the jit.movie object for more information).
Video codecs generally use one or both of the following schemes: spatial and temporal compression.
Spatial compression is probably familiar to you from the world of still images. JPEG, PNG and PICT files each use types of spatial compression. Spatial compression schemes search a single image frame for patterns and repetitions that can be described in a simpler fashion. Most also simplify images to ensure that they contain these patterns. Nevertheless, more complex images are harder to compress, and will generally result in larger files. Spatial compression does not take time into account—it simply compresses each frame according to its encoding algorithm.
Temporal compression is unique to the world of moving images, since it operates by creating a description of change between consecutive frames. In general, temporal compression does not fully describe every frame. Instead, a temporally compressed movie contains two types of frames: keyframes, which are fully described frames (usually spatially compressed, as well), and regular frames, which are described by their change from the previous keyframe.
For applications where a movie will be played from start to finish, temporal compression is quite useful. Codecs like Sorenson use temporal compression to create extremely small files that are ideal for web playback. However, temporal compression is not a good choice if you need to play your movie backwards, since the order of the keyframes is vital to properly describing the sequence of images. If we play a temporally compressed movie backwards, the change descriptions will be processed before the keyframe that describes the initial state of the frame! Additionally, the Sorenson codec is quite processor-intensive to decompress. Playback of a Sorenson-compressed movie will be slower than playback of a movie compressed using a lighter method.
For Jitter playback, we recommend using a video codec without temporal compression, such a Photo-JPEG or Motion-JPEG (Photo- and Motion-JPEG compression use the same compression method, but Motion-JPEG is optimized for special hardware support [see note above]). At high quality settings, JPEG compression offers a nice compromise between file size and image quality. It's also relatively simple to decode, so your processor can be put to better use than decompressing video.
If image quality is of principle importance, the Animation codec looks better than Photo-JPEG, but creates much larger files.
Different versions of QuickTime support different audio and video codecs. For instance, QuickTime 5 doesn't support the MPEG-4 codec, although QuickTime 6 does. (Quicktime 10 has dropped many codecs. For use of legacy codecs, you can still get QuickTime 7 from Apple.com) You should experiment with different codec settings to find the best match for your particular use of Jitter.
Movie Dimensions and Frame Rate
Compared to codec, movie dimensions and frame rate are more straightforward factors in Jitter performance. Simply put, a bigger image frame or a higher frame rate indicates that Jitter has more data to process each second.
A 640x480 movie produces 1,228,800 individual values per frame for Jitter to process (640 times 480 times 4 (separate values for alpha, red, green and blue channels). A 320x240 movie produces a mere 307,200 individual values per frame, one quarter the size of the larger movie. On most machines, 640X480 movies will give fine performance with one or two processes. If your patch is elaborate, you will have to change to the smaller size.
If you are working with DV media, your movies are recorded at 29.97 frames per second (in NTSC) or 25 frames per second (in PAL). Even using a 360x240 movie, Jitter has to process 10,357,632 values per second in NTSC and 8,640,000 values per second in PAL. Thinning this data by reducing the frame rate to 15 or 20 frames per second will improve performance significantly if you are using Jitter for heavy processing.
Our Favorite Setting
We've found that, for most movies, the following parameters yield consistently good results:
QuickTime movies can contain many tracks of different types—audio, video, QuickTime VR and text, to name a few. In Jitter, we can both get and set basic information about these tracks—including their temporal offset, spatial dimensions, duration, visibility layer and graphic transfer mode—using the jit.movie object's track messages.
QuickTime's time model is somewhat different from the standard frame-based model, relying on a timescale value to determine the number of time values in a second of QuickTime media. The jit.movie object allows for playback navigation using both frame and timescale models. All editing functions in the jit.movie object use the timescale model to determine the extent of their effect.
Determining the ideal settings for a movie used in Jitter is an inexact science. Actual performance depends on a number of factors, including codec, dimensions, frame rate and media complexity. For best results on current hardware, we recommend using 320x240, 15 fps, Photo-JPEG compressed movies