Appendix A: QuickTime Confidential

Note: Some of the information in this article is outdated. QuickTime .mov files are one of several container formats supported in Jitter; others include .mp4, .m4v, and .avi. Support for each container format and track codec depends on the enabled video engine and platform.

The codec and dimension recommendations below are also outdated. At the time of this writing, 4K and even 8K video output is achievable, depending on CPU, GPU, disk speed, codec, and video engine. More than anything else, the most efficient way to process video with Jitter is to make sure output_texture is enabled for the jit.movie, jit.playlist, and jit.grab objects.

Time in QuickTime

Unlike most hardware-based media, QuickTime does not use a frame-based time system. Instead, QuickTime uses a system based on timescale and time values. Timescale is an integer value that indicates how many time values make up one second of a movie. By default, a new QuickTime movie has a timescale of 600—meaning that there are 600 time values per second.

Frame rate is determined by the concept of interesting time—times where the movie changes. If a movie changes every 40 time values, then it has an effective frame rate of 15 frames per second (since 600 divided by 40 equals 15). When Jitter determines how many frames are in a movie with the getframecount message, it scans the movie for interesting times, and reports their number. The frame and jump messages move the play head between different points of interesting time.
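The arithmetic behind timescale and interesting times can be sketched in a few lines (plain Python for illustration; the function names are ours, not part of any QuickTime or Jitter API):

```python
# Illustrative sketch of QuickTime's timescale arithmetic.

DEFAULT_TIMESCALE = 600  # time values per second in a new QuickTime movie

def effective_fps(timescale, time_values_per_change):
    """Frame rate implied by a movie whose content changes every
    `time_values_per_change` time values (its "interesting times")."""
    return timescale / time_values_per_change

def interesting_time_count(timescale, duration_seconds, time_values_per_change):
    """How many interesting times (effective frames) fit in the duration."""
    return (timescale * duration_seconds) // time_values_per_change

print(effective_fps(DEFAULT_TIMESCALE, 40))               # 15.0, as in the example above
print(interesting_time_count(DEFAULT_TIMESCALE, 10, 40))  # 150 frames in a 10-second movie
```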

For more information on the relationship between timescale and frame rate, refer to Tutorial 4: Controlling Movie Playback.

In Jitter, recording operations permit you to specify a frame rate, and Jitter takes care of the conversion to time values for you as it constructs a new movie (at the default timescale of 600, for example, a 30 fps movie allots 20 time values to each frame). Editing operations, in contrast, use time values to determine their range of effect.

Optimizing Movies for Playback in Jitter

Although Jitter will gladly play any movie you throw at it, there are some guidelines you can follow to improve performance. Sadly, there is no precise recipe for perfection—performance in a real-time application such as Jitter is the result of the interaction between a movie's track codecs (which affect data bandwidth and processor load), movie dimensions, frame rate and, to some extent, the complexity of the media being processed.


Visual media, in particular, contain large amounts of data that must be read from disk and processed by your computer before they can be displayed. Codecs, or compressor/decompressors, are used to encode and decode data. When encoding, the goal is generally to thin out the data so that less information has to be read from disk when the movie plays back. Reading data from disk is a major source of overhead when playing movies.

If you have enough RAM, you can use the loadram message to copy a movie's (compressed) media to RAM. Since accessing RAM is significantly faster than accessing a hard disk, movie playback will generally improve, although the object still has to decompress each frame as the movie plays. To buffer decompressed matrix data to RAM, use the jit.matrixset object.

When decoding, the goal is to return the data to its pre-encoded state as quickly as possible. Codecs, by and large, are lossy, which means that some data is lost in the process. As users, our goal is to figure out which codec offers the greatest quality at the greatest speed for our particular application.

To determine a QuickTime movie's track codecs, send the gettrackcodec message to the jit.qt.movie object.

Audio Codecs

Codecs are available for both video and audio tracks. For online distribution, you might want to use an MPEG Audio Layer 3 (.mp3) or AAC codec to create smaller files. In Jitter, however, if you are playing movies with video and audio tracks, you'll achieve the best results with uncompressed (PCM) audio, simply because there will be no audio codec decompression overhead.

Technical Note: .mp3 files can be read by the jit.qt.movie object as audio-only movies (they are compressed with the MPEG Audio Layer 3 codec). Although you can't process the audio data with any Jitter objects besides the jit.qt.movie object, you can use the jit.qt.movie object's soc attribute to send the audio to MSP via the spigot~ object. (Note: spigot~ supports the 32-bit QuickTime engine only and requires the jit.qt.movie object to function. See Tutorial 27: Using MSP Audio in a Jitter Matrix for more information.)

Video Codecs

Video codecs may be handled in hardware (you may have a special video card that provides hardware compression and decompression of a particular codec, like MPEG or Motion-JPEG) or, more typically, in software. In Jitter, hardware codec support is only relevant to video output components. Movie playback will always use a software codec, with the important exception of the jit.qt.movie object's direct-to-video-output-component feature (see Tutorial 22: Video Output Components and the Object Reference entry for the jit.qt.movie object for more information).

Technical Note: Video cards that provide hardware support of codecs usually only support them for onscreen decompression of media. Since Jitter generally decompresses media into an offscreen buffer (with the exception noted above), software codecs are used.

Video codecs generally use one or both of the following schemes: spatial and temporal compression.

Spatial compression is probably familiar to you from the world of still images. JPEG, PNG and PICT files each use types of spatial compression. Spatial compression schemes search a single image frame for patterns and repetitions that can be described in a simpler fashion. Most also simplify images to ensure that they contain these patterns. Nevertheless, more complex images are harder to compress, and will generally result in larger files. Spatial compression does not take time into account—it simply compresses each frame according to its encoding algorithm.
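As a toy illustration of the idea, the sketch below uses run-length encoding, about the simplest possible spatial scheme: it collapses runs of identical pixel values within a single row. Real spatial codecs such as JPEG use far more sophisticated transforms, but they exploit the same kind of within-frame redundancy, and the same principle applies: the more repetitive the image, the smaller the encoded result.

```python
def rle_encode(row):
    """Toy run-length encoder: collapse runs of repeated pixel values
    into [value, count] pairs. Illustrative only -- not a real codec."""
    encoded = []
    for value in row:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1  # extend the current run
        else:
            encoded.append([value, 1])  # start a new run
    return encoded

# A repetitive row compresses well; a complex row would not.
row = [255, 255, 255, 255, 0, 0, 128]
print(rle_encode(row))  # [[255, 4], [0, 2], [128, 1]]
```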

Temporal compression is unique to the world of moving images, since it operates by creating a description of change between consecutive frames. In general, temporal compression does not fully describe every frame. Instead, a temporally compressed movie contains two types of frames: keyframes, which are fully described frames (usually spatially compressed, as well), and regular frames, which are described by their change from the previous keyframe.

For applications where a movie will be played from start to finish, temporal compression is quite useful. Codecs like Sorenson use temporal compression to create extremely small files that are ideal for web playback. However, temporal compression is not a good choice if you need to play your movie backwards, since the order of the keyframes is vital to properly describing the sequence of images. If we play a temporally compressed movie backwards, the change descriptions will be processed before the keyframe that describes the initial state of the frame! Additionally, the Sorenson codec is quite processor-intensive to decompress. Playback of a Sorenson-compressed movie will be slower than playback of a movie compressed using a lighter method.
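The keyframe/delta structure described above, and why it penalizes backwards playback, can be modeled in a few lines. This is a toy model of temporal compression in general, not the actual format of Sorenson or any real codec; the frame layout and function name are ours:

```python
# Toy model of temporal compression: a keyframe stores complete pixel
# data; each delta frame stores only (index, new_value) changes
# relative to the frame before it.

def decode_from_keyframe(keyframe, deltas, target):
    """Reconstruct frame `target` (0 = the keyframe itself).
    Even to display a single late frame -- or to step backwards --
    decoding must start at the keyframe and apply deltas in order,
    which is why reverse playback of temporally compressed movies
    is expensive."""
    frame = list(keyframe)
    for delta in deltas[:target]:
        for index, value in delta:
            frame[index] = value
    return frame

keyframe = [10, 10, 10, 10]
deltas = [[(0, 99)], [(3, 42)]]
print(decode_from_keyframe(keyframe, deltas, 2))  # [99, 10, 10, 42]
```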

For Jitter playback, we recommend using a video codec without temporal compression, such as Photo-JPEG or Motion-JPEG (Photo- and Motion-JPEG use the same compression method, but Motion-JPEG is optimized for special hardware support [see note above]). At high quality settings, JPEG compression offers a nice compromise between file size and image quality. It's also relatively simple to decode, so your processor can be put to better use than decompressing video.

If image quality is of principal importance, the Animation codec looks better than Photo-JPEG, but creates much larger files.

Different versions of QuickTime support different audio and video codecs. For instance, QuickTime 5 doesn't support the MPEG-4 codec, although QuickTime 6 does. (QuickTime 10 has dropped many codecs; for legacy codec support, you can still get QuickTime 7 from Apple.) You should experiment with different codec settings to find the best match for your particular use of Jitter.

Movie Dimensions and Frame Rate

Compared to codec choice, movie dimensions and frame rate are more straightforward factors in Jitter performance. Simply put, a bigger image frame or a higher frame rate means that Jitter has more data to process each second.

A 640x480 movie produces 1,228,800 individual values per frame for Jitter to process (640 times 480 times 4, with separate values for the alpha, red, green and blue channels). A 320x240 movie produces a mere 307,200 individual values per frame, one quarter the data of the larger movie. On most machines, 640x480 movies will give fine performance with one or two processes. If your patch is more elaborate, you may need to switch to the smaller size.

If you are working with DV media, your movies are recorded at 29.97 frames per second (in NTSC) or 25 frames per second (in PAL). Even using a 360x240 movie, Jitter has to process 10,357,632 values per second in NTSC and 8,640,000 values per second in PAL. Thinning this data by reducing the frame rate to 15 or 20 frames per second will improve performance significantly if you are using Jitter for heavy processing.
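The data-rate arithmetic in the last two paragraphs can be checked directly (plain Python; 4 channels for the ARGB planes):

```python
def values_per_frame(width, height, channels=4):
    """Cells Jitter must process per frame: one value each for the
    alpha, red, green and blue channels of every pixel."""
    return width * height * channels

def values_per_second(width, height, fps, channels=4):
    """Cells Jitter must process per second at a given frame rate."""
    return values_per_frame(width, height, channels) * fps

print(values_per_frame(640, 480))          # 1228800
print(values_per_frame(320, 240))          # 307200
print(values_per_second(360, 240, 29.97))  # 10357632.0 (NTSC DV, half-size frame)
print(values_per_second(360, 240, 25))     # 8640000    (PAL)
```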

Our Favorite Setting

We've found that, for most movies, the following parameters yield consistently good results:

• 320x240 frame size
• 30 frames per second
• Video tracks: Photo-JPEG codec, using a medium to high spatial quality setting
• Audio tracks: no compression


Summary

QuickTime's time model is somewhat different from the standard frame-based model, relying on a timescale value to determine the number of time values in a second of QuickTime media. The jit.qt.movie object allows playback navigation using both frame and timescale models. All editing functions in the jit.qt.movie object use the timescale model to determine the extent of their effect.

Determining the ideal settings for a movie used in Jitter is an inexact science. Actual performance depends on a number of factors, including codec, dimensions, frame rate and media complexity. For best results on current hardware, we recommend 320x240, 15 fps, Photo-JPEG compressed movies.