Hybrid Camera Systems

Inertial Hybrid Tracking Systems.

Optical Systems Using Natural Features.

 

 

All the systems mentioned so far rely on some form of sensors or special markers in the environment in which the system is being used. A current topic of active research is the development of tracking systems that can use naturally occurring features in the scene. One example of a system that combines tracking of natural features and the use of inertial sensors is described in Chandaria et al. [15]. In applications such as sports outside broadcasting, there is a distinct advantage in being able to track the camera movement from the video image alone; some systems specifically tailored for these applications are described in Section 4.

 

 

 Tracking Hand-Held Objects

 

There are applications where it is useful to be able to attach virtual graphics to small physical objects rather than fixing them in the world reference frame. This allows a presenter to pick up a virtual object, and can provide an interesting way to interact with such objects [16]. An example of the use of an image-based marker tracking system for TV production is shown in Fig. 12. Similar systems have been used for hand-held augmented reality using a PC and webcam [17], although for broadcast use the systems need to satisfy the high frame-rate and robustness requirements for TV. A potential problem with the use of such systems is that, by the very nature of the close interaction between presenter and object, there is a risk of the presenter covering the marker, or placing a hand into the space that the virtual object should occupy. Careful rehearsal is usually required. The system shown in Fig. 12 uses markers consisting of a colored rectangle containing a simple 2D bar code, sufficient to uniquely identify a small number of markers and to allow their orientation to be determined. The use of a strong color allows the marker to be easily located in the image, using a chroma-key technique similar to that described in Section 3.6.3. The corners of the marker are located by fitting straight lines to the edges of the rectangle in the key signal; from these, the locations of the centers of the 2D bar code can be deduced and the bar code can be read. The 3D position and orientation of the marker with respect to the camera are then estimated using an iterative optimization process. This processing can be carried out at full video rate on a standard PC.

It is theoretically possible to estimate the focal length of the camera in addition to the pose of the marker; however, this is an ill-posed problem, in that there are generally many combinations of marker distance from the camera and lens focal length that would provide a good fit to the observed corner positions. Therefore the system is usually used either with the focal length fixed to a known value or with a lens sensor fitted. It is worth noting that a small discrepancy between the actual and assumed camera focal length will not have a significant effect on the rendered graphics, as the estimated marker pose will always result in good alignment of the virtual graphics at the corners of the marker. If the virtual object is significantly larger than the marker, or extends a long way out of the plane of the marker, then errors such as incorrect perspective or uncertainty in marker orientation will become more apparent if the actual focal length is significantly different from that assumed.
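To make the marker pose-estimation step more concrete, here is a minimal Python sketch. It is not the method of the system in Fig. 12: it swaps in a closed-form planar technique (a four-point homography decomposed into rotation and translation) of the kind that would typically provide the starting estimate for an iterative optimizer. It assumes a known focal length f in pixels (the fixed-focal-length or lens-sensor case discussed above), square pixels, no lens distortion, a principal point at the image origin, image v pointing up, and the OpenGL-style camera of Section 3.2.1 (looking along -z). The function names and the corner ordering are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """Four-point DLT with H[2][2] fixed to 1; maps marker-plane (X, Y) to image (u, v)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def cross(p, q):
    return [p[1] * q[2] - p[2] * q[1], p[2] * q[0] - p[0] * q[2], p[0] * q[1] - p[1] * q[0]]


def marker_pose(corners, side, f):
    """Estimate marker rotation R (3x3, list of rows) and translation t in camera coordinates.

    corners: image positions (u, v) of the marker corners, in the same order as `model`.
    """
    half = side / 2
    model = [(-half, -half), (half, -half), (half, half), (-half, half)]
    H = homography(model, corners)
    # OpenGL-style camera: (u, v, 1) ~ K (X, Y, Z) with K = diag(f, f, -1).
    k_inv = [1 / f, 1 / f, -1.0]
    cols = [[k_inv[i] * H[i][j] for i in range(3)] for j in range(3)]  # columns of K^-1 H
    norm1 = math.sqrt(sum(c * c for c in cols[0]))
    norm2 = math.sqrt(sum(c * c for c in cols[1]))
    scale = 2 / (norm1 + norm2)  # average the two column norms to reduce noise sensitivity
    if cols[2][2] * scale > 0:  # the marker must lie in front of the camera (t_z < 0)
        scale = -scale
    r1 = [scale * c for c in cols[0]]
    r2 = [scale * c for c in cols[1]]
    t = [scale * c for c in cols[2]]
    r3 = cross(r1, r2)
    R = [[r1[i], r2[i], r3[i]] for i in range(3)]
    return R, t


def project(R, t, p, f):
    """Project a marker-frame point p = (X, Y, Z) with the same OpenGL-style camera."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (f * x / -z, f * y / -z)
```

With exact corner positions (for example, corners generated by `project` from a known pose), the sketch recovers that pose to numerical precision. With real, noisy corners the resulting R is only approximately orthonormal, which is one reason a practical system would refine it with an iterative optimization such as the one described in the text.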

Renderer

The tracking data is sent from the camera tracking system, via a link such as RS232, Ethernet, or USB, to a PC which runs a proprietary real-time rendering software application. This will typically allow pre-prepared animations to be triggered, and will allow other changes to the 3D model, such as changes to the lighting, to be made interactively. Generally, it is necessary to generate an alpha channel (also known as a key or mask signal) associated with the rendered video, to delineate objects that should be forced into the foreground by the keyer.

There is a range of software packages available commercially for the real-time rendering of graphics for mixed reality/virtual studio production. Most require high-end PC systems equipped with broadcast-standard video capture and output cards. The continuing rise in performance of the graphics processing unit (GPU), driven by the needs of the gaming market, allows such systems to offer very rich 3D environments, including effects such as shadows, reflections, motion blur, and depth-of-field. The features that differentiate such systems from related gaming applications include:

• Broadcast-standard video output: This is usually achieved by using either a graphics card that supports serial digital video output, or a separate card that provides broadcast-standard output, in conjunction with software that copies the rendered image from the frame buffer on the graphics card. An alternative is to use a stand-alone scan converter connected to the video output of a conventional graphics card, although this makes frame synchronization impossible.

 


Detail Enhancement.

Most cameras include a control for detail enhancement. This allows the high frequencies in the image to be boosted, making edges appear sharper. It is sometimes referred to as “aperture correction,” as one purpose of boosting the high frequencies is to compensate for the low-pass filtering effect of the finite size (or “aperture”) of the capture and display sampling elements. In the days of tube cameras and CRT displays, this “aperture” corresponded to the area of the tube that the electron beam sampled (in the camera) or illuminated (in the display). With digital image sensors and flat-screen displays, the pixel sizes are generally smaller, so there is less inherent filtering and therefore less need for aperture correction as such. However, the CCU operator will still adjust the detail control to give a picture that looks subjectively pleasing on his display; this can result in “ringing” around edges (e.g., a black overshoot around a bright white area), as this makes pictures look sharper. This can present a problem for subsequent image processing (such as image-based tracking or keying), and may result in a different “look” between elements of the scene captured with a camera and virtual elements inserted using computer-generated graphics.

3.3 Camera Tracking System

To insert a virtual object into a real image from a TV camera, the position of the object must appear correct as the camera moves. This requires the camera position and orientation (referred to as the camera “pose”), as well as its field-of-view, to be measured at the same rate as the video signal (50 or 60 Hz), and with a stability sufficient to ensure negligible visible drift between the real and virtual elements.

3.3.1 Accuracy Requirements

Specifying the measurement accuracy required for a convincing result is difficult, as it depends on numerous factors including the field-of-view of the lens, the resolution of the TV system (standard or high definition), and the composition of the scene (where virtual and real objects appear close together). For example, if the virtual object was to drift by no more than one TV line in a standard-definition image of around 500 lines vertical resolution, at a lens angle of 5° (a fairly tight zoom), an angular accuracy of around 0.01° is needed. If the virtual object in this scene was at a distance of around 6 m, a camera movement of about 1 mm would correspond to one picture line, indicating that the spatial position needs to be measured to an accuracy of about 1 mm. Measurement inaccuracies may take various forms, such as random noise (which may have both low-frequency and high-frequency components, and may vary in amplitude depending on the speed of camera motion) and drift as a function of camera position. The influence of these inaccuracies will differ; for example, an error that varies smoothly with camera position will generally be much less of a problem than a rapid noise-like variation that is present even when the camera is still.
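The figures in the accuracy example can be checked with a few lines of arithmetic. The sketch below assumes that the 5° “lens angle” is the vertical field of view spanned by the roughly 500 picture lines; it uses a small-angle approximation, plus a tangent-based version for the line at the center of the image to show that the approximation is adequate. The function names are hypothetical.

```python
import math


def angle_per_line(fov_deg, lines):
    """Angle subtended by one picture line (small-angle approximation), in degrees."""
    return fov_deg / lines


def angle_per_line_at_center(fov_deg, lines):
    """Same quantity for the central line of a pinhole camera, using tangents."""
    half_height = math.tan(math.radians(fov_deg) / 2)  # image half-height at unit focal length
    return math.degrees(math.atan(2 * half_height / lines))


def translation_per_line(fov_deg, lines, distance_m):
    """Sideways camera movement that shifts an object at distance_m by one line."""
    return distance_m * math.radians(angle_per_line(fov_deg, lines))


print(angle_per_line(5, 500))                    # about 0.01 degrees
print(angle_per_line_at_center(5, 500))          # about 0.01 degrees
print(translation_per_line(5, 500, 6.0) * 1000)  # about 1.05 mm
```

The same functions show how quickly the requirement tightens at longer focal lengths: halving the field of view halves both the angular and the positional tolerance.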

3.3.2 Types of Tracking System

In the early days of virtual studios, the only way of measuring camera movement was by fitting sensors to the camera mounting (e.g., to measure pan and tilt, plus motion along a track), or by using motion-control mounts that were already equipped with sensors. A range of different tracking systems that do not rely on mechanical mounts are now available, allowing the use of conventional camera


Broadcast Camera Image Characteristics

The area of virtual graphics, by definition, involves the seamless integration of captured video images and computer-rendered graphics. This section looks at some of the characteristics of broadcast cameras, and the ways in which the image is sampled, that need to be understood in order to ensure that the two image sources are as closely matched as possible. More in-depth information relating to broadcast video standards can be found in textbooks such as Poynton [5].

3.2.3.1 Integration Time. Cameras usually have an adjustable integration time. This can have a significant effect on the registration of virtual content: if the camera tracking system does not operate “through the lens” by tracking visible features (and thereby inherently matching the apparent motion in the image), then it ideally needs to be configured to measure the camera parameters at a time halfway through the integration period. For a through-the-lens system, feature positions are likely to be measured roughly in the middle of the blurred region, thereby roughly compensating for the effective time delay caused by the camera integration. However, significant motion blur may make feature-based tracking unreliable. In many cases, it is simplest to use a short integration time: this gives a well-defined image capture time, avoids the need to add motion blur to the graphics to achieve a convincing match with the real image, and eases feature detection in image-based tracking approaches. However, very short integration times result in significantly reduced camera sensitivity (needing increased gain, which introduces noise), and unnatural motion in areas of the image that the viewer’s eye is not tracking (e.g., a waterfall will appear as a succession of sharp water droplets rather than giving the appearance of continuous flow). In practice, an integration time of around 1/100th of a second gives a good compromise.
Whilst cameras using CCD sensors sample the whole image at the same time, it is worth noting that cameras with CMOS sensors generally have a ‘rolling shutter’ (sampling the bottom of the image later than the top); this can make it difficult or impossible to achieve perfect alignment of real and virtual parts of the scene during rapid camera motion.
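The timing rule above (measure the pose halfway through the integration period, and, for a rolling shutter, later for lower rows) can be written down directly. This is a sketch under stated assumptions: a hypothetical linear rolling-shutter model in which row r starts integrating readout_s × r / rows after the top row, and a tracker that can be sampled or interpolated at any time. The tilt speed in the usage lines is purely illustrative, and all names are hypothetical.

```python
def pose_sample_time(frame_start_s, integration_s, row=0, rows=1, readout_s=0.0):
    """Time at which to measure the camera pose for image row `row`.

    Global shutter (CCD): leave readout_s at 0, so every row shares one time.
    Rolling shutter (CMOS): assumed linear model, row r starts readout_s * r / rows later.
    """
    row_offset_s = readout_s * row / rows
    return frame_start_s + row_offset_s + integration_s / 2


def misregistration_lines(tilt_deg_per_s, timing_error_s, fov_deg, lines):
    """Approximate vertical drift, in picture lines, from measuring the pose at the wrong time."""
    angle_error_deg = tilt_deg_per_s * timing_error_s
    return angle_error_deg / (fov_deg / lines)


# 1/100 s integration: sample the tracker 5 ms after the start of integration.
print(pose_sample_time(0.0, 0.01))
# Sampling at the start instead of the middle while tilting at 10 deg/s, with a
# 5 degree field of view over 500 lines, gives roughly 5 lines of misregistration.
print(misregistration_lines(10.0, 0.005, 5.0, 500))
```

Comparing the second result with the one-line tolerance of Section 3.3.1 shows why the choice of sampling instant matters even at a moderate camera speed.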

 

 

3.2.3.2 Sampling and Aspect Ratio. Once the image is captured, it is digitized according to the relevant standard (ITU-R Recommendation BT.601 for standard definition images, BT.709 for high definition). It is worth looking at some aspects of these standards, since they have some idiosyncrasies (particularly at standard definition) that a developer of a virtual graphics system should be aware of. All TV production now uses widescreen (16:9) images (for 4:3 images, replace “16/9” with “4/3” in the explanation below). Use of the correct aspect ratio when inserting virtual graphics is vital in order to avoid the graphics losing registration as the camera is panned. A widescreen 16:9 image in its standard definition digital form, as specified by ITU-R Recommendation BT.601 and shown in Fig. 7, actually has an aspect ratio wider than 16:9, due to the presence of “nonactive” pixels. The nonactive pixels are there principally because the edge pixels could be corrupted by ringing in the analog-to-digital conversion process, and because it is difficult to guarantee that exactly the right part of the signal is captured by the digitizer. The choice of 13.5 MHz for the sampling frequency, which determines the pixel aspect ratio, was the result of an international standardization process which considered many factors, including finding a value that gave a whole number of samples per line for existing analog TV systems. An interesting account of the process is given by Wood and Baron [6]. The figure of 702 active pixels comes from the “active” portion of the 625-line “PAL” (European TV format) analog signal being 52 µs long, and 52 µs × 13.5 MHz = 702. Using the figures of 702 × 576 active pixels corresponding to a 16:9 image, we can calculate the aspect ratio W/H of a pixel as follows:

$$702 \times W = \frac{16}{9} \times 576 \times H \quad\Rightarrow\quad \frac{W}{H} = \frac{16 \times 576}{9 \times 702} = 1.458689$$

For “NTSC” (American TV format) images at 59.94 Hz, there are only 480 active lines, and 704 pixels are “active,” although there are still 720 pixels per full line. The pixel aspect ratio is therefore 1.21212. Note also that when two fields are “paired up” to store as a frame, the first field occupies the lower set of lines, rather than the upper set as in “PAL” (European standard TV).
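The pixel aspect ratio calculation generalizes to any format whose active pixel counts span a known display aspect ratio. The sketch below uses exact fractions to reproduce the 625-line and 525-line figures, and to show why the full 720-sample line is wider than 16:9; the function names are hypothetical, and the 4:3 case would pass `Fraction(4, 3)` instead.

```python
from fractions import Fraction


def pixel_aspect_ratio(active_width, active_height, display_aspect=Fraction(16, 9)):
    """Pixel width / pixel height, given the active pixel counts that span display_aspect."""
    return display_aspect * active_height / active_width


def full_frame_aspect(total_width, total_height, par):
    """Aspect ratio of the whole digitized frame, including nonactive pixels."""
    return Fraction(total_width, total_height) * par


pal_par = pixel_aspect_ratio(702, 576)    # 625-line: 702 x 576 active pixels
ntsc_par = pixel_aspect_ratio(704, 480)   # 525-line (59.94 Hz): 704 x 480 active pixels
print(pal_par, float(pal_par))            # 512/351, about 1.4587
print(ntsc_par, float(ntsc_par))          # 40/33, about 1.2121
print(float(full_frame_aspect(720, 576, pal_par)))  # about 1.82, wider than 16/9
```

Multiplying a horizontal pixel coordinate by the pixel aspect ratio expresses it in units of the pixel height, which is the form a renderer working with square pixels expects.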

Images should be composed under the assumption that the whole image is unlikely to be displayed, with all important information lying in the so-called “safe area.” Historically, this is because it is difficult to set up the image on a cathode ray tube (CRT) to exactly fill the display; even with careful setting-up, poor regulation of the high-voltage supply tends to make the displayed picture size change as the average image brightness varies. Rather than shrink the image so that it is guaranteed to be entirely visible, and thus end up with a black border round the image (as was common with CRT monitors used with PCs), CRT TVs usually slightly overfill the display (“overscanning”), so the outer part may not be visible on all displays. Thus, important picture content (such as captions) should not be placed so close to the edge that it may not be visible on some displays. Ironically, many modern flat-screen displays also overscan, because broadcasters sometimes do not fully fill the active part of the image, as they do not expect the outermost edges to be displayed! There are actually two “safe areas”: one for “action” and the other for graphics, with the graphics safe area being smaller. Details can be found in EBU Recommendation R95 [7].

Some rationalization has happened in the standardization of HDTV. The image format is 16:9, contains 1920 × 1080 pixels, and has no “nonactive” pixels. The pixels are square. However, interlace will still be common for the short to medium term, except for material originated on film or designed to mimic film, where, instead of 50 Hz interlaced, the standard supports 24 Hz (and 23.98 Hz) progressive. At least all interlaced formats (50 and 59.94 Hz) have the top field first. Unfortunately, some HDTV displays still overscan, hence the continued need for the concept of a safe area. For a discussion of overscan in the context of flat-panel displays, see EBU [8].

3.2.3.3 Chrominance Sampling. Although computer-based image processing usually deals with RGB signals, the majority of broadcast video signal routing, recording, and coding operates on luminance and color-difference signals. Fundamentally, this is because the human visual system is less sensitive to resolution loss in the chrominance channels, so the best use of the available bandwidth can be made by having more resolution in luminance than in chrominance. There are a variety of ways of subsampling the chrominance channels, which have potentially misleading names. Some of the more common formats are as follows:

• 4:2:2 format subsamples the color-difference signals horizontally by a factor of 2. This is the format used in digital video signals according to ITU Rec. 601 (for standard definition) or 709 (for high definition) and is the format most commonly encountered in TV production.
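To illustrate the luminance/color-difference representation and 4:2:2 subsampling, the sketch below converts gamma-corrected R'G'B' to Y'CbCr using the ITU-R BT.709 luma coefficients (Kr = 0.2126, Kb = 0.0722) and then halves the horizontal chroma resolution. It works with normalized, unquantized values (no 8- or 10-bit offsets or headroom), and the short [1, 2, 1]/4 filter with chroma samples co-sited on even luma samples is an assumption chosen for brevity; practical equipment uses longer filters. Function names are hypothetical.

```python
KR, KB = 0.2126, 0.0722   # ITU-R BT.709 luma coefficients
KG = 1.0 - KR - KB


def rgb_to_ycbcr(r, g, b):
    """Normalized R'G'B' in [0, 1] -> Y' in [0, 1], Cb and Cr in [-0.5, 0.5]."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2 * (1 - KB))
    cr = (r - y) / (2 * (1 - KR))
    return y, cb, cr


def subsample_422(chroma_row):
    """Keep one chroma sample per pair of pixels, co-sited with the even luma samples.

    A [1, 2, 1] / 4 low-pass filter (edges replicated) limits aliasing before
    every second sample is dropped.
    """
    n = len(chroma_row)
    out = []
    for i in range(0, n, 2):
        left = chroma_row[max(i - 1, 0)]
        right = chroma_row[min(i + 1, n - 1)]
        out.append((left + 2 * chroma_row[i] + right) / 4)
    return out


row = [(1.0, 0.0, 0.0)] * 4 + [(0.0, 0.0, 1.0)] * 4   # red pixels, then blue pixels
ycc = [rgb_to_ycbcr(*px) for px in row]
luma = [y for y, _, _ in ycc]                      # full horizontal resolution (8 samples)
cb_422 = subsample_422([cb for _, cb, _ in ycc])   # half resolution (4 samples)
cr_422 = subsample_422([cr for _, _, cr in ycc])
```

In this row, luminance keeps all eight samples while each color-difference channel keeps four, and the chroma sample at the red/blue boundary becomes a blend of the two colors: exactly the kind of resolution loss the visual system tolerates well, but one that matters when keying on color.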


Elements of a TV Production System Incorporating Virtual Graphics

 

The main elements of a studio-based system for inserting virtual graphics are shown in Fig. 3. The figure shows the system for a single camera (often referred to as a “camera channel”); in a multicamera studio, each camera generally has its own system. The TV camera is equipped with a zoom lens that incorporates a sensor to measure the zoom and focus settings. The camera is fixed to a tracking system to measure its position and orientation. The lens sensor and tracking system send information at video rate (generally 50 or 60 Hz) to a graphics rendering system, which usually consists of a high-end PC with a powerful graphics card. The PC renders the virtual elements of the scene with camera parameters (position, orientation, and field-of-view) matching those measured by the sensors on the real camera. The graphics output is converted to broadcast video format, usually via a serial digital interface (SDI). It often consists of two signals: the graphics themselves, plus a “key” signal that indicates the areas of the image occupied by the graphics that should appear in front of any real elements. Meanwhile, the video signal from the camera is passed through a delay to compensate for the processing time of the tracking system, renderer, and video format conversion. This delay is typically of the order of a few video frames. A keyer combines the video and graphics, using the key signal generated by the renderer and/or an internally generated key signal from a chroma keyer to determine whether graphics or camera video should appear in each part of the image. In some systems, the keyer may be implemented in software and run on the same PC as the renderer. If the tracking system works by analyzing the camera video to track the motion of the camera, it too may run on the same PC, leading to a very compact system.
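The camera-video delay and the keyer can be sketched in a few lines. This is a minimal illustration, assuming single-channel frames stored as flat lists of values, keys in the range 0 (camera video) to 1 (graphics), and a hypothetical combination rule that shows graphics wherever either the renderer's foreground key or the chroma key calls for them; real keyers do considerably more than this.

```python
from collections import deque


class FrameDelay:
    """Fixed delay line: returns the frame pushed `frames` calls earlier (None until full)."""

    def __init__(self, frames):
        self._buffer = deque(maxlen=frames + 1)

    def push(self, frame):
        self._buffer.append(frame)
        if len(self._buffer) == self._buffer.maxlen:
            return self._buffer[0]
        return None


def key_mix(video, graphics, renderer_key, chroma_key):
    """Linear keying per pixel: out = k * graphics + (1 - k) * video."""
    out = []
    for v, g, k_render, k_chroma in zip(video, graphics, renderer_key, chroma_key):
        k = max(k_render, k_chroma)  # assumed rule for combining the two key signals
        out.append(k * g + (1 - k) * v)
    return out


delay = FrameDelay(frames=3)  # e.g. a three-frame tracking and rendering latency
delayed = [delay.push(f) for f in ["f0", "f1", "f2", "f3", "f4"]]
# delayed == [None, None, None, "f0", "f1"]
```

The delay length would be chosen to match the measured latency of the tracking, rendering, and format-conversion chain, so that each camera frame meets the graphics rendered for the same camera pose.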

The following sections discuss the elements of the system in more detail, focusing on those aspects that are particularly relevant when attempting to insert virtual graphics into the camera image.

3.2 Camera and Lens

Before considering specific aspects of a TV camera lens, it is worth summarizing the way in which a camera and lens are usually represented in computer vision and computer graphics.

3.2.1 Lens Model

A camera is usually modeled as a pinhole camera, where all light passes through a point which is in front of the image sensor by a distance corresponding to the focal length of the lens. This model provides a straightforward way of calculating the location in the image where a given point in the world would appear, using simple geometry. It ignores lens distortion; this can be accounted for by applying a suitable transformation to the “ideal” image coordinates from the pinhole model. It is often convenient to think of the image sensor being in front of the pinhole, rather than behind it, to avoid the image being inverted. The camera model used in this chapter is shown in Fig. 4. The coordinate system of the camera is chosen to match that of a camera in OpenGL, with x pointing to the right, y pointing up, and the camera looking in the negative z direction. The origin of the camera reference frame is at the location of the pinhole. The image plane is a distance f (the focal length) in front of the camera center. The point on the image sensor that lies on the z-axis is known as the principal point. This is the point where a ray passing through the camera center meets the image sensor at right angles. Note that this point is not necessarily at the middle pixel in the image, as the center of the sensor may not be exactly in line with the center line of the lens, or the portion of the image read from the sensor may not be exactly central. In the case of an analog camera, a shift can also be introduced by a slight timing offset in the horizontal synchronizing signal relative to the video before it is digitized. In order to establish the relationship between the 3D position of a point in the world and its 2D position in the image, it is necessary to know the intrinsic parameters of the camera and lens (focal length, pixel spacing, coordinates of the principal point in the image, and any lens distortion parameters) and the extrinsic parameters (orientation of the camera and the position of the camera center in the world).

3.2.2 Lens Calibration

There are many well-known methods of estimating the intrinsic parameters for a camera with a fixed-focal-length lens; see, for example, Ref. [4]. These typically involve capturing multiple images of a calibration object (such as a flat chart marked with squares of known dimensions), and using an optimization process to find the set of intrinsic parameters that give the best match between predicted and observed locations of the calibration markings. The focus setting of the lens will usually be fixed to a chosen value before calibration, as changes of focus usually also affect the focal length.
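As an illustration of the pinhole model in Section 3.2.1, the sketch below projects a world point into pixel coordinates using the OpenGL-style camera frame described there (x right, y up, camera looking along -z). It assumes the extrinsic parameters are given as a world-to-camera rotation matrix plus the camera center in world coordinates, that the intrinsics are the focal length, the pixel spacing (allowed to differ horizontally and vertically, so nonsquare pixels are covered), and the principal point, and that pixel rows are counted downward from the top of the image. Lens distortion is ignored, as in the basic model; all names are hypothetical.

```python
def project_point(p_world, rotation, center, f, pixel_w, pixel_h, principal_point):
    """Project a 3D world point to (column, row) pixel coordinates, or None if behind the camera.

    rotation: 3x3 world-to-camera rotation (list of rows); center: camera center in world units.
    f, pixel_w, pixel_h: focal length and pixel spacing in the same units (e.g. metres).
    principal_point: (column, row) of the principal point; rows are assumed to count downward.
    """
    d = [p_world[i] - center[i] for i in range(3)]
    x, y, z = (sum(rotation[i][j] * d[j] for j in range(3)) for i in range(3))
    if z >= 0:  # the camera looks along -z, so visible points have negative z
        return None
    x_img = f * x / -z  # position on the image plane, same units as f
    y_img = f * y / -z
    col = principal_point[0] + x_img / pixel_w
    row = principal_point[1] - y_img / pixel_h  # image y points up, but rows count down
    return col, row


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# A 10 mm lens with 10 micrometre pixels, camera at the world origin:
print(project_point((1.0, 0.0, -10.0), identity, (0, 0, 0), 0.01, 1e-5, 1e-5, (360, 288)))
# roughly (460.0, 288.0): one metre to the right at 10 m lands 100 pixels right of center
```

Lens calibration, as described above, amounts to finding the values of `f`, `pixel_w`/`pixel_h`, `principal_point`, and any distortion terms that make projections like this one agree with observed calibration markings.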

Whereas many film and digital cinematography cameras use a fixed focal length (or “prime”) lens, a typical TV camera is fitted with a zoom lens, so it is necessary to calibrate the focal length of the lens as a function of the setting of the zoom control. The focus control can also have a significant effect on the focal length, so it is necessary to measure the settings of both zoom and focus. The distortion introduced by the lens will vary with zoom (and possibly with focus) as well. Furthermore, the location of the nodal point (effectively where the “pinhole” is) also varies, moving along the axis through the center of the lens. For some lenses, its effective position can even be behind the back of the camera for a tight zoom. The movement of the nodal point position with zoom and focus is often referred to as “nodal shift.” Some lenses (sometimes referred to as “digital lenses”) now include built-in zoom and focus sensors, which send out data specifying the settings via a serial interface. For lenses without this interface, a separate sensor needs to be fitted. To be useful in virtual graphics applications, the repeatability and resolution of the sensors need to be at least 14–16 bits. Figure 5 shows a typical lens, with a sensor for zoom and focus, fitted to a broadcast camera. The lens sensor has two cog wheels that mesh with the “teeth” around the zoom and focus control rings. This particular sensor has no “absolute” reference points (it just counts up or down from where it was when power was first applied), so to get an absolute measurement of the position of the zoom and focus rings, they must be turned to one end when powered up. The system remembers the maximum value it sees, and subtracts this from the current value, to produce an absolute measurement.

Calibration of a zoom lens typically involves calibrating the lens at a range of different zoom and focus settings, and either using a lookup table to interpolate values for focal length, or fitting a polynomial to the data.
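The lens-sensor homing scheme and the lookup-table approach to zoom-lens calibration can be sketched as follows. The encoder class mimics the behavior described above (remember the largest count seen and report positions relative to it). The lookup table assumes calibration has produced focal lengths on a regular grid of zoom and focus sensor readings and interpolates bilinearly between them; the same scheme could interpolate nodal shift or distortion parameters, and a polynomial fit would be the alternative mentioned in the text. The grid values in the usage lines are made up for illustration, and all names are hypothetical.

```python
from bisect import bisect_right


class RelativeEncoder:
    """Counts from power-up; after the ring is turned to an end stop, reports positions
    relative to the largest count seen (so the end stop reads 0)."""

    def __init__(self):
        self.max_seen = None

    def absolute(self, raw_count):
        if self.max_seen is None or raw_count > self.max_seen:
            self.max_seen = raw_count
        return raw_count - self.max_seen


def _bracket(axis, value):
    """Index of the grid cell containing value, and the fractional position within it."""
    i = bisect_right(axis, value) - 1
    i = max(0, min(i, len(axis) - 2))  # clamp, so values off the grid extrapolate linearly
    return i, (value - axis[i]) / (axis[i + 1] - axis[i])


def focal_length_lut(zoom_axis, focus_axis, table, zoom, focus):
    """Bilinear interpolation in table[i][j], the focal length at zoom_axis[i], focus_axis[j]."""
    i, tz = _bracket(zoom_axis, zoom)
    j, tf = _bracket(focus_axis, focus)
    lower_zoom = (1 - tf) * table[i][j] + tf * table[i][j + 1]
    upper_zoom = (1 - tf) * table[i + 1][j] + tf * table[i + 1][j + 1]
    return (1 - tz) * lower_zoom + tz * upper_zoom


# Hypothetical calibration grid: focal length in mm at two zoom and two focus readings.
zoom_axis, focus_axis = [0, 1000], [0, 100]
table = [[10.0, 12.0], [50.0, 60.0]]
print(focal_length_lut(zoom_axis, focus_axis, table, 500, 50))  # 33.0
```

A real calibration grid would be much denser, particularly at the telephoto end where focal length changes fastest with zoom setting.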


VIRTUAL GRAPHICS FOR BROADCAST PRODUCTION

 

Virtual graphics, defined here as computer-generated graphics that appear to be a part of a real scene, are now a common sight in TV programs. They are essentially an application of augmented reality to TV production. From the humble weather map inserted behind the presenter using a blue screen, to sophisticated 3D virtual graphics showing distances and off-side markings on live sports coverage, they provide a useful tool for the program maker to present information in an easy-to-understand and visually appealing manner. Furthermore, they can be used as a creative tool around which new kinds of programs can be based, or can reduce the cost of producing conventional programs by replacing all or parts of the scenery with computer-generated backgrounds. New developments promise to further extend the reach of real-time 3D graphics into new areas, and to help the development of cross-platform content, as 3D graphics are already a mainstay of computer-based entertainment.

 

Although much of the underlying technology used for adding virtual graphics to TV is the same as that used for effects in the film industry, its application in TV usually differs in two important aspects: speed of production and level of perfection of the result. While films generally take a long time to make, and have production budgets that can support many months of both computer-intensive and labor-intensive postproduction, TV production budgets rarely support this level of postproduction, nor does TV need to produce content with the high spatial resolution or production values demanded by the film industry. Also, many kinds of programs that can benefit most from virtual graphics, such as news and sport, need to be broadcast live, or at least within an hour or so of being recorded. Any program that relies on a significant level of interaction between real and virtual content also needs to have that content inserted into the image in real time during the recording, so that the cameraman can frame the shots appropriately, and those taking part in the program can see how they are interacting with the virtual content. The focus of this chapter is thus on the use of real-time virtual graphics for TV production, rather than techniques relying on extensive postproduction.

After presenting a brief history of virtual graphics in TV production, this chapter examines the main elements needed in a program production system that incorporates virtual graphics, namely the camera and lens, the tracking system to estimate the motion of the camera relative to the object with which the graphics are to be registered, the real-time graphics rendering system, and the keying system. Methods of providing real-time visual feedback to actors are then discussed. Some specific issues concerned with the production of virtual graphics for sports are presented next. Following that, an alternative approach to mixing real and virtual images, where 3D models are created from real-world images, is briefly described.

 

A Short History of Virtual and Augmented Reality in TV Production

Electronic graphics have been used in live TV program production for many years, with chroma-key being used in applications such as placing a virtual weather map behind a presenter. The presenter stands in front of a brightly colored background (usually green or blue), and any parts of the image having this color are automatically switched to show another image, such as a weather map.

Chroma-key was used in films from the very early days, certainly before any electronic processing was around. The term “traveling matte,” still sometimes used in the industry, refers to the use of a second reel of film, with transparent areas of the negative showing the areas that should be keyed in, derived from placing a blue filter over the film while making a copy. This masked out the background blue screen area when the film was being copied. An inverted version of this was run on top of the desired background footage, and exposed onto the copy, thereby placing a background behind the keyed-out foreground. It was first used in the 1940 film Thief of Bagdad, long before electronic image processing.

With the advent of electronic image capture and thus the arrival of the age of television, it became possible to implement the keying process in real time. One of the first uses was by the BBC in 1959 [1] in a contribution to a live program. This was before the dawn of color TV, so the key signal for the blue background was obtained from a monochrome camera sensor with a blue filter in front, while the foreground signal was obtained from a camera sensor with a yellow filter. A beam splitter was used to allow both cameras to look through the same lens. The composite shot of a singer against a pictorial background was successfully broadcast live.

 

With a normal chroma-key setup, it is important to keep the camera stationary, as any movement will affect the foreground objects while the keyed-in background remains static. There were some early attempts to allow camera movement; for example, the BBC used a system called Scene-sync for the 1980 Doctor Who story Meglos. This system moved a second camera viewing a model by the same amount as the main camera, so that the model could be keyed into the background of the main camera. However, this approach was restricted to using background images shot using a motion-controlled camera.

It was not until the 1990s that graphics hardware became powerful enough to render high-quality graphics at video frame rate, allowing virtual backgrounds to be used. This allowed camera movement, since 3D virtual objects or backgrounds could then be re-rendered for each TV field (50 or 60 times each second) to match the current camera view. The freedom to move the camera allowed virtual objects or backgrounds to be used in a much wider range of programs, and the term “virtual studio” was used to distinguish these new systems from simple chroma-keying techniques. Early virtual studio systems relied on high-end graphics supercomputers such as the SGI Onyx; an example of this was the ELSET system [2], demonstrated in 1994 at IBC in Amsterdam.

 

Since the early days of virtual studios, the technology has developed significantly. Camera tracking systems need no longer rely on mechanical sensors on the camera mounting to measure the camera movement, and graphics can be rendered on a conventional PC rather than needing specialized graphics systems. The way in which the technology is used has also developed: the initial enthusiasm for replacing the entirety of a real set with a virtual background (as in the example in Fig. 1) has to some extent given way to the addition of virtual objects into a real environment, so that only those elements of the set that cannot be easily created for real are synthesized. Examples include virtual video walls and graphics for news, and overlaid graphics for sports analysis.