2023 AR GUIDE
GENERAL BACKGROUND
FUNDAMENTAL PREMISE
Virtually inserting a graphic into a live or recorded video stream of the real world, so that the combined output makes the inserted graphic look like a plausible part of the real-world scene, requires a system that “knows”, for each frame of video produced:
- The exact [ X, Y, Z ] coordinates in real-world space of the camera’s optical center, for:
  - the camera that is producing the live video stream; or
  - the camera whose video stream was recorded;
- The POV or Line of Projection of the camera (i.e., the PAN and TILT values in absolute degrees of rotation)
- The ZOOM and FOCUS values (in absolute units) of the camera (i.e., how far along the Line of Projection the camera is focused)
- The RADIAL DISTORTION CHARACTERISTICS of the lens across all zoom levels
- A precise 3-D computer model of the real-world space
- A precise registration/mapping of the 3-D computer model onto the real-world space
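To make the premise concrete, the sketch below projects one feature of the 3-D world model into a single video frame using OpenCV’s pinhole-plus-distortion model. The axis conventions and every numeric value (camera position, pan/tilt, focal length, distortion coefficients, court coordinates) are illustrative assumptions, not values from any real system; the point is that once the per-frame data above is known, the pixel location of any world-model feature, and therefore of any virtual graphic anchored to it, can be computed.

import numpy as np
import cv2

# All numeric values and axis conventions below are illustrative assumptions.
cam_xyz = np.array([0.0, -30.0, 12.0])      # [ X, Y, Z ] of the optical center (metres)
pan_deg, tilt_deg = 15.0, -20.0             # POV / Line of Projection (degrees)
fx = fy = 2400.0                            # zoom expressed as focal length (pixels)
cx, cy = 960.0, 540.0                       # principal point of a 1920x1080 frame
dist = np.array([-0.12, 0.03, 0.0, 0.0])    # radial (k1, k2) + tangential terms at this zoom

# Camera rotation: pan about the vertical axis, then tilt about the horizontal axis.
pan, tilt = np.radians([pan_deg, tilt_deg])
R_pan = np.array([[np.cos(pan), 0.0, np.sin(pan)],
                  [0.0,         1.0, 0.0        ],
                  [-np.sin(pan), 0.0, np.cos(pan)]])
R_tilt = np.array([[1.0, 0.0,           0.0          ],
                   [0.0, np.cos(tilt), -np.sin(tilt)],
                   [0.0, np.sin(tilt),  np.cos(tilt)]])
R = R_tilt @ R_pan
rvec, _ = cv2.Rodrigues(R)                  # world-to-camera rotation, axis-angle form
tvec = (-R @ cam_xyz).reshape(3, 1)         # world-to-camera translation

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# One feature of the 3-D world model, e.g., a corner of the tennis court (metres).
world_point = np.array([[5.485, 11.885, 0.0]])

# The pixel where a virtual graphic anchored at that world point would be drawn in this frame.
pixel, _ = cv2.projectPoints(world_point, rvec, tvec, K, dist)
print(pixel.ravel())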
POINT OF VIEW (POV) or LINE OF PROJECTION
- Is where the camera is pointing at any given moment
- Is a function at any given moment of:
  - the [ X, Y, Z ] COORDINATES in real-world space of the camera’s optical center
  - the PAN and TILT values of the camera
- Is the “Line of Projection” from the [ X, Y, Z ] optical center of the camera at any given moment, based on the PAN and TILT values
Analogy: Think of a single beam of light that projects from the optical center of the camera; panning and tilting the camera can point that single beam of light at anything in the scene.
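As a rough illustration of the beam-of-light analogy, the snippet below turns pan and tilt angles into a unit vector along the Line of Projection. The axis convention is an assumption chosen for the example, not the convention of any particular tracking system.

import numpy as np

def line_of_projection(pan_deg, tilt_deg):
    """Unit vector along the camera's Line of Projection.

    Assumed convention for this illustration: pan = tilt = 0 looks along +Y,
    pan rotates about the vertical Z axis, tilt rotates the beam up or down.
    """
    pan, tilt = np.radians([pan_deg, tilt_deg])
    return np.array([np.sin(pan) * np.cos(tilt),   # left/right component
                     np.cos(pan) * np.cos(tilt),   # forward component
                     np.sin(tilt)])                # up/down component

# Pointing the "beam of light" 30 degrees right and 10 degrees down from level.
print(line_of_projection(30.0, -10.0))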
FIELD OF VIEW
- Is what the camera “sees” at any given moment;
- Is what the camera “shows” in the viewfinder at any given moment;
- Is what the camera “outputs” at any given moment;
- Is any video frame output from the camera at any given moment;
- Is a function of:
  - the [ X, Y, Z ] COORDINATES in real-world space of the camera’s optical center
  - the POV (i.e., the PAN and TILT values) of the camera, also known as the Line of Projection
  - the ZOOM and FOCUS values of the camera
  - the LENS’ RADIAL DISTORTION CHARACTERISTICS at that zoom level
Analogy: A picture frame that is centered on the Line of Projection and that moves along the Line of Projection based on zoom values.
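One rough way to quantify the “picture frame” is the angle of view implied by the current zoom (focal length). The sketch below assumes a 2/3-inch 16:9 broadcast sensor; both the sensor dimensions and the focal lengths are illustrative assumptions.

import numpy as np

def field_of_view_deg(focal_length_mm, sensor_w_mm=9.59, sensor_h_mm=5.39):
    """Horizontal and vertical angle of view for a given zoom (focal length).

    The defaults approximate a 2/3-inch 16:9 broadcast sensor; sensor size and
    the focal lengths below are illustrative assumptions.
    """
    h = 2.0 * np.degrees(np.arctan(sensor_w_mm / (2.0 * focal_length_mm)))
    v = 2.0 * np.degrees(np.arctan(sensor_h_mm / (2.0 * focal_length_mm)))
    return h, v

# Zooming in shrinks the "picture frame" around the Line of Projection.
for f_mm in (9.0, 25.0, 90.0):   # wide, medium, tight
    print(f_mm, field_of_view_deg(f_mm))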
THE “SET OF DATA” NEEDED FOR EACH VIDEO FRAME
TO PERFORM VIRTUAL INSERTION
- [ X, Y, Z ] COORDINATES in real-world space of the CAMERA’S OPTICAL CENTER
- PAN, TILT, ZOOM, FOCUS
- LENS’ RADIAL DISTORTION CHARACTERISTICS AT EACH ZOOM LEVEL
- The VIDEO FRAME’S CONTENT
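One way to picture this set of data is as a record carried alongside every frame. The field names, types, and units below are illustrative assumptions, not an actual interface.

from dataclasses import dataclass
import numpy as np

@dataclass
class FrameCameraData:
    """The per-frame "set of data"; field names and units are illustrative."""
    xyz: tuple          # [ X, Y, Z ] of the optical center in real-world space (metres)
    pan_deg: float      # absolute pan, degrees of rotation
    tilt_deg: float     # absolute tilt, degrees of rotation
    zoom: float         # absolute zoom units
    focus: float        # absolute focus units
    distortion: tuple   # radial distortion coefficients at this zoom level
    frame: np.ndarray   # the video frame's pixel content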
APPROACHES
PRODUCING THE DATA
There are two broad approaches for producing the "Set of Data" needed (i.e., [X, Y, Z], Pan, Tilt, Zoom, Focus) for the virtual insertion process.
APPROACH ONE: Electronic Camera Modeling
Produce all data electronically using a combination of GPS Sampling, introduction of sensors to the camera chain, and pre-event modeling of the lens characteristics (i.e., camera calibration)
- GPS to determine [ X, Y, Z ] COORDINATES in real-world space of the camera’s optical center
- Add electronic sensors to the pan head to report degrees of rotation for PAN and TILT
- Add electronic sensors to the camera’s zoom electronics to report units of ZOOM and FOCUS
- Conduct an on-site camera calibration training to:
  - Model the LENS’ RADIAL DISTORTION CHARACTERISTICS across all zoom levels
  - Refine the [ X, Y, Z ] COORDINATES in real-world space of the camera’s optical center
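A minimal sketch of how the sensor readings and the calibration results might be combined per frame: GPS supplies [ X, Y, Z ], the pan-head and lens encoders supply pan/tilt/zoom/focus, and the lens model is interpolated from a table built during the on-site calibration training. The table values and field names are illustrative assumptions, not real calibration data.

import numpy as np

# Hypothetical calibration table produced by the on-site calibration training:
# zoom encoder units mapped to focal length (pixels) and radial distortion (k1, k2).
ZOOM_UNITS = np.array([0.0, 1000.0, 2000.0, 3000.0, 4000.0])
FOCAL_PX   = np.array([1100.0, 1900.0, 3200.0, 5200.0, 8200.0])
K1         = np.array([-0.28, -0.19, -0.11, -0.05, -0.01])
K2         = np.array([0.09, 0.05, 0.02, 0.01, 0.00])

def camera_model_from_sensors(gps_xyz, pan_deg, tilt_deg, zoom_units, focus_units):
    """Assemble the per-frame set of data from the electronic sensor readings."""
    return {
        "xyz": gps_xyz,                            # from GPS sampling (refined by calibration)
        "pan_deg": pan_deg, "tilt_deg": tilt_deg,  # from the pan-head encoders
        "zoom": zoom_units, "focus": focus_units,  # from the lens/zoom electronics
        "focal_px": np.interp(zoom_units, ZOOM_UNITS, FOCAL_PX),
        "k1": np.interp(zoom_units, ZOOM_UNITS, K1),
        "k2": np.interp(zoom_units, ZOOM_UNITS, K2),
    }

print(camera_model_from_sensors((0.0, -30.0, 12.0), 15.0, -20.0, 2500.0, 1200.0))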
APPROACH TWO: Optical Camera Modeling
After the pre-event modeling of lens characteristics is complete, use computer vision software to analyze a single field of view (i.e., video frame) that contains a significant portion of the features inherent in the 3-D world model (e.g., the lines of the tennis court).
The computer vision software uses the appearance (e.g., scale, orientation, and perspective) of those recognizable 3-D world model features in the field of view (i.e., video frame) to calculate/produce/infer the complete set of data needed for the virtual insertion process.
In essence, this method “reverse engineers” the set of data by analyzing the contents of the field of view to infer the parameter values that would have been required to produce the 3-D world model elements as they appear in that field of view.
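A minimal sketch of the “reverse engineering” idea using OpenCV: given four tennis-court line intersections from the 3-D world model and where they appear in one video frame, solvePnP recovers the camera pose, and from it the optical center’s [ X, Y, Z ]. The pixel detections, intrinsics, and distortion values are illustrative; whether the optical tracking system uses this exact routine is an assumption.

import numpy as np
import cv2

# Four 3-D world-model features: tennis-court line intersections, in metres on
# the court plane (10.97 m x 23.77 m doubles court, z = 0 at ground level).
world_pts = np.array([[0.0,   0.0,   0.0],
                      [10.97, 0.0,   0.0],
                      [10.97, 23.77, 0.0],
                      [0.0,   23.77, 0.0]])

# Where those intersections were detected in one video frame (pixels).
# The detections, intrinsics, and distortion values are illustrative only.
image_pts = np.array([[412.0, 805.0],
                      [1501.0, 812.0],
                      [1189.0, 331.0],
                      [721.0, 327.0]])

K = np.array([[2400.0, 0.0, 960.0],      # intrinsics from the pre-event lens modeling
              [0.0, 2400.0, 540.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.12, 0.03, 0.0, 0.0])

# "Reverse engineer" the camera pose from how the world model appears in this field of view.
ok, rvec, tvec = cv2.solvePnP(world_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)
optical_center_xyz = (-R.T @ tvec).ravel()   # inferred [ X, Y, Z ] of the optical center
print(ok, optical_center_xyz)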
USING AN AERIAL CAMERA TO PRODUCE VIRTUAL INSERTIONS (A/R) AT A SPORTS EVENT
I. BACKGROUND STATEMENTS:
Successful deployment and implementation would require strict adherence to the following Aerial Camera Use-Case Protocol:
Pre-Event System Connections and Calibrations:
Step 1: Connect the optical tracking system to the aerial camera outputs
a. Video Frames
b. X, Y, Z data
c. Pan, Tilt, Zoom, Focus data
Step 2: Work with the camera pilot to take the camera through a series of calibration steps so the optical tracking system can model the lens’ radial distortion characteristics
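One common way to model a lens’ radial distortion is a checkerboard sweep: a known pattern is framed at a given zoom setting and OpenCV fits the distortion coefficients. Whether the optical tracking system uses this exact procedure is an assumption; the sketch below (with a hypothetical folder of captured frames) only illustrates the principle behind the Step 2 calibration.

import glob
import numpy as np
import cv2

PATTERN = (9, 6)   # interior corners of the calibration checkerboard
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calibration_frames/*.png"):   # hypothetical frames captured in Step 2
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# dist holds the radial (k1, k2, k3) and tangential (p1, p2) coefficients for this zoom
# setting; repeating the sweep across zoom settings models the full lens behaviour.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(rms, dist.ravel())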
In-Event Usage
Step 1: Move the aerial camera to an establishing POV and field of view location that contains a significant portion of the features inherent in the optical tracking system’s 3-D world model (e.g., the lines of the tennis court).
Step 2: Pause in this position until the optical tracking system:
a. Ingests the static X, Y, Z data as reported by the aerial camera;
b. Locks onto the features inherent in its 3-D world model that are in the field of view;
c. Analyzes the appearance of the 3-D world model features in the field of view to compute the complete set of data needed for the virtual insertion process.
Step 3: Once Step 2 is complete (generally, just a few seconds), the aerial pilot is free to move the aerial camera to any position in space (X, Y, Z), any POV (Pan, Tilt), and any field of view (Zoom, Focus), provided that each transitioning field of view includes the optical tracking system’s 3-D world model features.
Step 4: Production is free to insert any virtual graphics at any time during the Step 3 aerial camera moves.
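The in-event usage above amounts to a small state machine: inserts only become valid after the Step 2 lock completes, and only while the world-model features stay in view. The sketch below is an illustrative formalization of that gating logic, not vendor code.

from enum import Enum, auto

class AerialARState(Enum):
    ESTABLISHING = auto()   # Step 1: parked on the establishing POV / field of view
    LOCKING = auto()        # Step 2: tracker ingesting X, Y, Z and locking on features
    TRACKING = auto()       # Step 3: camera free to move through space, POV, and zoom

def inserts_allowed(state, model_features_in_view):
    """Step 4: virtual graphics are only valid after the Step 2 lock completes and
    while the 3-D world-model features remain in the transitioning field of view."""
    return state is AerialARState.TRACKING and model_features_in_view

print(inserts_allowed(AerialARState.LOCKING, True))    # False: lock not yet complete
print(inserts_allowed(AerialARState.TRACKING, True))   # True: inserts permitted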
II. STATEMENT OF PROBLEM(s)
- The above protocol was not communicated by SMT’s operations team to HBS’ production team, nor to the aerial camera vendor;
- Ergo, no formal rehearsal of the above protocol occurred between and among the three (3) parties;
- HBS’ production team insisted that all A/R inserts be consistent with the “mockups” that they had created in advance of the event, without the benefit/knowledge of the protocol;
- In all cases, the “mockups” did not include Step 2 above, without which all inserts were unusable.
III. PROPOSED SOLUTION
- Share the protocol with all parties in advance
- Design A/R graphics with the aerial protocol and expansive optical tracking capabilities in mind
- Rehearse all Steps with all parties in advance of the start of the event