AR in Education #3: Technological aspects of AR

Hello again! In this 3rd blog entry I will give an overview of the technology behind AR that makes the magic happen. Let’s go.

Technology

To superimpose digital media on physical spaces in the right dimensions and at the right location, three major technologies are needed: 1) SLAM, 2) depth tracking and 3) image processing & projection.

SLAM (simultaneous localization and mapping) is what allows virtual images to be rendered over real-world spaces/objects in the right dimensions. It works with the help of localizing sensors (e.g. gyroscope or accelerometer) whose data is used to map the entire physical space or object. Today, common APIs and SDKs for AR come with built-in SLAM capabilities.
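To make that a bit more concrete, here is a minimal sketch of setting up such a session, assuming Google's ARCore SDK on Android (one of those SDKs with built-in SLAM). Once the session is configured and resumed, motion tracking runs under the hood:

```kotlin
import android.app.Activity
import com.google.ar.core.Config
import com.google.ar.core.Session

// Create an ARCore session; SLAM-based motion tracking starts
// automatically once the session is configured and resumed.
fun createArSession(activity: Activity): Session {
    val session = Session(activity)
    val config = Config(session)
    config.updateMode = Config.UpdateMode.LATEST_CAMERA_IMAGE
    session.configure(config)
    session.resume()   // from here on, each frame carries a tracked camera pose
    return session
}
```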

Depth tracking is used to calculate the distance of an object or surface from the AR device’s camera sensor. It works much like a camera focusing on a desired object and blurring out the rest of its surroundings.
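Sticking with the ARCore example, reading per-pixel depth might look like the sketch below. Depth support varies by device, which is why the capability check matters:

```kotlin
import com.google.ar.core.Config
import com.google.ar.core.Frame
import com.google.ar.core.Session

// Turn on depth tracking if the device's camera/sensors support it.
fun enableDepth(session: Session) {
    val config = Config(session)
    if (session.isDepthModeSupported(Config.DepthMode.AUTOMATIC)) {
        config.depthMode = Config.DepthMode.AUTOMATIC
    }
    session.configure(config)
}

// Per frame: each pixel of the depth image is a distance in millimetres
// from the camera sensor to the nearest surface.
fun readDepth(frame: Frame) {
    frame.acquireDepthImage16Bits().use { depthImage ->
        val widthPx = depthImage.width
        val heightPx = depthImage.height
        // sample depthImage.planes[0] here to read individual distances
    }
}
```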

The AR program then processes the image as required and projects it onto the user’s screen (for further information on the “user’s screen”, see the section “AR Devices” below). The image is captured by the user’s device camera and processed in the backend by the AR application.

To sum up: SLAM and depth tracking make it possible to render the image in the right dimensions and at the right location. Cameras and sensors are needed to collect the user’s interaction data and send it for processing. The result of the processing (= the digital content) is then projected onto a surface for viewing. Some AR devices even have mirrors that assist the human eye in viewing virtual images by aligning them properly.
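Put together, this capture–process–project loop boils down to something like the following per-frame routine (again an ARCore-flavoured sketch; the actual drawing code is omitted):

```kotlin
import com.google.ar.core.Session

// One pass of the AR loop: grab the latest camera frame, let SLAM
// update the device pose, then render virtual content so that it
// lines up with the physical world.
fun onDrawFrame(session: Session) {
    val frame = session.update()          // capture + sensor fusion
    val camera = frame.camera

    val viewMatrix = FloatArray(16)
    val projectionMatrix = FloatArray(16)
    camera.getViewMatrix(viewMatrix, 0)
    camera.getProjectionMatrix(projectionMatrix, 0, 0.1f, 100f)

    // draw the digital content with viewMatrix/projectionMatrix,
    // which projects it onto the user's screen at the right spot
}
```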

Object detection

There are two primary approaches to detecting objects, both of which have several subsets: 1) Trigger-based Augmentation and 2) View-based Augmentation.

Trigger-based Augmentation

Specific triggers like markers, symbols, icons, GPS locations, etc. can be detected by the AR device. When the device is pointed at such a trigger, the AR app processes the associated 3D image and projects it on the user’s device. The following subsets make trigger-based augmentation possible: a) Marker-based augmentation, b) Location-based augmentation and c) Dynamic augmentation.

a) Marker-based augmentation

Marker-based augmentation (a.k.a. image recognition) works by scanning and recognizing special AR markers. It therefore requires a special visual object (anything from a printed QR code to a special sign) and a camera to scan it. In some cases, the AR device also calculates the position and orientation of the marker to align the projected content properly.

Example of marker-based augmentation with a special sign as trigger
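In ARCore terms, marker-based augmentation maps to the Augmented Images API: you register reference images up front, and the SDK reports when (and where) one shows up in the camera feed. A rough sketch; the file “marker.png” and the marker name are placeholders:

```kotlin
import android.app.Activity
import android.graphics.BitmapFactory
import com.google.ar.core.AugmentedImage
import com.google.ar.core.AugmentedImageDatabase
import com.google.ar.core.Config
import com.google.ar.core.Frame
import com.google.ar.core.Session

// Register the marker image so the SDK can recognise it later.
fun configureMarker(session: Session, activity: Activity) {
    val bitmap = BitmapFactory.decodeStream(activity.assets.open("marker.png"))
    val database = AugmentedImageDatabase(session)
    database.addImage("demo-marker", bitmap)
    val config = Config(session)
    config.augmentedImageDatabase = database
    session.configure(config)
}

// Per frame: check whether a registered marker was detected; its pose
// tells us where to align the projected content.
fun checkForMarkers(frame: Frame) {
    for (image in frame.getUpdatedTrackables(AugmentedImage::class.java)) {
        val pose = image.centerPose   // position + orientation of the marker
        // anchor the 3D content at `pose`
    }
}
```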

b) Location-based augmentation

Location-based augmentation (a.k.a. markerless or position-based augmentation) provides data based on the user’s real-time location. The AR app picks up the location of the device and combines it with dynamic information fetched from cloud servers or from the app’s backend. For example, maps and navigation apps with AR features or vehicle parking assistants work based on location-based augmentation.

BMW’s head-up display as an example of location-based augmentation
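A location-based flow is less about the camera and more about combining GPS with backend data. A minimal Android sketch, assuming Google Play Services for location; the POI endpoint is purely hypothetical:

```kotlin
import android.annotation.SuppressLint
import android.app.Activity
import com.google.android.gms.location.LocationServices

// Read the device's position and ask a backend for nearby points of
// interest to overlay. "example.com/poi" is a placeholder endpoint.
@SuppressLint("MissingPermission")   // assumes location permission was granted
fun fetchNearbyOverlays(activity: Activity) {
    val client = LocationServices.getFusedLocationProviderClient(activity)
    client.lastLocation.addOnSuccessListener { location ->
        if (location != null) {
            val url = "https://example.com/poi" +
                "?lat=${location.latitude}&lon=${location.longitude}"
            // download the POI list from `url` and hand it to the renderer
        }
    }
}
```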

c) Dynamic augmentation

Dynamic augmentation is the most responsive form of augmented reality. It leverages the motion-tracking sensors in the AR device to detect images from the real world and superimpose digital media on them.

Sephora’s AR mirror as an example of dynamic augmentation. The app works like a real-world mirror reflecting the user’s face on the screen.
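A Sephora-style mirror can be approximated with ARCore’s Augmented Faces mode, which tracks a face mesh in real time (note that it needs a session on the front-facing camera; that setup is omitted here):

```kotlin
import com.google.ar.core.AugmentedFace
import com.google.ar.core.Config
import com.google.ar.core.Frame
import com.google.ar.core.Session

// Enable real-time face mesh tracking.
fun enableFaceTracking(session: Session) {
    val config = Config(session)
    config.augmentedFaceMode = Config.AugmentedFaceMode.MESH3D
    session.configure(config)
}

// Per frame: fetch the tracked face and anchor content (e.g. virtual
// make-up textures) to specific face regions.
fun onFaceFrame(frame: Frame) {
    for (face in frame.getUpdatedTrackables(AugmentedFace::class.java)) {
        val nosePose = face.getRegionPose(AugmentedFace.RegionType.NOSE_TIP)
        // render overlays relative to nosePose / the face mesh
    }
}
```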

View-based Augmentation

In view-based methods, the AR app detects dynamic surfaces (like buildings, desktop surfaces, natural surroundings, etc.), connects the live view to its backend to match reference points, and projects related information on the screen. View-based augmentation works in two ways: a) Superimposition-based augmentation and b) Generic digital augmentation.
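Surface detection of this kind is typically exposed as plane finding in AR SDKs; in ARCore it looks roughly like this:

```kotlin
import com.google.ar.core.Config
import com.google.ar.core.Frame
import com.google.ar.core.Plane
import com.google.ar.core.Session

// Ask the SDK to look for flat surfaces in the camera view.
fun enablePlaneDetection(session: Session) {
    val config = Config(session)
    config.planeFindingMode = Config.PlaneFindingMode.HORIZONTAL_AND_VERTICAL
    session.configure(config)
}

// Per frame: iterate over the surfaces found so far.
fun listPlanes(frame: Frame) {
    for (plane in frame.getUpdatedTrackables(Plane::class.java)) {
        val center = plane.centerPose   // where the surface sits
        val width = plane.extentX       // approximate surface size (metres)
        val depth = plane.extentZ
        // project related information onto this surface
    }
}
```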

a) Superimposition-based augmentation

Superimposition-based augmentation replaces the original view with an augmented one (fully or partially). It works by detecting static objects that have already been fed into the AR application’s database. The app uses optical sensors to detect an object and overlays digital information on top of it.

Hyundai’s AR-based owner’s manual allows users to point their AR device at the engine and see each component’s name + instructions for basic maintenance processes.
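ARCore itself doesn’t recognise arbitrary 3D objects like engine parts out of the box, so a Hyundai-style manual would pair the AR session with a recognizer of its own. Purely as an illustration, everything in the following sketch (`EngineComponentDetector`, `ComponentInfo`) is hypothetical:

```kotlin
import android.graphics.RectF
import android.media.Image

// Hypothetical recognizer, trained on the static objects that were
// fed into the app's database beforehand.
data class ComponentInfo(val name: String, val instructions: String)

interface EngineComponentDetector {
    fun detect(cameraImage: Image): List<Pair<ComponentInfo, RectF>>
}

// For each recognised component, overlay its name and maintenance
// instructions above its bounding box on the screen.
fun overlayComponentLabels(detector: EngineComponentDetector, cameraImage: Image) {
    for ((info, box) in detector.detect(cameraImage)) {
        // drawLabel(info.name, info.instructions, above = box)
    }
}
```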

b) Generic digital augmentation

Generic digital augmentation is what gives developers and artists the liberty to create anything they wish for the immersive experience of AR. It allows the rendering of 3D objects that can be imposed on actual spaces.

The IKEA catalog app allows users to place virtual items of their furniture catalog in their rooms based on generic digital augmentation.
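IKEA-style placement usually boils down to a hit test: the user taps the screen, the tap is cast into the scene, and the model is anchored where the ray hits a detected surface. In ARCore terms, a sketch:

```kotlin
import com.google.ar.core.Anchor
import com.google.ar.core.Frame
import com.google.ar.core.Plane

// Cast a screen tap into the world; if it lands on a detected plane,
// create an anchor there and attach the 3D furniture model to it.
fun placeModelAtTap(frame: Frame, tapX: Float, tapY: Float): Anchor? {
    for (hit in frame.hitTest(tapX, tapY)) {
        val trackable = hit.trackable
        if (trackable is Plane && trackable.isPoseInPolygon(hit.hitPose)) {
            return hit.createAnchor()   // the model's pose tracks this anchor
        }
    }
    return null   // the tap didn't land on a known surface
}
```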

It’s important to note that there is no one-size-fits-all AR technology. The right augmented reality software technology has to be chosen based on the purpose of the project and the user’s requirements.

AR Devices

As already mentioned in my previous blog entry, AR can be displayed on various devices, from smartphones and tablets to gadgets like Google Glass, and these technologies continue to evolve. For processing and projection, AR devices and hardware need components such as sensors, cameras, an accelerometer, a gyroscope, a digital compass, GPS, a CPU, a GPU, displays and so on. Devices suitable for augmented reality can be divided into the following categories: 1) mobile devices (smartphones and tablets); 2) special AR devices, designed primarily and solely for augmented reality experiences; 3) AR glasses (or smart glasses) like Google Glass or Meta 2; 4) AR contact lenses (or smart lenses); and 5) virtual retinal displays (VRDs), which create images by projecting laser light into the human eye.

That’s it for today 🙂 

_____

Sources:

https://thinkmobiles.com/blog/what-is-augmented-reality/

https://learn.g2.com/augmented-reality-technologies