UVAP

Your video analysis service

UVAP is a set of software services that can be composed and extended in a flexible way to build scalable AI-based video analysis capability and functionality.

Main elements of UVAP and their functions

Demonstration set

Editable ready-to-run versions of all core and composite detectors for immediate customer demonstrations.

Run-time environment

Pre-configured Docker containers that support running the detectors immediately.

Core detectors

AI neural network-based video processing components with pre-configured, but easily adjustable settings.

Composite detectors

Leveraging and combining core detector outputs, composite detectors are ready-to-deploy feature sets with configurable settings.

Custom applications

Business-specific parameterization and rules built on top of composite detectors to deliver solutions customized to client needs. Ultinous provides a few off-the-shelf applications, and partners are free to build their own custom applications.

Integrations with end-user applications

Custom applications triggering event-based end-user operations.

Are you interested in adding video analytics to your integrator service offering?

Core Detectors

Head Detection

Detects human heads in video frames and saves their positions as bounding boxes using Gaussian bounding box estimation.

Usable for detecting human presence. The detector finds all heads from 16 to 256 pixels in size in a single pass, and it tolerates a wide range of head poses and lighting conditions.

Head detection from as low as 30×30 pixels

Applications:

  • Count people in an area
  • Determine the position of people
    • Further processing: Movement detection, Anonymization
  • Determine the position of people’s heads
    • Further processing: Head Pose, Demography, Facial Recognition

Input: Video frames
Output: Bounding boxes

A fundamental building block of higher-level UVAP solutions.


Anonymization

Saves video frames as anonymized images, blurring heads, faces, or entire bodies to support compliance with data privacy regulations.

The anonymization process depends on the Head Detection feature, and it blurs the (extended) bounding box of a detected human head.

Applications:

  • Make people in the video unrecognizable
  • Supports data privacy regulations (e.g. GDPR)

Prerequisite: Head Detection

Input: Video frames

Output: Anonymized images

  • Blurring
  • Coloured box
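The blurring step above can be sketched in a few lines: a hypothetical `blur_region` helper applies a mean filter inside the (extended) head bounding box. The 2D-list image format and the function name are illustrative assumptions, not the UVAP API.

```python
# Minimal sketch of head-region anonymization by box blurring.
# Assumptions: grayscale image as a 2D list, box as (x, y, w, h).

def blur_region(image, box, radius=1):
    """Return a copy of `image` with the given box blurred by a
    simple mean filter of the given radius."""
    h_img, w_img = len(image), len(image[0])
    x, y, w, h = box
    out = [row[:] for row in image]
    for j in range(y, min(y + h, h_img)):
        for i in range(x, min(x + w, w_img)):
            # Average the neighbourhood, clipped to the image bounds.
            vals = [image[jj][ii]
                    for jj in range(max(0, j - radius), min(h_img, j + radius + 1))
                    for ii in range(max(0, i - radius), min(w_img, i + radius + 1))]
            out[j][i] = sum(vals) // len(vals)
    return out

image = [[0, 0, 0],
         [0, 90, 0],
         [0, 0, 0]]
# Blur only the centre pixel's box; the bright value is averaged away.
blurred = blur_region(image, (1, 1, 1, 1))
```

The original frame is left untouched; only the returned copy is anonymized, so the raw stream can still be processed by other detectors.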

3D Head Pose Detection

Detects the 3D orientation of human heads on video frames and saves them as three angles (yaw, pitch, roll in degrees).

Provides low-level data for human attention detection and serves as a filter to increase the accuracy of other detections, such as Age, Gender, and Face Feature Vector.

It can also be used in higher-level UVAP features.

High tolerance to a wide pose range (±45° yaw, ±20° pitch, and ±30° roll)

Applications:

  • Attention detection: determine the direction people are looking
  • Filtering head detections for further processing: better results in demography, facial recognition

Prerequisite: Head Detection

Input:

  • Video frames
  • Bounding boxes

Output: 3D orientations of the heads: Roll, Pitch, Yaw
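Using pose angles as a detection filter can be sketched as follows; the thresholds mirror the tolerance range quoted above, while `is_frontal` and the detection record layout are hypothetical.

```python
# Sketch: filtering head detections by 3D pose before demography
# or face recognition, keeping only near-frontal views.

def is_frontal(yaw, pitch, roll,
               max_yaw=45.0, max_pitch=20.0, max_roll=30.0):
    """True if the head orientation (in degrees) is close enough to
    camera-facing for downstream detectors."""
    return (abs(yaw) <= max_yaw
            and abs(pitch) <= max_pitch
            and abs(roll) <= max_roll)

detections = [
    {"id": 1, "yaw": 10.0, "pitch": 5.0, "roll": -3.0},  # near-frontal
    {"id": 2, "yaw": 80.0, "pitch": 0.0, "roll": 0.0},   # profile view
]
frontal = [d for d in detections if is_frontal(d["yaw"], d["pitch"], d["roll"])]
```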


Face Feature Vector

The Ultinous Feature Vector extractor creates a 1024-dimensional Feature Vector that represents a person through their facial attributes, enabling very high-accuracy 1-to-1 face recognition.

Feature Vectors cannot be converted back into an image, but they serve as basic data for re-identification and human similarity search.

Only images of heads facing the camera, at sufficient pixel size, are usable for Feature Vector creation.

World’s second-fastest vector creation (by NIST)

Applications:

  • Representation of the facial features characteristic to an individual
  • To be used in advanced applications not covered by Basic Re-identification

Prerequisite:

  • Head Detection
  • 3D Head Pose (to improve accuracy by focusing on images facing the camera)

Input: Video frames
Output: Feature Vectors
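A sketch of how two Feature Vectors can be compared: UVAP's re-identification matching is based on cosine distance, so similarity here is the standard cosine measure. Real vectors have 1024 dimensions; the 3-dimensional toy vectors are for readability only.

```python
import math

# Cosine similarity between two Feature Vectors: 1.0 means
# identical direction (same face), values near 0 mean unrelated.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

same = cosine_similarity([0.2, 0.9, 0.1], [0.2, 0.9, 0.1])
different = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```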


Demography

Estimates and saves the age and gender of a human face on a video frame.

Only images of heads facing the camera, at sufficient pixel size, are usable for gender estimation. In practice, a person’s gender is best estimated from multiple detections; the Gender feature provides only basic per-detection gender data.

±5 years accuracy (average standard deviation)

Applications:

  • Provide statistical information about demography
  • Support human decisions about individuals, e.g. age limits

Prerequisite:

  • Head Detection
  • 3D Head Pose (to improve accuracy by focusing on images facing the camera)

Input: Video frames
Output: Gender, Age (estimate in years)
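Aggregating per-detection estimates into one per-person result, as suggested above, can be sketched like this; the record layout and the `aggregate` helper are illustrative assumptions, not the UVAP output schema.

```python
from collections import Counter
from statistics import mean

# Sketch: combine several demography detections of the same person
# into a single estimate (majority vote for gender, mean for age).

def aggregate(detections):
    genders = Counter(d["gender"] for d in detections)
    return {
        "gender": genders.most_common(1)[0][0],
        "age": round(mean(d["age"] for d in detections)),
    }

person = [
    {"gender": "female", "age": 31},
    {"gender": "female", "age": 35},
    {"gender": "male", "age": 33},  # one noisy detection
]
result = aggregate(person)
```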


2D Body Pose (Skeleton)

Detects a human body’s position, including its joints, in a video frame.

Ultinous’s body part and joint detector identifies joints and body parts, and consistently distinguishes the left and right sides of symmetric body parts.

It can recognize a wide variety of body movements (falls, dance moves, walking) at scale.

Applications:

  • Fall Detection
  • Attitude Detection: “hands up”, bowing, kneeling etc.

Further processing:

  • Skeletonization: guide to connecting body parts to create skeletons for visualization
  • 3D Positioning: using two cameras running body pose detection (e.g. for Fall Detection)

Input: Video frames
Output: 2D positions (x, y) and types of body parts belonging to each person, e.g. Left Ear of Person 2 is at (52, 137)
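As an illustration of the Fall Detection application, a naive heuristic over the 2D body-part positions might look like the sketch below. The part names, the dict layout, and the `maybe_fallen` helper are assumptions; a production detector would use far more signal than a single frame.

```python
# Sketch of a simple fall heuristic over 2D body-pose output:
# if the head is not clearly above the hips (image y grows
# downward), the person may be lying down.

def maybe_fallen(parts, margin=10):
    """parts maps body-part name -> (x, y) in pixels."""
    head_y = parts["Head"][1]
    hip_y = min(parts["LeftHip"][1], parts["RightHip"][1])
    return head_y >= hip_y - margin

standing = {"Head": (100, 40), "LeftHip": (95, 160), "RightHip": (110, 160)}
fallen = {"Head": (210, 150), "LeftHip": (120, 155), "RightHip": (120, 165)}
```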


Composite Detectors


Movement Tracking

Links head detections across a sequence of video frames. If a head detection’s position is close enough to a previous one, and the corresponding frames are close enough in time, they form a track.

A track is the sequence of head detections of one person on a video stream, from first detection to disappearance: effectively the path of their movement. It provides basic data for higher-level services, such as people counting or dwell time measurement.

Applications:

  • Movement detection on a single camera stream
    • Connect detections of the same person over time (track)
    • Based on proximity in space and time (no face recognition)
    • Predict track for missing detections for a short time

Prerequisite: Head Detection
Further processing: Pass Detection

Input: Bounding boxes
Output: Track updates frame by frame (actual or predicted head position)
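The proximity-based linking described above can be sketched as a greedy nearest-neighbour association; `link` and its data layout are hypothetical, and a real tracker also weighs time gaps and predicted motion.

```python
import math

# Sketch: attach each new detection to the nearest track whose
# last position is within `max_dist` pixels; otherwise start a
# new track.

def link(tracks, detections, max_dist=50.0):
    """tracks: list of lists of (x, y); detections: list of (x, y)."""
    for det in detections:
        best, best_d = None, max_dist
        for track in tracks:
            d = math.dist(track[-1], det)
            if d < best_d:
                best, best_d = track, d
        if best is not None:
            best.append(det)
        else:
            tracks.append([det])
    return tracks

tracks = [[(100, 100)], [(400, 120)]]
# One detection extends the first track; the other is too far
# from both tracks, so it starts a new one.
link(tracks, [(108, 103), (600, 300)])
```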


Re-identification

Re-identification uses Face Feature Vector extraction as a core detector, combined with a matching algorithm that computes cosine distances to previously recorded faces to measure similarity and recognize the same face.

Usable for detecting the re-appearance of a person, for security purposes (e.g. granting access to a resource only to authorized individuals), or for tracking people across multiple video streams.

Re-identification from as low as 40×40 pixels.
Face check against a 1M-entry database in real time.

Applications:

  • Finding the same person at different places and different times
  • Multi-camera tracking, customer journey, waiting time, dwell time
  • Staff exclusion, returning customer, entrance system etc.

Prerequisite: Feature Vectors

Input: Feature Vectors
Output: Registration/Re-identification events
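The registration/re-identification decision can be sketched as a cosine-distance lookup against the database of known faces; the threshold value, the `identify` helper, and the data layout are illustrative assumptions.

```python
import math

# Sketch: a new Feature Vector is compared against known people
# by cosine distance; under the threshold it is a re-identification,
# otherwise the person is registered as new.

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def identify(database, vector, threshold=0.4):
    """database: dict person_id -> Feature Vector. Returns an event."""
    best_id, best_d = None, threshold
    for pid, known in database.items():
        d = cosine_distance(known, vector)
        if d < best_d:
            best_id, best_d = pid, d
    if best_id is not None:
        return ("re-identification", best_id)
    new_id = f"person-{len(database) + 1}"
    database[new_id] = vector
    return ("registration", new_id)

db = {}
identify(db, [1.0, 0.0, 0.1])           # first sighting: registration
event = identify(db, [0.9, 0.05, 0.1])  # similar vector: re-identification
```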


Pass Detection

Linked head detections on a sequence of video frames form a track, showing the path of a person.

Pass Detection determines whether an individual’s track crosses a predefined (configurable) line on the video frame.

>99% accuracy on traffic counting

Applications:

  • Detect when a person crosses one or more (broken) lines
  • Footfall, people counting at entrance/exit, passer-by counting

Prerequisite:

  • Head Detection
  • Tracking

Input: Track updates
Output: Pass events, grouped by lines and direction
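A line-crossing test over track updates can be sketched with a signed-area (cross product) check; the `crossed` helper, the coordinate layout, and the omission of segment-extent handling are simplifying assumptions.

```python
# Sketch: the sign of the cross product tells which side of the
# counting line a point is on; a sign change between consecutive
# track positions means the track crossed the line. The sign also
# encodes the crossing direction, which the real detector reports
# per line.

def side(line, point):
    (ax, ay), (bx, by) = line
    px, py = point
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def crossed(line, prev, curr):
    s1, s2 = side(line, prev), side(line, curr)
    return s1 * s2 < 0  # opposite sides: the line was crossed

gate = ((0, 100), (200, 100))  # horizontal counting line
```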


Diagram: the customer’s video feeds flow into UVAP, which powers integrators’ custom applications: your video analysis service.