What is human pose estimation?
One of the most mature artificial intelligence technologies, human pose estimation has been quietly making its presence felt in business applications.
Currently hovering at the edge of mainstream adoption, the technology is widely expected to see growth across an array of new applications in 2020.
So why is human pose estimation so important? Put simply, it’s easy for people to tell the difference between types of movement: we can effortlessly distinguish between someone breakdancing and someone doing push-ups, for example. Computers, however, see pictures only as grids of pixel values, so it’s much harder for them to understand what’s happening in an image or video.
Pose estimation is a machine-learning technique developed to help a computer understand what position a human body has taken, so that its movement can be analysed further for a wide range of uses and applications.
How does it work?
First, let’s look at how human pose estimation works. The easiest way to estimate the human body position in a picture is to determine key points on the person, such as knees, elbows, hips, feet, shoulders and neck.
“Pose estimation is usually run on 2D images or individual frames of a video sequence,” Dr Rob Dupre, an expert in crowd counting and behaviour analytics, explains.
“The joint positions can be connected to form a rudimentary skeleton which can then be visualised on the image. The skeleton is usually presented in 2D on top of the image; however, another area of pose estimation research looks at projecting those joints into 3D, allowing for applications like motion capture for moving 3D CGI [computer-generated imagery] models.”
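As a rough illustration, the joints-to-skeleton step Dupre describes can be sketched in a few lines of Python; the joint names, coordinates and edge list below are illustrative rather than taken from any particular model:

```python
# Illustrative sketch: connecting detected joint positions into a 2D skeleton.
# Joints are (x, y) pixel coordinates in the image; names are hypothetical.
keypoints = {
    "neck": (160, 60),
    "left_shoulder": (130, 80), "right_shoulder": (190, 80),
    "left_elbow": (115, 130), "right_elbow": (205, 130),
    "left_hip": (140, 180), "right_hip": (180, 180),
    "left_knee": (138, 240), "right_knee": (182, 240),
}

# Pairs of joints that form the "bones" of the rudimentary skeleton.
SKELETON_EDGES = [
    ("neck", "left_shoulder"), ("neck", "right_shoulder"),
    ("left_shoulder", "left_elbow"), ("right_shoulder", "right_elbow"),
    ("left_shoulder", "left_hip"), ("right_shoulder", "right_hip"),
    ("left_hip", "left_knee"), ("right_hip", "right_knee"),
]

def skeleton_segments(kps):
    """Return the line segments to draw on top of the image, skipping any
    edge whose joints were not both detected."""
    return [(kps[a], kps[b]) for a, b in SKELETON_EDGES if a in kps and b in kps]

segments = skeleton_segments(keypoints)
print(len(segments))  # one segment per fully detected edge
```

A drawing library would then render each segment over the source frame; projecting the same joints into 3D, as Dupre notes, is a separate research problem.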
Dupre says this kind of key point detection is comparable to the technology behind applications such as Snapchat filters or Apple’s Animoji, although those focus only on key points associated with the face.
Now that we can detect and identify where a person is in a scene, the next question is: what are they doing?
Pose estimation, through its ability to detect joint positions, can help find the answer. From this additional metadata, we can infer a host of new information about what a camera is seeing: the relationships between these key points allow us to detect specific body positions.
“For example, if a person’s ankles are horizontally aligned with their hips and their head, we might be able to infer that they are lying down,” says Dupre. “Using tracking technology, we can check to see if that same person was lying down in a previous frame. This might indicate they have fallen down and we might want to alert someone to this.”
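A minimal sketch of that heuristic, assuming keypoints arrive as (x, y) pixel coordinates with y increasing downwards; the joint names, example poses and tolerance value are all illustrative:

```python
def is_lying_down(kps, tolerance=40):
    """Infer a lying posture: head, hips and ankles share roughly the same
    vertical (y) coordinate, i.e. the body is horizontally aligned."""
    ys = [kps["head"][1], kps["left_hip"][1], kps["right_hip"][1],
          kps["left_ankle"][1], kps["right_ankle"][1]]
    return max(ys) - min(ys) <= tolerance

def fall_alert(prev_kps, curr_kps):
    """Flag a tracked person who was upright in the previous frame but is
    lying down now -- a possible fall worth alerting someone to."""
    return is_lying_down(curr_kps) and not is_lying_down(prev_kps)

# Illustrative poses: an upright person spans a large vertical range,
# a lying person a small one.
standing = {"head": (100, 40), "left_hip": (95, 150), "right_hip": (105, 150),
            "left_ankle": (95, 260), "right_ankle": (105, 260)}
lying = {"head": (40, 200), "left_hip": (150, 208), "right_hip": (152, 212),
         "left_ankle": (260, 205), "right_ankle": (262, 206)}
print(fall_alert(standing, lying))  # True: upright then lying suggests a fall
```

A production system would smooth this over many frames and tune the tolerance to the camera angle; the point is only that simple geometric relations between joints already carry meaning.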
Why is the technology set to take off?
Detection of people has long been a major focus for a range of industries, including security, business intelligence, health and safety, and entertainment. Recent developments in machine-learning algorithms have improved the accuracy of these detections and reduced the hardware required to run them, and pose estimation has now reached a point where it is commercially viable.
The technology’s growth is also being driven by the coronavirus crisis, says Dupre.
“Businesses have had to respond to stringent requirements on capacity and social distancing, which have typically been addressed through manual monitoring and counting. The cost to do this at scale has generated an instant need for accurate automation of these tasks and pose estimation is one technology that provides this as well as much more,” he says.
Indeed, social-distancing applications, combining human pose estimation and distance projection heuristics, now exist that encourage individuals to maintain physical distance from each other during the pandemic.
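A simple version of such a distance heuristic might look like the sketch below, which assumes each person’s position has already been projected from image pixels into floor-plane metres (something a real system would do via camera calibration); the threshold and track IDs are illustrative:

```python
import math
from itertools import combinations

MIN_DISTANCE_M = 2.0  # illustrative threshold, not an official guideline

def too_close_pairs(positions, min_distance=MIN_DISTANCE_M):
    """Return pairs of people standing closer than min_distance.

    `positions` maps a track ID to (x, y) floor coordinates in metres,
    e.g. derived from each person's detected ankle keypoints."""
    flagged = []
    for (a, pa), (b, pb) in combinations(sorted(positions.items()), 2):
        if math.dist(pa, pb) < min_distance:
            flagged.append((a, b))
    return flagged

people = {"p1": (0.0, 0.0), "p2": (1.2, 0.5), "p3": (6.0, 1.0)}
print(too_close_pairs(people))  # [('p1', 'p2')]
```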
It’s also worth noting that pose estimation can be used in compliance with the General Data Protection Regulation because it doesn’t identify personal features; it simply uses key points on the body to detect a human presence.
IDC describes this convergence of technology and humanity as “augmented humanity”. It notes: “Technologies are generating new human-like experiences, leveraging a new range of human-machine interactions that are three-dimensional, more intuitive and more natural.”
Human pose estimation will help enable a new generation of interaction and intersection between the physical and digital worlds.
Human pose estimation applications
Human pose estimation is seen as especially valuable across a wide range of use cases.
One of the more widely known examples of pose estimation was Microsoft’s Kinect hardware for the Xbox. It allowed players to interact with a game using their body, without having to hold anything for the system to track, as is now common in virtual-reality applications.
Kinect tracked 25 joints across the full body, but it relied on specialist camera technology that limited its range. Modern pose estimation techniques require only a standard RGB (red, green and blue) camera, making the technology far more widely applicable.
“This change is the major reason why we believe this technology will start to become prominent, as it allows the reuse of existing installed cameras,” says Dupre.
Virtual coaching is also one of the most promising applications of the technology. The user places a device in front of them; deep neural networks receive the video stream from the device, process it and provide joint localisations. Additional algorithms then determine which exercise was completed, how many repetitions the user performed and whether there were any mistakes, such as the hips rising higher than the shoulders during a plank.
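Those last two steps, counting repetitions and spotting the plank mistake, could be sketched along these lines; the joint names, thresholds and the rep-counting signal are illustrative assumptions rather than any product’s actual logic:

```python
def plank_hips_too_high(kps, tolerance=25):
    """Flag a common plank mistake: hips rising above shoulder level.
    Image y grows downwards, so 'higher' means a smaller y value."""
    hip_y = (kps["left_hip"][1] + kps["right_hip"][1]) / 2
    shoulder_y = (kps["left_shoulder"][1] + kps["right_shoulder"][1]) / 2
    return hip_y < shoulder_y - tolerance

def count_reps(signal, low, high):
    """Count repetitions from a per-frame joint signal (e.g. hip height):
    one rep = the signal crossing below `low` and back above `high`.
    The two thresholds add hysteresis so jitter isn't counted as reps."""
    reps, down = 0, False
    for value in signal:
        if not down and value < low:
            down = True
        elif down and value > high:
            down = False
            reps += 1
    return reps

# Illustrative frames: in the second pose the hips have drifted upwards.
good_plank = {"left_shoulder": (120, 100), "right_shoulder": (180, 100),
              "left_hip": (220, 112), "right_hip": (224, 112)}
bad_plank = {"left_shoulder": (120, 100), "right_shoulder": (180, 100),
             "left_hip": (220, 60), "right_hip": (224, 60)}
print(plank_hips_too_high(good_plank), plank_hips_too_high(bad_plank))
```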
Elsewhere, an advanced tracker has been designed for the retail market. It can be used for counting at entrances and exits, for gathering business intelligence (BI) metadata, such as how long someone has been queuing or whether more than a set number of people are waiting to be served, or simply for detecting which parts of the store generate the most interest among customers.
Typical motion-based tracking engines would struggle with false detections caused by moving doors or changes in illumination, but the use of pose estimation eliminates many of these issues.
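The entrance-and-exit counting side of such a tracker could feed a simple occupancy model like the sketch below; the event format and the waiting threshold are illustrative assumptions:

```python
def occupancy_over_time(events):
    """Compute occupancy from timestamped entry/exit detections.

    `events` is a list of (timestamp, kind) tuples, kind being "in" or
    "out", as a pose-based people counter at a doorway might emit."""
    count, history = 0, []
    for t, kind in sorted(events):
        count += 1 if kind == "in" else -1
        history.append((t, count))
    return history

def queue_alert(history, max_waiting):
    """Return the timestamps at which more people are present than allowed,
    e.g. to warn staff that a queue needs another till open."""
    return [t for t, n in history if n > max_waiting]

events = [(1, "in"), (2, "in"), (3, "in"), (4, "out"), (5, "in")]
history = occupancy_over_time(events)
print(queue_alert(history, max_waiting=2))  # [3, 5]
```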