I'm Asanuma, an intern at Quixotiks.
In the previous article, I looked at images of Japanese individuals and the elderly, two groups often treated as outliers in AI facial expression recognition datasets, and experienced the difficulties firsthand.
This time, I researched and compiled information on datasets specifically tailored for Japanese (Asian) individuals and the elderly.
As I wrote in my first blog post, facial expression recognition datasets can be broadly categorized into two types based on the environment in which they were captured.
- Controlled: Data captured in a controlled environment, such as a laboratory. While the quality of expressions and labels is high, backgrounds and poses tend to be uniform.
- In-the-Wild: Data collected from various real-world situations, such as movies and the internet. This type includes more natural and diverse expressions.
In this article, I will introduce a total of four datasets, grouped under these two categories.
【Controlled】Datasets Captured in a Controlled Environment
ATR Facial Expression Image Database (2006)
- Data Source: Captured in a laboratory (video, still images)
- Features:
- Ten Japanese individuals (6 men, 4 women) in their 20s to early 30s, who had received basic training in facial expression formation (e.g., through theater), performed instructed expressions.
- Includes not only frontal faces but also images with varying gaze directions and head orientations.
- Labeling:
- 10 emotion labels including neutral (neutral, joy (open mouth), joy (closed mouth), sadness, surprise, anger (open mouth), anger (closed mouth), disgust, contempt, fear)
- Labeled by 27 individuals (university students)
- Annotation details are unknown
- License: Commercial use requires inquiry
- Source: https://www.atr-p.com/products/face-db.html
Facial Expression Database Based on Dimensional and Categorical Models of Emotion (2018)
- Data Source: Filmed in a laboratory (video, still images)
- Features:
- 8 individuals (4 males, 4 females) aged 20-40 (average age 34.25, standard deviation 5.47) expressed emotions
- Expressions were reproduced using two methods: the Imaginary Method, where participants imagine a situation to naturally create an expression, and the Facial Action Coding System (FACS) Method, where participants consciously manipulate facial muscle movements to create an expression.
- Labeling:
- 7 emotions: surprise, fear, sadness, anger, disgust, happiness, neutrality
- Labeled by 39 individuals (19 males, 20 females) with an average age of 21.33 (standard deviation 2.39)
- License: For academic and R&D purposes only
- Source: https://www.tandfonline.com/doi/full/10.1080/02699931.2017.1419936#abstract
Facial expression database specialized for the elderly
- Data Source: Filmed in a laboratory (video, still images)
- Features:
- Videos of 111 elderly individuals (56 males, 55 females, 73.2±4.6 years old) were recorded.
- From videos taken with a front-facing camera, frames with the most prominent expressions were extracted as still images.
- Labeling:
- 8 emotions (joy, sadness, fear, surprise, anger, disgust, excitement, relaxation) and neutral expression
- Labeled by 36 participants (18 males, 18 females, age: 39.3±11.6 years old)
- License: Not for commercial use, for research use only
- Source: https://www.nii.ac.jp/dsc/idr/rdata/NUFDB/
【In-the-Wild】Real-world dataset
Large-scale database of diverse East Asian facial expressions (2022)
- Data Source: Movies, Web (images)
- Features:
- Approximately 450,000 frames were collected from 113 movies (from the past 30 years) made in China, Japan, and South Korea, along with 50,000 images from five search engines (Google, Bing, Baidu, Goo, NAVER).
- Labeling:
- Seven emotions: surprise, fear, sadness, anger, disgust, happiness, and neutrality.
- Only images where all three annotators agreed on the same facial expression label were used.
- Finally, an administrator re-checked all labels and removed images if there was any doubt.
- In other words, images were only used if there was agreement from four parties (three annotators + one administrator) for each image.
- License: Available upon request to the authors. Commercial use is unknown.
- Source: https://www.mdpi.com/1424-8220/22/21/8089
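The labeling rule described for this database (keep an image only when all three annotators agree and the administrator has no doubt) can be illustrated with a minimal Python sketch. This is not the authors' actual pipeline: the `Candidate` class, its field names, and the `filter_by_consensus` function are hypothetical names introduced here purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Candidate:
    """One candidate image with its annotations (hypothetical structure)."""
    image_id: str
    annotator_labels: Tuple[str, str, str]  # labels from the three annotators
    admin_approved: bool                    # False if the administrator had any doubt


def consensus_label(candidate: Candidate) -> Optional[str]:
    """Return the agreed label, or None if the image should be discarded."""
    # All three annotators must have chosen exactly the same label.
    if len(set(candidate.annotator_labels)) != 1:
        return None
    # The administrator's re-check must also pass.
    if not candidate.admin_approved:
        return None
    return candidate.annotator_labels[0]


def filter_by_consensus(candidates: List[Candidate]) -> Dict[str, str]:
    """Keep only images agreed on by three annotators plus the administrator."""
    kept: Dict[str, str] = {}
    for candidate in candidates:
        label = consensus_label(candidate)
        if label is not None:
            kept[candidate.image_id] = label
    return kept


if __name__ == "__main__":
    samples = [
        Candidate("img_001", ("happiness", "happiness", "happiness"), True),
        Candidate("img_002", ("fear", "surprise", "fear"), True),      # annotators disagree
        Candidate("img_003", ("anger", "anger", "anger"), False),      # administrator had doubts
    ]
    print(filter_by_consensus(samples))
```

A strict rule like this trades dataset size for label reliability: any image with even one dissenting judgment is dropped, so ambiguous or subtle expressions are likely to be underrepresented in the final set.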
Summary
In this article, I introduced four facial expression recognition datasets that focus on Japanese (Asian) people and the elderly.
While "Controlled" data, captured in experimental settings, offers consistent expression quality, it has the drawback of not fully reflecting the diverse situations of the real world. In contrast, "In-the-Wild" data, collected from sources like movies, holds the potential to train AI with more natural expressions.
As far as my research shows, "In-the-Wild" datasets and video datasets targeting Asians and the elderly still appear to be in their early stages of development. I felt that further enrichment of such datasets is essential for AI to accurately understand the emotions of a more diverse range of people.
We look forward to future advancements in this research!