As part of a different line of research, we (the "we" in this case referring to me and Chris Polack) started looking at visual patterns in posts tagged #selfie on Instagram. During this process, we realized that roughly a third of the posts contained GPS information about where the photo was taken, which meant we could start looking at geo-dependent patterns in selfie posting behavior!
We wrote a script to gather images tagged #selfie from Instagram’s API and collected about 2.5 million images in total over a couple of different days. We restarted the search to get more variety in the posts and to get selfies taken at different times (and hopefully in different time zones!). What we found at this stage is that roughly half of the photos tagged #selfie don’t contain a face (mostly because of spam). We filtered these images out automatically using facial detection results from Face++’s API, which left us with a nicely cleaned dataset of more than 1 million images with faces.
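A filter of this kind can be sketched roughly as follows. This is only an illustration, not the script we ran: it assumes the detection results have already been fetched and stored as one dictionary per image with a `faces` list, and that shape, along with the names `has_face` and `filter_selfies`, is hypothetical and may not match the real Face++ response.

```python
def has_face(detection):
    """Return True if a detection result lists at least one face.

    `detection` is assumed to be a dict with a "faces" list; this shape is
    hypothetical and may differ from the real API response.
    """
    return len(detection.get("faces", [])) > 0


def filter_selfies(detections):
    """Keep only the image ids whose detection result contains a face.

    `detections` maps an image id to its (assumed) detection result.
    """
    return [image_id for image_id, det in detections.items() if has_face(det)]


# Toy data: two posts with faces and one spam post without a face.
example = {
    "img_001": {"faces": [{"yaw": 3.0}]},
    "img_002": {"faces": []},
    "img_003": {"faces": [{"yaw": -12.5}, {"yaw": 20.0}]},
}
kept = filter_selfies(example)
print(kept)
```

Treating a missing `faces` key the same as an empty list means a failed or malformed detection result drops the image rather than crashing the run.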
As mentioned earlier, about a third of these cleaned images are also GPS tagged with the location where the photo was taken. That isn’t a lot of the photos, but it is enough to look at some emerging patterns.
So where are Selfies being taken?
Since we are mining Instagram for images, the inherent bias of the platform is captured in the data. Thus, Western countries like the US are over-represented in the data, and other countries (like Nigeria, where Instagram is less popular) are under-represented. For reference, here are the top selfie-posting countries:
What we can say from this is that the selfies in our dataset are not really representative of demographics by country.
Selfies By Country
At this point, we thought it would be interesting to look at the average selfie by country represented in the dataset.
Here is how we made the average faces:
Let’s start with a selfie:
We used the facial landmarks returned by Face++ to align the faces. These are basically interest points on the face.
Any selfies that are too extreme in pose are removed (we quantify this by looking at the yaw distribution of the selfies). This makes the final result look clearer instead of blurry.
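As a minimal sketch of this kind of pose filter, assume each face comes with a yaw angle in degrees. The 15 degree cutoff, the `yaw` field name, and the function `filter_by_yaw` are all assumptions made for the example; in practice the cutoff would be chosen by looking at the yaw distribution.

```python
def filter_by_yaw(faces, max_abs_yaw=15.0):
    """Keep faces whose absolute yaw (in degrees) is within the cutoff.

    The default cutoff is an arbitrary illustrative value, not the one
    used for the results in this post.
    """
    return [face for face in faces if abs(face["yaw"]) <= max_abs_yaw]


# Toy data: two roughly frontal faces and two strongly turned ones.
faces = [
    {"id": "a", "yaw": 2.0},
    {"id": "b", "yaw": -40.0},
    {"id": "c", "yaw": 14.9},
    {"id": "d", "yaw": 31.0},
]
frontal = filter_by_yaw(faces)
frontal_ids = [face["id"] for face in frontal]
print(frontal_ids)
```

Using the absolute value treats heads turned left and right the same way, so only near-frontal faces go into the average.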
We align the faces by the center of each face.
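One simple way to do this kind of center alignment is to compute a translation per face. The sketch below assumes the face center is the mean of the landmark points and that every face is moved to the middle of a shared canvas; both choices, and the helper names, are assumptions for illustration rather than a description of our exact pipeline.

```python
def face_center(landmarks):
    """Mean (x, y) of a list of landmark points."""
    n = len(landmarks)
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    return cx, cy


def center_offset(landmarks, canvas_size):
    """Translation (dx, dy) that moves the face center to the canvas center."""
    cx, cy = face_center(landmarks)
    width, height = canvas_size
    return width / 2 - cx, height / 2 - cy


def translate(landmarks, offset):
    """Shift every landmark by the same (dx, dy) offset."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in landmarks]


# Toy landmarks for one face, placed on a 200x200 canvas.
landmarks = [(30.0, 40.0), (50.0, 40.0), (40.0, 60.0), (40.0, 52.0)]
offset = center_offset(landmarks, canvas_size=(200, 200))
aligned = translate(landmarks, offset)
print(offset)
print(face_center(aligned))
```

The same offset would be applied to the image pixels, so that every face lands in the same place before the images are averaged.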