Human Contractors Are Mining Social Media Posts to Train AI Systems

By Laura Stotler May 08, 2019

Social media privacy, if such a thing exists, is always a contentious topic. Now it has been revealed that private Facebook and Instagram posts are being categorized and labeled by Indian contract workers in the name of AI and machine learning.

A Reuters report reveals that Wipro, an Indian outsourcing firm, has contracted 260 workers to sort “private” Facebook and Instagram posts into five categories. The goal is to train the platforms’ AI and machine learning software to identify different types of content. To learn to do that, the software needs sample data that humans have categorized and labeled, a process known as data annotation. The Wipro effort is just one of 200 global Facebook content-labeling projects, which together involve thousands of contractors sifting through user data.

Facebook launched the effort to better sort and serve content on its platforms. The company will use AI to sort posts by content, occasion and the author’s intent. The contract workers are annotating around 700 items each day, working through status updates, Instagram Stories, videos, photos and shared links, with two workers checking each piece of content for accuracy. Facebook has revealed that some of the content used in the project includes users’ “private posts,” which are meant to be shared only with select friends on the platforms. The data also sometimes includes user names and other sensitive information.

“It’s a core part of what you need,” said Nipun Mathur, director of product management for AI at Facebook. “I don’t see the need going away.” Mathur was referring to the amount of labeled data required to train effective AI sorting algorithms for Facebook’s platforms. The AI systems will then be able to sort and serve content, including making recommendations on Facebook’s Marketplace and describing photos and videos for visually impaired users. AI and machine learning are also being used to ensure specific advertisements do not appear alongside adult or political content.

Facebook isn’t the only company outsourcing AI projects that require human contract workers to sift through private user data. Other reports have revealed that teams of workers are labeling potentially sensitive information collected by Amazon Echo devices and Ring security cameras. In some instances, online users are helping companies train AI without knowing it or being compensated for it. For instance, Google’s reCAPTCHA system asks users to identify objects in photos to prove they are human. Their answers are used to digitize text and train AI systems.

According to Facebook, all categorization efforts comply with the company’s legal and privacy policies, and the company has recently rolled out an auditing system to make sure privacy expectations are being met. Whether the company is complying with the EU’s stringent GDPR rules on the collection and use of personal data remains to be seen.

Edited by Maurice Nagle
