Social media posts reveal bad-air days in Chinese cities

Editor’s note: A link to a high-resolution image for download appears at the end of this release. 

David Ruth

Mike Williams

Rice University scientists find social media provides proxy measurement of pollution 

HOUSTON – (Sept. 26, 2016) – Residents of China’s megacities who post comments about air quality to social media can give environmental scientists a window into pollution levels there.

A multidisciplinary study by Rice University researchers showed that the frequency of key words like dust, cough, haze, mask and blue sky can be used as a proxy measurement of the amount of airborne particulate matter in the country’s urban centers at any given time.

The words were culled from millions of posts to China’s Weibo, a popular microblogging platform. The posts were collected by Rice computer scientists for a study on Chinese censorship of social media three years ago.

The research led by Rice computer scientist Dan Wallach and environmental engineer Daniel Cohan appears this month in the open-access journal PLOS One.

“The big takeaway is that people grouse about air quality, and as it gets worse, people complain more,” said Wallach, a professor of computer science and electrical and computer engineering, whose lab collected the publicly available posts.

“When it’s really bad, it flattens out,” he said. “They’re as complained-out as they’re going to be. And if it gets good enough, few people complain. But there’s a zone in the middle where people really grouse, and we can measure that.

“A city the size of Beijing has air-quality meters, but not many,” Wallach said. “But if you have millions of people, you potentially have millions of meters. It’s a way of adding extra data.”

The researchers came up with a metric, the Air Discussion Index (ADI), based on the frequency with which pollution-related terms appeared in 112 million posts from 2011 to 2013 by residents of Beijing, Shanghai, Guangzhou and Chengdu, where pollution is thought to be most troublesome in China.
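As a rough illustration of how a frequency-based index of this kind might be computed, the sketch below scores each day by the share of posts that mention at least one pollution-related term. It is a minimal sketch and not the researchers' ADI formula, which this release does not spell out. The term list is borrowed from the examples named above, and the function name, post format and sample posts are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical keyword list, borrowed from the examples named in this release.
# "blue sky" is left out because it signals good air rather than bad.
POLLUTION_TERMS = ("dust", "cough", "haze", "mask")


def discussion_index(posts):
    """Return {day: fraction of that day's posts mentioning any term}.

    `posts` is an iterable of (day, text) pairs. This is an illustrative
    stand-in, not the published Air Discussion Index.
    """
    totals = defaultdict(int)  # posts seen per day
    hits = defaultdict(int)    # posts per day that mention a term
    for day, text in posts:
        totals[day] += 1
        lowered = text.lower()
        if any(term in lowered for term in POLLUTION_TERMS):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in totals}


# Made-up sample posts, for illustration only.
sample_posts = [
    ("2013-01-12", "Heavy haze again, wearing a mask to work"),
    ("2013-01-12", "Lunch with friends"),
    ("2013-01-13", "Nothing but traffic today"),
    ("2013-01-13", "Great movie tonight"),
]
index = discussion_index(sample_posts)
print(index)
```

A daily score like this could then be lined up against sensor readings for the same days to see how closely the two move together.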

“We looked at what words correlated with the pollution-level data we had,” Wallach said. “Some words that came out were nonsense. But others, like cough or wheeze, clearly had something to do with the conditions. Others, like blue sky, inversely correlated with the weather or pollution.”

“There’s a lot of discussion about censorship in Chinese media, including in Dan Wallach’s work, but one of the things we like about this particular study is that it relies on data that are almost never censored, the most innocuous terms of all,” said co-author Aynne Kokas, an assistant professor of media studies at the University of Virginia and an affiliate of Rice University’s Baker Institute for Public Policy.

“These terms are almost impossible to censor because of how common they are,” she said. “As a result, we think this method is really effective not only in China but could also work in other contexts where there are heavily regulated social-media environments.”

The most accurate ADI readings were those for Beijing. When the index was matched against hourly sensor readings from the U.S. Embassy there, the researchers found the technique tracked pollution levels with an accuracy of 88.2 percent. ADI readings for the other cities, where pollution isn’t as severe and Weibo posts aren’t as plentiful, were less accurate: 63 percent for Shanghai, 42 percent for Guangzhou and 36 percent for Chengdu.

Particulate matter measuring less than 2.5 microns in diameter, about one-thirtieth the diameter of the average human hair, is known to permanently damage the lungs. The United States’ air-quality standard for particulate matter of this size is no more than 35 micrograms (millionths of a gram) per cubic meter over any 24-hour period and no more than 12 micrograms per cubic meter as an annual average.

Cohan said Chinese air pollution standards aren’t vastly different from those in the U.S., but the pollutant concentrations are. “Particulate matter levels in Beijing are often 10 times as high as we typically observe in U.S. cities,” he said.

Wallach said he was surprised by the level of air-quality information that was found in the Weibo posts — data that he and colleagues had collected for a 2013 study on social media censorship.

“I was chatting with Dan Cohan, and I said, ‘Hey, I’ve got all this data about China. Do you think we could measure something about pollution from all this data?’” Wallach recalled. “We all got together to see if the Weibo data told a story, and it turns out it did.”

Cohan said, “China is an ideal testbed, because the pollutant levels are so high and so variable that you can literally see the difference day to day. Still, I was surprised that social media posts could correlate so strongly with air-quality conditions.”

Wallach said it was interesting to note that the U.S. Embassy measurements correlated well with the Chinese government’s own ground-level reporting on urban pollution. “Some people in China think their government might be lying to them about air quality, but based on what we found, that isn’t the case,” he said.

Co-authors of the paper include Rice alumnus Zhu Tao, now at Google, and Rice postdoctoral fellow Rui Zhang, now at the National Park Service. Cohan is an associate professor of civil and environmental engineering.


Read the abstract at

Follow Rice News and Media Relations via Twitter @RiceUNews.

Related materials:

Rice Computer Security Lab:

Cohan Research Group:

Aynne Kokas:

Image for download:

Rice University researchers settled on a set of bigrams (key terms made up of two consecutive symbols) related to air quality and searched for them in a set of 112 million Weibo posts gathered between 2011 and 2013. The terms were drawn from a database of 40 million Chinese bigrams, and their frequency was compared with air-quality reports from U.S. embassies in four megacities. The 10 bigrams above are only part of the set they used. (Credit: Rice University)
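For readers curious how a bigram search of this kind works, the sketch below splits each post into every pair of consecutive characters and counts the pairs that appear in a chosen term set. It is only an illustration under stated assumptions: the two Chinese terms (roughly "haze" and "mask") and the sample posts are made up for this example and are not taken from the researchers' list, and the function names are hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Return every pair of consecutive characters in `text`."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def count_terms(posts, terms):
    """Count how often each bigram in `terms` occurs across `posts`."""
    counts = Counter()
    for post in posts:
        for pair in bigrams(post):
            if pair in terms:
                counts[pair] += 1
    return counts


# Illustrative terms and posts only; not the study's actual bigram set.
terms = {"雾霾", "口罩"}  # roughly "haze" and "mask"
posts = ["今天雾霾很重", "出门要戴口罩", "雾霾天记得戴口罩"]
counts = count_terms(posts, terms)
print(counts)
```

Working with character pairs avoids the need to split Chinese text into words first, which is one plausible reason to use bigrams for posts like these.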

Located on a 300-acre forested campus in Houston, Rice University is consistently ranked among the nation’s top 20 universities by U.S. News & World Report. Rice has highly respected schools of Architecture, Business, Continuing Studies, Engineering, Humanities, Music, Natural Sciences and Social Sciences and is home to the Baker Institute for Public Policy. With 3,910 undergraduates and 2,809 graduate students, Rice’s undergraduate student-to-faculty ratio is 6-to-1. Its residential college system builds close-knit communities and lifelong friendships, just one reason why Rice is ranked No. 1 for happiest students and for lots of race/class interaction by the Princeton Review. Rice is also rated as a best value among private universities by Kiplinger’s Personal Finance. To read “What they’re saying about Rice,” go to

About Mike Williams

Mike Williams is a senior media relations specialist in Rice University's Office of Public Affairs.