The ERC-funded Politus Project takes an interdisciplinary, multi-method approach that combines social scientific, computational, ethical, and legal perspectives to deliver a breakthrough technology intended to transform public opinion research. Politus will develop a new approach to public opinion research based on a fusion of different data types and data analytics methods: content from a range of online platforms; natural language processing and multimodal deep learning for image and text data; computational network analysis; survey polls; and advanced post-stratification methods for representativeness. This will allow us to attain a high-resolution, fine-grained, longitudinal, and representative measurement of societal trends in different countries and different languages.

The first step will be data collection. Image and textual content from online platforms will be collected using API tools. An annotation process will then build a gold standard corpus, which will be used to train supervised machine learning models that predict the components of public opinion. For model building, Politus will also pursue a second strategy, so-called amplified learning, which follows in the footsteps of Blumenstock et al. (2015) and connects survey data with user-generated content. An online survey will ask consent-giving respondents questions about demography and public opinion. A model will then be built to predict these responses from the respondents’ online platform content. These online surveys will be conducted periodically to keep the models up to date, because the indicators in question (ideology, belief, value, topics, emotion, stance, and electoral behavior) vary over time.

Network analysis will then be applied to enrich the dataset. An analysis of the network structure surrounding individual users will provide further insight into public opinion. For example, an individual user’s political ideology or religiosity can be inferred from this user’s network interactions.
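
As a minimal illustration of the network-based inference described above, the following sketch estimates an unlabeled user's ideology score as the average score of the labeled accounts that the user interacts with. This is a simple homophily heuristic chosen for illustration only, not the project's stated method; the function name, the score scale (-1 to 1), and all account names and values are hypothetical.

```python
from statistics import mean


def infer_from_neighbors(user, interactions, known_scores):
    """Average the known scores of the accounts `user` interacts with.

    Returns None when none of the user's neighbors has a known score.
    """
    neighbor_scores = [
        known_scores[n]
        for n in interactions.get(user, ())
        if n in known_scores
    ]
    return mean(neighbor_scores) if neighbor_scores else None


# Hypothetical toy data: who each user interacts with (assumption).
interactions = {
    "user_a": ["acct_1", "acct_2", "acct_3"],
    "user_b": ["acct_4"],  # acct_4 has no known score
}
# Hypothetical ideology scores for labeled accounts, on a -1..1 scale.
known_scores = {"acct_1": -0.8, "acct_2": -0.4, "acct_3": 0.0}

estimate_a = infer_from_neighbors("user_a", interactions, known_scores)
estimate_b = infer_from_neighbors("user_b", interactions, known_scores)
```

A user with no labeled neighbors yields `None` rather than a guess, which keeps missing information explicit for any downstream model.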

Overall, the Politus Project will extract public opinion from online platforms using natural language processing, multimodal deep learning, amplified learning, and network analysis. Although user-generated content offers immense potential for measuring public opinion and various trends in society, including voting behavior, consumer behavior, ideologies, perceptions, values, and beliefs (Barberá et al. 2015; Barberá 2015), it also carries a wide array of biases and limitations, which result in different types of errors. Public opinion research based on data extracted from online networks is often criticized for its strong demographic bias: participation in online networks is strongly affected by users’ age, sex, education level, race, income level, and other characteristics (Mislove et al. 2011; Sloan 2017; An and Weber 2015; Cohen and Ruths 2013; Filho et al. 2015; Barberá 2016; Olteanu et al. 2019; Sen et al. 2021). Such data do not constitute a probability sample, so it is challenging to generalize findings to the target population (Sen et al. 2021). To deal with demographic bias and overcome the non-probability sample characteristics of the data, Politus will utilize Bayesian multilevel regression with poststratification (MRP).
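
To make the poststratification step of MRP concrete, the sketch below reweights cell-level estimates by each demographic cell's population count, i.e. it computes the sum of N_j * theta_j over cells j divided by the sum of N_j. In a full MRP workflow the cell estimates theta_j would come from a fitted Bayesian multilevel model; here they are hard-coded, and the cell definitions, counts, and function name are all hypothetical.

```python
def poststratify(cell_estimates, cell_counts):
    """Count-weighted average of cell-level estimates.

    Both arguments are dicts keyed by demographic cell.
    """
    unmatched = set(cell_estimates) ^ set(cell_counts)
    if unmatched:
        raise ValueError(f"cells without a match: {sorted(unmatched)}")
    total = sum(cell_counts.values())
    weighted = sum(cell_estimates[c] * cell_counts[c] for c in cell_estimates)
    return weighted / total


# Hypothetical cell-level estimates of some indicator (assumed model output).
cell_estimates = {
    ("18-29", "F"): 0.60,
    ("18-29", "M"): 0.50,
    ("30+", "F"): 0.40,
    ("30+", "M"): 0.30,
}
# Hypothetical census counts per cell.
population_counts = {
    ("18-29", "F"): 100,
    ("18-29", "M"): 100,
    ("30+", "F"): 400,
    ("30+", "M"): 400,
}
# Hypothetical online sample, skewed toward younger users.
sample_counts = {
    ("18-29", "F"): 40,
    ("18-29", "M"): 40,
    ("30+", "F"): 10,
    ("30+", "M"): 10,
}

population_estimate = poststratify(cell_estimates, population_counts)
sample_weighted_estimate = poststratify(cell_estimates, sample_counts)
```

With these toy numbers the online sample over-represents the younger cells, so weighting by sample composition gives a higher value than weighting by population composition; poststratification corrects for that imbalance.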