"Assessing the environmental, social, and governance (ESG) quality of an issuer is key for investment decisions, not only to take into account the sustainable risks that might weaken the issuer’s financial strength, but also to assess the product’s impact on key issues that represent a systemic risk for society, such as climate change, fraud and corruption, and social cohesion. To perform this assessment, having access to data is paramount. Today, because of regulation differences, disclosure is different from one region to another. There are nevertheless data sources that are underused and could, if correctly analyzed, strengthen our ESG assessment.

Artificial intelligence and specifically natural language processing (NLP) could enable investors to access new data sources in an efficient way. In particular, it could give us access to signals coming from unstructured data and therefore dramatically increase the scope of our analysis. Unstructured data gives us access to new types of information, but it also provides color and perspective to data already in hand. It could also help issuers make sure the data available are more efficiently communicated and used, and therefore could help combat “ESG reporting fatigue.” It is furthermore a way to limit biases due to discrepancies in the means dedicated to corporate social responsibility communication. NLP is therefore particularly useful in the emerging market space.

ESG is a complex, multidimensional field. To understand reality we need to analyze it from different angles, just as we would assess the different sides of an object or a sphere. Discrepancies in data could mean that the angle used is different. Cross-checking sources is therefore an important task for an ESG analyst, to check facts but also to improve the quality of analysis and capture the complexity of cases. Recognizing and tackling complexity, assessing risks, making a professional judgment, and expressing it in a short and intelligible manner: this is what the ESG analyst does.

Quality and traceability of data are also key for ESG analysts. Having access to raw data is essential for a sound and fruitful dialogue when engaging with companies to discuss their views, and when influencing and encouraging their adoption of best practices and their positive impact on key societal issues such as the Sustainable Development Goals.

Therefore, having access to good-quality data that is broad in scope and covers a large number of issuers will improve the quality of Amundi’s ESG analysis as well as the equal treatment of issuers. It is thus a public good that could improve the quality of ESG assessment, especially in emerging markets."


This paper describes the potential approaches for institutional investors and asset managers to align their investment strategies for emerging markets with the Sustainable Development Goals through environmental, social, and governance (ESG)-integrated investments. Although investors have historically regarded emerging markets as riskier than developed economies, emerging markets hold the potential for greater financial and development impact returns.

Although the global ESG fund universe has tripled since 2015, most of this growth has been in the developed world.1 A key challenge limiting the ability of asset managers and institutional investors to invest in emerging market issuances is the lack of ESG data. The boom in ESG investing and advances in artificial intelligence (AI) technologies have the potential to help investors overcome ESG data challenges. This assistance could play a transformative role in unlocking emerging markets for greater investments in the short and medium term.

Investors are making use of AI and emerging technologies to support ESG data collection and analysis for both developed and emerging markets. Research shows that unstructured ESG data, such as news articles; multilateral development bank (MDB) project disclosures; annual, integrated, and sustainability reports; and bond prospectuses are generally underused in analyses of issuers’ ESG performance.2 The use of AI applications, enhanced by a new generation of machine learning (ML) algorithms and cloud computing, has recently led to innovations in the analysis of unstructured text data on a massive scale through the use of natural language processing (NLP) techniques.

Amundi and IFC are collaborating on ESG research, analytics, and tools to increase ESG data, advance issuer transparency, create reporting infrastructure for emerging markets, and support harmonization of reporting standards. In this paper, Amundi and IFC describe the results from one of many areas of collaboration: developing and testing an ESG-domain-specific NLP application (esgNLP) to support the analysis of the ESG performance of emerging market financial institution (FI) issuers of fixed income bonds.

This joint experiment compares Amundi’s controversy and ESG scores for a group of FI issuers of hard currency debt with an NLP analysis of short-term ESG controversy and long-term ESG performance signals that have been extracted from unstructured data.

The experiment successfully demonstrates the potential for esgNLP analysis to provide additional validation of Amundi’s ESG scores by identifying the correlation between esgNLP and Amundi’s scores. In cases where the scores diverged, additional analysis into the drivers of negative and positive sentiment offered further insights. This insight can help identify gaps in performance and areas for improvement at the issuer level. It can also help support engagement strategies. Further, the esgNLP analysis was able to supplement information for the FIs for which Amundi had no scores. This finding established esgNLP’s potential as an additional ESG data analysis and scoring tool.
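The comparison described above can be illustrated with a small sketch. The paper does not state which correlation measure was used, so the Spearman rank correlation below is an assumption, as are the issuer names, the scores, the function names, and the `max_rank_gap` threshold. Both score lists are assumed to be oriented so that higher means better.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def flag_divergent(names, xs, ys, max_rank_gap=2):
    """List issuers whose two ranks differ by more than max_rank_gap."""
    return [n for n, a, b in zip(names, ranks(xs), ranks(ys))
            if abs(a - b) > max_rank_gap]

# Hypothetical data, for illustration only.
issuers = ["Bank A", "Bank B", "Bank C", "Bank D", "Bank E"]
nlp_scores = [0.10, 0.40, 0.35, 0.80, 0.60]   # NLP-derived sentiment score
analyst_scores = [1.0, 2.5, 2.0, 1.2, 3.0]    # analyst ESG score

print(round(spearman(nlp_scores, analyst_scores), 2))        # 0.4
print(flag_divergent(issuers, nlp_scores, analyst_scores))   # ['Bank D']
```

In this toy data, the flagged issuer is the kind of case where a closer look at the drivers of positive and negative sentiment would be warranted.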

The experiment also illustrates the value of unstructured data as a source of insight into ESG performance for emerging market issuers. The experiment demonstrates the potential for AI solutions such as NLP to unlock value from such unstructured data. Algorithms such as esgNLP have the potential to analyze massive amounts of unstructured text from public sources and extend the analytical capabilities of investors. This tool allows for a rapid, comprehensive, and at-scale analysis.

Conducting an analysis by document type and as a corpus allows for the comparison of the sentiment profiles of different sources of information and types of documents. The observed difference between the sentiment profile of issuer-disclosed information and that of text from media and other sources supports the case for greater transparency in material nonfinancial disclosures. This also makes the case for independent verification and assurance of disclosed information.
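A minimal sketch of such a comparison by document type follows. The records, the document-type labels, the sentiment scale of -1 to 1, and the function names are all hypothetical; the paper does not describe how esgNLP aggregates sentiment, so a simple mean per document type stands in here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (issuer, document type, sentiment score in [-1, 1]).
records = [
    ("Bank A", "sustainability_report", 0.6),
    ("Bank A", "sustainability_report", 0.4),
    ("Bank A", "news", -0.3),
    ("Bank A", "news", 0.1),
    ("Bank B", "sustainability_report", 0.5),
    ("Bank B", "news", -0.5),
]

def sentiment_profile(rows):
    """Average sentiment per document type across the whole corpus."""
    by_type = defaultdict(list)
    for _issuer, doc_type, score in rows:
        by_type[doc_type].append(score)
    return {doc_type: mean(scores) for doc_type, scores in by_type.items()}

def disclosure_gap(rows, disclosed="sustainability_report", external="news"):
    """Issuer-disclosed sentiment minus external-source sentiment."""
    profile = sentiment_profile(rows)
    return profile[disclosed] - profile[external]

profile = sentiment_profile(records)
gap = disclosure_gap(records)
print({k: round(v, 2) for k, v in profile.items()})
print(round(gap, 2))
```

A large positive gap in this toy setup would indicate that issuer-disclosed text reads more favorably than external coverage, which is the pattern the paragraph above points to.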

This analysis also provides useful insights to identify baselines for the state of emerging market disclosure. Even when such disclosures were available, non-English text posed an additional complexity. This is a shortcoming of the esgNLP model that will be addressed in future iterations through training in additional languages.

Data analysis at scale is constrained by the decentralized location of ESG information. Investors face an array of manual steps when preparing data for analysis: identifying and downloading individual reports, performing text extraction, and cleaning the data before any analysis can be conducted. These processes constrain analysis at scale. Recent initiatives to increase ESG data availability and access are therefore welcome.
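The data cleaning step mentioned above can be sketched as follows. This is an illustrative assumption about what such cleaning might involve, not a description of the esgNLP pipeline; the function names and the sample text are hypothetical, and the sentence splitter is deliberately naive.

```python
import re

def clean_extracted_text(raw):
    """Normalize text extracted from a report before analysis."""
    text = raw.replace("\u00ad", "")               # drop soft hyphens
    # Rejoin words split across line breaks. This naive rule also merges
    # genuine hyphenated compounds that happen to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    text = re.sub(r"\s+", " ", text)               # collapse whitespace
    return text.strip()

def split_sentences(text):
    """Split on whitespace that follows ., ! or ? (abbreviations will split too)."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text) if part]

raw_page = "The bank re-\nported  lower\nemissions.\n\nGovernance improved."
cleaned = clean_extracted_text(raw_page)
sentences = split_sentences(cleaned)
print(sentences)
# ['The bank reported lower emissions.', 'Governance improved.']
```

Automating even these small steps removes part of the manual work, but locating and downloading the source documents remains the larger constraint.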

Transparency about the limitations of esgNLP is essential to ensure that the tool provides an effective complement to the analysis of human experts. It is important to be able to explain how such models make decisions so that users can understand model output. Future iterations of esgNLP will include explainability features to increase model transparency and trustworthiness. Transparency is also essential with regard to training data and data bias. Future iterations of esgNLP will integrate refinements to manage data bias, such as ensuring diversity of data sources, tracking geographic and sector data coverage, and ensuring high-quality training data. Another common challenge with ML models is model drift. Model drift can be managed by periodically retraining and redeploying the model to align with inference data.
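One way to decide when retraining is due is to monitor how far the distribution of model outputs at inference time has moved from a reference distribution. The sketch below uses the population stability index (PSI) for that purpose. The paper does not say how drift would be detected for esgNLP, so the choice of PSI, the bin edges, the sample scores, and the 0.25 threshold are all assumptions made for illustration.

```python
import math

def bin_shares(scores, edges):
    """Share of scores in each bin; the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for s in scores:
        for b in range(len(counts)):
            is_last = b == len(counts) - 1
            if edges[b] <= s < edges[b + 1] or (is_last and s == edges[-1]):
                counts[b] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def psi(reference, current, edges, eps=1e-6):
    """Population stability index between two samples of scores."""
    ref = bin_shares(reference, edges)
    cur = bin_shares(current, edges)
    # eps avoids log(0) when a bin is empty in one of the samples.
    return sum((c - r) * math.log((c + eps) / (r + eps))
               for r, c in zip(ref, cur))

def needs_retraining(reference, current, edges, threshold=0.25):
    """Flag drift when PSI exceeds an assumed threshold."""
    return psi(reference, current, edges) > threshold

# Hypothetical sentiment scores in [-1, 1].
edges = [-1.0, -0.5, 0.0, 0.5, 1.0]
training_scores = [-0.8, -0.2, 0.1, 0.3, 0.5, 0.7]
inference_scores = [-0.9, -0.7, -0.6, -0.3, 0.2, 0.6]

print(round(psi(training_scores, inference_scores, edges), 2))
print(needs_retraining(training_scores, inference_scores, edges))
```

The same check could be applied to input features, such as the mix of document types or languages, rather than to output scores.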

esgNLP is in beta testing and has not been validated by IFC through operations. The arguments made in this paper for efficiency gains and for the use of sentiment scores as proxies of ESG risk require additional testing. Future refinements will include adjustments to the analysis to account for FI size.

The paper ends by (a) recommending support for open-access ESG and impact data and analytical tools for emerging markets, (b) reinforcing the need to extend analytics to the development of additional AI and NLP solutions in the short term, and (c) proposing cross-industry collaborations between big finance and big tech to address ESG integration.



Global Head of ESG Research, Engagement and Voting
Head of ESG Development & Advocacy, Special Operations
Business Solutions and Innovation
Principal Environmental Specialist International Finance Corporation
Principal Environmental Specialist International Finance Corporation
Associate Environment, Social and Governance Officer International Finance Corporation