Using Data Science to Create a More Perfect Union: The “Data Challenge for America”

This article was co-authored by Tianhui Michael Li, CEO of The Data Incubator.

To build a more just marketplace, we need to integrate data science into impact decision-making. The upshot will be both better financial returns and better social outcomes.

We believe in the power of information. We also believe in markets and capitalism as a force for good. The two are inextricably linked, because markets don’t work properly without open access to reliable data and information and the insights and perspectives they drive. Within the impact investing and philanthropic worlds, this is doubly so.

The soft underbelly of the impact investing movement – which for the purposes of this discussion also includes mission-related, sustainable, socially responsible, and ESG investing – is the measurement, optimization, modeling, and demonstration of the actual outcomes the underlying impact strategy is trying to achieve. The world of philanthropy has suffered from a similar affliction. Hundreds of billions of dollars flow every day into companies, projects, products, and investment vehicles dedicated to making the world a better place, yet how are we measuring the social, environmental, and economic benefits that we want to create? And what delivers the biggest impact or philanthropic return on capital? The lack of reliable, meaningful, data-driven insights into performance is materially hampering progress and making it difficult to build the models we need to refine cost-benefit analyses and inform decisions about capital allocation. And by making it harder to account for impact success, it is also constraining the flow of additional resources into the sector.

We shouldn’t be too despondent. After all, traditional financial accounting has had over 500 years to evolve since Franciscan friar Luca Pacioli first codified double-entry bookkeeping in 15th-century Venice. And even now, financial performance measurement can still be as much art as it is science. Nonetheless, there is little doubt that measuring and recording impact and philanthropic outcomes with the same discipline that we use to assess financial performance is a prerequisite to driving a more just form of capitalism at scale.

One of the most widely held views in this whole field is that there is a surfeit of data within the impact world; that we are, in fact, drowning in data, and that what’s really needed are universal standards and metrics so that everyone can get on the same page and compare ‘apples with apples.’ There is some truth to this. In our experience, however, there are still two more fundamental challenges facing the impact and philanthropic space: access to data and knowledge of how to process it.

The Challenge of Data

Currently, data availability within impact resembles the proverbial dog’s breakfast. It’s a mishmash of everything from anecdotal, unrepresentative data on idiosyncratic issues, experiences, and situations to large-scale government databases on highly specific themes.

On environmental issues, government agencies, corporations, ESG data vendors, and rating companies have amassed vast quantities of comparable, if not entirely meaningful, performance data on all sorts of issues ranging from greenhouse gas emissions to water consumption. Likewise, thanks to SEC disclosure requirements and the work of organizations such as Institutional Shareholder Services, BoardEx, and others, we have abundant data and analysis on a wide range of traditional corporate governance metrics, such as board share of ownership, the percentage of independent directors, and the diversity of the board. Yet even in ESG, we are still not sure whether the things being measured are actually optimizing the outcomes we seek when we put capital to work, i.e., a cleaner, healthier environment and better-governed companies.

However, when it comes to tracking social, economic, health, and other elements of the human condition, the story is reversed. Standards and metrics abound, yet reliable, consistent, meaningful performance data is scarce. And when it does exist, it is incomplete, inconsistently gathered, or difficult to access. Typically, the information available relates to the existence of policies that companies might have on, for example, promoting gender pay equity or supporting the health of their workers. Actual performance data is rare, and analysis and insights on outcomes based on this data are even rarer.

Technology and the demand for greater transparency are helping. The pool of customer sentiment and product quality data available via social media is vast and growing in utility. Employee pay data provided by crowdsourced websites such as Glassdoor (a JUST Capital partner) is also increasing. Information on community health; county-level economic and income conditions; local environmental conditions and pollution vectors; job quality and labor conditions; and myriad other aspects of socioeconomic conditions around the country is becoming more widely available. All of this can be used as the raw material for impact-oriented data science exploration.

Data Science

Collecting the data is just the first half of the challenge. The second half revolves around taking the raw data and converting it into interpretable and actionable information: that is, doing the hard work of data science.

Unfortunately, data scientists are in high demand – Silicon Valley and Wall Street firms cannot hire them fast enough and are bidding top dollar for the best talent. Humble (and resource-constrained) nonprofits are having a hard time recruiting, let alone retaining, the skills and expertise required to make them more effective.

To tackle this challenge, JUST Capital and The Data Incubator are announcing the Data Challenge for America. We’ve proposed four core problems investigating the link between JUST Capital’s metrics, derived from surveys of over 50,000 people around the country on the issues they prioritize, and the associated performance data of the country’s largest corporations.

In essence, the challenge projects look at how data science can help create a more just economy that better serves the broader best interests of the country. How can we incentivize companies to help produce better outcomes on things like income inequality or building healthier communities? What actions should investors and companies take to optimize both social outcomes and financial performance? Which metrics are most powerful in predicting future performance? We’re leveraging The Data Incubator’s free data science fellowship to crowdsource the answers. Each quarter, the fellowship receives thousands of PhD and master’s applicants, who will be asked to propose a capstone project based on these and other projects to highlight their skills to employers. The applicants behind the best projects will be selected to participate in the free fellowship, to present their results to the JUST board, and to see their results incorporated into JUST’s high-profile work.

This is not like a traditional data science or machine learning competition. Most data science competitions collapse performance into a single reductive metric, often missing important factors and leading to impractical, overfit, and overly complex solutions. We believe that the right approach is holistic: judging analyses not just on predictive power, but also on impact, parsimony, and interpretability – in other words, injecting the much-needed human dimension into data science for widespread social betterment.

Data and data science are both desperately needed to make impact investing more data-driven. By collecting the data and making it more readily accessible, we are providing the raw ingredients. By crowdsourcing the data science, we are turning those raw ingredients into useful analyses. We’re hoping this contest not only delivers meaningful and impactful projects, but also inspires an entire generation to apply data science to social causes.
