Champions
of Data + AI

Data leaders powering data-driven innovation

EPISODE 12

Scaling Data Marketplaces

Data is huge. But no matter how much data you have, it’s dwarfed by the data collected by your partners and third parties. That’s why data sharing is the next big thing. As organizations increase the use of third-party data to complement and supplement their existing data sets, the ability to securely share reliable, relevant and large quantities of data creates a whole new world of possibilities. Warren Breakstone joins us to share his thoughts on scaling data marketplaces.

Warren Breakstone
Managing Director and Chief Product Officer, Data Management Solutions, S&P Global
Warren Breakstone is Managing Director and Chief Product Officer, Data Management Solutions for S&P Global Market Intelligence. In this role, he is responsible for the profitable growth, product innovation and content delivered through Xpressfeed™ and other digital distribution platforms. Separately, Warren oversees the CUSIP Global Services and Trucost business units. He also leads the S&P Global Marketplace initiative, which launched in early 2020 as an integrated platform for data discovery and exploration. Warren is a member of the division’s executive team.

With over 25 years in the banking and financial information services industry, Warren has held senior leadership positions across a variety of global businesses, including Thomson Reuters, Primark and the Chase Manhattan Bank. He joined S&P Global in May 2015.

Warren earned a bachelor’s degree from Clark University and Master of Business Administration from the George Washington University. He is a member of the Board of Directors for the New York American Heart Association and formerly served on the Scarsdale Library Board of Trustees.

Alexandra Mysak:
Welcome to another episode of Champions of Data and AI. I’m Alex Mysak, your host for today’s session. Data products are a thing, but what about data sharing as a product? As organizations begin to source more and more third-party data to supplement their own existing datasets, the ability to share secure, reliable, relevant, and large quantities of data creates a whole new world of possibilities. In this episode, Warren Breakstone, managing director and chief product officer of S&P Global Market Intelligence and their data management solutions business, will share with us how S&P Global Market Intelligence is transforming the data sharing world. We’ll hear from Warren on areas of extreme growth, the types of data in demand, and best practices for sourcing data. We’ll also see how well Warren does at some S&P trivia. Warren, thank you for joining us today. Delighted to welcome you here to Champions of Data and AI.

Warren Breakstone:
Thanks Alex. Great to be here.

Alexandra Mysak:
I’d love to introduce to our viewers, Warren, a little bit more of yourself, a fellow champion of data and AI, and I’d love to share a little bit of your journey to become chief product officer for the market intelligence data management solutions group. So if you could possibly share with viewers a little bit about your journey in getting to this position and share your experience?

Warren Breakstone:
Sure. Happy to. Thank you for the question. Maybe I’ll take a half step back first and tell you why we have a data management solutions business in the first place. What we’ve been seeing from our clients, is that they’re increasingly looking to bring data directly into their environment, into their databases, into their data warehouses, now into their data lakes, and to layer on analytics and various tools and data science approaches on top of that data in their own environment. And this has been a theme and a trend that continues to accelerate and expand. It’s relevant beyond our core investment management segment today and crosses investment banking, insurance firms, corporations, each looking to be able to take advantage of the various data and technologies and new data science approaches to try to get more out of the data, ultimately make better decisions.

Warren Breakstone:
If you’re an investment manager, that decision’s an investment decision for your clients. If you’re an insurance client of ours, very often you’re looking to build a better or more effective underwriting model. If you’re a corporation, maybe the decisions you’re making are about entering new adjacencies or identifying opportunities, or servicing your own clients, or pricing your various products and solutions. So this is a trend that is gaining in prominence and relevance across all the segments in which we operate. So we saw this, and we decided the best way for us to really drive value was to take a far more integrated approach to this opportunity to servicing our clients. And that had us build some teams that focused on productizing our various content assets to structure it, to link it, to enhance it, to create new data, and to be able to deliver that through modern delivery channels, APIs, our Xpressfeed bulk delivery, which is a leading data feed solution in the industry, as well as now most recently, our cloud distribution.

Warren Breakstone:
And not just providing the data, but enhancing the data, and now layering on machines and AI and tools, analytics on top of it. So, really exciting opportunity. As it relates to me personally, I’ve been in the industry for about 20, 25 years. I started over at what is now JPMorgan Chase, and moved on to Thomson Corporation, which later became Thomson Reuters, and joined S&P Global in 2015, and it’s been a really great journey. I’ve had a lot of different roles and responsibilities looking after various information services businesses and organizations, as well as technology, and a lot of time spent with clients and in the market, which I’ve really, really enjoyed. And so this data management solutions opportunity is really quite exciting as it brings together the data, the technology, and then the application of that in various client workflows across all the different segments in which we operate. So I learn something new every day. It’s a great job.

Alexandra Mysak:
Indeed. And one of the things I heard you say there is that there’s an acceleration. So when you think about data management solutions at S&P, it sounds like there are a number of factors at play here that are causing this acceleration, and from your experience, what would you say those are, i.e. why now? Why is this acceleration-

Warren Breakstone:
Why now. Very, very interesting question. I think it’s the convergence of a few trends, which is sort of creating this perfect storm of an opportunity. The first is around technology. It’s just far more accessible than it’s ever been. The ability to get the equivalent of a terabyte of storage capacity and be able to pay roughly the cost of a Starbucks coffee, one of those with all the whipped cream and the like, $5, $10 a month renting it from any one of the major cloud providers. The ability to get and harness processing power that comes from the cloud. Not only has the price dramatically dropped over time, but it’s become far more powerful and does so every few months. So on one hand, the accessibility of technology has really created this series of great new products that have come out, analytical tools, and machine-driven tools, and the application of artificial intelligence and natural language processing and all sorts of other things that have now been created because the technology allowed for it.

Warren Breakstone:
So that, I think, would be the first, the technology. The second thing would be this explosion of data. I’m sure you’ve seen this at Databricks as well. It’s just been an absolute explosion of data. Now, our clients are very much looking at harnessing their own data within their organization, very often breaking down silos that may exist between their various divisions. They have vendor data, like data that you would get from S&P Global. And so you’ve got the technology, you’ve got the data, and now you’ve got major use cases in a competitive environment where you’re just so eager to find an edge in any of the segments in which we operate. How do you become far more differentiated, how do you drive more value for your clients, and so it’s this perfect storm of opportunity.

Alexandra Mysak:
It’s really interesting you talk about the accessibility to data and the burgeoning data available to people, because Dan Jeavons, who’s the VP of computational science, was part of this series last week, and one of the things he mentioned was a learning lesson around more is not in fact always more. And so they had given more data to some of their traders, and found that this didn’t necessarily improve the results. As financial services, and then broader corporate America is now hungry for data to make better decisions around their business, or to improve their offerings and competitiveness, what do you think is the best methodology for an organization to determine the right datasets?

Warren Breakstone:
Right. Well, first I’d say that I agree, more data is not necessarily better. In fact, it can be far more of a burden and far more costly. We did a study recently with 451 Research, which is an independent research arm that’s owned by S&P Global, and we interviewed a few thousand clients and market practitioners, and what we found was that those people who work with data, whether they be a data scientist or a data engineer, or have some other sort of data practitioner type of role, can often spend half to 60% of their time just preparing data, scouting it, cleaning it, linking it, structuring it, databasing it, doing all the things that are necessary, but really not what they’re paid for. It’s all this work that has to happen before their value add of doing the analysis and making decisions.

Warren Breakstone:
And this is extremely costly. And with more and more data out there, it has this exponential effect. So I agree with Dan’s premise 100%. So what do we do about it? So I think I would maybe highlight three things. The first, is that we try to work with our clients in a diagnostic way to help craft the question. What is the question that you’re looking to answer? What is the problem that you’re looking to solve for? And then we work backwards from there, and we try to identify the right sets of data for our clients for their particular question, for their particular use case.

Warren Breakstone:
And then of course we’ve got to consider how much history, how the data is structured, ensure it can be delivered through integrated distribution to become part of their workflow. But it starts with a diagnostic approach, it starts with the question. Second, we’ve now complemented that with a platform which is called the S&P Global Marketplace. Marketplace is a platform for our clients and our prospects, frankly, to be able to evaluate and explore the vast sets of data that are available from the four divisions of S&P Global, plus select curated alternative data that we’ve also added to the platform, plus solutions, including solutions that we’ve built with Kensho. Kensho is a business we purchased in 2018, and it’s a leader in machine learning and applied AI in the financial industry. We’ve been working with Kensho for a number of years, employing their capabilities and solutions internally at S&P Global to help us with some of our big data challenges, linking and cleaning and tagging various datasets. And more recently, we’ve productized many of those solutions for our clients’ use, and we also have that available on the marketplace.

Warren Breakstone:
And so what the marketplace also does, is have some tooling associated with it that helps clients identify the right content for their specific use case. So a series of questions are answered, or behaviors are exhibited, and the system generates suggestions of content that may be most relevant to you. So, that’s the second piece. The third piece is actually in a partnership that we have with Databricks that we’re very proud about, where we’ve extended the S&P Global Marketplace to include what’s called the Workbench. And what the Workbench does, is it provides our clients with the ability to employ modern technologies like R and Python and Scala and SQL in a secure cloud-based environment on top of our data. So what we have is this notebook-driven platform, secure, where a client is able to apply modern tools and technologies on top of a query library that we’ve built, and apply compute power to be able to do some big data testing, and they can even bring in their own data into the environment as well.

Warren Breakstone:
And that’s been just great, because what that’s done is help deal with this time issue, how much time is being spent on trying to find the right data for your use case, going back to Dan’s original premise and example that more data is not better. So what we try to do is get our clients through that process as quickly as we can, and the help of Databricks in this modern platform for trials has really done the trick.

Alexandra Mysak:
And Warren, I did notice that Doug Peterson made specific mention of the Workbench within your overall firm’s recent earnings calls, so congratulations to the team on that. Obviously it’s widely acknowledged, even at the topmost level, that a tool like Workbench is opening up a new revenue stream, another way that customers can access S&P data more easily.

Warren Breakstone:
Yes, we’re very happy about that. In fact, he also mentioned the URL of the product, which is marketplace.spglobal.com. And right after the call, we had a flurry of new volume and activity on the site, so we were really happy that our CEO took the time to mention it.

Alexandra Mysak:
That was super exciting for you all. It doesn’t happen every day. I would love to dig in. You did mention alternate data, and that is a huge topic for many of the viewers here. Could you define in your words what constitutes alternate data today, and just talk a little bit about the industry trend that you’re seeing, in particular for S&P Global Market Intelligence data?

Warren Breakstone:
Right. Okay. Alternative data. It’s interesting, there isn’t, I don’t think, an agreed upon definition of what is alternative data, but I’ll give you mine. I think alternative data is data that was originally designed for one use case and is now being used for a second use case that it was really never intended for. And it’s that second use case that sort of makes it alternative. And let me give you an example. We recently added to the marketplace a dataset from BitSight, and what BitSight does and does very well, is they score companies based on their cyber posture, so how secure are they, and that data was originally used for corporations to really understand their own cybersecurity. Well, what makes that data alternative is the new use cases. So what we’ve done, is we’ve now made that data available to different types of clients, packaged with other data, to help them either if you’re an investment manager, understand the cyber risk of your full portfolio, or maybe if you’re a corporation, to be able to assess your own supply chain.

Warren Breakstone:
And so that’s an example of data that has one use case, that has now been applied to a secondary use case, making it “alternative.” I will tell you… And this I think was the second part of your question, but I’m not a big fan of the term alternative data. I think the term alternative data will probably go away. What we’re seeing right now amongst our clients as they start to use this alternative data and rely on this alternative data, is that their expectations for their alternative data use and their expectations of the traditional data have sort of converged. Because what they found is, if they’re relying on this data, and they’re building models off of it, and they’re incorporating it into their workflow, and ultimately making decisions off of it, they have to have confidence, the same confidence they have with the traditional data that they utilize every day.

Warren Breakstone:
They have to have confidence that data will be complete without major gaps, that the data will have ample history, that the data will be there when we need it, and if it’s not, that they’ll have somebody to be able to call in the middle of the night. And so this convergence will probably lead to the term alternative just going away. Frankly, data is ultimately data.

Alexandra Mysak:
You mentioned there security and the reliability of data. I think the other aspect of data quality, and therefore of having a customer’s trust in that data, is the sourcing of data. Could you share a little bit with us around your process there and how you go through that evaluation process?

Warren Breakstone:
Right. Sourcing data, particularly alternative data, I think probably we start with a philosophy, and the philosophy is that decisions, or at least the best decisions are made when data converges, when it’s linked together, when you can work across various datasets, and then be able to apply your own expertise and experience and tools and analytics, but that convergence is really important. So when we are out scouting for new data, we don’t think about it necessarily in a discrete silo. We find that even the most alternative and exciting and interesting datasets in the world, no one makes a decision, or few people make a decision off of one set of data, it’s that combination of data. So what we try to do, is we try to find opportunities where when we combine an alternative dataset with a traditional dataset that our clients may already be using, that combination should be able to drive something incremental, some value add for our clients.

Warren Breakstone:
And so that is a very important filter for our scouting efforts. And now we’re very fortunate we have a scouting team made up of former practitioners, clients of ours, data scientists, CFAs, even a PhD, who are out, they spend part of their time out looking for interesting data. Once we find it, they bring it into the environment, they study it like a client would, they back-test it, they look if there’s signal in the data, they evaluate the vendors, the sources of the data to make sure that there’s enough robust processes and discipline so that we can count on the data and be able to make it available to our clients, we look for opportunities, can we enhance the data, what other data can we use with it, how do we deliver it in a consistent, integrated way. So this team plays a very important role, and that’s where we come up with the term of carefully-curated alternative data.

Warren Breakstone:
We really try to do the work that normally our clients would have to do, we’re taking it on ourselves. We think that that then drives a better solution, a better outcome for our clients. Going back to the theme around time, let’s figure out ways to get our clients through those evaluation processes and preparatory processes far quicker. Now, the second thing I’ll mention on this has to do with our own internal data. We have what’s called our quality program at S&P Global Market Intelligence, and this quality program has us guaranteeing the quality and the accuracy of our data.

Warren Breakstone:
So if a client reports that they have seen an error in our data, it happens every once in a while when you deal with all the data that we do, and they see an error or something that’s incomplete and they report it, we’ll then investigate it, go all the way to the source, make sure if it’s an issue, we will correct it, we will look for other issues that may be similar, and correct those too, and then we remunerate the client with a $50 check or a donation of the same amount to a charity of choice.

Warren Breakstone:
And that’s important for us to be able to put our money where our mouth is, that that quality of data, accuracy of data, comprehensiveness of data, is that important to us. So this quality program helps us also internally, as it keeps everybody focused on the goal of quality and quality data in all of our processes. We measure it, folks get compensated. I have a component of my compensation based off of the quality of our content as well, and it sort of becomes an esprit de corps. And the fact that we also bring our clients into it and pay them when they tell us we have an issue also helps align our interests, and I think good data is good for everyone.

Alexandra Mysak:
Yeah. I think maybe inspired by you, we should maybe consider renaming this series to Guardians of Data and AI, Guardians of the Data. I really like that. The last question I will ask you, and I think we’re almost out of time here as guardians. The guardianship of data, particularly in the ESG space, as I mentioned for Databricks, has become a huge exploding area where customers are looking to us for guidance, and for us on the architecture side, for you on the datasets that plug into that, what are you seeing in this space, Warren?

Warren Breakstone:
Huge relevance, only increasing. We have something like 25 different products related to ESG today. It’s become far more relevant in our clients’ workflow, far more relevant in our client conversations, and that’s really a good thing. The solutions we have in the space vary from ESG scores on thousands of companies that can help investors understand and compare different stocks to others in their industry or sector. We have analytics that help us understand the carbon and water footprint of over 15,000 companies. We have tools that help clients understand the physical risk of their assets, whether it be a data center, or a factory, or property, or their supply chain to various climate-related risks. We have transition products that help companies understand, as different municipalities, different jurisdictions start to potentially introduce carbon taxes, what the impact will be on their earnings. We have green indices and various sustainability products, and a lot of great research and commentary.

Warren Breakstone:
It’s one of the two most popular datasets that we have today on our marketplace. I think that this will only increase over time. We’ve just stood up a business squarely focused on the opportunity called Sustainable1, and my prediction is that while today ESG is still in many pockets something separate that is looked at, I think over the next year, year and a half or so, it’ll become far more integrated in the client’s workflow, whether they’re doing portfolio analysis fully integrated with ESG, whether they’re doing fundamental analysis, or research or analysis, ESG is very much becoming part of all of that. So I think we’ll see far more integration of ESG into core workflow, linked to other datasets and sources that we have. And second, I think you’ll see standards continuing to emerge and evolve, and that’s also a great opportunity.

Alexandra Mysak:
Thank you for that. And I look forward to continuing the journey partnered with all of your teams in this space, because it continues to be an area where particularly corporate America is struggling to understand how the market views their sustainability and their progression along this journey.

Warren Breakstone:
Alex, I’ll give you a second content area that is really seemingly exploding, because you mentioned it earlier. We view about 80% of the data that’s out there as unstructured, and so we are really focusing our efforts on what we are terming textual data, so text-based data that we’ve made far more machine-readable, adding tagging, and adding linkages, and adding structure to this data so that our clients can bring it into their environment, run it through their machines, perform natural language processing on top of it. It’s a huge growth area, and an area that our clients are asking a lot of questions on. We’ve added a lot of textual data over the past two years, whether it be our earnings call transcripts, global filings, news, credit research, as well as a lot of third-party data related to patents, US legislation, court cases, and we’re making all of this text-based data available for machine ingestion. And I don’t know, have you ever read a regulatory filing, Alex?

Alexandra Mysak:
I actually have, because as you know, I came from financial services, so many times.

Warren Breakstone:
Right. And would you say it’s exciting to read a regulatory filing, or would you say it’s pretty darn boring?

Alexandra Mysak:
We pretend it’s very interesting, don’t we?

Warren Breakstone:
We do pretend it, but the reality is they’re really boring, but the good thing is, machines don’t get bored. So as the machines are being used in these various processes to uncover insights on individual content, but also more broadly across a full spectrum of content over many, many years, the machines are able to complement the human processes in place, find new insights, find new opportunities and really enrich the experience. So I think that that textual data, and separately, ESG, and the convergence of the two really will be an exciting opportunity for many years to come.

Alexandra Mysak:
Well, I can think of a really good platform that provides the architecture to support all of that, so thank you. To round things out, to finish things off, Warren, one piece of career advice that has either stayed with you through your career, or that you’d wish you’d known earlier in your career that you would like to share with the viewers today?

Warren Breakstone:
Okay. Advice, particularly for those starting out in this field, those first jobs, prioritize roles where you can learn. Don’t worry about the money. Look for opportunities where you can learn, learn as much as you can, the dollars will follow later in the career. The second comment is maybe a bit closer to home. My 15-year-old son this summer is playing baseball, playing basketball and taking an online course on Python. So there are many, many opportunities to begin to delve into this whole data science opportunity. And in fact, there are now more data science roles out there than there are data scientists. So if you’re looking for a career in data and the convergence of data with machines, the application of these data science approaches, I’d bet on that.

Alexandra Mysak:
It has been so great hosting you here, Warren. I think to round it out, so much of what you shared with us, accessibility to data, there are teams of people that can help with getting the right access to data, there’s infrastructure that can help with that access to data, and then it’s around leveraging the right kind of cutting edge datasets you mentioned, textual and structured ESG, you name it. They’ve all been tremendous insights. Thank you so much, Warren, for joining us today for the Champions of Data and AI.

Warren Breakstone:
I really enjoyed it, Alex. Thank you so much, and thank you Databricks for what has been a great partnership.

Alexandra Mysak:
Thanks for your continued partnership, Warren.