Gauging the effectiveness of big data in coronavirus fight


In April we brought you the story of the formation of the C3.ai Digital Transformation Institute, a coalition of companies and universities taking a big data approach to understanding and coming up with solutions to the COVID-19 problem. Now they have progress to report. With an update, C3.ai CEO Tom Siebel joined Federal Drive with Tom Temin.

Interview transcript:

Tom Temin: Tom, good to have you back.

Tom Siebel: Good morning, Tom.

Tom Temin: Now, I want to point out there was a story just running this week in the Wall Street Journal about another effort in Silicon Valley, a COVID-19 technology task force, that did not do so well. Whatever they were trying to do, it was kind of confusing. Not so with your effort.

Tom Siebel: Our efforts in COVID today have been hugely successful. We launched the C3.ai Digital Transformation Institute with MIT and Carnegie Mellon, Princeton, University of Illinois, University of Chicago, UC Berkeley and now Stanford. And this is funded to the tune of almost $400 million in cash. We now have 186 proposals to do research to apply AI to mitigate the spread of the pandemic, drug discovery, determining infection rates, transmission rates, what have you. And so we will be announcing, actually the week of June 8, I think, the first $6 million in grants to do advanced AI research in that area. As you’re aware, that first effort, the C3.ai Digital Transformation Institute, is jointly supported by us, by C3.ai, and Microsoft, and it is entirely a pro bono effort; all of the research from that goes into the public domain. We’ve got a lot of support from CDC, NIH, UNESCO, MITRE Corporation, so everybody’s leaning forward to assist with that, and that will be the world’s largest corpus of COVID data. The second effort is the publication of the data lake, the C3.ai data lake. This is a joint effort with Amazon Web Services, where we have aggregated now 30 of the most significant data sets related to COVID coming from the White House, coming from MITRE Corporation, coming from the World Health Organization, New York Times, the University of Washington, Johns Hopkins University, what have you. And we’ve aggregated that into the world’s largest unified federated image of hard data related to COVID.
This is everything from comorbidity, course of disease, hospitalization, location of emergency supplies, and we’ve made this data lake available to the global research community for free so that these researchers can advise our policymakers, so that people can make well informed policy decisions in France, in Italy and in the United States. We can argue, but whether we look at California or whether we look at Washington, DC, I think we can all agree that, you know, many of these decisions that have been made are not based on science, because they’re just not well informed.

Tom Temin: Sure. And a lot of politicians talk about basing things on science, but they don’t really mean it, I think, half the time. So on the one hand, you’ve got this data lake. On the other hand, you’ve got 186 proposals. And so once you choose the initial proposals, they will go to work using the data lake. Is that the architecture of this?

Tom Siebel: Of the 186 proposals, I think 30 will be funded. And the quality of the research proposals is superlative. And some of these people are, you know, Nobel laureate level researchers, and they will be using the data lake and other data sets that they have to perform this research. In addition, we’ve made the data set available to the world for free. So people all over the world are using it, okay, in China, in Europe, in Washington, DC, at CDC, NIH and others, and it’s a resource that thousands and thousands of people are accessing to perform, you know, meaningful research to really understand, you know, what is the infection rate? What is the morbidity rate? What is the course of disease? Does this invariably continue? What is the effect of social distancing protocols, etc.? So soon we should have, you know, hard data to inform policymakers and inform us as citizens so that we can make better informed decisions, be it about California or Washington, DC, or how we’re going to dispose ourselves with our family.

Tom Temin: So all 186 proposals could do research if they wanted, they would just have to do it on their dime, but the 30 that you will fund…

Tom Siebel: We will fund 30 of these; that is, the Digital Transformation Institute will fund 30. I expect that many of these others will get funded by other organizations like CDC, like NIH, like Milliken, like the Allen Institute and others. So I’ll be surprised if other organizations, like the World Health Organization, do not get involved. We have a budget; I think the fund for this first phase is about $9 million in cash grants, and then a very large amount of computing and software resources to support this research. We will publish these and I will be very surprised if other organizations don’t get involved and fund perhaps another 20 or 30 of these projects.

Tom Temin: Now, the data is always changing and there’s new datasets coming all the time. Is the data lake being refreshed as things come up?

Tom Siebel: Absolutely. I believe we have 30 data sources that are aggregated into the data set today in a unified federated image. As each of those data sets grows, the data set is updated in real time. So this has to do with whether we’re dealing with infection rate, comorbidity, a new infection in Washington, DC, a new infection in San Mateo. So as those data get tested, this is very dynamic and it grows in real time. In addition, we’re now looking at adding another 58 data sources from around the world. So these are new and different data sets, say from India, from South Korea, from China, for example. In addition, the data set is not static; it is dynamic. And as the underlying data set increases with research, with papers, with infection rates, that is reflected in real time in what’s provided. In addition, the researcher is able to take a snapshot of the data as it existed at any point in time in the past, and, like, play it forward and play it back. So it really has a lot of utility for data scientists to reach conclusions with high levels of certainty.

Tom Temin: And there is a spatial quality to the data. It sounds like because it is coming from specific areas. How are you dealing with the whole privacy issue, because that’s kind of hung up some of the other big data efforts when there’s infection data about people that go into these datasets?

Tom Siebel: At C3.ai we’re involved in critical infrastructure worldwide. So the technology that we use for the C3.ai data lake is the same technology that we use for the United States Air Force, United States Army, Defense Intelligence Agency, and for companies involved in precision medicine. So we are subject to the highest standards of security as it relates to critical infrastructure, cybersecurity testing, GDPR, HIPAA, what have you, so there is no personally identifiable information available, and all of that is encrypted.

Tom Temin: Now you have a company to run, a commercial outfit. Have you had any time for that, or has this been most of your focus?

Tom Siebel: Well, the good news here is we are about 500 people around the world. And we are involved in some really important projects. We support critical infrastructure, for example, utility operators worldwide, United States Air Force, United States Army, oil and gas companies, people manufacturing ventilators, people manufacturing CPAP machines, organizations that are manufacturing medical monitors. So we are a critical infrastructure provider and we need to provide, you know, four nines of availability and reliability globally. And if our system goes down at any of these installations, people are going to die. Now, we have roughly 500 people around the world who are supporting these customers. They’re doing a superlative job of it, they know their jobs, and this has freed up me and about 10 other people to focus on this COVID initiative to see if we can make a positive contribution to this dialogue and advance the science and increase our knowledge so that these policymakers aren’t just guessing. I mean, right now, they’re just guessing because they don’t have any data to support their decisions. I mean, there are no data to suggest that allowing people on the beaches in Orange County is not a healthy activity, no data at all. And we’ve seen in Washington, DC, one day it’s not healthy to wear a mask, and the next day it’s mandatory to wear a mask. So they’re just guessing. They’re doing the best they can, they’re well meaning, I understand that. But I think if we provide them with more accurate, more timely information, they can make more well informed decisions about what’s safe, what’s not safe, and how to reopen the economy safely.

Tom Temin: Tom Siebel is CEO of C3.ai and founder of the Digital Transformation Institute. As always, thanks so much.

Tom Siebel: Thank you, Tom. Nice to talk with you.
