Want to know what’s going on at the National Transportation Safety Board? Until recently that wasn’t easy. Other federal visitors or citizens used to have to know which of three databases to query. No more. Pursuant to the Federal Data Strategy and the need for more open data, the NTSB has a new, consolidated query system named CAROL. For details, the chief data scientist in the NTSB Office of Research and Engineering, Dr. Loren Groff, joined Federal Drive with Tom Temin.
Tom Temin: So tell us about CAROL. First of all what does it stand for and what is it?
Dr. Loren Groff: CAROL stands for case analysis and reporting online query tool. It’s a comprehensive search tool that brings together previously separate systems to search the aviation and surface mode investigation data that we get from our investigations and the safety recommendation database, and then something we refer to as our accident docket system that has all of the source material that’s collected during the investigations.
Tom Temin: It sounds like there’s a wide variety of data types that might be in there. When you say, for example, surface or air investigation data, is that notes by the investigators? Is it black box recordings and what’s in there?
Dr. Loren Groff: There is quite a bit of variety in the information that’s available. It includes basic factual information that comes from the circumstances of the event, the time, date, location, the description of the vehicles and qualifications of the people involved in the event and any details of the outcome. But then within the docket system, there’s interviews, photographs, in aviation you mentioned what’s referred to as a black box, the cockpit voice recorder. There’s transcripts of those recorders and maybe any data that was collected off of the flight data recorder or other electronic systems on board. And the safety recommendations, that’s actually usually text information and it may include even correspondence back and forth between the NTSB and the recipient of those recommendations.
Tom Temin: And who are the primary users? I mean, can the public get to this particular information?
Dr. Loren Groff: Yes. Historically we’ve made all of our information publicly available. But in some cases, it might not have always been so easy to get to it. But we know we have a wide range of public users that may include anyone in industry or operators that are interested in tracking the safety of the area they operate in. But we know we have a lot of people in the public that are just interested maybe for personal interest, or maybe they want to track the safety of the type of aircraft they fly or something like that. So we have a wide range of interactions with the public. We know they use this data and we’re aware of people that have generated apps where they can take the data and use it in other ways.
Tom Temin: And is there multimedia in here? Is it all visual data like transcripts or are there recordings and videos also?
Dr. Loren Groff: Within the docket system we do have, in some cases, we have video and other formats of data that are available, yes.
Tom Temin: And in creating a new front end for people, CAROL, what did it require on the back end to integrate all of these sources?
Dr. Loren Groff: So it was quite a bit of work actually. In addition to creating the new tool, the system itself expanded quite a bit. The NTSB has a long history of collecting data and maintaining a database in aviation. But one of the things that is new that CAROL is related to is that it’s been expanded to include our investigations in all modes. So we not only had to create this new tool, but we had to incorporate all of the other modes of investigation that we do, which includes pipelines, railroads, marine, highway. And so incorporating all of those, we sort of had to rebuild our system to create one that could handle all those modes. So it was quite a comprehensive project.
Tom Temin: And what was the impetus for bringing in all those new modes?
Dr. Loren Groff: Well, I mentioned that we have a long history in aviation, the NTSB and even its predecessor agency, we were early adopters. And we actually had a data system that went back to 1962. And it was punch cards and mainframes at that time. And we could readily answer lots of questions about our investigations. But we would get questions about, well, what about the other modes? And it was a lot more difficult to answer those questions. And I think ultimately, we had a longstanding interest in expanding that, but just to get that done is a lot of work. But ultimately, we got a question I think that came from the Hill. And it was a question about, I believe the exact question was how many times have we identified human fatigue issues across all modes of transportation? The answer was we can tell you very quickly in aviation, but it’s going to take us a little bit longer to get that for the other modes. And they said, well, why is that, and we explained our data system. And then the next question is, well, what can we do to improve that? And so we ultimately got, I think what we consider a friendly mandate in our 2018 reauthorization to expand the system to include all modes.
Tom Temin: And they gave you a little money to do that too?
Dr. Loren Groff: Yes, thankfully, and very appreciative of the fact that it came with some money to support that mandate.
Tom Temin: Yeah, that certainly makes for a friendly mandate. And let me ask you this, the investigators themselves often may want to have questions and look at earlier data. Is the way that it’s pooled and formatted, can that create the ability to do a broader range of data analysis, for example, so that they can maybe identify trends that were not apparent or that they couldn’t have before all of this was put together?
Dr. Loren Groff: Yes, one of the other goals of the project, in support of the federal data strategy to expand the use of data both making it available to the public and the way we use it internally, improving the data maturity of the agency, we had a goal to sort of, I’ll say, democratize access to the data. Where in the past it may have fallen on me or one of my analyst colleagues to do a query and provide a result, we wanted to expand the tools to allow investigators to do some of those queries and searches themselves and track issues. So they have an internal version of this public CAROL query that has a little more access to ongoing investigations and things. And they’re able to identify and track safety issues that may be of interest. And it sets the stage to also maybe prompt them about similar investigations that may have been done in the past by the agency.
Tom Temin: And there is a federal data strategy and the Open Data Act and so on, in fact, having chief data scientists such as yourself is part of the mandate — does this system pursue that issue also having compliance and so on with the federal data strategy?
Dr. Loren Groff: Yes. And the timing couldn’t have been better for all of these things to be coming together, because exactly as you mentioned, within the federal data strategy, there’s milestones and targets for, as I said, sharing data and using data internally. But there’s also even the format in which data are made available. The system had been very, very similar for many, many years and hadn’t changed, and one of the requirements is to make data available in nonproprietary formats, meaning you don’t need a particular manufacturer’s software to open up the data file, so we are making the data files available in nonproprietary formats now. So this is good timing for all these things to come together.
Tom Temin: And getting back to the front end, called CAROL, agencies often back into an acronym by starting with the acronym first. And that’s the case here, isn’t it?
Dr. Loren Groff: Yes, in fact, it is. You’re right. Carol Floyd was a long-serving analyst, more than 40 years, within the same office that I work in, the Office of Research and Engineering, and anyone who has requested any aviation data from the NTSB for the last several decades has probably talked to Carol at some point. Just a fabulous person. And she recently retired and it was sort of an homage to her and how fondly we all think of her.
Tom Temin: Dr. Loren Groff is chief data scientist in the Office of Research and Engineering at the National Transportation Safety Board. Thanks so much for joining me.