Friday, March 30, 2007

The Missing Link in Clinical Data Standards

The term “clinical data standards” means different things to different people. In the world of clinical trials, it has traditionally meant having case report forms and a database structure that are reusable from study to study. At the DIA CDM Annual Meeting last week in Orlando, Dr. Steve Wilson observed that, from the FDA’s perspective, the content, format and uses for data in regulatory submissions have evolved over the years. Initially, the push was to make the transition from paper to electronic submissions. With that, the need for structural standards became apparent, and hence the development of the CDISC standards. We now recognize that the format of the data needs to be standardized, and CDISC is developing standard data collection modules (CDASH) and controlled terminology (e.g., code lists) to address this need. The FDA hopes to reap the benefit of this work in Janus, a data repository that will eventually house all submission data. This will allow them to monitor drug safety much more proactively.

These are all excellent developments, and bring us closer to an environment where standard data viewing and analysis tools are feasible and where databases and programs can be reused. I would argue, however, that there is one more step we need to take in order to be truly standardized. We are doing much to define the structure of data, but have very little in place to define its content. By this I don’t mean the terminology used to categorize data, but rather the processes and assumptions inherent in generating and collecting the data.

For example, suppose you want to analyze the emergence of adverse events (AEs) in a particular drug class. You access the merged safety database, assign each AE to a time interval based on its start date, and then count the AEs and compare the intervals. It seems straightforward, until you realize that some studies started collecting AEs when the informed consent (IC) was signed, some started at first dose, and some started collecting serious AEs when the IC was signed and all AEs at first dose. From the point of view of structure and terminology, the data are standard, but they are not suitable for this analysis. And while the collection starting point would be defined in the protocol, that information would rarely accompany the data.
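To make the problem concrete, here is a minimal Python sketch of that interval analysis with one extra step: a check that every pooled study used the same rule for when AE collection began. Everything in it is an assumption made for illustration. The study IDs, the `AE_COLLECTION_START` metadata, the 30-day interval width and the function names are hypothetical, and the records are made-up values, not data from any real study. It is not a CDISC or Janus feature; it only shows what could be done if the protocol's collection rule travelled with the data.

```python
from collections import Counter

# Hypothetical per-study metadata: the point at which AE collection began.
# In practice this lives in the protocol and rarely accompanies the data.
AE_COLLECTION_START = {
    "STUDY-A": "informed_consent",
    "STUDY-B": "first_dose",
}

# Hypothetical pooled AE records: (study id, AE start day relative to first dose).
POOLED_AES = [
    ("STUDY-A", -5),
    ("STUDY-A", 3),
    ("STUDY-A", 40),
    ("STUDY-B", 2),
    ("STUDY-B", 35),
]


def interval_for(day, width=30):
    """Label the interval containing `day`; days before first dose are 'pre-dose'."""
    if day < 0:
        return "pre-dose"
    start = (day // width) * width
    return f"day {start}-{start + width - 1}"


def count_by_interval(aes, collection_start):
    """Count AEs per interval, refusing to pool studies whose collection rules differ."""
    studies = {study for study, _ in aes}

    # A study with no recorded rule cannot be judged comparable.
    missing = sorted(s for s in studies if s not in collection_start)
    if missing:
        raise ValueError(f"No AE collection rule recorded for: {missing}")

    # Differing rules mean the early intervals are not measuring the same thing.
    rules = {collection_start[s] for s in studies}
    if len(rules) > 1:
        raise ValueError(f"Studies use different AE collection rules: {sorted(rules)}")

    return Counter(interval_for(day) for _, day in aes)
```

With the sample values above, calling `count_by_interval(POOLED_AES, AE_COLLECTION_START)` raises a `ValueError`, because one study began collecting at informed consent and the other at first dose. Restricting the records to a single study lets the count go ahead. Refusing outright is a deliberately blunt choice; a real analysis might instead truncate every study to the latest common starting point.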

This lack of definition about the processes and assumptions is, I believe, the greatest threat to data quality. In order for data to be truly comparable we must know not just its electronic characteristics but also the processes and assumptions made in generating it. Only then can we know when the data we have is fit to answer the questions we want to ask.