Basics of data validation

Hello all,

Within the scope of clinical data management, we have already discussed the basics of data collection; the next most important task is to validate the collected data.

The basic requirement for validating data accurately is to specify the validation checks correctly. The accuracy of the validation checks depends directly on how well the protocol is understood. Reading the entire protocol is imperative to get the basics right, but one also needs to comprehend its contents.

The following sections of the protocol determine which data validation checks are required:

  • Inclusion and exclusion criteria
  • Primary and secondary endpoints
  • Visit schedule
  • Dose administration/dispensing
  • Assessments
  • Statistical analysis sections

Once the critical information is obtained from these sections, the next task is to translate it into meaningful, unique data validation checks. The rule of thumb is to validate every data point collected on the CRF. Attribute checks are often built at the time of designing the CRF itself.
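As a rough illustration of an attribute check, the sketch below validates one CRF field against a declared required flag and numeric range. The `FieldSpec` structure, the field name `SYSBP` and the limits are hypothetical and are not taken from any particular EDC system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldSpec:
    """Hypothetical attribute definition for one numeric CRF field."""
    name: str
    required: bool = True
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def attribute_errors(spec: FieldSpec, raw: Optional[str]) -> List[str]:
    """Return attribute-level problems for a raw value as entered on the CRF."""
    if raw is None or raw.strip() == "":
        return [f"{spec.name} is missing"] if spec.required else []
    try:
        value = float(raw)
    except ValueError:
        return [f"{spec.name} is not numeric"]
    errors = []
    if spec.minimum is not None and value < spec.minimum:
        errors.append(f"{spec.name} is below {spec.minimum}")
    if spec.maximum is not None and value > spec.maximum:
        errors.append(f"{spec.name} is above {spec.maximum}")
    return errors


# Assumed limits, for illustration only.
sysbp = FieldSpec("SYSBP", required=True, minimum=60, maximum=250)
print(attribute_errors(sysbp, "300"))  # ['SYSBP is above 250']
```

Checks of this kind look at one field in isolation, which is why they can be defined alongside the CRF design; checks that relate several variables need the fuller specification described next.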

The specifications should address the 5 W’s: Where, What, When, Why and Who. The 5 W’s apply equally to programmed and manual checks.

The “Where” part of the specification should indicate at least the domain, the CRF module, the variables to be checked, and the variable on which the query should be placed.

The “What” part of the specification should indicate the logical relationship between the variables. This is where the logic of the check is defined. A best practice is to define the logic in plain English so that the data manager and the programmer interpret it in the same way, which avoids downstream impact. Another best practice is to specify the test cases at the same time as the check logic.
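One way to keep the plain-English logic, the programmed logic and the test cases together is sketched below. The check itself (visit date versus informed consent date) and all names are hypothetical examples, not checks prescribed by any protocol.

```python
from datetime import date

# Plain-English logic, as it might appear in the specification (hypothetical check).
LOGIC = "Visit date must be on or after the informed consent date."


def visit_before_consent(consent_date: date, visit_date: date) -> bool:
    """Return True when the check fires, i.e. the data is discrepant."""
    return visit_date < consent_date


# Test cases written together with the logic: (consent, visit, should_fire)
TEST_CASES = [
    (date(2024, 1, 10), date(2024, 1, 10), False),  # same day: clean
    (date(2024, 1, 10), date(2024, 1, 15), False),  # visit after consent: clean
    (date(2024, 1, 10), date(2024, 1, 9), True),    # visit before consent: fires
]

for consent, visit, should_fire in TEST_CASES:
    assert visit_before_consent(consent, visit) == should_fire
print(f"{len(TEST_CASES)} test cases passed")
```

Including the boundary case (same day) in the test cases is what removes the ambiguity between "after" and "on or after" when the programmer reads the specification.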

The “When” part of the specification, unless stated otherwise, implies that the check will trigger once the variables used in the logic are available/filled. If a check must trigger only after a specific event or interval, this must be specified explicitly. This section may also be used to specify how frequently the checks are run, especially in the case of manual checks.

The most important part of the specification is the “Why” part. This is where the query message must be specified. The query message is the way of communicating the ambiguity in the data to the clinical trial site, so it should be clear, crisp and unambiguous. The best practice is to indicate in the query text why the query has been triggered and what action is required, while keeping the query text non-leading.
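The “When” and “Why” parts can be sketched together as follows: the check is not evaluated until every variable in the logic is filled, and the query text states what was found and what is needed without suggesting which value is wrong. The adverse event dates and the wording of the query are assumptions made for illustration.

```python
from datetime import date
from typing import Optional

# Hypothetical query text: says why the query fired and what is needed,
# without suggesting which of the two values is the wrong one.
QUERY_TEXT = (
    "AE end date ({end}) is before AE start date ({start}). "
    "Please review both dates and update or clarify."
)


def ae_date_query(start: Optional[date], end: Optional[date]) -> Optional[str]:
    """Return the query text if the check fires, otherwise None."""
    # "When": do not evaluate until every variable in the logic is filled.
    if start is None or end is None:
        return None
    # "What": the end date must not precede the start date.
    if end < start:
        return QUERY_TEXT.format(start=start.isoformat(), end=end.isoformat())
    return None


print(ae_date_query(date(2024, 3, 5), None))              # None: not evaluated yet
print(ae_date_query(date(2024, 3, 5), date(2024, 3, 1)))  # query text is returned
```

A leading alternative such as "Please change the end date to a later date" would steer the site towards one answer, which is what the non-leading wording avoids.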

The “Who” part of the specification states the role to which the query should be raised. In most cases the query is raised to the site so that the ambiguous data is either updated or clarified. This part may also state which role is responsible for performing the check.
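Taken together, the 5 W’s can be thought of as the columns of one specification row. The sketch below is one possible layout, with a small helper that reports which W’s were left blank; the `CheckSpec` fields and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckSpec:
    """One row of a validation check specification (hypothetical layout)."""
    check_id: str
    # Where
    domain: str
    crf_module: str
    variables: List[str]
    query_variable: str
    # What
    logic: str
    # When
    trigger: str
    # Why
    query_text: str
    # Who
    query_recipient: str
    performed_by: str
    programmed: bool = True


def missing_parts(spec: CheckSpec) -> List[str]:
    """Return the names of the 5 W's that have been left blank."""
    parts = {
        "Where": all([spec.domain, spec.crf_module, spec.variables, spec.query_variable]),
        "What": bool(spec.logic),
        "When": bool(spec.trigger),
        "Why": bool(spec.query_text),
        "Who": all([spec.query_recipient, spec.performed_by]),
    }
    return [name for name, filled in parts.items() if not filled]


spec = CheckSpec(
    check_id="AE001",
    domain="AE",
    crf_module="Adverse Events",
    variables=["AESTDT", "AEENDT"],
    query_variable="AEENDT",
    logic="AE end date must not be before AE start date.",
    trigger="When both dates are entered",
    query_text="",  # deliberately left blank to show the completeness check
    query_recipient="Site",
    performed_by="Programmed check",
)
print(missing_parts(spec))  # ['Why']
```

The same structure works for manual checks by setting `programmed=False` and naming the reviewing role in `performed_by`.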

As a best practice, the programmed checks and the manual checks should be part of one file/document. The manual checks can be further grouped by role and compiled in separate worksheets or tables within that file/document.

There are some typical sanity checks that one needs to perform on the specifications. These include, but are not limited to: running a spelling and grammar check on the query text, checking for duplicated logic, checking for contradictory checks, and checking for logic that does not conform to the CRF filling instructions.
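The duplication check is the easiest of these to automate. The sketch below groups check IDs whose logic text is identical after lower-casing and collapsing whitespace. It only catches textual duplicates, not checks that are logically equivalent but worded differently, and the check IDs and wording are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def normalise(logic: str) -> str:
    """Lower-case and collapse whitespace so trivially different wording matches."""
    return " ".join(logic.lower().split())


def duplicate_logic(specs: Dict[str, str]) -> List[List[str]]:
    """Group check IDs whose normalised logic text is identical."""
    groups = defaultdict(list)
    for check_id, logic in specs.items():
        groups[normalise(logic)].append(check_id)
    return [ids for ids in groups.values() if len(ids) > 1]


specs = {
    "AE001": "AE end date is before AE start date",
    "AE002": "AE  end date is before ae start date",
    "VS001": "Systolic BP is less than diastolic BP",
}
print(duplicate_logic(specs))  # [['AE001', 'AE002']]
```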

Specifying the validation checks is a huge task. However, it can be simplified considerably by defining standard checks with standard test data/cases. Standard CRFs and standard checks can even be bundled together as a package so that the data manager can very quickly compile the bare minimum checks required to validate the data to be collected.
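A minimal sketch of such a package, assuming a simple lookup from standard CRF module to its standard check IDs (the module codes and check IDs are made up for illustration):

```python
from typing import Dict, List

# Hypothetical library: standard CRF module -> IDs of its standard checks.
STANDARD_CHECKS: Dict[str, List[str]] = {
    "DM": ["DM001", "DM002"],
    "AE": ["AE001", "AE002", "AE003"],
    "VS": ["VS001"],
}


def baseline_checks(study_modules: List[str]) -> List[str]:
    """Collect the standard checks for the CRF modules used in a study."""
    checks: List[str] = []
    for module in study_modules:
        # Modules with no standard package need study-specific checks instead.
        checks.extend(STANDARD_CHECKS.get(module, []))
    return checks


print(baseline_checks(["DM", "VS", "EX"]))  # ['DM001', 'DM002', 'VS001']
```

Here the module `EX` has no standard package, so it contributes nothing to the baseline and its checks would still have to be specified from the protocol.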

While every data manager desires that there are no change requests needed once the checks are released in production, on many occasions it remains a distant goal.

We will see how one can achieve this elusive goal in my next post.

Until then…. think disruptive…think right…


I am a clinical data management professional with 13 years of experience in healthcare and clinical trial data management. I am focused on bringing disruption to the area of clinical trials by conceptualising breakthrough data management practices.

Posted in Best Practices
Comments on “Basics of data validation”
  1. hemantcdp says:

    Nice one ..! Very precise description of considerations must be made while writing edit checks. Thanks.
