Management and Retention of Data

Summary

Researchers may refer to this guidance when determining how to handle the collected research data from human subjects, and what steps need to be taken after data collection is complete. Updated: 07.03.2025

Full Description

Definitions

  • Anonymous Data: Data that was collected without ANY identifiers and was never connected to the participant. Data can only be considered anonymous if even the investigator cannot identify the participants. Collecting indirect identifiers (e.g. race, sex, age, geographic location) together with direct identifiers (e.g. IP address, student ID number, name, email) as part of the study could lead to the participants being identified. If there is any probability that the participant can be identified, then the data cannot be classified as “anonymous.”
  • Coded Data: Identifiable data has been assigned a code, and any identifying information is stored separately from the code. A “Master Enrollment Log” can be maintained if there is a need for a link between the identifiable data and code. Otherwise, coded data can become “indirectly identifiable” if the “link” is destroyed.
  • De-identified Data: A data record in which all identifying information has been removed and the participant cannot be identified from the data. Coded data is not “de-identified” until the information/file that links the code and identifiers has been removed.
  • HIPAA Privacy Rule: Specifies 18 identifiers, most of which relate to the participant’s demographics. Inclusion of even one of these identifiers makes a data set identifiable.
  • Identifiable Data: The identity of the subject is or can be readily ascertained by the investigator or from associated information. Examples include age, zip code, phone number, gender, and date of birth. Age, ethnicity/race, and gender may be identifiers under the Common Rule if fewer than 5 individuals possess a particular cluster of traits.
  • Protected Health Information (PHI): As defined in OHRP Guidance, PHI is any “individually identifiable health information,” including demographic information, that relates to:
    • the individual’s past, present, or future physical or mental health or condition,
    • the provision of health care to the individual, or
    • the past, present, or future payment for the provision of health care to the individual,
    and that identifies the individual or for which there is a reasonable basis to believe it can be used to identify the individual. Protected health information includes many common identifiers (e.g., name, address, birth date, Social Security Number) when they can be associated with the health information listed above.
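
To illustrate the “Coded Data” definition above, the following is a minimal Python sketch of separating identifiers from study data. The function name `code_records`, the field names, and the sample records are hypothetical and chosen only for illustration; this is not a W&M-provided tool. It assumes the direct identifiers are known in advance and that the resulting link table (the “Master Enrollment Log”) would be stored separately from the coded data on an approved platform.

```python
import secrets

def code_records(records, id_fields):
    """Split identifiable records into coded rows and a separate link table.

    `records` is a list of dicts; `id_fields` names the direct identifiers.
    Both names are hypothetical and used only for this sketch.
    """
    coded, master_log = [], {}
    for record in records:
        # Use a random code so it cannot be derived from the identifiers.
        code = secrets.token_hex(4)
        while code in master_log:  # regenerate on the rare collision
            code = secrets.token_hex(4)
        # The link table holds only the identifiers, keyed by code.
        master_log[code] = {field: record[field] for field in id_fields}
        # The coded row holds the study data plus the code, no identifiers.
        coded_row = {k: v for k, v in record.items() if k not in id_fields}
        coded_row["code"] = code
        coded.append(coded_row)
    return coded, master_log

# Made-up sample data for illustration only.
records = [
    {"name": "Participant A", "email": "a@example.org", "score": 12},
    {"name": "Participant B", "email": "b@example.org", "score": 17},
]
coded, master_log = code_records(records, id_fields=("name", "email"))
```

Note that removing direct identifiers in this way does not by itself make the data de-identified: the link still exists until it is destroyed, and indirect identifiers left in the coded rows may still allow a participant to be identified.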

Management of Data

Principal Investigators (PIs) are responsible for the maintenance of data and ensuring that data is stored on a secure, William & Mary approved platform. If PIs want to use a server that is NOT approved by W&M IT Security, then approvals must be sought via Procurement. Questions related to data management should be directed to W&M IT Security (Policy).

W&M Libraries is also available to assist you with managing your data.
Please visit their website to learn more.

When entering protocols, PIs should provide specifics about:

  1. Who will have access to the identifiable, coded, and/or de-identified data
  2. How the data will be collected (e.g. surveys, interviews, etc.)
  3. Where the data will be stored (e.g. OneDrive, Box, etc.)
  4. How the results of the research will be shared/disseminated and whether data will be maintained for future research
  5. How long the data will be stored after the completion of the research. OHRP requires that records be retained in some form for a minimum of 3 years. Virginia law requires data to be retained for a minimum of 5 years. (See Library of Virginia, p. 15)
  6. If the data is maintained for future research, what type of data will be accessible and how researchers will access the data.
  7. If sharing data outside of W&M, please contact IT Security as identifiable data should be transmitted via a secure service, such as Office365, Box, a secure website, or by using secure protocols. 
  8. If collecting audio, video, or photographic recordings, you must:
    1. Explicitly state what you will collect and justify the use of the recording (versus taking field notes).
    2. State what will be used to make the recordings and who owns the device
    3. Provide in the protocol and consent form what will be recorded and allow for participants to opt out of the recording. For example, asking them to initial next to a statement that asks whether they agree or not to being recorded.
      1. Participants should be made aware what data will appear in final publications
    4. Explain how the recordings will be maintained
    5. State when or if they will be destroyed
    6. Explain how they are coded to protect the participant’s identity
    7. Audio/Video Transcriptions: Transcriptions should be made when possible and the video/audio should be deleted once the transcription has been verified and/or member checked
      1. If you will not be transcribing the data, a strong justification must be provided as to why you must retain the recordings and what measures are in place to protect them.

The Research Compliance team and IRB will review submitted protocols and provide comments if any details are missing.

Data Collection 

The IRB strongly recommends using W&M-owned equipment and licensed services to collect and store research data. Data should be held on personal devices only for as long as is necessary to move it promptly to a secure, university-managed location. It is strongly recommended to use Qualtrics for collecting online survey data and to store research data on W&M secure drives or authorized cloud services such as W&M Office 365 (OneDrive/SharePoint) or Box. University devices must be used when research involves the collection or storage of photographic images or voice recordings of research participants, or of data protected under HIPAA or FERPA.

Wearable Devices and other data-collecting technology  

When using wearable devices (such as activity trackers, smartwatches, voice recording devices, or location trackers) or other technology to collect research data, the informed consent form must state that participants will be required to download the app and agree to the terms of service or other applicable agreements if they are using their own device rather than one provided by the researchers. If an app meets the regulatory definition of a mobile medical application as defined by the FDA, additional regulatory determinations may need to be made depending on its intended use.

NIH Guidance and Sample Language

Informed Consent 

Informed consent forms can be signed in person, or via DocuSign if consent is administered on an electronic platform. If the consent process takes place remotely and is not personally witnessed by study personnel, a method may be required to ensure that the person electronically signing the informed consent is the subject who will be participating in the research or the subject’s legally authorized representative (LAR) (see OHRP FAQs). However, exempt and minimal risk social behavioral research may not require a verification process. FDA-regulated clinical studies must comply with 21 CFR Part 11.

Additionally, anonymous internet-based surveys and studies with a waiver of documentation of consent are required to include “I agree” and “I do not agree” check boxes or a consent form so that participants can indicate their active choice of whether or not they consent to participate.

Please be sure to use the most updated forms found on the IRB’s Forms & Templates page. These forms are periodically updated and include other applicable required statements. 

All studies processed in SPARCS must be closed or renewed. PIs can no longer let their protocols “expire.”

Data Analysis Phase 

When the data collection has been completed, and PIs are ready to begin the “data analysis,” the study will need to be either annually renewed or closed depending on the type of data being analyzed.  

The study is “closed” once active data collection has ended.  

Data must be destroyed according to Virginia State Policy and institutional policy.  

As a reminder, federal regulations require that human subjects research (HSR) records be retained for at least 3 years after completion of the research. Virginia law (p. 15) requires that data be kept for at least 5 years, and records for HIPAA-protected sponsored programs and research projects must be kept on file for 6 years.

Researchers must keep the collected data for at least 5 years after the "last action," and possibly longer, depending on the longest applicable standard. The term "last action" can be interpreted to mean the last time the researcher accessed or used the materials that were created from their research.

Additionally, Virginia law requires that the final scientific or research report of results be kept on file permanently.
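
The “longest applicable standard” rule above can be sketched as a small date calculation. This is an illustrative sketch only, not an official calculator: the function name `earliest_destruction_date` and its parameters are hypothetical, and it makes the simplifying assumption that every retention period is counted from the same "last action" date (the federal period is actually stated relative to completion of the research). It does not cover the final research report, which is kept permanently.

```python
from datetime import date

# Minimum retention periods in years, as stated in this guidance.
FEDERAL_YEARS = 3   # federal regulations
VIRGINIA_YEARS = 5  # Virginia law
HIPAA_YEARS = 6     # HIPAA-protected sponsored programs and research projects

def earliest_destruction_date(last_action, hipaa_protected=False):
    """Return the earliest date on which data could be destroyed.

    Assumption (for illustration): all periods start at `last_action`.
    """
    # The longest applicable standard controls.
    years = max(FEDERAL_YEARS, VIRGINIA_YEARS)
    if hipaa_protected:
        years = max(years, HIPAA_YEARS)
    try:
        return last_action.replace(year=last_action.year + years)
    except ValueError:
        # A 29 February start date has no counterpart in a non-leap year,
        # so fall forward to 1 March.
        return date(last_action.year + years, 3, 1)

# Example: last action on 3 July 2025, no HIPAA-protected records.
print(earliest_destruction_date(date(2025, 7, 3)))
```

Any sponsor or funder requirement that is longer than these periods would need to be added as a further term in the `max` comparison.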

During the retention period, data, signed consent forms and other documentation related to human subjects must be stored in the manner described in the IRB-approved protocol. Access must be limited to those identified in the approved protocol as having access to study data. 

  1. De-identified or anonymous data: If the analysis is only using data that cannot in any manner reveal the identity of, or be linked to, the participant, then your study may be submitted for “Closure.” De-identified data may be retained indefinitely.
  2. Coded data: With coded data, your study may need to be renewed or closed depending on whether the “link” has been destroyed.
    1. Closure Necessary: If you have destroyed the link (e.g. “Master Enrollment Log”) between the identifiable data and code, then the study can be submitted for “Closure.”
    2. Renewal Necessary: If you are retaining the link between the data during any part of the data analysis phase, you will need to keep the study open by submitting a “Renewal” in SPARCS.
  3. Identifiable Data: If you are using identifiable data during your data analysis, you will need to keep the study open by submitting a “Renewal” in SPARCS. Identifiable data includes videos or audio recordings.
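
The three cases above amount to a short decision rule, sketched below in Python. The names `DataType` and `sparcs_action` are hypothetical and are not part of SPARCS; the sketch only restates the list above, and the actual determination rests with the Research Compliance office and the IRB.

```python
from enum import Enum

class DataType(Enum):
    ANONYMOUS = "anonymous"
    DE_IDENTIFIED = "de-identified"
    CODED = "coded"
    IDENTIFIABLE = "identifiable"  # includes video and audio recordings

def sparcs_action(data_type, link_destroyed=False):
    """Return "Closure" or "Renewal" for the data analysis phase.

    `link_destroyed` only matters for coded data: it records whether the
    link (e.g. the Master Enrollment Log) has been destroyed.
    """
    if data_type in (DataType.ANONYMOUS, DataType.DE_IDENTIFIED):
        return "Closure"
    if data_type is DataType.CODED:
        return "Closure" if link_destroyed else "Renewal"
    # Identifiable data keeps the study open.
    return "Renewal"

# Example: coded data whose link is still being retained.
print(sparcs_action(DataType.CODED, link_destroyed=False))
```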

If you have any questions, please reach out to the Research Compliance office.