Data Collection

Determining data to be collected

We have found that determining exactly which data will be collected makes it easier to design and implement data collection protocols. Patient data collected in the EMR is used both in research and to inform clinical care at PIH sites. As mentioned previously, it also contributes to internal and donor reports. Therefore, all PIH forms include important clinical data as well as the information required for donor and government reporting and for research analysis.

We recommend the following tools and methods for determining which data are needed:

Meet with clinicians and other staff/researchers

Clinicians, researchers and other staff often know exactly what information they need in order to do appropriate data analysis. While it is tempting to include everything, we have found that this is simply not practical. We work extensively with the staff to decide which data are vital to our work, which are important but not essential, and which can be omitted. This is often a lengthy process, but it is extremely helpful. From there, data collection forms are designed and customized.

While meeting with clinicians and staff, the EMR team tries to clarify who will be responsible for completing the forms. At some of our sites, non-health professionals complete them; those forms include more detailed instructions than forms completed by people more familiar with medical terminology.

Use a Master Patient List

All PIH sites start with a Master Patient List (or Master List) that includes demographic information and disease diagnosis. That Master List can then be used to generate a variety of lists that are useful in organizing clinical priorities. It can also help the procurement team predict medication needs, as it contains prescription information for each patient.

The Master List also helps clinicians organize work assignments, as it can generate sub-lists of treatments and consultations scheduled for each day. The EMR can also generate patient summary sheets and lists of high-risk patients (for example, all HIV-positive patients who have a low CD4 count but are not on ARVs) or those needing special attention. These lists can help prevent losing patients to follow-up.
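The high-risk list described above amounts to a filter over the Master List. The following is a minimal sketch of that idea, not the EMR's actual implementation: the MasterListEntry fields, the high_risk_patients function and the CD4 threshold of 350 are all assumptions made for illustration, and a real cutoff would come from the site's clinical protocol.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff for illustration only; use the site's clinical protocol.
LOW_CD4_THRESHOLD = 350


@dataclass
class MasterListEntry:
    """One row of a simplified, hypothetical Master List."""
    patient_id: str
    hiv_positive: bool
    cd4_count: Optional[int]  # None when no CD4 result is on file
    on_arvs: bool


def high_risk_patients(master_list: List[MasterListEntry],
                       threshold: int = LOW_CD4_THRESHOLD) -> List[MasterListEntry]:
    """Return HIV-positive patients with a low CD4 count who are not on ARVs."""
    return [
        p for p in master_list
        if p.hiv_positive
        and p.cd4_count is not None
        and p.cd4_count < threshold
        and not p.on_arvs
    ]


master_list = [
    MasterListEntry("A-001", True, 180, False),   # low CD4, not on ARVs
    MasterListEntry("A-002", True, 180, True),    # low CD4, already on ARVs
    MasterListEntry("A-003", True, 600, False),   # CD4 above the threshold
    MasterListEntry("A-004", False, None, False), # not HIV-positive
]
flagged = [p.patient_id for p in high_risk_patients(master_list)]
```

Patients with no CD4 result on file are excluded from this sketch; a site might prefer to put them on a separate list for follow-up rather than drop them silently.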

Use a unique patient ID number for each individual

Unique patient IDs help prevent duplicate records or misidentification. For example, in PIH Lesotho, each patient is given a unique EMR number when baseline forms are filled out. Each of the rural sub-sites has a numeric code, which is included in patients’ EMR numbers.

At our site in Malawi, on the other hand, PIH collaborates with Baobab and uses an identification number that can be used at the national level. The ID number is written on the patient's medical passport so that it will become permanently linked to that patient. In the Baobab system, the ID number is supplemented with bar-coded ID cards, which have proven a low-cost and easy-to-use solution for over 500,000 records.
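To illustrate the idea of an EMR number that embeds a numeric site code, here is a minimal sketch. The format (two-digit site code, a hyphen, then a six-digit sequence number) and the PatientIdIssuer class are assumptions made for this example; they are not the actual numbering scheme used in Lesotho or in the Baobab system.

```python
from typing import Dict


class PatientIdIssuer:
    """Issues unique IDs of the hypothetical form '<site code>-<sequence>'."""

    def __init__(self) -> None:
        # Last sequence number handed out at each site.
        self._last_sequence: Dict[int, int] = {}

    def issue(self, site_code: int) -> str:
        """Return the next unused ID for the given numeric site code."""
        sequence = self._last_sequence.get(site_code, 0) + 1
        self._last_sequence[site_code] = sequence
        return f"{site_code:02d}-{sequence:06d}"


issuer = PatientIdIssuer()
first = issuer.issue(3)       # first patient registered at site 3
second = issuer.issue(3)      # next patient at the same site
other_site = issuer.issue(7)  # first patient at site 7
```

Because each site keeps its own counter and the site code is part of the ID, two sites can register patients independently without ever producing the same number.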

Data storage


While some questions will require descriptive text, coded fields that appear in a drop-down list are preferable to free text. Developing a user-friendly form is essential; program staff operating in the field should be included in the process.
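A coded field restricts answers to a fixed list, so the same answer is always stored the same way. The sketch below shows one way to express this; the HivStatus options and the parse_coded_field helper are hypothetical names chosen for illustration, not fields from an actual PIH form.

```python
from enum import Enum


class HivStatus(Enum):
    """Hypothetical drop-down options for a coded field."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNKNOWN = "unknown"


def parse_coded_field(raw: str, coded_type):
    """Map form input to a coded value, rejecting anything outside the list."""
    normalized = raw.strip().lower()
    for option in coded_type:
        if option.value == normalized:
            return option
    allowed = ", ".join(option.value for option in coded_type)
    raise ValueError(f"{raw!r} is not one of: {allowed}")


status = parse_coded_field(" Positive ", HivStatus)
```

With free text, entries such as "pos", "HIV+" and "positive" would all need to be reconciled before analysis; with a coded field, an entry outside the list is rejected at data entry instead.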

Data collection protocols


Once the information the EMR will store has been identified, PIH designs a system to collect this information. This system is customized to the specific technological, clinical, and cultural needs of each project site. To disrupt patient care as little as possible, we recommend that data be collected continuously and by role-appropriate staff, including care providers. See an example of how the data is collected and maintained in Zanmi Lasante, our sister organization in Haiti.

We have had success in developing this system by basing it on the various steps of a patient visit to the hospital. By examining the stages of a patient's visit to the clinic (arrival and registration, clinical evaluation, treatment, discharge, etc.), we can determine the order in which data should be collected and establish protocols accordingly. For example, when a patient arrives and registers for the first time, demographic information should be collected and an ID number assigned. The person performing the clinical evaluation should then record relevant clinical data.
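The stage-by-stage approach can be sketched as an ordered checklist that reports the earliest stage of a visit with data still missing. The stage names follow the visit stages listed above; the field names and the next_missing function are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Visit stages in the order they occur, each with hypothetical required fields.
VISIT_STAGES: List[Tuple[str, List[str]]] = [
    ("registration", ["name", "date_of_birth", "patient_id"]),
    ("clinical_evaluation", ["diagnosis"]),
    ("treatment", ["prescription"]),
    ("discharge", ["follow_up_date"]),
]


def next_missing(record: Dict[str, str]) -> Optional[Tuple[str, List[str]]]:
    """Return (stage, missing fields) for the earliest incomplete stage, or None."""
    for stage, fields in VISIT_STAGES:
        missing = [field for field in fields if not record.get(field)]
        if missing:
            return stage, missing
    return None


# A newly registered patient who has not yet been evaluated.
registered_only = {
    "name": "Example Patient",
    "date_of_birth": "1980-01-01",
    "patient_id": "03-000001",
}
pending = next_missing(registered_only)
```

Here pending identifies clinical evaluation as the next stage, with the diagnosis still to be recorded by the person performing that evaluation.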

If all relevant staff members are comfortable with the system and are aware of its advantages, they will be more likely to regard it as a tool rather than an administrative burden. See the Personnel section for more information.

Once the data has been collected, it should be entered into the EMR database as quickly as possible to ensure that all patient records are up to date and accurate.

Maintaining confidentiality

While it is important to maintain patients’ names on both paper and electronic files so that they can be monitored over time, confidentiality of all records must be maintained: only staff who are directly involved in patient care or in monitoring and evaluation should have access to the records. Using a code or a unique identifier can aid confidentiality, but errors can occur, so identifying information such as name and address should be kept to ensure adequate follow-up.
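The access rule above can be sketched as a simple role check. The role names, the record fields and the view_record function are hypothetical; a real EMR would rely on its own user and permission system rather than this simplified example.

```python
from typing import Dict

# Hypothetical roles: staff in direct patient care or monitoring and evaluation.
ROLES_WITH_RECORD_ACCESS = {"clinician", "nurse", "monitoring_and_evaluation"}


def view_record(record: Dict[str, str], role: str) -> Dict[str, str]:
    """Return a copy of the record for authorized roles; refuse everyone else."""
    if role not in ROLES_WITH_RECORD_ACCESS:
        raise PermissionError(f"role {role!r} may not view patient records")
    return dict(record)


record = {"patient_id": "03-000001", "name": "Example Patient", "address": "Example Village"}
visible = view_record(record, "clinician")
```

The record keeps both the unique identifier and the name and address, in line with the point that identifying information is still needed for follow-up.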

A central data collection site

PIH has found that identifying the first and/or largest site (at a multi-site project) as the central data collection site is useful. There, data managers can convene if there are power outages or internet connection problems at the smaller sites. The central site can also be used for data clerk trainings and retreats.

Clean, ergonomic workstations for data entry clerks, a secure and organized filing area for paper records, and general upkeep of all facilities will make setup and implementation of protocols far easier.