In our last post, we shared four reasons why educators should be measuring implementation. Here we’ll look at four common challenges to strong implementation measurement.
1. Differential definitions. What happens when different units of your program operate with different working definitions of a measure?
Take tutoring, for example, in a multi-site program where each site is asked to report the number of hours per week each participant is tutored. Site A takes attendance and acknowledges that, although the after-school program runs for 1.5 hours, only 0.5 hours are spent tutoring. So Site A reports the number of days a student attends multiplied by 0.5: e.g., if Jose attends for 3 days, Site A reports 1.5 hours of tutoring. Site B calculates 1.5 hours of tutoring per day times 5 days per week, per participant: so if Jose is a participant that week, regardless of how often he attends, Site B reports 7.5 hours of tutoring.
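To make the divergence concrete, here is a minimal sketch in Python, with made-up attendance figures, of how the two working definitions turn the same week into very different reported hours:

```python
# Hypothetical illustration of the two sites' working definitions.

def site_a_hours(days_attended):
    """Site A: 0.5 tutoring hours for each day actually attended."""
    return days_attended * 0.5

def site_b_hours(is_participant):
    """Site B: 1.5 hours x 5 days for any participant, regardless of attendance."""
    return 1.5 * 5 if is_participant else 0.0

# Jose attends 3 of 5 days this week.
print(site_a_hours(days_attended=3))       # Site A reports 1.5 hours
print(site_b_hours(is_participant=True))   # Site B reports 7.5 hours
```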
Depending on the complexity of the measure and the level of precision your program needs, improving the consistency of implementation measurement may involve providing contextual definitions in your online or paper data collection forms; issuing written guidance for all data collection procedures; conducting online or face-to-face trainings that include data collection scenarios; and establishing standard review procedures that flag potential outlier data so you can request clarification.
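Continuing the hypothetical tutoring numbers above, that last review step can be as simple as a script that flags reported values outside the range the program design makes plausible; the cut-off here is an assumption for illustration, not a recommended value:

```python
# Hypothetical review step: flag weekly tutoring hours that exceed what the
# program design allows (0.5 tutoring hours/day x 5 days = 2.5 hours/week).
MAX_PLAUSIBLE_HOURS = 2.5

reported_hours = {"Site A": 1.5, "Site B": 7.5, "Site C": 2.0}

for site, hours in reported_hours.items():
    if hours > MAX_PLAUSIBLE_HOURS:
        print(f"{site}: {hours} hours/week reported -- request clarification")
```

A flag like this would have surfaced Site B’s definition problem after the first reporting cycle.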
2. Aggregate, rather than individual, data. Sometimes the “quick” method of collecting and reporting aggregate data can take more time and introduce more error than collecting individual data.
Let’s say your chosen implementation measure is “percent of students receiving at least 80 hours of instruction,” and each local site is asked to report the percent of their students meeting the requirement. Sites report in the aggregate, often without the total number of students served. To get a statewide or program-wide figure, you’ll have to combine these aggregate data. If you don’t know how many students each site served, simply averaging the reported percentages gives a small site the same weight as a large one, which introduces error.
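Here is a quick illustration of why the simple average misleads; the figures are hypothetical, and the point is only that the program-wide percentage depends on how many students each site served:

```python
# Hypothetical site reports: (percent meeting the 80-hour target, students served)
sites = [
    ("Site A", 0.90, 40),    # small site, high percentage
    ("Site B", 0.50, 400),   # large site, lower percentage
]

# Naive approach: average the reported percentages, ignoring site size.
naive = sum(pct for _, pct, _ in sites) / len(sites)

# Weighted approach: use the number of students behind each percentage.
total_students = sum(n for _, _, n in sites)
weighted = sum(pct * n for _, pct, n in sites) / total_students

print(f"Unweighted average: {naive:.1%}")       # 70.0%
print(f"Size-weighted figure: {weighted:.1%}")  # 53.6%
```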
Unless sites are estimating this indicator, they should have the individual, student-level data that form the basis for the reported percentages. Work toward getting those data, as you may eventually want to examine other related questions, such as whether students below the target service level performed differently from students above it. Without individual-level data, you won’t be able to do this.
3. Measurement error. Some degree of data imperfection is normal, but you can avoid major problems by attending to common sources of error, including:
a) Time lag. The longer you wait to record data, the more prone it is to error. Try estimating how you spent your time three days ago in 30-minute increments. Now try doing that for this morning. Your recollection of this morning’s time will be much more accurate. The same applies to program data. If you record attendance later in the week (i.e., not during the event), your data will be less accurate. If you record what you did with a student during an after-school program or tutoring session a full week after it happened, that record will be less accurate than one made closer to the event. That applies to all of us. Many years ago I recreated three months of time and effort logs from handwritten notes, archived emails, date stamps on files, and a mishmash of other digital records. I’d guess those logs were accurate to within plus or minus 30 percent, which is not great. But lesson learned: I now record time and effort daily, in real time.
b) Estimates. Estimates can strike an appropriate balance between the need for information and the time and expense of obtaining it. For example, we may want to know approximately how much time elementary school teachers spent on an engineering design process in class. We could observe each classroom and record time spent, ask teachers to log their time on each activity weekly, or survey teachers toward the end of the process and ask them to estimate how much time they spent. Try any two of these and you will see how much their results can differ. So we recommend keeping requests for information as specific as possible and tying them to concrete actions to reduce estimation error.
c) Data entry error. We assume records used for analysis will be digital, and we strongly advocate that the persons who performed a service or obtained the primary data be responsible for recording it digitally. Why? Because those are the individuals most likely to recognize mistakes as data are entered. Additionally, validation rules can enforce appropriate values, and there is less lag time between service provision and data entry. Entering data on paper and then having another person transcribe it into a spreadsheet or digital system later introduces multiple opportunities for error.
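A minimal sketch of the kind of validation rule we have in mind, assuming a record of weekly tutoring hours; the field names and limits are illustrative, not from any particular data system:

```python
from datetime import date, timedelta

MAX_WEEKLY_HOURS = 7.5  # assumption: the program cannot exceed this many hours

def validate_service_record(student_id, service_date, hours):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not student_id:
        problems.append("Missing student ID.")
    if service_date > date.today():
        problems.append("Service date is in the future.")
    if not 0 < hours <= MAX_WEEKLY_HOURS:
        problems.append(f"Hours must be greater than 0 and at most {MAX_WEEKLY_HOURS}.")
    return problems

# The person who delivered the service enters the record and sees errors immediately.
tomorrow = date.today() + timedelta(days=1)
print(validate_service_record("S-1042", tomorrow, 12))
# ['Service date is in the future.', 'Hours must be greater than 0 and at most 7.5.']
```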
4. Keeping it simple. All data collection should balance the time and effort it takes against the value of having the data.
While we do recommend that our clients try to keep it simple, that won’t always mean keeping it “light.” For the Kentucky Migrant Education Program, keeping it simple meant adding fields to their existing online service data collection so that the number of hours could be entered for each student served. Before this, each site had been asked to report aggregate numbers through a separate process that was simple but disconnected from the regular data collection, and it produced program data of limited real use.
For a one-to-one computing project we evaluated, keeping it simple meant extracting as much data as possible from existing digital curriculum program logs rather than surveying or observing teachers about how they used the programs. Although this involved considerable work on our part, the logs recorded usage data as it happened, from all users. This eliminated three things: the need for teacher time to respond to surveys, error introduced by survey sample bias, and error introduced by user estimates.
While not an exhaustive list, the above four challenges illustrate the need to gather implementation data with focus and care so that it will provide useful information to your program. Our next post will offer additional advice on types and sources of meaningful implementation measures.