Understanding Algorithms

Regardless of where or how patient information is shared, accurately linking patient records from multiple databases is critical for effective clinical decision-making and for avoiding system-clogging duplicates and overlays (single records that mix the information of two different patients).

It’s a daunting task. A single patient can have multiple identifiers within one organization, especially when that organization operates multiple locations offering different types of services. When those identifiers flow into a health information exchange (HIE) or accountable care organization (ACO), any data integrity issues are greatly exacerbated.

To prevent a domino effect of dirty data, hospitals and information exchange initiatives must deploy strong algorithms that identify with pinpoint accuracy which records belong to which patient, so those records can be linked into a single record with a single unique identifier used across the initiative. The challenge is that these algorithms are not created equal: their performance depends largely on their sophistication.
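Mechanically, linking records under a single unique identifier is a clustering step: once an algorithm has decided which pairs of records match, every chain of matched pairs must collapse into one identifier. The minimal Python sketch below assumes those pair decisions already exist and uses a union-find (disjoint-set) structure to do the collapsing; the record IDs and the `EID-n` identifier format are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure that merges linked record IDs into clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Unseen records start as the root of their own cluster.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def assign_enterprise_ids(record_ids, matched_pairs):
    """Map each source record ID to a hypothetical enterprise-wide ID (EID-n)."""
    uf = UnionFind()
    for rid in record_ids:
        uf.find(rid)
    for a, b in matched_pairs:
        uf.union(a, b)
    clusters = defaultdict(list)
    for rid in record_ids:
        clusters[uf.find(rid)].append(rid)
    eids = {}
    # Number clusters deterministically by their smallest member ID.
    for n, members in enumerate(sorted(clusters.values(), key=min), start=1):
        for rid in members:
            eids[rid] = f"EID-{n}"
    return eids
```

Because linking is transitive, a single incorrect pair decision merges two whole clusters, which is one way an error in one source system can cascade across an exchange.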

1. Basic Algorithms: The simplest technique for matching records, basic algorithms compare records on a few selected data elements, typically name, birth date, Social Security Number (SSN) and gender. They typically use exact-match or deterministic matching tools; deterministic tools are slightly more sophisticated because they may also accept partial matches or matches produced by phonetic encoding systems. Basic algorithms also use wild-card linking techniques, which return every record that matches a limited number of characters entered into a search string, along with any other data element specified to refine the search.
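The three basic techniques can be contrasted in a short Python sketch. It assumes records are dictionaries with hypothetical field names, uses American Soundex as the phonetic encoding system (one common choice, not necessarily the one any given product uses), and encodes an illustrative deterministic rule that is an assumption, not a standard. The wild-card search uses Python's `fnmatch.fnmatchcase` with `*` patterns.

```python
from fnmatch import fnmatchcase

# Soundex digit for each consonant; vowels, H, W and Y have no digit.
SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. SMITH -> S530."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        # Vowels reset the previous digit; H and W do not.
        if c not in "HW":
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def exact_match(a, b, fields=("last", "first", "dob", "ssn", "gender")):
    """Exact matching: every selected field must be identical."""
    return all(a[f] == b[f] for f in fields)


def deterministic_match(a, b):
    """Hypothetical deterministic rule: DOB, SSN and gender identical,
    last names phonetically equal, first initials equal (partial match)."""
    return (a["dob"] == b["dob"] and a["ssn"] == b["ssn"]
            and a["gender"] == b["gender"]
            and soundex(a["last"]) == soundex(b["last"])
            and a["first"][:1].upper() == b["first"][:1].upper())


def wildcard_search(records, last_pattern, **filters):
    """Return every record whose last name matches a pattern such as 'SMI*',
    optionally narrowed by other fields (e.g. dob='1980-04-02')."""
    return [r for r in records
            if fnmatchcase(r["last"].upper(), last_pattern.upper())
            and all(r[k] == v for k, v in filters.items())]
```

For example, records for "Jon Smith" and "John Smyth" with the same birth date, SSN and gender fail `exact_match` but pass `deterministic_match`, because both surnames encode to `S530`.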

2. Intermediate Algorithms: Intermediate algorithms combine “fuzzy logic” and arbitrary or subjective scoring systems with exact-match and deterministic tools. A field match weight is arbitrarily assigned to each identifying attribute, and a pair of records must reach a minimum score threshold to qualify for consideration as a match. Fuzzy logic uses nickname tables and rules to handle transposed names, characters or digits and other typographical errors within the database. Intermediate algorithms may also include an automated frequency adjustment, which lowers a field’s contribution to the match score when the value the two records share (e.g., a common last name or birth date) is present in a significant number of records.
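A minimal Python sketch of such a scoring system follows. The nickname table, field weights, threshold, the 5% "common value" cutoff and the halving penalty are all hypothetical values chosen for illustration. Fuzzy agreement here covers only nickname substitution and a single adjacent-character transposition; real products apply many more rules.

```python
from collections import Counter

# Hypothetical nickname table and arbitrary field weights.
NICKNAMES = {"BILL": "WILLIAM", "WILL": "WILLIAM", "BOB": "ROBERT",
             "ROB": "ROBERT", "PEGGY": "MARGARET"}
WEIGHTS = {"last": 30, "first": 20, "dob": 25, "ssn": 35, "gender": 5}
THRESHOLD = 80  # minimum score for a pair to qualify as a potential match


def normalize(field, value):
    """Uppercase the value and map first-name nicknames to a canonical form."""
    v = str(value).strip().upper()
    return NICKNAMES.get(v, v) if field == "first" else v


def one_adjacent_swap(x, y):
    """True if y is x with exactly one pair of neighbouring characters transposed."""
    if len(x) != len(y):
        return False
    diffs = [i for i, (p, q) in enumerate(zip(x, y)) if p != q]
    return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
            and x[diffs[0]] == y[diffs[1]] and x[diffs[1]] == y[diffs[0]])


def value_counts(records):
    """Count how often each normalized value occurs, per field."""
    return {f: Counter(normalize(f, r[f]) for r in records) for f in WEIGHTS}


def match_score(a, b, counts, total, common_share=0.05, penalty=0.5):
    """Sum field weights for agreeing fields, with a frequency adjustment."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        x, y = normalize(field, a[field]), normalize(field, b[field])
        if x == y or one_adjacent_swap(x, y):
            # Agreement on a widely shared value is weaker evidence,
            # so its weight is scaled down.
            if counts[field][x] / total > common_share:
                weight *= penalty
            score += weight
    return score
```

With this scheme, "Bill Smith" and "William Smith" whose birth dates and SSNs differ only by a transposed pair of digits still agree on every field. How far their score lands above `THRESHOLD` then depends on how common "Smith" is in the population being searched.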

3. Advanced Algorithms: The most sophisticated set of record-matching tools, advanced algorithms rely on mathematical theory (bipartite graph theory, probability theory and other mathematical and statistical models) to estimate the likelihood that two records belong to the same patient. They also include machine learning and neural networks, forms of artificial intelligence that simulate human problem solving. These systems “learn” as more data is processed and automatically recalibrate field weights based on that learning.
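One widely used probabilistic model is the Fellegi-Sunter framework. For each field it estimates an m-probability (the chance the field agrees when two records truly match) and a u-probability (the chance it agrees when they do not). It then scores a pair by summing log-likelihood ratios. The Python sketch below is an illustration of that framework, not any vendor's method. As a simple stand-in for "learning," it re-estimates the probabilities from pairs that staff have already reviewed; the classic unsupervised variant uses expectation-maximization instead. The field names are hypothetical.

```python
import math

FIELDS = ("last", "first", "dob", "ssn", "gender")


def agreement_vector(a, b):
    """1 where two records agree exactly on a field, 0 where they do not."""
    return tuple(int(str(a[f]).strip().upper() == str(b[f]).strip().upper())
                 for f in FIELDS)


def estimate_m_u(labeled_pairs):
    """Estimate per-field m- and u-probabilities from reviewed pairs.

    labeled_pairs: iterable of (agreement_vector, is_match). Add-one (Laplace)
    smoothing keeps every probability strictly between 0 and 1.
    """
    agree = {True: [1] * len(FIELDS), False: [1] * len(FIELDS)}
    count = {True: 2, False: 2}
    for vec, is_match in labeled_pairs:
        count[is_match] += 1
        for i, v in enumerate(vec):
            agree[is_match][i] += v
    m = [k / count[True] for k in agree[True]]
    u = [k / count[False] for k in agree[False]]
    return m, u


def match_weight(vec, m, u):
    """Sum of log2 likelihood ratios; positive totals favour 'same patient'."""
    total = 0.0
    for v, mi, ui in zip(vec, m, u):
        total += math.log2(mi / ui) if v else math.log2((1 - mi) / (1 - ui))
    return total
```

Each time more reviewed pairs are fed back through `estimate_m_u`, the field weights shift automatically. A field that rarely agrees by chance, such as SSN, ends up with a large positive agreement weight. A field like gender, which agrees by chance about half the time, earns little.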

It is important to understand that even advanced algorithms cannot completely eliminate false positives (records linked that belong to different people) and false negatives (records left unlinked that belong to the same person). In particular, when algorithms alone make record-matching decisions, “auto-linking” routines can create errors such as linking two closely related people with similar names and birth dates who live near each other. Another common error is linking two different individuals who share a name and birth date as well as an address, as can happen in large apartment complexes or other multi-family residential buildings.

For this reason, regardless of which algorithms are in place, matching results must always be verified using well-established record-matching validation procedures. Skipping this critical step can result in overlaid records, potentially violating privacy laws and, more significantly, compromising care coordination, quality and safety.
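One way to keep algorithms from acting alone is to route every scored pair to a human queue rather than auto-linking it. The minimal Python sketch below assumes a numeric match score from some upstream algorithm and uses two hypothetical thresholds. It also adds a guard that always sends same-household look-alikes (same last name, birth date and building-level address) to manual review, reflecting the error cases described above. Queue names and field names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    queue: str   # "verify_then_link", "manual_review" or "no_link"
    reason: str


def triage(score, a, b, upper=10.0, lower=0.0):
    """Route a scored pair to a review queue instead of auto-linking it.

    Thresholds are hypothetical; 'address' is assumed to be building-level,
    so neighbours in different units of one building compare as equal.
    """
    same_household = (
        a["last"].strip().upper() == b["last"].strip().upper()
        and a["dob"] == b["dob"]
        and a["address"].strip().upper() == b["address"].strip().upper()
    )
    if same_household:
        # Relatives or same-named neighbours in one building look alike to
        # the algorithm, so a person always makes the call.
        return Decision("manual_review", "same last name, birth date and address")
    if score >= upper:
        return Decision("verify_then_link", "high score; confirm before linking")
    if score > lower:
        return Decision("manual_review", "score in the uncertain band")
    return Decision("no_link", "score at or below lower threshold")
```

Even the highest-scoring pairs land in a "verify then link" queue rather than being merged automatically, so a reviewer confirms every link before records are overlaid.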