Deduping Categories

Overview

The deduping category (or “dupe group category”) value that the system assigns to potential duplicate households, people, or organizations is helpful in two ways: it indicates how likely the records are to be actual duplicates that should be merged, and it shows which details led Veracross to flag them as likely duplicates. In general, Veracross errs on the side of caution and does not merge when there is uncertainty.

The leading number (01, 03, etc.) roughly estimates the likelihood that two records are duplicates: in many cases, the lower the number, the higher the odds that they are duplicates that should be merged. (This guideline should not be followed blindly, as there are plenty of exceptions, described below.) The other characters (“EN,” “PA,” etc.) indicate which data on the records was identified as a potential sign that they are actual duplicates. This article explains the dupe group category values in detail.
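As a rough illustration of how a category value breaks down, the sketch below splits a value such as 01a.EN.EA.EP. into its leading number, optional letter, and indicator codes. The names parse_dupe_category and DupeCategory are hypothetical and are not part of Veracross; the pattern assumes values shaped like the ones listed in this article.

```python
import re
from typing import NamedTuple


class DupeCategory(NamedTuple):
    rank: int          # leading number; lower usually means a more likely duplicate
    variant: str       # optional letter after the number, e.g. "a" (may be empty)
    indicators: tuple  # remaining dot-separated codes, e.g. ("EN", "EA", "EP")


# Two digits, an optional lowercase letter, a dot, then the indicator codes.
_PATTERN = re.compile(r"^(\d{2})([a-z]?)\.(.*)$")


def parse_dupe_category(value: str) -> DupeCategory:
    """Split a dupe group category value into its parts (illustrative only)."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized dupe group category: {value!r}")
    number, variant, rest = match.groups()
    # Drop the empty piece produced by the trailing dot.
    indicators = tuple(code for code in rest.split(".") if code)
    return DupeCategory(int(number), variant, indicators)
```

Because the parsed value starts with the leading number, sorting a list of category values with key=parse_dupe_category puts the most likely duplicates first for review. The leading number is only a rough guide, so the sorted order is a starting point, not a merge decision.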

Read more about duplicate record management.

Household Dupe_Group

00 = Highest probability of a duplicate

  • 00.USER. = USER SPECIFIED surviving_record present
  • 01a.EN.EA.EP. = Household Identical Name + Identical Address + Same Phone
  • 01b.HN.EA.EP. = Hyphenated Name that one half matches + Identical Address + Same Phone
  • 01c.EA.WF1. = Identical Address + WF1 input date/legacy ID
  • 01d.EA.WF1. = Partial Address + WF1 input date/legacy ID
  • 02a.EN.EA. = Identical Name + Identical Address + at least one phone is blank
  • 02b.EN.EA. = Identical Name + Identical Address
  • 02c.HN.EA. = Household Hyphenated Name + Identical Address
  • 02d.SN.EA.EP. = Same Name + Identical Address + Same Phone
  • 02e.SN.EA. = Same Name + Identical Address
  • 03a.EN.PA.EP. = Identical Name + Partial Address + Same Phone
  • 03b.EN.PA. = Identical Name + Partial Address
  • 04a.EN.EP. = Identical Name + Same Phone
  • 04b.EM. = Identical Email (primary Household email)
  • 04c.EN.SE. = Identical Name, Similar Email (for Household members – hoh.fk, spouse.fk)
  • 05a.EN.2DM. = Identical Name + At least two duplicate members
  • 05b.2DM. = At least two duplicate members
  • 05c.1DM. = One duplicate member
  • 06a.EN.EC.P. = Identical Name, Identical City, Same Phone
  • 06b.EN.P. = Identical Name, Same Phone
  • 07a.EA.EP. = Identical Address, Same Phone, Same Input Date. Commonly occurs when uploading a spreadsheet.
  • 08a.EA.EP. = Identical Address, Same Phone
  • 09a.EA. = Identical Address — not in use
  • 09b.EN.EPC. = Identical Name + Identical Postal Code
  • 10a.EP. = Identical Phone
  • 10b.EP. = Identical Phone “no area code”

Nightly Process Automated Merging

The records in the following groups are merged if the “auto merge” system parameter value is set to the default value of 2. If the system parameter value is set to 1, only the 00.USER dupe group value will be merged.

  • Groups 00, 01a, 01b, 01c, 01d, and 02a are merged
  • Group 02b is merged if the household does not contain Alums or Grandparents (to avoid merging Alums with their Parents, or Grandparents and Parents)
  • Groups 02c, 02d, and 02e are merged if input_user = ‘vc.web’ and the input dates are the same.
  • Note: No households containing grandparents are merged.
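The household rules above can be sketched as a single decision function. This is a minimal sketch under stated assumptions, not Veracross's implementation: HouseholdPair and household_auto_merges are hypothetical names, the grandparent note is read literally (it blocks every group), the ‘vc.web’ condition is assumed to apply to both records, and any parameter value other than 1 or 2 is assumed to disable automated merging.

```python
from dataclasses import dataclass
from datetime import date

ALWAYS_MERGED = {"00", "01a", "01b", "01c", "01d", "02a"}
WEB_INPUT_ONLY = {"02c", "02d", "02e"}


@dataclass
class HouseholdPair:
    group: str                        # number plus letter, e.g. "02b"
    has_alum: bool = False            # either household contains an Alum
    has_grandparent: bool = False     # either household contains a Grandparent
    input_users: tuple = ("", "")     # input_user on each record
    input_dates: tuple = (None, None) # input date on each record


def household_auto_merges(pair: HouseholdPair, auto_merge: int = 2) -> bool:
    """Return True if the nightly process would merge this pair (sketch)."""
    if pair.has_grandparent:
        return False  # assumption: the note applies to every group
    if auto_merge == 1:
        return pair.group == "00"  # only user-specified merges
    if auto_merge != 2:
        return False  # assumption: other values disable auto merge
    if pair.group in ALWAYS_MERGED:
        return True
    if pair.group == "02b":
        return not pair.has_alum
    if pair.group in WEB_INPUT_ONLY:
        same_date = (pair.input_dates[0] is not None
                     and pair.input_dates[0] == pair.input_dates[1])
        return same_date and all(u == "vc.web" for u in pair.input_users)
    return False
```

For example, a 02b pair merges only when neither household contains an Alum, and a 02d pair merges only when both records were entered by vc.web on the same date.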

Exceptions:

  • Allow Alums/Former Students to have households whose address/phone matches their parents
  • Allow Grandparents to have households whose address/phone matches their grandchildren
  • Allow Grandparents to have households whose address/phone matches their children
  • Remove dupe pairs where neither record has been touched in the last 45 days
  • Remove records whose matches are missing (MIA)
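The 45-day rule in the exceptions above amounts to a date comparison on both records. The sketch below is illustrative only: is_stale_pair and drop_stale_pairs are hypothetical names, "touched" is assumed to mean a last-modified timestamp on each record, and a record touched exactly on the cutoff is assumed to still count as recent.

```python
from datetime import datetime, timedelta


def is_stale_pair(last_touched_a: datetime, last_touched_b: datetime,
                  now: datetime, window_days: int = 45) -> bool:
    """True when neither record was touched within the window (sketch)."""
    cutoff = now - timedelta(days=window_days)
    return last_touched_a < cutoff and last_touched_b < cutoff


def drop_stale_pairs(pairs, now: datetime, window_days: int = 45):
    """Keep only pairs where at least one record was touched recently.

    Each pair is a (last_touched_a, last_touched_b) tuple in this sketch.
    """
    return [p for p in pairs if not is_stale_pair(p[0], p[1], now, window_days)]
```

A pair stays in the dupe list as long as one of its two records has recent activity; it is dropped only when both are older than the window.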

Person Dupe_Group

00 = Highest probability of a duplicate

  • 00.USER. = USER SPECIFIED surviving_record present
  • –00a.SSN.H. = Same SSN + Same Household (deprecated – there is no ability to store SSN in Veracross)
  • –00b.SSN. = Same SSN (deprecated – there is no ability to store SSN in Veracross)
  • 01a.H.EL.EF. = Identical Household + Last + First; withdrawn dates are identical; birth dates are identical
  • 01b.H.EL.EF. = Identical Household + Last + First “comparing nick name and first name”; withdrawn dates are identical; birth dates are identical
  • 01c.H.EL.TF. = Identical Household + Last + Thesaurus First
  • 01d.H.EL.SF. = Identical Household + Last + Similar sounding First
  • 01e.H.EL.SF. = Identical Household, Last + Similar First
  • 01f.H.SL.EF. = Identical Household + Similar Last + Identical First
  • 01g.H.SL.SF. = Identical Household + Similar Last + Similar First
  • 01i.H.EL.EF. = Same as 01a and 01b (see above), but withdrawn date values are different, and one of the duplicates has the role Student or Future Student. These are obvious duplicates, but someone should update withdrawn date(s) on the dupe records to be the same before merging. (A Student or Future Student normally shouldn’t have a withdrawn date, so that is avoided by stopping an auto-merge for people with those roles if withdrawn date values are suspect.)
  • 01j.H.EL.EF. = Same as 01a and 01b (see above), but birth date values are different. This prevents people from being auto-merged if the birth date is populated on both records and is also different on both records. This helps prevent instances where parents and children have the same name.
  • 02a.EL.EF.EA. = Identical Last + First, Address
  • 02c.EL.TF.EA. = Identical Last + Thesaurus First + Address
  • 02d.EL.SF.EA. = Identical Last + Similar sounding First + Identical Address
  • 03a.EL.EF.PA. = Identical Last + Identical First + Partial Address
  • 03c.EL.TF.PA. = Identical Last + Thesaurus First + Partial Address
  • 03d.EL.SF.PA. = Identical Last, Similar sounding First + Similar Partial Address
  • 04.EL.EF.EMAIL. = Identical Last + Identical First + Identical Email
  • 05a.EL.EF.B. = Identical Last + Identical First + Birthday
  • 05b.EL.TF.B. = Identical Last + Thesaurus First + Birthday — not in use
  • 06a.EL.EF.P. = Identical Last, Identical First + Same Phone
  • 06b.EL.TF.P. = Identical Last + Thesaurus First + Same Phone
  • 06c.EL.SF.P. = Identical Last + Similar sounding First + Same Phone
  • 06d.SL.SF.P. = Similar sounding Last + Similar sounding First + Same Phone
  • 07.EL.TF.CS. = Identical Last + Thesaurus First + City + State — not in use
  • 08a.SME-SPSE. = Same Spouse, Same first name, last name, household “including ex-husband, ex-wife”
  • 08b.SME-SPSE. = Same Spouse, Same household
  • 08c.SME-SPSE. = Same Spouse
  • 09a.SME-CHLD. = Same Child, Same Household, Same First Name, Last Name
  • 09b.SME-CHLD. = Same Child, Same First Name, Same Last Name
  • 09c.SME-CHLD. = Same Child, Same Household, Same Last Name (excludes people with a domestic partnership relationship to each other)
  • 09d.SME-CHLD. = Same Child, Same Last Name (excludes people with a domestic partnership relationship to each other)
  • 10a.FATHER.EF.EL. = Same Parent + Identical First + Identical Last
  • 10a.FATHER.EFN.EL. = Same Parent + Identical First + Identical Last + “nick name”
  • 10a.MOTHER.EF.EL. = Same Parent + Identical First + Identical Last
  • 10a.MOTHER.EFN.EL. = Same Parent + Identical First + Identical Last + “nick name”
  • 10b.SM-REL.EF.EL. = Same Relationship + Identical First + Identical Last
  • 10c.SM-REL.EFN.EL. = Same Relationship + Identical First + Identical Last + “nick name”
  • 11a.EL.EF.GY. = Identical Last + Identical First + Alum role + matching graduation year
  • 11b.EL.EF.GY. = Identical Last + Identical First + Former Student role + matching graduation year
  • 11c.ELFGY. = Identical Last + Identical First + Stud/Frm-Stud/Alum roles + Graduation Year
  • 12a.ELFMdn. = Identical Last + Identical First + Identical Maiden
  • 12b.ELFSps. = Identical Last + Identical First + Identical Spouse
  • 12c.ELFMIR. = Identical Last + Identical First + Identical Middle Initial + Matching Roles
  • 12d.ELFMI. = Identical Last + Middle Initial + Identical First
  • 13.EMAIL. = Identical Email
  • 14a.EL.EF.MR. = Identical Last + Identical First + matching roles
  • 14b.EL.EF.1R. = Identical Last + Identical First (with one record having key roles)
  • 14c.EL.EF.NHH. = Identical Last + Identical First (with at least one household_fk = 0)
  • 14d.EL.EF.CS. = Identical Last + Identical First + City + State
  • 14e.EL.EF. = Identical Last + Identical First
  • 15.H.EL.NF. = Identical Household + Last + Missing First — not in use

Thesaurus vs. Similar vs. Similar-sounding

For Person Dupe_Group, the distinction between “Thesaurus”, “Similar”, and “Similar-sounding” is as follows:

  • Thesaurus is a common alternate name.
    Example: David and Dave
  • Similar are names that are spelled similarly.
    Example: David and Davik
  • Similar-sounding are names that sound similar.
    Example: David and Dhaphed
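To make the three kinds of first-name match concrete, the sketch below classifies a pair of names using a small nickname table, a spelling-similarity ratio, and the classic Soundex code. These techniques are stand-ins chosen for illustration; this article does not say which algorithms Veracross uses. The NICKNAMES table, the 0.75 threshold, the order of the checks, and the function names are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical thesaurus: nickname -> formal name.
NICKNAMES = {"dave": "david"}

_SOUNDEX_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Classic four-character Soundex code, e.g. 'D130'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # vowels reset prev to ""
    return (result + "000")[:4]


def classify_first_names(a: str, b: str, threshold: float = 0.75) -> str:
    """Label a pair of first names (illustrative, not Veracross's logic)."""
    x, y = a.lower(), b.lower()
    if x == y:
        return "identical"
    if NICKNAMES.get(x) == y or NICKNAMES.get(y) == x:
        return "thesaurus"
    if SequenceMatcher(None, x, y).ratio() >= threshold:
        return "similar"
    if soundex(x) == soundex(y):
        return "similar-sounding"
    return "different"
```

With these assumptions, the three example pairs above fall into the three expected buckets: David and Dave match through the nickname table, David and Davik through spelling, and David and Dhaphed only through their shared Soundex code.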

Additional Requirements for Merging

As Veracross tends to err on the side of caution, in addition to the criteria outlined in the Person Dupe_Groups above, the following must also be true in almost all circumstances:

  • both people must have the same Person Roles
  • both people must have the same birth date

Nightly Process Automated Merging

The records in the following groups are merged if the “auto merge” system parameter value is set to the default value of 2. If the system parameter value is set to 1, only the 00.USER dupe group value will be merged.

  • Groups 00 and 01a are merged.
  • Groups 04, 06a, and 06b will be merged for people with the following roles:
    • Emergency Contact
    • Doctor/Dentist
    • Medical Provider
    • Organization Contact
    • Admissions Ref
    • Other
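The person rules above can be sketched the same way. This is a minimal sketch, not the actual nightly process: person_auto_merges and LISTED_ROLES are hypothetical names, the role condition is read conservatively (both records must have roles, and only roles from the list), and any parameter value other than 1 or 2 is assumed to disable automated merging. It does not model the additional merge requirements or the exceptions.

```python
LISTED_ROLES = frozenset({
    "Emergency Contact", "Doctor/Dentist", "Medical Provider",
    "Organization Contact", "Admissions Ref", "Other",
})
ALWAYS_MERGED = {"00", "01a"}
ROLE_LIMITED = {"04", "06a", "06b"}


def person_auto_merges(group: str, roles_a, roles_b, auto_merge: int = 2) -> bool:
    """Return True if the nightly process would merge this pair (sketch)."""
    if auto_merge == 1:
        return group == "00"  # only user-specified merges
    if auto_merge != 2:
        return False  # assumption: other values disable auto merge
    if group in ALWAYS_MERGED:
        return True
    if group in ROLE_LIMITED:
        # Assumption: every role on both records must come from the list.
        return all(roles and set(roles) <= LISTED_ROLES
                   for roles in (roles_a, roles_b))
    return False
```

Under this reading, a 06a pair of two Emergency Contacts merges automatically, while a 06a pair where one record also has a role outside the list is left for manual review.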

Exceptions:

  • Break any matches between children and non-children
  • Break any matches when gender does not match
  • Break any matches when graduation year is off by more than one
  • Two people who have a valid relationship to each other are not treated as duplicates
  • Remove dupe pairs where neither record has been touched in the last 45 days
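The first four exceptions above can be read as a set of checks that break a candidate match. The sketch below is one hypothetical way to express them: PersonSummary and break_reason are made-up names, the field names are assumptions, and a missing gender or graduation year is assumed not to break a match. The 45-day rule is left out of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonSummary:
    is_child: bool
    gender: Optional[str] = None
    graduation_year: Optional[int] = None


def break_reason(a: PersonSummary, b: PersonSummary,
                 related: bool = False) -> Optional[str]:
    """Return why a match should be broken, or None to keep it (sketch)."""
    if related:
        return "valid relationship"  # related people are not duplicates
    if a.is_child != b.is_child:
        return "child vs. non-child"
    if a.gender and b.gender and a.gender != b.gender:
        return "gender mismatch"
    if (a.graduation_year is not None and b.graduation_year is not None
            and abs(a.graduation_year - b.graduation_year) > 1):
        return "graduation year off by more than one"
    return None
```

Returning a reason rather than a plain boolean makes it easier to see which exception removed a pair when reviewing results.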

Organization Dupe_Group

00 = Highest probability of a duplicate

  • 00.USER. = USER SPECIFIED surviving_record present
  • 01a.EN+EA.P. = Identical Name + Address + Phone
  • 01b.EN+EA. = Identical Name + Address (at least one blank phone)
  • 01d.EN+EA.EC. = Identical Name + Address + City
  • 01e.EN.P. = Identical Name + Phone (no address data)
  • 01f.EN.P. = Identical Name + Phone + Address
  • 02a.DM. = Identical Name, multiple records in Person_Organization
  • 02b.DM. = Similar Name, multiple records in Person_Organization — not in use
  • 02c.DM. = Similar Name, multiple records in Person_Organization
  • 03a.EN+EZ. = Identical Name + Zip
  • 03b.EN.EC. = Identical Name + City, no address
  • 04a.EN. = Identical Name, no address info
  • 04b.EN. = Identical Name, one record with Address, one record without Address
  • 04c.EN. = Identical Name
  • 04d.EN. = Identical Name (one with a leading “the”)
  • 05.SN+EA. = Similar Name + Identical Address
  • 06.SN+PA. = Similar Name + first 8 of address
  • 07.PN. = Same first n characters (of Organization’s Name)
  • 08.ACR. = Names and their Acronyms — not in use
  • 09.SN. = Similar Name

Nightly Process Automated Merging

The records in the following groups are merged if the “auto merge” system parameter value is set to the default value of 2. If the system parameter value is set to 1, only the 00.USER dupe group value will be merged.

  • Groups 00, 01a, 01b, 01c, 01d, 01e, 02a, 03a, 03b, 04a, 04b, 04c are merged

Exceptions:

  • Break matches between middle & high schools
  • Break matches in different countries
  • Remove dupe pairs where neither record has been touched in the last 3 months
  • Remove records whose matches are missing (MIA)
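For organizations, the merged groups and the country exception can be sketched as a short check. The names org_auto_merges and ORG_AUTO_MERGE_GROUPS are hypothetical, a blank country is assumed not to break a match, the country exception is assumed not to apply to user-specified merges when the parameter is 1, and any parameter value other than 1 or 2 is assumed to disable automated merging. The middle/high school and staleness exceptions are not modeled.

```python
ORG_AUTO_MERGE_GROUPS = frozenset({
    "00", "01a", "01b", "01c", "01d", "01e",
    "02a", "03a", "03b", "04a", "04b", "04c",
})


def org_auto_merges(group: str, country_a: str = "", country_b: str = "",
                    auto_merge: int = 2) -> bool:
    """Return True if the nightly process would merge this pair (sketch)."""
    if auto_merge == 1:
        return group == "00"  # only user-specified merges
    if auto_merge != 2:
        return False  # assumption: other values disable auto merge
    if country_a and country_b and country_a != country_b:
        return False  # matches in different countries are broken
    return group in ORG_AUTO_MERGE_GROUPS
```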