Thoughts on Merging Datasets

(Note: This was written for TMG v4.x and earlier, although the principles also apply to TMG v6.x.)

Merging datasets involves two basic parts. The first is combining two separate datasets into a single dataset. This simply appends the contents of one dataset to the end of the other -- the dataset merge itself. The second is melding the contents of the combined dataset so that any duplicate persons are merged and the duplicates deleted.

For example, consider a user who has two datasets -- one for themselves and their ancestry, and a second for their spouse and the spouse’s ancestry. Ideally, neither dataset contains anyone who is in the other dataset except the user and the spouse. During the first part of the merge process, the user joins the two datasets by appending one to the other. This results in a single dataset containing all of the persons from the two original datasets. This, of course, means that the user appears twice and the spouse appears twice -- that is, two pairs of duplicate persons. During the second part of the process, these four persons are merged into two: the two persons representing the user are merged, the two persons representing the spouse are merged, and both of the leftover duplicates are deleted. Following this, there is just one dataset with no duplicate persons.

The above is about the simplest merge process possible. The usual merging of datasets results in many more duplicate persons than the above example. However, other than the sheer numbers, the process is the same. This also applies when merging more than two datasets: after each dataset merge, any duplicate persons must then be merged.
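
The two parts described above can be pictured with a small sketch. This is only an illustration, not how TMG stores or processes its data: the Person class, the append_dataset and merge_persons functions, the sample names and tags, and the idea that appended persons are renumbered after the highest existing ID# are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    pid: int                      # hypothetical stand-in for the ID#
    name: str
    tags: list = field(default_factory=list)


def append_dataset(a, b):
    """Part 1: return a new list with A's persons followed by B's.

    B's persons are renumbered after A's highest ID (an assumption).
    Copies are made so the two original lists are left untouched.
    """
    offset = max((p.pid for p in a), default=0)
    combined = [Person(p.pid, p.name, list(p.tags)) for p in a]
    combined += [Person(p.pid + offset, p.name, list(p.tags)) for p in b]
    return combined


def merge_persons(dataset, keep_id, drop_id):
    """Part 2: fold drop_id's tags into keep_id, then delete drop_id."""
    by_id = {p.pid: p for p in dataset}
    keep, drop = by_id[keep_id], by_id[drop_id]
    for tag in drop.tags:
        if tag not in keep.tags:  # exact duplicate tags are not copied
            keep.tags.append(tag)
    return [p for p in dataset if p.pid != drop_id]


# Made-up sample data: each dataset holds the user, the spouse, and one ancestor.
user_side = [
    Person(1, "User", ["Birth 1950"]),
    Person(2, "Spouse", ["Marriage 1975"]),
    Person(3, "User's Father"),
]
spouse_side = [
    Person(1, "Spouse", ["Birth 1952", "Marriage 1975"]),
    Person(2, "User", ["Marriage 1975"]),
    Person(3, "Spouse's Mother"),
]

combined = append_dataset(user_side, spouse_side)  # six persons, two pairs of duplicates
merged = merge_persons(combined, keep_id=1, drop_id=5)  # the two "User" entries
merged = merge_persons(merged, keep_id=2, drop_id=4)    # the two "Spouse" entries
```

In this sketch the combined dataset has six persons, and after the two person merges only four remain, with the lower ID# of each duplicate pair retained.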

Other factors may be involved in the merge depending on the authorship of the two datasets. If the same person created both datasets, there will be fewer of them. But if different persons originated the datasets, there will likely be many minor details that the user may wish to change so that the merged dataset reflects the user’s own style. This may involve using abbreviations for state names versus full state names, or some other Master Place List difference. It may involve reviewing Source Definitions and merging duplicate Sources. Duplicate Repositories are another factor. Certain custom tags may also need to be merged or changed. There may be other factors, but these are probably the more prominent ones.

Dataset Merge. The initial process is done as follows:

• Click on File=>Merge in the top menu.

• The currently open dataset will be shown in the Source A field. If this is one of the datasets to be merged, go on to the next item; otherwise click on the Search button, then find and select the desired dataset.

• Click on the Source B field, click on the Search button, then find and select the other dataset.

• Now select the Destination. You may append dataset B to dataset A, which physically changes dataset A and leaves dataset B unchanged. Or you may choose the reverse -- appending dataset A to dataset B, with no change to dataset A. Or you may preserve both of the original datasets by using the third option, in which dataset A is copied to a new dataset (C) and dataset B is then appended to it. This last option is by far the safest, especially if you have not first made a backup of the original datasets.

• When you have selected the desired destination, click on the OK button and the two datasets will be combined into the single dataset that you have chosen as the Destination.

• You may now open the new dataset if it is not already open.
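
The three Destination choices differ only in which dataset gets changed. The sketch below is a hypothetical model of that choice using plain Python lists; the merge_datasets function and its "A", "B", and "new" values are names I made up for illustration and are not TMG options or file operations.

```python
import copy


def merge_datasets(a, b, destination):
    """Combine two datasets (modeled as lists) according to the destination.

    "A"   -> B is appended to A; A is changed, B is not.
    "B"   -> A is appended to B; B is changed, A is not.
    "new" -> A is copied to a new dataset C and B is appended to C;
             neither original is changed.
    """
    if destination == "A":
        a.extend(copy.deepcopy(b))
        return a
    if destination == "B":
        b.extend(copy.deepcopy(a))
        return b
    if destination == "new":
        c = copy.deepcopy(a)
        c.extend(copy.deepcopy(b))
        return c
    raise ValueError("destination must be 'A', 'B', or 'new'")


dataset_a = [{"name": "User"}]
dataset_b = [{"name": "Spouse"}]

# The third option: both originals survive exactly as they were.
dataset_c = merge_datasets(dataset_a, dataset_b, "new")
```

With "new", a mistake made later in the person merges costs nothing, since dataset_a and dataset_b can simply be combined again. With "A" or "B", the only way back is a backup made beforehand.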

Person Merge. Following the above, you will want to find any duplicate persons and merge them.

• This may be easy if you know who they will be (like the example above). Open the Picklist, find one instance of a duplicate and display it in a Person View. Which one you choose will be up to you although there really isn’t a lot of difference. I usually select the instance with the lower ID#, but that is just a personal preference.

• Click on the Tools=>Merge Two People option from the top menu.

• The Merge Two People screen will display with the current person entered in the bottom or Right Person field. You may then enter the ID# of the other instance (i.e., the duplicate of the person whose ID# is shown in the bottom field).

• Click on the OK button to display the Merge Two People - Split Display screen.

• At this point, I usually click on the Flip button to swap the two sides. The person on the Left will be the ID# that is retained for the merged person. Remember that the person shown in the Person View when you started this process is initially shown on the right-hand side. If you do the merge, the person ID# shown on the right-hand side will (after the merge) be deleted. Thus TMG will not have anyone to display by that ID# and will revert to the first person ID# in your dataset. But if you click on the Flip button and swap the persons shown on the two sides, then the Person View ID# will be retained and will still display after the merge. This is not a big problem, but it might save you a step or two.

• Next, you will want to review the individual tags displayed on the two sides of the screen. For those that appear to be duplicate tags, review each one and determine whether they are exact duplicates or whether one of the tags has information and/or citations that differ from those in the other. To review any one tag, double-click on that tag to display the Merge Tag Preview screen for that tag. After reviewing that tag, click on the OK button, then move to the duplicate tag and do the same. If the tags are true duplicates, you may double-click on the "X" column for the tag that is to be dropped (that is, unmark the tag that is to be dropped). If the duplicate tag has added or different data or citations, then I would suggest leaving both tags marked and waiting to reconcile them until the merge process is complete, including any possible merging of other persons.

• You may click on the >> Combined Display << button to display the two persons in a wider format. This will show the two persons in different colors as they were shown on the split screen to allow you to know which is which.

• After you have reviewed, and unmarked/marked the desired tags, then click on the Merge button to combine the tags to be retained, drop the undesired tags, and delete the right-hand person ID#.
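
The effect of the Flip button and the "X" column in the steps above can be sketched as follows. This is a simplified model under my own assumptions, not TMG's implementation: the Tag and Side classes, the flip and merge functions, and the ID#s and tags are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    label: str
    keep: bool = True  # plays the role of the "X" column: marked tags are kept


@dataclass
class Side:
    pid: int   # hypothetical stand-in for the ID#
    tags: list


def flip(left, right):
    """Swap the two sides, as the Flip button does on the split display."""
    return right, left


def merge(left, right):
    """Return (retained ID, deleted ID, labels of the tags that survive).

    The left-hand ID is retained and the right-hand ID is deleted.
    Only tags still marked to keep are carried into the merged person.
    """
    kept = [t.label for t in left.tags + right.tags if t.keep]
    return left.pid, right.pid, kept


# The Person View was on ID 12, so 12 starts on the right; 87 is its duplicate.
left = Side(87, [Tag("Birth 1901"), Tag("Death 1970")])
right = Side(12, [Tag("Birth 1901"), Tag("Marriage 1925")])

left, right = flip(left, right)   # now 12 is on the left and will be retained
right.tags[0].keep = False        # unmark 87's Birth tag, an exact duplicate

survivor, deleted, tags = merge(left, right)
```

Here ID 12 survives with one Birth tag, the Marriage tag, and the Death tag, and ID 87 is the one deleted. Without the flip, ID 87 would have been retained and ID 12 deleted.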

Editing a Merged Person. If the merged person is not displayed, select the Person View of that person and review any remaining duplicate tags. If there are any, you may want to "merge" or combine these duplicates.

• Select one of the duplicate tags to retain if one is to be deleted -- if one is marked Primary, I usually select it. I may change the Primary marker and then select it. Review both tags and copy/paste to the tag to be retained any data and/or citations that you wish to keep.

• Check that there are not duplicate Name tags and ensure that the Subject name is the one desired as Primary.

• Check that any non-Primary parents (parent tags shown in the Tag Box) are not duplicates of the Primary parents (shown in the Father/Mother fields). If there are duplicates, review each one, copy/paste to the one to be retained, and delete the other. If you decide to delete the present Primary parent, I suggest that you highlight the non-Primary parent and click on the asterisk (*) icon to swap the two, and then delete the tag that is now non-Primary. You may instead delete the Primary first and then mark the non-Primary as Primary, but TMG will display a warning, which means an extra keystroke or mouse click.

• If you have duplicate children, I suggest that you go to the Person View of each child and review, edit, and delete the parent there as above. This will take care of the duplicate children on the parent’s Person View.

Final Clean-up. After all the above has been done, you are essentially finished with the dataset merge. As mentioned above, there may be other areas that you will want to review and change. This includes the Master Place List, the Master Repository List, the Master Source List, the Exhibit Log, and/or the Research Log. Note that you may have eliminated duplicate research tasks during the person merge, but you may want to edit the remaining tasks to ensure that each is as you wish it.

Be careful about merging Sources. This is particularly so with Sources for the same document where one Source Definition is more generic than the other. The Citation Detail for the Source Definition to be merged and deleted should be examined and updated so that the resulting citation is correctly presented. In some cases, you may modify the present Citation Detail and delete information that is in the "to-be-retained" Source Definition. But in most cases, you will want to add information to the Citation Detail of tags for the "to-be-deleted" Source. Then when you merge the Sources and delete the one, the resulting citations will be as you want them.
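
To show why the Citation Detail needs attention before Sources are merged, here is a small sketch. It is a hypothetical model only: the citation dictionaries, the source numbers, the census wording, and the merge_sources function are all made up for illustration and do not reflect TMG's data files or its Source merge feature.

```python
def merge_sources(citations, keep_id, drop_id, detail_prefix=""):
    """Repoint citations from drop_id to keep_id.

    detail_prefix holds whatever the to-be-deleted Source Definition said
    that the retained one does not; it is added to the front of each
    affected Citation Detail so that information is not lost.
    """
    result = []
    for c in citations:
        if c["source"] == drop_id:
            parts = [p for p in (detail_prefix, c["detail"]) if p]
            c = {**c, "source": keep_id, "detail": ", ".join(parts)}
        result.append(c)
    return result


# Source 10 (retained) is generic:  "1900 U.S. Census"
# Source 11 (to be deleted) is specific: "1900 U.S. Census, Cook Co., Illinois"
citations = [
    {"tag": "Census 1900 (John)", "source": 10, "detail": "Cook Co., Illinois, ED 12, sheet 4"},
    {"tag": "Census 1900 (Mary)", "source": 11, "detail": "ED 12, sheet 4"},
]

merged = merge_sources(citations, keep_id=10, drop_id=11,
                       detail_prefix="Cook Co., Illinois")
```

After this, both citations point at the generic Source and both details read "Cook Co., Illinois, ED 12, sheet 4". Had the county not been added first, Mary's citation would have lost it when the more specific Source was deleted.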

The Master Place List should be reviewed. Edit any duplicate place entries so that they read exactly the same. After you have made the corrections, click on File=>Maintenance=>Optimize in the top menu to do the final clean-up of the duplications.
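
The place clean-up amounts to making equivalent entries identical and then collapsing them. The sketch below illustrates that idea with state abbreviations. The STATE_NAMES table, the three-part place tuples, and the normalize and dedupe functions are assumptions for the example; the dedupe step is only a loose stand-in for what happens after the entries match, not a description of what Optimize does internally.

```python
# Hypothetical lookup table; extend it with whatever abbreviations your data uses.
STATE_NAMES = {"IL": "Illinois", "OH": "Ohio"}


def normalize(place):
    """Spell out the state so that both styles of entry read the same."""
    city, county, state = place
    return (city, county, STATE_NAMES.get(state, state))


def dedupe(places):
    """Normalize every place, then keep one copy of each, in first-seen order."""
    seen = {}
    for place in places:
        seen.setdefault(normalize(place), None)
    return list(seen)


places = [
    ("Chicago", "Cook", "IL"),         # from the dataset that abbreviates states
    ("Chicago", "Cook", "Illinois"),   # from the dataset that spells them out
    ("Dayton", "Montgomery", "OH"),
]

cleaned = dedupe(places)
```

The two Chicago entries collapse into one once they are spelled the same way. The same approach works in the other direction if you prefer abbreviations: just reverse the lookup table.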
