Explanation-based Failure Recovery

This data set is used for studying name disambiguation in digital library. It contains 110 author names and their disambiguation results (ground truth). Each author name corresponds to a raw file in the "raw-data" folder and an answer file (ground truth) in the "Answer" folder. - Raw file: the raw file is formatted as a XML file. In the XML file, the author name is associated with a number of publications. An example of a publication is as follow: " Explanation-based Failure Recovery 1987 Ajay Gupta AAAI 13048 0 null " where denotes the title of the publication; <year> denotes the publication year; <jconf> denotes the publication venue; <id> denotes the publication id; <label> denotes the labeled person, e.g., all publications with "<label>0</label>" can be considered as published by the same person; <organization> denotes the affiliation of the author(s). - Answer file: the answer file is the ground truth. It is actually extracted from the raw-file by viewing publications with the same "<label>0</label>" as a person. The format is in plain text. The following is an example: " #Ajay Gupta #1:13048 388794 596099 1265282 1179332 675629 39153 258611 #2:988870 1490190 #3:1393934 #4:1398544 #5:1739014 #6:1671104 515636 1678096 #7:1126381 1205032 275987 277587 276300 1549674 1034401 #8:600181 846439 149270 175996 264268 264291 299548 1384744 300057 302056 545651 1212517 #9:1316053 " where the first line denotes the author name and each of the following line indicates a disambiguate person. For example the first line indicates that an author published 8 papers. The corresponding IDs of those papers are respectively 13048, 388794, 596099, 1265282, 1179332, 675629, 39153, 258611.