Self-citation and contribution indices are needed in academia, but they are hard to implement
So far, Google Scholar has been one of the “good” sources for showcasing our scientific impact and identifying scientists who contribute to the scientific community. Google provides three indices for scoring your contribution: total citations, h-index, and i10-index. As argued before in Nature, these indices are limited for understanding real impact. Self-citation is one of the problems with them. It is not entirely a bad thing, since you need to cite your own earlier work as prior work or literature. However, it can be exploited to pump up personal citation scores. (This has been an issue for journals too, where low-quality journals cross-cite one another to increase impact factors.)

My suggestion for understanding a scientist's contribution is to include self-citation and contribution indices in the equation. These are certainly hard to implement, but they can give a better representation of personal contributions to the community (self-citation score) and to the paper (contribution score). I have seen similar proposals in the literature for a contribution index and a self-citation index [1][2].

For starters, I have a simple proposition. Let’s look at the following example. Adam’s Google Scholar scores are as follows:
Citations: 55
h-index: 3 (with publications sorted from most to least cited, the 3rd still has at least 3 citations)
i10-index: 1 (publications with at least 10 citations)
These tell you that Adam has 55 citations in total for his publications as of 2018 (including self-citations), and that the quantitative impact of his publications, as measured by the h-index and i10-index, has been growing.
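As a minimal sketch of how these three numbers are derived, the snippet below computes them from a list of per-paper citation counts. The counts in `adam_citations` are invented for illustration (the post only gives the totals) and were chosen so that they reproduce 55 citations, an h-index of 3, and an i10-index of 1.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical per-paper citation counts, not taken from any real profile.
adam_citations = [45, 4, 3, 2, 1, 0]

print("Citations:", sum(adam_citations))
print("h-index:", h_index(adam_citations))
print("i10-index:", i10_index(adam_citations))
```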
Self-citation score. My proposition is to provide an adjusted total citation score by weighting self-citations. Let’s say 6 of Adam’s papers were self-cited 12 times in total. What we need here is to categorize the self-citations. For me, the major categories are follow-up papers (which need to cite prior work), literature references (citing as a literature source), and unnecessary/excessive citations. The weights are assigned subjectively:
Follow-up research: 8 of Adam’s 12 self-citations are in follow-up papers. Since these are necessary, the weight could be high (w = 0.8). To identify connected papers and thus recognize follow-up papers, one suggestion is to use the IRB or grant ID as a key identifier.
Literature references: 3 of Adam’s 12 self-citations refer to relevant literature he presented in prior publications (w = 0.6). To identify this connection, MeSH keyword matching could be used. To be more precise, the sentence or paragraph containing the citation could be parsed and matched against words in the source publication (NLP methods are required).
Unnecessary/excessive citations: Adam has 1 unnecessary self-citation, meaning he cited one of his own works for the sake of an extra citation when it was not really necessary (w = 0.01). This is the hardest category to identify and may require expert review of both publications.
Adjusted total citations = ((55 - 12) * 1) + (8 * 0.8) + (3 * 0.6) + (1 * 0.01) = 51.21
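The adjusted total above can be reproduced with a few lines of code. This is only a sketch of the arithmetic: the function name, the category keys, and the dictionary layout are my own assumptions, while the counts and weights are the subjective ones from the example.

```python
def adjusted_citations(total_citations, self_citations, weights):
    """Keep external citations at weight 1 and down-weight self-citations by category."""
    total_self = sum(self_citations.values())
    external = total_citations - total_self
    weighted_self = sum(
        count * weights[category] for category, count in self_citations.items()
    )
    return external + weighted_self


# Counts and weights from Adam's example; the key names are illustrative.
self_citations = {"follow_up": 8, "literature": 3, "unnecessary": 1}
weights = {"follow_up": 0.8, "literature": 0.6, "unnecessary": 0.01}

adjusted = adjusted_citations(55, self_citations, weights)
print(f"Adjusted total citations: {adjusted:.2f}")  # prints 51.21
```

Keeping the weights in a separate dictionary makes it easy to test how sensitive the adjusted score is to each subjective choice.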
Contribution score. This score adjusts for personal contribution to each paper. Many journals already do this verbally by outlining author contributions in a dedicated section, yet it is not quantified or scored. The following is a simple adjustment based on the share of publications in which Adam is first author.
Publication type       First authorship (F)   Total publications (T)   Contribution score (= F/T)
In indexed journals    1                      3                        0.33
In proceedings         8                      9                        0.89
In book chapters       4                      4                        1.00
In books               1                      2                        0.50
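The contribution score in the table is simply first-authored publications divided by total publications for each type. A minimal sketch, where the `publications` dictionary and the function name are assumptions made for illustration:

```python
def contribution_score(first_authored, total):
    """Share of publications of one type in which the scientist is first author (F/T)."""
    if total <= 0:
        raise ValueError("total publications must be positive")
    return first_authored / total


# (first-authored, total) pairs taken from Adam's table.
publications = {
    "indexed journals": (1, 3),
    "proceedings": (8, 9),
    "book chapters": (4, 4),
    "books": (1, 2),
}

scores = {
    pub_type: contribution_score(first, total)
    for pub_type, (first, total) in publications.items()
}

for pub_type, score in scores.items():
    print(f"{pub_type}: {score:.2f}")
```

Rounded to two decimals this gives 0.33, 0.89, 1.00, and 0.50; note that 8/9 is 0.888..., which rounds to 0.89.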
This is the simplest form. In a more advanced scoring, weights could be assigned to publication types (e.g., journal, proceedings), and scientific domains (e.g., healthcare, informatics, management) could be weighted based on the volume and density of publications and journals. Adjustments based on the total number of authors on each paper may be necessary (e.g., distinguishing first authorship on a 2-author paper from first authorship on a 10-author paper). Finally, other authorship ranks (e.g., second, third) could be weighted.
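One possible way to combine these adjustments is sketched below. It is not a worked-out proposal. For author rank and author count it borrows harmonic counting, in which the author at rank r on an n-author paper receives (1/r) divided by the sum of 1/k for k from 1 to n; this is a swapped-in scheme, not something defined in this post. The publication-type weights in `TYPE_WEIGHTS` are purely hypothetical placeholders, and domain weighting is left out.

```python
def harmonic_credit(rank, n_authors):
    """Harmonic counting: credit for the author at `rank` on an n-author paper."""
    if not 1 <= rank <= n_authors:
        raise ValueError("rank must be between 1 and n_authors")
    return (1 / rank) / sum(1 / k for k in range(1, n_authors + 1))


# Hypothetical publication-type weights, chosen only for illustration.
TYPE_WEIGHTS = {"journal": 1.0, "proceedings": 0.7, "book_chapter": 0.5, "book": 0.8}


def weighted_contribution(papers):
    """Type-weighted average of authorship credit over (type, rank, n_authors) tuples."""
    earned = sum(
        TYPE_WEIGHTS[pub_type] * harmonic_credit(rank, n_authors)
        for pub_type, rank, n_authors in papers
    )
    possible = sum(TYPE_WEIGHTS[pub_type] for pub_type, _, _ in papers)
    return earned / possible


# Hypothetical papers: (publication type, author rank, number of authors).
papers = [("journal", 1, 2), ("journal", 3, 10), ("proceedings", 1, 4)]
print(f"Weighted contribution: {weighted_contribution(papers):.2f}")
```

Under this scheme, first authorship on a 2-author paper earns a larger share than first authorship on a 10-author paper, because the credit is split among fewer people. Whether that is the right direction for the adjustment is itself a subjective choice.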
Thanks for reading.