Data

The CWTS Leiden Ranking 2023 is based on bibliographic data from the Web of Science database produced by Clarivate. Below we discuss the Web of Science data that is used in the Leiden Ranking. We also discuss the enrichments made to this data by CWTS.

Web of Science

The Web of Science database consists of a number of citation indices. The Leiden Ranking uses data from the Science Citation Index Expanded, the Social Sciences Citation Index, and the Arts & Humanities Citation Index. The Leiden Ranking is based on Web of Science data because Web of Science offers good coverage of the international scientific literature and generally provides high-quality data.

The Leiden Ranking does not take into account conference proceedings publications and book publications. This is an important limitation in certain research fields, especially in computer science, engineering, and the social sciences and humanities.
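The selection described above can be pictured as a filter on citation index membership. The following is a minimal sketch, not the actual CWTS selection procedure: the record layout (`id`, `indices`) and the index abbreviations are assumptions made for illustration, and the sketch ignores any further criteria, such as document type, that may apply in practice.

```python
# Assumed abbreviations for the three citation indices named in the text.
INCLUDED_INDICES = {"SCIE", "SSCI", "AHCI"}


def select_records(records):
    """Keep records that appear in at least one included citation index."""
    return [r for r in records if INCLUDED_INDICES & set(r["indices"])]


# Hypothetical records; the field names are illustrative only.
records = [
    {"id": "p1", "indices": ["SCIE"]},
    {"id": "p2", "indices": ["CPCI-S"]},        # conference proceedings index
    {"id": "p3", "indices": ["SSCI", "AHCI"]},
    {"id": "p4", "indices": ["BKCI-S"]},        # book index
]

selected_ids = [r["id"] for r in select_records(records)]
print(selected_ids)
```

In this sketch, records that appear only in a conference proceedings or book index are dropped, which mirrors the limitation noted above for fields such as computer science and the humanities.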

Enriched data

CWTS enriches Web of Science data in a number of ways. First, CWTS performs its own citation matching (i.e., matching of cited references to the publications they refer to). Furthermore, to calculate the various indicators included in the Leiden Ranking, CWTS identifies publications by industrial organizations in Web of Science, geocodes the addresses listed in publications, assigns open access labels (gold, hybrid, bronze, green) to publications, disambiguates authors, and attempts to determine their gender. Most importantly, CWTS puts considerable effort into assigning publications to universities in a consistent and accurate way. This is by no means a trivial task: universities may be referred to using many different name variants, and the definition and delimitation of universities are not obvious at all. The methodology employed in the Leiden Ranking to assign publications to universities is discussed separately in the Leiden Ranking documentation.
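To give a feel for the name variant problem, the sketch below maps a few spellings of one university to a single canonical name. It is a toy illustration under stated assumptions and not the CWTS methodology: the lookup table, the function names `normalize` and `assign_university`, and the example variants are all hypothetical, and the sketch does not address how universities are defined and delimited.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical lookup table: normalized name variant -> canonical name.
# The entries are illustrative and are not taken from CWTS data.
CANONICAL_NAMES = {
    "leiden university": "Leiden University",
    "universiteit leiden": "Leiden University",
    "leiden univ": "Leiden University",
    "univ leiden": "Leiden University",
}


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def assign_university(address_name: str) -> Optional[str]:
    """Return the canonical name for a known variant, or None if unknown."""
    return CANONICAL_NAMES.get(normalize(address_name))


print(assign_university("Univ. Leiden"))
print(assign_university("UNIVERSITEIT  LEIDEN"))
print(assign_university("Unknown Institute"))
```

An exact lookup like this only handles variants that are already listed. Unlisted spellings, mergers, and affiliated institutes such as hospitals would still need to be resolved some other way, which is part of why the assignment is far from trivial.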

More information

More information on the citation matching that is performed by CWTS is provided in a paper by Olensky, Schmidt, and Van Eck (2016). For more information on the geocoding of addresses, we refer to a paper by Waltman, Tijssen, and Van Eck (2011). The author disambiguation algorithm used by CWTS is documented in a paper by Caron and Van Eck (2014).

  • Caron, E., & Van Eck, N.J. (2014). Large scale author name disambiguation using rule-based scoring and clustering. In E. Noyons (Ed.), Proceedings of the 19th International Conference on Science and Technology Indicators (pp. 79–86).
  • Olensky, M., Schmidt, M., & Van Eck, N.J. (2016). Evaluation of the citation matching algorithms of CWTS and iFQ in comparison to Web of Science. Journal of the Association for Information Science and Technology, 67(10), 2550–2564.
  • Waltman, L., Tijssen, R.J.W., & Van Eck, N.J. (2011). Globalisation of science in kilometres. Journal of Informetrics, 5(4), 574–582.