Data collection

The CWTS Leiden Ranking 2015 ranks the 750 universities worldwide with the largest publication output in international scientific journals in the period 2010–2013. The ranking is based on data from the Web of Science database. A sophisticated data collection methodology is employed to assign publications to universities.

Web of Science

The Leiden Ranking is based exclusively on bibliographic data from the Web of Science database produced by Thomson Reuters. The ranking uses data from the Science Citation Index Expanded, the Social Sciences Citation Index, and the Arts & Humanities Citation Index. The Leiden Ranking is based on Web of Science data because Web of Science offers a good coverage of the international scientific literature and generally provides high quality data.

The Leiden Ranking does not take into account conference proceedings publications and book publications. This is an important limitation in certain research fields, especially in computer science, engineering, and the social sciences and humanities.


Enriched data

CWTS enriches Web of Science data in a number of ways. First of all, CWTS performs its own citation matching (i.e., matching of cited references to the publications they refer to). Furthermore, in order to calculate the more advanced collaboration indicators included in the Leiden Ranking, CWTS performs geocoding of the addresses listed in publications in Web of Science and CWTS identifies addresses belonging to the business sector. Most importantly, CWTS puts a lot of effort in assigning publications to universities in a consistent and accurate way. This is by no means a trivial issue. Universities may be referred to using many different name variants, and the definition and delimitation of universities is not always obvious. The methodology employed in the Leiden Ranking to assign publications to universities is discussed in detail below.
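The name-variant problem can be made concrete with a small sketch. The mapping and helper functions below are hypothetical illustrations, not CWTS's actual data or code; in practice the variant lists are far more extensive and manually curated.

```python
# Illustrative sketch of collapsing university name variants to a single
# canonical name. The VARIANTS mapping is hypothetical example data.

VARIANTS = {
    "univ bordeaux": "University of Bordeaux",
    "universite de bordeaux": "University of Bordeaux",
    "univ bordeaux 1": "University of Bordeaux",
}

def normalize(raw_affiliation):
    """Lowercase and strip punctuation so variants collapse to one key."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in raw_affiliation.lower())
    return " ".join(cleaned.split())

def assign_university(raw_affiliation):
    """Map a raw address string to a canonical university name, if known."""
    return VARIANTS.get(normalize(raw_affiliation))

assign_university("Univ. Bordeaux-1")  # -> "University of Bordeaux"
```

Real address matching must also handle departments, misspellings, and institutions whose names overlap, which is why the manual checking described below remains necessary.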

Identification of universities

There are no internationally established criteria that define what counts as a university, which makes identifying universities a challenge. Typically, a university combines education and research tasks with the authority to grant doctorates. However, these shared characteristics do not make universities homogeneous entities that can be compared internationally on every aspect. Because of its focus on scientific research, the Leiden Ranking presents a list of institutions that have a high degree of research intensity in common. Nevertheless, the ranking scores of each institution should be evaluated in the context of its particular mission and responsibilities, which are strongly linked to national and regional academic systems. Academic systems, and the role of universities within them, differ substantially from one another and are constantly changing. Inevitably, the outcomes of the Leiden Ranking reflect these differences and changes.

The international variety in the organization of academic systems also makes it difficult to identify the proper unit of analysis. Many countries have collegiate universities, university systems, or federal universities. Instead of applying formal criteria, we followed, where possible, common practice based on the way these institutions are perceived locally. Consequently, we treated the University of Cambridge and the University of Oxford as single entities, but in the case of the University of London we distinguished between the constituent colleges. For the United States, university systems (e.g., the University of California) were split up into separate universities. The higher education sector in France, as in many other countries, has gone through several reorganizations in recent years. Many French institutions of higher education have been grouped together in Pôles de Recherche et d'Enseignement Supérieur (PRES), in the more recent Communautés d'Universités et Etablissements (COMUEs), or in consortia. In most cases, the Leiden Ranking still distinguishes between the constituent institutions, but in particular cases of very tight integration, consortia were treated as if they were a single university (e.g., Grenoble INP).

Publications are assigned to universities based on their most recent configuration. Changes in the organizational structures of universities up to 2014 have been taken into account. For example, in the Leiden Ranking 2015, the University of Bordeaux encompasses all publications previously assigned to the University of Bordeaux I, University of Bordeaux Segalen II, and Montesquieu University Bordeaux IV.

Affiliated institutions

A key challenge in compiling a university ranking is the handling of publications originating from research institutes and hospitals associated with universities. Academic systems vary widely in the types of relations that universities maintain with these affiliated institutions. These relationships are usually shaped by local regulations and practices, which affects the comparability of universities on a global scale. As there is no easy solution for this issue, it is important that producers of university rankings employ a transparent methodology in their treatment of affiliated institutions.

CWTS distinguishes three different types of affiliated institutions:

  1. component
  2. joint research facility or organization
  3. associated organization

In the case of components, the affiliated institution is actually part of a university, or is so tightly integrated with it or with one of its faculties that the two can be considered a single entity. The University Medical Centres in the Netherlands, which combine the medical faculties and the university hospitals, are examples of components. In these cases, all teaching and research tasks in the field of medicine, traditionally the responsibility of the universities, have been delegated to these medical centres.

Joint research facilities or organizations are the same as components except for the fact that they are administered by more than one organization. The Brighton & Sussex Medical School (the joint medical faculty of the University of Brighton and the University of Sussex) and Charité (the medical school for both the Humboldt University and Freie Universität Berlin) are examples of this type of affiliated institution.

The third type of affiliated institution is the associated organization which is more loosely connected to a university. This organization is an autonomous institution that collaborates with one or more universities based on a joint purpose but at the same time has separate missions and tasks. In many countries, hospitals that operate as teaching or university hospitals fall into this category. Massachusetts General Hospital, one of the teaching hospitals of Harvard Medical School, is an example of an associated organization.

The treatment of university hospitals is of substantial consequence given that medical research has a strong presence in the Web of Science. The importance of associated organizations is growing as universities present themselves more and more frequently as network organizations. As a result, researchers formally employed by the university but working at associated organizations may not always mention the university in publications. On the other hand, as universities become increasingly aware of the significance of their visibility in research publications, they actively exert pressure on researchers to mention their affiliation with the university in their publications.

In the Leiden Ranking 2015, publications from affiliated institutions of the first two types are considered output of the university. A different procedure is followed for publications from associated organizations. A distinction is made between publications from associated organizations that also mention the university and publications that do not contain such a university affiliation. In the latter case, the publications are not counted as originating from the university. If a publication contains affiliations from a particular university as well as from its associated organization(s), both types of affiliations are credited to that university when its contribution to the publication is determined using the fractional counting method.
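This rule can be sketched as follows. The function and institution names are hypothetical illustrations under assumed data structures, not the actual CWTS implementation:

```python
# Illustrative sketch: addresses of an associated organization count toward
# a university only if the publication also carries an address of the
# university itself.

def university_addresses(addresses, university, associated_orgs):
    """Return the addresses of one publication credited to `university`.

    addresses: list of institutions the publication's addresses were
    matched to; associated_orgs: set of that university's associated
    organizations.
    """
    has_university_address = university in addresses
    credited = []
    for addr in addresses:
        if addr == university:
            credited.append(addr)
        elif addr in associated_orgs and has_university_address:
            # Associated-organization address counts only when the
            # university itself is also mentioned on the publication.
            credited.append(addr)
    return credited
```

Under this sketch, a publication listing both Harvard Medical School and Massachusetts General Hospital would have both addresses credited to Harvard, while a publication listing only the hospital would yield no credited addresses.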

Selection of universities included in the ranking

The 750 universities included in the Leiden Ranking 2015 were selected based on their publication output in the period 2010–2013. Only so-called core publications were counted, which are publications in international scientific journals. Also, only research articles and review articles were taken into account. Other types of publications were not considered. Furthermore, collaborative publications were counted fractionally. For instance, if a publication includes three addresses of which two belong to a particular university, the publication was counted with a weight of 2 / 3 = 0.67 for that university. About 1100 fractionally counted publications were required for a university to be included in the Leiden Ranking 2015.
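The fractional counting described above can be sketched as follows. This is an illustration under an assumed input format, not the actual CWTS implementation:

```python
# Illustrative sketch of fractional counting of publications. Each
# publication is represented as a list of its addresses, with each address
# given as the university it was assigned to (None if unassigned).

from collections import defaultdict

def fractional_counts(publications):
    """Return each university's fractionally counted publication total."""
    totals = defaultdict(float)
    for addresses in publications:
        n = len(addresses)
        for univ in set(addresses):
            if univ is None:
                continue
            # A university with k of a publication's n addresses
            # receives a weight of k / n for that publication.
            totals[univ] += addresses.count(univ) / n
    return dict(totals)

# One publication with three addresses, two belonging to "Univ A":
fractional_counts([["Univ A", "Univ A", "Univ B"]])
# -> Univ A receives 2/3 (about 0.67), Univ B receives 1/3
```

Summing these weights over all core publications in 2010-2013 gives the total on which the inclusion threshold of about 1100 is based.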

It is important to note that universities do not need to apply to be included in the Leiden Ranking. The universities included in the Leiden Ranking are selected by CWTS according to the procedure described above. Universities do not need to provide any input themselves.

Data quality

The assignment of publications to universities is not free of errors, and it is important to emphasize that, in general, universities do not verify and approve the results of the Leiden Ranking data collection methodology. Two types of errors are possible. On the one hand, there may be false positives: publications that have been assigned to a university even though they do not belong to it. On the other hand, there may be false negatives: publications that have not been assigned to a university even though they do belong to it. The data collection methodology of the Leiden Ranking can be expected to yield substantially more false negatives than false positives. In practice, it is infeasible to manually check all addresses occurring in the Web of Science, and as a result many of the 5% least frequently occurring addresses have not been manually checked. This 5% can be considered a reasonable upper bound on the error rate, since the majority of these infrequent addresses most likely do not belong to universities.