cjeu-py
The Court of Justice of the European Union has produced thousands of decisions since its creation. Understanding how it builds legal doctrine – which cases it cites, how it treats its own precedent, which areas of law are connected and which remain siloed – requires processing a corpus too large for manual analysis. cjeu-py is a Python toolkit that transforms CJEU case law into research-ready datasets, from raw SPARQL queries to interactive citation networks.
The tool queries the EU Publications Office's CELLAR endpoint directly – the authoritative source for EU legal data – downloads full judgment texts from EUR-Lex, extracts citations using 14 regex patterns plus the Court's own typographic conventions, and optionally classifies each citation along four dimensions using LLMs. The result is a complete data pipeline that takes a researcher from zero to interactive network visualisation in a few commands.
I. The Pipeline
cjeu-py organises CJEU research into seven stages. Each stage produces inspectable intermediate output – Parquet tables, JSONL files, HTML visualisations – and can be run independently. The pipeline is resumable: interrupted downloads pick up where they left off, and classification checkpoints prevent re-processing.
Figure 1: The cjeu-py data pipeline. Seven stages from SPARQL query to interactive network.
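The resumable behaviour described above can be illustrated with a small checkpointing loop. This is a minimal sketch of the general technique, not the tool's own code: the function names, the JSONL layout and the `id` field are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_done(checkpoint: Path) -> set[str]:
    """Return the IDs already recorded in the JSONL checkpoint file."""
    if not checkpoint.exists():
        return set()
    with checkpoint.open(encoding="utf-8") as fh:
        return {json.loads(line)["id"] for line in fh if line.strip()}


def run_resumable(
    ids: Iterable[str],
    work: Callable[[str], dict],
    checkpoint: Path,
) -> int:
    """Process each ID at most once, appending results as they arrive.

    Returns the number of items processed in this run.
    """
    done = load_done(checkpoint)
    processed = 0
    with checkpoint.open("a", encoding="utf-8") as fh:
        for item_id in ids:
            if item_id in done:
                continue  # finished in an earlier run, skip it
            result = work(item_id)
            fh.write(json.dumps({"id": item_id, **result}) + "\n")
            fh.flush()  # push progress to disk in case the run is interrupted
            done.add(item_id)
            processed += 1
    return processed
```

Because each result is appended as soon as it is produced, a second call with the same checkpoint file only does the work that is still outstanding.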
The design philosophy is pragmatic. Data comes from official sources (CELLAR SPARQL, EUR-Lex REST API), not scraping. Downloads are cached to disk and resumable. The tool is pip-installable (pip install cjeu-py) and runs entirely from the command line. LLM classification is optional – the extraction and network stages work without it.
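To give a sense of what querying CELLAR involves, here is a hedged sketch that only builds a query and a request URL, without sending anything. The endpoint address, the CDM property names and the CELEX prefix filter are assumptions for illustration and are not taken from cjeu-py; they should be checked against the Publications Office documentation before use.

```python
from urllib.parse import urlencode

# Assumed address of the Publications Office's public SPARQL endpoint.
CELLAR_ENDPOINT = "https://publications.europa.eu/webapi/rdf/sparql"


def build_query(year: int, limit: int = 10) -> str:
    """Build an illustrative query for judgments in cases registered in `year`.

    Assumption: Court of Justice judgments carry CELEX numbers of the form
    6<year>CJ<number>, and the two CDM properties below hold the CELEX
    number and the document date.
    """
    prefix = f"6{year}CJ"
    return f"""
PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
SELECT ?work ?celex ?date WHERE {{
  ?work cdm:resource_legal_id_celex ?celex ;
        cdm:work_date_document ?date .
  FILTER(STRSTARTS(STR(?celex), "{prefix}"))
}}
ORDER BY ?celex
LIMIT {int(limit)}
""".strip()


def request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for SPARQL JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{CELLAR_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(request_url(build_query(2005, limit=5)))
```

Keeping query construction separate from the HTTP call makes the query itself easy to inspect and to paste into a SPARQL console.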
II. What It Collects
For the Grand Chamber alone – the CJEU's most authoritative formation – the tool collects 1,020 decisions spanning 2004 to 2025. Each decision carries rich metadata: CELEX and ECLI identifiers, date, court formation, judge-rapporteur, advocate general, procedure type, subject matter, and the full text of the judgment.
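The two identifiers mentioned above encode useful information in their structure. The sketch below splits a case-law CELEX number and an EU ECLI into their parts. It is a simplified illustration: the class and function names are hypothetical, and the patterns cover only the common four-field forms, not every variant found in EUR-Lex.

```python
import re
from typing import NamedTuple, Optional


class Celex(NamedTuple):
    sector: str    # "6" is the case-law sector
    year: int      # year the case was registered
    doc_type: str  # e.g. "CJ" (judgment), "CC" (AG opinion)
    number: int


class Ecli(NamedTuple):
    country: str   # "EU" for the Union courts
    court: str     # "C" Court of Justice, "T" General Court, "F" Civil Service Tribunal
    year: int      # year of the decision
    number: int


CELEX_RE = re.compile(r"^(6)(\d{4})([A-Z]{2})(\d{4})$")
ECLI_RE = re.compile(r"^ECLI:(EU):([CTF]):(\d{4}):(\d+)$")


def parse_celex(value: str) -> Optional[Celex]:
    """Parse a case-law CELEX number, or return None if it does not fit."""
    m = CELEX_RE.match(value.strip())
    if not m:
        return None
    sector, year, doc_type, number = m.groups()
    return Celex(sector, int(year), doc_type, int(number))


def parse_ecli(value: str) -> Optional[Ecli]:
    """Parse an EU-court ECLI, or return None if it does not fit."""
    m = ECLI_RE.match(value.strip())
    if not m:
        return None
    country, court, year, number = m.groups()
    return Ecli(country, court, int(year), int(number))
```

For Kadi, `parse_celex("62005CJ0402")` yields the registration year 2005, while `parse_ecli("ECLI:EU:C:2008:461")` yields the decision year 2008: the two identifiers carry different dates.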
Figure 2: Grand Chamber decisions by year, 2004–2025. Colour indicates the dominant procedure type each year.
Preliminary references dominate the Grand Chamber docket (64%), followed by annulment actions (10%) and consultations (8%). But the procedure mix tells only part of the story. The subject-matter distribution reveals which areas of EU law the Grand Chamber considers important enough for its attention.
Figure 3: Top 10 subject-matter categories across Grand Chamber decisions. Cases may be assigned multiple subjects.
III. Citation Extraction
The heart of the pipeline is citation extraction. The tool identifies case references in judgment text using 14 regex patterns targeting different citation formats: ECLI identifiers, case numbers (Case C-xxx/xx), European Court Reports references, joined cases, and paragraph pinpoints. A dual fallback mechanism supplements regex with the Court's own typographic convention – cases cited in italics – and a party-name gazetteer built from all collected case names.
Each citation is anchored to its source paragraph, preserving the context needed for downstream classification. Across the Grand Chamber corpus, the extractor identifies 12,673 case-to-case citations – an average of roughly 12 citations per decision, though the distribution is heavily right-skewed.
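A reduced version of this approach can be sketched with two patterns instead of fourteen. The patterns and names below are illustrative assumptions and are not the tool's own; in particular, the sketch assumes that judgment text may use a non-breaking hyphen (U+2011) in case numbers and that ECLIs appear in the short form without the `ECLI:` prefix.

```python
import re
from dataclasses import dataclass

# Case numbers such as C-402/05, with an ordinary or non-breaking hyphen.
CASE_NUMBER = re.compile(r"\b([CTF])[-\u2011](\d{1,4})/(\d{2})\b")
# Short-form ECLIs such as EU:C:2008:461.
ECLI = re.compile(r"\bEU:[CTF]:\d{4}:\d+\b")


@dataclass(frozen=True)
class Citation:
    paragraph: int  # 1-based index of the source paragraph
    kind: str       # "case_number" or "ecli"
    ref: str        # normalised reference


def extract(paragraphs: list[str]) -> list[Citation]:
    """Find case-number and ECLI references, keeping the paragraph anchor."""
    found: list[Citation] = []
    for idx, text in enumerate(paragraphs, start=1):
        for m in CASE_NUMBER.finditer(text):
            court, num, year = m.groups()
            # Normalise to an ASCII hyphen and drop leading zeros.
            found.append(Citation(idx, "case_number", f"{court}-{int(num)}/{year}"))
        for m in ECLI.finditer(text):
            found.append(Citation(idx, "ecli", m.group(0)))
    return found


if __name__ == "__main__":
    sample = [
        "See Kadi (C\u2011402/05 P and C\u2011415/05 P, EU:C:2008:461, paragraph 281).",
        "This paragraph cites nothing.",
        "See also Case C-617/10, EU:C:2013:105.",
    ]
    for citation in extract(sample):
        print(citation)
```

Keeping the paragraph index on every match is what allows a later stage to retrieve the surrounding text for classification.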
IV. How the Court Uses Precedent
Raw citation counts reveal who cites whom, but not how. The optional classification stage uses LLMs to categorise each citation along four dimensions drawn from the framework in Marc Jacob's Precedents and Case-Based Reasoning in the European Court of Justice (Cambridge, 2014).
Figure 4: Citation classification across three of the four dimensions. Based on 2,855 classified citations from AG opinions and judgments.
Precision captures depth of engagement. Nearly 79% of citations involve substantive engagement with the cited case – the Court discusses its reasoning, applies its test, or extends its logic. Only 8% are string citations (bare references with no discussion). This matters: it means the CJEU's citation practice is not decorative. When it cites a case, it means it.
Use captures the functional role. Over half (56%) of citations establish or invoke a legal principle. Interpretation (15%) and application of legal tests (8%) follow. The Court cites its own precedent primarily to anchor doctrinal propositions, not to resolve factual analogies.
Treatment is the most revealing dimension. Some 76% of citations follow the cited case, and 19% are neutral references. Explicit departures occur in only 0.8% of citations. The Court's self-image as a stable, coherent legal system is borne out by the data – or at least, the Court is careful to present it that way.
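Percentages like these are simple tallies over the classified records. The sketch below shows the arithmetic on toy data; the field names and label values are hypothetical and do not reproduce the tool's schema or its results.

```python
from collections import Counter


def shares(records: list[dict], dimension: str) -> dict[str, float]:
    """Return each label's share (in percent) within one dimension."""
    counts = Counter(r[dimension] for r in records if r.get(dimension))
    total = sum(counts.values())
    return {
        label: round(100 * n / total, 1)
        for label, n in counts.most_common()
    }


if __name__ == "__main__":
    # Toy records only; real classifications would come from the pipeline.
    toy = [
        {"treatment": "follow", "precision": "substantive"},
        {"treatment": "follow", "precision": "substantive"},
        {"treatment": "follow", "precision": "string"},
        {"treatment": "neutral", "precision": "substantive"},
    ]
    print(shares(toy, "treatment"))
    print(shares(toy, "precision"))
```

Records that lack a label for the chosen dimension are left out of the denominator, so the shares always sum to roughly 100 for that dimension.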
V. The Citation Network
The extracted citations form a directed graph: each node is a decision, each edge a citation from one case to another. Network analysis reveals structure invisible to close reading.
Figure 5: The ten most-cited Grand Chamber decisions by in-degree (number of times cited by other Grand Chamber cases).
Kadi (2008) leads with 33 citations – the Court's landmark judgment on fundamental rights review of Security Council sanctions has become a touchstone across EU law. Opinion 2/13 (2014), on the EU's accession to the ECHR, follows with 32. Åkerberg Fransson (2013), which defined the scope of the Charter of Fundamental Rights, comes third with 28.
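In-degree is straightforward to compute from an edge list. The sketch below uses only the standard library and toy edges; it assumes that repeated citations of the same case within one decision count once and that self-citations are ignored, which may differ from how the tool counts.

```python
from collections import Counter
from typing import Iterable


def in_degree(edges: Iterable[tuple[str, str]]) -> Counter:
    """Count how many distinct decisions cite each decision.

    Each edge is (citing, cited). Duplicate pairs and self-loops are dropped.
    """
    unique = {(src, dst) for src, dst in edges if src != dst}
    return Counter(dst for _, dst in unique)


if __name__ == "__main__":
    # Toy edges with placeholder labels, not real citation data.
    toy_edges = [
        ("case_a", "kadi"),
        ("case_a", "kadi"),  # repeated citation, counted once
        ("case_b", "kadi"),
        ("case_c", "kadi"),
        ("case_b", "opinion_2_13"),
        ("kadi", "kadi"),    # self-loop, ignored
    ]
    for case, n in in_degree(toy_edges).most_common(2):
        print(case, n)
```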
Louvain community detection identifies 51 distinct clusters in the network – doctrinal communities whose cases cite each other densely but rarely cite outside the cluster. These clusters map recognisably onto areas of EU law: competition, free movement, fundamental rights, institutional questions, state aid.
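Louvain works by greedily maximising modularity, a score that rewards partitions with many edges inside communities and few between them. The sketch below computes that score for a given partition in plain Python. It is not an implementation of Louvain itself, and it treats the citation graph as undirected and unweighted, which is an assumption about how the clustering is set up.

```python
from collections import defaultdict


def modularity(edges: list[tuple[str, str]], community: dict[str, int]) -> float:
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / 2m)^2,
    where m is the number of edges, L_c the edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge.
    edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
    split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    merged = {node: 0 for node in split}
    print(modularity(edges, split))   # higher: the partition matches the structure
    print(modularity(edges, merged))  # a single community scores zero
```

A partition that separates the two triangles scores higher than one that lumps everything together, which is the sense in which a cluster's cases "cite each other densely".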
The full interactive network – 500 Grand Chamber nodes, filterable by year, procedure, subject, and court – is available as a standalone visualisation.
VI. What It Enables
cjeu-py is not a finished analysis – it is infrastructure for analyses. The data it produces enables questions that were previously intractable at scale:
Which areas of EU law are most interconnected? Do competition and free movement cases cite each other, or do they develop in isolation? How does the Court's citation practice change when the Grand Chamber, rather than a chamber of five, decides? Do advocate general opinions systematically differ from judgments in how they treat precedent? When the Court distinguishes rather than follows a case, is the distinction genuine or cosmetic? Which judges sit on which cases, and does composition correlate with citation patterns?
The tool is open source and designed for researchers who may not be programmers. It offers a five-command pipeline from zero to interactive network, Parquet tables that open in any data tool, CSV export with codebooks, and interactive HTML visualisations that run in any browser with no server.
Install: pip install cjeu-py
Code: github.com/niccoloridi/cjeu-py
PyPI: pypi.org/project/cjeu-py