Corpus research

Welcome to this Semlab demo. This page contains a brief description of the purpose of the demo application plus some background information. Click the button to try the demo for yourself.

Language technology is ideally suited to quickly process large amounts of text.

In our corpus research demo, this is illustrated with the use case of a legal file audit. Such a due diligence review is traditionally carried out by a team of legal staff and can take several days.

With the help of language technology, the entire audit is performed in seconds. The lawyer only has to approve the findings.

Start the demo to use the due diligence user interface. If necessary, read the manual for more explanation about the application.

Good luck!


To find the relevant content in the documents, we use a hybrid system that combines the latest transformer-based language models, more traditional multi-feature neural networks and pattern-based recognition. The patterns are maintained in our ontology of legal terminology. In addition, our software lets you add custom logic. This allows it to reason about the detected content, even when that content comes from different documents.

An example (for insiders): to conclude that the “large company regime” applies, three conditions must all hold. “Non-executives must be appointed at the company” (can be found in the Chamber of Commerce extract or the deed of incorporation), “the works council must be authorized to nominate non-execs” and “the non-execs are entitled to appoint or dismiss the statutory directors” (the latter can be found in the Articles of Association or in an amendment to the Articles of Association).

Our software automatically detects the individual passages and presents them for checking. Once all three required parts are approved, the software concludes that the large company regime applies. This conclusion is then added to the review report.
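To make the reasoning step concrete, here is a minimal sketch in Python of how a conclusion could be derived from approved findings spread over several documents. It is an illustration only and not the Semlab implementation: the `Finding` class, the condition identifiers and the `regime_applies` function are hypothetical names chosen for this example, and the document assigned to each finding is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A detected passage, as reviewed by the lawyer (hypothetical structure)."""
    condition: str   # hypothetical identifier of the condition the passage supports
    document: str    # document in which the passage was found
    approved: bool   # True once the lawyer has approved the finding


# Hypothetical identifiers for the three conditions named in the example above.
LARGE_COMPANY_REGIME = frozenset({
    "non_execs_appointed",
    "works_council_may_nominate_non_execs",
    "non_execs_appoint_or_dismiss_directors",
})


def regime_applies(findings):
    """Return True only if every required condition has an approved finding."""
    approved = {f.condition for f in findings if f.approved}
    return LARGE_COMPANY_REGIME <= approved


findings = [
    Finding("non_execs_appointed", "Chamber of Commerce extract", True),
    Finding("works_council_may_nominate_non_execs", "Articles of Association", True),
    # Not yet approved, so the conclusion cannot be drawn.
    Finding("non_execs_appoint_or_dismiss_directors",
            "Amendment to the Articles of Association", False),
]

print(regime_applies(findings))  # False
```

The rule only looks at which conditions have been approved, not at where each passage was found, so the supporting passages can come from different documents in the dossier.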

Click here for more specific information about the due diligence application.

Practical applications

The use case of this demo application is legal due diligence. However, only the ontology of legal terms and some aspects of the user interface are specific to this use case. All underlying technology is completely generic and can be used directly in any other knowledge domain. This approach is particularly suitable for situations where conclusions need to be drawn quickly from a multi-document dossier. Examples include the judiciary, when collecting facts for a defense; healthcare, when processing historical findings; and government, when assessing license applications.


The opportunities for Semlab language technology are endless!

We would like to invite you for a conversation about the possibilities of language technology within your organization. We often propose a quick pilot study to demonstrate the feasibility of our approach. This is also the fastest way to a reliable estimate of the resulting efficiency benefits. You can reach us by phone at +31-172-494777.

Raoul Wallenbergplein 33 Alphen aan den Rijn – The Netherlands +31 172 494777
© Semlab 2024