Data extraction

Welcome to this Semlab demo. This page contains a brief description of the purpose of the demo application plus some background information. Click the button to try the demo for yourself.

This application demonstrates the power of data extraction for filtering an archive of documents. Because specific, relevant data mentioned in the documents has been extracted, that data can be used to filter the archive, so relevant documents can be retrieved very quickly. Moreover, the same data can be used directly to generate statistical reports about the documents in the archive.

The demo contains all published Dutch court rulings in the field of violent crime and the Opium Act article 2 (hard drugs).

Start the demo, filter the rulings and then plot one of the extracted attributes to quickly visualize its distribution in the archive.

Good luck!


The language technology used for this demo application is very similar to that used for the “Semantic Search and Filtering” demo. Again, the corpus is annotated automatically according to our Jurisprudence ontology, but this demo application does not contain a search box. It does, however, offer many more features for filtering the data set, and a large share of these features are quantitative in nature.

In this demo too, the section in which a feature is found determines its context and thus its meaning. In the user interface, the features are divided into three groups: the entire document, the indictment, and the statement of evidence. The application also contains a widget that plots a selected feature in a graph, so you can quickly see its distribution over the (filtered) documents.
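To make the idea of section-scoped features concrete, here is a minimal Python sketch. It is not the demo's actual implementation: the section labels, the "weapon" attribute and the sample rulings are all made up for illustration. It shows how the same feature can be filtered on per section, and how the distribution that the graph widget displays could be counted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    section: str  # hypothetical labels: "indictment" or "evidence"
    name: str     # illustrative attribute name, e.g. "weapon"
    value: str


def in_scope(feature, section):
    """The scope "document" covers the whole ruling; otherwise sections must match."""
    return section == "document" or feature.section == section


def matches(features, section, name, value):
    """True if the ruling mentions name=value within the given scope."""
    return any(
        f.name == name and f.value == value and in_scope(f, section)
        for f in features
    )


def distribution(archive, name, section="document"):
    """Count, per value, the rulings that mention the feature within the scope."""
    counts = Counter()
    for features in archive.values():
        # Use a set so a ruling counts once per value, however often it is mentioned.
        counts.update({f.value for f in features if f.name == name and in_scope(f, section)})
    return counts


# Made-up sample archive: ruling id -> extracted features.
ARCHIVE = {
    "ruling-1": [Feature("indictment", "weapon", "knife"), Feature("evidence", "weapon", "knife")],
    "ruling-2": [Feature("indictment", "weapon", "firearm"), Feature("evidence", "weapon", "knife")],
    "ruling-3": [Feature("evidence", "weapon", "firearm")],
}

# Keep only rulings whose indictment mentions a knife, then count evidence values.
filtered = {rid: fs for rid, fs in ARCHIVE.items() if matches(fs, "indictment", "weapon", "knife")}
evidence_counts = distribution(ARCHIVE, "weapon", section="evidence")
```

In this sketch a knife that appears only in the statement of evidence does not match an indictment filter, which is the sense in which the section determines the meaning of a feature.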

Practical applications

Data extraction from documents containing “plain text” (so-called natural language) has many applications. Think, for example, of finding the claim amount in e-mail complaints to prioritize their handling, or extracting the locations of specific news events in order to plot them geographically.
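As a rough illustration of the e-mail example, the Python sketch below pulls euro amounts out of complaint texts with a regular expression and sorts the complaints by the largest amount found. This is a deliberately naive stand-in, not Semlab's language technology: the pattern assumes English-style number formatting (such as €1,250.00), and the function names are hypothetical.

```python
import re
from decimal import Decimal

# Assumed format: "€" or "EUR", an optional space, digits with optional
# comma thousands separators, and an optional two-digit decimal part.
AMOUNT = re.compile(r"(?:€|EUR)\s?(\d{1,3}(?:,\d{3})+|\d+)(\.\d{2})?")


def claim_amount(text):
    """Return the largest euro amount mentioned in the text, or None if there is none."""
    amounts = [
        Decimal(m.group(1).replace(",", "") + (m.group(2) or ""))
        for m in AMOUNT.finditer(text)
    ]
    return max(amounts, default=None)


def prioritize(emails):
    """Sort complaints by claim amount, highest first; complaints without an amount come last."""
    return sorted(emails, key=lambda e: claim_amount(e) or Decimal(0), reverse=True)


# Made-up sample complaints.
EMAILS = [
    "Please refund EUR 40 for the late delivery.",
    "Where is my order?",
    "I want a refund of €1,250.00 for the broken laptop.",
]
ordered = prioritize(EMAILS)
```

Real complaints express amounts in many more ways (written-out numbers, other currencies, amounts that are not claims at all), which is why context-aware extraction is needed in practice.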

By extracting the data from the documents, it becomes quantifiable and therefore suitable for all kinds of automatic processing. In fact, data extraction converts a text into a model that can be stored in a database. This allows the text documents to be further processed in the same way as other structured forms of data.
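The step from text to database can be sketched as follows. The Python example assumes that extraction has already produced one record per ruling; the table layout, the field names (offence, sentence_months) and the sample values are invented for illustration and do not reflect the Jurisprudence ontology. Once the records are stored, ordinary SQL is enough for filtering and reporting.

```python
import sqlite3

# Made-up output of an extraction step: one structured record per ruling.
RECORDS = [
    {"id": "ruling-1", "offence": "assault", "sentence_months": 6},
    {"id": "ruling-2", "offence": "hard drugs", "sentence_months": 24},
    {"id": "ruling-3", "offence": "hard drugs", "sentence_months": 12},
]


def build_archive(records):
    """Store the extracted records in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ruling (id TEXT PRIMARY KEY, offence TEXT, sentence_months INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ruling VALUES (:id, :offence, :sentence_months)", records
    )
    return conn


conn = build_archive(RECORDS)

# Filter: rulings with a sentence of at least one year.
long_sentences = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM ruling WHERE sentence_months >= 12 ORDER BY id"
    )
]

# Report: number of rulings and average sentence per offence.
report = conn.execute(
    "SELECT offence, COUNT(*), AVG(sentence_months) "
    "FROM ruling GROUP BY offence ORDER BY offence"
).fetchall()
```

From this point on, the rulings can be handled with the same tools as any other structured data set.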


The opportunities for Semlab language technology are endless!

We would like to invite you for a conversation about the possibilities of language technology within your organization. We often propose a quick pilot study to demonstrate the feasibility of our approach. This is also the fastest way to a reliable estimate of the resulting efficiency benefits. You can reach us by phone at +31-172-494777.

Raoul Wallenbergplein 33, Alphen aan den Rijn, The Netherlands, +31 172 494777
© Semlab 2024