Internships at Spinque for NLP, ML and IR students

Spinque (co-founded by Radboud’s own prof. Arjen P. de Vries) is always looking for talented research interns to join its team of developers and scientists. Projects at Spinque are usually hands-on and therefore especially suited as research internships, but MSc thesis projects can also be formulated with practical use cases as a starting point. Below is one such suggested thesis project.

About Spinque

Spinque originated in 2010 at the Dutch National Research Centre in Computer Science and Mathematics (Centrum Wiskunde & Informatica) in Amsterdam. The founders had worked on the integration of ‘high-performance database engines’ and ‘information retrieval’ to improve the flexibility of search engines, and subsequently were the first to introduce a scalable implementation of a probabilistic graph database to the market. In 2014, support for linked data and knowledge graphs was added to the technology. Today, Spinque technology answers millions of queries on a daily basis in domains such as eCommerce, government, enterprise search and cultural heritage. With close ties to academia, we are able to drive innovation backed by the latest research output.

Master's thesis opportunity

Problem context:

When searching small collections, such as the documents of a single organization, it can be difficult to produce a stable ranking from the term distributions of that collection alone. Initiatives in open web search release partial indexes of the web in a Common Index File Format (CIFF), which can be used to develop search systems. Because these indexes are built from much larger collections, their term distributions are more stable, and they might provide useful additional information when integrated into search solutions for smaller collections.

How this data should be integrated is not clear, however. Smaller collections have more skewed term distributions; moreover, the frequency of certain terms is, and should be, different from that in a larger web collection. We are interested in whether we can integrate statistics from larger collections and, by doing so, increase the effectiveness of our search solutions.
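To make the integration question concrete, here is a minimal sketch of one possible strategy, under our own assumptions: linearly interpolating the document-frequency ratio of the small collection with that of an external index, and deriving an IDF weight from the mix. The function name `blended_idf`, the `weight` parameter, the add-one smoothing and the toy counts are all hypothetical illustrations, not Spinque's method. We also assume the per-term document frequencies and collection sizes have already been read from a CIFF export; parsing the format is not shown.

```python
import math


def blended_idf(term, local_df, local_n, external_df, external_n, weight=0.5):
    """Hypothetical IDF that mixes local and external document frequencies.

    local_df / external_df map a term to the number of documents containing it;
    local_n / external_n are the collection sizes. `weight` is the share given to
    the local (small) collection; 1.0 ignores the external statistics entirely.
    """
    # Add-one smoothing keeps unseen terms from yielding a zero probability.
    p_local = (local_df.get(term, 0) + 1) / (local_n + 2)
    p_external = (external_df.get(term, 0) + 1) / (external_n + 2)
    # Linear interpolation of the two document-frequency ratios.
    p = weight * p_local + (1 - weight) * p_external
    return math.log(1 / p)


if __name__ == "__main__":
    # Toy counts, made up for illustration only.
    local_df, local_n = {"raad": 90, "subsidie": 1}, 100
    external_df, external_n = {"raad": 1000, "subsidie": 5000}, 1_000_000
    for term in ("raad", "subsidie"):
        local_only = blended_idf(term, local_df, local_n, external_df, external_n, weight=1.0)
        blended = blended_idf(term, local_df, local_n, external_df, external_n)
        print(f"{term}: local-only idf={local_only:.3f}, blended idf={blended:.3f}")
```

Setting `weight=1.0` reproduces a purely local baseline, so the same function could serve both sides of a comparison between strategies with and without external statistics; how to choose the weight (or whether interpolation is the right form at all) is exactly the kind of open question the project would investigate.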

Research Objective:

The assignment is to investigate how data from external indexes can be integrated into existing search solutions to increase the effectiveness of their ranking methods. We are particularly interested in doing this in the context of raadzaam, a search engine developed by Spinque for council information of municipalities in the Netherlands. To achieve this, we have defined the following assignments:

  • Integrate data from CIFF indexes into Spinque search solutions.
  • Compare strategies that use CIFF statistics with those that do not, and investigate whether strategies become significantly better when CIFF is used.

Expectations

  • MSc student in computer science, artificial intelligence or related field
  • Knowledge of information retrieval, databases and/or machine learning (depending on the internship topic)
  • Programming experience; preferably familiar with Java
  • Ability to work independently
  • Willingness to participate actively in discussions

How to apply?

If you think you match these expectations, contact Wouter Alink to apply. After acceptance at Spinque, contact Arjen P. de Vries to arrange internal supervision at Radboud University.