New SDK integrations to Rasa, Hugging Face and Sklearn

We've built our open-source IDE for data-centric NLP with the belief that data scientists and engineers know best what kind of framework they want to use for their model building. Today, we'll show you three new adapters for the SDK.

September 7, 2022
May 24, 2022

Let's jump right in.

Hugging Face

Transformer models are arguably one of the most interesting breakthroughs in natural language processing. With Hugging Face, you have access to an abundance of pre-trained models. Ultimately, however, you want to finetune them to your task at hand. This is where Hugging Face datasets come into play, and you can generate them with ease using our adapter.

The dataset is a Hugging Face-native object, which you can use as follows to finetune your data. This could look as follows:

The documentations of Hugging Face have some further examples on how to finetune your models.


The swiss army knife of machine learning. You most likely have already worked with it, and if so, you surely love the richness of algorithms to select from. We do so as well, and so we decided to add an integration to Sklearn. You can pull the data and train a model as easy as follows:

The data object already contains train and test splits derived from the weakly supervised and manually labeled data. You now can fully focus on hyperparameter tuning and model selection. We also highly recommend to check out Truss, an open-source library to quickly serve these models. We'll cover this in a separate article.


If you want to build a chatbot or conversational AI, Rasa arguably is one of the first choices with its strong framework and community. We love building chatbots with Rasa, and already covered a YouTube series on how to do so. Here, we'll now introduce you the Rasa adapter for our SDK, with which you can build the training data for your chatbot with ease.

This will directly pull your training data into a YAML format, off-the-shelf ready for your chatbots to learn from. This way, you can manage and maintain all your chat data within refinery, and have it ready for your chatbot training at hand.

You see, we already cover some integrations to open source NLP frameworks. If you're missing one, please let us know. We're happy to add them continuously, to bridge the gap between building training data and building models.

If you haven't tried out refinery yet, make sure to check out our GitHub. Also, feel free to join our Discord community - we're happy to meet you there!

Become a data pioneer now

We are building tools for the age of data-centric AI.
Let's build great use cases together.