refinery

Database for your NLP training data + algorithms to scale, assess and maintain that data = refinery.

This is the flagship of our NLP stack. refinery is both a database and an application logic editor, allowing you to scale, assess and maintain your data. It automates data cleaning and labeling, and shows you where your data can be improved. It also makes it easy to work together with in-house or external annotators, and it leverages large language models to help you with your data.

refinery is built on a few simple principles:

Enabling one-person armies

We believe that developers have crazy ideas, and we want to lower the barrier to pursuing them. refinery is designed to build labeled training data much faster, so that prototyping an idea takes you very little time. We've received a lot of positive feedback for exactly that, so give it a try for your next project.

Extending your existing labeling approach

Yeah, refinery isn't primarily a labeling tool. It does have a built-in labeling editor, but its main advantages come from automation and data management. You can integrate any kind of heuristic to automatically label whatever can be labeled that way, and then focus on the headache-causing subsets afterwards. Whether you do the remaining labeling in refinery or in any other tool (even via crowd labeling) doesn't matter!
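To make the idea of "label what you can automatically, then focus on the rest" concrete, here is a minimal, generic sketch in plain Python. It is not refinery's API: the record format (a dict with a "text" field), the heuristics `lf_contains_refund`, `lf_contains_thanks` and `lf_many_exclamations`, the helper `split_by_heuristics`, and the rule of accepting a label only when all non-abstaining heuristics agree are all hypothetical assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical record format: a plain dict with a "text" field.
Record = Dict[str, str]
# A heuristic returns a label, or None to abstain on a record.
Heuristic = Callable[[Record], Optional[str]]


def lf_contains_refund(record: Record) -> Optional[str]:
    # Hypothetical keyword rule: mentions of refunds suggest a complaint.
    return "complaint" if "refund" in record["text"].lower() else None


def lf_contains_thanks(record: Record) -> Optional[str]:
    # Hypothetical keyword rule: thanking suggests praise.
    return "praise" if "thank" in record["text"].lower() else None


def lf_many_exclamations(record: Record) -> Optional[str]:
    # Hypothetical pattern rule: two or more "!" suggest a complaint.
    return "complaint" if record["text"].count("!") >= 2 else None


def split_by_heuristics(
    records: List[Record], heuristics: List[Heuristic]
) -> Tuple[List[Tuple[Record, str]], List[Record]]:
    auto_labeled: List[Tuple[Record, str]] = []
    needs_review: List[Record] = []
    for record in records:
        # Collect the labels of all heuristics that did not abstain.
        votes = [label for label in (h(record) for h in heuristics) if label is not None]
        if len(set(votes)) == 1:
            # At least one vote, and all votes agree: accept the label automatically.
            auto_labeled.append((record, votes[0]))
        else:
            # No votes or conflicting votes: route the record to manual labeling.
            needs_review.append(record)
    return auto_labeled, needs_review
```

Records with no votes or conflicting votes end up in `needs_review`, which plays the role of the headache-causing subset worth a human's attention, whichever tool that human labels in. Unanimous agreement is deliberately simplistic; a real setup would more likely weight heuristics by their estimated precision.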

Pushing collaboration

Beyond that, we aim to improve collaboration between engineers and subject matter experts (SMEs). We have seen teams use refinery in meetings to discuss labeling patterns, expressed in the form of labeling functions and distant supervisors. We believe that data-centric AI is the best way to leverage this kind of collaboration.

Open-source, and treating training data as a software artifact

We hate the idea that there are still use cases in which the training data is just a plain CSV file. That is okay if you really just want to quickly prototype something with a few records, but any serious software should be maintainable. We believe an open-source solution for training data management is what's needed here. refinery helps you document your data, and that's how you treat training data as a software artifact.

Extensive documentation

In the left sidebar, you will find extensive documentation for refinery and its related products bricks, gates and workflow. If you have any further questions, please don't hesitate to reach out to us.