The data scientist's choice to scale, assess + maintain natural language data.

Automate labeling where possible, find headache-causing subsets to work on with (in-house) domain experts or crowd workers, and manage your data and workflows in a single application. And even better: refinery is open-source!

pip install kern-refinery

refinery start

Used by data scientists at AI-driven organizations, both small and large

Samsung, Cohere, DocuSign
Crowd, Barmenia, SAP

How it works.

Designed to put data scientists in control of workflows, not the heavy lifting.


Annotate reference data using our built-in editor with role-based access, or push annotations from tools like Label Studio or our Python SDK. Role-based access also allows you to share a simplified labeling session with, e.g., crowd workers.

What refinery can do for you.

The central application for your data scientists.

  • Optimize annotation spend

    Save time and money, especially when domain experts' time is a scarce resource. Automate what is possible, and find rare cases that need human attention.

  • Shorten model development time

    Our users have been able to prototype complex models within an afternoon, just by scaling their training data. Bring your models to market faster with us.

  • Debug and improve your model

    Modern algorithms are black boxes. Find their weaknesses in a data-centric manner, and improve your model by fixing that data or creating new slices for re-training.

  • Collaborate with domain experts and annotators

    It has never been easier to integrate domain expertise into your work, specifically on the data you need help with. Just send them a link, or tell them to sign in.

  • Slice and explore your training data

    Find insights about your data, integrate your existing models for benchmarking, and use neural search to find both similar examples and outliers. This is your navigation system.

  • Integrate with your existing workflow

    If you have an annotation workflow up and running, you do not need to kill it. Just use our Python SDK to connect refinery to your existing workflow, and gain all the benefits of refinery.
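As a rough illustration of the neural search idea mentioned above (finding similar examples and outliers), here is a minimal sketch in plain Python. It is not refinery's implementation: the tiny two-dimensional vectors stand in for real text embeddings, and the function names `most_similar` and `outlier_index` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, embeddings, top_k=2):
    """Indices of the top_k embeddings closest to the query vector."""
    ranked = sorted(
        range(len(embeddings)),
        key=lambda i: cosine_similarity(query, embeddings[i]),
        reverse=True,
    )
    return ranked[:top_k]


def outlier_index(embeddings):
    """Index of the embedding with the lowest mean similarity to the rest."""
    def mean_similarity(i):
        others = [
            cosine_similarity(embeddings[i], other)
            for j, other in enumerate(embeddings)
            if j != i
        ]
        return sum(others) / len(others)

    return min(range(len(embeddings)), key=mean_similarity)


# Made-up embeddings: three records point in a similar direction, one does not.
embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.2],
    [0.0, 1.0],
]

print(most_similar([1.0, 0.05], embeddings))
print(outlier_index(embeddings))
```

In practice the vectors would come from a language model and the lookup from a vector database; the sketch only shows the ranking logic behind "similar" and "outlier".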

Control your labeling workflow.

refinery is designed to work both as a standalone tool and as the operating system for your workflows.

Whatever your existing labeling workflow looks like, refinery adds further intelligence to it.

Build large-scale, high-quality training datasets with ease.

Loved by practitioners + companies.

Simple to start with, and customizable to fit every use case. Helping data scientists all around the world scale and orchestrate their natural language data.

    • With refinery we were able to drastically simplify and organize a process of automated, standardized content classification, while simultaneously creating an implicit labeling documentation to rely on. The rich and versatile Python SDK allowed for a seamless integration with our pre-existing architecture.

      Finn Schmidt
      Data Scientist at Büro Bardohn
    • I have worked very closely with Johannes and the Kern AI team and I was always struck by their excellent expertise, deep analytical skills and customer orientation. I am sure Johannes and his team will have a great future.

      Martin Raab
      CTO at Evolution Time Critical
    • Groundbreaking work. An IDE for NLP with extensive data management.

      Theophano Mitsa
      Data Scientist at Aretisoft
    • We will be able to further improve the automatic selection of conspicuous medical bills, which are routed to the final expert check at Barmenia.

      Gerhard Hausmann
      Lead AI Architect at Barmenia
    • I used refinery to label a dataset of 200K radiology reports. The UI is incredibly simple to use, and felt like what I wanted to build. NLP is ready for automation and refinery provides you the tools to do it.

      George Pearson
      Data Scientist at Behold AI
    • The example of Kern AI shows how closely research and entrepreneurship are intertwined at HPI. The system, which emerged from a research project, helps data scientists to implement precise AI models significantly faster and to integrate business departments into the development process.

      Prof. Dr. Falk Uebernickel
      Chair of Design Thinking and Innovation Research at HPI
    • refinery enables us to create high-quality training data within hours instead of weeks. You can tell that it is designed by and for Data Scientists, and we can use it for several use cases. I would not want to miss it.

      Joan Reyero
      CTO at Crowd.dev
    • I took the time to try the framework, honestly the framework is the best I have used so far. So easy and intuitive to use, happy that I've got the chance to know about and use it

      Abdessamad
      Data Scientist in our community
    • A little while ago, I took the Kern AI refinery for a spin and it was a very good experience! I'm happy to see that Henrik Wenck and team is taking their effort open source. I'm looking forward to see your next steps :)

      Fredrik Olsson
      Head of Data Science, Product Owner at Gavagai

Tailored pricing, fitting your requirements.

refinery is available either as a managed cloud service or as an on-prem enterprise solution. You can also let us manage your crowd labeling tasks in the managed cloud.

Open-source

Designed for single-user workflows.

For free

Install it on your local machine.

Go to GitHub

What's included

  • Free forever.
  • Ideal for smaller projects, e.g. side projects or proofs of concept.

Managed cloud

We do all the heavy lifting for you.

Starting at 300 /mo.

Depending on workload, dedicated vs. shared instances, and labeling services.

14-day trial
Request a demo

What's included

  • Managed and monitored by us.
  • Work together with your team.
  • GPU acceleration for large language model (LLM) services.
  • On-demand managed labeling services.

On-premise deployment

You run refinery, we help you make it a success.

3,000 /mo.

Running on your own infrastructure.

Early bird offer
Request a demo

What's included

  • Deployed on your own infrastructure.
  • Work together with your team.
  • Custom API.
  • Dedicated engineer support.

Frequently asked questions.

If you have any further questions, please don't hesitate to reach out to us.

Which data formats are supported?
We’ve structured our data formats around JSON, so you can upload most file types natively. This includes spreadsheets, text files, CSV data, generic JSON and many more.
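To make the "structured around JSON" point concrete, here is a minimal sketch of how tabular data maps onto JSON records, using only the Python standard library. The column names and the helper `csv_to_json_records` are hypothetical, and this is not refinery's own upload code.

```python
import csv
import io
import json


def csv_to_json_records(csv_text: str) -> list[dict]:
    """Turn CSV text into a list of JSON-style records, one dict per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Hypothetical sample data; the column names are illustrative only.
sample = "headline,body\nHello,First text\nWorld,Second text\n"
records = csv_to_json_records(sample)

# Each row becomes one JSON object keyed by the CSV header.
print(json.dumps(records, indent=2))
```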
Which labeling tasks are supported?
We currently offer classification and span labeling. This includes e.g. multi- or binary-class text classification, entity extraction and similarity labeling.
How secure is this?
Data security is our top priority. Please look into our security documentation for more information.
Where can I find more details about the features?
You can: 1) Look into our docs and our GitHub repository, which have extensive details about how to use the app. 2) Look into our community spaces, e.g. our YouTube channel or Discord server. 3) Reach out to us for a demo. We're always happy to show you the product.
We already have labeling tools. What are the benefits here?
To gain benefits from refinery, it doesn't matter whether you labeled the data in our built-in editor or some other tool. You can use refinery as the operating system, pushing your data into it, and orchestrating your labeling workflows as well as assessing quality.
What integrations do you offer?
We build primarily on top of HuggingFace, qdrant and spaCy, and those integrations are seamless. To build downstream models, we offer integrations to export the data, e.g. for Rasa or PyTorch. You can call 3rd-party tools from within our labeling functions, and to integrate labeling workflows, you can use our Python SDK.
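For readers unfamiliar with the term, a labeling function is a small heuristic that proposes a label for a record or abstains. The following is a minimal sketch under stated assumptions: records are plain dicts with a `text` field, the label name `billing` is made up, and the exact signature expected inside refinery may differ.

```python
from typing import Optional


def lf_contains_invoice(record: dict) -> Optional[str]:
    """Hypothetical heuristic: label a record if its text mentions billing terms."""
    text = record.get("text", "").lower()
    if "invoice" in text or "payment due" in text:
        return "billing"
    return None  # abstain when the heuristic does not apply


# Made-up records for illustration.
records = [
    {"text": "Your invoice is attached."},
    {"text": "Lunch on Friday?"},
]
labels = [lf_contains_invoice(r) for r in records]
print(labels)
```

Because such a function is ordinary Python, it could just as well call a 3rd-party library or an existing model before deciding on a label.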
Do we need to use your crowd labelers?
You can either just work with the crowd labeling service of your choice, or tell us to look for fitting annotators for your project.
Do you offer support?
Yes, we can happily assist you with any questions you might have. We offer support via email, Discord and GitHub.
Is this suitable for on-premises deployments?
Yes. Please reach out to us to discuss your requirements.

Become a data-centric pioneer.

Let's look into your requirements and see if refinery can help you.

Request a demo