The data scientist's choice to scale, assess + maintain natural language data.
Automate labeling where possible, find the difficult subsets that call for collaboration with (in-house) domain experts or crowd workers, and manage your data and workflows in a single application. Even better: refinery is open-source!
pip install kern-refinery
Used by data scientists at AI-driven organizations, both small and large
How it works.
Designed to put data scientists in control of their workflows, without the heavy lifting.
Annotate reference data using our built-in editor with role-based access, or push annotations from tools like Label Studio or via our Python SDK. Role-based access also lets you share a simplified labeling session with, for example, crowd workers.
What refinery can do for you.
The central application for your data scientists.
Optimize annotation spend
Save time and money, especially when domain experts' time is scarce. Automate what is possible, and find the rare cases that need human attention.
Shorten model development time
Our users have been able to prototype complex models within an afternoon, just by scaling their training data. Bring your models to market faster with us.
Debug and improve your model
Modern algorithms are black boxes. Find their weaknesses in a data-centric manner, and improve your model by fixing that data or creating new slices for re-training.
Collaborate with domain experts and annotators
It has never been easier to bring domain expertise into your work, exactly on the data you need help with. Just send them a link, or tell them to sign in.
Slice and explore your training data
Find insights about your data, integrate your existing models for benchmarking, and use neural search to find both similar examples and outliers. This is your navigation system.
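To give a feel for what searching for similar examples and outliers means, here is a minimal sketch that ranks records by cosine similarity between embedding vectors. It is an illustration only, not refinery's implementation: the record ids, the two-dimensional toy embeddings and the helper names are all made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, embeddings):
    """Return record ids sorted from most to least similar to the query."""
    scores = {rid: cosine_similarity(query, vec) for rid, vec in embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings; real ones would come from a language model.
embeddings = {
    "rec-1": [1.0, 0.0],
    "rec-2": [0.9, 0.1],
    "rec-3": [0.0, 1.0],
}

ranking = rank_by_similarity(embeddings["rec-1"], embeddings)
most_similar = ranking[1]  # ranking[0] is the query record itself
outlier = ranking[-1]      # least similar record to the query
```

With real data, the same idea is usually backed by a vector database rather than a Python dictionary, so that lookups stay fast on large datasets.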
Integrate with your existing workflow
If you have an annotation workflow up and running, you do not need to kill it. Just use our Python SDK to connect it to refinery, and gain all the benefits of refinery.
Control your labeling workflow.
refinery is designed to work either as a standalone tool or as the operating system for your workflows.
Whatever your existing labeling workflow looks like, refinery adds further intelligence to it.
Build large-scale, high-quality training datasets with ease.
Loved by practitioners + companies.
Simple to start with, and customizable to fit every use case. Helping data scientists all around the world scale and orchestrate their natural language data.
Tailored pricing, fitting your requirements.
refinery is available either as a managed cloud service or as an on-prem enterprise solution. You can also let us manage your crowd labeling tasks in the managed cloud.
Designed for single-user workflows.
Install it on your local machine.
- Free forever.
- Ideal for smaller projects, e.g. side-projects or Proof-of-Concepts.
We do all the heavy-lifting for you.
Starting at €300 /mo.
Depending on workload, dedicated vs. shared instances, and labeling services.
14-day trial
- Managed and monitored by us.
- Work together with your team.
- GPU-acceleration of large language model (LLM)-services.
- On-demand managed labeling services.
You run refinery, we help you make it a success.
Running on your own infrastructure.
Early bird offer
- Deployed on your own infrastructure.
- Work together with your team.
- Custom API.
- Dedicated engineer support.
Frequently asked questions.
If you have any further questions, please don't hesitate to reach out to us.
- Which data formats are supported?
- We’ve structured our data formats around JSON, so you can upload most file types natively. This includes spreadsheets, text files, CSV data, generic JSON and many more.
- Which labeling tasks are supported?
- We currently support classification and span labeling. This includes, for example, multi-class or binary text classification, entity extraction and similarity labeling.
- How secure is this?
- Data security is our top priority. Please look into our security documentation for more information.
- Where can I find more details about the features?
- You can: 1) Look into our docs and our GitHub repository, which have extensive details about how to use the app. 2) Look into our community spaces, e.g. our YouTube channel or Discord server. 3) Ask us for a demo. We're always happy to show you the product, so just reach out to us.
- We already have labeling tools. What are the benefits here?
- To gain benefits from refinery, it doesn't matter whether you labeled the data in our built-in editor or some other tool. You can use refinery as the operating system, pushing your data into it, and orchestrating your labeling workflows as well as assessing quality.
- What integrations do you offer?
- We build primarily on top of HuggingFace, qdrant and spaCy, and those integrations are seamless. To build downstream models, we offer integrations that export the data, e.g. for Rasa, PyTorch and so on. You can integrate 3rd-party tools in our labeling functions, and you can use our Python SDK to integrate labeling workflows.
- Do we need to use your crowd labelers?
- You can either just work with the crowd labeling service of your choice, or tell us to look for fitting annotators for your project.
- Do you offer support?
- Yes, we can happily assist you with any questions you might have. We offer support via email, Discord and GitHub.
- Is this suitable for on-premises deployments?
- Yes. Please reach out to us to discuss your requirements.
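As a rough illustration of the JSON-centric data format mentioned in the FAQ above, the sketch below turns CSV text into a list of JSON records using only the Python standard library. The column names (`headline`, `body`) and the helper `csv_to_records` are hypothetical, and the exact schema refinery expects on upload is not specified here, so treat this as an assumption-laden example rather than the official import path.

```python
import csv
import io
import json

# Hypothetical CSV export; in practice this would be read from a file.
csv_text = "headline,body\nHello,First example text\nWorld,Second example text\n"


def csv_to_records(text):
    """Parse CSV text into a list of dicts, one per row, keyed by column name."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


records = csv_to_records(csv_text)

# Serialize the records as a JSON array, ready to be written to a .json file.
json_payload = json.dumps(records, indent=2)
```

Each row becomes one JSON object, so every record keeps its attributes together regardless of the file type it originally came from.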
Become a data-centric pioneer.
Let's look into your requirements and see if refinery can help you.
Request a demo