A low-code dev platform for data-centric NLP

Designed around an open core available on GitHub. Build from templates or completely from scratch.

The Kern AI platform consists of four products.


refinery is the editor for data-centric natural language processing. It combines training data and algorithms so that you can easily build NLP automations, e.g. to prototype an idea within an afternoon or to add quality assurance to your labeling workflow.

You can click on the feature cards below to jump into the documentation.

Manual labeling editor

refinery comes with a built-in editor (including role-based access) supporting classification, span extraction, and text generation. You can also export data to other annotation tools such as Label Studio.

Best-in-class data management

Use our modular data management to find, for example, records with below 30% confidence and mismatching manual and automated labels, sorted by confidence. Assign that data either to an in-house expert or to a crowd labeler.
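The filter described above can be sketched in a few lines of plain Python. The record shape and field names here are illustrative assumptions, not refinery's actual data model:

```python
# Hypothetical record shape: each record carries an automated label with a
# confidence score plus a manual label from the labeling editor.
records = [
    {"text": "Great product!", "manual": "positive", "auto": "negative", "confidence": 0.22},
    {"text": "Works as described.", "manual": "positive", "auto": "positive", "confidence": 0.91},
    {"text": "Arrived broken.", "manual": "negative", "auto": "positive", "confidence": 0.27},
]

def needs_review(record):
    """Below 30% confidence AND manual/automated labels disagree."""
    return record["confidence"] < 0.30 and record["manual"] != record["auto"]

# Sort the matches by confidence, lowest first, ready to assign to an expert.
review_queue = sorted(
    (r for r in records if needs_review(r)),
    key=lambda r: r["confidence"],
)
```

Only the first and third records land in the queue: both sit below 30% confidence and their manual and automated labels disagree.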

Native large-language-model integration and finetuning

You love Hugging Face, GPT-X, or Cohere for their large language models? We do too, which is why we have integrated them into refinery. You can use them for embeddings (and neural search), active transfer learning, or even to create the training data for fine-tuning these LLMs on your own data.
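Neural search over embeddings boils down to ranking vectors by similarity. Here is a minimal sketch using toy three-dimensional vectors and cosine similarity; in practice the embeddings would come from one of the models named above, and the corpus entries are made up for illustration:

```python
import math

# Toy embeddings standing in for model-generated vectors.
corpus = {
    "refund request": [0.9, 0.1, 0.0],
    "shipping delay": [0.1, 0.9, 0.1],
    "password reset": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def neural_search(query_vec, top_k=1):
    """Rank corpus entries by similarity to the query embedding."""
    ranked = sorted(corpus, key=lambda k: cosine(corpus[k], query_vec), reverse=True)
    return ranked[:top_k]
```

A query vector close to the "refund request" embedding retrieves that entry first, which is the whole idea: semantically similar records cluster together in embedding space.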

Automate with heuristics

refinery ships with a Monaco editor, enabling you to write heuristics in plain Python. Use them for rules, API calls, regex, active transfer learning, or zero-shot predictions.

Monitor your data quality

In the project dashboard, you can find distribution statistics and a confusion matrix showing you where your project needs improvement. Every analysis can be filtered down to the atomic level.
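The confusion matrix behind such a dashboard is simply a count over (manual, automated) label pairs. A minimal sketch with made-up labels:

```python
from collections import Counter

# Pairs of (manual label, automated label), one per record.
pairs = [
    ("positive", "positive"),
    ("positive", "negative"),  # off-diagonal: a disagreement worth inspecting
    ("negative", "negative"),
    ("negative", "negative"),
]

# Each key is a (manual, automated) cell of the confusion matrix.
confusion = Counter(pairs)
```

Large off-diagonal cells point at label classes where automation and manual labeling disagree, i.e. exactly where the project needs improvement.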

Integrate into your workflow via our Python SDK

Use our Python SDK (also available for the CLI) to export and import data with ease. For instance, you can use it to batch-export data from refinery to your favorite data science framework or to batch-import data from your data sources into refinery.
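Batch export and import both rest on the same chunking pattern, sketched here in plain Python rather than against the SDK itself, whose exact call signatures we do not want to guess at:

```python
def batch(records, size):
    """Yield fixed-size chunks of a record list, the core of a batch transfer."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

# Usage: push five records in chunks of two.
chunks = list(batch(list(range(5)), 2))
```

Each chunk would then be handed to an export or import call; the final chunk may be smaller than the batch size.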


Yes, you read that right. Our flagship product is open-sourced under the Apache 2.0 license. You can find the code on GitHub. We are also happy to accept contributions.

Why we believe low-code and data-centric will power the future of NLP

Speed and flexibility

You need some Python knowledge to build an application on our platform, but you do not need a PhD. You get full flexibility, yet can develop quickly and easily.

Your intellectual data property

Data lives longer than code. With our data-centric approach, you build your intellectual data property, allowing you to stay flexible when it comes to your requirements.

Models change

Let's be honest here, models change. GPT-4 will follow GPT-3, and Hugging Face releases new models incredibly fast. But your data is produced by your applications. Choose the right stack to build your data strategically.

Long-tail use cases

There are use cases, such as email automation, that recur quite often. But there are also use cases that are very specific to your business. We help you build both with our low-code, data-centric NLP approach.


© 2023 Kern AI GmbH. All rights reserved.