News

Hugging Face: New Open-Source RAG Models

April 28, 2026
7 min read
Ailog Team

Hugging Face releases a new family of models optimized for RAG: embeddings, rerankers, and specialized LLMs. Complete overview.

Hugging Face Enriches Open-Source RAG Ecosystem

Hugging Face announces the release of a new family of models specifically optimized for RAG applications. The release includes embedding models, rerankers, and LLMs adapted for retrieval-augmented generation.

"Our goal is to democratize enterprise-grade RAG," explains Clement Delangue, CEO of Hugging Face. "These models offer performance comparable to proprietary solutions, in open-source."

The New Models

Embeddings: HF-RAG-Embed

A new family of RAG-optimized embedding models:

| Model | Dimensions | Context | MTEB Score | License |
|---|---|---|---|---|
| hf-rag-embed-small | 384 | 512 | 62.1 | Apache 2.0 |
| hf-rag-embed-base | 768 | 2048 | 65.8 | Apache 2.0 |
| hf-rag-embed-large | 1024 | 8192 | 68.4 | Apache 2.0 |
| hf-rag-embed-xl | 2048 | 16384 | 70.2 | Apache 2.0 |

Features:

  • Specifically trained for document retrieval
  • Native support for asymmetric queries (query vs document)
  • Optimized for multilingual retrieval (100 languages)
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("huggingface/hf-rag-embed-large")

# Document embeddings
doc_embeddings = model.encode(
    documents,
    prompt_name="document"  # Automatic prefix
)

# Query embeddings
query_embedding = model.encode(
    query,
    prompt_name="query"
)
```
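With embeddings in hand, retrieval reduces to a nearest-neighbor search. A minimal, framework-free sketch of cosine-similarity top-k retrieval using NumPy (the toy vectors and function name are illustrative, not part of the HF-RAG API):

```python
import numpy as np

def top_k(query_embedding, doc_embeddings, k=3):
    """Return indices of the k most similar documents by cosine similarity."""
    q = query_embedding / np.linalg.norm(query_embedding)
    d = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d @ q  # cosine similarity of each document against the query
    return np.argsort(scores)[::-1][:k]

# Toy example: 4 documents and 1 query in a 3-dimensional space
docs = np.array([[1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(top_k(query, docs, k=2))  # [0 1]
```

In production you would swap the brute-force `argsort` for a vector database or an ANN index, but the scoring logic is the same.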

Check our guide on choosing embedding models.

Rerankers: HF-RAG-Rerank

Lightweight, high-performing reranking models:

| Model | Parameters | Latency (P50) | nDCG@10 |
|---|---|---|---|
| hf-rag-rerank-tiny | 33M | 5ms | 58.2 |
| hf-rag-rerank-small | 110M | 12ms | 64.7 |
| hf-rag-rerank-base | 330M | 28ms | 68.9 |
| hf-rag-rerank-large | 560M | 45ms | 71.3 |
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "huggingface/hf-rag-rerank-base"
)
tokenizer = AutoTokenizer.from_pretrained(
    "huggingface/hf-rag-rerank-base"
)

# Score each (query, document) pair
pairs = [(query, doc) for doc in candidate_docs]
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)
```
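Once the reranker has scored each pair, the final step is just a sort. A framework-free sketch of that step, with hypothetical scores standing in for the model output above:

```python
def rerank(candidate_docs, scores, top_k=3):
    """Sort documents by descending reranker score and keep the best top_k."""
    ranked = sorted(zip(candidate_docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# Toy example with hypothetical reranker scores
docs = ["doc_a", "doc_b", "doc_c", "doc_d"]
scores = [0.12, 0.87, 0.45, 0.90]
print(rerank(docs, scores, top_k=2))  # ['doc_d', 'doc_b']
```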

These models perfectly complement our guide on reranking.

LLMs: HF-RAG-LLM

LLMs adapted specifically for RAG generation:

| Model | Parameters | Context | RAGBench Score |
|---|---|---|---|
| hf-rag-llm-7b | 7B | 32K | 72.4 |
| hf-rag-llm-13b | 13B | 64K | 76.8 |
| hf-rag-llm-34b | 34B | 128K | 81.2 |

Unique features:

  • Trained to cite sources systematically
  • Reduced hallucination rate
  • Instruction-following optimized for RAG
```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="huggingface/hf-rag-llm-13b"
)

response = generator(
    f"""<context>
{retrieved_documents}
</context>

<question>
{user_question}
</question>

Answer by citing your sources with [1], [2], etc."""
)
```

Benchmarks

Comparison with Competition

Embeddings (MTEB Retrieval)

| Model | Score | Latency | Open-source |
|---|---|---|---|
| hf-rag-embed-large | 68.4 | 15ms | Yes |
| Cohere Embed v5 | 71.2 | 45ms | No |
| text-embedding-3-large | 67.4 | 40ms | No |
| BGE-M3 | 64.8 | 12ms | Yes |

Rerankers

| Model | nDCG@10 | Latency | Open-source |
|---|---|---|---|
| hf-rag-rerank-base | 68.9 | 28ms | Yes |
| Cohere Rerank 3 | 72.1 | 35ms | No |
| ms-marco-MiniLM | 64.2 | 8ms | Yes |

LLMs (RAGBench)

| Model | Score | Hallucinations | Open-source |
|---|---|---|---|
| hf-rag-llm-34b | 81.2 | 2.8% | Yes |
| GPT-4 Turbo | 84.5 | 2.4% | No |
| Claude 3 Opus | 86.1 | 1.8% | No |
| Mixtral 8x22B | 78.4 | 4.1% | Yes |

Deployment

Deployment Options

1. Hugging Face Inference Endpoints

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="huggingface/hf-rag-embed-large")
embeddings = client.feature_extraction(texts)
```

Pricing: $0.06/hour (basic GPU) to $0.60/hour (high-performance GPU)
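At those rates, a monthly budget is easy to estimate. A back-of-the-envelope sketch assuming an always-on endpoint (730 hours is the average month; rates from above):

```python
HOURS_PER_MONTH = 730  # average hours in a month

def monthly_cost(hourly_rate: float, replicas: int = 1) -> float:
    """Estimated monthly cost in USD for an always-on endpoint."""
    return hourly_rate * HOURS_PER_MONTH * replicas

print(f"${monthly_cost(0.06):.2f}")  # basic GPU: $43.80
print(f"${monthly_cost(0.60):.2f}")  # high-performance GPU: $438.00
```

Scale-to-zero or autoscaling endpoints would bring this down for bursty traffic.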

2. Self-hosted with vLLM

```bash
pip install vllm

python -m vllm.entrypoints.openai.api_server \
  --model huggingface/hf-rag-llm-13b \
  --port 8000
```
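The server exposes an OpenAI-compatible API, so any OpenAI client can talk to it. A dependency-free sketch using only the standard library (the endpoint path and payload shape follow the OpenAI chat-completions format; a running server from the command above is assumed):

```python
import json
import urllib.request

def build_request(question: str, context: str) -> urllib.request.Request:
    """Build a chat-completions request for the local vLLM server."""
    payload = {
        "model": "huggingface/hf-rag-llm-13b",
        "messages": [
            {"role": "user",
             "content": f"<context>\n{context}\n</context>\n\n{question}"},
        ],
    }
    return urllib.request.Request(
        "http://localhost:8000/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("What is RAG?", "RAG combines retrieval with generation.")
# response = urllib.request.urlopen(req)  # requires the server to be running
```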

3. Optimization with ONNX

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained(
    "huggingface/hf-rag-rerank-base",
    export=True
)
```

Performance gain: 2-3x on CPU

For production configurations, check our guide on production deployment.

Quantization

Models are available in quantized versions:

| Quantization | Size | Quality Loss |
|---|---|---|
| FP16 | 100% | 0% |
| INT8 | 50% | -0.5% |
| INT4 | 25% | -2% |
| GPTQ | 25% | -1.5% |
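The size column translates directly into memory footprint: bytes per parameter times parameter count. A quick sketch for the hf-rag-llm family (weights only; activation and KV-cache memory come on top):

```python
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5, "GPTQ": 0.5}

def weights_gb(n_params_billions: float, quantization: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params_billions * BYTES_PER_PARAM[quantization]

for quant in ("FP16", "INT8", "INT4"):
    print(f"hf-rag-llm-13b {quant}: ~{weights_gb(13, quant):.1f} GB")
# FP16 ~26.0 GB, INT8 ~13.0 GB, INT4 ~6.5 GB
```

This is why INT4/GPTQ versions make the 13B model fit on a single consumer GPU.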

Integration with Frameworks

LangChain

```python
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="huggingface/hf-rag-embed-large"
)
```

LlamaIndex

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="huggingface/hf-rag-embed-large"
)
```

Ailog

HF-RAG models are integrated as an option in the Ailog configuration.

Recommended Use Cases

When to Use HF-RAG

Ideal for:

  • Data sovereignty constraints
  • Limited budget (self-hosting)
  • Need for customization/fine-tuning
  • High request volume

Less suitable for:

  • Teams without ML expertise
  • Need for absolute best quality
  • Rapid prototypes

Fine-tuning

Models are designed for fine-tuning:

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./fine-tuned-rag-embed",
    per_device_train_batch_size=32,
    num_train_epochs=3
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=domain_dataset
)
trainer.train()
```

Check our guide on fine-tuning embeddings.

Our Take

This release is a major step forward for the open-source RAG ecosystem:

Strengths:

  • Performance close to proprietary models
  • Permissive license (Apache 2.0)
  • Models optimized for RAG
  • Excellent documentation

Points of attention:

  • Requires expertise to deploy
  • Infrastructure costs if self-hosted
  • No commercial support

For organizations with sovereignty constraints or high volumes, HF-RAG becomes a credible alternative to proprietary solutions.

Platforms like Ailog allow using these models without managing infrastructure, combining open-source with simplicity.

Check our RAG introduction guide to get started.

Tags

RAG, Hugging Face, open-source, embeddings, LLM
