Fine-Tune Embeddings for Your Domain
Boost retrieval accuracy by fine-tuning embedding models on your own documents and queries.
Why Fine-Tune?
Generic embeddings work well for general text, but domain-specific fine-tuning can improve retrieval accuracy by 30-50% on jargon-heavy corpora:
Before (generic):
- Medical query: "MI treatment" → ❌ matches "Michigan"
After (fine-tuned):
- Medical query: "MI treatment" → ✅ matches "Myocardial Infarction protocols"
When to Fine-Tune
✅ Fine-tune when:
- Domain-specific jargon (legal, medical, technical)
- 1000+ labeled query-document pairs
- Base model underperforms (< 70% recall on your own evaluation queries)
❌ Skip fine-tuning when:
- General domain
- < 500 training examples
- Base model already works well
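The recall threshold in the checklist above assumes you can measure recall on your own labeled queries. A minimal NumPy sketch of mean recall@k, assuming a precomputed query-by-document similarity matrix from the base model and a set of relevant document indices per query (the function name and input layout are illustrative, and the 70% cut-off remains this guide's rule of thumb):

```python
import numpy as np

def recall_at_k(scores: np.ndarray, relevant: list, k: int = 10) -> float:
    """Mean recall@k over queries.

    scores:   (n_queries, n_docs) similarity matrix (higher = more similar)
    relevant: relevant[i] is the set of relevant doc indices for query i
    """
    top_k = np.argsort(-scores, axis=1)[:, :k]  # indices of the k best docs per query
    per_query = [
        len(set(row.tolist()) & rel) / len(rel)
        for row, rel in zip(top_k, relevant)
        if rel  # skip queries with no labeled relevant docs
    ]
    return float(np.mean(per_query))

# Toy example: 2 queries, 3 documents
scores = np.array([[0.9, 0.1, 0.3],
                   [0.2, 0.8, 0.7]])
relevant = [{0}, {2}]
print(recall_at_k(scores, relevant, k=1))  # 0.5 (query 2's relevant doc is ranked second)
print(recall_at_k(scores, relevant, k=2))  # 1.0
```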
Training Data Format
```python
# Positive pairs (query → relevant document)
train_data = [
    {
        "query": "What causes diabetes?",
        "positive": "Type 2 diabetes is caused by insulin resistance...",
        "negative": "Diabetic retinopathy affects the eyes..."  # Optional
    },
    {
        "query": "How to lower blood pressure?",
        "positive": "Lifestyle changes like diet and exercise reduce BP...",
        "negative": "High blood pressure symptoms include headaches..."
    }
]
```
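For more than a handful of examples, the pairs usually live in a file rather than in code. The sketch below assumes a JSON Lines file (one object per line with the keys shown above); the load_pairs helper and the file name are hypothetical. It validates the required keys and treats "negative" as optional:

```python
import json
import os
import tempfile

REQUIRED_KEYS = {"query", "positive"}

def load_pairs(path):
    """Read records from a JSON Lines file.

    Returns tuples: (query, positive) or (query, positive, negative).
    """
    examples = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = REQUIRED_KEYS - set(record)
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            texts = (record["query"], record["positive"])
            if record.get("negative"):
                texts += (record["negative"],)  # optional hard negative
            examples.append(texts)
    return examples

# Usage: write two records (the second without a negative) and read them back
records = [
    {"query": "What causes diabetes?",
     "positive": "Type 2 diabetes is caused by insulin resistance...",
     "negative": "Diabetic retinopathy affects the eyes..."},
    {"query": "How to lower blood pressure?",
     "positive": "Lifestyle changes like diet and exercise reduce BP..."},
]
path = os.path.join(tempfile.mkdtemp(), "train_pairs.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

pairs = load_pairs(path)
print([len(p) for p in pairs])  # [3, 2]
```

Keep each training run uniform (all pairs or all triplets): a single batch generally expects every example to have the same number of texts.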
Method 1: Sentence Transformers
```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Load base model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Prepare training data
train_examples = [
    InputExample(texts=[item['query'], item['positive']])
    for item in train_data
]

# Create dataloader
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Define loss (contrastive learning)
train_loss = losses.MultipleNegativesRankingLoss(model)

# Fine-tune
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    warmup_steps=100,
    output_path='./fine-tuned-model'
)
```
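MultipleNegativesRankingLoss needs only positive pairs because it uses in-batch negatives: for each query, its own document is the target and every other document in the batch counts as a negative. The NumPy sketch below shows that idea as softmax cross-entropy over scaled cosine similarities. The scale of 20 mirrors the sentence-transformers default, but this is an illustration, not the library's implementation, and the function name is hypothetical:

```python
import numpy as np

def in_batch_negatives_loss(query_embs: np.ndarray, doc_embs: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy where doc i is the positive for query i; all other docs in the batch are negatives."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    logits = scale * (q @ d.T)                            # (batch, batch) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))  # row-wise log-softmax
    return float(-np.mean(np.diag(log_probs)))            # correct "class" for row i is column i

# Each query matches its own document: loss is near 0
aligned = in_batch_negatives_loss(np.eye(4), np.eye(4))
# Documents shifted by one row: every query's best match belongs to another query
shifted = in_batch_negatives_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
print(f"aligned={aligned:.4f}, shifted={shifted:.2f}")
```

This is also why larger batch sizes tend to help with this loss: each query is contrasted against more negatives.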
Method 2: OpenAI Embeddings
As of this writing, OpenAI's fine-tuning API does not accept embedding models such as text-embedding-3-small (fine-tuning is offered for text-generation models), so a fine-tuned "ft:" embedding model cannot be created or called. OpenAI embeddings can still serve as a frozen base: embed your queries and documents through the API, then adapt the vectors on your side (for example, with a linear transform learned from your query-document pairs), or fine-tune an open-source model with Method 1 instead.
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Embedding models can be called through the API, but not fine-tuned
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[item['query'] for item in train_data],
)
query_embeddings = [d.embedding for d in response.data]  # one vector per input, in order
```
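Because OpenAI's embedding models cannot currently be fine-tuned through the public API, a common workaround is to keep the API vectors frozen and learn a small linear adapter on top of them. The sketch below fits a query-side adapter by ridge-regularized least squares, so that adapted query vectors land near their positive documents. Synthetic NumPy arrays stand in for real API embeddings (an assumption so the example runs offline), and the function names are hypothetical:

```python
import numpy as np

def fit_linear_adapter(query_embs, pos_doc_embs, ridge=1e-2):
    """Find W minimizing ||query_embs @ W - pos_doc_embs||^2 + ridge * ||W||^2."""
    dim = query_embs.shape[1]
    gram = query_embs.T @ query_embs + ridge * np.eye(dim)
    return np.linalg.solve(gram, query_embs.T @ pos_doc_embs)  # normal equations

def top1_accuracy(query_embs, doc_embs):
    """Fraction of queries whose most cosine-similar doc is the doc at the same index."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    best = (q @ d.T).argmax(axis=1)
    return float(np.mean(best == np.arange(len(q))))

rng = np.random.default_rng(0)
dim = 32
# Synthetic "domain mismatch": queries are a rotated, noisy copy of their documents
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

def make_pairs(n):
    docs = rng.normal(size=(n, dim))
    queries = docs @ rotation + 0.1 * rng.normal(size=(n, dim))
    return queries, docs

train_q, train_d = make_pairs(500)
test_q, test_d = make_pairs(100)  # held-out pairs

W = fit_linear_adapter(train_q, train_d)
base_acc = top1_accuracy(test_q, test_d)
adapted_acc = top1_accuracy(test_q @ W, test_d)
print(f"top-1 accuracy: base={base_acc:.2f}, adapted={adapted_acc:.2f}")
```

With real data you would replace the synthetic arrays with vectors from client.embeddings.create, apply W to query vectors at search time, and confirm on held-out pairs that the adapter actually helps.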
Method 3: Hard Negative Mining
Improve contrastive learning with hard negatives:
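```python
import numpy as np
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.util import cos_sim
from torch.utils.data import DataLoader

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# all_documents: your full corpus as a list of strings (assumed to be defined)
# Encode the corpus once rather than once per query
doc_embs = model.encode(all_documents, convert_to_tensor=True)

# Generate hard negatives (similar but irrelevant documents)
def mine_hard_negatives(query, positive, k=5):
    """Return the k documents most similar to the query, excluding its known positive."""
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = cos_sim(query_emb, doc_embs)[0].cpu().numpy()
    ranked = np.argsort(-scores)  # most similar first
    # Unlabeled relevant documents can still slip in as false negatives
    negatives = [all_documents[i] for i in ranked if all_documents[i] != positive]
    return negatives[:k]

# Training with hard negatives: one (query, positive, negative) triplet per mined negative
train_examples = []
for item in train_data:
    for neg in mine_hard_negatives(item['query'], item['positive']):
        train_examples.append(InputExample(texts=[item['query'], item['positive'], neg]))

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
# Use TripletLoss (MultipleNegativesRankingLoss also accepts (anchor, positive, negative) triplets)
train_loss = losses.TripletLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```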
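Why hard negatives help can be seen from the triplet objective itself, max(d(a, p) - d(a, n) + margin, 0): negatives that are already far from the query contribute zero loss and therefore no training signal. The NumPy sketch below uses Euclidean distance and a margin of 1.0 chosen for illustration (sentence-transformers' TripletLoss has its own configurable distance metric and margin); the 2-D vectors are made up:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0) with Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(d_pos - d_neg + margin, 0.0))

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
easy_negative = np.array([-1.0, 0.0])  # far from the query: already "solved"
hard_negative = np.array([0.8, -0.2])  # close to the query but irrelevant

print(triplet_loss(anchor, positive, easy_negative))  # 0.0: no training signal
print(triplet_loss(anchor, positive, hard_negative))  # > 0: the model still has to separate these
```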
Evaluation
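```python
import numpy as np
from sklearn.metrics import ndcg_score
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

def evaluate_model(model, test_queries, test_docs, relevance_labels, k=10):
    """Mean nDCG@k over queries.

    relevance_labels: array of shape (n_queries, n_docs) with graded relevance (0 = irrelevant).
    """
    query_embs = model.encode(test_queries, convert_to_tensor=True)
    doc_embs = model.encode(test_docs, convert_to_tensor=True)  # encode the corpus once, not per query
    scores = cos_sim(query_embs, doc_embs).cpu().numpy()       # shape (n_queries, n_docs)
    return ndcg_score(np.asarray(relevance_labels), scores, k=k)

# Compare base vs fine-tuned on the same held-out test set (not used for training)
base_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
fine_tuned_model = SentenceTransformer('./fine-tuned-model')

print(f"Base model nDCG@10: {evaluate_model(base_model, test_queries, test_docs, relevance_labels):.3f}")
print(f"Fine-tuned nDCG@10: {evaluate_model(fine_tuned_model, test_queries, test_docs, relevance_labels):.3f}")
```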
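To make the nDCG@10 number less of a black box, here is a from-scratch sketch for a single query: documents are ranked by score, each relevance grade is discounted by log2(rank + 1), and the total is divided by the best achievable total. It uses linear gains, which to our understanding matches sklearn's ndcg_score convention, but it ignores tie handling, so treat it as an illustration rather than a replacement (function names are hypothetical):

```python
import numpy as np

def dcg_at_k(relevance_in_rank_order, k):
    """Discounted cumulative gain: rank r (1-based) is weighted by 1 / log2(r + 1)."""
    rel = np.asarray(relevance_in_rank_order, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel * discounts))

def ndcg_at_k(true_relevance, scores, k=10):
    """nDCG@k for one query (linear gains, ties ignored)."""
    true_relevance = np.asarray(true_relevance, dtype=float)
    ranked = true_relevance[np.argsort(-np.asarray(scores, dtype=float))]  # relevance in predicted order
    ideal = np.sort(true_relevance)[::-1]                                 # best possible order
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0

# Graded relevance of 4 documents and the model's scores for them
relevance = [3, 2, 0, 1]
scores = [0.9, 0.2, 0.8, 0.1]  # ranks the irrelevant document second
print(round(ndcg_at_k(relevance, scores, k=10), 4))  # 0.9305
```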
Incremental Fine-Tuning
Update model as new data arrives:
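```python
from sentence_transformers import SentenceTransformer, losses
from torch.utils.data import DataLoader

# Load previously fine-tuned model
model = SentenceTransformer('./fine-tuned-model')

# Add new training data (InputExample pairs, built the same way as in Method 1)
new_train_examples = [...]
new_dataloader = DataLoader(new_train_examples, shuffle=True, batch_size=16)

# The loss wraps the model being trained, so create it for the loaded model
train_loss = losses.MultipleNegativesRankingLoss(model)

# Continue training (warm start)
model.fit(
    train_objectives=[(new_dataloader, train_loss)],
    epochs=1,
    warmup_steps=50,
    output_path='./fine-tuned-model-v2'
)
```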
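Continuing training only on new pairs can erode what the model learned earlier (catastrophic forgetting). A common mitigation, assumed here rather than taken from this guide, is to replay a random sample of the original training examples alongside the new ones. A minimal sketch; the helper name and the 50% replay ratio are illustrative:

```python
import random

def build_incremental_set(old_examples, new_examples, replay_fraction=0.5, seed=0):
    """All new examples plus a random sample of old ones (replay).

    replay_fraction: number of old examples to replay, relative to len(new_examples).
    """
    rng = random.Random(seed)
    n_replay = min(len(old_examples), int(len(new_examples) * replay_fraction))
    mixed = list(new_examples) + rng.sample(list(old_examples), n_replay)
    rng.shuffle(mixed)  # interleave old and new examples
    return mixed

old_pairs = [(f"old query {i}", f"old doc {i}") for i in range(100)]
new_pairs = [(f"new query {i}", f"new doc {i}") for i in range(20)]
mixed = build_incremental_set(old_pairs, new_pairs, replay_fraction=0.5)
print(len(mixed))  # 30: all 20 new pairs + 10 replayed old pairs
```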
Distillation (Fast Inference)
Fine-tune a large model, then distill it into a small one:
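```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Teacher: large fine-tuned model
teacher = SentenceTransformer('fine-tuned-large-model')
# Student: small base model
student = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# MSELoss compares vectors directly, so dimensions must match; if they differ,
# add a models.Dense projection to the student or reduce the teacher's dimension
assert teacher.get_sentence_embedding_dimension() == student.get_sentence_embedding_dimension()

# Teacher embeddings become the regression targets (labels) for the student
sentences = [item['query'] for item in train_data] + [item['positive'] for item in train_data]
teacher_embs = teacher.encode(sentences)
train_examples = [InputExample(texts=[s], label=emb) for s, emb in zip(sentences, teacher_embs)]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Distillation loss: MSE between student and teacher embeddings
train_loss = losses.MSELoss(model=student)

# Train student to mimic teacher
student.fit(train_objectives=[(train_dataloader, train_loss)], epochs=3)
# The student is now faster; check with the evaluation above how close it gets to the teacher
```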
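Stripped of the library plumbing, embedding distillation is regression: the student is trained so its vectors reproduce the teacher's under a mean-squared-error loss. The toy NumPy sketch below makes that concrete with a linear "student" fitted by gradient descent to synthetic "teacher embeddings"; both stand-ins are assumptions chosen so the example runs without any models:

```python
import numpy as np

rng = np.random.default_rng(0)
n, in_dim, out_dim = 256, 16, 8
inputs = rng.normal(size=(n, in_dim))                       # stand-in for student input features
teacher_embs = inputs @ rng.normal(size=(in_dim, out_dim))  # stand-in for teacher embeddings

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

W = np.zeros((in_dim, out_dim))  # linear "student"
learning_rate = 0.05
initial_loss = mse(inputs @ W, teacher_embs)
for step in range(1000):
    pred = inputs @ W
    grad = 2.0 * inputs.T @ (pred - teacher_embs) / pred.size  # gradient of the mean squared error
    W -= learning_rate * grad
final_loss = mse(inputs @ W, teacher_embs)
print(f"MSE to teacher: {initial_loss:.2f} -> {final_loss:.4f}")
```

With real models the student is a transformer and the targets are the teacher's embeddings of your training sentences, but the quantity being minimized is the same.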
Fine-tuning embeddings is one of the most effective levers for domain-specific RAG: measure your base model first, and invest in fine-tuning once you have enough labeled pairs.