4. Storage · Intermediate

Weaviate: A GraphQL-Powered Vector Database

November 16, 2025
12 min read
Ailog Research Team

Set up Weaviate for production RAG with GraphQL queries, hybrid search, and generative modules.

Why Weaviate?

  • GraphQL API (flexible queries)
  • Built-in vectorization modules
  • Hybrid search (vector + BM25)
  • Generative search (built-in RAG)
  • Open source, with a managed cloud offering

Docker Setup

```bash
docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:latest
```

Or with docker-compose:

```yaml
version: '3.8'
services:
  weaviate:
    image: semitechnologies/weaviate:1.24.6
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,generative-openai'
      OPENAI_APIKEY: ${OPENAI_API_KEY}
```
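
Once the container is up, it can help to wait for the instance to report ready before creating a schema. The sketch below polls Weaviate's `/v1/.well-known/ready` endpoint using only the standard library. The function name `wait_until_ready` and its timeout defaults are assumptions made for this example, not part of the original setup.

```python
import time
import urllib.request


def wait_until_ready(base_url: str, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll the readiness endpoint until it answers 200 or the timeout expires."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: not ready yet.
            pass
        time.sleep(interval_s)
    return False


# Usage (assumes the port mapping from the compose file above):
# wait_until_ready("http://localhost:8080")
```

The v3 Python client exposes a similar check as `client.is_ready()`; the standalone version is handy in startup scripts that run before the client is installed or configured.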

Python Client

```python
import weaviate

# Uses the v3 Python client API (weaviate.Client)
client = weaviate.Client("http://localhost:8080")

# Create the schema
schema = {
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "model": "text-embedding-3-small"
        }
    },
    "properties": [
        {
            "name": "content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": False
                }
            }
        },
        {"name": "title", "dataType": ["text"]},
        {"name": "category", "dataType": ["text"]}
    ]
}

client.schema.create_class(schema)
```

Inserting Documents

```python
# Auto-vectorization: Weaviate embeds the object with the class vectorizer
client.data_object.create(
    class_name="Document",
    data_object={
        "content": "Weaviate is a vector database...",
        "title": "Introduction to Weaviate",
        "category": "tutorial"
    }
)

# Batch import (faster)
# `documents` is assumed to be a list of dicts with "text", "title", and "category" keys
with client.batch as batch:
    batch.batch_size = 100

    for doc in documents:
        batch.add_data_object(
            class_name="Document",
            data_object={
                "content": doc['text'],
                "title": doc['title'],
                "category": doc['category']
            }
        )
```
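
Re-running a batch import can create duplicate objects if each run lets Weaviate assign a fresh random ID. One common way around this is to derive a deterministic UUID from the document itself. The sketch below uses only the standard library; the namespace value and the helper name `document_uuid` are hypothetical choices for this example.

```python
import uuid

# Hypothetical namespace for this example; any fixed UUID works as long as it never changes.
DOCUMENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "documents.example.com")


def document_uuid(doc: dict) -> str:
    """Derive a stable UUID from the title and text of a document."""
    key = f"{doc['title']}\n{doc['text']}"
    return str(uuid.uuid5(DOCUMENT_NAMESPACE, key))


doc = {"text": "Weaviate is a vector database...", "title": "Introduction to Weaviate", "category": "tutorial"}
print(document_uuid(doc))  # same value on every run for the same title and text
```

The result can be passed as the `uuid` argument of `batch.add_data_object(...)`, so that importing the same document twice should target the same object instead of adding a second copy.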

Semantic Search (GraphQL)

```python
# nearText search
result = (
    client.query
    .get("Document", ["content", "title", "category"])
    .with_near_text({"concepts": ["vector database tutorial"]})
    .with_limit(5)
    .do()
)

print(result["data"]["Get"]["Document"])
```
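
The client's query builder ultimately sends a GraphQL document to the `/v1/graphql` endpoint. To make that visible, here is a minimal sketch that builds an equivalent `nearText` query by hand and posts it with the standard library. The helper names `build_near_text_query` and `run_graphql` are assumptions for this example, and the exact string the client generates may differ in whitespace and field order.

```python
import json
import urllib.request


def build_near_text_query(class_name, properties, concepts, limit=5):
    """Build a GraphQL Get query with a nearText argument."""
    # A JSON array of strings is also a valid GraphQL list literal.
    concepts_literal = json.dumps(concepts)
    fields = " ".join(properties)
    return (
        "{ Get { "
        f"{class_name}(nearText: {{concepts: {concepts_literal}}}, limit: {limit}) "
        f"{{ {fields} }} "
        "} }"
    )


def run_graphql(base_url, query):
    """POST a GraphQL query to Weaviate and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


query = build_near_text_query("Document", ["content", "title"], ["vector database tutorial"])
print(query)
# { Get { Document(nearText: {concepts: ["vector database tutorial"]}, limit: 5) { content title } } }
```

Calling `run_graphql("http://localhost:8080", query)` against a running instance should return the same `data.Get.Document` structure as the client call.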

Hybrid Search

Combine vector search with keyword search:

```python
result = (
    client.query
    .get("Document", ["content", "title"])
    .with_hybrid(
        query="machine learning models",
        alpha=0.5  # 0 = BM25 only, 1 = vector only, 0.5 = balanced
    )
    .with_limit(10)
    .do()
)
```
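
To build intuition for what `alpha` does, the sketch below blends two sets of scores with an alpha weight after min-max normalization. It is a simplified illustration only: Weaviate's own fusion algorithms are implemented server-side and may rank results differently, and the function names and sample scores here are made up for the example.

```python
def min_max_normalize(scores):
    """Rescale scores to the [0, 1] range; constant inputs all map to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (score - low) / (high - low) for doc_id, score in scores.items()}


def blend_scores(keyword_scores, vector_scores, alpha=0.5):
    """Combine keyword and vector scores: alpha=0 is keyword only, alpha=1 is vector only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    keyword = min_max_normalize(keyword_scores)
    vector = min_max_normalize(vector_scores)
    combined = {
        doc_id: (1 - alpha) * keyword.get(doc_id, 0.0) + alpha * vector.get(doc_id, 0.0)
        for doc_id in set(keyword) | set(vector)
    }
    # Highest combined score first
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


bm25 = {"doc_a": 3.0, "doc_b": 1.0}        # hypothetical BM25 scores
semantic = {"doc_b": 0.9, "doc_c": 0.1}    # hypothetical vector similarities
print(blend_scores(bm25, semantic, alpha=0.0)[0][0])  # doc_a
print(blend_scores(bm25, semantic, alpha=1.0)[0][0])  # doc_b
```

A document that appears in only one result set contributes zero for the other signal, which is why moving `alpha` toward either extreme changes the top result.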

Filtering

```python
# Filter by category
result = (
    client.query
    .get("Document", ["content", "title"])
    .with_near_text({"concepts": ["python tutorial"]})
    .with_where({
        "path": ["category"],
        "operator": "Equal",
        "valueText": "programming"
    })
    .with_limit(5)
    .do()
)
```
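
Filters on several properties can be combined by nesting conditions under an `And` operator with an `operands` list. The helpers below are a small sketch for building such dictionaries; `equal_text` and `all_of` are hypothetical names introduced for this example, not client functions.

```python
def equal_text(prop, value):
    """Build an Equal condition on a text property."""
    return {"path": [prop], "operator": "Equal", "valueText": value}


def all_of(*conditions):
    """Combine conditions so that every one of them must match."""
    if not conditions:
        raise ValueError("at least one condition is required")
    if len(conditions) == 1:
        return conditions[0]
    return {"operator": "And", "operands": list(conditions)}


where_filter = all_of(
    equal_text("category", "programming"),
    equal_text("title", "Introduction to Weaviate"),
)
print(where_filter["operator"])  # And
```

The resulting dictionary can be passed to `.with_where(where_filter)` in place of the single-condition filter.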

Generative Search (Built-in RAG)

```python
# Generate an answer from the retrieved documents
result = (
    client.query
    .get("Document", ["content", "title"])
    .with_near_text({"concepts": ["how to use embeddings"]})
    .with_generate(
        single_prompt="Summarize this document: {content}"
    )
    .with_limit(3)
    .do()
)

# Access the generated text (one result per retrieved document)
for doc in result["data"]["Get"]["Document"]:
    print(doc["_additional"]["generate"]["singleResult"])
```

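Multi-Tenancy

```python
from weaviate import Tenant

# Multi-tenancy must be enabled on the class when it is created
# ("multiTenancyConfig": {"enabled": True} in the class definition)

# Create tenants
client.schema.add_class_tenants(
    class_name="Document",
    tenants=[
        Tenant(name="tenant_a"),
        Tenant(name="tenant_b")
    ]
)

# Query a specific tenant
result = (
    client.query
    .get("Document", ["content"])
    .with_tenant("tenant_a")
    .with_near_text({"concepts": ["query"]})
    .do()
)
```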

Replication

```yaml
# docker-compose.yml for a 3-node cluster (first two nodes shown; a third joins the same way)
services:
  weaviate-node1:
    image: semitechnologies/weaviate:latest
    environment:
      CLUSTER_HOSTNAME: 'node1'
      CLUSTER_GOSSIP_BIND_PORT: '7100'
      CLUSTER_DATA_BIND_PORT: '7101'

  weaviate-node2:
    image: semitechnologies/weaviate:latest
    environment:
      CLUSTER_HOSTNAME: 'node2'
      CLUSTER_JOIN: 'weaviate-node1:7100'
```

Python RAG Pipeline

```python
def weaviate_rag(query):
    # Retrieve with generative search
    result = (
        client.query
        .get("Document", ["content", "title"])
        .with_near_text({"concepts": [query]})
        .with_generate(
            grouped_task=f"Answer this question: {query}",
            grouped_properties=["content"]
        )
        .with_limit(5)
        .do()
    )

    # Extract the answer; the grouped result is attached to the first returned object
    documents = result["data"]["Get"]["Document"]
    if not documents:
        return None
    return documents[0]["_additional"]["generate"]["groupedResult"]

# Usage
answer = weaviate_rag("What is machine learning?")
print(answer)
```
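
Indexing straight into the response works on the happy path, but a failed query or a generation error produces a different response shape. The sketch below shows one defensive way to pull out the grouped answer. It assumes the response may carry a top-level `errors` list and an `error` field next to `groupedResult`; the helper name `extract_grouped_answer` is hypothetical.

```python
from typing import Optional


def extract_grouped_answer(result: dict, class_name: str = "Document") -> Optional[str]:
    """Return the grouped generative answer, or None when nothing was retrieved."""
    # Assumed shape: GraphQL-level failures appear under a top-level "errors" key.
    if result.get("errors"):
        messages = "; ".join(
            str(err.get("message", err)) if isinstance(err, dict) else str(err)
            for err in result["errors"]
        )
        raise RuntimeError(f"GraphQL query failed: {messages}")

    documents = (result.get("data") or {}).get("Get", {}).get(class_name) or []
    if not documents:
        return None

    generate = (documents[0].get("_additional") or {}).get("generate") or {}
    # Assumed shape: the generative module reports its own failures in "error".
    if generate.get("error"):
        raise RuntimeError(f"Generation failed: {generate['error']}")
    return generate.get("groupedResult")


sample = {"data": {"Get": {"Document": [
    {"_additional": {"generate": {"groupedResult": "An answer.", "error": None}}}
]}}}
print(extract_grouped_answer(sample))  # An answer.
```

Inside `weaviate_rag`, the final lookup could be replaced with `return extract_grouped_answer(result)` so that callers get either an answer, `None`, or a descriptive exception.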

Weaviate's GraphQL interface and built-in RAG make it well suited to rapid prototyping.

Tags

  • weaviate
  • base-de-données-vectorielle
  • graphql
  • storage
