enhance: Add Hugging Face inference provider support #50818
junjiejiangjjj wants to merge 1 commit into
Add Hugging Face Inference Providers client support for feature extraction and sentence similarity APIs, and wire it into text embedding and rerank model providers.

The new provider supports:
- text embedding via feature-extraction
- rerank scoring via sentence-similarity
- Hugging Face router provider selection with hf_provider
- MILVUS_HUGGINGFACE_API_KEY credential fallback
- provider config entries for text embedding and rerank

Also add focused tests for the Hugging Face client, rerank provider, and paramtable provider docs.

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
        return provider.fieldDim
    }

    func (provider *HuggingFaceEmbeddingProvider) CallEmbedding(_ context.Context, texts []string, _ models.TextEmbeddingMode) (any, error) {
`CallEmbedding(_ context.Context, texts []string, _ models.TextEmbeddingMode)` ignores the embedding mode and sends the single configured `prompt_name` on every request, so inserts and searches are encoded with the same prompt. Asymmetric retrieval models (E5/BGE/GTE/Qwen) need different query and document prompts. The sibling TEI provider already switches `ingestion_prompt`/`search_prompt` by mode, and Gemini uses `RETRIEVAL_DOCUMENT` vs `RETRIEVAL_QUERY`. With a config like `prompt_name=query`, the query prompt is applied to documents at insert time, silently degrading retrieval quality with no knob to fix it. Add mode-specific HF prompt params by mapping the existing ingestion/search prompt concepts into the request before calling `FeatureExtraction`.
Hugging Face Inference Providers expose a single feature-extraction pipeline API and do not define separate query/document embedding endpoints or mode-specific prompt fields.
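For reference, the mode-based mapping the reviewer proposes could be sketched as below. This is an illustration under stated assumptions, not code from the PR: `EmbeddingMode` stands in for `models.TextEmbeddingMode`, and `promptConfig` with its `IngestionPrompt`/`SearchPrompt` fields is hypothetical. Whether the selected name can be forwarded usefully depends on the Hugging Face API, which is the point raised in the reply above.

```go
package main

// EmbeddingMode is a stand-in for models.TextEmbeddingMode (assumption).
type EmbeddingMode int

const (
	InsertMode EmbeddingMode = iota
	SearchMode
)

// promptConfig is a hypothetical config holding one fallback prompt name
// plus optional per-mode overrides.
type promptConfig struct {
	PromptName      string // used when no mode-specific prompt is set
	IngestionPrompt string // applied to documents at insert time
	SearchPrompt    string // applied to queries at search time
}

// promptFor returns the mode-specific prompt name when one is configured
// and falls back to the single PromptName otherwise.
func (c promptConfig) promptFor(mode EmbeddingMode) string {
	switch mode {
	case InsertMode:
		if c.IngestionPrompt != "" {
			return c.IngestionPrompt
		}
	case SearchMode:
		if c.SearchPrompt != "" {
			return c.SearchPrompt
		}
	}
	return c.PromptName
}
```

Falling back to `PromptName` keeps a single-prompt configuration working unchanged, so the per-mode fields would be opt-in.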
    if err := json.Unmarshal(raw, &tokenLevel); err == nil && len(tokenLevel) > 0 {
        return nil, merr.WrapErrFunctionFailedMsg("Hugging Face feature-extraction returned token-level embeddings; please use a sentence embedding model or configure pooling")
    }
    return nil, merr.WrapErrFunctionFailedMsg("unsupported Hugging Face feature-extraction response format")
A 200 response whose body is not a recognized embedding array (e.g. an HF error envelope like `{"error":"..."}`) falls through to the generic `unsupported Hugging Face feature-extraction response format` error, which discards the body. HF's actual message is lost and the failure is hard to diagnose. Because `FeatureExtractionResponse` is a `json.RawMessage`, `PostRequest`'s unmarshal accepts the envelope and it reaches this code; the rerank path similarly collapses such a body into `unmarshal response failed`. Surface the response body in the error (and, if it is a known transient envelope, treat it as retriable) instead of dropping it.
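One way to surface the body, sketched under assumptions: the helper name `describeUnexpectedBody` and the 256-byte cap are hypothetical, and the real code would presumably wrap the message with `merr.WrapErrFunctionFailedMsg` rather than `fmt.Errorf`. The `error` field is kept as `json.RawMessage` on the assumption that it is not always a plain string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxBodyInError caps how much of an unrecognized body is echoed back
// (hypothetical limit, chosen only for illustration).
const maxBodyInError = 256

// describeUnexpectedBody builds an error for a response that parsed as
// neither sentence-level nor token-level embeddings. If the body looks
// like an {"error": ...} envelope, the upstream message is reported;
// otherwise a truncated copy of the raw body is included.
func describeUnexpectedBody(raw []byte) error {
	var envelope struct {
		Error json.RawMessage `json:"error"`
	}
	if err := json.Unmarshal(raw, &envelope); err == nil && len(envelope.Error) > 0 {
		return fmt.Errorf("Hugging Face feature-extraction returned an error: %s", envelope.Error)
	}
	body := string(raw)
	if len(body) > maxBodyInError {
		// Byte-based truncation; may cut a multi-byte character.
		body = body[:maxBodyInError] + "..."
	}
	return fmt.Errorf("unsupported Hugging Face feature-extraction response format: %s", body)
}
```

Retry classification for transient envelopes is left out here, since which messages count as transient is not stated in the thread.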
✅ CI Loop Results
| Stage | Result | Duration | Tests |
|---|---|---|---|
| ✅ Build | SUCCESS | 9.4min | - |
| ✅ Code-Check | SUCCESS | 6.3min | - |
| ✅ UT-GO | SUCCESS | 20.0min | 1071 total, 1071 passed, 0 failed |
| ✅ UT-Integration | SUCCESS | 24.0min | 46 total, 46 passed, 0 failed |
| ✅ UT-CPP-Cov | SUCCESS | 41.9min | 8000 total, 8000 passed, 0 failed |
Total: 68min
Overall Coverage: 72.2%
Diff Coverage: Go 81.2% (208 hit, 48 miss, 256 measurable lines, 206 unmeasured)
Total Patch Coverage: 81.3% (208/256 measurable lines, 206 unmeasured)
issue: #50816