enhance: Add Hugging Face inference provider support #50818

Open
junjiejiangjjj wants to merge 1 commit into milvus-io:master from junjiejiangjjj:hf-infer

@junjiejiangjjj

Contributor

issue: #50816

Add Hugging Face Inference Providers client support for feature extraction and sentence similarity APIs, and wire it into text embedding and rerank model providers.

The new provider supports:

  • text embedding via feature-extraction
  • rerank scoring via sentence-similarity
  • Hugging Face router provider selection with hf_provider
  • MILVUS_HUGGINGFACE_API_KEY credential fallback
  • provider config entries for text embedding and rerank

Also add focused tests for the Hugging Face client, rerank provider, and paramtable provider docs.
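
For readers unfamiliar with the sentence-similarity task, the sketch below illustrates how a rerank call could map onto it: the query becomes `source_sentence`, the candidate documents become `sentences`, and the response is a flat array with one score per document. The type and function names (`similarityRequest`, `buildRerankBody`, `rankByScore`) are hypothetical and are not taken from this PR; the sketch only shows the payload shape and an ordering step, not the PR's actual client code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// similarityInputs mirrors the "inputs" object of the sentence-similarity
// task: one source sentence compared against a list of candidates.
type similarityInputs struct {
	SourceSentence string   `json:"source_sentence"`
	Sentences      []string `json:"sentences"`
}

// similarityRequest is a hypothetical request type used only in this sketch.
type similarityRequest struct {
	Inputs similarityInputs `json:"inputs"`
}

// buildRerankBody encodes a query and its candidate documents as JSON.
func buildRerankBody(query string, docs []string) ([]byte, error) {
	req := similarityRequest{Inputs: similarityInputs{SourceSentence: query, Sentences: docs}}
	return json.Marshal(req)
}

// rankByScore decodes a flat score array (one score per document, in input
// order) and returns document indices sorted from most to least similar.
func rankByScore(raw []byte, numDocs int) ([]int, error) {
	var scores []float32
	if err := json.Unmarshal(raw, &scores); err != nil {
		return nil, fmt.Errorf("decode scores: %w", err)
	}
	if len(scores) != numDocs {
		return nil, fmt.Errorf("expected %d scores, got %d", numDocs, len(scores))
	}
	order := make([]int, numDocs)
	for i := range order {
		order[i] = i
	}
	// Stable sort keeps the input order for documents with equal scores.
	sort.SliceStable(order, func(a, b int) bool {
		return scores[order[a]] > scores[order[b]]
	})
	return order, nil
}

func main() {
	body, err := buildRerankBody("what is milvus", []string{"a vector database", "a kind of bird"})
	fmt.Println(string(body), err)

	// The score array here is made up for illustration.
	order, err := rankByScore([]byte(`[0.82, 0.11]`), 2)
	fmt.Println(order, err)
}
```

Checking the score count against the number of documents is a design choice in this sketch: it turns a mismatched or malformed response into an explicit error instead of a silently misaligned ranking.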

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
@sre-ci-robot added the size/XL label (denotes a PR that changes 500-999 lines) on Jun 26, 2026
@sre-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: junjiejiangjjj
To complete the pull request process, please assign yanliang567 after the PR has been reviewed.
You can assign the PR to them by writing /assign @yanliang567 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mergify added the dco-passed (DCO check passed) and kind/enhancement (issues or changes related to enhancement) labels on Jun 26, 2026
@sre-ci-robot

Contributor

[ci-v2-notice]
Notice: New ci-v2 system is enabled for this PR.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-code-check-macos // for Code Checker MacOS (GitHub Actions)
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-build-all // for ci-v2/build-all (multi-arch builds)
  • /ci-rerun-buildenv // for ci-v2/build-env (build milvus-env builder images; update .env after the new tag is ready)
  • /ci-rerun-ut-integration // for ci-v2/ut-integration, will rerun ci-v2/build
  • /ci-rerun-ut-go // for ci-v2/ut-go, will rerun ci-v2/build
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp, will rerun ci-v2/build
  • /ci-rerun-e2e-default // for ci-v2/e2e-default
  • /ci-rerun-e2e-amd // for ci-v2/e2e-amd (e2e pool dispatcher)
  • /ci-rerun-build-ut-cov // for ci-v2/build-ut-cov (build + unit tests in one pipeline)
  • /ci-rerun-gosdk // for ci-v2/go-sdk (Go SDK E2E tests, ARM)

If you have any questions or requests, please contact @zhikunyao.

	return provider.fieldDim
}

func (provider *HuggingFaceEmbeddingProvider) CallEmbedding(_ context.Context, texts []string, _ models.TextEmbeddingMode) (any, error) {

Member

CallEmbedding(_ context.Context, texts []string, _ models.TextEmbeddingMode) ignores the embedding mode and sends the single configured prompt_name on every request, so inserts and searches are encoded with the same prompt. Asymmetric retrieval models (E5/BGE/GTE/Qwen) need different query vs document prompts — the sibling TEI provider already switches ingestion_prompt/search_prompt by mode and Gemini uses RETRIEVAL_DOCUMENT vs RETRIEVAL_QUERY — so a config like prompt_name=query applies the query prompt to documents at insert time, silently degrading retrieval quality with no knob to fix it. Add mode-specific HF prompt params by mapping the existing ingestion/search prompt concepts into the request before calling FeatureExtraction.
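
To make the suggestion concrete, here is a minimal sketch of choosing a prompt name by embedding mode, with a fallback to the single shared `prompt_name`. Everything in it is an assumption for illustration: `EmbeddingMode` stands in for `models.TextEmbeddingMode` (whose constants are not shown in this thread), and the `IngestionPrompt` / `SearchPrompt` fields are hypothetical config knobs, not parameters that exist in the PR.

```go
package main

import "fmt"

// EmbeddingMode is a stand-in for models.TextEmbeddingMode; the real type and
// its constant names are not shown in this thread, so these are assumptions.
type EmbeddingMode int

const (
	InsertMode EmbeddingMode = iota
	SearchMode
)

// promptConfig holds the shared prompt name plus two hypothetical
// mode-specific overrides.
type promptConfig struct {
	PromptName      string // shared prompt, sent when no override applies
	IngestionPrompt string // hypothetical: used when embedding documents
	SearchPrompt    string // hypothetical: used when embedding queries
}

// promptFor returns the mode-specific prompt when one is configured and
// otherwise falls back to the shared prompt name.
func (c promptConfig) promptFor(mode EmbeddingMode) string {
	switch {
	case mode == SearchMode && c.SearchPrompt != "":
		return c.SearchPrompt
	case mode == InsertMode && c.IngestionPrompt != "":
		return c.IngestionPrompt
	default:
		return c.PromptName
	}
}

func main() {
	cfg := promptConfig{PromptName: "default", IngestionPrompt: "document", SearchPrompt: "query"}
	fmt.Println(cfg.promptFor(InsertMode), cfg.promptFor(SearchMode))
}
```

Falling back to the shared prompt keeps existing single-prompt configs working unchanged, so the mode-specific fields would be purely additive under these assumptions.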

Contributor Author

Hugging Face Inference Providers expose a single feature-extraction pipeline API and do not define separate query/document embedding endpoints or mode-specific prompt fields.

if err := json.Unmarshal(raw, &tokenLevel); err == nil && len(tokenLevel) > 0 {
	return nil, merr.WrapErrFunctionFailedMsg("Hugging Face feature-extraction returned token-level embeddings; please use a sentence embedding model or configure pooling")
}
return nil, merr.WrapErrFunctionFailedMsg("unsupported Hugging Face feature-extraction response format")

Member

A 200 response whose body is not a recognized embedding array (e.g. an HF error envelope like {"error":"..."}) falls through to a generic unsupported Hugging Face feature-extraction response format error that discards the body, so HF's actual message is lost and the failure is opaque to diagnose. Because FeatureExtractionResponse is a json.RawMessage, PostRequest's unmarshal accepts the envelope and it reaches here; the rerank path similarly collapses such a body into unmarshal response failed. Surface the response body in the error (and, if a known transient envelope, treat it as retriable) instead of dropping it.

@sre-ci-robot

Contributor

✅ CI Loop Results c6b76a2

| Stage | Result | Duration | Tests |
|---|---|---|---|
| Build | ✅ SUCCESS | 9.4min | - |
| Code-Check | ✅ SUCCESS | 6.3min | - |
| UT-GO | ✅ SUCCESS | 20.0min | 1071 total, 1071 passed, 0 failed |
| UT-Integration | ✅ SUCCESS | 24.0min | 46 total, 46 passed, 0 failed |
| UT-CPP-Cov | ✅ SUCCESS | 41.9min | 8000 total, 8000 passed, 0 failed |

Total: 68min | Pipeline | Artifacts

Overall Coverage: 72.2%
Diff Coverage: Go 81.2% (208 hit, 48 miss, 256 measurable lines, 206 unmeasured)
Diff Coverage HTML: view changed lines
Total Patch Coverage: 81.3% (208/256 measurable lines, 206 unmeasured)

Labels

dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement size/XL Denotes a PR that changes 500-999 lines.

3 participants