Interconnecting with the UBShmTransport Based on the LD/ST Shared Memory Semantics. #3290

Open
zchuango wants to merge 29 commits into apache:master from zchuango:ubshm_transport_dev

Conversation

@zchuango

zchuango commented May 9, 2026

Contributor

What problem does this PR solve?

Issue Number: #3226 #3167 #3217

Problem Summary:
After recent efforts, the UB-Ring framework has been successfully integrated with the brpc transport framework. High-performance, low-latency communication based on load/store (LD/ST) semantics is now supported. I am happy to be able to contribute this to the community and look forward to feedback and reviews. @wwbmmm @chenBright

What is changed and the side effects?

Changed:

  1. The ubring framework is added. This framework implements low-latency data communication based on the shared memory LD/ST semantics.
  2. Currently, the ubring framework supports two modes: POSIX IPC shared memory and ubs-mem remote shared memory.
  3. The ub_shm_type parameter is used to control whether to use the IPC or ubs-mem capability. Currently, ubs-mem can run on the Kunpeng 950 supernode that supports the ub protocol.
Side effects:

  • Performance effects: N/A

  • Breaking backward compatibility:


Check List:

Comment thread src/brpc/ubshm_transport.h Outdated

Copilot AI left a comment

Contributor

Pull request overview

Adds a new UBRing-based shared-memory transport mode to brpc (IPC + optional ubs-mem backend) and wires it into the Socket/Transport framework, along with docs and a performance example.

Changes:

  • Introduce UBRing transport (SOCKET_MODE_UBRING) with endpoint handshake, polling, and ring manager infrastructure.
  • Add shared-memory backend abstraction (POSIX IPC + ubs-mem via dlopen’d SDK stubs/headers) plus timer utilities.
  • Update build/docs/examples to expose the feature and provide a basic performance harness.

Reviewed changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated 15 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/brpc/ubshm/ubs_mem/ubshmem_stub.cpp | Adds stub implementations of ubs-mem APIs for non-ubs environments/UT. |
| src/brpc/ubshm/ubs_mem/ubs_mem.h | Introduces ubs-mem C API header used by the UBS backend integration. |
| src/brpc/ubshm/ubs_mem/ubs_mem_def.h | Defines ubs-mem types/constants used by the UBS backend integration. |
| src/brpc/ubshm/ubs_mem/declare_shm_ubs.h | Declares the dynamically loaded ubs-mem function pointer table. |
| src/brpc/ubshm/ubr_trx.h | Defines core UBR transaction structures and states. |
| src/brpc/ubshm/ubr_msg.h | Defines UBR message chunk format used by the ring transport. |
| src/brpc/ubshm/ub_ring.h | Declares UBRing read/write and lifecycle APIs used by the endpoint. |
| src/brpc/ubshm/ub_ring_manager.h | Declares global manager for UBR transactions and link bookkeeping. |
| src/brpc/ubshm/ub_ring_manager.cpp | Implements UBR transaction manager and UB event callback plumbing. |
| src/brpc/ubshm/ub_helper.h | Declares UBRing global init/availability helpers. |
| src/brpc/ubshm/ub_helper.cpp | Implements global init/fini, availability flags, and polling init. |
| src/brpc/ubshm/ub_endpoint.h | Declares UB shared-memory endpoint and polling infrastructure. |
| src/brpc/ubshm/ub_endpoint.cpp | Implements handshake, polling loop, and I/O integration with Socket/InputMessenger. |
| src/brpc/ubshm/timer/timer_mgr.h | Declares timer module used by UBS cleanup/recovery flows. |
| src/brpc/ubshm/timer/timer_mgr.cpp | Implements epoll/kqueue-based timer dispatch for UBRing subsystems. |
| src/brpc/ubshm/shm/shm_ubs.h | Declares UBS backend shared-memory operations. |
| src/brpc/ubshm/shm/shm_ubs.cpp | Implements UBS backend via dynamically loaded ubs-mem SDK. |
| src/brpc/ubshm/shm/shm_mgr.h | Declares backend-agnostic SHM manager interface. |
| src/brpc/ubshm/shm/shm_mgr.cpp | Implements SHM manager selecting IPC vs UBS backend via flag. |
| src/brpc/ubshm/shm/shm_ipc.h | Declares POSIX IPC SHM backend operations. |
| src/brpc/ubshm/shm/shm_ipc.cpp | Implements POSIX IPC SHM backend operations. |
| src/brpc/ubshm/shm/shm_def.h | Adds SHM structs/constants used across SHM backends and UBRing. |
| src/brpc/ubshm/common/thread_lock.h | Adds RAII-style mutex/spin/rwlock/semaphore guard macros. |
| src/brpc/ubshm/common/common.h | Adds common macros/types/constants used throughout UBRing code. |
| src/brpc/ubshm_transport.h | Declares UBShmTransport implementing the Transport interface. |
| src/brpc/ubshm_transport.cpp | Implements transport selection between UBRing and TCP fallback paths. |
| src/brpc/transport_factory.cpp | Wires SOCKET_MODE_UBRING into transport creation/context init. |
| src/brpc/socket.h | Adds UB endpoint/connect friend declarations for Socket integration. |
| src/brpc/socket_mode.h | Adds SOCKET_MODE_UBRING enum value. |
| src/brpc/rdma_transport.cpp | Adjusts RDMA transport’s TCP fallback member initialization (currently broken). |
| src/brpc/input_messenger.h | Adds UB endpoint friend declaration to support message processing hooks. |
| src/brpc/input_messenger.cpp | Extends RDMA-special message queuing behavior to UBRing sockets. |
| src/brpc/controller.h | Guards latency_us() against unset begin time. |
| README.md | Adds docs link for UBRing. |
| README_cn.md | Adds docs link for UBRing (CN). |
| example/ubring_performance/test.proto | Adds proto for UBRing performance test example. |
| example/ubring_performance/server.cpp | Adds UBRing-capable perf test server example. |
| example/ubring_performance/client.cpp | Adds UBRing-capable perf test client example. |
| example/ubring_performance/CMakeLists.txt | Adds standalone CMake build for the performance example. |
| docs/en/ubring.md | Documents build/run/configuration and backend selection for UBRing. |
| docs/cn/ubring.md | Chinese documentation for UBRing build/run/configuration. |
| CMakeLists.txt | Adds WITH_UBRING option and compile definition wiring. |

Comment thread src/brpc/rdma_transport.cpp Outdated
Comment thread src/brpc/ubshm_transport.cpp
Comment thread src/brpc/ubshm/ub_endpoint.cpp
Comment thread src/brpc/ubshm/ub_endpoint.cpp Outdated
Comment thread src/brpc/ubshm/ub_endpoint.cpp Outdated
Comment thread src/brpc/ubshm/ub_ring_manager.cpp
Comment thread src/brpc/ubshm/shm/shm_ubs.cpp Outdated
Comment thread src/brpc/ubshm/shm/shm_mgr.cpp
Comment thread CMakeLists.txt Outdated
Comment thread src/brpc/ubshm_transport.cpp
Comment thread docs/cn/ubring.md
g_last_time.store(0, butil::memory_order_relaxed);

brpc::ServerOptions options;
options.socket_mode = FLAGS_use_ubring ? brpc::SOCKET_MODE_UBRING : brpc::SOCKET_MODE_TCP;

Contributor

It would be better for brpc::ServerOptions socket_mode to default to TCP mode.

Contributor Author

This follows the code style of example/rdma_performance; switching to TCP mode by default also works fine.

return -1;
}
ubring::GlobalUBInitializeOrDie();
if (!ubring::InitPollingModeWithTag(bthread_self_tag())) {

Contributor

Does ubring only support polling mode?

Contributor Author

Yes. This is a limitation of LD/ST shared memory: currently only polling mode is supported. A wait-based (non-polling) mode would require support from the OS kernel or hardware.
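
To illustrate why a plain load/store channel ends up polling: the writer publishes data with ordinary stores, and nothing wakes the reader, so the reader has to keep loading the index until it changes. The sketch below is a minimal single-producer/single-consumer ring in ordinary process memory. `SpscRing`, `TryPush`, `TryPop`, and `PollOne` are hypothetical names, and the layout is not the UBRing message format.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer ring; not the UBRing layout.
template <size_t N>
class SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side: plain store into the slot, then a release store of tail.
    bool TryPush(uint64_t v) {
        const size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail - head_.load(std::memory_order_acquire) == N) {
            return false;  // ring full
        }
        slots_[tail & (N - 1)] = v;
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer side: there is no wake-up signal, so callers poll this.
    bool TryPop(uint64_t* out) {
        const size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return false;  // nothing published yet
        }
        *out = slots_[head & (N - 1)];
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

private:
    uint64_t slots_[N] = {};
    alignas(64) std::atomic<size_t> head_{0};  // separate cache lines to
    alignas(64) std::atomic<size_t> tail_{0};  // limit false sharing
};

// Busy-polls until a value arrives; a real loop would yield or back off.
template <size_t N>
uint64_t PollOne(SpscRing<N>& ring) {
    uint64_t v = 0;
    while (!ring.TryPop(&v)) {
    }
    return v;
}
```

Blocking instead of spinning in `PollOne` would need some notification path outside the shared memory itself, which is the kernel or hardware support mentioned above.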

Comment thread docs/cn/ubring.md

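### 2. UBS-Mem Remote Shared Memory (ub\_shm\_type = 2)

This mode uses ubs-mem (Unified Block Storage Memory), an open-source remote shared-memory framework from openEuler. It supports shared-memory communication between nodes within a rack, similar to RDMA but with simpler deployment requirements.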

Contributor

Can you list the libraries that need to be used?

Contributor Author

Okay, I'll list the dependent libraries later.

@wwbmmm

wwbmmm commented May 18, 2026

Contributor

LGTM

@yanglimingcn

Contributor

The issue has been communicated, and subsequent PR efforts will proceed in stages.
LGTM

@chenBright

Contributor

There is a compilation error on macOS:

In file included from brpc/src/brpc/ubshm/ub_endpoint.cpp:31:
brpc/src/brpc/ubshm/ub_endpoint.h:87:27: error: use of undeclared identifier 'EPOLLOUT'
   87 |         uint32_t events = EPOLLOUT | EPOLLET;
|                           ^
brpc/src/brpc/ubshm/ub_endpoint.h:89:56: error: use of undeclared identifier 'EPOLLIN'
89 |             PollerRegisterEvent(CqSidOp::MOD, events | EPOLLIN);
|                                                        ^
brpc/src/brpc/ubshm/ub_endpoint.h:96:27: error: use of undeclared identifier 'EPOLLIN'
96 |         uint32_t events = EPOLLIN | EPOLLET;
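
`EPOLLIN`, `EPOLLOUT`, and `EPOLLET` come from `<sys/epoll.h>`, which exists only on Linux, so a header that uses them unconditionally cannot compile on macOS. One possible way to keep such a header portable is to route the flags through project-local constants, as sketched here. The names `kEventIn`, `kEventOut`, `kEventEt`, and `WriteEvents` are hypothetical, the non-Linux bit values are placeholders, and this is not necessarily how the PR resolved the error.

```cpp
#include <cstdint>

#if defined(__linux__)
#include <sys/epoll.h>
// On Linux, reuse the epoll bit values directly.
constexpr uint32_t kEventIn  = EPOLLIN;
constexpr uint32_t kEventOut = EPOLLOUT;
constexpr uint32_t kEventEt  = EPOLLET;
#else
// Elsewhere (for example macOS), use placeholder bits that the platform's
// poller backend would translate to its own notion of read/write events.
constexpr uint32_t kEventIn  = 0x001;
constexpr uint32_t kEventOut = 0x004;
constexpr uint32_t kEventEt  = 1u << 31;
#endif

// Builds the mask for "wait for writable", optionally also readable.
constexpr uint32_t WriteEvents(bool also_read) {
    return kEventOut | kEventEt | (also_read ? kEventIn : 0u);
}
```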

@chenBright

Contributor

Please update cmake ci to compile UBShmTransport:

- name: clang with all options
  run: |
    export CC=clang && export CXX=clang++
    mkdir clang_build_all && cd clang_build_all
    cmake -DWITH_MESALINK=OFF -DWITH_GLOG=ON -DWITH_THRIFT=ON -DWITH_RDMA=ON -DWITH_DEBUG_BTHREAD_SCHE_SAFETY=ON -DWITH_DEBUG_LOCK=ON -DWITH_BTHREAD_TRACER=ON -DWITH_ASAN=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 ..
    make -j ${{env.proc_num}} && make clean

- name: compile with cmake
  run: |
    echo "CMAKE_PREFIX_PATH=$(brew --prefix protobuf@21)"
    mkdir build && cd build && cmake -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_PREFIX_PATH=$(brew --prefix protobuf@21) ..
    make -j ${{env.proc_num}} && make clean

- name: compile with cmake
  run: |
    mkdir build && cd build && cmake -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_PREFIX_PATH=$(brew --prefix protobuf@29) ..
    make -j ${{env.proc_num}} && make clean

@zchuango

zchuango commented May 21, 2026

Contributor Author

Please update cmake ci to compile UBShmTransport:

Okay, good suggestion! I will check the CI/testing pipeline later.

@zchuango

Contributor Author

@chenBright I have resolved the macOS compilation error and updated the cmake CI to compile UBShmTransport. Please recheck it. The current CI test failure seems to be intermittent; the CI tests pass in my own repository.

@chenBright

chenBright commented May 26, 2026

Contributor

@zchuango When I run the ubring demo on Ubuntu, the server crashes.

./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=1048576

./ubring_performance_server -use_ubring=true
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./ubring_performance_server -use_ubring=true'.
--Type <RET> for more, q to quit, c to continue without paging--
Program terminated with signal SIGBUS, Bus error.
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228

warning: 228	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory
[Current thread is 1 (Thread 0x7f69dcff96c0 (LWP 19284))]
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228
#1  0x0000558a87ac463b in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>)
at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2  brpc::ubring::ShmLocalCalloc (shm=shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:117
#3  0x0000558a879e925c in brpc::ubring::UBRing::UbrAllocateServerShm (this=0x7f69ac058900, remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40,
local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/ub_ring.cpp:796
#4  0x0000558a879e32e5 in brpc::ubring::UBShmEndpoint::AllocateServerResources (this=this@entry=0x7f66c4023a40,
remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40, local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90)
at /brpc/src/brpc/ubshm/ub_endpoint.cpp:712
#5  0x0000558a879e3e6b in brpc::ubring::UBShmEndpoint::ProcessHandshakeAtServer (arg=0x7f66c4023a40)
at /brpc/src/brpc/ubshm/ub_endpoint.cpp:548
#6  0x0000558a877d11b7 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at /brpc/src/bthread/task_group.cpp:388
#7  0x0000558a8786d6c1 in bthread_make_fcontext ()
#8  0x0000000000000000 in ?? ()

@zchuango

zchuango commented May 26, 2026

Contributor Author

@zchuango When I run the ubring demo on Ubuntu, the server crashes.

./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=6291456

./ubring_performance_server -use_ubring=true
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./ubring_performance_server -use_ubring=true'.
--Type <RET> for more, q to quit, c to continue without paging--
Program terminated with signal SIGBUS, Bus error.
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228

warning: 228	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory
[Current thread is 1 (Thread 0x7f69dcff96c0 (LWP 19284))]
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228
#1  0x0000558a87ac463b in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>)
at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2  brpc::ubring::ShmLocalCalloc (shm=shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:117
#3  0x0000558a879e925c in brpc::ubring::UBRing::UbrAllocateServerShm (this=0x7f69ac058900, remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40,
local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/ub_ring.cpp:796
#4  0x0000558a879e32e5 in brpc::ubring::UBShmEndpoint::AllocateServerResources (this=this@entry=0x7f66c4023a40,
remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40, local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90)
at /brpc/src/brpc/ubshm/ub_endpoint.cpp:712
#5  0x0000558a879e3e6b in brpc::ubring::UBShmEndpoint::ProcessHandshakeAtServer (arg=0x7f66c4023a40)
at /brpc/src/brpc/ubshm/ub_endpoint.cpp:548
#6  0x0000558a877d11b7 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at /brpc/src/bthread/task_group.cpp:388
#7  0x0000558a8786d6c1 in bthread_make_fcontext ()
#8  0x0000000000000000 in ?? ()

zchuango closed this May 26, 2026
zchuango reopened this May 26, 2026
@zchuango

zchuango commented May 26, 2026

Contributor Author

@zchuango When I run the ubring demo on Ubuntu, the server crashes.

./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=1048576

./ubring_performance_server -use_ubring=true

@chenBright Is the error occurring during startup or a runtime error? Could you please provide relevant environment information, including OS and CPU details, so I can try to reproduce the problem?

@chenBright

chenBright commented May 26, 2026

Contributor

Is the error occurring during startup or a runtime error?

The error occurred at runtime.

Could you please provide relevant environment information, including OS and CPU details

Some environment information:

uname -r

5.10.134-16.3.al8.x86_64

lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.1 LTS
Release:	24.04
Codename:	noble
lscpu

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         52 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  192
  On-line CPU(s) list:   0-191
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Xeon(R) Platinum 8469C
    BIOS Model name:     Intel(R) Xeon(R) Platinum 8469C  CPU @ 2.6GHz
    BIOS CPU family:     179
    CPU family:          6
    Model:               143
    Thread(s) per core:  2
    Core(s) per socket:  48
    Socket(s):           2
    Stepping:            8
    CPU(s) scaling MHz:  82%
    CPU max MHz:         3800.0000
    CPU min MHz:         800.0000
    BogoMIPS:            5200.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb r
                         dtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 mon
                         itor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c r
                         drand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced
                          tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed
                          adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm
                         _total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_r
                         eq hfi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid bus_lock_det
                         ect cldemote movdiri movdir64b enqcmd fsrm uintr md_clear serialize tsxldtrk pconfig arch_lbr amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l
                         1d arch_capabilities
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   4.5 MiB (96 instances)
  L1i:                   3 MiB (96 instances)
  L2:                    192 MiB (96 instances)
  L3:                    195 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-47,96-143
  NUMA node1 CPU(s):     48-95,144-191
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                 Not affected
  Tsx async abort:       Not affected

Complete runtime log:

./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=6291456
I0526 23:04:09.249178 98087     0 /workspace/cgm/brpc/src/brpc/server.cpp:1232 StartInternal] Server[DummyServerOf(./ubring_performance_client)] is serving on port=8001.
I0526 23:04:09.249319 98087     0 /workspace/cgm/brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8001 in web browser.
[Threads: 1, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
I0526 23:04:09.257395 98087     0 /workspace/cgm/brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:04:09.267279 98099     0 /workspace/cgm/brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
Avg-Latency: 0, 90th-Latency: 0, 99th-Latency: 0, 99.9th-Latency: 0, Throughput: 29.9741MB/s, QPS: 0k, Server CPU-utilization: 0%, Client CPU-utilization: 101%
[Threads: 2, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
I0526 23:04:30.303327 98099     0 /workspace/cgm/brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
Avg-Latency: 0, 90th-Latency: 0, 99th-Latency: 0, 99.9th-Latency: 0, Throughput: 0.299211MB/s, QPS: 0k, Server CPU-utilization: 0%, Client CPU-utilization: 102%
[Threads: 4, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
W0526 23:04:50.475469 98092 4294969093 /workspace/cgm/brpc/src/brpc/ubshm/ub_endpoint.cpp:385 ProcessHandshakeAtClient] Fail to get hello message from server:brpc::Socket{id=5 fd=14 addr=0.0.0.0:8002:57824} (0x564645f47910): Got EOF
W0526 23:04:50.475563 98087     0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:131 Init] RPC call failed, retrying... (3 left): [E1014]Fail to complete ubring handshake from brpc::Socket{id=5 fd=14 addr=0.0.0.0:8002:57824} (0x564645f47910): Got EOF
W0526 23:04:51.475721 98087     0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:131 Init] RPC call failed, retrying... (2 left): [E112]Not connected to 0.0.0.0:8002 yet, server_id=5
W0526 23:04:52.475883 98087     0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:131 Init] RPC call failed, retrying... (1 left): [E112]Not connected to 0.0.0.0:8002 yet, server_id=5
E0526 23:04:53.476011 98087     0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:135 Init] RPC call failed after multiple retries
./ubring_performance_server -use_ubring=true
I0526 23:00:15.982886 97452     0 /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:00:15.997779 97452     0 /brpc/src/brpc/server.cpp:1232 StartInternal] Server[test::PerfTestServiceImpl] is serving on port=8002.
I0526 23:00:15.998154 97452     0 /brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8002 in web browser.
I0526 23:00:46.670268 97457 4294969601 /brpc/src/brpc/ubshm/ub_ring.cpp:1021 UbrTrxCloseCheck] Trx close skipped, already closing, trx local name=UBRING_127.0.0.1:35304_S
I0526 23:00:46.670297 97457 4294969601 /brpc/src/brpc/ubshm/ub_ring.cpp:62 UbrTrxClose] Trx close skipped, already closing, local name=UBRING_127.0.0.1:35304_S
I0526 23:00:56.666588 97464     0 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:35304_C success.
I0526 23:00:56.666952 97464     0 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:78 IpcShmMunmap] IPC unmap shm=UBRING_127.0.0.1:35304_S length=4194304 success.
I0526 23:00:56.667327 97464     0 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:35304_C success.
E0526 23:02:17.708842 97484 4294969601 /brpc/src/brpc/ubshm/common/common.h:173 HasTimedOut] task time out 5 seconds.
W0526 23:02:17.708876 97484 4294969601 /brpc/src/brpc/ubshm/ub_ring.cpp:85 UbrTrxClose] Local shm UBRING_127.0.0.1:41854_S wait for the peer to close timed out, force cleanup.
I0526 23:02:17.709291 97484 4294969601 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:41854_C success.
I0526 23:02:17.709631 97484 4294969601 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:78 IpcShmMunmap] IPC unmap shm=UBRING_127.0.0.1:41854_S length=4194304 success.
I0526 23:02:17.709974 97484 4294969601 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:41854_C success.
[1]    97452 bus error (core dumped)  ./ubring_performance_server -use_ubring=true

coredump:

Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./ubring_performance_server -use_ubring=true'.
--Type <RET> for more, q to quit, c to continue without paging--
Program terminated with signal SIGBUS, Bus error.
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228

warning: 228	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory
[Current thread is 1 (Thread 0x7f69dcff96c0 (LWP 19284))]
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228
#1  0x0000558a87ac463b in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>)
at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2  brpc::ubring::ShmLocalCalloc (shm=shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:117
#3  0x0000558a879e925c in brpc::ubring::UBRing::UbrAllocateServerShm (this=0x7f69ac058900, remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40,
local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/ub_ring.cpp:796
#4  0x0000558a879e32e5 in brpc::ubring::UBShmEndpoint::AllocateServerResources (this=this@entry=0x7f66c4023a40,
remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40, local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90)
at /brpc/src/brpc/ubshm/ub_endpoint.cpp:712
#5  0x0000558a879e3e6b in brpc::ubring::UBShmEndpoint::ProcessHandshakeAtServer (arg=0x7f66c4023a40)
at /brpc/src/brpc/ubshm/ub_endpoint.cpp:548
#6  0x0000558a877d11b7 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at /brpc/src/bthread/task_group.cpp:388
#7  0x0000558a8786d6c1 in bthread_make_fcontext ()
#8  0x0000000000000000 in ?? ()

@chenBright

Contributor

Another crash:

./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=6291456
I0526 23:17:51.313918 98707     0 /brpc/src/brpc/server.cpp:1232 StartInternal] Server[DummyServerOf(./ubring_performance_client)] is serving on port=8001.
I0526 23:17:51.314074 98707     0 /brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8001 in web browser.
[Threads: 1, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
I0526 23:17:51.321939 98707     0 /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:17:51.332043 98719     0 /brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
Avg-Latency: 0, 90th-Latency: 0, 99th-Latency: 0, 99.9th-Latency: 0, Throughput: 64.0254MB/s, QPS: 0k, Server CPU-utilization: 0%, Client CPU-utilization: 102%
[Threads: 2, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
[1]    98707 bus error (core dumped)  ./ubring_performance_client -use_ubring=true -echo_attachment=true
./ubring_performance_server -use_ubring=true
I0526 23:17:49.302722 98508     0 /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:17:49.318155 98508     0 /brpc/src/brpc/server.cpp:1232 StartInternal] Server[test::PerfTestServiceImpl] is serving on port=8002.
I0526 23:17:49.318282 98508     0 /brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8002 in web browser.
W0526 23:18:15.647572 98668 8589934810 /brpc/src/brpc/ubshm/ub_endpoint.cpp:480 ProcessHandshakeAtServer] Fail to read Hello Message from client:brpc::Socket{id=234 fd=11 addr=127.0.0.1:51360:8002} (0x7f2c24025030) 127.0.0.1:51360: Got EOF
I0526 23:18:16.331656 98520     0 /brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
E0526 23:18:20.648439 98660 8589934772 /brpc/src/brpc/ubshm/common/common.h:173 HasTimedOut] task time out 5 seconds.
W0526 23:18:20.648472 98660 8589934772 /brpc/src/brpc/ubshm/ub_ring.cpp:85 UbrTrxClose] Local shm UBRING_127.0.0.1:36514_S wait for the peer to close timed out, force cleanup.
I0526 23:18:21.332054 98660 8589934772 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:36514_C success.
I0526 23:18:21.332468 98660 8589934772 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:78 IpcShmMunmap] IPC unmap shm=UBRING_127.0.0.1:36514_S length=4194304 success.
I0526 23:18:21.332848 98660 8589934772 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:36514_C success.
warning: 228	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory
[Current thread is 1 (Thread 0x7f7db0ff96c0 (LWP 98717))]
(gdb)
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228
#1  0x000055df0fec824b in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2  brpc::ubring::ShmLocalCalloc (shm=shm@entry=0x7f7d91af4e00) at /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:117
#3  0x000055df0fea20de in brpc::ubring::UBRing::ApplyAndMapLocalShm (this=this@entry=0x7f7d8c031a00,
    localTrxShm=localTrxShm@entry=0x7f7d91af4e00, localName=localName@entry=0x7f7d91af4e50 "127.0.0.1:51360")
    at /brpc/src/brpc/ubshm/ub_ring.cpp:911
#4  0x000055df0fea25a2 in brpc::ubring::UBRing::UbrAllocateLocalShm (this=0x7f7d8c031a00,
    local_trx_shm=local_trx_shm@entry=0x7f7d91af4e00, shm_name=shm_name@entry=0x7f7d91af4e50 "127.0.0.1:51360")
    at /brpc/src/brpc/ubshm/ub_ring.cpp:827
#5  0x000055df0fe9aa35 in brpc::ubring::UBShmEndpoint::AllocateClientResources (this=this@entry=0x55df11984ee0,
    local_trx_shm=local_trx_shm@entry=0x7f7d91af4e00, shm_name=shm_name@entry=0x7f7d91af4e50 "127.0.0.1:51360")
    at /brpc/src/brpc/ubshm/ub_endpoint.cpp:687
#6  0x000055df0fe9ae6a in brpc::ubring::UBShmEndpoint::ProcessHandshakeAtClient (arg=0x55df11984ee0)
    at /brpc/src/brpc/ubshm/ub_endpoint.cpp:356
#7  0x000055df0fc09c97 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>)
    at /brpc/src/bthread/task_group.cpp:388
#8  0x000055df0fde1571 in bthread_make_fcontext ()
#9  0x0000000000000000 in ?? ()
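
Both backtraces end in `memset` called from `ShmLocalCalloc`, that is, at the first write to a freshly mapped region. On Linux, the first touch of a POSIX shared-memory mapping raises SIGBUS when the backing tmpfs (typically `/dev/shm`, which is often small inside containers) cannot supply the page, or when the mapping extends past the object's size. Whether either applies here is not established in this thread. The sketch below, which assumes Linux, shows one defensive pattern: reserve the pages with `posix_fallocate` so that a shortage is reported as an error code before the `memset`. `CreateZeroedShm` is a hypothetical helper, not a function from the PR.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper: creates, sizes, reserves and zeroes a POSIX shm object.
// Returns the mapped address, or nullptr if any step fails.
void* CreateZeroedShm(const char* name, size_t len) {
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0) {
        return nullptr;
    }
    // ftruncate sets the size; posix_fallocate forces the pages to be
    // reserved now, so "no space" comes back as an error, not a later SIGBUS.
    if (ftruncate(fd, static_cast<off_t>(len)) != 0 ||
        posix_fallocate(fd, 0, static_cast<off_t>(len)) != 0) {
        close(fd);
        shm_unlink(name);
        return nullptr;
    }
    void* addr = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping keeps the object alive
    if (addr == MAP_FAILED) {
        shm_unlink(name);
        return nullptr;
    }
    memset(addr, 0, len);  // every page is already backed at this point
    return addr;
}
```

With this shape, a handshake path can turn an allocation failure into a rejected connection or a TCP fallback rather than a process crash.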

YChange01 and others added 2 commits May 29, 2026 19:21
Co-authored-by: 郭业昌 <lvpengfei@MacBook-Air.local>
@zchuango

zchuango commented May 30, 2026

Contributor Author

@chenBright Please try it again; I have refined some of the code logic.

zchuango requested a review from chenBright May 30, 2026 11:23
@chenBright

chenBright commented Jun 6, 2026

Contributor

@zchuango Could you add some unit tests for UBShmTransport?

@zchuango

zchuango commented Jun 8, 2026

Contributor Author

@zchuango Could you add some unit tests for UBShmTransport?

Yes, I am writing test cases for UBShmTransport and plan to submit them as unit tests in the next phase. Please help approve the review for the merge. @chenBright

@chenBright

Contributor

@zchuango Could you add some unit tests for UBShmTransport?

Yes, I am writing test cases for UBShmTransport and plan to submit them as unit tests in the next phase. Please help approve the review for the merge. @chenBright

I think it's best to submit unit tests in this PR.

chenBright left a comment

Contributor

I'm using the latest code and I'm encountering the same crash as before.

Comment thread src/brpc/ubshm/timer/timer_mgr.cpp Outdated
Comment on lines +32 to +37
int32_t g_epollFd = -1;
std::atomic<uint32_t> g_totalTimerNum;
TimerFdCtx *g_timerFdCtxMap = NULL;
uint32_t maxSystemFd;
static pthread_t g_epollExecuteThread;
static int32_t g_timerModuleInitialized;

Contributor

  1. These variables should be initialized with explicit default values.
  2. Variable names should be snake_case.
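
A sketch of what the two points could look like when applied to the quoted declarations. The renamed identifiers are suggestions only, and `TimerFdCtx` is left opaque because its definition is not shown in this thread. Objects with static storage are zero-initialized in C++ anyway, so the explicit initializers mainly document intent.

```cpp
#include <atomic>
#include <cstdint>
#include <pthread.h>

struct TimerFdCtx;  // defined elsewhere in the PR; opaque in this sketch

// Every global has an explicit initial value and a snake_case name.
static int32_t g_epoll_fd = -1;
static std::atomic<uint32_t> g_total_timer_num{0};
static TimerFdCtx* g_timer_fd_ctx_map = nullptr;
static uint32_t g_max_system_fd = 0;
static pthread_t g_epoll_execute_thread{};
static int32_t g_timer_module_initialized = 0;
```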

Comment thread src/brpc/ubshm/timer/timer_mgr.cpp Outdated
maxSystemFd = (uint32_t)rlim.rlim_cur;

if (g_timerFdCtxMap == NULL) {
g_timerFdCtxMap = (TimerFdCtx *)malloc(sizeof(TimerFdCtx) * maxSystemFd);

Contributor

g_timerFdCtxMap may consume a lot of memory.
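
The concern here is that an array sized by the process fd limit reserves one slot per possible descriptor, even when only a handful of timers exist. One possible alternative, sketched below, allocates a context only when a timer fd is registered. This is a minimal sketch, not the PR's code: the fields of `TimerFdCtx` are not shown in the review, so the struct below is a hypothetical stand-in, and `TimerCtxTable` with its methods is an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for the PR's TimerFdCtx; the real fields are unknown.
struct TimerFdCtx {
    uint64_t interval_ms = 0;
    void (*callback)(void*) = nullptr;
    void* arg = nullptr;
};

// Keeps a context only for fds that currently have a timer, so memory grows
// with the number of live timers rather than with the fd limit.
class TimerCtxTable {
public:
    // Returns false if fd is negative or already registered.
    bool Add(int fd, const TimerFdCtx& ctx) {
        if (fd < 0) {
            return false;
        }
        std::lock_guard<std::mutex> guard(_mutex);
        return _map.emplace(fd, ctx).second;
    }

    // Copies the context for fd into *out; returns false if fd is unknown.
    bool Get(int fd, TimerFdCtx* out) const {
        assert(out != nullptr);
        std::lock_guard<std::mutex> guard(_mutex);
        auto it = _map.find(fd);
        if (it == _map.end()) {
            return false;
        }
        *out = it->second;
        return true;
    }

    // Returns true if a context was registered for fd and has been removed.
    bool Remove(int fd) {
        std::lock_guard<std::mutex> guard(_mutex);
        return _map.erase(fd) == 1;
    }

    size_t Size() const {
        std::lock_guard<std::mutex> guard(_mutex);
        return _map.size();
    }

private:
    mutable std::mutex _mutex;
    std::unordered_map<int, TimerFdCtx> _map;
};
```

The trade-off is that a hash lookup under a lock replaces direct array indexing, so whether this is acceptable depends on how hot the timer path is.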

Comment thread src/brpc/ubshm/timer/timer_mgr.cpp Outdated
return atomic_load(&g_totalTimerNum);
}

void CloseTimerFd(uint32_t fd) {

Contributor

uint32_t -> int
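
A short illustration of why the signed type matters: POSIX descriptors are `int`, and -1 is the conventional "invalid fd" value, which an unsigned parameter cannot carry. The helpers below are hypothetical (`CloseFdSketch` and `CloseAndInvalidate` are not the PR's `CloseTimerFd`); they only sketch the pattern under that assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <unistd.h>

// An unsigned fd cannot hold -1: the value wraps around to 4294967295.
static_assert(static_cast<uint32_t>(-1) == 4294967295u, "-1 wraps around");

// Hypothetical helper, not the PR's CloseTimerFd.
// Returns 0 on success, -1 if fd is invalid or close() fails.
inline int CloseFdSketch(int fd) {
    if (fd < 0) {
        return -1;  // invalid descriptor: nothing to close
    }
    return close(fd);
}

// Closes *fd and marks it invalid, so a second call is rejected instead of
// closing an unrelated descriptor that happens to reuse the same number.
inline int CloseAndInvalidate(int* fd) {
    assert(fd != nullptr);
    const int rc = CloseFdSketch(*fd);
    *fd = -1;
    return rc;
}
```

With a `uint32_t` parameter the `fd < 0` check could never fire, and callers would need casts at every call site.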

Comment thread src/brpc/ubshm/common/common.h Outdated
Comment on lines +26 to +27
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

Contributor

Use BAIDU_LIKELY and BAIDU_UNLIKELY instead.

Comment thread src/brpc/ubshm/ubr_msg.h Outdated
uint8_t inner[UBR_MSG_PAYLOAD_LEN];
} UbrMsgPayload;

typedef struct __attribute__((aligned(64))) TagUbrMsgFormat {

Contributor

Use BAIDU_CACHELINE_ALIGNMENT instead.
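
To show what the alignment attribute buys for a ring of messages, here is a minimal sketch. It assumes a 64-byte cache line and defines a local stand-in for `BAIDU_CACHELINE_ALIGNMENT` so that it compiles without brpc headers; `SketchMsg` and `ShareCacheLine` are hypothetical names, and the real `TagUbrMsgFormat` fields are not shown in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in so the sketch builds without brpc; 64 bytes is an assumption here.
#ifndef BAIDU_CACHELINE_ALIGNMENT
#define BAIDU_CACHELINE_ALIGNMENT __attribute__((aligned(64)))
#endif

// Hypothetical message layout, 44 bytes of fields before padding.
struct BAIDU_CACHELINE_ALIGNMENT SketchMsg {
    uint32_t len;
    uint8_t payload[40];
};

// The attribute raises the alignment and pads the size to a multiple of it.
static_assert(alignof(SketchMsg) == 64, "starts on a cache-line boundary");
static_assert(sizeof(SketchMsg) == 64, "padded to one full cache line");

// True if both objects start in the same 64-byte line.
inline bool ShareCacheLine(const SketchMsg* a, const SketchMsg* b) {
    assert(a != nullptr && b != nullptr);
    return reinterpret_cast<uintptr_t>(a) / 64 ==
           reinterpret_cast<uintptr_t>(b) / 64;
}
```

Because each element occupies its own line, a producer writing one slot and a consumer reading the next do not contend on the same cache line.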

Comment on lines +40 to +44
#define LOCK_GUARD(mtxPtr) \
pthread_mutex_t *__attribute__((cleanup(UnlockMutex))) _mtxPtr = ({ \
pthread_mutex_lock(&(mtxPtr)); \
&(mtxPtr); \
})

Contributor

Use BAIDU_SCOPED_LOCK or std::lock_guard instead.

Comment thread src/brpc/ubshm/common/thread_lock.h Outdated
Comment on lines +55 to +59
#define SPIN_LOCK_GUARD(spinLockPtr) \
pthread_spinlock_t *__attribute__((cleanup(UnlockSpinLock))) _spinLockPtr = ({ \
pthread_spin_lock(&(spinLockPtr)); \
&(spinLockPtr); \
})

Contributor

Use BAIDU_SCOPED_LOCK or std::lock_guard instead.
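
For comparison with the `__attribute__((cleanup(...)))` macros above, the following is a minimal sketch of the same scope-bound unlocking with `std::lock_guard`. It uses `std::mutex` rather than the `pthread_mutex_t` in the PR, and `GuardedCounter` is a hypothetical class used only for illustration.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical counter protected by a mutex.
class GuardedCounter {
public:
    // std::lock_guard locks in its constructor and unlocks in its destructor,
    // so every exit path from the function releases the mutex.
    int Add(int delta) {
        assert(delta > 0);
        std::lock_guard<std::mutex> guard(_mutex);
        _value += delta;
        return _value;
    }

    int Get() const {
        std::lock_guard<std::mutex> guard(_mutex);
        return _value;
    }

private:
    mutable std::mutex _mutex;
    int _value = 0;
};
```

The guard is a standard, portable type, whereas the cleanup attribute and the statement expression in the macros are compiler extensions.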

extern "C" {
#endif

static inline void UnlockMutex(pthread_mutex_t **mtx)

Contributor

The functions and macros defined in this file are not used; it is recommended to remove them.

Comment thread src/brpc/ubshm/shm/shm_ubs.cpp Outdated

RETURN_CODE UbsShmInit(void)
{
// 加载libubsm_sdk.so函数指针

Contributor

Please use English.

Comment thread src/brpc/ubshm/ub_ring.cpp Outdated
if (UNLIKELY(CheckTrxSendPreCheck(_trx) != UBRING_OK)) {
return UBRING_ERR;
}
// 1.2 计算空间

Contributor

Please use English.

@zchuango

Contributor Author

> @zchuango Could you add some unit tests for UBShmTransport?

> Yes, I am writing test cases for UBShmTransport and plan to submit them to the unit tests in the next phase. Could you please approve this PR so it can be merged? @chenBright

> I think it's best to submit unit tests in this PR.

Okay, no problem. I'll add it in the next couple of days. @chenBright

@zchuango

Contributor Author

> I'm using the latest code and I'm encountering the same crash as before.

Really? I haven't encountered this problem on my machine, but it's an ARM machine. I'll try running it on an x86 machine first.

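* Fix coredump when the ubring server closes a connection

* Fix PollIn/PollOut dereferencing a freed Socket pointer

PollIn/PollOut read the data socket through ep->_socket (a raw pointer).
When the data socket is destroyed, that pointer dangles, so
Socket::Address reads a garbage id and triggers SIGSEGV. Store
_socket_id (SocketId) instead, use Address to obtain a
reference-counted Socket, and hold that reference for the whole
callback to avoid dereferencing a dangling pointer.

* Fix UBRING shm left behind when the client exits abnormally

When the client is killed (SIGTERM/crash/OOM), teardown does not finish
and shm_unlink is never run for localShm (_C), leaving a _C file in
/dev/shm. The server only munmaps its remoteShm and does not unlink it
(which is correct), so it cannot clean up the client's name.

At handshake ESTABLISHED (when client and server have both confirmed
that the peer has mmapped their localShm), unlink the localShm name
immediately. The peer already holds an mmap reference at that point, so
unlink removes only the name and does not affect communication; no
file name is left behind no matter when the process exits.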

* Address chenBright's review: use English comments and BAIDU_CACHELINE_ALIGNMENT

- Convert all Chinese comments in ubshm to English (per chenBright's
  'Please use English' on ub_endpoint.cpp:723, ub_ring.cpp:337,
  shm_ubs.cpp:316, and similar)
- Replace __attribute__((aligned(64))) with BAIDU_CACHELINE_ALIGNMENT
  in ubr_msg.h (per chenBright's comment on ubr_msg.h:41)
- Remove unnecessary TODO comment in ub_ring.cpp:551 (per chenBright's
  'Unnecessary comments, please delete')

* Remove unused lock macros in thread_lock.h

Per chenBright's review, the functions and macros defined in
thread_lock.h are largely unused. Verified usage across ubshm:
- LOCK_GUARD / UnlockMutex: 8 call sites in shm_ubs.cpp and
  ub_ring_manager.cpp, kept.
- SPIN_LOCK_GUARD, R_LOCK_GUARD, W_LOCK_GUARD, SEMAPHORE_WAIT_GUARD,
  SEMAPHORE_WAIT_GUARD_WITH_CLOSE and their helper functions
  (UnlockSpinLock, UnlockRWLock, PostSem, PostSemWithClose): 0 call
  sites, removed.

* Apply chenBright's review on timer_mgr globals

Per chenBright's review on timer_mgr.cpp:32-37:
- Add explicit default values to uninitialized globals
  (g_total_timer_num=0, g_max_system_fd=0, g_epoll_execute_thread=0,
  g_timer_module_initialized=0)
- Rename globals to snake_case (g_epollFd -> g_epoll_fd,
  g_totalTimerNum -> g_total_timer_num, g_timerFdCtxMap ->
  g_timer_fd_ctx_map, maxSystemFd -> g_max_system_fd,
  g_epollExecuteThread -> g_epoll_execute_thread,
  g_timerModuleInitialized -> g_timer_module_initialized)
- maxSystemFd also gains the g_ prefix to match global naming style

Also fix the missing std:: qualifier on atomic_fetch_sub/add/load
(per chenBright's earlier comment on timer_mgr.cpp:80).

* Change CloseTimerFd fd type from uint32_t to int

Per chenBright's review on timer_mgr.cpp:399 (uint32_t -> int).
fd is a system file descriptor; POSIX APIs use int and -1 denotes an
invalid fd, which uint32_t cannot represent. Changed the CloseTimerFd
signature (header + definition) and removed the now-unnecessary
(uint32_t) casts at the two call sites.

* Use BAIDU_LIKELY/BAIDU_UNLIKELY instead of custom __builtin_expect

Per chenBright's review on common.h:27. Rather than redefine the
macros with __builtin_expect directly, forward LIKELY/UNLIKELY to
brpc's standard BAIDU_LIKELY/BAIDU_UNLIKELY (from butil/compiler_specific.h).
The 122 call sites keep using LIKELY()/UNLIKELY() unchanged; only the
macro bodies change, preserving semantics.

* Add unit tests for UBShmEndpoint

Per chenBright's request to add unit tests for UBShmTransport in this
PR (rather than a follow-up).

Adds test/brpc_ubring_unittest.cpp with tests covering the public
interface of UBShmEndpoint under the g_skip_ub_init=true mode (which
skips real shared-memory/poller setup):
- construct_and_destruct: lifecycle safety
- is_writable_false_when_skip_init: skip-mode behavior
- reset_is_idempotent: Reset() is safe to call repeatedly

The file follows the brpc_*_unittest.cpp naming convention so it is
auto-collected by test/CMakeLists.txt's file(GLOB). Verified: compiles,
links, and all 3 tests pass (g++ 15.2, C++17, gtest, BRPC_WITH_UBRING=ON).

* Rewrite UBShmEndpoint unit tests with real coverage

Per chenBright's feedback that the previous tests were too simple and
did not cover the main methods.

Source changes to enable testing:
- Move HelloMessage struct declaration from ub_endpoint.cpp to
  ub_endpoint.h so tests can access it
- Expose private members under #ifdef UNIT_TEST (precedent:
  butil/containers/stack_container.h) so tests can call
  AllocateClientResources without -Dprivate=public (which breaks
  GCC 15 + new libstdc++ <any>/<sstream>)

Tests (9, all passing on Ubuntu 26.04, g++ 15.2, C++17, gtest):
- HelloMessageTest (5): serialize/deserialize roundtrip, network byte
  order verification, uint64 max boundary, full shm_name, toString
- UBShmEndpointTest (4): construct, real IPC shm
  AllocateClientResources (g_skip_ub_init=false), reset cleanup, reset
  idempotency
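
The kind of roundtrip and byte-order check listed for HelloMessageTest can be sketched as follows. The actual `HelloMessage` fields are not shown in this conversation, so `HelloSketch` below is a hypothetical layout (one 64-bit id plus a fixed-size shm name), and its method names are assumptions rather than the PR's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical handshake message; not the PR's HelloMessage.
struct HelloSketch {
    static constexpr size_t kNameLen = 32;
    static constexpr size_t kWireLen = 8 + kNameLen;

    uint64_t id = 0;
    char shm_name[kNameLen] = {};

    // Writes id most-significant byte first (network byte order),
    // followed by the fixed-size name. out must hold kWireLen bytes.
    void Serialize(uint8_t* out) const {
        assert(out != nullptr);
        for (int i = 0; i < 8; ++i) {
            out[i] = static_cast<uint8_t>(id >> (56 - 8 * i));
        }
        std::memcpy(out + 8, shm_name, kNameLen);
    }

    // Reads the same layout back; in must hold kWireLen bytes.
    void Deserialize(const uint8_t* in) {
        assert(in != nullptr);
        id = 0;
        for (int i = 0; i < 8; ++i) {
            id = (id << 8) | in[i];
        }
        std::memcpy(shm_name, in + 8, kNameLen);
    }
};
```

Shifting bytes explicitly keeps the wire format independent of host endianness, which is what a byte-order test would verify by inspecting the first and last bytes of the serialized id.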