-
Notifications
You must be signed in to change notification settings - Fork 11
WIP: OTEL context conformance with tlsdesc_v1_dev spec #347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Implement feature-flagged context storage with two modes: - profiler (default): existing TLS-based storage with checksum - otel: ring buffer storage discoverable via /proc/<pid>/maps Key components: - ContextApi: unified abstraction layer for both modes - OtelContexts: mmap-based ring buffer with in_use flag protocol - ctxstorage option: select mode at profiler startup 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Make libclang-rt-dev package conditional - only available on x64. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Update implementation to match ctx-sharing-demo reference. Note on naming: "V2" refers to the TLS record format version (struct layout with flexible array), while "tlsdesc_v1_dev" is the schema/ protocol version string. This matches the reference implementation which uses customlabels_v2.h but schema_version="tlsdesc_v1_dev". TLS Record (V2 format): - Fix struct layout: 28-byte header (removed root_span_id) - Use flexible array for attrs_data - Correct field ordering per tlsdesc_v1_dev schema Process Context: - schema_version: string "tlsdesc_v1_dev" (was int) - attribute_key_map: Array encoding (was KvList) - Mapping: writable (rw-p/rw-s) per PR #34, 1 page - Remove mprotect to read-only Fixes: - clear() properly invalidates V2 record - Reader accepts both r-- and rw- permissions - Tests updated for writable mappings Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
ivoanjo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gave it a pass! There were a few of "uuuuh what? 👀 moments". I guess AI?
| CASE("ctxstorage") | ||
| if (value != NULL) { | ||
| if (strcmp(value, "otel") == 0) { | ||
| _context_storage = CTX_STORAGE_OTEL; | ||
| } else { | ||
| _context_storage = CTX_STORAGE_PROFILER; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious -- why keep both around? E.g. is it more of a "while we're testing things out and we can eventually clean it up", or do we suspect the previous approach will stick around for the long run?
| void ContextApi::set(u64 span_id, u64 root_span_id) { | ||
| // Map Datadog format to storage | ||
| // In OTEL mode: trace_id = (0, root_span_id), span_id = span_id | ||
| setOtel(0, root_span_id, span_id); | ||
| } | ||
|
|
||
| void ContextApi::setOtel(u64 trace_id_high, u64 trace_id_low, u64 span_id) { | ||
| // Use atomic load for mode check - may be called from signal handlers | ||
| ContextStorageMode mode = __atomic_load_n(&_mode, __ATOMIC_ACQUIRE); | ||
|
|
||
| TEST_LOG("ContextApi::setOtel: tid=%d mode=%s trace_high=0x%llx trace_low=0x%llx span=0x%llx", | ||
| OS::threadId(), mode == CTX_STORAGE_OTEL ? "OTEL" : "PROFILER", | ||
| (unsigned long long)trace_id_high, (unsigned long long)trace_id_low, | ||
| (unsigned long long)span_id); | ||
|
|
||
| if (mode == CTX_STORAGE_OTEL) { | ||
| OtelContexts::set(trace_id_high, trace_id_low, span_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks weird -- root_span_id becomes trace_id_low... 👀
We can also keep root_span_id for otel, by putting it as an extra attribute (goes in the attrs-data section); this way we don't lose any functionality between the previous format and otel context.
| bool ContextApi::getByTid(int tid, u64& span_id, u64& root_span_id) { | ||
| // Use atomic load for mode check - may be called from signal handlers | ||
| ContextStorageMode mode = __atomic_load_n(&_mode, __ATOMIC_ACQUIRE); | ||
|
|
||
| if (mode == CTX_STORAGE_OTEL) { | ||
| u64 trace_high, trace_low; | ||
| if (OtelContexts::getByTid(tid, trace_high, trace_low, span_id)) { | ||
| root_span_id = trace_low; | ||
| return true; | ||
| } | ||
| return false; | ||
| } else { | ||
| // Profiler mode: cannot read other thread's TLS | ||
| // This is a limitation - JVMTI wall-clock needs OTEL mode for remote reads | ||
| // Fall back to returning false (no context available) | ||
| span_id = 0; | ||
| root_span_id = 0; | ||
| return false; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the otel code path it "looks like" we could make the read, but then it's hardcoded to always fail. So getByTid never succeeds -- maybe we should completely remove it? I'm not sure when a getByTid that never succeeds would be useful...
| // Tags still come from TLS Context (even in OTEL mode, for compatibility) | ||
| Context &context = Contexts::get(); | ||
| for (size_t i = 0; i < Profiler::instance()->numContextAttributes(); i++) { | ||
| Tag tag = context.get_tag(i); | ||
| buf->putVar32(tag.value); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could put them in attrs-data for otel. In particular trace endpoint would be cool to keep, but in general we could fully support custom attributes for otel.
| @@ -469,12 +486,82 @@ Java_com_datadoghq_profiler_OTelContext_setProcessCtx0(JNIEnv *env, | |||
| .telemetry_sdk_language = const_cast<char*>("java"), | |||
| .telemetry_sdk_version = const_cast<char*>(tracer_version_str.c_str()), | |||
| .telemetry_sdk_name = const_cast<char*>("dd-trace-java"), | |||
| .resources = NULL // TODO: Arbitrary tags not supported yet for Java | |||
| .resources = NULL, // TODO: Arbitrary tags not supported yet for Java | |||
| .tls_config = &default_tls_config | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: max_record_size can be removed now
| /** | ||
| * Represents TLS context sharing configuration. | ||
| * | ||
| * <p>This configuration is used to expose thread-local storage context information | ||
| * to external profilers. The key map maps indices to attribute names, allowing | ||
| * external readers to decode compact TLS records. | ||
| */ | ||
| public static final class TlsConfig { | ||
| /** Default schema version for TLS context sharing (tlsdesc_v1_dev) */ | ||
| public static final String DEFAULT_SCHEMA_VERSION = "tlsdesc_v1_dev"; | ||
|
|
||
| /** TLS schema version string (e.g. "tlsdesc_v1_dev") */ | ||
| public final String schemaVersion; | ||
| /** Maximum bytes per TLS record */ | ||
| public final int maxRecordSize; | ||
| /** Key names in index order (position = key index, e.g. ["method", "route"]) */ | ||
| public final String[] attributeKeyMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kinda weird to see these details come up to the Java level (other than the attribute map) -- should they maybe be hidden in the cpp bits?
| // Verify mmap region naming in /proc/self/maps (informational) | ||
| // Note: PR_SET_VMA_ANON_NAME requires kernel 5.17+ and may not work in all environments | ||
| // The OTEL buffer still works for discovery via magic number scanning if naming fails | ||
| boolean hasNamedRegion = checkMapsContains("DD_OTEL_CTX"); | ||
| if (!hasNamedRegion) { | ||
| System.out.println("INFO: DD_OTEL_CTX mmap naming not available " + | ||
| "(requires kernel 5.17+ with PR_SET_VMA_ANON_NAME support)"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to be outdated?
| /** | ||
| * Finds the OTEL_CTX mapping in /proc/self/maps. | ||
| * Supports both memfd mappings (/memfd:OTEL_CTX) and named anonymous mappings ([anon:OTEL_CTX]). | ||
| */ | ||
| private OtelMappingInfo findOtelMapping() throws IOException { | ||
| Path mapsFile = Paths.get("/proc/self/maps"); | ||
| if (!Files.exists(mapsFile)) { | ||
| return null; | ||
| } | ||
|
|
||
| Pattern otelPattern = Pattern.compile("^([0-9a-f]+)-([0-9a-f]+)\\s+(\\S+)\\s+\\S+\\s+\\S+\\s+\\S+\\s*\\[anon:OTEL_CTX\\].*$"); | ||
|
|
||
|
|
||
| // Pattern for named anonymous mapping: [anon:OTEL_CTX] | ||
| Pattern anonPattern = Pattern.compile("^([0-9a-f]+)-([0-9a-f]+)\\s+(\\S+)\\s+\\S+\\s+\\S+\\s+\\S+\\s*\\[anon:OTEL_CTX\\].*$"); | ||
| // Pattern for memfd mapping: /memfd:OTEL_CTX (deleted) | ||
| Pattern memfdPattern = Pattern.compile("^([0-9a-f]+)-([0-9a-f]+)\\s+(\\S+)\\s+.*?/memfd:OTEL_CTX.*$"); | ||
|
|
||
| try (BufferedReader reader = Files.newBufferedReader(mapsFile)) { | ||
| String line; | ||
| while ((line = reader.readLine()) != null) { | ||
| Matcher matcher = otelPattern.matcher(line); | ||
| if (matcher.matches()) { | ||
| Matcher anonMatcher = anonPattern.matcher(line); | ||
| if (anonMatcher.matches()) { | ||
| return new OtelMappingInfo( | ||
| anonMatcher.group(1), | ||
| anonMatcher.group(2), | ||
| anonMatcher.group(3) | ||
| ); | ||
| } | ||
| Matcher memfdMatcher = memfdPattern.matcher(line); | ||
| if (memfdMatcher.matches()) { | ||
| return new OtelMappingInfo( | ||
| matcher.group(1), | ||
| matcher.group(2), | ||
| matcher.group(3) | ||
| memfdMatcher.group(1), | ||
| memfdMatcher.group(2), | ||
| memfdMatcher.group(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest matching only on OTEL_CTX -- may be easier?
|
|
||
| The OTEL Context Storage system provides two distinct context sharing mechanisms: | ||
|
|
||
| 1. **Thread-Level Context**: Ring buffer storage for per-thread trace/span context (existing implementation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to be outdated? In particular, it's not a ring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very probably referred as a ring somewhere in the docs/reference impl - claude was very persistent on referring to that as a ring buffer :)
| ### Process Context | ||
|
|
||
| 1. **Mapping Permissions**: Use `rw-p` (anonymous) or `rw-s` (memfd) | ||
| - Do NOT use `mprotect()` to make read-only | ||
| - Writable mappings allow in-place updates (PR #34) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this got carried over from the time where the pages were stated to be read-only; it's kinda weird to mention it now since the spec says nothing about it anymore. Maybe worth giving a cleaning pass in general?
Yes. Ah, you shouldn't have spent a lot of time on this ... this is just a dirty AI generated scaffolding, hence still draft .. |
What does this PR do?:
Updates the OTEL context implementation to conform with the tlsdesc_v1_dev specification from the ctx-sharing-demo reference implementation. Fixes both TLS record format and process context encoding to match the expected format for external profiler discovery.
Motivation:
The current implementation had diverged from the reference specification in ctx-sharing-demo, causing compatibility issues with external profilers trying to read context. Key issues included:
Additional Notes:
Note on naming convention: "V2" refers to the TLS record format version (struct layout with flexible array member), while "tlsdesc_v1_dev" is the schema/protocol version string. This matches the reference implementation which uses customlabels_v2.h but schema_version="tlsdesc_v1_dev".
Changes include:
How to test the change?:
./gradlew buildDebug./gradlew :ddprof-test:test --tests "*OtelContext*"VALIDATE OK: [v2] thread=..., labels=[...]All existing tests pass, including ProcessContextTest and OtelContextStorageModeTest.
For Datadog employees:
credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
Unsure? Have a question? Request a review!