@jbachorik
Collaborator

What does this PR do?:
Updates the OTEL context implementation to conform with the tlsdesc_v1_dev specification from the ctx-sharing-demo reference implementation. Fixes both TLS record format and process context encoding to match the expected format for external profiler discovery.

Motivation:
The current implementation has diverged from the reference specification in ctx-sharing-demo, causing compatibility issues for external profilers trying to read the context. Key issues include:

  • Extra root_span_id field in TLS record (34 bytes instead of 28)
  • Wrong process context format (int schema_version instead of string, KvList instead of Array)
  • Incorrect memory mapping permissions after the recent updates in PR #34 (remove dead code)

Additional Notes:
Note on naming convention: "V2" refers to the TLS record format version (struct layout with flexible array member), while "tlsdesc_v1_dev" is the schema/protocol version string. This matches the reference implementation which uses customlabels_v2.h but schema_version="tlsdesc_v1_dev".

Changes include:

  • TLS Record (V2 format): 28-byte header, flexible array for attrs_data
  • Process Context: schema_version as string "tlsdesc_v1_dev", Array encoding for attribute_key_map
  • Memory mappings: writable (rw-p/rw-s) per PR #34 (remove dead code), 1 page instead of 2
  • Fixed clear() to properly invalidate V2 records
  • Updated reader and tests to accept both r-- and rw- permissions
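
For illustration only, here is a minimal Java sketch of how a record with a 28-byte header followed by variable-length attrs_data could be laid out. Only the 28-byte total and the presence of trace ID, span ID, and attrs_data come from this PR; the field order, the split of the last 4 bytes (valid flag, reserved byte, 16-bit attrs_data size), the little-endian byte order, and the class name `TlsRecordV2Sketch` are assumptions. The authoritative layout is the one in the reference implementation's customlabels_v2.h.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: 28-byte header followed by variable-length attrs_data. */
final class TlsRecordV2Sketch {
    // Assumed field sizes; only the 28-byte total is stated in the PR.
    static final int TRACE_ID_BYTES = 16;
    static final int SPAN_ID_BYTES = 8;
    static final int VALID_BYTES = 1;
    static final int RESERVED_BYTES = 1;
    static final int ATTRS_SIZE_BYTES = 2;
    static final int HEADER_BYTES = TRACE_ID_BYTES + SPAN_ID_BYTES
        + VALID_BYTES + RESERVED_BYTES + ATTRS_SIZE_BYTES;

    static byte[] encode(long traceIdHigh, long traceIdLow, long spanId, byte[] attrsData) {
        if (attrsData.length > 0xFFFF) {
            throw new IllegalArgumentException("attrs_data does not fit a 16-bit size field");
        }
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + attrsData.length)
            .order(ByteOrder.LITTLE_ENDIAN);    // assumed byte order
        buf.putLong(traceIdHigh);               // bytes 0..7
        buf.putLong(traceIdLow);                // bytes 8..15
        buf.putLong(spanId);                    // bytes 16..23
        buf.put((byte) 1);                      // byte 24: valid flag (assumed)
        buf.put((byte) 0);                      // byte 25: reserved (assumed)
        buf.putShort((short) attrsData.length); // bytes 26..27: attrs_data size (assumed)
        buf.put(attrsData);                     // variable-length tail
        return buf.array();
    }
}
```

Under these assumptions, a record with no attributes is exactly 28 bytes, and there is no slot for root_span_id in the header.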

How to test the change?:

  1. Build the debug version: ./gradlew buildDebug
  2. Run OTEL context tests: ./gradlew :ddprof-test:test --tests "*OtelContext*"
  3. Manual validation with ctx-sharing-demo validator:
    cd ~/dd/ctx-sharing-demo
    cargo build --release
    # Run Java app with OTEL context enabled
    sudo ./target/release/validate <java-pid>
    Expected output: VALIDATE OK: [v2] thread=..., labels=[...]

All existing tests pass, including ProcessContextTest and OtelContextStorageModeTest.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: [JIRA-XXXX]

jbachorik and others added 3 commits January 27, 2026 20:08
Implement feature-flagged context storage with two modes:
- profiler (default): existing TLS-based storage with checksum
- otel: ring buffer storage discoverable via /proc/<pid>/maps

Key components:
- ContextApi: unified abstraction layer for both modes
- OtelContexts: mmap-based ring buffer with in_use flag protocol
- ctxstorage option: select mode at profiler startup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Make libclang-rt-dev package conditional - only available on x64.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Update implementation to match ctx-sharing-demo reference.

Note on naming: "V2" refers to the TLS record format version (struct
layout with flexible array), while "tlsdesc_v1_dev" is the schema/
protocol version string. This matches the reference implementation
which uses customlabels_v2.h but schema_version="tlsdesc_v1_dev".

TLS Record (V2 format):
- Fix struct layout: 28-byte header (removed root_span_id)
- Use flexible array for attrs_data
- Correct field ordering per tlsdesc_v1_dev schema

Process Context:
- schema_version: string "tlsdesc_v1_dev" (was int)
- attribute_key_map: Array encoding (was KvList)
- Mapping: writable (rw-p/rw-s) per PR #34, 1 page
- Remove mprotect to read-only

Fixes:
- clear() properly invalidates V2 record
- Reader accepts both r-- and rw- permissions
- Tests updated for writable mappings

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

@jbachorik added the AI label Jan 29, 2026
Member

@ivoanjo left a comment

Gave it a pass! There were a few "uuuuh what? 👀" moments. I guess AI?

Comment on lines +377 to +384
CASE("ctxstorage")
    if (value != NULL) {
        if (strcmp(value, "otel") == 0) {
            _context_storage = CTX_STORAGE_OTEL;
        } else {
            _context_storage = CTX_STORAGE_PROFILER;
        }
    }

I'm curious -- why keep both around? E.g. is it more of a "while we're testing things out and we can eventually clean it up", or do we suspect the previous approach will stick around for the long run?

Comment on lines +79 to +95
void ContextApi::set(u64 span_id, u64 root_span_id) {
    // Map Datadog format to storage
    // In OTEL mode: trace_id = (0, root_span_id), span_id = span_id
    setOtel(0, root_span_id, span_id);
}

void ContextApi::setOtel(u64 trace_id_high, u64 trace_id_low, u64 span_id) {
    // Use atomic load for mode check - may be called from signal handlers
    ContextStorageMode mode = __atomic_load_n(&_mode, __ATOMIC_ACQUIRE);

    TEST_LOG("ContextApi::setOtel: tid=%d mode=%s trace_high=0x%llx trace_low=0x%llx span=0x%llx",
             OS::threadId(), mode == CTX_STORAGE_OTEL ? "OTEL" : "PROFILER",
             (unsigned long long)trace_id_high, (unsigned long long)trace_id_low,
             (unsigned long long)span_id);

    if (mode == CTX_STORAGE_OTEL) {
        OtelContexts::set(trace_id_high, trace_id_low, span_id);

This looks weird -- root_span_id becomes trace_id_low... 👀

We can also keep root_span_id for otel, by putting it as an extra attribute (goes in the attrs-data section); this way we don't lose any functionality between the previous format and otel context.
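
To make the suggestion concrete, here is a rough Java sketch of packing root_span_id into an attrs-data byte sequence as one extra attribute. The entry framing (1-byte key index, 1-byte value length, then the value bytes), the little-endian value encoding, the key index `0`, and the class name `AttrsDataSketch` are all assumptions made for illustration; the actual attrs-data encoding is whatever the tlsdesc_v1_dev spec defines.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical builder for an attrs-data section; the framing is an assumption. */
final class AttrsDataSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    /** Appends one entry as [key index: 1 byte][value length: 1 byte][value bytes]. */
    AttrsDataSketch put(int keyIndex, byte[] value) {
        if (keyIndex < 0 || keyIndex > 0xFF || value.length > 0xFF) {
            throw new IllegalArgumentException("key index and value length must fit in one byte");
        }
        out.write(keyIndex);
        out.write(value.length);
        out.write(value, 0, value.length);
        return this;
    }

    /** Appends a 64-bit value (such as root_span_id) as 8 little-endian bytes. */
    AttrsDataSketch putLong(int keyIndex, long value) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN) // assumed byte order
            .putLong(value)
            .array();
        return put(keyIndex, bytes);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

With a (hypothetical) attribute_key_map of `["root_span_id"]`, `new AttrsDataSketch().putLong(0, rootSpanId).toByteArray()` would yield a 10-byte attrs-data section, leaving trace_id free to carry a real trace ID.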

Comment on lines +139 to +158
bool ContextApi::getByTid(int tid, u64& span_id, u64& root_span_id) {
    // Use atomic load for mode check - may be called from signal handlers
    ContextStorageMode mode = __atomic_load_n(&_mode, __ATOMIC_ACQUIRE);

    if (mode == CTX_STORAGE_OTEL) {
        u64 trace_high, trace_low;
        if (OtelContexts::getByTid(tid, trace_high, trace_low, span_id)) {
            root_span_id = trace_low;
            return true;
        }
        return false;
    } else {
        // Profiler mode: cannot read other thread's TLS
        // This is a limitation - JVMTI wall-clock needs OTEL mode for remote reads
        // Fall back to returning false (no context available)
        span_id = 0;
        root_span_id = 0;
        return false;
    }
}

In the otel code path it "looks like" we could make the read, but then it's hardcoded to always fail. So getByTid never succeeds -- maybe we should completely remove it? I'm not sure when a getByTid that never succeeds would be useful...

Comment on lines +1461 to +1466
// Tags still come from TLS Context (even in OTEL mode, for compatibility)
Context &context = Contexts::get();
for (size_t i = 0; i < Profiler::instance()->numContextAttributes(); i++) {
    Tag tag = context.get_tag(i);
    buf->putVar32(tag.value);
}

We could put them in attrs-data for otel. In particular trace endpoint would be cool to keep, but in general we could fully support custom attributes for otel.

Comment on lines 476 to 490
@@ -469,12 +486,82 @@ Java_com_datadoghq_profiler_OTelContext_setProcessCtx0(JNIEnv *env,
       .telemetry_sdk_language = const_cast<char*>("java"),
       .telemetry_sdk_version = const_cast<char*>(tracer_version_str.c_str()),
       .telemetry_sdk_name = const_cast<char*>("dd-trace-java"),
-      .resources = NULL // TODO: Arbitrary tags not supported yet for Java
+      .resources = NULL, // TODO: Arbitrary tags not supported yet for Java
+      .tls_config = &default_tls_config

Minor: max_record_size can be removed now

Comment on lines +43 to +59
/**
 * Represents TLS context sharing configuration.
 *
 * <p>This configuration is used to expose thread-local storage context information
 * to external profilers. The key map maps indices to attribute names, allowing
 * external readers to decode compact TLS records.
 */
public static final class TlsConfig {
    /** Default schema version for TLS context sharing (tlsdesc_v1_dev) */
    public static final String DEFAULT_SCHEMA_VERSION = "tlsdesc_v1_dev";

    /** TLS schema version string (e.g. "tlsdesc_v1_dev") */
    public final String schemaVersion;
    /** Maximum bytes per TLS record */
    public final int maxRecordSize;
    /** Key names in index order (position = key index, e.g. ["method", "route"]) */
    public final String[] attributeKeyMap;

It's kinda weird to see these details come up to the Java level (other than the attribute map) -- should they maybe be hidden in the cpp bits?

Comment on lines +144 to +152
    // Verify mmap region naming in /proc/self/maps (informational)
    // Note: PR_SET_VMA_ANON_NAME requires kernel 5.17+ and may not work in all environments
    // The OTEL buffer still works for discovery via magic number scanning if naming fails
    boolean hasNamedRegion = checkMapsContains("DD_OTEL_CTX");
    if (!hasNamedRegion) {
        System.out.println("INFO: DD_OTEL_CTX mmap naming not available " +
            "(requires kernel 5.17+ with PR_SET_VMA_ANON_NAME support)");
    }
}

This looks to be outdated?

Comment on lines +52 to +83
 /**
  * Finds the OTEL_CTX mapping in /proc/self/maps.
  * Supports both memfd mappings (/memfd:OTEL_CTX) and named anonymous mappings ([anon:OTEL_CTX]).
  */
 private OtelMappingInfo findOtelMapping() throws IOException {
     Path mapsFile = Paths.get("/proc/self/maps");
     if (!Files.exists(mapsFile)) {
         return null;
     }

-    Pattern otelPattern = Pattern.compile("^([0-9a-f]+)-([0-9a-f]+)\\s+(\\S+)\\s+\\S+\\s+\\S+\\s+\\S+\\s*\\[anon:OTEL_CTX\\].*$");
+    // Pattern for named anonymous mapping: [anon:OTEL_CTX]
+    Pattern anonPattern = Pattern.compile("^([0-9a-f]+)-([0-9a-f]+)\\s+(\\S+)\\s+\\S+\\s+\\S+\\s+\\S+\\s*\\[anon:OTEL_CTX\\].*$");
+    // Pattern for memfd mapping: /memfd:OTEL_CTX (deleted)
+    Pattern memfdPattern = Pattern.compile("^([0-9a-f]+)-([0-9a-f]+)\\s+(\\S+)\\s+.*?/memfd:OTEL_CTX.*$");

     try (BufferedReader reader = Files.newBufferedReader(mapsFile)) {
         String line;
         while ((line = reader.readLine()) != null) {
-            Matcher matcher = otelPattern.matcher(line);
-            if (matcher.matches()) {
+            Matcher anonMatcher = anonPattern.matcher(line);
+            if (anonMatcher.matches()) {
+                return new OtelMappingInfo(
+                    anonMatcher.group(1),
+                    anonMatcher.group(2),
+                    anonMatcher.group(3)
+                );
+            }
+            Matcher memfdMatcher = memfdPattern.matcher(line);
+            if (memfdMatcher.matches()) {
                 return new OtelMappingInfo(
-                    matcher.group(1),
-                    matcher.group(2),
-                    matcher.group(3)
+                    memfdMatcher.group(1),
+                    memfdMatcher.group(2),
+                    memfdMatcher.group(3)

I suggest matching only on OTEL_CTX -- may be easier?
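
A minimal sketch of what that single-pattern approach could look like, assuming the standard /proc/<pid>/maps line layout (address range, permissions, offset, device, inode, optional name). The class name `OtelMapsLineSketch` and the `parse` helper are made up for this example and return a plain array rather than the test's `OtelMappingInfo`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: one pattern covering both [anon:OTEL_CTX] and /memfd:OTEL_CTX. */
final class OtelMapsLineSketch {
    // Captures start address, end address and permissions; the rest of the
    // line only has to mention OTEL_CTX somewhere.
    private static final Pattern OTEL_LINE =
        Pattern.compile("^([0-9a-f]+)-([0-9a-f]+)\\s+(\\S+)\\s+.*OTEL_CTX.*$");

    /** Returns {start, end, perms} for an OTEL_CTX mapping line, or null otherwise. */
    static String[] parse(String mapsLine) {
        Matcher m = OTEL_LINE.matcher(mapsLine);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }
}
```

The trade-off is that this also matches any other mapping whose name happens to contain OTEL_CTX (for example a file path), which is probably acceptable in a test but worth keeping in mind for a production reader.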


The OTEL Context Storage system provides two distinct context sharing mechanisms:

1. **Thread-Level Context**: Ring buffer storage for per-thread trace/span context (existing implementation)

This looks to be outdated? In particular, it's not a ring?

Collaborator Author

It's very probably referred to as a ring somewhere in the docs/reference impl - Claude was very persistent in referring to it as a ring buffer :)

Comment on lines +127 to +132
### Process Context

1. **Mapping Permissions**: Use `rw-p` (anonymous) or `rw-s` (memfd)
   - Do NOT use `mprotect()` to make read-only
   - Writable mappings allow in-place updates (PR #34)

Member

I guess this got carried over from the time when the pages were stated to be read-only; it's kinda weird to mention it now since the spec says nothing about it anymore. Maybe worth giving it a cleanup pass in general?

@jbachorik
Collaborator Author

Gave it a pass! There were a few "uuuuh what? 👀" moments. I guess AI?

Yes. Ah, you shouldn't have spent a lot of time on this... this is just dirty AI-generated scaffolding, hence still a draft.
But, since you already did, thanks for the valuable input!

@jbachorik changed the title from "OTEL context conformance with tlsdesc_v1_dev spec" to "WIP: OTEL context conformance with tlsdesc_v1_dev spec" Feb 2, 2026