Description
This proposes adding a multivariate version of the current SigLLM detector pipeline. The new pipeline largely reuses the univariate SigLLM detector pipeline but changes how scalars are represented in the LLM input strings. We provide implementations of several methods for encoding multivariate inputs into a single 1D string. For example, given timesteps t₀ = [50, 30, 100] and t₁ = [55, 28, 104]:
- Value Concatenation – Simply flatten the values across time:
  - 50,30,100,55,28,104
- Value Interleave – Pad values to equal digit length and concatenate timestep by timestep:
  - 050030100,055028104
- Digit Interleave – Interleave digits positionally across dimensions:
  - 001530000,001520584
- JSON Format – Encode as dimension-labeled key:value pairs:
  - d0:50,d1:30,d2:100,d0:55,d1:28,d2:104
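The four encodings listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the code in the proposal: the function names and the `width` parameter are hypothetical, and it assumes non-negative integer values that fit in `width` digits.

```python
from typing import List

Rows = List[List[int]]  # one inner list per timestep, one entry per dimension


def value_concatenation(rows: Rows) -> str:
    # Flatten all values across time into one comma-separated string.
    return ",".join(str(v) for row in rows for v in row)


def value_interleave(rows: Rows, width: int) -> str:
    # Zero-pad each value to `width` digits, then join the values of a timestep.
    return ",".join("".join(str(v).zfill(width) for v in row) for row in rows)


def digit_interleave(rows: Rows, width: int) -> str:
    # For each timestep, emit the first digit of every dimension,
    # then the second digit of every dimension, and so on.
    tokens = []
    for row in rows:
        padded = [str(v).zfill(width) for v in row]
        tokens.append("".join(p[i] for i in range(width) for p in padded))
    return ",".join(tokens)


def labeled_pairs(rows: Rows) -> str:
    # Prefix each value with its dimension label, e.g. d0:50.
    return ",".join(f"d{j}:{v}" for row in rows for j, v in enumerate(row))


rows = [[50, 30, 100], [55, 28, 104]]
print(value_concatenation(rows))        # 50,30,100,55,28,104
print(value_interleave(rows, width=3))  # 050030100,055028104
print(digit_interleave(rows, width=3))  # 001530000,001520584
print(labeled_pairs(rows))              # d0:50,d1:30,d2:100,d0:55,d1:28,d2:104
```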
LLMs have shown sensitivity to token structure and ordering, so we also provide easy-to-use scaffolding for implementing any other multivariate formatting method. An end user only needs to implement `format_as_string` and `format_as_integer` for the chosen method. Before running on real data, the pipeline runs a basic test to check that the format can encode scalars into strings and decode those strings back into scalars.
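As a rough sketch of what a custom format and its round-trip check might look like: only the method names `format_as_string` and `format_as_integer` come from the description. The class name, constructor, signatures, and the `round_trip_ok` helper are assumptions made for illustration, and the actual scaffolding in the proposal may differ.

```python
from typing import List

Rows = List[List[int]]  # one inner list per timestep, one entry per dimension


class ValueInterleaveFormat:
    """Hypothetical formatter: zero-pads each value to a fixed digit width."""

    def __init__(self, width: int) -> None:
        self.width = width

    def format_as_string(self, rows: Rows) -> str:
        # Encode: pad every value, join dimensions, separate timesteps by commas.
        return ",".join(
            "".join(str(v).zfill(self.width) for v in row) for row in rows
        )

    def format_as_integer(self, text: str) -> Rows:
        # Decode: split into timesteps, then cut each token into fixed-width chunks.
        return [
            [int(token[i:i + self.width]) for i in range(0, len(token), self.width)]
            for token in text.split(",")
        ]


def round_trip_ok(fmt: ValueInterleaveFormat, rows: Rows) -> bool:
    # Assumed form of the basic sanity test: decode(encode(x)) must equal x.
    return fmt.format_as_integer(fmt.format_as_string(rows)) == rows


fmt = ValueInterleaveFormat(width=3)
sample = [[50, 30, 100], [55, 28, 104]]
print(fmt.format_as_string(sample))        # 050030100,055028104
print(round_trip_ok(fmt, sample))          # True
print(round_trip_ok(fmt, [[1000, 2, 3]]))  # False: 1000 does not fit in 3 digits
```

The last call shows why a check like this is worth running before real data: a value wider than the padding width shifts the chunk boundaries, so decoding no longer recovers the original scalars.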