Data model
vg uses Protocol Buffers for internal data representation and serialization.
Protocol Buffers (protobuf) is a schema language developed by Google, comparable to Apache Avro, FlatBuffers, or Cap'n Proto. It is a human-readable language for describing objects ('messages' in protobuf-speak). One writes a data model (or 'schema') in this language and then compiles it with the protobuf compiler, protoc, which generates source code in a variety of common programming languages (C++, Python, Java, etc.). The generated source code contains getters and setters for the fields in the schema, along with serialization code. In short, protobuf makes it easy to add or remove fields from a data model and to use the same model from other languages.
The vg protobuf schema defines our messages and their nested fields. For example, the 'Alignment' message represents a read aligned to the graph; it is analogous to a SAM record and has nested fields such as 'sequence' and 'quality'. We use bidirected sequence graphs, which can be broken down into snarls and chains.
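To make the shape of such a message more concrete, here is a minimal Python sketch that mimics an Alignment-like record with a plain dataclass. It is not the code protoc generates and not vg's actual schema: only the field names 'sequence' and 'quality' come from the text above, while the `AlignmentSketch` name, the field types, the empty defaults, and the `is_consistent` check are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlignmentSketch:
    """Hypothetical stand-in for a protobuf 'Alignment' message.

    Fields default to empty values, loosely imitating how an unset
    protobuf field reads back as a default rather than raising an error.
    """
    sequence: str = ""    # read bases, e.g. "GATTACA"
    quality: bytes = b""  # assumed here: one quality value per base

    def is_consistent(self) -> bool:
        # Illustrative check only: quality may be absent, but if it is
        # present it should have one entry per base in the sequence.
        return len(self.quality) in (0, len(self.sequence))


# Build a record and read its fields back, as one would with the
# getters and setters of a generated message class.
aln = AlignmentSketch(sequence="GATTACA", quality=bytes([30] * 7))
print(aln.sequence, len(aln.quality), aln.is_consistent())
```

With real generated code the class would come from compiling the vg schema with protoc, and it would also provide serialization, which this sketch leaves out.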