Everything about Yaml totally explained
YAML (rhymes with camel) is a human-readable data serialization format that takes concepts from languages such as XML, C, Python, and Perl, as well as the format for electronic mail as specified by RFC 2822. YAML was first proposed in 2001 by Clark Evans, who designed it together with Ingy döt Net and Oren Ben-Kiki.
YAML is a recursive acronym for "YAML Ain't Markup Language". Early in its development, YAML was said to mean "Yet Another Markup Language", but the name was reinterpreted as a backronym to emphasize that its purpose is data-centric rather than document markup.
Features
YAML syntax was designed to be easily mapped to data types common to most high-level languages: list, hash, and scalar. Its familiar indented outline and lean appearance make it especially suited for tasks where humans are likely to view or edit data structures, such as configuration files, dumps produced during debugging, and document headers (for example, the headers found on most e-mails are very close to YAML). Although well suited to hierarchical data representation, it also has a compact syntax for relational data. Its line and whitespace delimiters make it friendly to ad hoc grep/Python/Perl/Ruby operations. A major part of its accessibility comes from eschewing enclosures such as quotation marks, brackets, braces, and open/close tags, which can be hard for the human eye to balance in nested hierarchies.
Examples
Sample document
Data structure hierarchy is maintained by outline indentation.
---
receipt:     Oz-Ware Purchase Invoice
date:        2007-08-06
customer:
    given:   Dorothy
    family:  Gale

items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)
      price:     1.47
      quantity:  4

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      price:     100.27
      quantity:  1

bill-to:  &id001
    street: |
        123 Tornado Alley
        Suite 16
    city:   East Westville
    state:  KS

ship-to:  *id001

specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
    Pay no attention to the
    man behind the curtain.
...
Notice that strings don't require enclosure in quotation marks. The specific number of spaces in the indentation is unimportant as long as parallel elements have the same left justification and hierarchically nested elements are indented further. The sample document defines a hash with 7 top-level keys; one of the keys, "items", contains a 2-element array (or "list"), each element of which is itself a hash with four keys. Relational data and redundancy removal are also shown: the "ship-to" hash content is copied from the "bill-to" hash's content, as indicated by the anchor (&) and reference (*) labels. Optional blank lines can be added for readability. Multiple documents can exist in a single file or stream and are separated by "---". An optional "..." can be used at the end of a file (useful for signalling an end in streamed communications without closing the pipe).
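As a rough illustration, the structure a loader might build from the sample document can be written out by hand in Python. This is a sketch, not the output of any particular parser: the names `bill_to` and `invoice` are made up here, and a real loader may pick different native types (for example a date object for `date`).

```python
# Hand-built approximation of what a YAML loader could produce for the
# sample invoice. Variable names are illustrative only.
bill_to = {
    "street": "123 Tornado Alley\nSuite 16\n",  # "|" keeps the line breaks
    "city": "East Westville",
    "state": "KS",
}

invoice = {
    "receipt": "Oz-Ware Purchase Invoice",
    "date": "2007-08-06",  # a real loader may return a date object instead
    "customer": {"given": "Dorothy", "family": "Gale"},
    "items": [
        {"part_no": "A4786", "descrip": "Water Bucket (Filled)",
         "price": 1.47, "quantity": 4},
        {"part_no": "E1628", "descrip": 'High Heeled "Ruby" Slippers',
         "price": 100.27, "quantity": 1},
    ],
    "bill-to": bill_to,
    "ship-to": bill_to,  # the *id001 reference points at the same node
    "specialDelivery": (
        "Follow the Yellow Brick Road to the Emerald City. "
        "Pay no attention to the man behind the curtain.\n"
    ),  # ">" folds the line breaks into spaces
}

print(len(invoice))                              # number of top-level keys
print(invoice["ship-to"] is invoice["bill-to"])  # one shared node, entered once
```

Binding both keys to the same dictionary mirrors the anchor and reference pair: the address is entered once and seen from two places in the tree.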
Language elements
Basic components of YAML
YAML offers both an indented and an "in-line" style for denoting hashes and arrays. Here is a sampler of the components.
Lists
Conventional block format uses a dash to begin each new item in a list
--- # Favorite movies
- Casablanca
- North by Northwest
- Notorious
Optional inline format is delimited by comma+space and enclosed in brackets (similar to JSON)
--- # Shopping list
[milk, pumpkin pie, eggs, juice]
Hashes
--- # Block
name: John Smith
age: 33
--- # Inline
{name: John Smith, age: 33}
Relational trees
Data merge and references
Data merges and references are an advanced, less commonly used feature. For clarity, compactness, and avoiding data entry errors, YAML provides node references (*) and hash merges (<<) that refer to a node labeled with an anchor (&) tag. References branch the tree to the anchor and work for all data types (see the ship-to reference in the example above). Merges are for hashes only, and merge the keys at the anchor into the referring hash.
Merges and references are automatically expanded by the parser when the data structure is instantiated. This can greatly enhance readability and facilitate editing: below is an example of a queue in an instrument sequencer in which each subsequent step lists only the elements that change from the first step. When a YAML parser loads this array, every "step" hash will contain the 5 keys specified in the first step, with any overridden values replaced (the last step also gains an extra "alert" key).
# sequencer protocols for Laser eye surgery
---
- step:  &id001                  # defines anchor label &id001
    instrument:      Lasik 2000
    pulseEnergy:     5.4
    pulseDuration:   12
    repetition:      1000
    spotSize:        1mm

- step:
    <<: *id001                   # merges key:value pairs defined in step1 anchor
    spotSize:        2mm         # overrides "spotSize" key's value

- step:
    <<: *id001                   # merges key:value pairs defined in step1 anchor
    pulseEnergy:     500.0       # overrides key
    alert: >                     # adds additional key
        warn patient of
        audible pop
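The effect of the merge key can be sketched in Python with plain dictionaries. This is only an approximation of the semantics described above, not how any particular parser is implemented; the helper name `merge` is invented for the illustration.

```python
# The first step plays the role of the anchored node (&id001).
step1 = {
    "instrument": "Lasik 2000",
    "pulseEnergy": 5.4,
    "pulseDuration": 12,
    "repetition": 1000,
    "spotSize": "1mm",
}

def merge(anchor, overrides):
    """Mimic '<<: *anchor': start from the anchored keys, then let the
    keys written explicitly in the referring hash take precedence."""
    merged = dict(anchor)      # copy, so the anchored hash is left untouched
    merged.update(overrides)
    return merged

step2 = merge(step1, {"spotSize": "2mm"})
step3 = merge(step1, {"pulseEnergy": 500.0,
                      "alert": "warn patient of audible pop\n"})

print(step2["spotSize"], step2["instrument"])
print(sorted(step3))
```

Each later step ends up with every key of the first step plus its own changes, while the first step itself is unchanged.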
Comparison to other data structure format languages
While YAML shares similarities with JSON, XML and SDL, it also has characteristics that are unique in comparison to many other similar format languages.
JSON
JSON syntax is nearly a subset of YAML and most JSON documents can be parsed by a YAML parser. This is because JSON's semantic structure is equivalent to the optional "inline-style" of writing YAML. While extended hierarchies can be written in inline-style like JSON, this isn't a recommended YAML style except when it aids clarity. YAML has additional features lacking in JSON such as extensible data types, relational anchors, strings without quotation marks, and mapping types preserving key order.
XML and SDL
YAML lacks the notion of tag attributes that are found in XML and SDL.
For data structure serialization, tag attributes are, arguably, a feature of questionable utility, since the separation of data and meta-data adds complexity when represented by the natural data structures (hashes, arrays) in common languages. Instead, YAML has extensible type declarations (including class types for objects). YAML itself doesn't have XML's language-defined document schema descriptors that allow, for example, a document to self-validate. However, a YAML schema descriptor language exists, and YAXML, which represents YAML data structures in XML, allows XML schema importers and output mechanisms like XSLT to be applied to YAML. Moreover, in typical use, the semantics provided by rich language-defined type declarations in the YAML document itself often eliminate the need for an additional validator.
Indented delimiting
Because YAML primarily relies on outline indentation for structure, it's especially resistant to delimiter collision. YAML's insensitivity to quotes and braces in scalar values means one may embed XML, SDL, JSON or even YAML documents inside a YAML document by simply indenting them in a block literal. Conversely, placing YAML in XML or SDL content requires converting all whitespace and sigils (such as <, > and &) to entity syntax. Placing YAML in JSON requires quoting it and escaping all interior quotes.
---
example: HTML goes into YAML without modification
message: !xml |
    "Three is always greater than
    two, even for large values of two"
    --Author Unknown
date: 2007-06-01
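The difference in embedding effort can be sketched in Python using only the standard library. The markup in `snippet` is a made-up sample, and the YAML side is produced by simple string indentation rather than by a YAML emitter, so it assumes well-behaved text (for instance, no leading spaces on the first line).

```python
import json
import textwrap

# Hypothetical markup containing quotes and an ampersand.
snippet = '<p class="quote">"Three" & more</p>\n<p>--Author Unknown</p>'

# YAML: indent the text under a block literal indicator; the content is unchanged.
as_yaml = "message: |\n" + textwrap.indent(snippet, "  ") + "\n"

# JSON: the text must be quoted, with interior quotes and newlines escaped.
as_json = json.dumps({"message": snippet})

print(as_yaml)
print(as_json)
```

The YAML form keeps every character of the original text, while the JSON form carries backslash escapes that a reader has to undo mentally.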
Non-hierarchical data models
Unlike SDL and JSON, which can only represent data in a hierarchical model with each child node having a single parent, YAML also offers a simple relational scheme that allows repeats of identical data to be referenced from two or more points in the tree rather than entered redundantly at those points. This is similar to the IDREF facility built into XML. The YAML parser then expands these references into the fully populated data structures they imply when read in, so whatever program is using the parser doesn't have to be aware of a relational encoding model, unlike XML processors, which don't expand references. This expansion can enhance readability while reducing data entry errors in configuration files or processing protocols where many parameters remain the same in a sequential series of records while only a few vary. For example, the "ship-to" and "bill-to" records in an invoice are nearly always the same data.
Practical considerations
YAML is line oriented, so it's often simple to convert the unstructured output of existing programs into YAML format while having them retain much of the look of the original document. Because there are no close-tags, braces and quotation marks to balance, it's generally easy to generate well-formed YAML directly from distributed print statements within unsophisticated programs. Likewise, the whitespace delimiters facilitate quick-and-dirty filtering of YAML files using the line-oriented commands in grep, awk, Perl, Ruby, and Python.
In particular, unlike mark-up languages, chunks of consecutive YAML lines tend to be well-formed YAML documents themselves. This makes it very easy to write parsers that don't have to process a document in its entirety (for example balancing open- and close-tags and navigating quoted and escaped characters) before they begin extracting specific records within. This property is particularly expedient when iterating in a single, stateless pass, over records in a file whose entire data structure is too large to hold in memory, or for which reconstituting the entire structure to extract one item would be prohibitively expensive.
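A single-pass reader of this kind might look like the following Python sketch. It assumes a flat top-level list whose items each start with "- " in the first column, which is an assumption of the example and not a general YAML rule; the function name `iter_items` is invented here.

```python
def iter_items(lines):
    """Yield each top-level "- " list item as a list of its lines,
    reading the input once and holding only one item at a time."""
    chunk = []
    for line in lines:
        if line.startswith("- ") and chunk:
            yield chunk          # previous item is complete
            chunk = []
        if line.strip() and not line.startswith("---"):
            chunk.append(line.rstrip("\n"))
    if chunk:
        yield chunk

sample = """\
---
- part_no: A4786
  price: 1.47
- part_no: E1628
  price: 100.27
""".splitlines(keepends=True)

for item in iter_items(sample):
    print(item)
```

Because the generator accepts any iterable of lines, an open file object can be passed in place of `sample`, so the whole file never has to be held in memory.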
Counterintuitively, although its indented delimiting might seem to complicate deeply nested hierarchies, YAML handles indents as small as a single space, and this may achieve better compression than markup languages. Additionally, extremely deep indentation can be avoided entirely by either: 1) reverting to "inline-style" (i.e., JSON-like format) without the indentation; or 2) using relational anchors to unwind the hierarchy to a flat form that the YAML parser will transparently reconstitute into the full data structure.
Security
YAML is purely a data representation language and thus has no executable commands. This means that parsers will be (or at least should be) safe to apply to tainted data without fear of a latent command-injection security hole, provided the loader isn't configured to instantiate arbitrary language-specific objects from type tags. For example, because JSON is native JavaScript, it's tempting to use the JavaScript interpreter itself to evaluate the data structure into existence, leading to command-injection holes when the input is inadequately verified. While safe parsing is inherently possible in any data language, implementation is such a notorious pitfall that YAML's lack of an associated command language may be a relative security benefit.
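The pitfall can be illustrated with a Python analogue, in which `eval` stands in for the JavaScript interpreter and `json.loads` for a data-only parser. The `tainted` string is a harmless stand-in for an injected command, chosen for this sketch.

```python
import json

trusted = '{"name": "John Smith", "age": 33}'
tainted = 'len("injected")'   # harmless stand-in for an injected command

# A data-only parser accepts data and rejects anything else.
print(json.loads(trusted))
try:
    json.loads(tainted)
    rejected = False
except ValueError:
    rejected = True
print("tainted input rejected:", rejected)

# By contrast, a language evaluator runs the same text as code.
# (Shown with a harmless call; never do this with untrusted input.)
executed = eval(tainted)
print("eval ran code and returned:", executed)
```

The parser treats the tainted text as malformed data, whereas the evaluator silently executes it, which is the gap a command-injection attack exploits.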
Data processing and representation
The XML and YAML specifications provide very different logical models for data node representation, processing, and storage.
XML: The primary logical structures in an XML instance document are: 1) Element; and 2) Element attribute. For these primary logical structures, the base XML specification doesn't define constraints regarding such factors as duplication of elements or the order in which they're allowed to appear. In defining conformance for XML processors, the XML specification generalizes them into two types: 1) validating; and 2) non-validating. The XML specification asserts no detailed definitions for an API, a processing model, or a data representation model, although several are defined in separate specifications that a user or specification implementor may choose independently. These include the Document Object Model and XQuery.
A richer model for defining valid XML content is the W3C XML Schema standard. This allows for full specification of valid XML content and is supported by a wide range of open source, free and commercial processors and libraries.
YAML: The primary logical structures in a YAML instance document are: 1) Scalar; 2) Sequence; and 3) Mapping. The YAML specification also indicates some basic constraints that apply to these primary logical structures. For example, according to the specification, mapping keys don't have an order. In every case where node order is significant, a sequence must be used.
Moreover, in defining conformance for YAML processors, the YAML specification defines two primary operations: 1) Dump; and 2) Load. All YAML-compliant processors must provide at least one of these operations, and may optionally provide both. Finally, the YAML specification defines an information model or "representation graph" which must be created during processing for both Dump and Load operations, although this representation need not be made available to the user through an API.
Implementations
Portability
Simple YAML files (for example key value pairs) are readily parsed with regular expressions without resort to a formal YAML parser. YAML emitters and parsers for many popular languages written in the pure native language itself exist, making it portable in a self-contained manner. Bindings to C-libraries also exist when speed is needed.
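For the simplest case, a flat file of key: value pairs, a regular-expression reader might look like the Python sketch below. It deliberately ignores nesting, quoting, inline comments and type resolution, so every value comes back as a string; the names `PAIR` and `parse_flat` are made up for the example.

```python
import re

# Matches "key: value" at the start of a line.
PAIR = re.compile(r"^(?P<key>[A-Za-z_][\w-]*):[ \t]+(?P<value>.*?)[ \t]*$")

def parse_flat(text):
    """Parse unnested 'key: value' lines into a dict of strings."""
    result = {}
    for line in text.splitlines():
        # Skip blank lines and whole-line comments.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = PAIR.match(line)
        if match:
            result[match.group("key")] = match.group("value")
    return result

config = parse_flat("""\
# customer record
name: John Smith
age: 33
bill-to: East Westville
""")
print(config)
```

Anything beyond this flat subset (lists, nested hashes, anchors, block scalars) is better left to a full parser.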
C libraries
libYAML
As of 2007-06, this implementation of YAML 1.1 is stable and recommended by the YAML specification authors for production use (despite the 0.0.1 version number and a mild caution that the API isn't barred from evolution).
SYCK
This implementation supports most of the YAML 1.0 specification and is in widespread use. It is optimized for use with higher-level interpreted languages, obtaining speed by writing directly to the symbol table of the higher-level language when it can. Unfortunately, as of 2005 it's no longer maintained and has some incompatibilities with the specification.
Bindings
Bindings for YAML exist for the following languages:
Perl
- YAML:: is a common interface to several YAML parsers.
- YAML::Tiny implements a useful subset of YAML; it is small, pure Perl, and faster than the full implementation.
- YAML::Syck is a binding to the SYCK C library; it offers fast, highly featured YAML.
- YAML::XS is a binding to libYAML, with better YAML 1.1 compatibility.
PHP
- Spyc is a pure PHP implementation.
- PHP-Syck is a binding to the SYCK library.
Python
- PyYAML is highly featured; it is pure Python, or optionally uses libYAML.
- PySyck is a binding to the SYCK C library.
Ruby (YAML included in the standard library since 1.8, based on SYCK)
Java
- jvyaml is based on Syck and patterned after ruby-yaml.
- JYaml is a pure Java implementation.
R (programming language)
JavaScript
- Native JavaScript emits but doesn't read YAML.
- YAML-Javascript is an emitter and parser.
.NET Framework
OCaml
C++
Objective-C
Lua
Haskell
XML: YAXML (currently draft only)
Pitfalls and implementation defects
Editors:
- An editor mode that autoexpands tabs to spaces and displays text in a fixed-width font is recommended.
- The editor needs to handle UTF-8 and UTF-16 correctly (otherwise, it'll be necessary to use only ASCII as a subset of UTF-8).
Strings:
- For readability, and to avoid the need for meta-escape sequences, it's desirable to avoid quoted strings. However, this leads to a pitfall when inline strings are ambiguous single words (for example digits or boolean words) or when the unquoted phrase accidentally contains a YAML construct (for example a leading exclamation point or a colon-space after a word: "!Caca de vaca!" or "Caution: lions ahead"). Anyone using a proper YAML emitter won't confront this issue, but it can come up in ad hoc scripts or human editing of files. What it comes down to is whether the data structure creator has control over the content of the unquoted inline strings. If not, then either using !!str tags or enclosing the strings in quotes may be good practice. The !!str tag is preferable if the string itself may contain quotation marks. Another, simpler approach is to use block literals ("|" or ">") rather than inline string expressions, as these have no such ambiguities to resolve.
Anticipating implementation idiosyncrasies:
- Some implementations of YAML, such as Perl's YAML::BASE, load an entire file (stream) and parse it en masse. Conversely, YAML::Tiny only reads the first document in the stream and stops. Other implementations, like PyYAML, are lazy and iterate over the next document only upon request. For very large files in which one plans to handle the documents independently, instantiating the entire file before processing may be prohibitive. Thus with YAML::BASE one must occasionally chunk a file into documents and parse those individually. Fortunately, YAML makes this easy, since it simply requires splitting on the document separator, m/^---/. That strategy can break down if an anchor and a reference to it lie in different documents of the same file.
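The chunking strategy from the last item can be sketched in Python as a lazy generator that cuts a stream at lines beginning with "---". Like the m/^---/ pattern it imitates, this is a heuristic: it doesn't check what follows the three dashes, and it ignores the "..." end marker. The name `iter_documents` is invented for the example.

```python
def iter_documents(lines):
    """Yield the text of each document in a multi-document stream,
    cutting at every line that begins with '---'."""
    doc = []
    for line in lines:
        if line.startswith("---"):
            if doc:
                yield "".join(doc)
            doc = []
            rest = line[3:]          # keep anything after the marker
            if rest.strip():
                doc.append(rest)
        else:
            doc.append(line)
    if doc:
        yield "".join(doc)

stream = """\
--- # Favorite movies
- Casablanca
- Notorious
--- # Shopping list
[milk, pumpkin pie, eggs, juice]
""".splitlines(keepends=True)

docs = list(iter_documents(stream))
print(len(docs))
print(docs[1])
```

Each yielded chunk can then be handed to a parser on its own, so only one document's data structure exists at a time.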