Parsing JSON Data In SQL Server
- Posted in:
- Big Data
- Development
- JSON
- SQL Server 2016
- SQL Server 2017
- T-SQL
Microsoft added support for JSON data beginning with SQL Server 2016. JSON is an open-standard file format consisting of attribute–value pairs and array data types. It is commonly used to transmit data objects for asynchronous browser–server communication. But it is also used for storing unstructured data in files or NoSQL databases such as Microsoft Azure Cosmos DB. For most of us, SQL Server's support for JSON probably means two things: we can convert relational data to JSON and vice versa. In this post, I'll focus on converting JSON to relational data and share what I've learned from a recent experience.
Let's begin with a look at some sample data. This example from Microsoft's documentation (slightly modified) shows the two JSON data structures: key/value pairs and arrays (an ordered list of values).
{
"NullValue":null,
"StringValue":"John",
"IntValue":45,
"DecimalValue":6.7,
"TrueValue":true,
"FalseValue":false,
"ArrayValue":["a","r","r","a","y"],
"ObjectValue":{"obj":"ect"}
}
Objects are enclosed in curly braces. The entire block of data is enclosed in curly braces--it is a JSON object. It consists of key/value pairs including "StringValue":"John", "IntValue":45, and "FalseValue":false. The last key "ObjectValue" has a value that is another object, which is a very simple object consisting of a single key/value pair: {"obj":"ect"}.
Array values are enclosed in square brackets within JSON data. The "ArrayValue" key has an array for a value. It consists of five, single-character strings. Here, the array values are all single character strings ["a","r","r","a","y"]. We'll get to JSON types in a moment, but for now I'll quickly note array values don't have to be of the same type.
OPENSJON
SQL Server provides the OPENJSON function to "read" JSON data. It is a table-valued function that parses JSON text and returns objects and properties from the JSON input as rows and columns. Here is the data sample from above, parsed by the OPENJSON function:
DECLARE @json NVARCHAR(4000) = N'
{
"NullValue":null,
"StringValue":"John",
"IntValue":45,
"DecimalValue":6.7,
"TrueValue":true,
"FalseValue":false,
"ArrayValue":["a","r","r","a","y"],
"ObjectValue":{"obj":"ect"}
}'
SELECT *
FROM OPENJSON(@json) j
The JSON keys and values are returned as columns in a result set, along with the JSON type. JSON documentation indicates there are only six JSON types, which is far less than there are SQL data types. For instance, both 45 and 6.7 have the same JSON type (type 2), whereas in SQL Server there are TINYINT, SMALLINT, INT, BIGINT, and NUMERIC/DECIMAL data types.
OPENROWSET
Reading JSON data from a hard-coded variable is probably not a real world use case for most of us. To read data out of a *.json file, we can enlist the help of our old friend, the OPENROWSET function. Here we'll read the JSON data out of the Sample.json file, store the contents of the file into a @json variable, and parse it with OPENJSON:
DECLARE @json AS NVARCHAR(MAX);
SELECT @json = r.BulkColumn
FROM OPENROWSET (BULK 'C:\Data\Sample.json', SINGLE_CLOB) AS r
SELECT * FROM OPENJSON(@json);
The CROSS APPLY operator in the next example produces the same output as the previous example. There's less code, and it's arguably more clear:
SELECT j.*
FROM OPENROWSET (BULK 'C:\Data\Sample.json', SINGLE_CLOB) AS r
CROSS APPLY OPENJSON(BulkColumn) AS j
Once again, here is the output. Did you notice how different the values are for the last two rows? The OPENJSON function returns array values and object values as JSON text.
JSON_VALUE
How are values of arrays and nested objects accessed? One way is to use the JSON_VALUE function, which extracts a scalar value from a JSON string. It takes two arguments: a JSON text expresion and a path that specifies the property to extract. The query below extracts each element of the ArrayValue array, and the value of the obj key within the ObjectValue object.
--Use the JSON_VALUE function to extract scalar values.
--Note the JSON path expressions with the "$" character.
SELECT j.[key], j.value,
JSON_VALUE(BulkColumn, '$.ArrayValue[0]') AS Element1,
JSON_VALUE(BulkColumn, '$.ArrayValue[1]') AS Element2,
JSON_VALUE(BulkColumn, '$.ArrayValue[2]') AS Element3,
JSON_VALUE(BulkColumn, '$.ArrayValue[3]') AS Element4,
JSON_VALUE(BulkColumn, '$.ArrayValue[4]') AS Element5,
JSON_VALUE(BulkColumn, '$.ObjectValue.obj') AS ObjValue
FROM OPENROWSET (BULK 'C:\Data\Sample.json', SINGLE_CLOB) AS r
CROSS APPLY OPENJSON(BulkColumn) AS j
The path string in this context is comprised of a dollar sign ($), which represents the context item (the entire JSON object string). It is followed by the dot operator, which indicates a member of an object. $.ArrayValue is the path to the key named "ArrayValue", and $.ObjectValue is the path to the key named "ObjectValue". Both are members of the JSON text object. For the array, elements are referenced by an index number in square brackets. Index numbers for array elements are zero-based. So the first array element is referenced by index 0, the second array element is referenced by index 1, and so on. For the object, there is only one key/value pair. The value is referenced by a second dot operator and the name of the key ("obj").
What happens if we specify a path to an array element or key name that don't exist? Let's try:
--There is no 6th element at index 5.
--"Obj" is spelled with the wrong case.
SELECT j.[key], j.value,
JSON_VALUE(BulkColumn, '$.ArrayValue[5]') AS Element6,
JSON_VALUE(BulkColumn, '$.ObjectValue.Obj') AS ObjValue
FROM OPENROWSET (BULK 'C:\Data\Sample.json', SINGLE_CLOB) AS r
CROSS APPLY OPENJSON(BulkColumn) AS j
NULL is returned where we tried to access the nonexistent 6th element and the value of a key that doesn't exist (we spelled it wrong by capitalizing the first letter). Be aware that there are two path modes: lax and strict. Lax is the default. In lax mode, the function returns NULL if the path expression contains an error, as is the case in the example above. In strict mode, the function raises an error if the path expression contains an error:
--There is no 6th element at index 5.
--"Obj" is spelled with the wrong case.
SELECT j.[key], j.value,
JSON_VALUE(BulkColumn, 'strict $.ArrayValue[5]') AS Element6,
JSON_VALUE(BulkColumn, 'strict $.ObjectValue.Obj') AS ObjValue
FROM OPENROWSET (BULK 'C:\Data\Sample.json', SINGLE_CLOB) AS r
CROSS APPLY OPENJSON(BulkColumn) AS j
Msg 13608, Level 16, State 2, Line 51
Property cannot be found on the specified JSON path.
Although there is one nested object ("ObjectValue"), the JSON data in these examples is quite simple. In my short time working with JSON data, I encountered several levels of nesting with much greater complexity than is shown here. Look for another post or two on this, along with a string splitting approach with OPENJSON that works for a use case not covered by STRING_SPLIT.
Comments