Avro to JSON Example

Avro schemas are defined in JSON, which helps with data interoperability. Between the two formats I'd pick Avro - it is very well designed for typed data. I have used Protocol Buffers before and just want to learn Avro. The example is mostly the same as the one we used for the JSON serializer. For JSON parsing I have chosen the Jackson API, one among the many JSON parsing APIs/libraries available. The schema parser's parse method takes a string containing a JSON schema definition as input and outputs an Avro Schema object.

So you have some data that you want to store in a file or send over the network, and in order to transport it over the network or store it on some persistent storage it has to be serialized. I'm going to create the JSON schema inline from Python structures. An Avro schema can be viewed as a language-agnostic contract for systems to interoperate. Avro creates a self-describing file called an Avro Data File, in which it stores data along with its schema in the metadata section; when data is serialized, the schema content is included in the serialized data. A dataset is a collection of records, similar to a relational database table. The structure of a binary Avro file can be described with a few informal production rules. You can extract the Avro values into a table with a column of type DATASET STORAGE FORMAT AVRO by using the AvroContainerSplit table operator, and Avro object container files can be loaded into Teradata Database as BLOBs by using BTEQ. Thankfully, this is also very easy to do in Spark using Spark SQL DataFrames.

The Avro Java library provides a schema parser, which can parse an Avro schema (written in JSON) into a Schema object, along with functions for serialization and deserialization of messages in the Apache Avro format. In Kafka tutorial #3 - JSON SerDes, I introduced the name SerDe, but we had two separate classes for the serializer and the deserializer. JavaScript Object Notation (JSON), pronounced like "Jason," is a hierarchical data-interchange format similar to XML and the de facto standard for data interchange on the web these days. An Avro object contains the schema and the data. Avro supports JSON schema declaration files, JSON protocol declaration files, and Avro IDL. Rather than invent a new type system, PFA uses the same one as the Avro serialization format. We're passing in the contents of our user.avsc schema file here. The library parses JSON into a Python dictionary or list. In this article I show code examples of MapReduce jobs in Java, Hadoop Streaming, Pig and Hive that read and/or write data in Avro format. An optional Avro schema in JSON format can be given inline to use for reading. In Avro's JSON encoding, a union with an int as its active branch may be represented as {"int": 123}. ConvertAvroToJSON is an appropriately named NiFi processor that converts Avro files to -- wait for it -- JSON.
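To make the schema-parser step concrete, here is a minimal sketch using the org.apache.avro Java API (the record and field names are invented for illustration):

    import org.apache.avro.Schema;

    public class ParseSchemaExample {
        public static void main(String[] args) {
            // An Avro schema is itself a JSON document; this one is illustrative.
            String json = "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
                    + "{\"name\":\"name\",\"type\":\"string\"},"
                    + "{\"name\":\"age\",\"type\":\"int\"}]}";
            // Schema.Parser turns the JSON definition into a Schema object.
            Schema schema = new Schema.Parser().parse(json);
            System.out.println(schema.getName());   // User
            System.out.println(schema.getFields()); // the two declared fields
        }
    }

The same Schema object is then handed to the readers, writers and encoders shown later.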
Because every developer knows or can easily learn JSON, there's a very low barrier to entry. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL give Spark more information about the structure of both the data and the computation being performed. To convert an .avdl file into an .avpr file, it may be processed by the idl tool. Adopting Avro for event (de)serialization is fairly straightforward. For example, say you have a relational database whose contents you want to dump to a file, and you want to use a binary format to avoid the aforementioned problems with textual formats (JSON, CSV, XML). Confluent REST Proxy is a convenient way to send Avro data to Apache Kafka from non-Java languages. To achieve this, Avro assumes that the schema is always present at both read and write time. In the future, Spark SQL's JSON support will be expanded to handle the case where each object in the dataset might have a considerably different schema. Unlike Thrift and Protocol Buffers, Avro does not require code generation, and it needs less encoding as part of the data since it stores field names and types in the schema.

In Avro's JSON encoding, the value of a union is encoded as a JSON null if its type is null; otherwise the encoding uses a JSON object to tag each union value with the intended type. Since the schema travels with the data, deserializing the content is easy: the required schema is locally present in the data. There will be at least one partition per input file. JSON-LD provides a powerful solution for representing object references and namespaces in JSON based on standard web URIs, but is not itself a schema language. JSON is a simple format that is easier to comprehend than XML. While this post only covered JSON, gRPC is compatible with Thrift, Avro, Flatbuffers, Cap'n Proto, and even raw bytes! gRPC lets you be in control of how your data is handled. On the command line, the usage for Hadoop is hadoop jar avro-tools-*.jar, which among other things can copy an Avro file from HDFS.

Let's add the Avro dependency to the build: "org.apache.avro" % "avro". Early versions of JSON (such as specified by RFC 4627) required that a valid JSON text consist of only an object or an array type, which could contain other types within them. As you embark on converting vast quantities of JSON to Avro, you soon discover that things are not as simple as they seem. Avro's own data format can be processed by many languages (currently C, C++, Python, Java, Ruby and PHP). When you're dealing with complex and structured data, you need to determine whether the data is valid or not, which is what JSON Schema validation is for. Flexter can generate a target schema from an XML file or a combination of XML and XML schema (XSD) files. There are also libraries that dynamically generate Scala bindings for reading and writing Avro-mapped Scala types to and from Avro binary. When performing runtime topic resolution, Kafka Producer can write to any topic by default. While dynamic typing is in many cases convenient and Pythonic, not being able to discover the schema ahead of time is a drawback, which is where a JSON-to-Avro conversion tool designed to ease migration to Avro earns its keep. The following example shows a possible JSON representation describing a person, including how a union field is tagged.
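Here is a hedged sketch of that person example using Avro's generic Java API; the Person schema is invented, and the optional age field is a union of null and int, so its branch gets tagged in the JSON output:

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.avro.io.JsonEncoder;

    public class UnionJsonEncodingExample {
        public static void main(String[] args) throws Exception {
            // "age" is a union of null and int, so its JSON encoding must be tagged.
            Schema schema = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
                    + "{\"name\":\"name\",\"type\":\"string\"},"
                    + "{\"name\":\"age\",\"type\":[\"null\",\"int\"]}]}");

            GenericRecord person = new GenericData.Record(schema);
            person.put("name", "Alice");
            person.put("age", 30);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
            new GenericDatumWriter<GenericRecord>(schema).write(person, encoder);
            encoder.flush();
            // Prints {"name":"Alice","age":{"int":30}} - the union branch is tagged.
            System.out.println(out.toString("UTF-8"));
        }
    }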
Dynamic typing is one of Avro's selling points: Avro does not require that code be generated. The Avro IDL format (.avdl) is a code-like language that gets translated to the JSON protocol format; each Avro IDL file defines a single Avro Protocol, and thus generates as its output a JSON-format Avro Protocol file with the extension .avpr. KSQL automatically generates an appropriate Avro schema for a new pageviews_avro stream, and it registers the schema with Schema Registry. Avro data is described in a language-independent schema. Big Data trends have developers working with XML alongside other data protocols such as JSON and Apache Avro, and XMLSpy supports both of these with dedicated editing views and functionality.

This is a short example showing the use of Avro, a data serialization format based on JSON. JSON is described in a great many places, both on the web and in after-market documentation. For example, consider a dataset where JSON fields are used to hold key/value pairs representing HTTP headers, or deeply nested JSON. I have been experimenting with Apache Avro and Python, and below is what I have learned thus far. Avro is an open source data format that bundles serialized data with the data's schema in the same file: the interesting thing about Avro is that it encapsulates both the schema and the content in its binary message, which means that when a human needs to read the data, you have to transform it into some other format first. Additionally, Avro serializes data in a binary format, making it compact and efficient. JSON, by contrast, is a text-based, human-readable format for representing simple data structures and associative arrays (called objects). For those of you who haven't worked with it yet, Avro is a data serialization system that allows for rich data structures and promises easy integration for use in many languages. For this example you can download the avro and avro-tools jars. There is also a SchemaBuilder API; when reading the key and value of a Kafka topic, you can decode the binary (Avro) data into structured data. Applying an Avro schema to the JSON and using record readers is another beast entirely, so let us know if that is what you are looking for. To generate a data record from JSON to Avro on the command line: $ java -jar avro-tools-*.jar fromjson --schema-file twitter.avsc location1. The Google Cloud Storage destination similarly writes data to Google Cloud Storage based on the data format that you select; set the format to json for JSON output.
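The same JSON-to-Avro conversion that avro-tools performs with fromjson takes only a few lines of Java. This is a hedged sketch with an inline, made-up schema and record; a real job would load twitter.avsc and stream many records:

    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DecoderFactory;

    public class JsonToAvroExample {
        public static void main(String[] args) throws Exception {
            // Illustrative schema; in practice this would come from twitter.avsc.
            Schema schema = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"Tweet\",\"fields\":["
                    + "{\"name\":\"username\",\"type\":\"string\"},"
                    + "{\"name\":\"tweet\",\"type\":\"string\"}]}");
            String json = "{\"username\": \"alice\", \"tweet\": \"hello avro\"}";

            // Decode the JSON text against the schema.
            GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
            GenericRecord record = reader.read(null,
                    DecoderFactory.get().jsonDecoder(schema, json));

            // Append the record to a binary Avro container file.
            try (DataFileWriter<GenericRecord> writer =
                         new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
                writer.create(schema, new File("tweets.avro"));
                writer.append(record);
            }
        }
    }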
Avro records can be created in any language, processed in Hadoop, and the results can be fed to a third language. Apache Avro is a serialization format whose support in C# is officially provided by Microsoft. On the database side, one external-table option specifies using the metadata from the first Parquet or Avro file, or from all Parquet or Avro files in the file_uri_list, to auto-generate the columns and their data types. Values should be stored in Avro using the Avro types and logical type annotations. The deserializer can be configured by passing in parameters. A reducer can then use a TableReducer to insert the resulting rows. Avro format data has an independent schema, also defined in JSON. Why do we need a serializer instead of just dumping into JSON? One big reason is validation that your data matches the schema. An optional value modeled as a union of null with another type is the only union type allowed in Iceberg data files.

Hackolade was specially adapted to support the data modeling of Avro schema. Owen O'Malley's presentation "File Format Benchmark - Avro, JSON, ORC, & Parquet" compares these formats in depth. Avro provides data structures, a binary data format, a container file format to store persistent data, and RPC capabilities. There are also tools to generate Scala from Avro, and Avro from JSON. Then you need to parse it, for example, with MapReduce. Camel ships an Avro component as well (available as of Camel 2). One tutorial has you copy a few lines of code into a GenerateDataWithCode.java class to produce sample records. When Kafka Producer evaluates a record, it calculates the expression based on record values and writes the record to the resulting topic. Switching a JSON protocol to a binary format in order to achieve better performance is pretty straightforward with Avro. For Spark, start the shell with the spark-avro package - $ spark2-shell --packages com.databricks:spark-avro_2.11:<version> - and you can read and write Avro as DataFrames, as sketched below.
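The Java equivalent of that Spark workflow might look like the following; this is a sketch, and the paths, app name and local master are assumptions:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkAvroExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("avro-to-json").master("local[*]").getOrCreate();
            // With com.databricks:spark-avro on the classpath, the
            // "com.databricks.spark.avro" format is available to DataFrames.
            Dataset<Row> df = spark.read()
                    .format("com.databricks.spark.avro")
                    .load("users.avro");                   // hypothetical input
            df.write().format("json").save("users-json");  // Avro -> JSON files
            spark.stop();
        }
    }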
For example, if the desired output format is JSON, one may implement a ByteArrayToJsonConverter to convert the byte array to JSON. Recently I had a need to parse all the historic data files, created by Stream, for a trend in some of my IoT data. YAML stands for "YAML Ain't Markup Language" and is a superset of JSON, so YAML can be converted to JSON as well. Teradata's documentation (Teradata Vantage NewSQL Engine 16) includes an example of returning a JSON-encoded Avro value instead of an Avro instance as a result. Every programming language has an implicit or explicit type system, and most of these type systems are similar to one another. An Avro schema can be used to generate and validate JSON representations of Avro objects.

What is the Avro file format? Avro stores both the data definition and the data together in one message or file, making it easy for programs to dynamically understand the information stored in an Avro file or message. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Avro has an enum type (as does Protocol Buffers), while MessagePack does not. Avro provides rich data structures; for example, we can create a record that contains an array, an enumerated type, and a sub-record. From the Jackson download page, download the core-asl and mapper-asl jars. We come across various circumstances where we receive data in JSON format and need to send or store it in CSV format. Spark provides built-in support to read from and write DataFrames to Avro files using the "spark-avro" library. A CTAS query can likewise create a new table with Parquet data from a source table in a different format. By default, the record name and namespace are the name of the class and the enclosing package name, but it is possible to customize these using the annotations AvroName and AvroNamespace.

A quick Google search for Avro tools eventually leads you to the avro-tools jar, and you find yourself attempting to convert some JSON. In this tutorial I will demonstrate how to process your Event Hubs Capture (Avro files) located in your Azure Data Lake Store using Azure Databricks (Spark); each event becomes a record in the Avro file, where the Body field contains the original JSON string that was sent as UTF-8 bytes.
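Reading those capture files back is straightforward because every Avro container file embeds its schema. A minimal sketch (the file name is invented; GenericRecord's toString renders each record as JSON text):

    import java.io.File;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class AvroToJsonExample {
        public static void main(String[] args) throws Exception {
            // No schema needed up front: the container file carries it.
            File input = new File("capture.avro"); // hypothetical capture file
            try (DataFileReader<GenericRecord> reader =
                         new DataFileReader<>(input, new GenericDatumReader<GenericRecord>())) {
                System.out.println("Embedded schema: " + reader.getSchema());
                for (GenericRecord record : reader) {
                    // toString() renders the record as JSON.
                    System.out.println(record);
                }
            }
        }
    }

This is essentially what NiFi's ConvertAvroToJSON processor and the avro-tools tojson command do for you.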
Comparing Avro to Protocol Buffers, XML and JSON payloads: in a previous blog post I compared Mainframe COBOL payloads to Protocol Buffers, XML and JSON. You may need to define the schema in the sink dataset and do the schema mapping in the copy activity. The library also offers a JSON encoder, which encodes Avro into JSON, plus encoders and decoders to encode data into Avro format and decode it back using primitive functions. MessagePack is an efficient binary serialization format. For a given schema it is simple to write a short Java program that converts from JSON to Avro. For example, Avro supports nested lists and maps that have no good representation in CSV, so we'll have to be careful about that conversion. Therefore, if your Kafka setup produces or consumes Avro data and, for some reason, the Kafka components for Avro are not available, you must use an avro-tools library to convert your data between Avro and JSON outside your job. Let's see how specialized Avro support in XMLSpy makes visualizing and searching Avro files, as well as editing Avro schemas, uniquely easy.

There is a process of translating data structures or object state into binary or textual form: that is what we call serialization. There is a wide variety of data serialization formats, including XML, JSON, BSON, YAML, MessagePack, Protocol Buffers, Thrift and Avro. The previous tutorial covered basic XML processing in Scala, but as I noted, XML is not the primary choice for data serialization these days. An Avro deserializer that parses Avro container files can generate one Flume event per record in the Avro file, storing binary Avro-encoded records in the Flume event body; when a Source receives an Event, it stores it into one or more Channels. When unloading table data to files, Snowflake outputs only to NDJSON format. We are using the same Yelp data from Episode 2 of Season 2 of the Hadoop Tutorial series. This document will briefly explain how Gobblin integrates with Hive's SerDe library, and show an example of writing ORC files. Before you use the JMS Producer, install the JMS drivers for the implementation that you are using.

For the Kafka pipeline itself, here's what you'll need: Confluent OSS, the Confluent CLI, Python 3 with pipenv and Flake8, and a Docker Compose stack with Postgres, Kafka, Kafka Connect, Avro and Confluent Schema Registry. We'll be using JSON because that is what the Python script is producing. When the data format for the Kafka key or value is JSON, individual fields of that JSON structure can be specified in the connector mapping. Confluent Schema Registry stores Avro schemas for Kafka producers and consumers; these examples demonstrate a sample sensor, which is reading input events in JSON.
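A hedged sketch of a producer wired to Schema Registry follows; the broker address, registry URL, topic name and sensor schema are all assumptions, and the value serializer is Confluent's KafkaAvroSerializer:

    import java.util.Properties;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SensorAvroProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");          // assumed broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", "http://localhost:8081"); // assumed registry

            Schema schema = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"SensorReading\",\"fields\":["
                    + "{\"name\":\"id\",\"type\":\"string\"},"
                    + "{\"name\":\"temperature\",\"type\":\"double\"}]}");
            GenericRecord reading = new GenericData.Record(schema);
            reading.put("id", "sensor-1");
            reading.put("temperature", 21.5);

            try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
                // The serializer registers the schema and tags the payload with its id.
                producer.send(new ProducerRecord<>("sensor-readings", "sensor-1", reading));
            }
        }
    }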
Download json-schema-avro (the latest version at the time of writing). MessagePack aims for interoperability with JSON. JSON is a very common way to store data, and Avro is a popular file format in Big Data. Hence, in this Avro schema tutorial, we have covered Apache Avro schemas in detail, including schema declaration and Avro schema resolution. As an example, if we want to export to the melville table in our exports dataset, which is part of the bookstore-1382 project, we might use a configuration something like this. The schema avoids surprises when data is read - for example, a mandatory field that is missing or has the wrong type (an int instead of an array). A short program along the same lines will convert a CSV file to an Avro file; we wrote command-line scripts to convert the CSV file to JSON, and then the JSON file to Avro. To extract fields using an expression in JSON, this plugin uses the 'JsonPath' library.

Avro Introduction for Big Data and Data Streaming Architectures: Apache Avro™ is a data serialization system, and a recent addition to Apache's Hadoop family of projects. Avro, JSON, ORC, Parquet, Cloud Datastore exports, and Cloud Firestore exports also support data with nested and repeated fields. Unlike popular formats such as JSON or XML, serialized Avro data is easier to split. You can also create a readable external table over Avro data, and use Avro in MapReduce jobs with Java. The MIMB Import Bridge from Apache Avro File (JSON) is documented as well; please refer to the individual parameters' tool tips for more detailed examples. Converting data stored somewhere else (for example in a database) to Avro or Parquet datasets and storing them in the distributed file system can be performed using home-grown tools or (preferred) a good ETL tool, which can connect to any source, map, transform, and store data in the Hadoop cluster. Because schemas are represented in JSON, this facilitates implementation in languages that already have JSON libraries. The JSON encoding is the same format as used for default values in schemas, except that unions are encoded differently. We will start with an example Avro schema and a corresponding data file in plain-text JSON format.
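For instance - both files are invented for illustration - the schema file might contain:

    {
      "type": "record",
      "name": "Tweet",
      "namespace": "com.example",
      "fields": [
        {"name": "username", "type": "string"},
        {"name": "tweet", "type": "string"},
        {"name": "timestamp", "type": "long"}
      ]
    }

and the plain-text JSON data file would hold one record per line, matching those fields:

    {"username": "alice", "tweet": "Avro and JSON get along fine.", "timestamp": 1366150681}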
Anyone needing to implement validation for a JSON protocol (like we did for OpenRTB) will appreciate this feature. And, in order to declare the data structures, Avro uses the JSON format: Avro schemas describe the structure of the corresponding Avro data and are written in JSON. Avro is also used in Remote Procedure Calls (RPCs); it is a remote procedure call and data serialization framework developed within Apache's Hadoop project, and it comes with a very sophisticated schema description language that describes data. As mentioned previously, the schema for the data within an Avro file is contained in the file itself and is read by the receiving system to determine how to deserialize the messages. For example, you might use an Avro stream parser with a schema-repo-based Avro bytes decoder. Of course, you will get different results with different COBOL structures and a different data mix, but you can be practically sure you will get a smaller Avro payload than the Mainframe payload.

Interacting with JSON from JavaScript is extremely seamless, and PyCharm helps you work with JSON files - it checks their syntax and formatting. There is also a Python Avro JSON serializer. Comparisons of API architectural styles cover the different API types, such as REST and GraphQL, including topics such as HTTP vs WebSockets. This is the official tutorial for Apache Gora. Inside the html file is the entry point for using the Thrift.js and compiled JavaScript files. Here is a classic sample JSON document:

    {"widget": {
        "debug": "on",
        "window": { "title": "Sample Konfabulator Widget", "name": "main_window", "width": 500, "height": 500 },
        "image": { "src": "Images/Sun.png" }
    }}

There are four attributes for a given Avro schema; type, for example, specifies the data type of the JSON record, whether a complex type or a primitive value. basePath is the base path for the PartitionedFileSet. A jq program is a "filter": it takes an input and produces an output. In Flume, the Channel is a passive store that holds the Event until that Event is consumed by a Sink. With NiFi 1.3, the best way to send PDFs between two systems is to put them inside a binary field ("bytes" field) in Avro.
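A hedged sketch of that bytes-field approach with the generic Java API (the Document schema and file name are invented; Avro's bytes type maps to java.nio.ByteBuffer):

    import java.nio.ByteBuffer;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    public class BytesFieldExample {
        public static void main(String[] args) throws Exception {
            // A record with a "bytes" field can carry an opaque payload such as a PDF.
            Schema schema = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"Document\",\"fields\":["
                    + "{\"name\":\"filename\",\"type\":\"string\"},"
                    + "{\"name\":\"content\",\"type\":\"bytes\"}]}");
            GenericRecord doc = new GenericData.Record(schema);
            doc.put("filename", "report.pdf"); // hypothetical file
            doc.put("content", ByteBuffer.wrap(Files.readAllBytes(Paths.get("report.pdf"))));
            System.out.println("Wrapped "
                    + ((ByteBuffer) doc.get("content")).remaining() + " bytes");
        }
    }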
I discussed a small topic on Avro schemas here. The format supports Avro records only, and there are no format_options values for format="avro". This tutorial walks you through a NiFi flow that utilizes the ConvertRecord processor and Record Reader/Writer controller services to easily convert a CSV file into JSON format; for example, a database table description output can be input to this tool. Avro lets you exchange data among multiple languages, like JSON does. One can also use a URL address as the schema source. Add the avro jar to your classpath. Snowflake, for its part, supports AVRO for loading only (a binary file in Avro format; data cannot be unloaded to Avro). You can use the following data formats: with Avro, the destination writes records based on the Avro schema. An external data source (also known as a federated data source) is a data source that you can query directly even though the data is not stored in BigQuery. In this entry we'll simplify log collection and transport using JSON.

Avro is similar to Apache Thrift and Google Protobuf. I was recently working with a customer that uses Apache Avro compression and discovered that there isn't a great example out there showing how to send Avro-compressed messages through an IoT pipeline. Note that this topic applies to JSON, Avro, ORC, and Parquet data; it does not apply to XML data. Schema evolution in Avro, Protocol Buffers and Thrift is a classic comparison topic. A CLI can check compatibility between two schema versions (for example, validate -c backward -s schema2-string-null-field), and the same check can be done programmatically, as sketched below.
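A minimal sketch of a backward-compatibility check with Avro's Java API (both schemas are invented; adding an optional field with a default keeps the new reader compatible with old data):

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaCompatibility;
    import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

    public class CompatibilityExample {
        public static void main(String[] args) {
            // Writer schema: the original record layout.
            Schema writer = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
                    + "{\"name\":\"name\",\"type\":\"string\"}]}");
            // Reader schema: evolved with an optional field plus a default.
            Schema reader = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
                    + "{\"name\":\"name\",\"type\":\"string\"},"
                    + "{\"name\":\"email\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

            SchemaPairCompatibility result =
                    SchemaCompatibility.checkReaderWriterCompatibility(reader, writer);
            // COMPATIBLE here: data written with the old schema is still readable.
            System.out.println(result.getType());
        }
    }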