RProtoBuf: Efficient Cross-Language Data Serialization in R

Modern data collection and analysis pipelines often involve a sophisticated mix of applications written in general purpose and specialized programming languages. Many formats commonly used to import and export data between different programs or systems, such as CSV or JSON, are verbose, inefficient, not type-safe, or tied to a specific programming language. Protocol Buffers are a popular method of serializing structured data between applications while remaining independent of programming languages or operating systems. They offer a unique combination of features, performance, and maturity that seems particularly well suited for data-driven applications and numerical computing. The RProtoBuf package provides a complete interface to Protocol Buffers from the R environment for statistical computing. This paper outlines the general class of data serialization requirements for statistical computing, describes the implementation of the RProtoBuf package, and illustrates its use with example applications in large-scale data collection pipelines and web services.


Introduction
Modern data collection and analysis pipelines increasingly involve collections of decoupled components in order to better manage software complexity through reusability, modularity, and fault isolation (Wegiel and Krintz 2010). These pipelines are frequently built using different programming languages for the different phases of data analysis (collection, cleaning, modeling, analysis, post-processing, and presentation) in order to take advantage of the unique combination of performance, speed of development, and library support offered by different environments and languages. Each stage of such a data analysis pipeline may produce intermediate results that need to be stored in a file or sent over the network for further processing.
Given these requirements, how do we safely and efficiently share intermediate results between different applications, possibly written in different languages, and possibly running on different computer systems? In computer programming, serialization is the process of translating data structures, variables, and session state into a format that can be stored or transmitted and then reconstructed in the original form later (Cline 2013). Programming languages such as R, Julia, Java, and Python include built-in support for serialization, but the default formats are usually language-specific and thereby lock the user into a single environment.
arXiv:1401.7372v1 [stat.CO] 28 Jan 2014

Data analysts and researchers often use character-separated text formats such as CSV (Shafra[…]
• […] computers or operating systems.
• Efficient: Data is serialized into a compact binary representation for transmission or storage.
• Extensible: New fields can be added to Protocol Buffer schemas in a forward-compatible way that does not break older applications.
• Stable: Protocol Buffers have been in wide use for over a decade.

Figure 1 illustrates an example communication workflow with Protocol Buffers and an interactive R session. A common use case is populating a remote procedure call (RPC) request as a Protocol Buffer in R, which is then serialized and sent over the network to a remote server. The server then deserializes the message, acts on the request, and responds with a new Protocol Buffer over the network. The key difference from, say, a request to an Rserve instance is that the remote server may be implemented in any language, with no dependence on R.
While traditional IDLs have at times been criticized for code bloat and complexity, Protocol Buffers are based on a simple list and records model that is flexible and easy to use. The schema for structured Protocol Buffer data is defined in .proto files, which may contain one or more message types. Each message type has one or more fields. A field is specified with a unique number (called a tag number), a name, a value type, and a field rule specifying whether the field is optional, required, or repeated. The supported value types are numbers, enumerations, booleans, strings, raw bytes, or other nested message types. The .proto file syntax for defining the structure of Protocol Buffer data is described comprehensively on Google Code¹. Table 1 shows an example .proto file that defines the tutorial.Person type². The R code in the right column shows an example of creating a new message of this type and populating its fields.
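To make the field model concrete, the following Python sketch represents a message type as a list of field specifications (tag number, name, value type, rule) and checks a set of values against it. It is only an illustration, not how RProtoBuf or the protobuf library store descriptors; the FieldSpec class, the validate function, and the three-field PERSON schema are hypothetical, with tag numbers assumed to follow the usual addressbook.proto tutorial.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldSpec:
    tag: int          # unique tag number that identifies the field on the wire
    name: str
    value_type: type  # Python stand-in for the Protocol Buffer value type
    rule: str         # "optional", "required", or "repeated"


def validate(schema: List[FieldSpec], values: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the values fit the schema."""
    problems = []
    if len({f.tag for f in schema}) != len(schema):
        problems.append("tag numbers must be unique")
    known = {f.name for f in schema}
    problems += [f"unknown field {name!r}" for name in values if name not in known]
    for f in schema:
        if f.name not in values:
            if f.rule == "required":
                problems.append(f"required field {f.name!r} is missing")
            continue
        value = values[f.name]
        if f.rule == "repeated" and not isinstance(value, list):
            problems.append(f"repeated field {f.name!r} needs a list")
            continue
        items = value if f.rule == "repeated" else [value]
        if not all(isinstance(v, f.value_type) for v in items):
            problems.append(f"field {f.name!r} expects {f.value_type.__name__}")
    return problems


# Hypothetical subset of tutorial.Person (tag numbers as in the usual addressbook.proto)
PERSON = [
    FieldSpec(1, "name", str, "required"),
    FieldSpec(2, "id", int, "required"),
    FieldSpec(3, "email", str, "optional"),
]

print(validate(PERSON, {"name": "Murray", "id": 1}))  # []
print(validate(PERSON, {"name": "Murray"}))           # ["required field 'id' is missing"]
```

The optional email field may be omitted freely, whereas leaving out a required field or supplying a value of the wrong type is reported.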
R> class(p)
[1] "Message"
attr(,"package")
[1] "RProtoBuf"

Table 1: The schema representation from a .proto file for the tutorial.Person class (left) and simple R code for creating an object of this class and accessing its fields (right).

For added speed and efficiency, the C++, Java, and Python bindings to Protocol Buffers are used with a compiler that translates a Protocol Buffer schema description file (ending in .proto) into language-specific classes that can be used to create, read, write, and manipulate Protocol Buffer messages. The R interface, in contrast, uses a reflection-based API that makes some operations slightly slower but is much more convenient for interactive data analysis. All messages in R have a single class structure, but different accessor methods are created at runtime based on the named fields of the specified message type, as described in the next section.
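The reflection-based approach can be sketched in Python: a single generic class serves every message type and resolves field names at runtime from a descriptor-like table, instead of relying on classes generated ahead of time by a compiler. DynamicMessage and its field table are hypothetical; the sketch also mirrors the behavior described in the next section, where field names must match exactly.

```python
class DynamicMessage:
    """One generic class for all message types; fields are resolved at runtime."""

    def __init__(self, type_name, fields):
        # fields: field name -> (tag number, Python type); a stand-in for a descriptor
        object.__setattr__(self, "_type_name", type_name)
        object.__setattr__(self, "_fields", dict(fields))
        object.__setattr__(self, "_values", {})

    def _check_name(self, name):
        # Exact lookup only: no partial matching of field names
        if name not in self._fields:
            raise AttributeError(f"{self._type_name} has no field {name!r}")

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e. for field names
        if name.startswith("_"):
            raise AttributeError(name)
        self._check_name(name)
        return self._values.get(name)  # unset fields read as None in this sketch

    def __setattr__(self, name, value):
        self._check_name(name)
        _, expected = self._fields[name]
        if not isinstance(value, expected):
            raise TypeError(f"field {name!r} expects {expected.__name__}")
        self._values[name] = value


person = DynamicMessage("tutorial.Person", {"name": (1, str), "id": (2, int), "email": (3, str)})
person.name = "Murray"
person.id = 1
print(person.name, person.id, person.email)  # Murray 1 None
```

A compiled binding would instead generate a dedicated class with fixed accessors per message type, which is faster but requires a code generation step before any message can be used.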

Basic Usage: Messages and descriptors
This section describes how to use the R API to create and manipulate Protocol Buffer messages in R, and how to read and write the binary representation of the message (often called the payload) to files and arbitrary binary R connections. The two fundamental building blocks of Protocol Buffers are Messages and Descriptors. Messages provide a common abstract encapsulation of structured data fields of the type specified in a Message Descriptor. Message Descriptors are defined in .proto files and define a schema for a particular named class of messages.

Importing message descriptors from .proto files
To create or parse a Protocol Buffer Message, one must first read in the message type specification from a .proto file. The .proto files are imported using the readProtoFiles function, which can import a single file, all files in a directory, or every .proto file provided by a particular R package.
After importing proto files, the corresponding message descriptors are available by name from the RProtoBuf:DescriptorPool environment in the R search path. This environment is implemented with the user-defined tables framework from the RObjectTables package available from the OmegaHat project (Temple Lang 2012). Instead of being associated with a static hash table, this environment dynamically queries the in-memory database of loaded descriptors during normal variable lookup.
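The difference between a static table and a lookup that queries the loaded descriptors on demand can be illustrated with a small Python sketch. DescriptorPool and PoolView are hypothetical stand-ins, not the RObjectTables or protobuf APIs; the point is that a view created before a descriptor is loaded still sees it afterwards.

```python
class DescriptorPool:
    """Hypothetical in-memory database of loaded descriptors, keyed by full name."""

    def __init__(self):
        self._descriptors = {}

    def add(self, full_name, descriptor):
        self._descriptors[full_name] = descriptor

    def find(self, full_name):
        return self._descriptors.get(full_name)


class PoolView:
    """Answers name lookups by querying the pool each time, instead of copying
    descriptors into a static table when the view is created."""

    def __init__(self, pool):
        self._pool = pool

    def __contains__(self, full_name):
        return self._pool.find(full_name) is not None

    def __getitem__(self, full_name):
        descriptor = self._pool.find(full_name)
        if descriptor is None:
            raise KeyError(full_name)
        return descriptor


pool = DescriptorPool()
view = PoolView(pool)
print("tutorial.Person" in view)  # False: nothing loaded yet
pool.add("tutorial.Person", {"fields": ["name", "id", "email"]})
print("tutorial.Person" in view)  # True: the same view sees the new descriptor
```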

Creating a message
New messages are created with the new function, which accepts a Message Descriptor and optionally a list of "name = value" pairs to set in the message.

R> p1 <- new(tutorial.Person)
R> p <- new(tutorial.Person, name = "Murray", id = 1)

Access and modify fields of a message
Once the message is created, its fields can be queried and modified using the dollar operator of R, making Protocol Buffer messages seem like lists.

R> p$id
[1] 1
R> p$email <- "murray@stokely.org"

As opposed to R lists, no partial matching is performed, and the name must be given in full.
The [[ operator can also be used to query and set fields of a message, supplying either their name or their tag number:

[1] "murray@stokely.org"

Protocol Buffers include a 64-bit integer type, but R lacks native 64-bit integer support. A workaround for working with large integer values is described in Section 5.3.

Display messages
Protocol Buffer messages and descriptors implement show methods that provide basic information about the message:

R> p
[1] "message of type 'tutorial.Person' with 3 fields set"

For additional information, such as for debugging purposes, the as.character method provides a more complete ASCII representation of the contents of a message.

Serializing messages
One of the primary benefits of Protocol Buffers is the efficient binary wire-format representation. The serialize method is implemented for Protocol Buffer messages to serialize a message into a sequence of bytes (raw vector) that represents the message. The raw bytes can then be parsed back into the original message safely as long as the message type is known and its descriptor is available.
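To show what such a byte sequence contains, the following Python sketch hand-encodes a tutorial.Person message using the protobuf wire format: each field is written as a key (the tag number shifted left by three bits, combined with a wire type) followed by its value, with integers as base-128 varints and strings as length-prefixed bytes. The tag numbers (name = 1, id = 2, email = 3) are an assumption based on the standard addressbook.proto tutorial, and the sketch handles only strings and non-negative integers; in practice the protobuf library does this work.

```python
from typing import Optional


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte, low group first)."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def field_key(tag: int, wire_type: int) -> bytes:
    # Each field is prefixed by a key that packs the tag number and the wire type
    return encode_varint((tag << 3) | wire_type)


def encode_string(tag: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return field_key(tag, 2) + encode_varint(len(data)) + data  # wire type 2: length-delimited


def encode_person(name: str, person_id: int, email: Optional[str] = None) -> bytes:
    # Assumed tag numbers, as in the standard addressbook.proto tutorial: name=1, id=2, email=3
    payload = encode_string(1, name) + field_key(2, 0) + encode_varint(person_id)  # wire type 0: varint
    if email is not None:
        payload += encode_string(3, email)
    return payload


payload = encode_person("Murray", 1, "murray@stokely.org")
print(len(payload))        # 30
print(payload[:10].hex())  # 0a064d75727261791001
```

Field names never appear in the binary payload, only tag numbers, which is why the binary format tolerates renamed fields but not renumbered ones.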

Parsing messages
The RProtoBuf package defines the read and readASCII functions to read messages from files, raw vectors, or arbitrary connections. read expects to read the message payload from binary files or connections, and readASCII parses the human-readable ASCII output that is created with as.character.
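The reverse direction shows why a descriptor is required: the payload records tag numbers and wire types, but neither field names nor the message type. In the hypothetical Python decoder below, a plain dictionary plays the role of the descriptor, and tag numbers are again assumed from the addressbook.proto tutorial; decoding the same bytes with a different, made-up schema yields a different but equally well-formed result.

```python
from typing import Dict, Tuple


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, fields: Dict[int, Tuple[str, str]]) -> dict:
    """fields maps tag number -> (name, "string" or "int"); it plays the descriptor's role."""
    out, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        tag, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if tag in fields:     # tags missing from the schema are simply skipped here
            name, kind = fields[tag]
            out[name] = value.decode("utf-8") if kind == "string" else value
    return out


payload = b"\x0a\x06Murray\x10\x01" + b"\x1a\x12murray@stokely.org"
person = {1: ("name", "string"), 2: ("id", "int"), 3: ("email", "string")}
other = {1: ("title", "string"), 2: ("count", "int")}  # hypothetical unrelated schema
print(decode(payload, person))  # {'name': 'Murray', 'id': 1, 'email': 'murray@stokely.org'}
print(decode(payload, other))   # {'title': 'Murray', 'count': 1}
```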
The binary representation of the message does not contain information that can be used to dynamically infer the message type, so we have to provide this information to the read function in the form of a descriptor:

R> msg <- read(tutorial.Person, tf1)
R> writeLines(as.character(msg))
name: "Murray Stokely"
id: 3
email: "murray@stokely.org"

The input argument of read can also be a binary readable R connection, such as a binary file connection:

R> con <- file(tf2, open = "rb")
R> message <- read(tutorial.Person, con)
R> close(con)
R> writeLines(as.character(message))
name: "Murray Stokely"
id: 3
email: "murray@stokely.org"

Finally, the raw payload of the message can be used:

R> # reading the raw vector payload of the message
R> payload <- readBin(tf1, raw(0), 5000)
R> message <- read(tutorial.Person, payload)

read can also be used as a pseudo-method of the descriptor object:

R> # reading from a file
R> message <- tutorial.Person$read(tf1)
R> # reading from a binary connection
R> con <- file(tf2, open = "rb")
R> message <- tutorial.Person$read(con)
R> close(con)
R> # read from the payload
R> message <- tutorial.Person$read(payload)

Under the hood: S4 classes, methods, and pseudo methods
The RProtoBuf package uses the S4 system to store information about descriptors and messages. Using the S4 system allows the package to dispatch methods that are not generic in the S3 sense, such as new and serialize. Table 2 lists the six primary Message and Descriptor classes in RProtoBuf. Each R object contains an external pointer to an object managed by the protobuf C++ library, and the R objects make calls into more than 100 C++ functions that provide the glue code between the R language classes and the underlying C++ classes.
The Rcpp package (Eddelbuettel and François 2011; Eddelbuettel 2013) is used to facilitate this integration of the R and C++ code for these objects. Each method is wrapped individually, which allows us to add user-friendly custom error handling, type coercion, and performance improvements at the cost of a more verbose implementation. The RProtoBuf package in many ways motivated the development of Rcpp Modules (Eddelbuettel and François 2013), which provide a more concise way of wrapping C++ functions and classes in a single entity.
The RProtoBuf package supports two forms for calling functions with these S4 classes:
• The functional dispatch mechanism of the form method(object, arguments) (common to R), and
• The traditional object-oriented notation object$method(arguments).
Additionally, RProtoBuf supports tab completion for all classes. Completion possibilities include pseudo-method names for all classes, plus dynamic dispatch on names or types specific to a given object.

Messages
The Message S4 class represents Protocol Buffer Messages and is the core abstraction of RProtoBuf. Each Message contains a pointer to a Descriptor, which defines the schema of the data defined in the Message, as well as a number of FieldDescriptors for the individual fields of the message. A complete list of the slots and methods for Messages is available in Table 3.

Descriptors
Descriptors describe the type of a Message. This includes what fields a message contains and what the types of those fields are. Message descriptors are represented in R by the Descriptor S4 class. The class contains the slots pointer and type. As with messages, the $ operator can be used to retrieve descriptors that are contained in the descriptor, or to invoke pseudo-methods.
When RProtoBuf is first loaded, it calls readProtoFiles to read in the example addressbook.proto file included with the package. The tutorial.Person descriptor and all other descriptors defined in the loaded .proto files are then available on the search path³.

Field descriptors
The class FieldDescriptor represents field descriptors in R. This is a wrapper S4 class around the google::protobuf::FieldDescriptor C++ class. Table 5 describes the methods defined for the FieldDescriptor class.

Enum descriptors
The class EnumDescriptor represents enum descriptors in R. This is a wrapper S4 class around the google::protobuf::EnumDescriptor C++ class. Table 6 describes the methods defined for the EnumDescriptor class.
The $ operator can be used to retrieve the value of enum constants contained in the EnumDescriptor, or to invoke pseudo-methods.

Enum value descriptors
The class EnumValueDescriptor represents enumeration value descriptors in R. This is a wrapper S4 class around the google::protobuf::EnumValueDescriptor C++ class. Table 7 describes the methods defined for the EnumValueDescriptor class.
The $ operator can be used to invoke pseudo-methods.

File descriptors
The class FileDescriptor represents file descriptors in R. This is a wrapper S4 class around the google::protobuf::FileDescriptor C++ class. Table 8 describes the methods defined for the FileDescriptor class.
The $ operator can be used to retrieve named fields defined in the FileDescriptor, or to invoke pseudo-methods.

Type coercion
One of the benefits of using an Interface Definition Language (IDL) like Protocol Buffers is that it provides a highly portable basic type system. This permits different language and hardware implementations to map to the most appropriate type in different environments.
Table 9 details the correspondence between the field type and the type of data that is retrieved by the $ and [[ extractors. Three types in particular need further attention due to specific differences in the R language: booleans, unsigned integers, and 64-bit integers.

Table 9: Correspondence between field type and R type retrieved by the extractors. Note that R lacks native 64-bit integers, so the RProtoBuf.int64AsString option is available to return large integers as characters to avoid losing precision. This option is described in Section 5.3.

Unsigned integers
R lacks a native unsigned integer type. Values between $2^{31}$ and $2^{32} - 1$ read from unsigned integer Protocol Buffer fields must be stored as doubles in R.
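A short Python sketch illustrates the constraint. R's integer type is a signed 32-bit integer, so the sketch uses the struct module to test whether a value fits a signed or an unsigned 32-bit slot; the fits helper is hypothetical. Every value in the unsigned 32-bit range is exactly representable as a double, which is why storing such values as doubles loses no information.

```python
import struct


def fits(fmt: str, value: int) -> bool:
    """True if value can be packed with the given struct format code."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


value = 2**31  # smallest value that needs the unsigned range
print(fits("<I", value))  # True: fits an unsigned 32-bit (uint32) field
print(fits("<i", value))  # False: exceeds a signed 32-bit integer
print(float(2**32 - 1) == 2**32 - 1)  # True: a double holds every uint32 exactly

# Reinterpreting the uint32 bit pattern as signed would silently give a wrong number
print(struct.unpack("<i", struct.pack("<I", 2**32 - 1))[0])  # -1
```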

64-bit integers
R also lacks a native 64-bit integer type. Numeric vectors with values $\geq 2^{31}$ can only be stored as doubles, which represent integers exactly only up to $2^{53}$. As a result, R loses the ability to distinguish some distinct integers:

[1] TRUE

Most modern languages, however, do support 64-bit integers, which becomes problematic when RProtoBuf is used to exchange data with a system that requires this integer type. To work around this, RProtoBuf allows users to get and set 64-bit integer values by specifying them as character strings.
If we try to set an int64 field in R to double values, we lose precision:

R> test <- new(protobuf_unittest.TestAllTypes)
R> test$repeated_int64 <- c(2^53, 2^53+1)
R> length(unique(test$repeated_int64))
[1] 1

But when the values are specified as character strings, RProtoBuf will automatically coerce them into true 64-bit integers before storing them in the Protocol Buffer message:

R> test$repeated_int64 <- c("9007199254740992", "9007199254740993")

When reading the value back into R, numeric types are returned by default, but when the full precision is required, a character value will be returned if the RProtoBuf.int64AsString option is set to TRUE. The character values are useful because they can accurately be used as unique identifiers and can easily be passed to R packages such as int64 (François 2011) or bit64 (Oehlschlägel 2012), which represent 64-bit integers in R.
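The precision problem and the string-based workaround can be reproduced in Python. Python integers have arbitrary precision, so the sketch forces values through a double and through a packed signed 64-bit integer to mimic the two storage paths; through_double and through_int64 are hypothetical helpers, not part of RProtoBuf.

```python
import struct


def through_double(values):
    # Holding integers in IEEE 754 doubles keeps only a 53-bit significand
    return [float(v) for v in values]


def through_int64(strings):
    # Parse decimal strings exactly, then round-trip each through a signed 64-bit slot
    return [struct.unpack("<q", struct.pack("<q", int(s)))[0] for s in strings]


strings = ["9007199254740992", "9007199254740993"]  # 2**53 and 2**53 + 1
print(len(set(through_double(int(s) for s in strings))))  # 1: both round to 2**53
print(len(set(through_int64(strings))))                     # 2: both values survive
print([str(v) for v in through_int64(strings)] == strings)  # True
```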

Converting R data structures into Protocol Buffers
The previous sections discussed functionality in the RProtoBuf package for creating, manipulating, parsing, and serializing Protocol Buffer messages of a defined schema. This is useful when there are pre-existing systems with defined schemas or significant software components written in other languages that need to be accessed from within R. The package also provides methods for converting arbitrary R data structures into Protocol Buffers and vice versa with a universal R object schema. The serialize_pb and unserialize_pb functions serialize arbitrary R objects into a universal Protocol Buffer message:

R> msg <- serialize_pb(iris, NULL)
R> identical(iris, unserialize_pb(msg))
[1] TRUE

In order to accomplish this, RProtoBuf uses the same catch-all proto schema used by RHIPE for exchanging R data with Hadoop (Guha 2010). This schema, which we will refer to as rexp.proto, is printed in the appendix. The Protocol Buffer messages generated by RProtoBuf and RHIPE are naturally compatible between the two systems because they use the same schema. This shows the power of using a schema-based cross-platform format such as Protocol Buffers: interoperability is achieved without effort or close coordination.
The rexp.proto schema supports all main R storage types holding data. These include NULL, lists, and vectors of type logical, character, double, integer, and complex. In addition, every type can contain a named set of attributes, as is the case in R. The rexp.proto schema does not support some of the special R-specific storage types, such as function, language, or environment. Such objects have no native equivalent type in Protocol Buffers and have little meaning outside the context of R. When serializing R objects using serialize_pb, values or attributes of unsupported types are skipped with a warning. If the user really wishes to serialize these objects, they need to be converted into a supported type. For example, the user can use deparse to convert functions or language objects into strings, or as.list for environments.

Evaluation: Converting R data sets
To illustrate how this method works, we attempt to convert all of the built-in data sets from R into this serialized Protocol Buffer representation.

R> n <- nrow(datasets)

There are 206 standard data sets in the datasets package included with R. These data sets include data frames, matrices, time series, tables, lists, and some more exotic data classes. The can_serialize_pb method is used to determine which of those can be fully converted to the rexp.proto Protocol Buffer representation. This method simply checks whether any of the values or attributes in an object is of an unsupported type:

R> m <- sum(sapply(datasets$name, function(x) can_serialize_pb(get(x))))

In total, 192 data sets (93%) can be converted to Protocol Buffers without loss of information. Upon closer inspection, all other data sets are objects of class nfnGroupedData. This class represents a special type of data frame that has some additional attributes (such as a formula object) used by the nlme package (Pinheiro et al. 2013). Because formulas are R language objects, they have little meaning to other systems and are not supported by the rexp.proto descriptor.
When serialize_pb is used on objects of this class, it will serialize the data frame and all attributes, except for the formula.
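The check described above amounts to a recursive walk over the object. The following Python sketch is an analogy rather than RProtoBuf's implementation: the set of supported scalar types, the use of dictionaries for named attributes, and functions standing in for R formulas are all assumptions made for illustration. The hypothetical drop_unsupported helper mirrors the behavior of skipping unsupported attributes with a warning.

```python
import warnings

# Value types with a counterpart in the hypothetical catch-all schema
SUPPORTED_SCALARS = (type(None), bool, int, float, complex, str)


def can_serialize(obj) -> bool:
    """Walk the object recursively; reject anything without a schema counterpart."""
    if isinstance(obj, SUPPORTED_SCALARS):
        return True
    if isinstance(obj, list):
        return all(can_serialize(item) for item in obj)
    if isinstance(obj, dict):  # named attributes
        return all(isinstance(k, str) and can_serialize(v) for k, v in obj.items())
    return False  # e.g. functions, which play the role of R formulas here


def drop_unsupported(attributes: dict) -> dict:
    """Keep serializable attributes and warn about each one that is skipped."""
    kept = {}
    for name, value in attributes.items():
        if can_serialize(value):
            kept[name] = value
        else:
            warnings.warn(f"attribute {name!r} has an unsupported type and was skipped")
    return kept


grouped_attributes = {"names": ["conc", "rate"], "formula": lambda d: d["rate"]}
print(can_serialize(grouped_attributes))  # False
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = drop_unsupported(grouped_attributes)
print(kept)         # {'names': ['conc', 'rate']}
print(len(caught))  # 1
```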

Compression performance
This section compares how many bytes are used to store data sets using four different methods:
• normal R serialization (Tierney 2003),
• R serialization followed by gzip,
• normal Protocol Buffer serialization, and
• Protocol Buffer serialization followed by gzip.
Table 10 shows the sizes of 50 sample R data sets as returned by object.size() compared to the serialized sizes. Note that Protocol Buffer serialization results in slightly smaller byte streams than native R serialization in most cases, but this difference disappears if the results are compressed with gzip. One takeaway from this table is that the universal R object schema included in RProtoBuf does not in general provide any significant saving in file size compared to the normal serialization mechanism in R. The benefits of RProtoBuf accrue more naturally in applications where multiple programming languages are involved, or when a more concise application-specific schema has been defined. The example in the next section satisfies both of these conditions.
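A comparison of this kind is easy to script. The Python sketch below measures each serializer's output with and without gzip; pickle and JSON are stand-ins for a language-native and a language-neutral format, so it illustrates the measurement procedure only and does not reproduce Table 10. The serialized_sizes helper and the sample data are hypothetical.

```python
import gzip
import json
import pickle


def serialized_sizes(obj, serializers):
    """Return byte counts for each serializer, with and without gzip compression."""
    sizes = {}
    for name, serialize in serializers.items():
        payload = serialize(obj)
        sizes[name] = len(payload)
        sizes[name + "+gzip"] = len(gzip.compress(payload))
    return sizes


# Stand-ins only: pickle plays a language-native format and JSON a language-neutral one;
# neither is R serialization or Protocol Buffers
SERIALIZERS = {
    "pickle": pickle.dumps,
    "json": lambda obj: json.dumps(obj).encode("utf-8"),
}

data = {"x": list(range(1000)), "label": ["a", "b"] * 500}
sizes = serialized_sizes(data, SERIALIZERS)
print(sizes)
```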

Application: Distributed data collection with MapReduce
Many large data sets in fields such as particle physics and information processing are stored in binned or histogram form in order to reduce the data storage requirements (Scott 2009).
In the last decade, the MapReduce programming model (Dean and Ghemawat 2008) has emerged as a popular design pattern that enables the processing of very large data sets on large compute clusters.
Many types of data analysis over large data sets may involve very rare phenomena or deal with highly skewed data sets or inflexible raw data storage systems from which unbiased sampling is not feasible. In such situations, MapReduce and binning may be combined as a pre-processing step for a wide range of statistical and scientific analyses (Blocker and Meng 2013).
There are two common patterns for generating histograms of large data sets in a single pass with MapReduce.In the first method, each mapper task generates a histogram over a subset of the data that it has been assigned, serializes this histogram and sends it to one or more reducer tasks which merge the intermediate histograms from the mappers.
In the second method, illustrated in Figure 2, each mapper rounds a data point to a bucket width and outputs that bucket as a key and '1' as a value.Reducers then sum up all of the values with the same key and output to a data store.In both methods, the mapper tasks must choose identical bucket boundaries in advance if we are to construct the histogram in a single pass, even though they are analyzing disjoint parts of the input set that may cover different ranges.All distributed tasks involved in the pre-processing as well as any downstream data analysis tasks must share a schema of the histogram representation to coordinate effectively.
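Both methods can be simulated in a single Python process. In this sketch the shards stand in for the input splits assigned to mappers, and BUCKET_WIDTH is a hypothetical bucket width agreed on in advance; the shuffle phase of a real MapReduce framework is replaced by iterating over all mapper outputs. The two methods produce the same histogram because every mapper uses identical bucket boundaries.

```python
import math
from collections import Counter
from functools import reduce

BUCKET_WIDTH = 10.0  # hypothetical; every mapper must use the same value


def bucket_of(x: float) -> float:
    # Lower edge of the bucket containing x, identical on every mapper
    return math.floor(x / BUCKET_WIDTH) * BUCKET_WIDTH


# Second method: mappers emit (bucket, 1) and reducers sum the values per key
def mapper(points):
    for x in points:
        yield bucket_of(x), 1


def reducer(pairs):
    counts = Counter()
    for bucket, one in pairs:
        counts[bucket] += one
    return dict(counts)


# First method: each mapper builds a partial histogram and reducers merge them
def partial_histogram(points):
    return Counter(bucket_of(x) for x in points)


def merge(histograms):
    return dict(reduce(lambda a, b: a + b, histograms, Counter()))


shards = [[1.0, 12.5, 19.9], [3.2, 25.0], [11.0]]  # disjoint input splits
method2 = reducer(pair for shard in shards for pair in mapper(shard))
method1 = merge(partial_histogram(shard) for shard in shards)
print(method1 == method2)       # True
print(sorted(method2.items()))  # [(0.0, 2), (10.0, 3), (20.0, 1)]
```

The first method sends one small histogram per mapper over the network, while the second emits one key-value pair per data point and relies on the framework to group them.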
The HistogramTools package (Stokely 2013) enhances RProtoBuf by providing a concise schema for R histogram objects. One of the authors has used this design pattern for several large-scale studies of distributed storage systems (Stokely et al. 2012; Albrecht et al. 2013).

Application: Data Interchange in web Services
As described earlier, the primary application of Protocol Buffers is data interchange in the context of inter-system communications. Network protocols such as HTTP provide mechanisms for client-server communication, i.e., how to initiate requests, authenticate, send messages, etc. However, network protocols generally do not regulate the content of messages: they allow transfer of any media type, such as web pages, static files, or multimedia content. When designing systems where various components require exchange of specific data structures, we need something on top of the network protocol that prescribes how these structures are to be represented in messages (buffers) on the network. Protocol Buffers solve exactly this problem by providing a cross-platform method for serializing arbitrary structures into well-defined messages, which can then be exchanged using any protocol. The descriptors (.proto files) are used to formally define the interface of a remote API or network application. Libraries to parse and generate protobuf messages are available for many programming languages, making it relatively straightforward to implement clients and servers.

Interacting with R through HTTPS and Protocol Buffers
One example of a system that supports Protocol Buffers to interact with R is OpenCPU (Ooms 2013). OpenCPU is a framework for embedded statistical computation and reproducible research based on R and LaTeX. It exposes an HTTP(S) API to access and manipulate R objects and allows for performing remote R function calls. Clients do not need to understand or generate any R code: HTTP requests are automatically mapped to function calls, and arguments/return values can be posted/retrieved using several data interchange formats, such as Protocol Buffers. OpenCPU uses the serialize_pb and unserialize_pb functions from the RProtoBuf package to convert between R objects and protobuf messages. Therefore, clients need the rexp.proto descriptor mentioned earlier to parse and generate protobuf messages when interacting with OpenCPU.

HTTP GET: Retrieving an R object
The HTTP GET method is used to read a resource from OpenCPU. For example, to access the data set Animals from the package MASS, a client performs the following HTTP request:

GET https://public.opencpu.org/ocpu/library/MASS/data/Animals/pb

The /pb suffix in the URL tells the server to send this object in the form of a protobuf message. Alternative formats include /json, /csv, /rds, and others. If the request is successful, OpenCPU returns the serialized object with HTTP status code 200 and the HTTP response header Content-Type: application/x-protobuf. The latter is the conventional MIME type that formally notifies the client to interpret the response as a protobuf message.
Because both HTTP and Protocol Buffers have libraries available for many languages, clients can be implemented in just a few lines of code. Below is example code for both R and Python that retrieves a data set from R with OpenCPU using a protobuf message. In R, we use the HTTP client from the httr package (Wickham 2012). In this example we download a data set which is part of the base R distribution, so we can verify that the object was transferred without loss of information.
R> # Load packages
R> library(RProtoBuf)
R> library(httr)
R> # Retrieve and parse message
R> req <- GET('https://public.opencpu.org/ocpu/library/MASS/data/Animals/pb')
R> output <- unserialize_pb(req$content)
R> # Check that no information was lost
R> identical(output, MASS::Animals)

This code suggests a method for exchanging objects between R servers; however, this could just as well be done without Protocol Buffers. The main advantage of using an interoperable format is that we can actually access R objects from within another programming language. For example, in a very similar fashion we can retrieve the same data set in a Python client. To parse messages in Python, we first compile the rexp.proto descriptor into a Python module using the protoc compiler:

protoc rexp.proto --python_out=.

This generates a Python module called rexp_pb2.py, containing both the descriptor information as well as methods to read and manipulate the R object message. In the example below we use the HTTP client from the urllib2 module.
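A Python 3 client can follow the same steps with urllib.request, the standard-library successor of urllib2. The sketch below builds the URL following the scheme shown above, checks the status code and Content-Type header, and returns the raw payload; the helper names are hypothetical. Parsing the payload requires the rexp_pb2 module generated by protoc, and the message name REXP used in the final comment is an assumption based on rexp.proto.

```python
import urllib.request

BASE_URL = "https://public.opencpu.org/ocpu"
PROTOBUF_MIME = "application/x-protobuf"


def object_url(package: str, dataset: str, fmt: str = "pb") -> str:
    # Same URL scheme as the GET request shown in the text
    return f"{BASE_URL}/library/{package}/data/{dataset}/{fmt}"


def is_protobuf_response(status: int, content_type: str) -> bool:
    # Success means HTTP 200 plus the conventional protobuf MIME type
    return status == 200 and content_type.split(";")[0].strip() == PROTOBUF_MIME


def fetch_object(package: str, dataset: str) -> bytes:
    """Download a serialized R object (needs network access)."""
    with urllib.request.urlopen(object_url(package, dataset)) as resp:
        if not is_protobuf_response(resp.status, resp.headers.get("Content-Type", "")):
            raise ValueError("server did not return a protobuf message")
        return resp.read()


# Parsing requires the module generated by `protoc rexp.proto --python_out=.`;
# the message name REXP is an assumption based on rexp.proto:
#   import rexp_pb2
#   msg = rexp_pb2.REXP()
#   msg.ParseFromString(fetch_object("MASS", "Animals"))
```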

HTTP POST: Calling an R function
The example above shows how the HTTP GET method retrieves a resource from OpenCPU, for example an R object. The HTTP POST method, on the other hand, is used for calling functions and running scripts, which is the primary purpose of the framework. As before, the /pb suffix requests the output as a protobuf message, in this case the function return value. However, OpenCPU also allows us to supply the arguments of the function call in the form of protobuf messages. This is a bit more work, because clients need to both generate messages containing R objects to post to the server and retrieve and parse protobuf messages returned by the server. Using Protocol Buffers to post function arguments is not required, and for simple (scalar) arguments the standard application/x-www-form-urlencoded format might be sufficient. However, with Protocol Buffers the client can perform function calls with more complex arguments such as R vectors or lists. The result is a complete RPC system for making arbitrary R function calls from within any programming language.
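On the client side, such a call can be prepared with the Python standard library. The sketch below only builds the request and does not send it. The /library/<package>/R/<function>/pb path is an assumption modeled on the GET URL shown earlier, and the structure of the argument message (presumably a serialized rexp.proto message holding the named arguments) is likewise an assumption to be checked against the OpenCPU documentation; build_call is a hypothetical helper.

```python
import urllib.request

PROTOBUF_MIME = "application/x-protobuf"


def build_call(package: str, function: str, args_message: bytes,
               base_url: str = "https://public.opencpu.org/ocpu") -> urllib.request.Request:
    """Prepare (but do not send) a POST that calls an R function remotely."""
    # Assumed path: /library/<package>/R/<function>, with /pb requesting a protobuf result
    url = f"{base_url}/library/{package}/R/{function}/pb"
    return urllib.request.Request(
        url,
        data=args_message,  # assumed: serialized rexp.proto message with the arguments
        headers={"Content-Type": PROTOBUF_MIME},
        method="POST",
    )


# Placeholder payload for illustration; a real client would serialize the arguments
request = build_call("stats", "rnorm", b"placeholder")
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request).read()
```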

Summary
Over the past decade, many formats for interoperable data exchange have become available, each with its own unique features, strengths, and weaknesses. Text-based formats such as CSV and JSON are easy to use and will likely remain popular among statisticians for many years to come. However, in the context of increasingly complex analysis stacks and applications involving distributed computing as well as mixed-language analysis pipelines, choosing a more sophisticated data interchange format may reap considerable benefits. The Protocol Buffers standard and library offer a unique combination of features, performance, and maturity that seems particularly well suited for data-driven applications and numerical computing.
The RProtoBuf package builds on the Protocol Buffers C++ library and extends the R system with the ability to create, read, write, parse, and manipulate Protocol Buffer messages. RProtoBuf has been used extensively inside Google for the past three years by statisticians, analysts, and software engineers. At the time of this writing, there are over 300 active users of RProtoBuf using it to read data from and otherwise interact with distributed systems written in C++, Java, Python, and other languages. We hope that making Protocol Buffers available to the R community will contribute towards better software integration and allow for building even more advanced applications and analysis pipelines with R.

Table 2: Overview of class, slot, method and dispatch relationships.

Table 3: Description of slots and methods for the Message S4 class.

Slot            Description
pointer         External pointer to the Message object of the C++ protobuf library. Documentation for the Message class is available from the Protocol Buffer project page.
type            Fully qualified name of the message. For example, a Person message has its type slot set to tutorial.Person.

Method          Description
has             Indicates if a message has a given field.
clone           Creates a clone of the message.
isInitialized   Indicates if a message has all its required fields set.
serialize       Serializes a message to a file, binary connection, or raw vector.
clear           Clears one or several fields of a message, or the entire message.
size            The number of elements in a message field.
bytesize        The number of bytes the message would take once serialized.
swap            Swaps elements of a repeated field of a message.
set             Sets elements of a repeated field.
fetch           Fetches elements of a repeated field.
setExtension    Sets an extension of a message.
getExtension    Gets the value of an extension of a message.
add             Adds elements to a repeated field.
str             The R structure of the message.
as.character    Character representation of a message.
toString        Character representation of a message (same as as.character).
as.list         Converts a message to a named R list.
update          Updates several fields of a message at once.
descriptor      Gets the descriptor of the message type of this message.
fileDescriptor  Gets the file descriptor of this message's descriptor.

Tab completion of names or types specific to a given object is implemented with the .DollarNames S3 generic function defined in the utils package.
R> tutorial.Person$PhoneType # enum descriptor
[1] "descriptor for enum 'PhoneType' of type 'tutorial.Person' with 3 values"

Table 4 provides a complete list of the slots and available methods for Descriptors.

The EnumDescriptor contains information about what values this type defines, while the EnumValueDescriptor describes an individual enum constant of a particular type.

Table 4: Description of slots and methods for the Descriptor S4 class.

Slot      Description
pointer   External pointer to the Descriptor object of the C++ proto library. Documentation for the Descriptor class is available from the Protocol Buffer project page.
type      Fully qualified path of the message type.

Table 5: Description of slots and methods for the FieldDescriptor S4 class.

Method             Description
cpp_type           Gets the C++ type of the field.
label              Gets the label of a field (optional, required, or repeated).
is_repeated        Returns TRUE if this field is repeated.
is_required        Returns TRUE if this field is required.
is_optional        Returns TRUE if this field is optional.
has_default_value  Returns TRUE if this field has a default value.
default_value      Returns the default value.
message_type       Returns the message type if this is a message type field.
enum_type          Returns the enum type if this is an enum type field.

Table 6: Description of slots and methods for the EnumDescriptor S4 class.

Table 7: Description of slots and methods for the EnumValueDescriptor S4 class.

Table 8: Description of slots and methods for the FileDescriptor S4 class.

Slot      Description
pointer   External pointer to the FileDescriptor object of the C++ proto library. Documentation for the FileDescriptor class is available from the Protocol Buffer project page: http://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.descriptor.html#FileDescriptor
filename  Fully qualified pathname of the .proto file.
package   Package name defined in this .proto file.

[1] "descriptor for type 'tutorial.Person' "

Booleans
R booleans can accept three values: TRUE, FALSE, and NA. However, most other languages, including the Protocol Buffer schema, only accept TRUE or FALSE. This means that we simply cannot store R logical vectors that include all three possible values as booleans. The library will refuse to store NAs in Protocol Buffer boolean fields, and users must instead choose another type (such as enum or integer) capable of storing three distinct values.

Error: NA boolean values can not be stored in bool Protocol Buffer fields
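One way to follow that advice is to encode each logical value as an integer code. The Python sketch below uses None for NA and an assumed mapping of FALSE to 0, TRUE to 1, and NA to 2; to_bool_field mimics a bool field that refuses missing values, in the spirit of the error shown above. All helper names and codes are hypothetical, and the codes would need to be agreed on by every application reading the field.

```python
from typing import List, Optional

# Hypothetical integer codes for a three-valued logical (None stands in for NA)
TRISTATE_CODES = {False: 0, True: 1, None: 2}
TRISTATE_VALUES = {code: value for value, code in TRISTATE_CODES.items()}


def to_bool_field(values: List[Optional[bool]]) -> List[bool]:
    """Mimic a bool field: refuse missing values instead of guessing."""
    if any(v is None for v in values):
        raise ValueError("NA values cannot be stored in a bool field")
    return list(values)


def to_tristate_field(values: List[Optional[bool]]) -> List[int]:
    # Store each logical as an integer code that an int or enum field can hold
    return [TRISTATE_CODES[v] for v in values]


def from_tristate_field(codes: List[int]) -> List[Optional[bool]]:
    return [TRISTATE_VALUES[c] for c in codes]


codes = to_tristate_field([True, None, False])
print(codes)                       # [1, 2, 0]
print(from_tristate_field(codes))  # [True, None, False]
```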

Table 10: Serialization sizes for default serialization in R and RProtoBuf for 50 R data sets.

[Figure: Histogram of Example Histogram Created in Python]
The HistogramState message type that HistogramTools defines for R histogram objects is designed to be helpful if some of the Map or Reduce tasks are written in R, or if those components are written in other languages and only the resulting output histograms need to be manipulated in R. For example, to create HistogramState messages in Python for later consumption by R, we first compile the histogram.proto descriptor into a Python module using the protoc compiler:

protoc histogram.proto --python_out=.

This generates a Python module called histogram_pb2.py, containing both the descriptor information as well as methods to read and manipulate the histogram message data. The following simple Python script uses this generated module to create a histogram and write out the Protocol Buffer representation to a file: