Package 'tfio'

Title: Interface to 'TensorFlow IO'
Description: Interface to 'TensorFlow IO', which provides datasets and filesystem extensions maintained by 'TensorFlow SIG-IO' <https://github.com/tensorflow/community/blob/master/sigs/io/CHARTER.md>.
Authors: TensorFlow IO Contributors [aut, cph] (Full list of contributors can be found at <https://github.com/tensorflow/io/graphs/contributors>), Yuan Tang [aut, cre], TensorFlow Authors [cph], Ant Financial [cph], RStudio [cph]
Maintainer: Yuan Tang <[email protected]>
License: Apache License 2.0
Version: 0.4.1
Built: 2024-08-22 03:17:07 UTC
Source: https://github.com/cran/tfio

Help Index


Creates an ArrowFeatherDataset.

Description

An Arrow Dataset for reading record batches from Arrow feather files. Feather is a light-weight columnar format ideal for simple writing of Pandas DataFrames.

Usage

arrow_feather_dataset(filenames, columns, output_types, output_shapes = NULL)

Arguments

filenames

A tf.string tensor, list or scalar containing files in Arrow Feather format.

columns

A list of column indices to be used in the Dataset.

output_types

Tensor dtypes of the output tensors.

output_shapes

TensorShapes of the output tensors, or NULL to infer partial shapes.

Examples

## Not run: 
dataset <- arrow_feather_dataset(
    list('/path/to/a.feather', '/path/to/b.feather'),
    columns = reticulate::tuple(0L, 1L),
    output_types = reticulate::tuple(tf$int32, tf$float32),
    output_shapes = reticulate::tuple(list(), list())) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

## End(Not run)

Creates an ArrowStreamDataset.

Description

An Arrow Dataset for reading record batches from an input stream. Currently supported input streams are a socket client or stdin.

Usage

arrow_stream_dataset(host, columns, output_types, output_shapes = NULL)

Arguments

host

A tf.string tensor or string defining the input stream. For a socket client, use "<HOST_IP>:<PORT>"; for stdin, use "STDIN".

columns

A list of column indices to be used in the Dataset.

output_types

Tensor dtypes of the output tensors.

output_shapes

TensorShapes of the output tensors, or NULL to infer partial shapes.

Examples

## Not run: 
dataset <- arrow_stream_dataset(
    host,
    columns = reticulate::tuple(0L, 1L),
    output_types = reticulate::tuple(tf$int32, tf$float32),
    output_shapes = reticulate::tuple(list(), list())) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

## End(Not run)

Create an Arrow Dataset from the given Arrow schema.

Description

Infer output types and shapes from the given Arrow schema and create an Arrow Dataset.

Usage

from_schema(object, ...)

Arguments

object

An R object.

...

Optional arguments passed on to implementing methods.


Create an Arrow Dataset for reading record batches from Arrow feather files, inferring output types and shapes from the given Arrow schema.

Description

Create an Arrow Dataset for reading record batches from Arrow feather files, inferring output types and shapes from the given Arrow schema.

Usage

## S3 method for class 'arrow_feather_dataset'
from_schema(object, schema, columns = NULL, host = NULL, filenames = NULL, ...)

Arguments

object

An R object.

schema

Arrow schema defining the record batch data in the files.

columns

A list of column indices to be used in the Dataset.

host

Not used.

filenames

A tf.string tensor, list or scalar containing files in Arrow Feather format.

...

Optional arguments passed on to implementing methods.


Create an Arrow Dataset from an input stream, inferring output types and shapes from the given Arrow schema.

Description

Create an Arrow Dataset from an input stream, inferring output types and shapes from the given Arrow schema.

Usage

## S3 method for class 'arrow_stream_dataset'
from_schema(object, schema, columns = NULL, host = NULL, filenames = NULL, ...)

Arguments

object

An R object.

schema

Arrow schema defining the record batch data in the stream.

columns

A list of column indices to be used in the Dataset.

host

A tf.string tensor or string defining the input stream. For a socket client, use "<HOST_IP>:<PORT>"; for stdin, use "STDIN".

filenames

Not used.

...

Optional arguments passed on to implementing methods.


Create an IgniteDataset.

Description

Apache Ignite is a memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads, delivering in-memory speeds at petabyte scale. This contrib package contains an integration between Apache Ignite and TensorFlow. The integration is based on tf.data on the TensorFlow side and the Binary Client Protocol on the Apache Ignite side. It allows Apache Ignite to be used as a data source for neural network training, inference, and all other computations supported by TensorFlow. Ignite Dataset is based on the Apache Ignite Binary Client Protocol.

Usage

ignite_dataset(
  cache_name,
  host = "localhost",
  port = 10800,
  local = FALSE,
  part = -1,
  page_size = 100,
  username = NULL,
  password = NULL,
  certfile = NULL,
  keyfile = NULL,
  cert_password = NULL
)

Arguments

cache_name

Cache name to be used as datasource.

host

Apache Ignite Thin Client host to connect to.

port

Apache Ignite Thin Client port to connect to.

local

Local flag; if TRUE, only local data is queried.

part

Number of partitions to be queried.

page_size

Apache Ignite Thin Client page size.

username

Apache Ignite Thin Client authentication username.

password

Apache Ignite Thin Client authentication password.

certfile

File in PEM format containing the certificate as well as any number of CA certificates needed to establish the certificate's authenticity.

keyfile

File containing the private key (otherwise the private key will be taken from certfile as well).

cert_password

Password to be used if the private key is encrypted and a password is necessary.

Examples

## Not run: 
dataset <- ignite_dataset(
    cache_name = "SQL_PUBLIC_TEST_CACHE", port = 10800) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

## End(Not run)

Creates a KafkaDataset.

Description

Creates a KafkaDataset.

Usage

kafka_dataset(
  topics,
  servers = "localhost",
  group = "",
  eof = FALSE,
  timeout = 1000
)

Arguments

topics

A tf.string tensor containing one or more subscriptions, in the format [topic:partition:offset:length]; by default, length is -1 for unlimited.

servers

A list of bootstrap servers.

group

The consumer group id.

eof

If TRUE, the Kafka reader will stop on EOF.

timeout

The timeout value for the Kafka Consumer to wait (in milliseconds).

Examples

## Not run: 
dataset <- kafka_dataset(
    topics = list("test:0:0:4"), group = "test", eof = TRUE) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

## End(Not run)

Creates a KinesisDataset.

Description

Kinesis is a managed service provided by AWS for data streaming. This dataset reads messages from Kinesis with each message presented as a tf.string.

Usage

kinesis_dataset(stream, shard = "", read_indefinitely = TRUE, interval = 1e+05)

Arguments

stream

A tf.string tensor containing the name of the stream.

shard

A tf.string tensor containing the id of the shard.

read_indefinitely

If TRUE, the Kinesis dataset will keep retrying on EOF after the interval period. If FALSE, the dataset will stop on EOF. The default value is TRUE.

interval

The interval for the Kinesis Client to wait before it tries to get records again (in milliseconds).
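
Examples

The following is an illustrative sketch, not taken from the package documentation: the stream name and shard id are placeholders, and the session-based iteration mirrors the other examples on this page.

## Not run: 
dataset <- kinesis_dataset(
    stream = "my-stream",                # placeholder stream name
    shard = "shardId-000000000000",      # placeholder shard id
    read_indefinitely = FALSE) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)  # each element is a tf.string message
  print(batch)
})

## End(Not run)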


Create an LMDBDataset.

Description

This function allows a user to read data from an LMDB file. An LMDB file consists of (key, value) pairs stored sequentially.

Usage

lmdb_dataset(filenames)

Arguments

filenames

A tf.string tensor containing one or more filenames.

Examples

## Not run: 
dataset <- lmdb_dataset("testdata/data.mdb") %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

## End(Not run)

Create a Dataset from LibSVM files.

Description

Create a Dataset from LibSVM files.

Usage

make_libsvm_dataset(
  file_names,
  num_features,
  dtype = NULL,
  label_dtype = NULL,
  batch_size = 1,
  compression_type = "",
  buffer_size = NULL,
  num_parallel_parser_calls = NULL,
  drop_final_batch = FALSE,
  prefetch_buffer_size = 0
)

Arguments

file_names

A tf.string tensor containing one or more filenames.

num_features

The number of features.

dtype

The type of the output feature tensor. Defaults to tf.float32.

label_dtype

The type of the output label tensor. Defaults to tf.int64.

batch_size

An integer representing the number of records to combine in a single batch, default 1.

compression_type

A tf.string scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP".

buffer_size

A tf.int64 scalar denoting the number of bytes to buffer. A value of 0 results in the default buffering values chosen based on the compression type.

num_parallel_parser_calls

Number of records to parse in parallel. Defaults to an automatic selection.

drop_final_batch

Whether the last batch should be dropped in case its size is smaller than batch_size; the default behavior is not to drop the smaller batch.

prefetch_buffer_size

An integer specifying the number of feature batches to prefetch for performance improvement. Defaults to auto-tune. Set to 0 to disable prefetching.
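
Examples

An illustrative sketch, not taken from the package documentation: the file path and feature count are placeholders, and iteration follows the same session-based pattern used throughout this package.

## Not run: 
dataset <- make_libsvm_dataset(
    file_names = list("testdata/sample.libsvm"),  # placeholder path
    num_features = 10L,                           # placeholder feature count
    batch_size = 2) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

## End(Not run)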


Creates an MNISTImageDataset.

Description

This creates a dataset for MNIST images.

Usage

mnist_image_dataset(filenames, compression_type = NULL)

Arguments

filenames

A tf.string tensor containing one or more filenames.

compression_type

A tf.string scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP".
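
Examples

An illustrative sketch, not taken from the package documentation: the path is a placeholder for a gzip-compressed IDX-format MNIST image file.

## Not run: 
dataset <- mnist_image_dataset(
    filenames = list("testdata/t10k-images-idx3-ubyte.gz"),  # placeholder path
    compression_type = "GZIP") %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)  # one image per element (28x28 for standard MNIST files)
  print(dim(batch))
})

## End(Not run)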


Creates an MNISTLabelDataset.

Description

This creates a dataset for MNIST labels.

Usage

mnist_label_dataset(filenames, compression_type = NULL)

Arguments

filenames

A tf.string tensor containing one or more filenames.

compression_type

A tf.string scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP".
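
Examples

An illustrative sketch, not taken from the package documentation: the path is a placeholder for a gzip-compressed IDX-format MNIST label file.

## Not run: 
dataset <- mnist_label_dataset(
    filenames = list("testdata/t10k-labels-idx1-ubyte.gz"),  # placeholder path
    compression_type = "GZIP") %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)  # one label per element
  print(batch)
})

## End(Not run)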


Create a ParquetDataset.

Description

This allows a user to read data from a Parquet file.

Usage

parquet_dataset(filenames, columns, output_types)

Arguments

filenames

A 0-D or 1-D tf.string tensor containing one or more filenames.

columns

A 0-D or 1-D tf.int32 tensor containing the columns to extract.

output_types

A tuple of tf.DType objects representing the types of the columns returned.

Examples

## Not run: 
dtypes <- tf$python$framework$dtypes
output_types <- reticulate::tuple(
  dtypes$bool, dtypes$int32, dtypes$int64, dtypes$float32, dtypes$float64)
dataset <- parquet_dataset(
    filenames = list("testdata/parquet_cpp_example.parquet"),
    columns = list(0, 1, 2, 4, 5),
    output_types = output_types) %>%
  dataset_repeat(2)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

## End(Not run)

Creates a PubSubDataset.

Description

This creates a dataset for consuming PubSub messages.

Usage

pubsub_dataset(subscriptions, server = NULL, eof = FALSE, timeout = 1000)

Arguments

subscriptions

A tf.string tensor containing one or more subscriptions.

server

The PubSub server.

eof

If TRUE, the PubSub reader will stop on EOF.

timeout

The timeout value for the PubSub client to wait (in milliseconds).
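
Examples

An illustrative sketch, not taken from the package documentation: the subscription path and server address are placeholders (e.g. a local PubSub emulator).

## Not run: 
dataset <- pubsub_dataset(
    subscriptions = list("projects/my-project/subscriptions/my-subscription"),  # placeholder
    server = "localhost:8085",  # placeholder, e.g. a local PubSub emulator
    eof = TRUE) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)  # each element is a tf.string message
  print(batch)
})

## End(Not run)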


Create a SequenceFileDataset.

Description

This function allows a user to read data from a Hadoop sequence file. A sequence file consists of (key, value) pairs stored sequentially. At the moment, org.apache.hadoop.io.Text is the only supported serialization type, and there is no compression support.

Usage

sequence_file_dataset(filenames)

Arguments

filenames

A tf.string tensor containing one or more filenames.

Examples

## Not run: 
dataset <- sequence_file_dataset("testdata/string.seq") %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

## End(Not run)

TensorFlow IO API for R

Description

This library provides an R interface to the TensorFlow IO API, which offers datasets and filesystem extensions maintained by TensorFlow SIG-IO.


Create a TIFFDataset.

Description

A TIFF Image File Dataset that reads the TIFF file.

Usage

tiff_dataset(filenames)

Arguments

filenames

A tf.string tensor containing one or more filenames.

Examples

## Not run: 
dataset <- tiff_dataset(
    filenames = list("testdata/small.tiff")) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

## End(Not run)

Create a VideoDataset that reads the video file.

Description

This allows a user to read data from a video file with FFmpeg. The output of VideoDataset is a sequence of (height, width, 3) tensors in rgb24 format.

Usage

video_dataset(filenames)

Arguments

filenames

A tf.string tensor containing one or more filenames.

Examples

## Not run: 
dataset <- video_dataset(
    filenames = list("testdata/small.mp4")) %>%
  dataset_repeat(2)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

## End(Not run)

Create a WebPDataset.

Description

A WebP Image File Dataset that reads the WebP file.

Usage

webp_dataset(filenames)

Arguments

filenames

A tf.string tensor containing one or more filenames.

Examples

## Not run: 
dataset <- webp_dataset(
    filenames = list("testdata/sample.webp")) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

## End(Not run)