Title: | Interface to 'TensorFlow IO' |
---|---|
Description: | Interface to 'TensorFlow IO', Datasets and filesystem extensions maintained by `TensorFlow SIG-IO` <https://github.com/tensorflow/community/blob/master/sigs/io/CHARTER.md>. |
Authors: | TensorFlow IO Contributors [aut, cph] (Full list of contributors can be found at <https://github.com/tensorflow/io/graphs/contributors>), Yuan Tang [aut, cre], TensorFlow Authors [cph], Ant Financial [cph], RStudio [cph] |
Maintainer: | Yuan Tang <[email protected]> |
License: | Apache License 2.0 |
Version: | 0.4.1 |
Built: | 2024-11-20 03:19:55 UTC |
Source: | https://github.com/cran/tfio |
ArrowFeatherDataset
An Arrow Dataset for reading record batches from Arrow Feather files. Feather is a lightweight columnar format ideal for simple writing of Pandas DataFrames.
arrow_feather_dataset(filenames, columns, output_types, output_shapes = NULL)
filenames | A `tf.string` tensor, list, or scalar containing files in Arrow Feather format. |
columns | A list of column indices to be used in the Dataset. |
output_types | Tensor dtypes of the output tensors. |
output_shapes | TensorShapes of the output tensors, or `NULL` to infer partial shapes (default). |
## Not run: 
dataset <- arrow_feather_dataset(
    list('/path/to/a.feather', '/path/to/b.feather'),
    columns = reticulate::tuple(0L, 1L),
    output_types = reticulate::tuple(tf$int32, tf$float32),
    output_shapes = reticulate::tuple(list(), list())) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)
ArrowStreamDataset
An Arrow Dataset for reading record batches from an input stream. Currently supported input streams are a socket client or stdin.
arrow_stream_dataset(host, columns, output_types, output_shapes = NULL)
host | A `tf.string` tensor or string defining the input stream. For a socket client, use "<HOST_IP>:<PORT>"; for stdin, use "STDIN". |
columns | A list of column indices to be used in the Dataset. |
output_types | Tensor dtypes of the output tensors. |
output_shapes | TensorShapes of the output tensors, or `NULL` to infer partial shapes (default). |
## Not run: 
dataset <- arrow_stream_dataset(
    host,
    columns = reticulate::tuple(0L, 1L),
    output_types = reticulate::tuple(tf$int32, tf$float32),
    output_shapes = reticulate::tuple(list(), list())) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)
Infer output types and shapes from the given Arrow schema and create an Arrow Dataset.
from_schema(object, ...)
object | An R object. |
... | Optional arguments passed on to implementing methods. |
Create an Arrow Dataset for reading record batches from Arrow feather files, inferring output types and shapes from the given Arrow schema.
## S3 method for class 'arrow_feather_dataset'
from_schema(object, schema, columns = NULL, host = NULL, filenames = NULL, ...)
object | An R object. |
schema | Arrow schema defining the record batch data in the stream. |
columns | A list of column indices to be used in the Dataset. |
host | Not used. |
filenames | A `tf.string` tensor, list, or scalar containing files in Arrow Feather format. |
... | Optional arguments passed on to implementing methods. |
Create an Arrow Dataset from an input stream, inferring output types and shapes from the given Arrow schema.
## S3 method for class 'arrow_stream_dataset'
from_schema(object, schema, columns = NULL, host = NULL, filenames = NULL, ...)
object | An R object. |
schema | Arrow schema defining the record batch data in the stream. |
columns | A list of column indices to be used in the Dataset. |
host | A `tf.string` tensor or string defining the input stream. For a socket client, use "<HOST_IP>:<PORT>"; for stdin, use "STDIN". |
filenames | Not used. |
... | Optional arguments passed on to implementing methods. |
IgniteDataset
Apache Ignite is a memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads, delivering in-memory speeds at petabyte scale. This contrib package contains an integration between Apache Ignite and TensorFlow. The integration is based on tf.data on the TensorFlow side and the Binary Client Protocol on the Apache Ignite side. It allows Apache Ignite to be used as a data source for neural network training, inference, and all other computations supported by TensorFlow.
ignite_dataset(
  cache_name,
  host = "localhost",
  port = 10800,
  local = FALSE,
  part = -1,
  page_size = 100,
  username = NULL,
  password = NULL,
  certfile = NULL,
  keyfile = NULL,
  cert_password = NULL
)
cache_name | Cache name to be used as a data source. |
host | Apache Ignite Thin Client host to connect to. |
port | Apache Ignite Thin Client port to connect to. |
local | Local flag that restricts the query to local data only. |
part | Number of partitions to be queried. |
page_size | Apache Ignite Thin Client page size. |
username | Apache Ignite Thin Client authentication username. |
password | Apache Ignite Thin Client authentication password. |
certfile | File in PEM format containing the certificate as well as any number of CA certificates needed to establish the certificate's authenticity. |
keyfile | File containing the private key (otherwise the private key will be taken from certfile as well). |
cert_password | Password to be used if the private key is encrypted and a password is necessary. |
## Not run: 
dataset <- ignite_dataset(
    cache_name = "SQL_PUBLIC_TEST_CACHE",
    port = 10800) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)
KafkaDataset
Creates a KafkaDataset.
kafka_dataset(
  topics,
  servers = "localhost",
  group = "",
  eof = FALSE,
  timeout = 1000
)
topics | A `tf.string` tensor containing one or more subscriptions, in the format of topic:partition:offset:length; by default length is -1 for unlimited. |
servers | A list of bootstrap servers. |
group | The consumer group id. |
eof | If `TRUE`, the Kafka reader will stop on EOF. |
timeout | The timeout value for the Kafka Consumer to wait (in milliseconds). |
## Not run: 
dataset <- kafka_dataset(
    topics = list("test:0:0:4"),
    group = "test",
    eof = TRUE) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)
KinesisDataset
Kinesis is a managed service provided by AWS for data streaming. This dataset reads messages from Kinesis, with each message presented as a `tf.string`.
kinesis_dataset(stream, shard = "", read_indefinitely = TRUE, interval = 1e+05)
stream | A `tf.string` tensor containing the name of the stream. |
shard | A `tf.string` tensor containing the id of the shard. |
read_indefinitely | If `TRUE`, the dataset keeps polling the stream indefinitely, waiting for new records; if `FALSE`, it stops once the end of the stream is reached. |
interval | The interval for the Kinesis Client to wait before it tries to get records again (in milliseconds). |
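A usage sketch (not from the original documentation), following the same iteration pattern as the other dataset examples on this page; the stream name below is hypothetical, and AWS credentials for Kinesis are assumed to be configured in the environment.

## Not run: 
# Read all records currently in a (hypothetical) Kinesis stream, then stop.
dataset <- kinesis_dataset(
    stream = "my-kinesis-stream",  # hypothetical stream name
    read_indefinitely = FALSE) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)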
LMDBDataset
This function allows a user to read data from an LMDB file. An LMDB file consists of a sequence of (key, value) pairs.
lmdb_dataset(filenames)
filenames | A `tf.string` tensor containing one or more filenames. |
## Not run: 
dataset <- lmdb_dataset("testdata/data.mdb") %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)
Create a Dataset from LibSVM files.
make_libsvm_dataset(
  file_names,
  num_features,
  dtype = NULL,
  label_dtype = NULL,
  batch_size = 1,
  compression_type = "",
  buffer_size = NULL,
  num_parallel_parser_calls = NULL,
  drop_final_batch = FALSE,
  prefetch_buffer_size = 0
)
file_names | A `tf.string` tensor containing one or more filenames. |
num_features | The number of features. |
dtype | The type of the output feature tensor. Defaults to `tf$float32`. |
label_dtype | The type of the output label tensor. Defaults to `tf$int64`. |
batch_size | An integer representing the number of records to combine in a single batch, default 1. |
compression_type | A `tf.string` scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP". |
buffer_size | A `tf.int64` scalar denoting the number of bytes to buffer. A value of 0 results in default buffering values chosen based on the compression type. |
num_parallel_parser_calls | Number of records to parse in parallel. Defaults to an automatic selection. |
drop_final_batch | Whether the last batch should be dropped in case its size is smaller than batch_size; the default behavior is not to drop the smaller batch. |
prefetch_buffer_size | An integer specifying the number of feature batches to prefetch for performance improvement. Defaults to auto-tune. Set to 0 to disable prefetching. |
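A usage sketch (not from the original documentation), assuming a small LibSVM file with 10 features; the file path is hypothetical. Unlike the raw datasets above, make_libsvm_dataset() already batches records, so each element holds a batch of labels together with a sparse feature tensor.

## Not run: 
dataset <- make_libsvm_dataset(
    file_names = list("testdata/sample.libsvm"),  # hypothetical file path
    num_features = 10,
    batch_size = 2) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)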
MNISTImageDataset
This creates a dataset for MNIST images.
mnist_image_dataset(filenames, compression_type = NULL)
filenames | A `tf.string` tensor containing one or more filenames. |
compression_type | A `tf.string` scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP". |
MNISTLabelDataset
This creates a dataset for MNIST labels.
mnist_label_dataset(filenames, compression_type = NULL)
filenames | A `tf.string` tensor containing one or more filenames. |
compression_type | A `tf.string` scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP". |
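A usage sketch (not from the original documentation) combining the image and label datasets; the file paths are hypothetical (gzip-compressed IDX files as distributed on the MNIST site), and zip_datasets() from the tfdatasets package is assumed to be available for pairing images with labels.

## Not run: 
images <- mnist_image_dataset(
    filenames = list("testdata/t10k-images-idx3-ubyte.gz"),  # hypothetical path
    compression_type = "GZIP")
labels <- mnist_label_dataset(
    filenames = list("testdata/t10k-labels-idx1-ubyte.gz"),  # hypothetical path
    compression_type = "GZIP")

# Pair each image with its label and take a single pass over the data.
dataset <- zip_datasets(images, labels) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)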
ParquetDataset
This allows a user to read data from a Parquet file.
parquet_dataset(filenames, columns, output_types)
filenames | A 0-D or 1-D `tf.string` tensor containing one or more filenames. |
columns | A 0-D or 1-D `tf.int32` tensor containing the columns to extract. |
output_types | A tuple of `tf.DType` objects representing the types of the columns returned. |
## Not run: 
dtypes <- tf$python$framework$dtypes
output_types <- reticulate::tuple(
    dtypes$bool, dtypes$int32, dtypes$int64, dtypes$float32, dtypes$float64)

dataset <- parquet_dataset(
    filenames = list("testdata/parquet_cpp_example.parquet"),
    columns = list(0, 1, 2, 4, 5),
    output_types = output_types) %>%
  dataset_repeat(2)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)
PubSubDataset
This creates a dataset for consuming PubSub messages.
pubsub_dataset(subscriptions, server = NULL, eof = FALSE, timeout = 1000)
subscriptions | A `tf.string` tensor containing one or more subscriptions. |
server | The pubsub server. |
eof | If `TRUE`, the pubsub reader will stop on EOF. |
timeout | The timeout value for the PubSub to wait (in milliseconds). |
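A usage sketch (not from the original documentation), following the same iteration pattern as the other datasets; the subscription name is hypothetical, and for Google Cloud Pub/Sub appropriate credentials must be available in the environment.

## Not run: 
dataset <- pubsub_dataset(
    subscriptions = list("projects/my-project/subscriptions/my-subscription"),  # hypothetical
    eof = TRUE) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)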
SequenceFileDataset
This function allows a user to read data from a Hadoop SequenceFile. A SequenceFile consists of a sequence of (key, value) pairs. At the moment, org.apache.hadoop.io.Text is the only supported serialization type, and there is no compression support.
sequence_file_dataset(filenames)
filenames | A `tf.string` tensor containing one or more filenames. |
## Not run: 
dataset <- sequence_file_dataset("testdata/string.seq") %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)
This library provides an R interface to the TensorFlow IO API, which offers datasets and filesystem extensions maintained by TensorFlow SIG-IO.
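A minimal setup sketch (not from the original documentation): the examples throughout this page assume that tfio is loaded alongside the tensorflow and tfdatasets packages, which supply tf, the %>% pipeline helpers such as dataset_repeat(), and the iterator functions.

## Not run: 
library(tensorflow)
library(tfdatasets)
library(tfio)

# Any of the dataset constructors documented on this page can then be
# combined with the tfdatasets pipeline operators, e.g.:
dataset <- tiff_dataset(filenames = list("testdata/small.tiff")) %>%
  dataset_repeat(1)
## End(Not run)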
TIFFDataset
A TIFF Image File Dataset that reads the TIFF file.
tiff_dataset(filenames)
filenames | A `tf.string` tensor containing one or more filenames. |
## Not run: 
dataset <- tiff_dataset(
    filenames = list("testdata/small.tiff")) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)
VideoDataset
A Video Dataset that reads the video file. This allows a user to read data from a video file with ffmpeg. The output of VideoDataset is a sequence of (height, width, 3) tensors in rgb24 format.
video_dataset(filenames)
filenames | A `tf.string` tensor containing one or more filenames. |
## Not run: 
dataset <- video_dataset(
    filenames = list("testdata/small.mp4")) %>%
  dataset_repeat(2)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)
WebPDataset
A WebP Image File Dataset that reads the WebP file.
webp_dataset(filenames)
filenames | A `tf.string` tensor containing one or more filenames. |
## Not run: 
dataset <- webp_dataset(
    filenames = list("testdata/sample.webp")) %>%
  dataset_repeat(1)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
## End(Not run)