This is the R interface to the datasets and filesystem extensions maintained by SIG-IO. TensorFlow I/O supports a number of data sources, including the Hadoop SequenceFile format used in the example below.
We provide a reference Dockerfile so that you can use the R package directly for testing. You can build the image via:
docker build -t tfio-r-dev -f R-package/scripts/Dockerfile .
Inside the container, you can start an R session, instantiate a SequenceFileDataset from the example Hadoop SequenceFile string.seq, and then apply any of the transformation functions provided by the tfdatasets package to the dataset, as in the following:
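library(tfio)
library(tensorflow)   # provides `tf`
library(tfdatasets)   # provides the dataset_* and iterator helpers

# Read the example SequenceFile and repeat its contents twice
dataset <- sequence_file_dataset("R-package/tests/testthat/testdata/string.seq") %>%
  dataset_repeat(2)

# TensorFlow 1.x graph mode: tensors are evaluated through a session
sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

# Print each key-value pair until the iterator is exhausted
until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})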
You’ll see the key-value pairs from the string.seq file printed as follows:
[1] "001" "VALUE001"
[1] "002" "VALUE002"
[1] "003" "VALUE003"
[1] "004" "VALUE004"
[1] "005" "VALUE005"
[1] "006" "VALUE006"
[1] "007" "VALUE007"
[1] "008" "VALUE008"
...
[1] "020" "VALUE020"
[1] "021" "VALUE021"
[1] "022" "VALUE022"
[1] "023" "VALUE023"
[1] "024" "VALUE024"
[1] "025" "VALUE025"
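Other tfdatasets transformations can be chained in the same way. The following is a minimal sketch, not a definitive recipe: it swaps dataset_repeat() for dataset_take() to limit how many records are read. It assumes the same environment as the example above (the R session is started from the repository root inside the container, with a TensorFlow 1.x installation that provides tf$Session()).

```r
library(tfio)
library(tensorflow)   # provides `tf`
library(tfdatasets)   # provides dataset_take() and the iterator helpers

# Assumption: the working directory is the repository root, so this
# relative path resolves to the bundled test file.
dataset <- sequence_file_dataset("R-package/tests/testthat/testdata/string.seq") %>%
  dataset_take(3)     # keep only the first three records

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

# Print records until the (shortened) dataset is exhausted
until_out_of_range({
  print(sess$run(next_batch))
})
```

Because dataset_take() truncates the dataset, the loop should stop after three records rather than running through the whole file.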