# Guide to the RADAR HDFS Connector
An adapted connector that writes stream data to HDFS. It uses the Confluent HDFS Connector with an adapted `AvroFormat` to write both the key and the value of each record.
## Using the HDFS Connector with a Custom Format

The adapted HDFS Connector must be on the `$CLASSPATH` before it can be used. Add `radarbackend.jar` to `$CLASSPATH` and follow the configuration steps described in the User Guide of the HDFS Connector.
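A minimal sketch, assuming the jar is located at `/path-to/radarbackend.jar` (adjust the path to your build output):

```shell
# Make the adapted connector classes visible to Kafka Connect
export CLASSPATH=$CLASSPATH:/path-to/radarbackend.jar
```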
### Sample Configuration

```properties
name=radar-hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=mock_empatica_e4_battery_level,mock_empatica_e4_blood_volume_pulse
flush.size=1200
hdfs.url=hdfs://localhost:9000
format.class=org.radarcns.sink.HDFS.AvroFormatRadar
```
To execute the connector in standalone mode:

```shell
connect-standalone /etc/schema-registry/connect-avro-standalone.properties path-to-your-hdfs-connector-configuration.properties
```
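Once the connector has flushed its first records, the written data can be inspected in HDFS (assuming the default output directory):

```shell
# List the topic directories the sink has created so far
hadoop fs -ls /topics
```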
## Extracting data from HDFS to the local file system

By default, the data is written to the `/topics/` directory in HDFS. This can be changed with the `topics.dir` property, as shown below.
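For example, a hypothetical override that redirects output to `/radar-output` (the target path is an assumption; append the line to the connector properties file from the sample configuration above):

```shell
# topics.dir replaces the default /topics output directory in HDFS
echo "topics.dir=/radar-output" >> path-to-your-hdfs-connector-configuration.properties
```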
To extract data from HDFS:

Create a directory to collect the written data:

```shell
sudo mkdir <dir-name>
```

Change into that directory:

```shell
cd <dir-name>
```

Extract the data from HDFS:

```shell
hadoop fs -get /topics
```
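The copy lands in `./topics`, with one subdirectory per Kafka topic. For example, with the sample configuration above, the extracted battery level data can be listed with (the `partition=0` layout follows the connector's default directory structure):

```shell
ls topics/mock_empatica_e4_battery_level/partition=0
```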
## Aggregating collected data
Download `avro-tools-1.7.7.jar`.
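For example (the Maven Central URL below is assumed from the standard repository layout; any Avro 1.7.7 tools distribution works):

```shell
wget https://repo1.maven.org/maven2/org/apache/avro/avro-tools/1.7.7/avro-tools-1.7.7.jar
```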
Create a `radar-data-extract.sh` script as below:
```bash
#!/bin/bash
# Save the current directory so the loop can return to it after each topic
cur=$PWD

# Concatenate the Avro files of one topic directory and convert the result to JSON
function dir_command {
    topic=$(basename "$1")
    cd "$1/partition=0" || return
    echo "$PWD"
    java -jar /path-to/avro-tools-1.7.7.jar concat $(ls) "../${topic}_full.avro"
    java -jar /path-to/avro-tools-1.7.7.jar tojson "../${topic}_full.avro" >> "../${topic}_full.json"
}

# Visit each immediate child directory (one per topic) and run dir_command on it
find . -maxdepth 1 -type d \( ! -name . \) | while read dir; do
    dir_command "$dir"
    cd "$cur"
done
```
Navigate to the extracted `topics` directory and execute `radar-data-extract.sh` from there.
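A minimal invocation, assuming you are still in `<dir-name>` and saved the script there (adjust the paths to your setup):

```shell
cd topics                            # the directory extracted from HDFS
chmod +x ../radar-data-extract.sh    # make the script executable
../radar-data-extract.sh             # writes <topic>_full.avro and <topic>_full.json per topic
```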