I made a simple stock server that randomly updates stock prices and an Apache Spark program to process these results. The code is on github here github/babsher/spark-stocks.
To run the stock server execute the following:
sbt "run-main StockServer 9999 IBM:142 MS:64 AAPL:345"
Then run the VWAP calculation:sbt "run-main StockStreamingMain"
The key part of the calculation is a simple weighted average implemented by a flatmap and a reduceByKey.
stocks.window(Seconds(30)).
flatMap(_.map(u => {
(u.name, (u.vol * u.value, u.vol))
}))
.reduceByKey((e1, e2) => (e1._1 + e2._1, e1._2 + e2._2))
.map{case (stock, (price, vol)) => (stock, price / vol)}
First I used a flatmap to take the RDD of Seq of Stocks and map that to a sequence of the stock symbol and the first weight calculation. Then I used a reduceByKey to sum the numerator and denominator of the average for each stock. The last map calculates the VWAP for each symbol.If you are feeling adventurous you can set this up in Apache Zeppelin.
