StockIO: Real-Time Stock Market Data Streaming and Analysis with Kafka and AWS

StockIO is a real-time data streaming solution designed to process and analyze stock market data using Apache Kafka and AWS services.

This project is tested with BrowserStack.

Project Overview

StockIO simulates a stream of stock market data and processes it with Apache Kafka and several AWS services:

- Kafka runs on an AWS EC2 instance.
- The processed data is stored in Amazon S3.
- The stored data is analyzed using AWS Glue and Amazon Athena.

Data:

Ensure the stock market dataset is available:
- The Kaggle Stock Market Dataset is provided in the /data folder.
- Make sure you have access to it. It has the following features:
  - Index
  - Date
  - Open
  - High
  - Low
  - Close
  - Adj Close
  - Volume
  - CloseUSD
Implement the sleep function:
- To simulate real-time data flow into Kafka, the producer script should call a sleep function between sends. The delay between data entries mimics a live stream.
Execute the producer script:
- Run the script that sends data to the Kafka topic, with the sleep function applied.
Execute the consumer script:
- Run the script that reads data from the Kafka topic and stores it in S3.

Architecture

The architecture handles real-time stock market data with the following components:

Producer: Simulates stock market data and sends it to a Kafka topic.
Kafka: Acts as the message broker to handle the stream of data.
Consumer: Reads data from the Kafka topic and stores it in Amazon S3.
Amazon S3: Stores the processed stock market data.
AWS Glue: Crawls the data in S3 to create a metadata catalog.
Amazon Athena: Queries and analyzes the data stored in S3.

How to Run the Project

Set up Kafka on AWS EC2

Launch an EC2 instance and install Kafka:
- Follow the instructions provided by Kafka to install it on your EC2 instance.
Start the Kafka server:
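- From the Kafka installation directory, start the Kafka server, typically with bin/kafka-server-start.sh config/server.properties. Depending on the Kafka version, ZooKeeper may need to be running before the broker starts.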

Run the Producer

Ensure the stock market dataset is available:
- Make sure you have access to the dataset required for the producer script.
Execute the producer script:
- Run the script that sends data to the Kafka topic.
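
As a rough illustration of the producer step, including the sleep-based delay described in the Data section, here is a minimal sketch. It is not the project's actual script: the topic name, CSV file name, and broker address in `run_producer` are hypothetical placeholders, and it reads the CSV with the standard-library `csv` module rather than pandas to keep the example small.

```python
import csv
import json
import time
from typing import Iterable, Iterator


def read_rows(csv_path: str) -> Iterator[dict]:
    """Yield each CSV row as a dict keyed by column name (Index, Date, Open, ...)."""
    with open(csv_path, newline="") as f:
        yield from csv.DictReader(f)


def stream_rows(producer, topic: str, rows: Iterable[dict],
                delay_seconds: float = 1.0) -> int:
    """Send rows one at a time, sleeping between sends to mimic a live feed."""
    sent = 0
    for row in rows:
        producer.send(topic, value=row)
        sent += 1
        time.sleep(delay_seconds)
    producer.flush()  # make sure buffered messages reach the broker
    return sent


def run_producer() -> int:
    """Wire the helpers to a real broker. All names below are assumptions."""
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["localhost:9092"],  # assumption: use the EC2 broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    rows = read_rows("data/stock_data.csv")  # hypothetical file name
    return stream_rows(producer, "stock_market", rows)  # hypothetical topic name
```

Calling `run_producer()` on a machine that can reach the broker would start the simulated stream. Lowering `delay_seconds` speeds up the replay.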

Run the Consumer

Execute the consumer script:
- Run the script that reads data from the Kafka topic and stores it in S3.
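
The consumer step could be sketched along these lines. This is an assumption-laden example rather than the project's script: it writes one JSON object per message with `boto3` (the project may use `s3fs` instead), and the key prefix and the parameters passed to `run_consumer` are hypothetical.

```python
import json
from typing import Callable, Iterable


def s3_key(prefix: str, index: int) -> str:
    """Build a distinct object key for each message (hypothetical naming scheme)."""
    return f"{prefix}/stock_market_{index}.json"


def store_messages(messages: Iterable, write_object: Callable[[str, str], None],
                   prefix: str = "stock-data") -> int:
    """Write each message's value as a JSON document; return how many were stored."""
    count = 0
    for count, message in enumerate(messages, start=1):
        write_object(s3_key(prefix, count), json.dumps(message.value))
    return count


def run_consumer(bucket: str, topic: str, bootstrap_servers: list) -> int:
    """Wire the helpers to Kafka and S3. Bucket and topic names are assumptions."""
    # Imported here so the helpers above work without these packages installed.
    import boto3
    from kafka import KafkaConsumer

    s3 = boto3.client("s3")
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )

    def write_object(key: str, body: str) -> None:
        s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))

    # Iterating a KafkaConsumer blocks and yields records as they arrive.
    return store_messages(consumer, write_object)
```

Writing one small object per message keeps the sketch simple. Batching several messages per object would reduce the number of S3 requests.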

Set up AWS Glue

Create a Glue crawler:
- Configure the Glue crawler to crawl the S3 bucket and create a metadata catalog.
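
The crawler can also be created programmatically instead of through the console. The sketch below uses `boto3`. The crawler name, IAM role ARN, database name, and S3 path are all placeholders the caller must supply, and none of them come from the project.

```python
def crawler_config(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build the keyword arguments for Glue's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role that Glue assumes to read the bucket
        "DatabaseName": database,  # Glue Data Catalog database for the new table
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(config: dict) -> None:
    """Create the crawler and trigger one run. Requires AWS credentials."""
    # Imported here so crawler_config can be used without boto3 installed.
    import boto3

    glue = boto3.client("glue")
    glue.create_crawler(**config)
    glue.start_crawler(Name=config["Name"])
```

For example, `crawler_config("stockio-crawler", role_arn, "stockio_db", "s3://my-bucket/stock-data/")` would describe a crawler over a hypothetical bucket prefix.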

Query Data with Amazon Athena

Use Athena to query the data:
- Run SQL queries in Amazon Athena against the table that the Glue crawler created for the data stored in S3.
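
A query can be submitted from Python as well as from the Athena console. The sketch below counts rows per value of the Index column. It assumes the crawler exposed that column as lowercase `index`, and the table name, database name, and results location are hypothetical values to be replaced.

```python
def build_query(table: str, limit: int = 10) -> str:
    """Return SQL that counts rows per Index value in the crawled table."""
    return (
        'SELECT "index", COUNT(*) AS row_count '
        f'FROM "{table}" '
        'GROUP BY "index" '
        'ORDER BY row_count DESC '
        f'LIMIT {int(limit)}'
    )


def run_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query to Athena and return its execution ID."""
    # Imported here so build_query can be used without boto3 installed.
    import boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location (an assumption here).
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Athena runs queries asynchronously, so the returned ID is what you would later use to check the query status and fetch the results.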

Dependencies

pandas
kafka-python
s3fs
boto3
