The purpose of this repository is two-fold:
- Once you have a list of the tickers you are interested in, put them in a text file, one per line, and save it in the main directory as 'tikz.csv'.
- Open run.py.
- Adjust the timeframe to the desired length. Every stock on the list must have been active during this timeframe, or an error will result.
- Set the location variable, e.g. location='some_descriptive_name.h5'. This must be a .h5 file.
- Save your changes and run run.py. A single HDF5 file will be generated, with one dataset per ticker containing daily Open, Close, High, and Volume data for each date within the specified timeframe. Each dataset in the generated file can be extracted as a Pandas dataframe.
- The .h5 files in the /data folder can be briefly inspected with inspecth5.py (the filename must be specified in the file); running it yields a simple UI, although an HDF5 viewer will let you explore the files more thoroughly.
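
To make the file layout described above concrete, here is a minimal sketch of reading a ticker list, writing one dataset per ticker into a single HDF5 file, and loading one back as a Pandas dataframe. It is not the code in run.py: the helper names (`read_tickers`, `write_store`, `read_dataset`), the file name `example_prices.h5`, and the price values are all made up for illustration, and it assumes each ticker is stored under its own symbol as the key. No data is downloaded; placeholder numbers stand in for the real prices.

```python
import csv
import os
import tempfile

import pandas as pd  # HDF5 support in pandas also requires the 'tables' package


def read_tickers(path):
    """Return the ticker symbols listed one per line in a text/CSV file."""
    with open(path, newline="") as handle:
        return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


def write_store(frames, location):
    """Write one dataset per ticker into a single HDF5 file (hypothetical helper)."""
    with pd.HDFStore(location, mode="w") as store:
        for ticker, frame in frames.items():
            store.put(ticker, frame)


def read_dataset(location, ticker):
    """Load the dataset stored under a ticker back into a DataFrame."""
    return pd.read_hdf(location, key=ticker)


# Work in a temporary directory so the sketch does not touch the real data/ folder.
workdir = tempfile.mkdtemp()

# A stand-in for 'tikz.csv': one ticker per line.
tickers_path = os.path.join(workdir, "tikz.csv")
with open(tickers_path, "w", newline="") as handle:
    handle.write("AAPL\nMSFT\n")

tickers = read_tickers(tickers_path)

# Placeholder values only; the real script would fill these with downloaded prices.
dates = pd.date_range("2018-01-02", periods=3, freq="D")
frames = {
    ticker: pd.DataFrame(
        {
            "Open": [1.0, 2.0, 3.0],
            "Close": [1.5, 2.5, 3.5],
            "High": [2.0, 3.0, 4.0],
            "Volume": [100, 200, 300],
        },
        index=dates,
    )
    for ticker in tickers
}

location = os.path.join(workdir, "example_prices.h5")  # must end in .h5
write_store(frames, location)

aapl = read_dataset(location, "AAPL")
print(aapl)
```

If the real files use a different key scheme (for example nested groups), `pd.HDFStore(location).keys()` lists the keys actually present in a file.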
This is part of a class project for a Machine Learning course, and all generated data will be kept in the data/ subfolder.
I will try to be as descriptive as possible in naming the datasets, and I eventually intend for the folder to hold some fairly complete HDF5 files with Groups, Subgroups and Metadata.
- pandas
- pandas_datareader
- tables
- fix_yahoo_finance
- csv
- h5py (for inspecth5.py and h5_to_csv.py)