In this post we discuss how to download intraday high frequency trading (HFT) data using Python. Python is a powerful scripting language that big banks, hedge funds and other large institutions now use extensively for machine learning, artificial intelligence, statistical analysis and algorithmic trading. In this post, high frequency data simply means intraday data, that is the M1, M5, M15, M30, H1 and H4 timeframes. Daily and weekly timeframes are low frequency timeframes.
Price feeds are expensive. Most price feed vendors charge $50-$100 per month. But we don't need to pay this monthly cost: in this post we list a few places online from where you can download price data. Quantitative trading is on the rise. Today more than 60% of the trades on NYSE are placed by algorithms. If you are a manual trader, in the next few years you will be competing more and more with these algorithmic trading systems.
Unless we have good intraday high frequency trading (HFT) data, we cannot build quantitative predictive models. The idea is to use the HFT data and find patterns in it for making trading decisions. First we build models for testing. Once we have a model with high predictive accuracy during testing, we can build a prototype trading system and test it in live trading. Python is a good prototyping language. Once we have a good system, we can switch to C++, C# or Java and build a faster, more robust implementation. So first we need to build our model, and building a good model requires high quality HFT data.
I started off as a manual trader. I use candlestick patterns a lot in my trading system. Naked trading means trading solely based on price action. The idea is to build quantitative trading models based on our manual trading strategies, so that the model supplements the manual trading system and makes it more accurate. We will download the data from Google Finance and Yahoo Finance. Of course this will be historical data, but historical data serves our purpose of building quantitative models. The following is the url for downloading data from Google Finance:
http://www.google.com/finance/getprices?i=[PERIOD]&p=[DAYS]d&f=d,o,h,l,c,v&df=cpct&q=[TICKER]
In the above url, PERIOD is the time interval of the data in seconds. For example, to download 1 minute data we use 60 for PERIOD. 1 minute is also the lowest time interval. TICKER is the ticker symbol, for example MSFT for Microsoft or AAPL for Apple. DAYS is the number of days of data that you want. Keep in mind that the data from Google Finance is delayed and is not live data, so you cannot use it for live trading. But as said above, you can use it for quantitative model building, so it serves our purpose. One more thing about this data: the time is in Unix format, and we will need to convert it into the standard time format. Sounds complicated? Not really, and I will show you how to make the conversion. We will download 1 minute data for AAPL, the ticker symbol for Apple stock, which is pretty popular with day traders.
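As a small illustration of how the three placeholders fit into the url, here is a minimal sketch. The function name `build_getprices_url` and its default values are my own assumptions for this example; only the url pattern itself comes from the post, and the sketch does not check whether the endpoint still responds.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.google.com/finance/getprices"


def build_getprices_url(ticker, period_seconds=60, days=10):
    """Fill PERIOD, DAYS and TICKER into the url pattern shown above.

    build_getprices_url is a hypothetical helper, not part of any library.
    """
    if period_seconds < 60:
        # the post states that 1 minute is the lowest interval
        raise ValueError("the smallest interval is 60 seconds")
    params = {
        "i": period_seconds,      # PERIOD, in seconds
        "p": "%dd" % days,        # DAYS, followed by the letter d
        "f": "d,o,h,l,c,v",       # requested fields
        "df": "cpct",
        "q": ticker,              # TICKER
    }
    # safe="," keeps the commas in the f parameter unescaped
    return BASE_URL + "?" + urlencode(params, safe=",")


print(build_getprices_url("AAPL", period_seconds=60, days=10))
```

Keeping the interval in one variable also makes it easy to reuse the same number later, when the row offsets are converted back into timestamps.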
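#import the libraries
import pandas as pd

#download 1 minute data for AAPL stock (i=60 means 60 second bars)
#skiprows=8 skips the header lines and the first 'a' row of the response
data = pd.read_csv("http://www.google.com/finance/getprices?q=AAPL&i=60&p=10d&f=d,o,h,l,c,v", skiprows=8, header=None)
data.head()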
This is the output!
data.head()
3 116.92 117.03 116.78 117 466878
0 4 117.0360 117.0765 116.8700 116.9200 402779
1 5 117.1573 117.1900 116.8900 117.0400 457673
2 6 117.1100 117.1850 117.0600 117.1500 311011
3 7 117.1100 117.1600 117.0000 117.1100 291641
4 8 117.1500 117.1600 117.0599 117.1045 246471
Adding a timestamp means converting the time from Unix format to the usual format that we are accustomed to, with day, hour and minutes.
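Before converting the whole download, it may help to see the conversion on a single value. The sketch below uses the raw value a1482330600 that appears in this post's data. The fixed UTC-5 offset for New York is my own assumption (it ignores daylight saving time), and the variable names are only illustrative.

```python
from datetime import datetime, timedelta, timezone

raw = "a1482330600"              # first field of an 'a' row
seconds = int(raw.lstrip("a"))   # drop the leading 'a' to get Unix seconds

# Passing tz makes the result independent of the machine's local timezone.
utc_time = datetime.fromtimestamp(seconds, tz=timezone.utc)

# Assumption: a fixed UTC-5 offset stands in for New York time here.
eastern = timezone(timedelta(hours=-5), "EST")
ny_time = utc_time.astimezone(eastern)

# A row with offset 1 in 60 second data is one interval after the 'a' row.
next_bar = ny_time + timedelta(seconds=60 * 1)

print(utc_time.isoformat())   # 2016-12-21T14:30:00+00:00
print(ny_time.isoformat())    # 2016-12-21T09:30:00-05:00
print(next_bar.isoformat())   # 2016-12-21T09:31:00-05:00
```

Calling `datetime.fromtimestamp` without the `tz` argument returns the time in the local timezone of the machine running the code, which is presumably why the timestamps printed in this post start at 19:30 rather than at the 09:30 New York open.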
#add a timestamp to the intraday AAPL data
import datetime
import numpy as np
import pandas as pd

interval = 60  #bar length in seconds (1 minute), must match i= in the url
url = "http://www.google.com/finance/getprices?q=AAPL&i=%d&p=10d&f=d,o,h,l,c,v" % interval
x = np.array(pd.read_csv(url, skiprows=7, header=None))
date = []
for i in range(0, len(x)):
    if x[i][0][0] == 'a':
        #rows starting with 'a' carry a full unix timestamp
        t = datetime.datetime.fromtimestamp(int(x[i][0].replace('a', '')))
        date.append(t)
    else:
        #other rows carry the number of intervals since the last 'a' row
        date.append(t + datetime.timedelta(seconds=interval * int(x[i][0])))
data1 = pd.DataFrame(x, index=date)
#the prices arrive as close, high, low, open: in the output below the
#last price of each row matches the first price of the row before it
data1.columns = ['a', 'Close', 'High', 'Low', 'Open', 'Vol']
data1.head()
data1.tail()
When you use the above code, make sure you indent the body of the for loop and the if/else branches. This is a must in Python, otherwise you will get an IndentationError. Below is the output:
>>> data1.head()
                               a    Close     High     Low    Open      Vol
2016-12-21 19:30:00  a1482330600   116.84   116.84   116.8   116.8   225329
2016-12-21 19:31:00            1   117.27   117.35   116.8  116.84   633227
2016-12-21 19:32:00            2      117   117.35     117  117.28   400675
2016-12-21 19:33:00            3   116.92   117.03  116.78     117   466878
2016-12-21 19:34:00            4  117.036  117.076  116.87  116.92   402779
>>> data1.tail()
                       a    Close    High     Low     Open      Vol
2017-01-04 06:20:00  650  116.695   116.7  116.61   116.64   240149
2017-01-04 06:21:00  651  116.625  116.72  116.62  116.695   243736
2017-01-04 06:22:00  652  116.745  116.75  116.62   116.62   316471
2017-01-04 06:23:00  653   116.61  116.76   116.6  116.745   402803
2017-01-04 06:24:00  654   116.64  116.74  116.58    116.6  1839289
In the above data you can see the original time format, a1482330600, in column a, and the converted timestamp in the index next to it. In the next post I am going to show you how to download Google Finance intraday data in real time.
In the above chart, the holidays show up as straight lines. If you compare Python with R, I would say R is much simpler when it comes to dealing with financial time series data, and Python is a bit more difficult. Either way, you can use the above method to download intraday stock data absolutely free.
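The timestamp conversion used in this post can also be tried without a network connection. The following is a minimal sketch that parses a small sample text using only the standard library. The function name `parse_intraday` and the header lines in `SAMPLE` are my own assumptions about the layout of the response. The four price rows are copied from the output shown earlier in this post, with the price columns read as close, high, low, open, since each row's last price matches the first price of the row before it.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Hypothetical sample: the header lines are an assumed layout, the four
# data rows are taken from the output printed in this post.
SAMPLE = """\
EXCHANGE%3DNASDAQ
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=60
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-300
a1482330600,116.84,116.84,116.8,116.8,225329
1,117.27,117.35,116.8,116.84,633227
2,117,117.35,117,117.28,400675
3,116.92,117.03,116.78,117,466878
"""


def parse_intraday(text, interval_seconds):
    """Return a list of (timestamp, close, high, low, open, volume) tuples."""
    bars = []
    anchor = None  # timestamp of the most recent 'a' row
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 6:
            continue  # header or metadata line
        first = row[0]
        if first.startswith("a") and first[1:].isdigit():
            # an 'a' row carries a full Unix timestamp
            anchor = datetime.fromtimestamp(int(first[1:]), tz=timezone.utc)
            stamp = anchor
        elif first.isdigit() and anchor is not None:
            # other rows count intervals since the last 'a' row
            stamp = anchor + timedelta(seconds=interval_seconds * int(first))
        else:
            continue  # e.g. the COLUMNS= line, which also has six fields
        close, high, low, open_ = (float(value) for value in row[1:5])
        bars.append((stamp, close, high, low, open_, int(row[5])))
    return bars


bars = parse_intraday(SAMPLE, interval_seconds=60)
for bar in bars:
    print(bar[0].isoformat(), bar[1:])
```

Because the offsets are multiplied by the interval, the same function should also work for 5 minute data by passing `interval_seconds=300`. The timestamps are kept in UTC here, so converting them to a trading timezone is left as a separate step.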