Thursday, January 12, 2017

[python] Fetching stock quotes for Taiwan's listed (TWSE) and OTC (TPEx) companies

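I used to watch stock analysts on TV who always dropped hints instead of naming the stock outright. Back then I wrote an MFC program that pulled data from Yahoo to work out which stock the analyst meant, and to judge whether it was being pumped up or quietly sold off. Later I also wrote a futures trading program on the Capital Securities (群益) API. When you have to write every feature yourself, it takes a lot of time. Python is popular these days, not because the language itself is especially powerful, but because so many ready-made packages are available; you can even write Python on a phone now. Since I may not keep doing this on Windows, I want to move to Python and build up my own tools step by step. The first step, naturally, is getting the stock codes and names, which you can take from the stock exchange's daily closing quotes. For individual stocks, Yahoo is more convenient because it has data for stocks worldwide, so there is no need to download the daily closing quotes and store them in a database every day.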
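The code-to-name table is the piece everything else hangs off. Below is a minimal sketch, assuming rows parsed from the TWSE CSV the same way the script below does it (code in column 0, name in column 1, codes wrapped as `="0050"` so spreadsheets keep the leading zeros). `clean_code`, `build_code_table` and `yahoo_symbol` are hypothetical helpers, not part of the original program. Unlike the script, which scans from code 0050 to 9958, this keeps any row of at least 16 columns whose first cell is an ASCII alphanumeric code. The `.TW` / `.TWO` suffixes are, as far as I know, how Yahoo Finance names TWSE and TPEx stocks; verify them before relying on them. `str.isascii` needs Python 3.7 or later.

```python
def clean_code(cell):
    """Strip spaces, '=' and quotes, e.g. '="0050"' -> '0050'."""
    return cell.strip(' ="')


def build_code_table(rows):
    """Hypothetical helper: build a {code: name} dict from parsed CSV rows.

    Only rows with at least 16 columns whose first cell is an ASCII
    alphanumeric code (e.g. '2330') are kept; this skips the Chinese
    header row and the shorter summary rows.
    """
    table = {}
    for row in rows:
        if len(row) < 16:
            continue
        code = clean_code(row[0])
        if code.isascii() and code.isalnum():
            table[code] = clean_code(row[1])
    return table


def yahoo_symbol(code, market):
    """Hypothetical helper: Yahoo Finance ticker for a Taiwan stock.

    Assumed suffixes: '.TW' for TWSE-listed, '.TWO' for TPEx (OTC) stocks.
    """
    suffixes = {'TWSE': '.TW', 'OTC': '.TWO'}
    return code + suffixes[market]
```

After the script has downloaded `srcCSV`, `build_code_table(srcCSV)` gives the lookup table, and `yahoo_symbol('2330', 'TWSE')` returns `'2330.TW'`.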




To start, here is a program that downloads the daily closing quotes; I'll extend it gradually later.


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import csv
import numpy as np
import datetime as dt
import pandas as pd
from datetime import timedelta
import httplib2
from urllib.parse import urlencode

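def twdate(date):
    """Format a date in the ROC (Minguo) calendar used by TWSE and TPEx,
    e.g. date(2017, 1, 12) -> '106/01/12'."""
    year = date.year - 1911   # ROC year 1 = 1912 AD
    return '{}/{:02}/{:02}'.format(year, date.month, date.day)
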
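def downloadTWSE(date):
    """POST the query to TWSE and return the day's closing quotes as CSV text."""
    url = "http://www.twse.com.tw/ch/trading/exchange/MI_INDEX/MI_INDEX.php"
    # selectType 'ALLBUT0999': all securities except the warrant (0999) categories
    values = {'download': 'csv', 'qdate': twdate(date), 'selectType': 'ALLBUT0999'}

    agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0'
    #httplib2.debuglevel = 1
    conn = httplib2.Http('.cache')   # '.cache' is httplib2's on-disk cache directory
    headers = {'Content-type': 'application/x-www-form-urlencoded',
               'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
               'User-Agent': agent}
    resp, content = conn.request(url, 'POST', urlencode(values), headers)
    if resp.status != 200:
        raise RuntimeError('TWSE request failed: HTTP {}'.format(resp.status))
    # the CSV is Big5-encoded; cp950 is Microsoft's Big5 variant
    return content.decode('cp950')
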
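def downloadOTC(date):
    """Read the TPEx (OTC) daily quote table for `date` and return its rows."""
    url = 'http://www.tpex.org.tw/web/stock/aftertrading/otc_quotes_no1430/stk_wn1430_print.php?l=zh-tw&d=' + twdate(date) + '&se=EW'
    table = pd.read_html(url)[0]   # first <table> on the print page
    # leave out the table's last row (not part of the stock list)
    return table.values[:-1]
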

def showStock(stockID, stockName, Open, High, Low, Close, Volume):
    """Print the first few entries of each column as a quick sanity check.
    The caller prints the market name and row count."""
    showLen = 8
    print('ID:', stockID[:showLen])
    print('Name:', stockName[:showLen])
    print('Open:', Open[:showLen])
    print('High:', High[:showLen])
    print('Low:', Low[:showLen])
    print('Close:', Close[:showLen])
    print('Volume:', Volume[:showLen])


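# main
# Quotes for a day are only published after the market closes, and there is
# no data on weekends or holidays. For yesterday: dt.date.today() - timedelta(days=1)
downloadDate = dt.date.today()
# download TWSE
strCSV = downloadTWSE(downloadDate)
srcCSV = list(csv.reader(strCSV.splitlines(), delimiter=','))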

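# Search for the stock list: it runs from code 0050 (first) to 9958 (last).
# TWSE wraps values as ="0050" to keep leading zeros, so strip spaces, '=' and quotes.
firstIndex = 0
lastIndex = 0
for i, row in enumerate(srcCSV):
    if len(row) > 15:   # stock rows have at least 16 columns
        row[0] = row[0].strip(' ="')
        row[1] = row[1].strip(' ="')
        if row[0] == "0050":
            firstIndex = i
        elif row[0] == "9958":
            lastIndex = i + 1
            break
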
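# get result
if lastIndex <= firstIndex:
    # no stock rows found, e.g. a holiday or the data is not published yet
    raise SystemExit('No TWSE quotes found for {}'.format(downloadDate))
result = np.array(srcCSV[firstIndex:lastIndex])
# TWSE columns: 0 code, 1 name, 2 shares traded, 5 open, 6 high, 7 low, 8 close
stockID = result[:, 0]
stockName = result[:, 1]
Open = result[:, 5]
High = result[:, 6]
Low = result[:, 7]
Close = result[:, 8]
Volume = result[:, 2]
print('\nTWSE count=', len(stockID))
showStock(stockID, stockName, Open, High, Low, Close, Volume)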


# download OTC (TPEx)
result = np.array(downloadOTC(downloadDate))
stockID=result[:,0]
stockName=result[:,1]
Open=result[:,4]
High=result[:,5]
Low=result[:,6]
Close=result[:,2]
Volume=result[:,7]
print('\nOTC count=',len(stockID))
showStock(stockID, stockName, Open, High, Low, Close,Volume)
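Every TWSE column the script prints is still a string (numbers can carry thousands separators, and I believe a stock that did not trade shows `--` as its price), so a natural next step is turning the arrays into numbers. A minimal sketch, assuming pandas as in the script; `to_number` and `to_frame` are hypothetical helper names, and anything that does not parse as a number becomes NaN. It also accepts values that `read_html` has already parsed as numbers, as in the OTC table.

```python
import pandas as pd


def to_number(cell):
    """Hypothetical helper: '1,234.50' -> 1234.5; '--', '' or other text -> NaN."""
    text = str(cell).replace(',', '').strip(' ="')
    try:
        return float(text)
    except ValueError:
        return float('nan')


def to_frame(stockID, stockName, Open, High, Low, Close, Volume):
    """Hypothetical helper: combine the parallel arrays into one DataFrame
    indexed by stock code, with price and volume columns as floats."""
    df = pd.DataFrame({'id': stockID, 'name': stockName,
                       'open': Open, 'high': High, 'low': Low,
                       'close': Close, 'volume': Volume})
    for col in ['open', 'high', 'low', 'close', 'volume']:
        df[col] = df[col].map(to_number).astype(float)
    return df.set_index('id')
```

Calling `twse = to_frame(stockID, stockName, Open, High, Low, Close, Volume)` right after the TWSE part (or the OTC part) gives a table you can sort, filter, or save, with one row per stock code.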
 



