Extract Columns from HTML using Python (BeautifulSoup)
I need to extract info from the page http://www.investing.com/currencies/usd-brl-historical-data. I need Date, Price, Open, High, Low, Change %. I'm new to Python and got stuck at this step:
import requests
from bs4 import BeautifulSoup
from datetime import datetime

url = 'http://www.investing.com/currencies/usd-brl-historical-data'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
g_data = soup.find_all('table', {'class': 'gentbl closedtbl historicaltbl'})
d = []
for item in g_data:
    table_values = item.find_all('tr')
    n_rows = len(table_values) - 1  # number of data rows, excluding the header row
    for n in range(n_rows):
        # text of the n-th date cell in this table
        k = item.find_all('td', {'class': 'first left bold nowrap'})[n].text
        print(k)
Here I have several problems:
1. The Price column can be tagged in more than one way. How can I specify that I want the items tagged with class = 'redfont' and/or 'greenfont'? Change % can also have the class redfont or greenfont. The other columns are tagged differently. How can I extract them?
2. Is there a way to extract the columns from the table?
Ideally I would like to have a DataFrame with the columns Date, Price, Open, High, Low, Change %.
Thanks
How to parse the table from that site has already been answered, but since you want a DataFrame, just use pandas.read_html:
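import requests
import pandas as pd

url = 'http://www.investing.com/currencies/usd-brl-historical-data'
r = requests.get(url)
# read_html returns a list of DataFrames; the table with id "curr_table" is the first match
df = pd.read_html(r.content, attrs={'id': 'curr_table'})[0]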
Which will give you:
            Date   Price    Open    High     Low Change %
0   Jun 08, 2016  3.3609  3.4411  3.4465  3.3584   -2.36%
1   Jun 07, 2016  3.4421  3.4885  3.5141  3.4401   -1.36%
2   Jun 06, 2016  3.4896  3.5265  3.5295  3.4840   -1.09%
3   Jun 05, 2016  3.5280  3.5280  3.5280  3.5280    0.11%
4   Jun 03, 2016  3.5240  3.5910  3.5947  3.5212   -1.91%
5   Jun 02, 2016  3.5926  3.6005  3.6157  3.5765   -0.22%
6   Jun 01, 2016  3.6007  3.6080  3.6363  3.5755   -0.29%
7   May 31, 2016  3.6111  3.5700  3.6383  3.5534    1.11%
8   May 30, 2016  3.5713  3.6110  3.6167  3.5675   -1.11%
9   May 27, 2016  3.6115  3.5824  3.6303  3.5792    0.81%
10  May 26, 2016  3.5825  3.5826  3.5857  3.5757   -0.03%
11  May 25, 2016  3.5836  3.5702  3.6218  3.5511    0.34%
12  May 24, 2016  3.5713  3.5717  3.5903  3.5417   -0.04%
13  May 23, 2016  3.5728  3.5195  3.5894  3.5121    1.49%
14  May 20, 2016  3.5202  3.5633  3.5663  3.5154   -1.24%
15  May 19, 2016  3.5644  3.5668  3.6197  3.5503   -0.11%
16  May 18, 2016  3.5683  3.4877  3.5703  3.4854    2.28%
17  May 17, 2016  3.4888  3.4990  3.5300  3.4812   -0.32%
18  May 16, 2016  3.5001  3.5309  3.5366  3.4944   -0.96%
19  May 13, 2016  3.5340  3.4845  3.5345  3.4630    1.39%
20  May 12, 2016  3.4855  3.4514  3.5068  3.4346    0.95%
21  May 11, 2016  3.4528  3.4755  3.4835  3.4389   -0.66%
22  May 10, 2016  3.4758  3.5155  3.5173  3.4623   -1.15%
23  May 09, 2016  3.5164  3.5010  3.6766  3.4906    0.40%
You can normally pass the url directly to read_html, but for this particular site we get a 403 error using urllib2, which is the lib used by read_html, so we need to use requests to get the html.
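If you would rather stay with BeautifulSoup for the class-based part of the question, the following is a minimal sketch of one way to do it, not a tested solution for the live page. The SAMPLE_HTML markup, the class names on each cell and the helper names coloured_cells and table_rows are all assumptions made for illustration; the real page may be structured differently. Passing a list to class_ matches a cell that carries any of the listed classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, trimmed to two rows; the real page may differ.
SAMPLE_HTML = """
<table id="curr_table">
  <tr><th>Date</th><th>Price</th><th>Open</th><th>High</th><th>Low</th><th>Change %</th></tr>
  <tr><td class="first left bold nowrap">Jun 08, 2016</td><td class="redfont">3.3609</td>
      <td>3.4411</td><td>3.4465</td><td>3.3584</td><td class="bold redfont">-2.36%</td></tr>
  <tr><td class="first left bold nowrap">Jun 05, 2016</td><td class="greenfont">3.5280</td>
      <td>3.5280</td><td>3.5280</td><td>3.5280</td><td class="bold greenfont">0.11%</td></tr>
</table>
"""


def coloured_cells(html):
    """Return the text of every <td> whose classes include redfont or greenfont."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.find_all("td", class_=["redfont", "greenfont"])
    return [td.get_text(strip=True) for td in cells]


def table_rows(html, table_id="curr_table"):
    """Return every row of the table as a list of cell strings, header row first."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    rows = []
    for tr in table.find_all("tr"):
        # Collect header and data cells alike, whatever class they carry.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    print(coloured_cells(SAMPLE_HTML))
    for row in table_rows(SAMPLE_HTML):
        print(row)
```

With the rows in hand, pandas.DataFrame(rows[1:], columns=rows[0]) should give the same kind of frame as read_html, although every value stays a string until you convert it.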