Extract Columns from HTML using Python (BeautifulSoup)


I need to extract information from the page http://www.investing.com/currencies/usd-brl-historical-data. I need Date, Price, Open, High, Low and Change %. I'm new to Python and got stuck at this step:

    import requests
    from bs4 import BeautifulSoup
    from datetime import datetime

    url = 'http://www.investing.com/currencies/usd-brl-historical-data'
    r = requests.get(url)

    soup = BeautifulSoup(r.content, 'lxml')

    # Find the historical-data table by its class attribute
    g_data = soup.find_all('table', {'class': 'gentbl closedtbl historicaltbl'})

    d = []

    for item in g_data:
        table_values = item.find_all('tr')
        n_rows = len(table_values) - 1  # skip the header row
        for n in range(n_rows):
            # Only the date cell of each row is extracted so far
            k = item.find_all('td', {'class': 'first left bold nowrap'})[n].text
            print(k)

Here I have several problems:

The Price column can be tagged with class 'redfont' or 'greenfont'. How can I specify that I want the items tagged with class = 'redfont' and/or 'greenfont'? Change % can also have class 'redfont' or 'greenfont'. The other columns are plain td cells without those classes. How can I extract them?

Is there a way to extract all the columns from the table?

Ideally I would like to have a DataFrame with the columns Date, Price, Open, High, Low and Change %.

Thanks

I have already answered how to parse the table from that site elsewhere, but since you want a DataFrame, just use pandas.read_html:

    import requests
    import pandas as pd

    url = 'http://www.investing.com/currencies/usd-brl-historical-data'
    r = requests.get(url)

    # attrs selects the table whose id is "curr_table"; read_html returns a list
    df = pd.read_html(r.content, attrs={'id': 'curr_table'})[0]

Which will give you:

            Date   Price    Open    High     Low Change %
0   Jun 08, 2016  3.3609  3.4411  3.4465  3.3584   -2.36%
1   Jun 07, 2016  3.4421  3.4885  3.5141  3.4401   -1.36%
2   Jun 06, 2016  3.4896  3.5265  3.5295  3.4840   -1.09%
3   Jun 05, 2016  3.5280  3.5280  3.5280  3.5280    0.11%
4   Jun 03, 2016  3.5240  3.5910  3.5947  3.5212   -1.91%
5   Jun 02, 2016  3.5926  3.6005  3.6157  3.5765   -0.22%
6   Jun 01, 2016  3.6007  3.6080  3.6363  3.5755   -0.29%
7   May 31, 2016  3.6111  3.5700  3.6383  3.5534    1.11%
8   May 30, 2016  3.5713  3.6110  3.6167  3.5675   -1.11%
9   May 27, 2016  3.6115  3.5824  3.6303  3.5792    0.81%
10  May 26, 2016  3.5825  3.5826  3.5857  3.5757   -0.03%
11  May 25, 2016  3.5836  3.5702  3.6218  3.5511    0.34%
12  May 24, 2016  3.5713  3.5717  3.5903  3.5417   -0.04%
13  May 23, 2016  3.5728  3.5195  3.5894  3.5121    1.49%
14  May 20, 2016  3.5202  3.5633  3.5663  3.5154   -1.24%
15  May 19, 2016  3.5644  3.5668  3.6197  3.5503   -0.11%
16  May 18, 2016  3.5683  3.4877  3.5703  3.4854    2.28%
17  May 17, 2016  3.4888  3.4990  3.5300  3.4812   -0.32%
18  May 16, 2016  3.5001  3.5309  3.5366  3.4944   -0.96%
19  May 13, 2016  3.5340  3.4845  3.5345  3.4630    1.39%
20  May 12, 2016  3.4855  3.4514  3.5068  3.4346    0.95%
21  May 11, 2016  3.4528  3.4755  3.4835  3.4389   -0.66%
22  May 10, 2016  3.4758  3.5155  3.5173  3.4623   -1.15%
23  May 09, 2016  3.5164  3.5010  3.6766  3.4906    0.40%

Normally you can pass the URL directly to read_html, but for this particular site you get a 403 error from urllib2, which is the library read_html uses, so you need to use requests to fetch the HTML first.
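To try the read_html step without hitting the site, here is a minimal sketch that parses an inline HTML string. The SAMPLE_HTML markup and the parse_history helper are hypothetical stand-ins (the real page's markup may differ); only the table id 'curr_table' and the 'redfont'/'greenfont' class names come from the post. It also shows one possible clean-up of the Date and Change % columns, which is an optional extra rather than part of the answer above.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<table id="curr_table">
  <thead>
    <tr><th>Date</th><th>Price</th><th>Open</th><th>High</th><th>Low</th><th>Change %</th></tr>
  </thead>
  <tbody>
    <tr><td>Jun 08, 2016</td><td class="redfont">3.3609</td><td>3.4411</td>
        <td>3.4465</td><td>3.3584</td><td class="redfont">-2.36%</td></tr>
    <tr><td>Jun 05, 2016</td><td class="greenfont">3.5280</td><td>3.5280</td>
        <td>3.5280</td><td>3.5280</td><td class="greenfont">0.11%</td></tr>
  </tbody>
</table>
"""


def parse_history(html: str) -> pd.DataFrame:
    """Hypothetical helper: parse the table with id='curr_table' into a DataFrame."""
    # read_html ignores cell classes, so redfont/greenfont cells need no special handling.
    df = pd.read_html(StringIO(html), attrs={"id": "curr_table"})[0]
    # Optional clean-up: turn the date strings into timestamps ...
    df["Date"] = pd.to_datetime(df["Date"], format="%b %d, %Y")
    # ... and strip the percent sign so the change column becomes numeric.
    df["Change %"] = df["Change %"].str.rstrip("%").astype(float)
    return df


df = parse_history(SAMPLE_HTML)
print(df)
```

With the real site you would pass the text of the requests response to the helper in place of SAMPLE_HTML. Wrapping the string in StringIO avoids the deprecation warning that newer pandas versions raise for literal HTML input.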

