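I am trying to plot a histogram of the data in a .csv file, but when I run the script it is extremely slow. I waited 20 minutes and still did not get the plot. What is the problem?
Here is my code.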
import pandas as pd
import matplotlib.pyplot as plt
spy = pd.read_csv( 'SPY.csv' )
stock_price_spy = spy.values[ :, 5 ]
n, bins, patches = plt.hist( stock_price_spy, 50 )
plt.show()
I did the following, and it seems to solve the problem.
It seems that "stock_price_spy = spy['Adj Close'].values" gives a real numpy ndarray.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
spy = pd.read_csv( 'SPY.csv' )
stock_price_spy = spy[ 'Adj Close' ].values
plt.hist( stock_price_spy, bins = 100, label = 'S&P 500 ETF', alpha = 0.8 )
plt.show()
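A small sketch of what may be going on here. When a DataFrame mixes string and numeric columns, `spy.values` has to fall back to a single `object` dtype, so a column sliced out of it by position is an array of Python objects rather than floats. Selecting the column by name keeps its `float64` dtype. The frame below is a hypothetical stand-in for SPY.csv with made-up values; whether this is the cause of the 20-minute wait is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for SPY.csv: one string column plus numeric columns.
spy = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06"],
    "Open": [1.0, 2.0, 3.0],
    "High": [1.5, 2.5, 3.5],
    "Low": [0.5, 1.5, 2.5],
    "Close": [1.2, 2.2, 3.2],
    "Adj Close": [1.1, 2.1, 3.1],
    "Volume": [100, 200, 300],
})

# Slicing the whole-frame array by position yields an object-dtype column.
by_position = spy.values[:, 5]
# Selecting the column by name keeps the numeric dtype.
by_name = spy["Adj Close"].values

print(by_position.dtype, by_name.dtype)

# If positional access is needed, converting explicitly restores a float array.
converted = by_position.astype(float)
```

Printing the dtype of the array before passing it to `plt.hist` is a quick way to check which case applies to your own file.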
In fact, you are going about this in a very inefficient way. Use numpy directly to improve performance.
import numpy as np
import matplotlib.pyplot as plt
stock_price_spy = np.loadtxt('SPY.csv', dtype=float, delimiter=',', skiprows=1, usecols=4)
# Only the 5th column of the CSV (index 4) is loaded here, which avoids holding the whole file in memory.
n, bins, patches = plt.hist( stock_price_spy, 50 )
plt.show()
I have not tested it, but it should work.
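One detail worth checking in the snippet above: `usecols` is zero-based, so `usecols=4` is the 5th column, while the question's `spy.values[:, 5]` refers to the 6th. The sketch below uses a hypothetical inline CSV with made-up values and an assumed SPY-like header to show the difference. It also shows that `np.loadtxt` only converts the requested column, so the non-numeric date column does not get in the way.

```python
import io

import numpy as np

# Hypothetical CSV content; the header layout is an assumption about SPY.csv.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,1.0,1.5,0.5,1.2,1.1,100
2020-01-03,2.0,2.5,1.5,2.2,2.1,200
2020-01-06,3.0,3.5,2.5,3.2,3.1,300
"""

# Index 4 is the 5th column ("Close" in this layout).
close = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=4)

# Index 5 is the 6th column ("Adj Close"), matching spy.values[:, 5].
adj_close = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=5)

print(close, adj_close)
```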
I also suggest using Intel's optimized Python distribution, which handles this kind of workload better (Intel python distribution).
import csv
import random
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Create a random CSV file with 6 columns and 4871 rows to simulate the problem.
rows = 4871
columns = 6
fields = ['one', 'two', 'three', 'four', 'five', 'six']
# Use a context manager so the file is flushed and closed before it is read back.
with open("random.csv", "w", newline="") as f:
    write_a_csv = csv.DictWriter(f, fieldnames=fields)
    # Write a header row so that read_csv and skiprows=1 skip names, not data.
    write_a_csv.writeheader()
    for i in range(rows):
        write_a_csv.writerow({name: random.random() for name in fields})

# Old approach: read everything with pandas, then slice the whole-frame array.
# time.clock() was removed in Python 3.8; time.perf_counter() replaces it.
# Note that plt.show() blocks until the window is closed on interactive backends,
# so both timings include however long the plot window stays open.
start_old = time.perf_counter()
spy = pd.read_csv('random.csv')
print(type(spy))
stock_price_spy = spy.values[:, 5]
n, bins, patches = plt.hist(stock_price_spy, 50)
plt.show()
end_old = time.perf_counter()
total_time_old = end_old - start_old
print(total_time_old)

# New approach: load a single column directly with numpy.
start_new = time.perf_counter()
stock_price_spy_new = np.loadtxt('random.csv', dtype=float,
                                 delimiter=',', skiprows=1, usecols=4)
print(type(stock_price_spy_new))
# Only the 5th column of the CSV (index 4) is loaded here, which avoids holding the whole file in memory.
n, bins, patches = plt.hist(stock_price_spy_new, 50)
plt.show()
end_new = time.perf_counter()
total_time_new = end_new - start_new
print(total_time_new)
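Because `plt.show()` blocks until the plot window is closed on interactive backends, the timings above also include however long the window stays open. A rough sketch of a fairer comparison is to time only the loading step and compute the bins with `np.histogram`, which does the same binning without drawing anything. The helper `time_call`, the temporary file path, and the choice of three repeats are all illustrative assumptions.

```python
import csv
import os
import random
import tempfile
import time

import numpy as np
import pandas as pd


def time_call(func, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


# Write a random 6-column CSV with a header row into a temporary directory.
fields = ["one", "two", "three", "four", "five", "six"]
path = os.path.join(tempfile.mkdtemp(), "random.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(fields)
    for _ in range(4871):
        writer.writerow([random.random() for _ in fields])


def load_with_pandas():
    # Select the column by name so the result keeps its float dtype.
    return pd.read_csv(path)["six"].values


def load_with_numpy():
    # Zero-based index 5 is the same 6th column.
    return np.loadtxt(path, delimiter=",", skiprows=1, usecols=5)


t_pandas = time_call(load_with_pandas)
t_numpy = time_call(load_with_numpy)
print("pandas:", t_pandas, "numpy:", t_numpy)

# Bin the data without opening a plot window.
counts, edges = np.histogram(load_with_numpy(), bins=50)
```

Which loader comes out ahead depends on the library versions and the machine, so it is worth running this on your own setup rather than assuming either result.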