python - Theano CSV to pkl file
I am trying to make a pkl file that can be loaded into Theano, using a CSV file as the starting point:
import numpy as np
import csv
import gzip, cPickle
from numpy import genfromtxt
import theano
import theano.tensor as T

# open the csv file and read in the data
csvfile = "filename.csv"
my_data = genfromtxt(csvfile, delimiter=',', skip_header=1)
data_shape = "There are " + repr(my_data.shape[0]) + " samples of vector length " + repr(my_data.shape[1])

num_rows = my_data.shape[0]  # number of data samples
num_cols = my_data.shape[1]  # length of data vector

total_size = (num_cols - 1) * num_rows
data = np.arange(total_size)
data = data.reshape(num_rows, num_cols - 1)  # 2D matrix of data points
data = data.astype('float32')

label = np.arange(num_rows)
print label.shape
#label = label.reshape(num_rows, 1)  # 2D matrix of data points
label = label.astype('float32')
print data.shape

# read through the data file, assuming the label is in the last column
for i in range(my_data.shape[0]):
    label[i] = my_data[i][num_cols - 1]
    for j in range(num_cols - 1):
        data[i][j] = my_data[i][j]

# split the data: 70% train, 10% validation, 20% test
train_num = int(num_rows * 0.7)
val_num = int(num_rows * 0.1)
test_num = int(num_rows * 0.2)

datasetstate = "This dataset has " + repr(data.shape[0]) + " samples of length " + repr(data.shape[1]) + ". The number of training examples is " + repr(train_num)
print datasetstate

train_set_x = data[:train_num]
train_set_y = label[:train_num]
val_set_x = data[train_num + 1:train_num + val_num]
val_set_y = label[train_num + 1:train_num + val_num]
test_set_x = data[train_num + val_num + 1:]
test_set_y = label[train_num + val_num + 1:]

# the dataset is now divided into 3 parts according to the split percentages
train_set = train_set_x, train_set_y
val_set = val_set_x, val_set_y
test_set = test_set_x, val_set_y

dataset = [train_set, val_set, test_set]

f = gzip.open(csvfile + '.pkl.gz', 'wb')
cPickle.dump(dataset, f, protocol=2)
f.close()
When I run the resulting pkl file through Theano (as a DBN or SdA), it pretrains fine, which makes me think the data is stored correctly.
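For reference, the tutorial code consumes the archive roughly like this (a sketch along the lines of the load_data helper in the DeepLearningTutorials' logistic_sgd.py; the filename is the one written above). Note that the labels end up as an int32 view of a shared variable, which matches the int32 vector in the traceback below:

import gzip, cPickle
import numpy
import theano
import theano.tensor as T

f = gzip.open('filename.csv.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()

def shared_dataset(data_xy):
    data_x, data_y = data_xy
    shared_x = theano.shared(numpy.asarray(data_x, dtype=theano.config.floatX))
    shared_y = theano.shared(numpy.asarray(data_y, dtype=theano.config.floatX))
    # labels are used as indices, so hand back an integer view of shared_y
    return shared_x, T.cast(shared_y, 'int32')

test_set_x, test_set_y = shared_dataset(test_set)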
However, when it comes to finetuning I get the following error:
epoch 1, minibatch 2775/2775, validation error 0.000000 %
Traceback (most recent call last):
  File "sda_custom.py", line 489, in <module>
    test_sda()
  File "sda_custom.py", line 463, in test_sda
    test_losses = test_model()
  File "sda_custom.py", line 321, in test_score
    return [test_score_i(i) for i in xrange(n_test_batches)]
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: Input dimension mis-match. (input[0].shape[0] = 10, input[1].shape[0] = 3)
Apply node that caused the error: Elemwise{neq,no_inplace}(argmax, Subtensor{int64:int64:}.0)
Inputs types: [TensorType(int64, vector), TensorType(int32, vector)]
Inputs shapes: [(10,), (3,)]
Inputs strides: [(8,), (4,)]
Inputs values: ['not shown', array([0, 0, 0], dtype=int32)]
Backtrace when the node was created:
  File "/home/dean/documents/deeplearningrepo/deeplearningtutorials-master/code/logistic_sgd.py", line 164, in errors
    return T.mean(T.neq(self.y_pred, y))
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
10 is the size of my batch; if I change to a batch size of 1 I get the following:
ValueError: Input dimension mis-match. (input[0].shape[0] = 1, input[1].shape[0] = 0)
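For what it's worth, mismatched data/label lengths in a split reproduce exactly this shape pattern, because the number of batches is computed from the data array, so the label slices eventually run off the end (a hypothetical illustration with made-up array sizes, not my real data):

import numpy as np

x = np.arange(20, dtype='float32')  # pretend x for a split has 20 samples
y = np.arange(6, dtype='int32')     # but y for that split only has 6 labels
batch_size = 1
for i in [0, 5, 10]:
    # the y slice comes back empty once i * batch_size passes len(y)
    print x[i * batch_size:(i + 1) * batch_size].shape, \
          y[i * batch_size:(i + 1) * batch_size].shape
# -> (1,) (1,)   (1,) (1,)   (1,) (0,)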
I think I am storing the labels wrong when I make the pkl, but I can't seem to spot what is happening or why changing the batch size alters the error.
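One quick way to narrow it down (a minimal sketch, assuming the filename.csv.pkl.gz written above) is to load the archive back and assert that every split has exactly as many labels as data rows:

import gzip, cPickle

with gzip.open('filename.csv.pkl.gz', 'rb') as f:
    train_set, val_set, test_set = cPickle.load(f)

for name, (x, y) in [('train', train_set), ('val', val_set), ('test', test_set)]:
    assert x.shape[0] == y.shape[0], \
        "%s split: %d data rows but %d labels" % (name, x.shape[0], y.shape[0])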
I hope someone can help!
I saw this while looking for an error similar to the one I was getting, so I am posting a reply in case anyone else hits the same thing. For me the error was resolved when I changed n_out from 2 to 1 in the dbn_test() parameter list. n_out is the number of labels rather than the number of output layers.
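If you want to check your own setup, a minimal illustration (with made-up labels) of counting the distinct label values to pick n_out:

import numpy as np

label = np.array([0, 1, 1, 0, 1], dtype='int32')  # hypothetical labels
n_out = len(np.unique(label))                     # 2 distinct classes here
print n_out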