I have a simple data set with a class label, stored as "mydata.csv":

GA_ID   PN_ID   PC_ID   MBP_ID  GR_ID   AP_ID   class
0.033   6.652   6.681   0.194   0.874   3.177     0
0.034   9.039   6.224   0.194   1.137   3.177     0
0.035   10.936  10.304  1.015   0.911   4.9       1
0.022   10.11   9.603   1.374   0.848   4.566     1

I use the code below to convert this data into a NumPy array so that I can use it for prediction and machine learning modeling, but because of the header row it raises "ValueError: could not convert string to float: ". When I remove the header from the file, this method works well:

import numpy as np
#from sklearn import metrics
#from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

raw_data = open("/home/me/Desktop/scklearn/data.csv")
dataset = np.loadtxt(raw_data, delimiter=",")
X = dataset[:,0:6]  # columns 0-5, all six features (0:5 would drop AP_ID)
y = dataset[:,6]

I also tried to skip the header, but an error still occurs:

dataset = np.loadtxt(raw_data, delimiter=",")[1:]
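For reference, slicing with `[1:]` comes too late, because `np.loadtxt` has already tried to parse the header line by the time the slice runs. `np.loadtxt` accepts a `skiprows` argument that skips lines before parsing. A minimal sketch, assuming a comma-delimited copy of the sample data held in an in-memory string (the `csv_text` name and the use of `io.StringIO` are for illustration only, not part of the original code):

```python
import io
import numpy as np

# Hypothetical in-memory copy of the sample data, comma-delimited.
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

# skiprows=1 drops the header line before any float conversion happens.
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

X = dataset[:, 0:6]  # the six feature columns
y = dataset[:, 6]    # the class label column
print(X.shape)
print(y)
```

With a real file, the same call should work with the file path in place of the `io.StringIO` object.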

Then I moved to pandas and was able to import the data this way:

raw_data = pandas.read_csv("/home/me/Desktop/scklearn/data.csv")

But here I got stuck again: when I tried to convert this into a NumPy array, it showed the same error as before.

Is there any method available in pandas that can save the headers as a list:

header_list = ('GA_ID','PN_ID','PC_ID' ,'MBP_ID' ,'GR_ID' , 'AP_ID','class')

and give me the last column as the class label and the remaining part (1:4,0:5) as a NumPy array for model building?

I have written some code to get the column list:

import pandas

clm_list = []
raw_data = pandas.read_csv("/home/me/Desktop/scklearn/data.csv")
clms = raw_data.columns  # columns is an attribute, not a method
for clm in clms:
    clm_list.append(clm)
print(clm_list)  # produces the column list
    
It is unclear what your real problem here is. pandas dataframes are compatible with sklearn interfaces; also, if you don't want to write the header to a csv from pandas, then you can pass param header=None in to_csv – EdChum Apr 7 '15 at 10:51
    
@EdChum Yes, this is true. Actually, my problem is: 1) suppose I pass header=None; after modeling, or at feature-selection time, how would I know the headers, since I skipped them when opening the file? and 2) how can I use the given example data directly from a pandas data frame with scikit-learn, in the form X = (data without header and class label) and y = (class label for predictions)? – jax Apr 7 '15 at 10:55
    
Well, you can do all this in pandas fine; like I said, the sklearn interfaces are compatible with pandas dfs – EdChum Apr 7 '15 at 11:00
    
@EdChum Hi, thanks for the reply. I have solved my problem and written some code, which I have posted as an answer. This code is working well for me. Thanks – jax Apr 7 '15 at 11:47

After reading a lot, I finally achieved what I wanted and successfully used the data with scikit-learn. The code to convert the CSV data into a scikit-learn compatible form is given below. Thanks.

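import pandas as pd
r = pd.read_csv("/home/zebrafish/Desktop/ex.csv")
print(r.values)

# collect the column names (the header row) into a list
clm_list = []
for column in r.columns:
    clm_list.append(column)

# every column except the last is a feature; the last column is the class label
X = r[clm_list[0:len(clm_list)-1]].values
y = r[clm_list[len(clm_list)-1]].values

print(clm_list)
print(X)
print(y)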

The outcome of this code is exactly what I want:

['GA_ID', 'PN_ID', 'PC_ID', 'MBP_ID', 'GR_ID', 'AP_ID', 'class']

[[  0.033   6.652   6.681   0.194   0.874   3.177]
 [  0.034   9.039   6.224   0.194   1.137   3.177]
 [  0.035  10.936  10.304   1.015   0.911   4.9  ]
 [  0.022  10.11    9.603   1.374   0.848   4.566]]

[0 0 1 1]
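The same split can also be written with positional indexing, which avoids building the column list by hand. This is only a sketch, assuming a comma-delimited copy of the sample data in an in-memory string; the names `csv_text`, `df`, and `header_list` are illustrative and not part of the code above:

```python
import io
import pandas as pd

# Hypothetical in-memory copy of the sample data, comma-delimited.
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

df = pd.read_csv(io.StringIO(csv_text))

header_list = list(df.columns)  # column names, kept for later reference
X = df.iloc[:, :-1].values      # every column except the last, as a NumPy array
y = df.iloc[:, -1].values       # the last column (class label), as a NumPy array

print(header_list)
print(X.shape)
print(y)
```

`X` and `y` are plain NumPy arrays here, so they can be passed to an estimator's `fit(X, y)` in the usual way, and `header_list[:-1]` still maps each column of `X` back to its feature name.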
    
You can simplify your column list creation to just this: clm_list = list(r) – EdChum Apr 7 '15 at 11:59
    
Thanks, it works great – jax Apr 8 '15 at 13:00
    
I just copied your code. It ran my Scikit program. Thanks. – Chakra Oct 5 '15 at 11:07
