When working with attributes in classification problems and similar tasks, data is sometimes stored as a one-hot or multi-hot matrix even when the number of attributes is very large. Such a matrix is too large to handle comfortably, so I want to extract only the attributes that are actually present in each row and keep the data in a more compact form. This is the inverse transformation of a multi-hot matrix, and I implemented it with a pandas DataFrame.
First, prepare the data. Each attribute column (a to d) holds 1 when the row has that attribute and 0 otherwise.
```python
import pandas as pd

lis = [['row_0', 1, 0, 0, 0], ['row_1', 1, 1, 0, 0], ['row_2', 0, 0, 0, 1]]
df = pd.DataFrame(lis, columns=['name', 'a', 'b', 'c', 'd'])
print(df)
```
```
    name  a  b  c  d
0  row_0  1  0  0  0
1  row_1  1  1  0  0
2  row_2  0  0  0  1
```
Next, collect the active attributes of each row, join them with "_", and store the result in a single column. This version uses a for loop, which I would like to improve.
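```python
lis = []
for i in range(len(df)):
    # df[df == 1] keeps only the cells equal to 1 and turns all others into NaN,
    # so dropna() leaves just the names of the active attribute columns.
    # .iloc[i] replaces .ix[i], which was deprecated and removed in pandas 1.0.
    active_columns = df[df == 1].iloc[i].dropna().index.tolist()
    lis.append('_'.join(active_columns))
df['attr'] = lis
df = df.drop(['a', 'b', 'c', 'd'], axis=1)
print(df)
```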
```
    name attr
0  row_0    a
1  row_1  a_b
2  row_2    d
```
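One way to drop the explicit index loop is sketched here. It is only a sketch, assuming that the attribute columns are exactly a to d (collected in a hypothetical list named `attr_cols`) and that a value of 1 marks an active attribute. It rebuilds the sample data so that it runs on its own.

```python
import pandas as pd

# Same sample data as above
lis = [['row_0', 1, 0, 0, 0], ['row_1', 1, 1, 0, 0], ['row_2', 0, 0, 0, 1]]
df = pd.DataFrame(lis, columns=['name', 'a', 'b', 'c', 'd'])

# Hypothetical list of the multi-hot attribute columns; 'name' is kept as-is
attr_cols = ['a', 'b', 'c', 'd']

# Boolean matrix: True where the attribute is active (== 1)
active = df[attr_cols].eq(1)

# For each row, keep the column names whose flag is True and join them with '_'.
# A row with no active attribute becomes an empty string.
df['attr'] = active.apply(lambda row: '_'.join(row.index[row.to_numpy()]), axis=1)

df = df.drop(columns=attr_cols)
print(df)
```

Note that `apply(axis=1)` still processes the rows one at a time internally, so this mainly removes the manual index bookkeeping and the intermediate list rather than guaranteeing a speed-up.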