I’m trying to build a decision-tree-style model that takes a series of strings as input.
I’m using WEKA with the J48 classifier and StringToWordVector as a filter.
As far as I know, many classifiers work with numbers rather than strings (regression, for example), and for now I don’t want to map strings to numbers and back.
I’ve created .arff files for the training data and the test data.
Training data:
@relation test
@attribute class-att {OUTPUT_1,OUTPUT_2,OUTPUT_3}
@attribute Text1 string
@attribute Text2 string
@attribute Text3 string
@attribute Text4 string
@attribute Text5 string
@data
OUTPUT_1,'a','b','c','d','e'
OUTPUT_2,'a','b','c','d','?'
OUTPUT_2,'a','b','?','?','?'
OUTPUT_3,'f','g','h','i','j'
OUTPUT_3,'f','g','h','i','?'
% In the rows above, '?' is meant as a wildcard: a regex-style match for any string.
Test data:
@relation test
@attribute class-att {OUTPUT_1,OUTPUT_2,OUTPUT_3}
@attribute Text1 string
@attribute Text2 string
@attribute Text3 string
@attribute Text4 string
@attribute Text5 string
@data
?,'a','b','c','d','e'
?,'a','b','c','d','x'
?,'a','b','q','w','r'
?,'f','g','h','i','j'
?,'f','g','h','i','x'
How can I get the classifier to treat ‘?’ in the training data as a regex-style wildcard that matches any string?
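To make the intended behaviour concrete, here is a minimal sketch in plain Java of what I mean by ‘?’ matching any string. It is not WEKA code and uses no WEKA API: the class `WildcardRows`, the `Pattern` helper and the `classify` method are hypothetical names made up for illustration. It also assumes that when several training rows match, the one with the fewest wildcards should win, which is my own tie-breaking choice.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the wanted semantics; none of this is WEKA API.
class WildcardRows {
    // Marker that stands for "any string" in a training row.
    static final String ANY = "?";

    // One training row: a class label plus its text fields.
    static final class Pattern {
        final String label;
        final String[] fields;

        Pattern(String label, String... fields) {
            this.label = label;
            this.fields = fields;
        }

        // True if every non-wildcard field equals the corresponding field of the row.
        boolean matches(String[] row) {
            if (row.length != fields.length) {
                return false;
            }
            for (int i = 0; i < fields.length; i++) {
                if (!fields[i].equals(ANY) && !fields[i].equals(row[i])) {
                    return false;
                }
            }
            return true;
        }

        // Number of wildcard fields; fewer wildcards means a more specific pattern.
        int wildcardCount() {
            int n = 0;
            for (String f : fields) {
                if (f.equals(ANY)) {
                    n++;
                }
            }
            return n;
        }
    }

    // The training rows from the .arff file above.
    static final List<Pattern> TRAINING = Arrays.asList(
        new Pattern("OUTPUT_1", "a", "b", "c", "d", "e"),
        new Pattern("OUTPUT_2", "a", "b", "c", "d", "?"),
        new Pattern("OUTPUT_2", "a", "b", "?", "?", "?"),
        new Pattern("OUTPUT_3", "f", "g", "h", "i", "j"),
        new Pattern("OUTPUT_3", "f", "g", "h", "i", "?")
    );

    // Returns the label of the most specific matching row, or null if nothing matches.
    // Assumption: "most specific" = fewest wildcards; earlier rows win ties.
    static String classify(String... row) {
        Pattern best = null;
        for (Pattern p : TRAINING) {
            if (p.matches(row)
                    && (best == null || p.wildcardCount() < best.wildcardCount())) {
                best = p;
            }
        }
        return best == null ? null : best.label;
    }

    public static void main(String[] args) {
        // The five test rows from the test .arff file above.
        System.out.println(classify("a", "b", "c", "d", "e"));
        System.out.println(classify("a", "b", "c", "d", "x"));
        System.out.println(classify("a", "b", "q", "w", "r"));
        System.out.println(classify("f", "g", "h", "i", "j"));
        System.out.println(classify("f", "g", "h", "i", "x"));
    }
}
```

Under those assumptions, the five test rows would come out as OUTPUT_1, OUTPUT_2, OUTPUT_2, OUTPUT_3 and OUTPUT_3. That is the result I would like to get from a trained model rather than from hand-written matching.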
Any suggestions would be appreciated.