How can I append to a list/vector in Python so that each element is a comma-separated string of two values?

list
vectors
python

#1

Suppose I have a vector a which contains emails of different users:
a = ('rahul.kumar@gmail.com','akashaggarwal123@gmail.com','ank.mishra@gmail.com','kumar.rahul96@gmail.com','rahulganotra@yahoo.com')

I tried writing code that loops over this vector, identifies similar emails using the Python fuzzywuzzy library, and collects them in a list:

    from fuzzywuzzy import fuzz

    email = []
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if fuzz.token_sort_ratio(a[i], a[j]) > 83:
                email.append(a[i], a[j])

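But this obviously doesn’t work, because append() takes exactly one argument.
I want each pair of similar emails to be appended to email as a single element: one comma-separated string of the two values.

I know there must be some way to do this, I just can’t figure it out. Could anyone help me out with this?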


#2

You can consider two approaches, one which will give duplicate records, and one which will not.

Without duplicate records:

 import pandas as pd
 from fuzzywuzzy import fuzz

 a = list(a)    # a is a tuple in the question; a list is needed to mark entries in place
 groups = []    # One list of similar emails per row of the Dataframe

 for i in range(len(a)):
     if a[i] == -1:     # To manage duplication / emails already checked
         continue
     similar_email = [a[i]]
     for j in range(i + 1, len(a)):
         if a[j] == -1:
             continue
         if fuzz.token_sort_ratio(a[i], a[j]) > 83:
             similar_email.append(a[j])
             a[j] = -1    # So that it's not considered again for similarity check
     groups.append(similar_email)

 # Dataframe that stores all the emails (set_value was removed in pandas 1.0)
 email_df = pd.DataFrame({'Emails': groups})

With duplicate records:

import pandas as pd
from fuzzywuzzy import fuzz

groups = []    # One list per email: the email itself plus any later similar ones
for i in range(len(a)):
    similar_email = [a[i]]
    for j in range(i + 1, len(a)):
        if fuzz.token_sort_ratio(a[i], a[j]) > 83:
            similar_email.append(a[j])
    groups.append(similar_email)

# Build the Dataframe in one step (set_value was removed in pandas 1.0)
email_df = pd.DataFrame({'Emails': groups})
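
If what you need is the format from the question, where each element of the list is one comma-separated string of two similar emails, here is a minimal sketch. Since append() accepts exactly one argument, the pair is joined into one string before it is appended. The names similar_pairs and simple_ratio are made up for illustration, and simple_ratio is only a standard-library stand-in (built on difflib.SequenceMatcher) so the snippet runs without extra installs; it does not score strings the same way fuzzywuzzy does.

```python
from difflib import SequenceMatcher
from itertools import combinations


def simple_ratio(x, y):
    """Hypothetical stand-in scorer: 0-100 similarity using only the stdlib."""
    return 100 * SequenceMatcher(None, x, y).ratio()


def similar_pairs(items, scorer=simple_ratio, threshold=83):
    """Return one 'first,second' string for every pair scoring above threshold."""
    pairs = []
    # combinations() yields each unordered pair once, like the i < j loops
    for first, second in combinations(items, 2):
        if scorer(first, second) > threshold:
            # append() takes a single argument, so join the pair first
            pairs.append(first + "," + second)
    return pairs
```

Assuming fuzzywuzzy is installed, calling similar_pairs(a, scorer=fuzz.token_sort_ratio) would apply the same scorer and threshold as the loop in the question.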

Hoping that this works!