Dataframe rearrangement in R

r
data_wrangling

#1

I have a matrix with 3 columns named X, Y and Z.
I want a data frame containing the sum of Z for values of X and Y.
For example, I have the following matrix:

1 adnaw 72572 0
2 AJH 72572 0
3 atgmdog 72572 0
4 adnaw 72500 4
5 AJH 72500 1
6 Babymooner2 72500 3

The resultant matrix should have the columns as ‘adnaw’, ‘AJH’ etc. and the rows as ‘72572’, '72500’
Can someone help me in doing this?
I have been unable in doing so using the aggregate function in R


#2

@Harshita_Dudhe-you can use row sum to calculate the value of z and use cbind to add column for naming you can use rownames and colnames to give the names of row and column .

Hope this helps!
Regards,
hinduja1234


#3

hello @Harshita_Dudhe,

Is this what you are looking for :-

In that case the code is:

data <- data.frame(name = c('adnaw','AJH','atgmdog','A TripAdvisor Member','AuntSusie006','Babymooner2'),
               x = c(72572,72572,72572,72572,72500,72500),
               y = c(0,0,0,4,1,3))
data <- cbind(data,z = data$x+data$y)
data <- data[,-c(2:3)]
library(reshape2)
t(data)

#4

I am looking for the column names to be name, rownames to be x and the entries in the matrix to be y(or NA if no such NA exists.)
I have changed the data a bit so that it is relevant to the problem.


#5

Harshita, instead of explaining the desired output in words, it’s much better to show the exact output so there is no confusion.

For example,

           V1     V2 V3
1       adnaw 145072  4
2         AJH 145072  1
3     atgmdog  72572  0
4 Babymooner2  72500  3

If not, we could be here for a long time trying to figure out exactly what you are looking for.


#6

I suppose as @Pierre_Lafortune said, you need the aggregated results for each user, but in the transposed form

data <- data.frame(name = c(‘adnaw’,‘AJH’,‘atgmdog’,‘adnaw’,‘AJH’,‘Babymooner2’),x = c(72572,72572,72572,72572,72500,72500),y = c(0,0,0,4,1,3))
a<-ddply(data,~name,summarise,agg.x=sum(x))
b<-ddply(data,~name,summarise,agg.y=sum(y))
c<-merge(a,b,by=“name”,all = TRUE)

Result:
[,1] [,2] [,3] [,4]
name “adnaw” “AJH” “atgmdog” “Babymooner2"
agg.x “145144” “145072” " 72572” " 72500"
agg.y “4” “1” “0” “3”


#7

@sowmiyanm,
My reading of the OP makes me think that summing of x is not required. If I understood her correctly, what she needs is a matrix of name and x, with the entries being total of y corresponding to each pair of name-x (or NA, if there are no ys for any given pair).

To be a little more precise, let’s suppose there are two additional records, 7 and 8.

1 adnaw 72572 0
2 AJH 72572 0 
3 atgmdog 72572 0
4 adnaw 72500 4
5 AJH 72500 1
6 Babymooner2 72500 3
7 adnaw 72500 3
8 AJH 72503 2

Then the entry for adnaw-72572 in the final matrix should be 0; that of adnaw-72500 should be 7, and NA for adnaw-72503.

Here’s one way of getting the equivalent data-frame – well, sort of: you don’t have those NA terms, and it’s not the required matrix.

(You will need to import the dplyr package for this one.)

> data = data.frame(X = c('adnaw','AJH','atgmdog','adnaw','AJH','Babymooner2', 'adnaw', 'AJH'), Y = c(72572,72572,72572,72500,72500,72500,72500,72503), Z = c(0,0,0,4,1,3,3,2))
> data
            X     Y Z
1       adnaw 72572 0
2         AJH 72572 0
3     atgmdog 72572 0
4       adnaw 72500 4
5         AJH 72500 1
6 Babymooner2 72500 3
7       adnaw 72500 3
8         AJH 72503 2
> data = tbl_df(data) #Good practice
> grouped = group_by(data, X, Y)
> final_df = summarise(grouped, total = sum(Z))
> final_df
Source: local data frame [7 x 3]
Groups: X

            X     Y total
1       adnaw 72500     7
2       adnaw 72572     0
3         AJH 72500     1
4         AJH 72503     2
5         AJH 72572     0
6     atgmdog 72572     0
7 Babymooner2 72500     3