@sowmiyanm,
My reading of the OP makes me think that summing x is not required. If I understood her correctly, what she needs is a matrix of name and x, with each entry being the total of y for the corresponding name-x pair (or NA, if there are no y values for a given pair).
To be a little more precise, let’s suppose there are two additional records, 7 and 8.
1 adnaw 72572 0
2 AJH 72572 0
3 atgmdog 72572 0
4 adnaw 72500 4
5 AJH 72500 1
6 Babymooner2 72500 3
7 adnaw 72500 3
8 AJH 72503 2
Then, in the final matrix, the entry for adnaw-72572 should be 0, the entry for adnaw-72500 should be 7 (4 + 3, from records 4 and 7), and the entry for adnaw-72503 should be NA.
Here’s one way of getting an equivalent data frame. Well, sort of: it has no rows for the NA pairs, and it is a long-format data frame rather than the required matrix.
(You will need to load the dplyr package for this one.)
> library(dplyr)
> data = data.frame(X = c('adnaw','AJH','atgmdog','adnaw','AJH','Babymooner2', 'adnaw', 'AJH'), Y = c(72572,72572,72572,72500,72500,72500,72500,72503), Z = c(0,0,0,4,1,3,3,2))
> data
X Y Z
1 adnaw 72572 0
2 AJH 72572 0
3 atgmdog 72572 0
4 adnaw 72500 4
5 AJH 72500 1
6 Babymooner2 72500 3
7 adnaw 72500 3
8 AJH 72503 2
> data = tbl_df(data) #Good practice
> grouped = group_by(data, X, Y)
> final_df = summarise(grouped, total = sum(Z))
> final_df
Source: local data frame [7 x 3]
Groups: X
X Y total
1 adnaw 72500 7
2 adnaw 72572 0
3 AJH 72500 1
4 AJH 72503 2
5 AJH 72572 0
6 atgmdog 72572 0
7 Babymooner2 72500 3
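If the matrix itself is what's needed, NAs included, base R's tapply might be enough. This is only a minimal sketch, assuming the same eight records and the X, Y, Z column names used above; the name totals is just one I picked for illustration.

```r
# Same eight records as above; X = name, Y = x, Z = y.
data <- data.frame(
  X = c("adnaw", "AJH", "atgmdog", "adnaw", "AJH", "Babymooner2", "adnaw", "AJH"),
  Y = c(72572, 72572, 72572, 72500, 72500, 72500, 72500, 72503),
  Z = c(0, 0, 0, 4, 1, 3, 3, 2)
)

# Sum Z within every X-Y pair. tapply returns a matrix with one row per
# name and one column per x value; pairs with no records stay NA.
totals <- tapply(data$Z, list(data$X, data$Y), sum)

totals["adnaw", "72572"]  # 0
totals["adnaw", "72500"]  # 7
totals["adnaw", "72503"]  # NA
```

The column names are the x values converted to character, so cells are looked up with strings such as "72500". No extra package is needed for this version.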