Difference between wide and long data format?

r
data

#1

I am currently studying different data sets, and I came across two types of data format: 1) long and 2) wide. I understand the long format, in which there are a number of instances with many variables and the subject variable can repeat, but I am unable to understand the wide format. I also want to know how they are converted into each other and which R package is used to convert them.


#2

This is a long format:
Product | Attribute | Value
A | Height | 10
A | Width | 5
A | Weight | 2
B | Height | 20
B | Width | 10

The same data in wide format would be:
Product | Height | Width | Weight
A | 10 | 5 | 2
B | 20 | 10 | NA

In R, tidyr (often used together with dplyr) is most commonly used for such transformations.
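
As a minimal sketch of the conversion, assuming tidyr 1.0.0 or later is installed (that release introduced `pivot_wider()` and `pivot_longer()`), the tables above could be reshaped like this. The object names `long`, `wide`, and `long_again` are only illustrative.

```r
library(tidyr)

# Long format: one row per Product/Attribute pair (values from the table above)
long <- data.frame(
  Product   = c("A", "A", "A", "B", "B"),
  Attribute = c("Height", "Width", "Weight", "Height", "Width"),
  Value     = c(10, 5, 2, 20, 10)
)

# Long -> wide: each distinct Attribute becomes its own column.
# Product B has no Weight row, so that cell is filled with NA.
wide <- pivot_wider(long, names_from = Attribute, values_from = Value)

# Wide -> long: stack the attribute columns back into name/value pairs.
# values_drop_na = TRUE removes the NA cell that pivot_wider() introduced.
long_again <- pivot_longer(
  wide,
  cols = c(Height, Width, Weight),
  names_to = "Attribute",
  values_to = "Value",
  values_drop_na = TRUE
)

print(wide)
print(long_again)
```

Dropping the NA on the way back is a choice made for this example so that the round trip returns the original five rows; leave `values_drop_na` at its default if the missing cell should stay visible.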

Thanks


#3

Hi @sid100158

@sonny has aptly explained this concept. However, if you are still facing difficulty, you can think of it this way:

When you think ‘wide’, think ‘horizontal’.
When you think ‘long’, think ‘vertical’.

  1. In wide format, each subject usually has a single row and each category gets its own column, so the categorical data is grouped. You can think of it as a summary of the long data. It is often easier to read and interpret than long format.
  2. In long (vertical) format, every row represents a single observation belonging to a particular category.

For a better understanding, I’d suggest practicing with this cheatsheet on data wrangling in R.

data-wrangling-cheatsheet.pdf (492.3 KB)
Source: RStudio


#4

Hi,

For converting data between wide and long formats in R, you can use the reshape2 package.
Two functions from this package are used:

melt() - converts wide data to long format.
dcast() - converts long data to wide format.

Wide data has a column for each variable,
whereas long-format data has one column for the possible variable names and another column for the values of those variables.
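
The two reshape2 functions can be tried on a small table. This is only a sketch, assuming the reshape2 package is installed; the data frame and the column names `Product`, `Attribute`, and `Value` are made up for illustration.

```r
library(reshape2)

# Wide format: one row per product, one column per measured variable
wide <- data.frame(
  Product = c("A", "B"),
  Height  = c(10, 20),
  Width   = c(5, 10),
  Weight  = c(2, NA)
)

# melt(): wide -> long. Product is kept as the identifier column; every
# other column is stacked into Attribute/Value pairs. na.rm = TRUE drops
# the missing Weight for product B.
long <- melt(
  wide,
  id.vars       = "Product",
  variable.name = "Attribute",
  value.name    = "Value",
  na.rm         = TRUE
)

# dcast(): long -> wide. The formula reads "rows ~ columns".
wide_again <- dcast(long, Product ~ Attribute, value.var = "Value")

print(long)
print(wide_again)
```

Note that R is case-sensitive, so the functions must be called as `melt()` and `dcast()` in lower case.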

ggplot2 generally works best with long-format data, since each aesthetic (x, y, colour, and so on) is mapped to a single column.