Chapter 4 Reshape

From my experience, I feel that understanding and mastering the reshape command is invaluable. This is especially true when working with panel data. Panel data should not be used in a wide format. Panel data should be in a long format where you can use the xtset unit time command. After the xtset is established you can easily add lags or differences with the l.var or d.var

It is also helpful in terms of merging datasets. A lot of time if I’m merging BLS data, such as state unemployment, or inflation rates, I need to reshape the BLS data from wide to long to I can merge m:1

I typically work in long formats, but reshaping to wide formats does have its benefits. I find wide formats are helpful for people who work in excel, especially when years are across columns.

4.1 Wide and Long Datasets

We have two types of datasets: wide and long

Wide Format

Our wide datasets typically include years or variables types spread out across the columns.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_wide, clear
describe

/Users/Sam/Desktop/Econ 645/Data/Mitchell



Contains data from cardio_wide.dta
  obs:             6                          
 vars:            12                          22 Dec 2009 20:43
 size:           120                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
age             byte    %3.0f                 Age of person
bp1             int     %3.0f                 Blood pressure systolic Trial 1
bp2             int     %3.0f                 Blood pressure systolic Trial 2
bp3             int     %3.0f                 Blood pressure systolic Trial 3
bp4             int     %3.0f                 Blood pressure systolic Trial 4
bp5             int     %3.0f                 Blood pressure systolic Trial 5
pl1             int     %3.0f                 Pulse: Trial 1
pl2             byte    %3.0f                 Pulse: Trial 2
pl3             int     %3.0f                 Pulse: Trial 3
pl4             int     %3.0f                 Pulse: Trial 4
pl5             byte    %3.0f                 Pulse: Trial 5
----------------------------------------------------------------------------------------------------------------------
Sorted by:

We have 12 variables idcode, age, and our 5 different blood pressure and pulse trials. We see that are 5 different incidents/trials of blood pressure and pulse are spread out across 5 different variables \(bp1-bp5\) and \(pl1-pl5\)*

list

     | id   age   bp1   bp2   bp3   bp4   bp5   pl1   pl2   pl3   pl4   pl5 |
     |----------------------------------------------------------------------|
  1. |  1    40   115    86   129   105   127    54    87    93    81    92 |
  2. |  2    30   123   136   107   111   120    92    88   125    87    58 |
  3. |  3    16   124   122   101   109   112   105    97   128    57    68 |
  4. |  4    23   105   115   121   129   137    52    79    71   106    39 |
  5. |  5    18   116   128   112   125   111    70    64    52    68    59 |
     |----------------------------------------------------------------------|
  6. |  6    27   108   126   124   131   107    74    78    92    99    80 |
     +----------------------------------------------------------------------+

With a wide format each row is an observation and all of our different trials are located on a single row

Long Format

We’ll look at an example of the same dataset but in long format

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_long, clear
describe

/Users/Sam/Desktop/Econ 645/Data/Mitchell



Contains data from cardio_long.dta
  obs:            30                          
 vars:             5                          10 Feb 2020 23:02
 size:           210                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
trial           byte    %9.0g                 Trial number
age             byte    %3.0f                 Age of person
bp              int     %3.0f                 Blood pressure (systolic)
pl              int     %3.0f                 Pulse
----------------------------------------------------------------------------------------------------------------------
Sorted by: id  trial

We only have 5 variables idcode, age, trial, blood pressure, and pulse.

list

     | id   trial   age    bp    pl |
     |------------------------------|
  1. |  1       1    40   115    54 |
  2. |  1       2    40    86    87 |
  3. |  1       3    40   129    93 |
  4. |  1       4    40   105    81 |
  5. |  1       5    40   127    92 |
     |------------------------------|
  6. |  2       1    30   123    92 |
  7. |  2       2    30   136    88 |
  8. |  2       3    30   107   125 |
  9. |  2       4    30   111    87 |
 10. |  2       5    30   120    58 |
     |------------------------------|
 11. |  3       1    16   124   105 |
 12. |  3       2    16   122    97 |
 13. |  3       3    16   101   128 |
 14. |  3       4    16   109    57 |
 15. |  3       5    16   112    68 |
     |------------------------------|
 16. |  4       1    23   105    52 |
 17. |  4       2    23   115    79 |
 18. |  4       3    23   121    71 |
 19. |  4       4    23   129   106 |
 20. |  4       5    23   137    39 |
     |------------------------------|
 21. |  5       1    18   116    70 |
 22. |  5       2    18   128    64 |
 23. |  5       3    18   112    52 |
 24. |  5       4    18   125    68 |
 25. |  5       5    18   111    59 |
     |------------------------------|
 26. |  6       1    27   108    74 |
 27. |  6       2    27   126    78 |
 28. |  6       3    27   124    92 |
 29. |  6       4    27   131    99 |
 30. |  6       5    27   107    80 |
     +------------------------------+

We also see that each individual has 5 observations. One for each trial that is sorted from 1 to 5.

Notice: We have a cross-sectional unit variable with \(idcode\) and we also have a time component with \(trial\). Our dataset is basically a panel data set. We observe the same person over 5 trials.

What kind of format should we use?

It depends.

Wide Format

Mitchell makes a good point of looking at correlations among the blood pressure trials.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_wide, clear
correlate bp1 bp2 bp3 bp4 bp5

/Users/Sam/Desktop/Econ 645/Data/Mitchell


(obs=6)

             |      bp1      bp2      bp3      bp4      bp5
-------------+---------------------------------------------
         bp1 |   1.0000
         bp2 |   0.2427   1.0000
         bp3 |  -0.7662  -0.6657   1.0000
         bp4 |  -0.7644   0.3980   0.2644   1.0000
         bp5 |  -0.3643  -0.4984   0.3694  -0.0966   1.0000

Mitchell also mentions that multivariate analysis with multiple trials is a bit easier with factor variables. If we wanted to see all of the different regressions for each trial, then

mvreg bp* = age

Equation             Obs   Parms        RMSE    "R-sq"          F        P
--------------------------------------------------------------------------
bp1                    6       2    8.516293    0.0160   .0651388   0.8111
bp2                    6       2    15.34222    0.3858   2.512064   0.1882
bp3                    6       2    8.846404    0.4597   3.402782   0.1388
bp4                    6       2    11.60081    0.1554   .7357661   0.4394
bp5                    6       2    11.81248    0.1360    .629676   0.4719

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
bp1          |
         age |  -.1107266   .4338427    -0.26   0.811    -1.315267    1.093814
       _cons |   118.0087   11.66545    10.12   0.001     85.62017    150.3971
-------------+----------------------------------------------------------------
bp2          |
         age |  -1.238754   .7815736    -1.58   0.188     -3.40875    .9312418
       _cons |    150.628   21.01547     7.17   0.002     92.27974    208.9763
-------------+----------------------------------------------------------------
bp3          |
         age |   .8313149   .4506595     1.84   0.139    -.4199164    2.082546
       _cons |   94.32958   12.11763     7.78   0.001     60.68565    127.9735
-------------+----------------------------------------------------------------
bp4          |
         age |  -.5069204   .5909761    -0.86   0.439    -2.147733    1.133892
       _cons |   131.3443   15.89055     8.27   0.001     87.22504    175.4635
-------------+----------------------------------------------------------------
bp5          |
         age |   .4775087   .6017591     0.79   0.472    -1.193242     2.14826
       _cons |   106.7439    16.1805     6.60   0.003     61.81969    151.6682
------------------------------------------------------------------------------

Long Format

There are several reasons we would want to use long format over wide format

Correlations

If we want correlations between blood pressure and pulse, a long format is preferable.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_long, clear
correlate bp pl

/Users/Sam/Desktop/Econ 645/Data/Mitchell


(obs=30)

             |       bp       pl
-------------+------------------
          bp |   1.0000
          pl |  -0.0444   1.0000

bysort trial: correlate bp pl

-> trial = 1
(obs=6)

             |       bp       pl
-------------+------------------
          bp |   1.0000
          pl |   0.7958   1.0000


----------------------------------------------------------------------------------------------------------------------
-> trial = 2
(obs=6)

             |       bp       pl
-------------+------------------
          bp |   1.0000
          pl |  -0.1985   1.0000


----------------------------------------------------------------------------------------------------------------------
-> trial = 3
(obs=6)

             |       bp       pl
-------------+------------------
          bp |   1.0000
          pl |  -0.4790   1.0000


----------------------------------------------------------------------------------------------------------------------
-> trial = 4
(obs=6)

             |       bp       pl
-------------+------------------
          bp |   1.0000
          pl |   0.5639   1.0000


----------------------------------------------------------------------------------------------------------------------
-> trial = 5
(obs=6)

             |       bp       pl
-------------+------------------
          bp |   1.0000
          pl |  -0.3911   1.0000

Panel data

It is also more appropriate for panel data and the xt commands, such as xtset, xtreg, etc. I also prefer this for regular regressions where we can use factor variables \(i.var\) for different time periods.

xtset id trial
xtreg bp age
xtline bp, overlay

Recoding

It is also easier to recode in long format.

recode pl (min/89=0) (90/max=1), generate (plhi)

Recoding among many variables in wide format is more of a pain You can see this on page 293 in Mitchell.

Egen is easier

If you wanted to mean of each 5 trials, it just as easy with egen in long format.

bysort id: egen pl_avg=mean(pl)
bysort id: egen bp_avg=mean(bp)

For wide-format you can use rowmean with the wildcard

use cardio_wide, clear

We don’t need the bysort group, but we need to be careful which columns to include

egen pl_avg = rowmean(pl*)
egen bp_avg = rowmean(bp*)

It easier to explicitly difference trails with wide-format, but requires more typing.

generate pldiff2 = pl2-pl1
generate pldiff3 = pl3-pl2
generate pldiff4 = pl4-pl3
generate pldiff5 = pl5-pl4

4.1.0.1 Using subscripting/indexing, differences, lags with ease

I prefer using subscripting/indexing within groups or using xtset with panel data since we have a lot of flexibilities.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_long, clear
sort id trial

We can use indexing

by id: generate pldiff = pl-pl[_n-1]

We can difference from the first trial

by id: generate pldiff_1trial = pl-pl[1]

We can use xtset as well with l.var

xtset id trial

       panel variable:  id (strongly balanced)
        time variable:  trial, 1 to 5
                delta:  1 unit

One lag

gen pldiff_other = pl-l.pl

We can use two lags

gen pldiff2_other = l2.pl-l1.pl

We can use three lags

gen pldiff3_other = l3.pl-l2.pl
list

(18 missing values generated)

     +-----------------------------------------------------------------------------------+
     | id   trial   age    bp    pl   pldiff   pldiff~l   ~f_other   p~2_ot~r   p~3_ot~r |
     |-----------------------------------------------------------------------------------|
  1. |  1       1    40   115    54        .          0          .          .          . |
  2. |  1       2    40    86    87       33         33         33          .          . |
  3. |  1       3    40   129    93        6         39          6        -33          . |
  4. |  1       4    40   105    81      -12         27        -12         -6        -33 |
  5. |  1       5    40   127    92       11         38         11         12         -6 |
     |-----------------------------------------------------------------------------------|
  6. |  2       1    30   123    92        .          0          .          .          . |
  7. |  2       2    30   136    88       -4         -4         -4          .          . |
  8. |  2       3    30   107   125       37         33         37          4          . |
  9. |  2       4    30   111    87      -38         -5        -38        -37          4 |
 10. |  2       5    30   120    58      -29        -34        -29         38        -37 |
     |-----------------------------------------------------------------------------------|
 11. |  3       1    16   124   105        .          0          .          .          . |
 12. |  3       2    16   122    97       -8         -8         -8          .          . |
 13. |  3       3    16   101   128       31         23         31          8          . |
 14. |  3       4    16   109    57      -71        -48        -71        -31          8 |
 15. |  3       5    16   112    68       11        -37         11         71        -31 |
     |-----------------------------------------------------------------------------------|
 16. |  4       1    23   105    52        .          0          .          .          . |
 17. |  4       2    23   115    79       27         27         27          .          . |
 18. |  4       3    23   121    71       -8         19         -8        -27          . |
 19. |  4       4    23   129   106       35         54         35          8        -27 |
 20. |  4       5    23   137    39      -67        -13        -67        -35          8 |
     |-----------------------------------------------------------------------------------|
 21. |  5       1    18   116    70        .          0          .          .          . |
 22. |  5       2    18   128    64       -6         -6         -6          .          . |
 23. |  5       3    18   112    52      -12        -18        -12          6          . |
 24. |  5       4    18   125    68       16         -2         16         12          6 |
 25. |  5       5    18   111    59       -9        -11         -9        -16         12 |
     |-----------------------------------------------------------------------------------|
 26. |  6       1    27   108    74        .          0          .          .          . |
 27. |  6       2    27   126    78        4          4          4          .          . |
 28. |  6       3    27   124    92       14         18         14         -4          . |
 29. |  6       4    27   131    99        7         25          7        -14         -4 |
 30. |  6       5    27   107    80      -19          6        -19         -7        -14 |
     +-----------------------------------------------------------------------------------+

Wide format can become unwieldy quickly and Stata does have a variable limit size, while your observation limit is typically your memory.In short, it is better in the long-run to learn to work with Stata in long-format.

However, Mitchell does bring up a good point that in our panel data, we mayneed multiple weights wt1-wtk. You can see this in the ACS PUMS with 250 or soreplicate weights. We do not want to reshape our data by weights, we need to keep the data in a cross-sectional unit i and time period t format.

In short, I agree with Mitchell that you should learn to work with your data in a long-format data structure.

4.2 Reshaping Long to Wide

There may be scenarios where we need to reshape our long-format data into a wide format. For example, it might be easier to match and merge our data If our using data set is in wide format and our master data set is in wide format

Let’s look at an example

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_long, clear
describe 
list

/Users/Sam/Desktop/Econ 645/Data/Mitchell



Contains data from cardio_long.dta
  obs:            30                          
 vars:             5                          10 Feb 2020 23:02
 size:           210                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
trial           byte    %9.0g                 Trial number
age             byte    %3.0f                 Age of person
bp              int     %3.0f                 Blood pressure (systolic)
pl              int     %3.0f                 Pulse
----------------------------------------------------------------------------------------------------------------------
Sorted by: id  trial

     +------------------------------+
     | id   trial   age    bp    pl |
     |------------------------------|
  1. |  1       1    40   115    54 |
  2. |  1       2    40    86    87 |
  3. |  1       3    40   129    93 |
  4. |  1       4    40   105    81 |
  5. |  1       5    40   127    92 |
     |------------------------------|
  6. |  2       1    30   123    92 |
  7. |  2       2    30   136    88 |
  8. |  2       3    30   107   125 |
  9. |  2       4    30   111    87 |
 10. |  2       5    30   120    58 |
     |------------------------------|
 11. |  3       1    16   124   105 |
 12. |  3       2    16   122    97 |
 13. |  3       3    16   101   128 |
 14. |  3       4    16   109    57 |
 15. |  3       5    16   112    68 |
     |------------------------------|
 16. |  4       1    23   105    52 |
 17. |  4       2    23   115    79 |
 18. |  4       3    23   121    71 |
 19. |  4       4    23   129   106 |
 20. |  4       5    23   137    39 |
     |------------------------------|
 21. |  5       1    18   116    70 |
 22. |  5       2    18   128    64 |
 23. |  5       3    18   112    52 |
 24. |  5       4    18   125    68 |
 25. |  5       5    18   111    59 |
     |------------------------------|
 26. |  6       1    27   108    74 |
 27. |  6       2    27   126    78 |
 28. |  6       3    27   124    92 |
 29. |  6       4    27   131    99 |
 30. |  6       5    27   107    80 |
     +------------------------------+

Our reshape command requires reshape 1. What direction: wide or long 2. Variables to reshape (bp and pl) 3. Variables define an observation i(id) 4. Variables that defines the repeated observations for each person orvtime variable j(trial) for reshaping long to wide

Trial values will become the suffix values for our reshape values

reshape wide bp pl, i(id) j(trial)
describe
list

(note: j = 1 2 3 4 5)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       30   ->       6
Number of variables                   5   ->      12
j variable (5 values)             trial   ->   (dropped)
xij variables:
                                     bp   ->   bp1 bp2 ... bp5
                                     pl   ->   pl1 pl2 ... pl5
-----------------------------------------------------------------------------


Contains data
  obs:             6                          
 vars:            12                          
 size:           132                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
bp1             int     %3.0f                 1 bp
pl1             int     %3.0f                 1 pl
bp2             int     %3.0f                 2 bp
pl2             int     %3.0f                 2 pl
bp3             int     %3.0f                 3 bp
pl3             int     %3.0f                 3 pl
bp4             int     %3.0f                 4 bp
pl4             int     %3.0f                 4 pl
bp5             int     %3.0f                 5 bp
pl5             int     %3.0f                 5 pl
age             byte    %3.0f                 Age of person
----------------------------------------------------------------------------------------------------------------------
Sorted by: id
     Note: Dataset has changed since last saved.

     +----------------------------------------------------------------------+
     | id   bp1   pl1   bp2   pl2   bp3   pl3   bp4   pl4   bp5   pl5   age |
     |----------------------------------------------------------------------|
  1. |  1   115    54    86    87   129    93   105    81   127    92    40 |
  2. |  2   123    92   136    88   107   125   111    87   120    58    30 |
  3. |  3   124   105   122    97   101   128   109    57   112    68    16 |
  4. |  4   105    52   115    79   121    71   129   106   137    39    23 |
  5. |  5   116    70   128    64   112    52   125    68   111    59    18 |
     |----------------------------------------------------------------------|
  6. |  6   108    74   126    78   124    92   131    99   107    80    27 |
     +----------------------------------------------------------------------+

Notice we excluded age. It is constant within the cross-sectional unit. It does not define the cross-sectional unit or the*time dimension unit. We do not want to separate it into repeated observations either.

Without doing any additional work, we can reshape it back to long with simply reshape long.

reshape long
describe

(note: j = 1 2 3 4 5)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      30
Number of variables                  12   ->       5
j variable (5 values)                     ->   trial
xij variables:
                        bp1 bp2 ... bp5   ->   bp
                        pl1 pl2 ... pl5   ->   pl
-----------------------------------------------------------------------------


Contains data
  obs:            30                          
 vars:             5                          
 size:           210                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
trial           byte    %9.0g                 Trial number
bp              int     %3.0f                 Blood pressure (systolic)
pl              int     %3.0f                 Pulse
age             byte    %3.0f                 Age of person
----------------------------------------------------------------------------------------------------------------------
Sorted by: id  trial
     Note: Dataset has changed since last saved.

4.3 Reshaping Long to Wide: Problems

Problem 1: A constant value within a cross-sectional unit is not constant

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_long2, clear
list in 1/10

/Users/Sam/Desktop/Econ 645/Data/Mitchell

     +------------------------------+
     | id   trial   age    bp    pl |
     |------------------------------|
  1. |  1       1    40   115    54 |
  2. |  1       2    40    86    87 |
  3. |  1       3    44   129    93 |
  4. |  1       4    40   105    81 |
  5. |  1       5    40   127    92 |
     |------------------------------|
  6. |  2       1    30   123    92 |
  7. |  2       2    30   136    88 |
  8. |  2       3    30   107   125 |
  9. |  2       4    30   111    87 |
 10. |  2       5    30   120    58 |
     +------------------------------+

We see that cross-sectional unit 1 has a data entry problem with age in the 3rd trial. If we reshape, we will get an error

reshape wide bp pl, i(id) j(trial)

(note: j = 1 2 3 4 5)
variable age not constant within id
    Your data are currently long.  You are performing a reshape wide.  You typed something like

        . reshape wide a b, i(id) j(trial)

    There are variables other than a, b, id, trial in your data.  They must be constant within id because that is
    the only way they can fit into wide data without loss of information.

    The variable or variables listed above are not constant within id.  Perhaps the values are in error.  Type
    reshape error for a list of the problem observations.

    Either that, or the values vary because they should vary, in which case you must either add the variables to the
    list of xij variables to be reshaped, or drop them.
r(9);

r(9);

Use the reshape error command to help with the problem

reshape error

It tells us that our constant variable is not constant within cross-sectional units.

Problem 2: Excluding a non-constant that we need to reshape

If we excluded pl from our reshape, we will get a similar error.

capture reshape wide bp, i(id) j(trial)
reshape error

(note: j = 1 2 3 4 5)

i (id) indicates the top-level grouping such as subject id.
j (trial) indicates the subgrouping such as time.
xij variable is bp.
Thus, the following variable(s) should be constant within i:
      age pl

age not constant within i (id) for 1 value of i:

     +------------------+
     | id   trial   age |
     |------------------|
  1. |  1       1    40 |
  2. |  1       2    40 |
  3. |  1       3    44 |
  4. |  1       4    40 |
  5. |  1       5    40 |
     +------------------+

pl not constant within i (id) for 6 values of i:

     +------------------+
     | id   trial    pl |
     |------------------|
  1. |  1       1    54 |
  2. |  1       2    87 |
  3. |  1       3    93 |
  4. |  1       4    81 |
  5. |  1       5    92 |
     |------------------|
  6. |  2       1    92 |
  7. |  2       2    88 |
  8. |  2       3   125 |
  9. |  2       4    87 |
 10. |  2       5    58 |
     |------------------|
 11. |  3       1   105 |
 12. |  3       2    97 |
 13. |  3       3   128 |
 14. |  3       4    57 |
 15. |  3       5    68 |
     |------------------|
 16. |  4       1    52 |
 17. |  4       2    79 |
 18. |  4       3    71 |
 19. |  4       4   106 |
 20. |  4       5    39 |
     |------------------|
 21. |  5       1    70 |
 22. |  5       2    64 |
 23. |  5       3    52 |
 24. |  5       4    68 |
 25. |  5       5    59 |
     |------------------|
 26. |  6       1    74 |
 27. |  6       2    78 |
 28. |  6       3    92 |
 29. |  6       4    99 |
 30. |  6       5    80 |
     +------------------+

(data now sorted by id trial)

4.4 Reshaping Wide to Long

In my personal opinion, this reshaping wide to long is a more common occurance. It is important to get data into a long-format so we can conduct panel analysis. I had to do this for a few of the Wooldridge datasets that were panel data, but in wide format.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_wide, clear
describe
list

/Users/Sam/Desktop/Econ 645/Data/Mitchell



Contains data from cardio_wide.dta
  obs:             6                          
 vars:            12                          22 Dec 2009 20:43
 size:           120                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
age             byte    %3.0f                 Age of person
bp1             int     %3.0f                 Blood pressure systolic Trial 1
bp2             int     %3.0f                 Blood pressure systolic Trial 2
bp3             int     %3.0f                 Blood pressure systolic Trial 3
bp4             int     %3.0f                 Blood pressure systolic Trial 4
bp5             int     %3.0f                 Blood pressure systolic Trial 5
pl1             int     %3.0f                 Pulse: Trial 1
pl2             byte    %3.0f                 Pulse: Trial 2
pl3             int     %3.0f                 Pulse: Trial 3
pl4             int     %3.0f                 Pulse: Trial 4
pl5             byte    %3.0f                 Pulse: Trial 5
----------------------------------------------------------------------------------------------------------------------
Sorted by: 

     +----------------------------------------------------------------------+
     | id   age   bp1   bp2   bp3   bp4   bp5   pl1   pl2   pl3   pl4   pl5 |
     |----------------------------------------------------------------------|
  1. |  1    40   115    86   129   105   127    54    87    93    81    92 |
  2. |  2    30   123   136   107   111   120    92    88   125    87    58 |
  3. |  3    16   124   122   101   109   112   105    97   128    57    68 |
  4. |  4    23   105   115   121   129   137    52    79    71   106    39 |
  5. |  5    18   116   128   112   125   111    70    64    52    68    59 |
     |----------------------------------------------------------------------|
  6. |  6    27   108   126   124   131   107    74    78    92    99    80 |
     +----------------------------------------------------------------------+

Let’s reshape wide to long.

Our reshape values that are not constant: bp and pl.

Our cross-sectional unit to reshape upon or our list of variables that uniquely identify a cross-sectional unit. For example, we may need a State FIPS code and a County FIPS code to unique identify a county cross-sectional unit (of course we can merge the 2-digit State FIPS code and the 3-digit County FIPS code to create a 5-digit FIPS code in one variable).

Our time unit, where with reshape long, we need to create a new variable that takes on the values of the suffixes of bp and pl (For example trialnum). Our constant values within cross-sectional units do not need to be identified.

reshape long bp pl, i(id) j(trialnum)
describe
list in 1/10

(note: j = 1 2 3 4 5)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      30
Number of variables                  12   ->       5
j variable (5 values)                     ->   trialnum
xij variables:
                        bp1 bp2 ... bp5   ->   bp
                        pl1 pl2 ... pl5   ->   pl
-----------------------------------------------------------------------------


Contains data
  obs:            30                          
 vars:             5                          
 size:           210                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
trialnum        byte    %9.0g                 
age             byte    %3.0f                 Age of person
bp              int     %3.0f                 
pl              int     %3.0f                 
----------------------------------------------------------------------------------------------------------------------
Sorted by: id  trialnum
     Note: Dataset has changed since last saved.

     +---------------------------------+
     | id   trialnum   age    bp    pl |
     |---------------------------------|
  1. |  1          1    40   115    54 |
  2. |  1          2    40    86    87 |
  3. |  1          3    40   129    93 |
  4. |  1          4    40   105    81 |
  5. |  1          5    40   127    92 |
     |---------------------------------|
  6. |  2          1    30   123    92 |
  7. |  2          2    30   136    88 |
  8. |  2          3    30   107   125 |
  9. |  2          4    30   111    87 |
 10. |  2          5    30   120    58 |
     +---------------------------------+

To reshape it back to wide, we just simply state

reshape wide
describe
list

(note: j = 1 2 3 4 5)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       30   ->       6
Number of variables                   5   ->      12
j variable (5 values)          trialnum   ->   (dropped)
xij variables:
                                     bp   ->   bp1 bp2 ... bp5
                                     pl   ->   pl1 pl2 ... pl5
-----------------------------------------------------------------------------


Contains data
  obs:             6                          
 vars:            12                          
 size:           132                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
bp1             int     %3.0f                 1 bp
pl1             int     %3.0f                 1 pl
bp2             int     %3.0f                 2 bp
pl2             int     %3.0f                 2 pl
bp3             int     %3.0f                 3 bp
pl3             int     %3.0f                 3 pl
bp4             int     %3.0f                 4 bp
pl4             int     %3.0f                 4 pl
bp5             int     %3.0f                 5 bp
pl5             int     %3.0f                 5 pl
age             byte    %3.0f                 Age of person
----------------------------------------------------------------------------------------------------------------------
Sorted by: id
     Note: Dataset has changed since last saved.

     +----------------------------------------------------------------------+
     | id   bp1   pl1   bp2   pl2   bp3   pl3   bp4   pl4   bp5   pl5   age |
     |----------------------------------------------------------------------|
  1. |  1   115    54    86    87   129    93   105    81   127    92    40 |
  2. |  2   123    92   136    88   107   125   111    87   120    58    30 |
  3. |  3   124   105   122    97   101   128   109    57   112    68    16 |
  4. |  4   105    52   115    79   121    71   129   106   137    39    23 |
  5. |  5   116    70   128    64   112    52   125    68   111    59    18 |
     |----------------------------------------------------------------------|
  6. |  6   108    74   126    78   124    92   131    99   107    80    27 |
     +----------------------------------------------------------------------+

4.5 Reshaping wide to long problems

Problem 1: Failing to list all of the varying variables to be reshaped.

This is a dangerous kind of problem, since it doesn’t throw an error. This requires the user to inspect the data after reshaping to make sure everything worked properly.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_wide, clear
reshape long bp, i(id) j(trialnum)
list

/Users/Sam/Desktop/Econ 645/Data/Mitchell


(note: j = 1 2 3 4 5)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      30
Number of variables                  12   ->       9
j variable (5 values)                     ->   trialnum
xij variables:
                        bp1 bp2 ... bp5   ->   bp
-----------------------------------------------------------------------------

     +---------------------------------------------------------+
     | id   trialnum   age    bp   pl1   pl2   pl3   pl4   pl5 |
     |---------------------------------------------------------|
  1. |  1          1    40   115    54    87    93    81    92 |
  2. |  1          2    40    86    54    87    93    81    92 |
  3. |  1          3    40   129    54    87    93    81    92 |
  4. |  1          4    40   105    54    87    93    81    92 |
  5. |  1          5    40   127    54    87    93    81    92 |
     |---------------------------------------------------------|
  6. |  2          1    30   123    92    88   125    87    58 |
  7. |  2          2    30   136    92    88   125    87    58 |
  8. |  2          3    30   107    92    88   125    87    58 |
  9. |  2          4    30   111    92    88   125    87    58 |
 10. |  2          5    30   120    92    88   125    87    58 |
     |---------------------------------------------------------|
 11. |  3          1    16   124   105    97   128    57    68 |
 12. |  3          2    16   122   105    97   128    57    68 |
 13. |  3          3    16   101   105    97   128    57    68 |
 14. |  3          4    16   109   105    97   128    57    68 |
 15. |  3          5    16   112   105    97   128    57    68 |
     |---------------------------------------------------------|
 16. |  4          1    23   105    52    79    71   106    39 |
 17. |  4          2    23   115    52    79    71   106    39 |
 18. |  4          3    23   121    52    79    71   106    39 |
 19. |  4          4    23   129    52    79    71   106    39 |
 20. |  4          5    23   137    52    79    71   106    39 |
     |---------------------------------------------------------|
 21. |  5          1    18   116    70    64    52    68    59 |
 22. |  5          2    18   128    70    64    52    68    59 |
 23. |  5          3    18   112    70    64    52    68    59 |
 24. |  5          4    18   125    70    64    52    68    59 |
 25. |  5          5    18   111    70    64    52    68    59 |
     |---------------------------------------------------------|
 26. |  6          1    27   108    74    78    92    99    80 |
 27. |  6          2    27   126    74    78    92    99    80 |
 28. |  6          3    27   124    74    78    92    99    80 |
 29. |  6          4    27   131    74    78    92    99    80 |
 30. |  6          5    27   107    74    78    92    99    80 |
     +---------------------------------------------------------+

The remedy is easy: don’t forget all of your variables to be reshaped

use cardio_wide, clear
reshape long bp pl, i(id) j(trialnum)

Problem 2: Be careful when the time-varying value is embedded within the variable name and not at the end.

Typically you will find that the varying variable will have \(var_1, var_2, ..., var_k\) and you simply specify reshape long var, i(id) j(time). But, if the varying value is embedded like t1bp,…,t5bp and t1pl,…,t5pl, then we cannot simply use reshape long tbp tpl, i(id) j(trialnum), since Stata looks for the varying value at the end.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_wide2, clear
describe
capture reshape long tpl tpb, i(id) j(trialnum)
reshape error

/Users/Sam/Desktop/Econ 645/Data/Mitchell



Contains data from cardio_wide2.dta
  obs:             6                          
 vars:            12                          31 Dec 2009 15:46
 size:           120                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
age             byte    %3.0f                 Age of person
t1bp            int     %3.0f                 Blood pressure systolic Trial 1
t2bp            int     %3.0f                 Blood pressure systolic Trial 2
t3bp            int     %3.0f                 Blood pressure systolic Trial 3
t4bp            int     %3.0f                 Blood pressure systolic Trial 4
t5bp            int     %3.0f                 Blood pressure systolic Trial 5
t1pl            int     %3.0f                 Pulse: Trial 1
t2pl            byte    %3.0f                 Pulse: Trial 2
t3pl            int     %3.0f                 Pulse: Trial 3
t4pl            int     %3.0f                 Pulse: Trial 4
t5pl            byte    %3.0f                 Pulse: Trial 5
----------------------------------------------------------------------------------------------------------------------
Sorted by: 


no xij variables found
    You typed something like reshape wide a b, i(i) j(j).
    reshape looked for existing variables named a# and b# but could not find any.  Remember this picture:

         long                                wide
        +---------------+                   +------------------+
        | i   j   a   b |                   | i   a1 a2  b1 b2 |
        |---------------| <--- reshape ---> |------------------|
        | 1   1   1   2 |                   | 1   1   3   2  4 |
        | 1   2   3   4 |                   | 2   5   7   6  8 |
        | 2   1   5   6 |                   +------------------+
        | 2   2   7   8 |
        +---------------+

        long to wide: reshape wide a b, i(i) j(j)    (j existing variable)
        wide to long: reshape long a b, i(i) j(j)    (j    new   variable)

We need the operator @ to indicate to Stata that our varying-values are in that part of the variable names.

reshape long t@pl t@bp, i(id) j(trialnum)
describe
list in 1/10

(note: j = 1 2 3 4 5)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      30
Number of variables                  12   ->       5
j variable (5 values)                     ->   trialnum
xij variables:
                     t1pl t2pl ... t5pl   ->   tpl
                     t1bp t2bp ... t5bp   ->   tbp
-----------------------------------------------------------------------------


Contains data
  obs:            30                          
 vars:             5                          
 size:           210                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
trialnum        byte    %9.0g                 
age             byte    %3.0f                 Age of person
tbp             int     %3.0f                 
tpl             int     %3.0f                 
----------------------------------------------------------------------------------------------------------------------
Sorted by: id  trialnum
     Note: Dataset has changed since last saved.

     +---------------------------------+
     | id   trialnum   age   tbp   tpl |
     |---------------------------------|
  1. |  1          1    40   115    54 |
  2. |  1          2    40    86    87 |
  3. |  1          3    40   129    93 |
  4. |  1          4    40   105    81 |
  5. |  1          5    40   127    92 |
     |---------------------------------|
  6. |  2          1    30   123    92 |
  7. |  2          2    30   136    88 |
  8. |  2          3    30   107   125 |
  9. |  2          4    30   111    87 |
 10. |  2          5    30   120    58 |
     +---------------------------------+

Problem 3: Be specific with which variables to reshape and don’t mistakenly constant variables that have the same prefix as the varying variables

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_wide3, clear
clist bp* pl*, noobs

/Users/Sam/Desktop/Econ 645/Data/Mitchell



bp1  bp2  bp3  bp4  bp5     bp2005  pl1  pl2  pl3  pl4  pl5     pl2005
115   86  129  105  127        112   54   87   93   81   92         81
123  136  107  111  120        119   92   88  125   87   58         90
124  122  101  109  112        113  105   97  128   57   68         91
105  115  121  129  137        121   52   79   71  106   39         69
116  128  112  125  111        118   70   64   52   68   59         62
108  126  124  131  107        119   74   78   92   99   80         84

Our \(bp2005\) and \(pl2005\) are constant variables within our cross-section, but have the same prefix as our value-varying variables.

reshape long bp pl, i(id) j(trialnum)
describe
list in 1/12, sepby(id)

(note: j = 1 2 3 4 5 2005)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      36
Number of variables                  14   ->       5
j variable (6 values)                     ->   trialnum
xij variables:
                     bp1 bp2 ... bp2005   ->   bp
                     pl1 pl2 ... pl2005   ->   pl
-----------------------------------------------------------------------------


Contains data
  obs:            36                          
 vars:             5                          
 size:           432                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
trialnum        int     %9.0g                 
age             byte    %3.0f                 Age of person
bp              float   %9.0g                 
pl              float   %9.0g                 
----------------------------------------------------------------------------------------------------------------------
Sorted by: id  trialnum
     Note: Dataset has changed since last saved.

     +---------------------------------+
     | id   trialnum   age    bp    pl |
     |---------------------------------|
  1. |  1          1    40   115    54 |
  2. |  1          2    40    86    87 |
  3. |  1          3    40   129    93 |
  4. |  1          4    40   105    81 |
  5. |  1          5    40   127    92 |
  6. |  1       2005    40   112    81 |
     |---------------------------------|
  7. |  2          1    30   123    92 |
  8. |  2          2    30   136    88 |
  9. |  2          3    30   107   125 |
 10. |  2          4    30   111    87 |
 11. |  2          5    30   120    58 |
 12. |  2       2005    30   119    90 |
     +---------------------------------+

We did not intend to reshape \(bp2005\) and \(pl2005\).

To rememdy, we need to specify the values within j() with j(trialnum 1-5) to indicate that we only want to reshape suffixes with 1-5.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_wide3, clear
reshape long bp pl, i(id) j(trialnum 1-5)
describe
list in 1/10, sepby(id)

/Users/Sam/Desktop/Econ 645/Data/Mitchell



Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      30
Number of variables                  14   ->       7
j variable (5 values)                     ->   trialnum
xij variables:
                        bp1 bp2 ... bp5   ->   bp
                        pl1 pl2 ... pl5   ->   pl
-----------------------------------------------------------------------------


Contains data
  obs:            30                          
 vars:             7                          
 size:           450                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
trialnum        byte    %9.0g                 
age             byte    %3.0f                 Age of person
bp              int     %3.0f                 
bp2005          float   %9.0g                 
pl              int     %3.0f                 
pl2005          float   %9.0g                 
----------------------------------------------------------------------------------------------------------------------
Sorted by: id  trialnum
     Note: Dataset has changed since last saved.

     +---------------------------------------------------+
     | id   trialnum   age    bp   bp2005    pl   pl2005 |
     |---------------------------------------------------|
  1. |  1          1    40   115      112    54       81 |
  2. |  1          2    40    86      112    87       81 |
  3. |  1          3    40   129      112    93       81 |
  4. |  1          4    40   105      112    81       81 |
  5. |  1          5    40   127      112    92       81 |
     |---------------------------------------------------|
  6. |  2          1    30   123      119    92       90 |
  7. |  2          2    30   136      119    88       90 |
  8. |  2          3    30   107      119   125       90 |
  9. |  2          4    30   111      119    87       90 |
 10. |  2          5    30   120      119    58       90 |
     +---------------------------------------------------+

Problem 4: No id varible to identify the cross-sectional unit

If each wide-format observation is a unique cross-sectional unit, then we can use _n to generate a unique id variable.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_wide3, clear
describe
gen idcode = _n
describe
reshape long bp pl, i(id idcode) j(trialnum 1-5)
describe
list in 1/10, sepby(id)

/Users/Sam/Desktop/Econ 645/Data/Mitchell



Contains data from cardio_wide3.dta
  obs:             6                          
 vars:            14                          31 Dec 2009 15:46
 size:           168                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
age             byte    %3.0f                 Age of person
bp1             int     %3.0f                 Blood pressure systolic Trial 1
bp2             int     %3.0f                 Blood pressure systolic Trial 2
bp3             int     %3.0f                 Blood pressure systolic Trial 3
bp4             int     %3.0f                 Blood pressure systolic Trial 4
bp5             int     %3.0f                 Blood pressure systolic Trial 5
bp2005          float   %9.0g                 
pl1             int     %3.0f                 Pulse: Trial 1
pl2             byte    %3.0f                 Pulse: Trial 2
pl3             int     %3.0f                 Pulse: Trial 3
pl4             int     %3.0f                 Pulse: Trial 4
pl5             byte    %3.0f                 Pulse: Trial 5
pl2005          float   %9.0g                 
----------------------------------------------------------------------------------------------------------------------
Sorted by: 



Contains data from cardio_wide3.dta
  obs:             6                          
 vars:            15                          31 Dec 2009 15:46
 size:           216                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
age             byte    %3.0f                 Age of person
bp1             int     %3.0f                 Blood pressure systolic Trial 1
bp2             int     %3.0f                 Blood pressure systolic Trial 2
bp3             int     %3.0f                 Blood pressure systolic Trial 3
bp4             int     %3.0f                 Blood pressure systolic Trial 4
bp5             int     %3.0f                 Blood pressure systolic Trial 5
bp2005          float   %9.0g                 
pl1             int     %3.0f                 Pulse: Trial 1
pl2             byte    %3.0f                 Pulse: Trial 2
pl3             int     %3.0f                 Pulse: Trial 3
pl4             int     %3.0f                 Pulse: Trial 4
pl5             byte    %3.0f                 Pulse: Trial 5
pl2005          float   %9.0g                 
idcode          double  %10.0g                
----------------------------------------------------------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.


Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      30
Number of variables                  15   ->       8
j variable (5 values)                     ->   trialnum
xij variables:
                        bp1 bp2 ... bp5   ->   bp
                        pl1 pl2 ... pl5   ->   pl
-----------------------------------------------------------------------------


Contains data
  obs:            30                          
 vars:             8                          
 size:           690                          
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id              byte    %3.0f                 ID of person
idcode          double  %10.0g                
trialnum        byte    %9.0g                 
age             byte    %3.0f                 Age of person
bp              int     %3.0f                 
bp2005          float   %9.0g                 
pl              int     %3.0f                 
pl2005          float   %9.0g                 
----------------------------------------------------------------------------------------------------------------------
Sorted by: id  idcode  trialnum
     Note: Dataset has changed since last saved.

     +------------------------------------------------------------+
     | id   idcode   trialnum   age    bp   bp2005    pl   pl2005 |
     |------------------------------------------------------------|
  1. |  1        1          1    40   115      112    54       81 |
  2. |  1        1          2    40    86      112    87       81 |
  3. |  1        1          3    40   129      112    93       81 |
  4. |  1        1          4    40   105      112    81       81 |
  5. |  1        1          5    40   127      112    92       81 |
     |------------------------------------------------------------|
  6. |  2        2          1    30   123      119    92       90 |
  7. |  2        2          2    30   136      119    88       90 |
  8. |  2        2          3    30   107      119   125       90 |
  9. |  2        2          4    30   111      119    87       90 |
 10. |  2        2          5    30   120      119    58       90 |
     +------------------------------------------------------------+

4.6 Multilevel datasets

Mitchell discusses when we have multiple levels of data, such as cross-sectional unit i and time dimension t, where data is level 1 data are constant across time within unit, but vary across cross-sectional units and level 2 data are time-varying data that vary within a cross-sectional units

Sometimes we have multiple levels, such as school district, school, classroom, and student.

If you are interested, please review pages 311-314

4.7 Collapsing datasets

The collapse command can be useful finding statistics at aggregate levels for different groups. You don’t have to use egen but you can if you would like. After using the egen command, such as egen sumvar = sum(var), or egen avgvar = mean(var) the collapse command can be useful.

Let’s say we want to aggregate mean wages by state in the CPS, we can use the egen mean command to find average wages in the state for month m in year y. We can then keep state month year and avgvar for all 50 states for each month and year of interest.

We need to be careful with the collapse command, since your original data are lost from memory unless you use a tempfile or a data frame.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_long, clear
list, sepby(id)

/Users/Sam/Desktop/Econ 645/Data/Mitchell

     +------------------------------+
     | id   trial   age    bp    pl |
     |------------------------------|
  1. |  1       1    40   115    54 |
  2. |  1       2    40    86    87 |
  3. |  1       3    40   129    93 |
  4. |  1       4    40   105    81 |
  5. |  1       5    40   127    92 |
     |------------------------------|
  6. |  2       1    30   123    92 |
  7. |  2       2    30   136    88 |
  8. |  2       3    30   107   125 |
  9. |  2       4    30   111    87 |
 10. |  2       5    30   120    58 |
     |------------------------------|
 11. |  3       1    16   124   105 |
 12. |  3       2    16   122    97 |
 13. |  3       3    16   101   128 |
 14. |  3       4    16   109    57 |
 15. |  3       5    16   112    68 |
     |------------------------------|
 16. |  4       1    23   105    52 |
 17. |  4       2    23   115    79 |
 18. |  4       3    23   121    71 |
 19. |  4       4    23   129   106 |
 20. |  4       5    23   137    39 |
     |------------------------------|
 21. |  5       1    18   116    70 |
 22. |  5       2    18   128    64 |
 23. |  5       3    18   112    52 |
 24. |  5       4    18   125    68 |
 25. |  5       5    18   111    59 |
     |------------------------------|
 26. |  6       1    27   108    74 |
 27. |  6       2    27   126    78 |
 28. |  6       3    27   124    92 |
 29. |  6       4    27   131    99 |
 30. |  6       5    27   107    80 |
     +------------------------------+

Let’s say we want to find mean blood pressure and pluse for each cross-sectional unit, then we can use the egen command or the collapse commands.

sort id
by id: egen meanbp = mean(bp)
by id: egen meanpl = mean(pl)
by id: egen maxbp = max(bp)
by id: egen maxpl = max(bp)
by id: egen minbp = min(bp)
by id: egen minpl = min(bp)
by id: egen medbp = pctile(bp), p(50)
by id: egen medpl = pctile(pl), p(50)

collapse meanbp meanpl maxbp maxpl minbp minpl medbp medpl, by(id)

Or you can just use the collapse command

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_long, clear

collapse (mean) meanbp=bp meanpl=pl (max) maxbp=bp maxpl=pl (min) minbp=bp minpl=pl (p50) medbp=bp medpl=pl, by(id)

/Users/Sam/Desktop/Econ 645/Data/Mitchell

Notice that our data are rounded to the nearest 1, unlike our egen commands, we can format our data to reflect that decimal point to two places instead of rounding up by default

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use cardio_long, clear
format bp* pl* %5.2f
collapse (mean) meanbp=bp meanpl=pl (max) maxbp=bp maxpl=pl (min) minbp=bp minpl=pl (p50) medbp=bp medpl=pl, by(id)

/Users/Sam/Desktop/Econ 645/Data/Mitchell

4.8 Exercise

Pull BLS monthly inflation data, reshape it into long, and merge with CPS