Chapter 5 Changing the order of variables in a dataset

I find the order command useful for rearranging variables, especially when working with panel data. I like to put the cross-sectional unit first (person id, firm id, etc.) and the time period second, so that our N and T sit next to one another.
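As a minimal sketch of that panel layout, here is a toy dataset typed in by hand. The variable names firmid, year, and sales are hypothetical and are not part of survey6.

```stata
* Toy panel: firmid, year, and sales are made-up names for illustration
clear
input year sales firmid
2020 10 1
2021 12 1
2020  7 2
2021  9 2
end

* Put the cross-sectional unit (N) first and the time period (T) second
order firmid year
sort firmid year

* List the variable names in their new order
ds
```

Here order only changes the column positions; sort is a separate step that arranges the rows.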

Let’s pull our survey of graduate students and describe our dataset.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use survey6, clear
describe
/Users/Sam/Desktop/Econ 645/Data/Mitchell

(Survey of graduate students)


Contains data from survey6.dta
  obs:             8                          Survey of graduate students
 vars:             9                          11 Mar 2024 14:40
 size:           416                          (_dta has notes)
----------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------
id              float   %9.0g                 Unique identification variable
gender          float   %9.0g      mf         Gender of student
race            float   %19.0g     racelab  * Race of student
havechild       float   %18.0g     havelab  * Given birth to a child?
ksex            float   %15.0g     mfkid    * Sex of child
bdays           str10   %10s                  Birthday of student
income          float   %12.2fc               Income of student
kidname         str10   %-10s                 Name of child
kbday           double  %td                   
                                            * indicated variables have notes
----------------------------------------------------------------------------------------------
Sorted by: 

We might want to group similar variables together. This is helpful when you have a large dataset with hundreds of variables, such as the CPS.

order id gender race bdays income havechild
describe
Contains data from survey6.dta
  obs:             8                          Survey of graduate students
 vars:             9                          11 Mar 2024 14:40
 size:           416                          (_dta has notes)
----------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------
id              float   %9.0g                 Unique identification variable
gender          float   %9.0g      mf         Gender of student
race            float   %19.0g     racelab  * Race of student
bdays           str10   %10s                  Birthday of student
income          float   %12.2fc               Income of student
havechild       float   %18.0g     havelab  * Given birth to a child?
ksex            float   %15.0g     mfkid    * Sex of child
kidname         str10   %-10s                 Name of child
kbday           double  %td                   
                                            * indicated variables have notes
----------------------------------------------------------------------------------------------
Sorted by: 

Any variables that we leave out of the order command keep their previous relative order and follow the variables that we moved to the left.
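To see this behavior outside of our survey data, here is a small sketch using the auto dataset that ships with Stata. The last option shown at the end is an extra I am adding for illustration; it sends a variable to the end instead of the front.

```stata
sysuse auto, clear

* Original order begins: make price mpg rep78 ...
* Move foreign and mpg to the front; everything else keeps its relative order
order foreign mpg
ds

* The last option moves a variable to the end of the dataset instead
order make, last
ds
```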

With the before() option, we can move one or more variables in front of a specified variable. Let’s move kidname before ksex.

order kidname, before(ksex)
describe
Contains data from survey6.dta
  obs:             8                          Survey of graduate students
 vars:             9                          11 Mar 2024 14:40
 size:           416                          (_dta has notes)
----------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------
id              float   %9.0g                 Unique identification variable
gender          float   %9.0g      mf         Gender of student
race            float   %19.0g     racelab  * Race of student
bdays           str10   %10s                  Birthday of student
income          float   %12.2fc               Income of student
havechild       float   %18.0g     havelab  * Given birth to a child?
kidname         str10   %-10s                 Name of child
ksex            float   %15.0g     mfkid    * Sex of child
kbday           double  %td                   
                                            * indicated variables have notes
----------------------------------------------------------------------------------------------
Sorted by: 
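The order command also has an after() option that mirrors before(). A short sketch with the auto dataset that ships with Stata (not our survey data) shows it moving two variables at once:

```stata
sysuse auto, clear

* Move weight and length so that they sit directly after price
order weight length, after(price)

* The order now begins: make price weight length mpg ...
ds
```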

We can also place newly created variables directly by using the before() and after() options of the generate command.

generate STUDENTVARS = ., before(gender)
generate KIDSVARS = ., after(havechild)
describe
(8 missing values generated)

(8 missing values generated)


Contains data from survey6.dta
  obs:             8                          Survey of graduate students
 vars:            11                          11 Mar 2024 14:40
 size:           544                          (_dta has notes)
----------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------
id              float   %9.0g                 Unique identification variable
STUDENTVARS     double  %10.0g                
gender          float   %9.0g      mf         Gender of student
race            float   %19.0g     racelab  * Race of student
bdays           str10   %10s                  Birthday of student
income          float   %12.2fc               Income of student
havechild       float   %18.0g     havelab  * Given birth to a child?
KIDSVARS        double  %10.0g                
kidname         str10   %-10s                 Name of child
ksex            float   %15.0g     mfkid    * Sex of child
kbday           double  %td                   
                                            * indicated variables have notes
----------------------------------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.
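If your version of Stata does not accept before() or after() on generate, a two-step approach should give the same result: create the variable first and then move it with order. This is a sketch on the auto dataset, and PRICEVARS is a hypothetical placeholder name.

```stata
sysuse auto, clear

* Step 1: create the placeholder variable (it is added at the end)
generate PRICEVARS = .

* Step 2: move it in front of price
order PRICEVARS, before(price)

ds
```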

5.1 Practice

Let’s bring in the CPS: https://www.census.gov/data/datasets/time-series/demo/cps/cps-basic.html

cd "/Users/Sam/Desktop/Econ 645/Data/CPS/"
use jul25pub.dta, clear
  1. Generate a new variable from pemlr called employed where employed = 1 if the individual is employed (present or absent) and employed = 0 if the individual is unemployed. The value should be missing if the individual is not in the labor force.
  2. Label the variable “Currently employed”.
  3. Label the values: 0 “Not employed”, 1 “Employed”, and . “Not in the Labor Force”.
  4. Move the variable after pemlr.
  5. Generate a date that appends hrmonth (month of interview), the string “12”, and hryear4 (year of interview). We use 12 because the week of the 12th is the reference period.
  6. Now format the date so it displays like 07/12/2025.
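One possible approach to these steps is sketched below on a toy dataset rather than the real CPS file, so it runs on its own. It assumes the usual CPS coding of pemlr (1 and 2 employed, 3 and 4 unemployed, 5 to 7 not in the labor force). The names employedlab, datestr, and refdate are hypothetical. As far as I know, label define accepts integers and extended missing values (.a to .z) but not the system missing value, so this sketch leaves the not-in-the-labor-force rows as unlabeled missing.

```stata
* Toy stand-in for the CPS extract (values typed in by hand)
clear
input pemlr hrmonth hryear4
1 7 2025
2 7 2025
3 7 2025
4 7 2025
5 7 2025
end

* Steps 1-2: 1 = employed, 0 = unemployed, missing = not in the labor force
generate employed = 1 if inlist(pemlr, 1, 2)
replace employed = 0 if inlist(pemlr, 3, 4)
label variable employed "Currently employed"

* Step 3: attach value labels to 0 and 1
label define employedlab 0 "Not employed" 1 "Employed"
label values employed employedlab

* Step 4: place the new variable right after pemlr
order employed, after(pemlr)

* Step 5: build a month/12/year string and convert it to a Stata date
generate datestr = string(hrmonth) + "/12/" + string(hryear4)
generate refdate = date(datestr, "MDY")

* Step 6: display as MM/DD/YYYY
format refdate %tdNN/DD/CCYY
list
```

Since hrmonth and hryear4 are numeric here, mdy(hrmonth, 12, hryear4) would build the same date without going through a string.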