Chapter 4 Formatting the display of variables

You will need to format data more often than you might expect. Numbers in the millions or billions are hard to read without commas. The format command controls how our data are displayed; it does not change the stored values.

4.1 Format numerics

Let’s load our survey data, look at the format of income, and list the first 5 observations for id and income.

cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use survey5, clear
describe income
list id income in 1/5
/Users/Sam/Desktop/Econ 645/Data/Mitchell

(Survey of graduate students)

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------
income          float   %9.0g                 Income of student

     +---------------+
     | id     income |
     |---------------|
  1. |  1   10500.93 |
  2. |  2   45234.13 |
  3. |  3    1284355 |
  4. |  4   124313.5 |
  5. |  5   120102.3 |
     +---------------+

The format is %9.0g. Every format starts with %. The 9 is the display width in characters, and g stands for general: Stata decides for us the best way to display the values within that width.

%w.0g means general - w is the width, and Stata finds the best way to show the decimals.

Note: %w.dg will switch to exponential notation if necessary. Also, d is usually set to 0 with g.

%w.df means fixed - w is the width, d is the number of decimals, and f means fixed.

%w.dfc means fixed with commas - w is the width, d is the number of decimals, f means fixed, and c means comma.

%w.0gc means general with commas - w is the width, setting d to 0 means Stata will decide the decimals, g means general, and c means comma.

The manual is helpful for formatting: https://www.stata.com/manuals/dformat.pdf

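Example 1: format v1 %10.0g - A width of 10 characters for v1; Stata decides the decimals.

Example 2: format v2 %4.1f - A width of 4 characters for v2, including the decimal point and 1 decimal place (for example, 12.3).

Example 3: format v3 %6.1fc - A width of 6 characters for v3 with 1 decimal place; the width counts the digits, the decimal point, and any comma.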
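One way to experiment with these formats without changing a dataset is the display command, which accepts a format in front of an expression. This is only a small sketch; the value is borrowed from the income listing above rather than from any new data.

```stata
* Fixed format: width 12 with 2 decimals
display %12.2f 1284354.5
* Fixed format with commas: shows 1,284,354.50
display %12.2fc 1284354.5
* General format with commas: Stata picks the decimals
display %12.0gc 1284354.5
```

Trying a format this way before applying it with format makes it easier to judge whether the width is large enough.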

Let’s get more control over the income format and use the %w.df format. We want a total width of 12 characters with 2 decimal places. The “.” takes one character, which leaves 9 characters to the left of it.

format income %12.2f
list income in 1/5
     |     income |
     |------------|
  1. |   10500.93 |
  2. |   45234.13 |
  3. | 1284354.50 |
  4. |  124313.45 |
  5. |  120102.32 |
     +------------+

Notice that we now can see observation #3’s decimal places.

If we don’t care to see the decimal places, we can hide them (even though they are still there).

format income %7.0f
list income in 1/5
     |  income |
     |---------|
  1. |   10501 |
  2. |   45234 |
  3. | 1284354 |
  4. |  124313 |
  5. |  120102 |
     +---------+
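The %7.0f format only changes what is displayed. As a quick check, sketched here with the display command and explicit subscripting (income[3] refers to observation 3), the stored value should still carry its decimal part and show the same 1284354.50 seen earlier:

```stata
* The variable is displayed without decimals ...
format income %7.0f
* ... but asking for 2 decimals on the stored value still shows them
display %12.2f income[3]
```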

If we want to see one decimal place:

format income %9.1f
list income in 1/5
     |    income |
     |-----------|
  1. |   10500.9 |
  2. |   45234.1 |
  3. | 1284354.5 |
  4. |  124313.5 |
  5. |  120102.3 |
     +-----------+

Now let’s add commas. Each comma takes up one character of width, so we allow two additional characters for the two commas in our largest value. We’ll also show two decimal places.

format income %12.2fc
list income in 1/5
     |       income |
     |--------------|
  1. |    10,500.93 |
  2. |    45,234.13 |
  3. | 1,284,354.50 |
  4. |   124,313.45 |
  5. |   120,102.32 |
     +--------------+
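The general-with-commas format described earlier, %w.0gc, is not demonstrated above. A minimal sketch, assuming the same income variable is still in memory, would be:

```stata
* General format with commas: Stata decides how many decimals to show
format income %12.0gc
* Confirm the new display format, then look at the values
describe income
list income in 1/5
```

This can be convenient when we want commas but do not want to commit to a fixed number of decimals.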

4.2 Format strings

Let’s use the format command with strings.

describe kidname
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------
kidname         str10   %10s                  Name of child

The format is %10s, which displays a (s)tring in a field 10 characters wide, right-justified.

list kidname
     |   kidname |
     |-----------|
  1. |           |
  2. |     Sally |
  3. | Catherine |
  4. |           |
  5. |   Samuell |
     |-----------|
  6. |           |
  7. |     Robin |
  8. |           |
     +-----------+

If we want to left-justify the string, we can add a ‘-’ between the % and the width.

format kidname %-10s
list kidname
     | kidname   |
     |-----------|
  1. |           |
  2. | Sally     |
  3. | Catherine |
  4. |           |
  5. | Samuell   |
     |-----------|
  6. |           |
  7. | Robin     |
  8. |           |
     +-----------+

4.3 Format dates

Dates in Stata are a bit of a pain, so learning how to format the dates will be helpful in the future.

list bdays kbdays
     |      bdays      kbdays |
     |------------------------|
  1. |  1/24/1961             |
  2. |  4/15/1968   4/15/1995 |
  3. |  5/23/1971   8/15/2003 |
  4. |  6/25/1973             |
  5. |  9/22/1981   1/12/1999 |
     |------------------------|
  6. | 10/15/1973             |
  7. |   7/1/1977   5/20/1998 |
  8. |   8/3/1976             |
     +------------------------+

Our birthdays are currently stored as strings in month/day/year form. Let’s generate new variables with the date function. The date function converts a string that looks like a date into a Stata date, but the result still needs to be formatted. The second argument, “MDY”, tells Stata that the string is in month-day-year order.

generate bday = date(bdays, "MDY")
generate kbday = date(kbdays, "MDY")

Let’s list the days.

list bdays bday kbdays kbday
     |      bdays   bday      kbdays   kbday |
     |---------------------------------------|
  1. |  1/24/1961    389                   . |
  2. |  4/15/1968   3027   4/15/1995   12888 |
  3. |  5/23/1971   4160   8/15/2003   15932 |
  4. |  6/25/1973   4924                   . |
  5. |  9/22/1981   7935   1/12/1999   14256 |
     |---------------------------------------|
  6. | 10/15/1973   5036                   . |
  7. |   7/1/1977   6391   5/20/1998   14019 |
  8. |   8/3/1976   6059                   . |
     +---------------------------------------+

The Stata dates are actually stored as the number of days from Jan 1, 1960. This method is convenient for the computer storing and performing date computations, but is difficult for us to read.
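To see the day counting at work, here is a short sketch using display with the mdy() and date() functions. It uses the first birthday from the listing above, 1/24/1961, which is stored as 389.

```stata
* 24 January 1961 as a Stata date: days elapsed since 01jan1960 (389)
display mdy(1, 24, 1961)
* The same conversion from a string, as in the generate commands above (389)
display date("1/24/1961", "MDY")
* Going the other way: show day 389 as a calendar date (24jan1961)
display %td 389
```

Because the dates are plain numbers, subtracting one from another gives the number of days between them.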

Let’s apply the %td format to bday. It displays dates like 01jan2000.

format bday %td 
list bdays bday kbdays kbday
     |      bdays        bday      kbdays   kbday |
     |--------------------------------------------|
  1. |  1/24/1961   24jan1961                   . |
  2. |  4/15/1968   15apr1968   4/15/1995   12888 |
  3. |  5/23/1971   23may1971   8/15/2003   15932 |
  4. |  6/25/1973   25jun1973                   . |
  5. |  9/22/1981   22sep1981   1/12/1999   14256 |
     |--------------------------------------------|
  6. | 10/15/1973   15oct1973                   . |
  7. |   7/1/1977   01jul1977   5/20/1998   14019 |
  8. |   8/3/1976   03aug1976                   . |
     +--------------------------------------------+

Let’s use the %tdNN/DD/YY format. NN shows the month as 01-12 (nn shows 1-12), DD shows the day as 01-31, and YY shows the last two digits of the year.

format bday %tdNN/DD/YY
list bdays bday kbdays kbday
     |      bdays       bday      kbdays   kbday |
     |-------------------------------------------|
  1. |  1/24/1961   01/24/61                   . |
  2. |  4/15/1968   04/15/68   4/15/1995   12888 |
  3. |  5/23/1971   05/23/71   8/15/2003   15932 |
  4. |  6/25/1973   06/25/73                   . |
  5. |  9/22/1981   09/22/81   1/12/1999   14256 |
     |-------------------------------------------|
  6. | 10/15/1973   10/15/73                   . |
  7. |   7/1/1977   07/01/77   5/20/1998   14019 |
  8. |   8/3/1976   08/03/76                   . |
     +-------------------------------------------+

For month names, Mon displays Jan-Dec and Month displays January-December.

format bday %tdMonth/DD/YY 
list bdays bday kbdays kbday
     |      bdays              bday      kbdays   kbday |
     |--------------------------------------------------|
  1. |  1/24/1961     January/24/61                   . |
  2. |  4/15/1968       April/15/68   4/15/1995   12888 |
  3. |  5/23/1971         May/23/71   8/15/2003   15932 |
  4. |  6/25/1973        June/25/73                   . |
  5. |  9/22/1981   September/22/81   1/12/1999   14256 |
     |--------------------------------------------------|
  6. | 10/15/1973     October/15/73                   . |
  7. |   7/1/1977        July/01/77   5/20/1998   14019 |
  8. |   8/3/1976      August/03/76                   . |
     +--------------------------------------------------+

We can get a standard Month DD, YYYY style with the format %tdMonth_DD,CCYY. Here Month is the full name of the month, _ displays a space, DD is the two-digit day, CC is the century, such as 19 or 20, and YY is the two-digit year, such as 88 or 97.

format bday %tdMonth_DD,CCYY
list bdays bday kbdays kbday
     |      bdays                bday      kbdays   kbday |
     |----------------------------------------------------|
  1. |  1/24/1961     January 24,1961                   . |
  2. |  4/15/1968       April 15,1968   4/15/1995   12888 |
  3. |  5/23/1971         May 23,1971   8/15/2003   15932 |
  4. |  6/25/1973        June 25,1973                   . |
  5. |  9/22/1981   September 22,1981   1/12/1999   14256 |
     |----------------------------------------------------|
  6. | 10/15/1973     October 15,1973                   . |
  7. |   7/1/1977        July 01,1977   5/20/1998   14019 |
  8. |   8/3/1976      August 03,1976                   . |
     +----------------------------------------------------+
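The listing above has no space after the comma because the format has no _ there. As a hedged variation, the sketch below adds a second _ after the comma and swaps Month for the abbreviated Mon; the first date should then appear in the style Jan 24, 1961.

```stata
* Abbreviated month name, plus a space after the comma from the extra _
format bday %tdMon_DD,_CCYY
list bday in 1/3
```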

Let’s use a standard format with a four-digit year, but don’t use YYYY. Stata reads it as YY twice, so it just repeats the 2-digit year.

format bday %tdNN/DD/YYYY
list bdays bday kbdays kbday
     |      bdays         bday      kbdays   kbday |
     |---------------------------------------------|
  1. |  1/24/1961   01/24/6161                   . |
  2. |  4/15/1968   04/15/6868   4/15/1995   12888 |
  3. |  5/23/1971   05/23/7171   8/15/2003   15932 |
  4. |  6/25/1973   06/25/7373                   . |
  5. |  9/22/1981   09/22/8181   1/12/1999   14256 |
     |---------------------------------------------|
  6. | 10/15/1973   10/15/7373                   . |
  7. |   7/1/1977   07/01/7777   5/20/1998   14019 |
  8. |   8/3/1976   08/03/7676                   . |
     +---------------------------------------------+

Use %tdNN/DD/CCYY instead to get the four-digit year.

format bday %tdNN/DD/CCYY
list bdays bday kbdays kbday

label variable bday "Date of birth of student"
label variable kbday "Date of birth of child"
     |      bdays         bday      kbdays   kbday |
     |---------------------------------------------|
  1. |  1/24/1961   01/24/1961                   . |
  2. |  4/15/1968   04/15/1968   4/15/1995   12888 |
  3. |  5/23/1971   05/23/1971   8/15/2003   15932 |
  4. |  6/25/1973   06/25/1973                   . |
  5. |  9/22/1981   09/22/1981   1/12/1999   14256 |
     |---------------------------------------------|
  6. | 10/15/1973   10/15/1973                   . |
  7. |   7/1/1977   07/01/1977   5/20/1998   14019 |
  8. |   8/3/1976   08/03/1976                   . |
     +---------------------------------------------+
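Throughout these listings kbday still shows as a raw day count, because only bday was given a date format. Since format accepts a list of variables, a minimal sketch of formatting both at once (assuming both variables are still in memory at this point) is:

```stata
* Apply the same date format to both numeric date variables
format bday kbday %tdNN/DD/CCYY
list bday kbday
```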

bday and bdays hold the same information, as do kbday and kbdays, so we’ll keep only the numeric dates.

drop bdays kbdays
save survey6, replace