Chapter 4 Formatting the display of variables
Formatting data comes up more often than you might expect. Numbers in the millions or billions are hard to read without commas. We can control how our data are displayed with the format command.
4.1 Format numerics
Let’s load our survey data, look at the format of income, and list the first 5 observations of id and income.
cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
use survey5, clear
describe income
list id income in 1/5
/Users/Sam/Desktop/Econ 645/Data/Mitchell
(Survey of graduate students)
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------
income          float   %9.0g                 Income of student
+---------------+
| id income |
|---------------|
1. | 1 10500.93 |
2. | 2 45234.13 |
3. | 3 1284355 |
4. | 4 124313.5 |
5. | 5 120102.3 |
+---------------+
The format is %9.0g. Every format starts with %. The g stands for general: Stata displays the values using a width of nine characters and decides for us the best way to show them.
The manual is helpful for formatting: https://www.stata.com/manuals/dformat.pdf
Example 1: format v1 %10.0g - A width of 10 for v1, with Stata deciding how many decimals to show.
Example 2: format v2 %4.1f - A width of 4 for v2, including the decimal point, with 1 decimal.
Example 3: format v3 %6.1fc - A width of 6 for v3, including the decimal point and any commas, with 1 decimal.
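To try formats like these without changing any variable, a small sketch with display can help, since display accepts a format in front of a value. The numbers here are made up for illustration and are not from the survey data.

```stata
* Illustrative values only (not from survey5)
* General format, width 10
display %10.0g 1234.5678
* Fixed format, width 4 with 1 decimal
display %4.1f 12.34
* Fixed format with the comma option, width 6 with 1 decimal
display %6.1fc 123.45
```

Trying a few values of different sizes is a quick way to see how much width a format really needs.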
Let’s get more control over the income format and use the %w.df format, where w is the total width and d is the number of decimals. We want a total width of 12 with 2 decimal places (%12.2f). The decimal point uses one position, which leaves 9 positions to the left of the “.”
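A sketch of the commands that would give the listing that follows, assuming survey5 is still in memory. The width and decimals come from the description above; restricting the list to the first 5 observations is inferred from the output.

```stata
* Fixed format: total width 12, 2 decimal places
format income %12.2f
list income in 1/5
```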
+------------+
| income |
|------------|
1. | 10500.93 |
2. | 45234.13 |
3. | 1284354.50 |
4. | 124313.45 |
5. | 120102.32 |
+------------+
Notice that we now can see observation #3’s decimal places.
If we don’t care to see the decimal places, we can set the number of decimals to zero. The values are only rounded for display; the decimals are still stored.
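One way to do this is sketched here. The width of 7 is an assumption, chosen because the largest income (1284354) has seven digits; the text does not state the width that was used.

```stata
* Fixed format with no decimals; the width of 7 is assumed
format income %7.0f
list income in 1/5
```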
+---------+
| income |
|---------|
1. | 10501 |
2. | 45234 |
3. | 1284354 |
4. | 124313 |
5. | 120102 |
+---------+
Suppose we want to see one decimal place.
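A possible version of the command, again with an assumed width: 9 positions are enough for the largest value shown, 1284354.5.

```stata
* Fixed format with 1 decimal; the width of 9 is assumed
format income %9.1f
list income in 1/5
```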
+-----------+
| income |
|-----------|
1. | 10500.9 |
2. | 45234.1 |
3. | 1284354.5 |
4. | 124313.5 |
5. | 120102.3 |
+-----------+
Now let’s add commas by appending c to the format. The largest income needs two commas, so the width has to allow two extra characters for them, and we’ll show two decimal places.
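As a sketch, the c suffix goes at the end of the fixed format. The width of 12 is an assumption: it is the number of characters in the largest displayed value, 1,284,354.50, counting both commas and the decimal point.

```stata
* Fixed format with commas and 2 decimals; the width of 12 is assumed
format income %12.2fc
list income in 1/5
```

If the width is too small for a value, widening it is the simplest fix.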
+--------------+
| income |
|--------------|
1. | 10,500.93 |
2. | 45,234.13 |
3. | 1,284,354.50 |
4. | 124,313.45 |
5. | 120,102.32 |
+--------------+
4.2 Format strings
Let’s use the format command with strings.
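The output that follows looks like the result of describe on the kidname variable. A minimal sketch, assuming the same dataset is still in memory:

```stata
* Show the storage type and display format of kidname
describe kidname
```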
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------
kidname         str10   %10s                  Name of child
The format is %10s, which displays a (s)tring in a field 10 characters wide, right-justified.
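To see the right-justified display, listing the variable is enough. This sketch assumes all eight observations are listed, as in the output that follows.

```stata
* List the child names using the current %10s format
list kidname
```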
+-----------+
| kidname |
|-----------|
1. | |
2. | Sally |
3. | Catherine |
4. | |
5. | Samuell |
|-----------|
6. | |
7. | Robin |
8. | |
+-----------+
If we want to left-justify the string, we can add a ‘-’ between the % and the width, as in %-10s.
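A short sketch of the left-justified version, keeping the same width of 10 as the original format:

```stata
* The minus sign left-justifies the string within its 10-character field
format kidname %-10s
list kidname
```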
+-----------+
| kidname |
|-----------|
1. | |
2. | Sally |
3. | Catherine |
4. | |
5. | Samuell |
|-----------|
6. | |
7. | Robin |
8. | |
+-----------+
4.3 Format dates
Dates in Stata are a bit of a pain, so learning how to format the dates will be helpful in the future.
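The starting point is the two birthday variables as they come in the data. A sketch of the listing, assuming bdays and kbdays are the string variables shown in the output that follows:

```stata
* Birthdays of the student (bdays) and of the child (kbdays), stored as strings
list bdays kbdays
```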
+------------------------+
| bdays kbdays |
|------------------------|
1. | 1/24/1961 |
2. | 4/15/1968 4/15/1995 |
3. | 5/23/1971 8/15/2003 |
4. | 6/25/1973 |
5. | 9/22/1981 1/12/1999 |
|------------------------|
6. | 10/15/1973 |
7. | 7/1/1977 5/20/1998 |
8. | 8/3/1976 |
+------------------------+
Our birthdays are currently strings in MM/DD/YYYY form. Let’s generate a new variable with the date() function, which converts a string that holds a date into a Stata date; the result still needs to be formatted. The second argument, “MDY”, tells Stata that the string is in month-day-year order.
Let’s list the days.
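A sketch of the conversion, with the new variable names bday and kbday taken from the column headers in the output that follows:

```stata
* Convert the string dates to Stata dates (days since 01jan1960)
generate bday = date(bdays, "MDY")
generate kbday = date(kbdays, "MDY")
* Empty strings in kbdays become missing values (.)
list bdays bday kbdays kbday
```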
+---------------------------------------+
| bdays bday kbdays kbday |
|---------------------------------------|
1. | 1/24/1961 389 . |
2. | 4/15/1968 3027 4/15/1995 12888 |
3. | 5/23/1971 4160 8/15/2003 15932 |
4. | 6/25/1973 4924 . |
5. | 9/22/1981 7935 1/12/1999 14256 |
|---------------------------------------|
6. | 10/15/1973 5036 . |
7. | 7/1/1977 6391 5/20/1998 14019 |
8. | 8/3/1976 6059 . |
+---------------------------------------+
Stata dates are stored as the number of days since January 1, 1960. This is convenient for storing dates and doing date arithmetic, but it is difficult for us to read.
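The day counts can be checked by hand with display. This sketch uses the mdy() and date() functions on dates taken from the listing above and does not need the dataset.

```stata
* 24 January 1961 as a day count since 01jan1960
display mdy(1, 24, 1961)
* The same count starting from a string, as in bdays
display date("1/24/1961", "MDY")
* Date arithmetic is plain subtraction: days between the two dates in observation 2
display date("4/15/1995", "MDY") - date("4/15/1968", "MDY")
```

The first two lines should print 389, matching observation 1, and the last should print 9861, which is 12888 minus 3027 from observation 2.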
Let’s use the %td format, which displays dates like 01jan2000.
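A sketch of applying the default date format. Only bday is formatted, which is consistent with kbday still showing raw day counts in the output that follows.

```stata
* Default daily date format
format bday %td
list bdays bday kbdays kbday
```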
+--------------------------------------------+
| bdays bday kbdays kbday |
|--------------------------------------------|
1. | 1/24/1961 24jan1961 . |
2. | 4/15/1968 15apr1968 4/15/1995 12888 |
3. | 5/23/1971 23may1971 8/15/2003 15932 |
4. | 6/25/1973 25jun1973 . |
5. | 9/22/1981 22sep1981 1/12/1999 14256 |
|--------------------------------------------|
6. | 10/15/1973 15oct1973 . |
7. | 7/1/1977 01jul1977 5/20/1998 14019 |
8. | 8/3/1976 03aug1976 . |
+--------------------------------------------+
Let’s use the %tdNN/DD/YY format. NN displays the month as 01-12 (nn gives 1-12), DD displays the day as 01-31, and YY displays the last two digits of the year.
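The codes are written directly after %td, with the slashes displayed as typed. A minimal sketch:

```stata
* Two-digit month, day, and year separated by slashes
format bday %tdNN/DD/YY
list bdays bday kbdays kbday
```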
+-------------------------------------------+
| bdays bday kbdays kbday |
|-------------------------------------------|
1. | 1/24/1961 01/24/61 . |
2. | 4/15/1968 04/15/68 4/15/1995 12888 |
3. | 5/23/1971 05/23/71 8/15/2003 15932 |
4. | 6/25/1973 06/25/73 . |
5. | 9/22/1981 09/22/81 1/12/1999 14256 |
|-------------------------------------------|
6. | 10/15/1973 10/15/73 . |
7. | 7/1/1977 07/01/77 5/20/1998 14019 |
8. | 8/3/1976 08/03/76 . |
+-------------------------------------------+
Month names are available too: Mon displays Jan-Dec, and Month displays January-December. Here bday is shown with %tdMonth/DD/YY.
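The exact format is not shown in the text, so this sketch infers it from output such as January/24/61: the full month name, then the day and the 2-digit year.

```stata
* Full month name in place of the numeric month (format inferred from the output)
format bday %tdMonth/DD/YY
list bdays bday kbdays kbday
```

Replacing Month with Mon would give the three-letter abbreviation instead.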
+--------------------------------------------------+
| bdays bday kbdays kbday |
|--------------------------------------------------|
1. | 1/24/1961 January/24/61 . |
2. | 4/15/1968 April/15/68 4/15/1995 12888 |
3. | 5/23/1971 May/23/71 8/15/2003 15932 |
4. | 6/25/1973 June/25/73 . |
5. | 9/22/1981 September/22/81 1/12/1999 14256 |
|--------------------------------------------------|
6. | 10/15/1973 October/15/73 . |
7. | 7/1/1977 July/01/77 5/20/1998 14019 |
8. | 8/3/1976 August/03/76 . |
+--------------------------------------------------+
We can get a standard Month DD, YYYY display with the format %tdMonth_DD,CCYY. Month is the full name of the month, _ displays a blank, DD is the two-digit day, CC is the century (such as 19 or 20), and YY is the 2-digit year (such as 88 or 97).
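Applied to bday, the format named above would look like this sketch:

```stata
* Underscore prints a blank; CCYY prints the full four-digit year
format bday %tdMonth_DD,CCYY
list bdays bday kbdays kbday
```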
+----------------------------------------------------+
| bdays bday kbdays kbday |
|----------------------------------------------------|
1. | 1/24/1961 January 24,1961 . |
2. | 4/15/1968 April 15,1968 4/15/1995 12888 |
3. | 5/23/1971 May 23,1971 8/15/2003 15932 |
4. | 6/25/1973 June 25,1973 . |
5. | 9/22/1981 September 22,1981 1/12/1999 14256 |
|----------------------------------------------------|
6. | 10/15/1973 October 15,1973 . |
7. | 7/1/1977 July 01,1977 5/20/1998 14019 |
8. | 8/3/1976 August 03,1976 . |
+----------------------------------------------------+
It is tempting to write YYYY for a four-digit year, as in %tdNN/DD/YYYY, but don’t: Stata reads YYYY as YY twice, so the 2-digit year is simply repeated.
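To reproduce the problem, a sketch with the format inferred from output such as 01/24/6161:

```stata
* YYYY is read as YY followed by YY, so 1961 displays as 6161
format bday %tdNN/DD/YYYY
list bdays bday kbdays kbday
```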
+---------------------------------------------+
| bdays bday kbdays kbday |
|---------------------------------------------|
1. | 1/24/1961 01/24/6161 . |
2. | 4/15/1968 04/15/6868 4/15/1995 12888 |
3. | 5/23/1971 05/23/7171 8/15/2003 15932 |
4. | 6/25/1973 06/25/7373 . |
5. | 9/22/1981 09/22/8181 1/12/1999 14256 |
|---------------------------------------------|
6. | 10/15/1973 10/15/7373 . |
7. | 7/1/1977 07/01/7777 5/20/1998 14019 |
8. | 8/3/1976 08/03/7676 . |
+---------------------------------------------+
Use %tdNN/DD/CCYY instead for the desired result.
format bday %tdNN/DD/CCYY
list bdays bday kbdays kbday
label variable bday "Date of birth of student"
label variable kbdays "Date of birth of child"
+---------------------------------------------+
| bdays bday kbdays kbday |
|---------------------------------------------|
1. | 1/24/1961 01/24/1961 . |
2. | 4/15/1968 04/15/1968 4/15/1995 12888 |
3. | 5/23/1971 05/23/1971 8/15/2003 15932 |
4. | 6/25/1973 06/25/1973 . |
5. | 9/22/1981 09/22/1981 1/12/1999 14256 |
|---------------------------------------------|
6. | 10/15/1973 10/15/1973 . |
7. | 7/1/1977 07/01/1977 5/20/1998 14019 |
8. | 8/3/1976 08/03/1976 . |
+---------------------------------------------+
bday and bdays are redundant, so we’ll keep only one.