Chapter 2 Ingesting Data
Mitchell Chapter 2
Note when your working directory, I recommend using “/” instead of “" Stata will use both, and the backslash”" often runs into string issues.
2.2 Reading Stata data
We use the use command to read Stata data. Use the clear option to clear the data from memory so you can read the data file. Otherwise, you will get an error
We’ll add the clear option to clear our memory to provide an error
/Users/Sam/Desktop/Econ 645/Data/Mitchell
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
We can read Stata data from websources. From Stata Press website:
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
From NBER website
| lfsr94 stfips sex |
|--------------------------------------------|
1. | Employed-At Work AL 2 |
2. | Employed-At Work AL 1 |
3. | Retired-Not In Labor Force AL 1 |
4. | Retired-Not In Labor Force AL 2 |
5. | Other-Not In Labor Force AL 2 |
|--------------------------------------------|
6. | Disabled-Not In Labor Force AL 1 |
7. | Retired-Not In Labor Force AL 1 |
8. | Retired-Not In Labor Force AL 2 |
9. | Retired-Not In Labor Force AL 2 |
10. | Retired-Not In Labor Force AL 1 |
+--------------------------------------------+
2.3 Reading Stata data with subsets
Subsetting can be a helpful method to read in extremely large data sets, such as American Community Survey (ACS) public use file. We can subset by variables, observations, or both.
2.3.1 Subsetting by columns (variables)
You can read in a subset of Stata data by columns. This can be helpful when you have a large dataset but only need a few columns of data. However, you need to know the variable names. We will need to have a data dictionary in order to do this. The January CPS data dictionary can be found on Census’s website. https://www2.census.gov/programs-surveys/cps/datasets/2025/basic/2025_Basic_CPS_Public_Use_Record_Layout_plus_IO_Code_list.txt
quietly cd "/Users/Sam/Desktop/Data/CPS/"
use gestfips pemlr pternwa using "smalljul25pub.dta", clear
list in 1/3 | gestfips pemlr pternwa |
|----------------------------|
1. | 1 . . |
2. | 1 5 -1 |
3. | 1 6 -1 |
+----------------------------+
If it is a large dataset, I don’t recommend using list by itself. Use in 1/N
2.3.2 Subsetting by rows (observations)
You can read in a subset of data by rows. In my opinion, this is a better method to subset than by columns. You are able to see the entire dataset. You can use a qualifiers if or in
quietly cd "/Users/Sam/Desktop/Data/CPS/"
use "smalljul25pub.dta" if gestfip == 24 & prtage == 30, clear
list in 1/10 1. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 7 | 7 | 2025 | 24 | 5 | 3 | -1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 49 | 1 | 6 | 2 | 40 | 2 | 2 | 2 | 30 | 12 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| 12 | -1 | 1 | 6380 | 4 | -1 | 6 | -1 | 5 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| 23 | -1 | 17 | -1 | 1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 99149562 | -1 | 5510 | 9.401e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
2. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 6 | 7 | 2025 | 24 | 5 | 2 | -1 | 1 | 2 | 1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 51 | 1 | 1 | 2 | 39 | 2 | 4 | 2 | 30 | 52 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| 44 | 8 | 1 | 770 | 4 | -1 | 3 | -1 | 8 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| 4 | -1 | 20 | -1 | 2 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 48285071 | -1 | 7315 | 4.700e+13 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
3. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17012 | 8 | 7 | 2025 | 24 | 5 | 1 | -1 | 1 | 2 | 1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 41 | 1 | 6 | 2 | 42 | 2 | 2 | 2 | 30 | 40 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| 40 | -1 | 1 | 7280 | 4 | -1 | 9 | -1 | 5 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| 36 | -1 | 17 | -1 | 1 | 2 | -1 | 2 | 1 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 113138434 | 183000 | 5120 | 3.411e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
4. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 19011 | 3 | 7 | 2025 | 24 | 5 | 4 | -1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 49 | 1 | 6 | 2 | 39 | 2 | 1 | 2 | 30 | 32 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| 32 | -1 | 1 | 770 | 4 | -1 | 3 | -1 | 10 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| 4 | -1 | 22 | -1 | 1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 44333596 | -1 | 9130 | 9.007e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
5. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 19011 | 2 | 7 | 2025 | 24 | 5 | 1 | -1 | 1 | 2 | 1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 40 | 1 | 1 | 2 | 39 | 2 | 1 | 2 | 30 | 50 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| 50 | -1 | 1 | 170 | 4 | -1 | 1 | -1 | 6 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| 1 | -1 | 18 | -1 | 1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 31435370 | -1 | 6050 | 9.450e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
6. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 5 | 7 | 2025 | 24 | 5 | 1 | -1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 41 | 1 | 6 | 1 | 45 | 2 | 4 | 2 | 30 | 40 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| 40 | -1 | 1 | 7980 | 4 | -1 | 10 | -1 | 2 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| 42 | -1 | 10 | -1 | 1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 1 | 1 | 1 | 75007444 | -1 | 3010 | 4.260e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
7. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 19011 | 3 | 7 | 2025 | 24 | 5 | 2 | 1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 58 | 1 | 6 | 2 | 39 | 2 | 2 | 2 | 30 | -1 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| -1 | -1 | 6 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 0 | 1 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 106522944 | -1 | -1 | 6.604e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
8. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 6 | 7 | 2025 | 24 | 5 | 1 | -1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 40 | 1 | 6 | 2 | 43 | 2 | 1 | 2 | 30 | 43 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| 43 | -1 | 1 | 6481 | 4 | -1 | 7 | -1 | 4 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| 25 | -1 | 16 | -1 | 1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 34698967 | -1 | 4850 | 4.223e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
9. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 19011 | 3 | 7 | 2025 | 24 | 5 | 1 | -1 | 1 | 2 | 1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 41 | 1 | 6 | 1 | 43 | 2 | 1 | 2 | 30 | 50 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| 50 | -1 | 1 | 7071 | 4 | -1 | 8 | -1 | 1 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| 34 | -1 | 2 | -1 | 1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 50143408 | -1 | 800 | 1.913e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
10. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 5 | 7 | 2025 | 24 | 5 | 1 | -1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 40 | 2 | 1 | 2 | 44 | 2 | 1 | 2 | 30 | 5 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| 5 | -1 | 1 | 7460 | 5 | -1 | 9 | -1 | 2 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| 36 | -1 | 3 | -1 | 1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 87833706 | -1 | 1006 | 2.911e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
1. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 8 | 7 | 2025 | 1 | 6 | . | . | . | . | . |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| . | . | . | . | . | . | . | . | . | . |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| . | . | . | . | . | . | . | . | . | . |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| . | . | . | . | . | . | . | . | . | . |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| . | . | . | . | . | . | 1.617e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
2. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 8 | 7 | 2025 | 1 | 6 | 1 | -1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 40 | 2 | 1 | 2 | 43 | 2 | 1 | 2 | 61 | -1 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| -1 | -1 | 5 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 18467177 | -1 | -1 | 1.090e+14 | 0 | 2 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
3. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 8 | 7 | 2025 | 1 | 6 | 2 | 1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 42 | 1 | 1 | 2 | 43 | 1 | 1 | 2 | 61 | -1 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| -1 | -1 | 6 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 27264237 | -1 | -1 | 1.090e+14 | 0 | 2 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
4. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 8 | 7 | 2025 | 1 | 6 | 1 | -1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 40 | 1 | 1 | 2 | 43 | 1 | 1 | 2 | 66 | -1 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| -1 | -1 | 5 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 24488360 | -1 | -1 | 6.101e+14 | 0 | 2 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
5. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 8 | 7 | 2025 | 1 | 6 | 2 | -1 | 1 | 2 | 1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 42 | 2 | 1 | 2 | 39 | 2 | 1 | 2 | 64 | 4 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| 4 | -1 | 1 | 7390 | 4 | -1 | 9 | -1 | 1 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| 36 | -1 | 2 | -1 | 1 | 1 | 4 | 2 | 1 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 18946917 | 8800 | 710 | 6.101e+14 | 0 | 2 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
6. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 7 | 7 | 2025 | 1 | 6 | 1 | -1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 41 | 2 | 3 | 2 | 36 | 2 | 1 | 2 | 63 | -1 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| -1 | -1 | 5 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 20570322 | -1 | -1 | 9.101e+14 | 0 | 2 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
7. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 8 | 7 | 2025 | 1 | 6 | . | . | . | . | . |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| . | . | . | . | . | . | . | . | . | . |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| . | . | . | . | . | . | . | . | . | . |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| . | . | . | . | . | . | . | . | . | . |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| . | . | . | . | . | . | 4.524e+13 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
8. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 8 | 7 | 2025 | 1 | 6 | . | . | . | . | . |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| . | . | . | . | . | . | . | . | . | . |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| . | . | . | . | . | . | . | . | . | . |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| . | . | . | . | . | . | . | . | . | . |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| . | . | . | . | . | . | 1.101e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
9. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 8 | 7 | 2025 | 1 | 6 | 1 | -1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 40 | 2 | 1 | 2 | 44 | 2 | 1 | 2 | 67 | -1 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| -1 | -1 | 5 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 19775533 | -1 | -1 | 5.314e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------------------+
10. | hrhhid2 | hrmis | hrmonth | hryear4 | gestfips | gediv | pulineno | pudis | puiodp1 | puiodp2 | puiodp3 |
| 17011 | 8 | 7 | 2025 | 1 | 6 | 2 | -1 | -1 | -1 | -1 |
|-------------------------------------+----------+---------------------------------------------------------|
| perrp | pesex | pemaritl | pehspnon | peeduca | peafever | ptdtrace | prpertyp | prtage | pehractt |
| 42 | 1 | 1 | 2 | 43 | 2 | 1 | 2 | 68 | -1 |
|------------------------------------------------------------------------------------+---------------------|
| pehract1 | pehract2 | pemlr | peio1icd | peio1cow | peio2cow | prmjind1 | prmjind2 | prmjocc1 | prmjocc2 |
| -1 | -1 | 5 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
|----------+----------+-------------------------------------------------------------------------+----------|
| prdtind1 | prdtind2 | prdtocc1 | prdtocc2 | prsjmj | peernhry | peernhro | peernlab | prerelg | prdisflg |
| -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 0 | 2 |
|----------+----------+----------+-------------------------------------------------------------------------|
| pecert1 | pecert2 | pecert3 | pwcmpwgt | pternwa | ptio1ocd | hrhhid | ptwk | gtmetsta |
| 2 | -1 | -1 | 24488360 | -1 | -1 | 5.314e+14 | 0 | 1 |
+----------------------------------------------------------------------------------------------------------+
Subsetting by row and columns
quietly cd "/Users/Sam/Desktop/Data/CPS/"
use gestfips pemlr pternwa prerelg using "smalljul25pub.dta" if gestfip == 24 & prerelg==1, clear
list in 60/69 | gestfips pemlr prerelg pternwa |
|--------------------------------------|
60. | 24 1 1 132000 |
61. | 24 1 1 84000 |
62. | 24 1 1 108000 |
63. | 24 1 1 80000 |
64. | 24 1 1 163000 |
|--------------------------------------|
65. | 24 1 1 170000 |
66. | 24 1 1 70000 |
67. | 24 1 1 54000 |
68. | 24 1 1 252000 |
69. | 24 1 1 60000 |
+--------------------------------------+
2.3.3 System data
There are data sets that are included in Stata and use the sysuse command.
(1978 Automobile Data)
+-----------------------------------------------------+
| make price mpg rep78 foreign |
|-----------------------------------------------------|
1. | AMC Concord 4,099 22 3 Domestic |
2. | AMC Pacer 4,749 17 3 Domestic |
3. | AMC Spirit 3,799 22 . Domestic |
4. | Buick Century 4,816 20 3 Domestic |
5. | Buick Electra 7,827 15 4 Domestic |
|-----------------------------------------------------|
6. | Buick LeSabre 5,788 18 3 Domestic |
7. | Buick Opel 4,453 26 . Domestic |
8. | Buick Regal 5,189 20 3 Domestic |
9. | Buick Riviera 10,372 16 3 Domestic |
10. | Buick Skylark 4,082 19 3 Domestic |
|-----------------------------------------------------|
11. | Cad. Deville 11,385 14 3 Domestic |
12. | Cad. Eldorado 14,500 14 2 Domestic |
13. | Cad. Seville 15,906 21 3 Domestic |
14. | Chev. Chevette 3,299 29 3 Domestic |
15. | Chev. Impala 5,705 16 4 Domestic |
|-----------------------------------------------------|
16. | Chev. Malibu 4,504 22 3 Domestic |
17. | Chev. Monte Carlo 5,104 22 2 Domestic |
18. | Chev. Monza 3,667 24 2 Domestic |
19. | Chev. Nova 3,955 19 3 Domestic |
20. | Dodge Colt 3,984 30 5 Domestic |
|-----------------------------------------------------|
21. | Dodge Diplomat 4,010 18 2 Domestic |
22. | Dodge Magnum 5,886 16 2 Domestic |
23. | Dodge St. Regis 6,342 17 2 Domestic |
24. | Ford Fiesta 4,389 28 4 Domestic |
25. | Ford Mustang 4,187 21 3 Domestic |
|-----------------------------------------------------|
26. | Linc. Continental 11,497 12 3 Domestic |
27. | Linc. Mark V 13,594 12 3 Domestic |
28. | Linc. Versailles 13,466 14 3 Domestic |
29. | Merc. Bobcat 3,829 22 4 Domestic |
30. | Merc. Cougar 5,379 14 4 Domestic |
|-----------------------------------------------------|
31. | Merc. Marquis 6,165 15 3 Domestic |
32. | Merc. Monarch 4,516 18 3 Domestic |
33. | Merc. XR-7 6,303 14 4 Domestic |
34. | Merc. Zephyr 3,291 20 3 Domestic |
35. | Olds 98 8,814 21 4 Domestic |
|-----------------------------------------------------|
36. | Olds Cutl Supr 5,172 19 3 Domestic |
37. | Olds Cutlass 4,733 19 3 Domestic |
38. | Olds Delta 88 4,890 18 4 Domestic |
39. | Olds Omega 4,181 19 3 Domestic |
40. | Olds Starfire 4,195 24 1 Domestic |
|-----------------------------------------------------|
41. | Olds Toronado 10,371 16 3 Domestic |
42. | Plym. Arrow 4,647 28 3 Domestic |
43. | Plym. Champ 4,425 34 5 Domestic |
44. | Plym. Horizon 4,482 25 3 Domestic |
45. | Plym. Sapporo 6,486 26 . Domestic |
|-----------------------------------------------------|
46. | Plym. Volare 4,060 18 2 Domestic |
47. | Pont. Catalina 5,798 18 4 Domestic |
48. | Pont. Firebird 4,934 18 1 Domestic |
49. | Pont. Grand Prix 5,222 19 3 Domestic |
50. | Pont. Le Mans 4,723 19 3 Domestic |
|-----------------------------------------------------|
51. | Pont. Phoenix 4,424 19 . Domestic |
52. | Pont. Sunbird 4,172 24 2 Domestic |
53. | Audi 5000 9,690 17 5 Foreign |
54. | Audi Fox 6,295 23 3 Foreign |
55. | BMW 320i 9,735 25 4 Foreign |
|-----------------------------------------------------|
56. | Datsun 200 6,229 23 4 Foreign |
57. | Datsun 210 4,589 35 5 Foreign |
58. | Datsun 510 5,079 24 4 Foreign |
59. | Datsun 810 8,129 21 4 Foreign |
60. | Fiat Strada 4,296 21 3 Foreign |
|-----------------------------------------------------|
61. | Honda Accord 5,799 25 5 Foreign |
62. | Honda Civic 4,499 28 4 Foreign |
63. | Mazda GLC 3,995 30 4 Foreign |
64. | Peugeot 604 12,990 14 . Foreign |
65. | Renault Le Car 3,895 26 3 Foreign |
|-----------------------------------------------------|
66. | Subaru 3,798 35 5 Foreign |
67. | Toyota Celica 5,899 18 5 Foreign |
68. | Toyota Corolla 3,748 31 5 Foreign |
69. | Toyota Corona 5,719 18 5 Foreign |
70. | VW Dasher 7,140 23 4 Foreign |
|-----------------------------------------------------|
71. | VW Diesel 5,397 41 5 Foreign |
72. | VW Rabbit 4,697 25 4 Foreign |
73. | VW Scirocco 6,850 25 4 Foreign |
74. | Volvo 260 11,995 17 5 Foreign |
+-----------------------------------------------------+
You can use Stata files that are available, but didn’t ship with Stata and they use the webuse command
(Automobile Models)
+-------------------------------------------+
| make price mpg rep78 foreign |
|-------------------------------------------|
1. | AMC 4099 22 Average Domestic |
2. | AMC 4749 17 Average Domestic |
3. | AMC 3799 22 . Domestic |
4. | Audi 6295 23 Average Foreign |
5. | Audi 9690 17 Excellent Foreign |
+-------------------------------------------+
2.4 Importing Excel files
Many times you will need to grab data from a website and these data may be in an Excel format such as “.xls” or “.xlsx”. Our main command is import excel. We need to specify the firstrow option if we want the headers as variable names. Be careful if the headers are numerics, such as years, in the firstrow. This will throw an error and we’ll need to rename them in Excel before importing these files. For example, 2015 becomes y2015.
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell/"
import excel using "dentists.xls", firstrow clear
list | name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
We need to take caution of the sheet names of the excel files we are importing. If you don’t pay attention you may import the wrong sheet.
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell/"
import excel using "dentists2.xls", firstrow clear
list
import excel using "dentists2.xls", firstrow clear sheet("dentists")
list | name years |
|--------------------|
1. | I. Sue Yoo 3 |
2. | A. Dewey 8 |
3. | B. Cheetem 11 |
4. | C. Howe 21 |
+--------------------+
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
2.5 Importing SAS, SPSS, dbase files
In my opinion, importing files from other statistical packages is a pain. If you are using another statistical software packages like R, SAS, Python, or heaven-forbid SPSS, then export the data from those software package as CSV files. CSV files are fairly universal. There maybe some labeling issues, but if labels are not a concern then use CSV file formats.
Luckily, SAS data files are becoming less common. Even the Census Bureau has seen the light and provides their data in a csv format.
2.5.1 Importing .sas7bdat
For me, it is unavailable in Stata 14, but if you have Stata 16 or higher you can use this command.
Import sas is our main command to import SAS from version 7 or higher.
2.6 Importing Raw Data files
We use the import delimited command to read in csv, txt, dat, etc. files.
2.6.1 Import Delimited
Importing csv files will likely be more common than other types of files so far with the exception of Stata and Excel files. Note: there are other types of delimited files, but comman-delimited are the most common. Other types of delimiters include “tab”, “space”, or “|” delimited.
CSV file If you read the manual you’ll noticed that “,” and “ are the default delimiters Using”,” delimiter:
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
import delimited using "dentists.csv", clear
list
quietly cd "/Users/Sam/Desktop/Data/CPS/"
import delimited using "jul25pub.csv", clear
list gestfips pemlr pternwa in 1/10(4 vars, 5 obs)
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
(393 vars, 122,252 obs)
+----------------------------+
| gestfips pemlr pternwa |
|----------------------------|
1. | 1 . . |
2. | 1 5 -1 |
3. | 1 6 -1 |
4. | 1 5 -1 |
5. | 1 1 8800 |
|----------------------------|
6. | 1 5 -1 |
7. | 1 . . |
8. | 1 . . |
9. | 1 5 -1 |
10. | 1 5 -1 |
+----------------------------+
CSV files may come in .txt files or .csv files using “,” delimiter
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
import delimited using "dentists1.txt", clear
list(4 vars, 5 obs)
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
There may be “tab”-delimited files that use “ delimiter
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
import delimited using "dentists2.txt", clear
list(4 vars, 5 obs)
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
This will produce the same results as above since “ is a default
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
import delimited using "dentists2.txt", clear delimiters("\t")
list(4 vars, 5 obs)
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
If we set the delimiter to “:”, we can import a “:”-delimited file
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
import delimited using "dentists4.txt", clear delimiters(":")
list(4 vars, 5 obs)
+-------------------------------------+
| v1 v2 v3 v4 |
|-------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+-------------------------------------+
We can subset columns and rows with import delimited, which can be helpful for large files. Variable names will be defaulted for the first few rows, so when we use rowrange(1:3) this will only include 2 rows of data.
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
import delimited using "dentists2.txt", clear rowrange(1:3)
list(4 vars, 2 obs)
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
+----------------------------------------------+
Import delimited from the web
2.6.2 Import delimited with subset of rows
You can use the rowrange option to subset large dataset. For example, the NPPES admin data is over 10 GB. If we want to inspect the data, we can just use rowrange. Another example would be the American Community Survey Public Use File.
2.6.3 Import space-separated files
Sometimes files have space-delimited files (which is a bad idea when strings are involved). Dentists5.txt is an example of such a file. We can use the infile command to import these data, but we need to specify that the name variable is a string at least 17 characters wide.
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
infile str17 name years full rec using "dentists5.txt", clear
list(5 observations read)
+----------------------------------------+
| name years full rec |
|----------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------+
Infile does not read in variable names in the first row.
We can also use import delimited with delimiters(” “)
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
import delimited using "dentists6.txt", delimiters(" ") clear
list(4 vars, 5 obs)
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
2.6.4 Importing fixed-column files
Unfortunately, fixed-column files are more common than I prefer. They are a pain, since you need to specify each column length. I have found this with files with a “.dat” file extension.
We can use the infix command to read in fixed-column files. Let’s look at our fixed-column width files.
Y. Don Uflossmore 7.2501
Olive Tu'Drill 10.2511
Isaac O'Yerbreath32.7511
Ruth Canaale 22.0011
Mike Avity 8.5000
Our name variable is between 1 and 17, years are between 18 and 22. Our fulltime and recommend binaries are only length of 1 with 23 and 24.
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell/"
infix str name 1-17 years 18-22 fulltime 23 recom 24 using "dentists7.txt", clear
list(5 observations read)
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
This is a real pain, but luckily Stata has dictionary files to help if one of these files are available. They have a .dct file extension. We already have a fixed-column dictionary with dentists1.dct and we can use it to open a fixed-column width file dentists7.txt
Let’s look at our dictionary file.
infix dictionary {
str name 1-17 years 18-22 fulltime 23 recom 24
}
Let’s look at our fixed-column width file
Y. Don Uflossmore 7.2501
Olive Tu'Drill 10.2511
Isaac O'Yerbreath32.7511
Ruth Canaale 22.0011
Mike Avity 8.5000
Let’s use the two together to import the data file
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell/"
infix using "dentists1.dct", using(dentists7.txt) clear
listinfix dictionary {
str name 1-17 years 18-22 fulltime 23 recom 24
}
(5 observations read)
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
We can combine infile command with a dictionary file.
infile dictionary using dentists7.txt {
str17 name %17s "Name of dentist"
years %5f "Years in practice"
fulltime %1f "Full time?"
recom %1f "Recommend Quaddent?"
}
Notice that the first line of the dictionary file is infile dictionary using dentists7.txt and then the column widths
infile dictionary using dentists7.txt {
str17 name %17s "Name of dentist"
years %5f "Years in practice"
fulltime %1f "Full time?"
recom %1f "Recommend Quaddent?"
}
(5 observations read)
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
4. | Ruth Canaale 22 1 1 |
5. | Mike Avity 8.5 0 0 |
+----------------------------------------------+
Dictionary .dct files are very helpful when they are available with fixed-column width files.
2.6.5 Subsetting fixed-column width files
We can use the “in” qualifier to subset the number of rows to import with either infix or infile.
quietly cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell/"
infix str name 1-17 years 18-22 fulltime 23 recom 24 using "dentists7.txt" in 1/3, clear
list
infix using "dentists1.dct" in 1/3, using(dentists7.txt) clear
list(3 observations read)
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
+----------------------------------------------+
infix dictionary {
str name 1-17 years 18-22 fulltime 23 recom 24
}
(3 observations read)
+----------------------------------------------+
| name years fulltime recom |
|----------------------------------------------|
1. | Y. Don Uflossmore 7.25 0 1 |
2. | Olive Tu'Drill 10.25 1 1 |
3. | Isaac O'Yerbreath 32.75 1 1 |
+----------------------------------------------+
2.6.6 2.8 Common errors when reading and importing files
Tip: Don’t forget to clear your data. Either the line before or add it as an option , clear
Error 1 If you do not clear your data with the use command, then you will get the message “No; dataset in memory has changed since last saved” - Use clear in the line before ingesting the data, or use clear as an option.
Error 2 If you do not clear your data with the import, infix, or infile commands, then you will get the message: “You must start with an empty dataset”
2.6.7 Entering data directly into Stata Data Editor
My opinion: Please do not do this. If you need to manually enter data, I don’t recommend manually entering data into STATA. I recommend using a spreadsheet, then import the spreadsheet.
If for some reason you don’t have access to Excel or a spreadsheet application like OSX Numbers or Google Sheets, then read pages 43-50.