forked from lfkrebs/stata-cookbook
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathgraphing.do
237 lines (174 loc) · 9.04 KB
/
graphing.do
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
********************************************************************************
** GRAPHING **
********************************************************************************
* One of the best ways for understanding a dataset is to make graphs for its
* various variables. Stata offers a number of graphing commands
** GRAPHING CATEGORICAL DATA (PIE CHART) -- Pie charts are useful for showing
* how 100% of something are distributed among various categories. A draw-back of
* pie charts is that visual comparison between similarly sized categories is not
* easy -- but otherwise, they are easy to interpret.
* To devide information by categories, Stata uses the over option. In our
* dataset, the variable "priz" contains information about the terrain (flat vs.
* mountainous). Here is a pie chart based on the frequencies of each terrain
* category:
graph pie, over(priz)
* By default, the pie chart comes with a legend if the variable has value labels
* assigned. You can customize the appearance -- see below in this text, or in
* the documentation under
help graph pie
* The abbreviated version of "graph pie" is "gr pie":
gr pie, over(priz)
* You can order the slices from smallest to largest (starting at 12 o'clock and
* going clockwise) by adding the "sort" option:
graph pie, over(priz) sort
* If you prefer to go from largest to smallest, also add "descending":
graph pie, over(priz) sort descending
** GRAPHING CATEGORICAL DATA (BAR CHART) -- Bar charts can fulfill the same task
* for categorial data as pie charts, and have the added advantage that the size
* of similar categories can be more easily perceived.
* The command works the same way as "graph pie":
graph bar, over(priz)
* The abbreviated version of "graph bar" is "gr bar":
gr bar, over(priz)
* If the variable has value labels, they will be used to label the bars for each
* category.
* By default, bar charts have the categories along the x-axis, with bars growing
* upward. The y-axis is labeled in percent for the relative shares of each
* category. If you would like to change the orientation, with the categories
* along the y-axis and percentage along the x-axis, you can use "hbar" instead
* of "bar":
graph hbar, over(priz)
* You can customize the appearance of the bar chart further. See below in this
* text, or in the documentation under
help graph bar
** GRAPHING QUANTITATIVE DATA (HISTOGRAM) -- Histograms are very useful for
* quantitative information because break the entire range of values into
* individual bins and show the frequencies with which observations fall into the
* different bins.
* Our dataset has a variable "totx" which represents the total expenditures of
* households. Creating a histogram for it easy:
histogram totx
* The abbreviated version of "histogram" is "hist":
hist totx
* The histogram will show the entire range and the label of the variable along
* its x-axis. The y-axis is labeled with densities, which are not easy to
* interpret for "normal" earthlings. We recommend switching to percentages or
* frequencies:
histogram totx, percent
histogram totx, frequency
* Later in our course, we'll find that it may be useful to see how "normal"
* a distribution is. Stata allows us to visually inspect this by adding a
* normal distribution to the histogram with the "normal" option:
histogram totx, percent normal
* You can get more documentation on histograms with:
help histogram
* ... and you can customize its appearence (see below!)
** COMPARING QUANTITATIVE DATA BY CATEGORIES (BOX PLOT) -- Box plots are
* summary graphs that show the distribution of a variable in comparison for
* multiple categories, with indications for the center, central 50% of
* observations, and outliers.
* Our dataset identifies whether a household is located in a rural or urban
* area ("b002"). We can use a boxplot to compare rural and urban household
* expenditures:
graph box totx, over(b002)
* The abbreviated version of "graph box" is "gr box":
gr box totx, over(b002)
* By default, the graph is labeled for each category if the categorical variable
* has value labels, and the range and label of the quantitative variable is
* provided. You can customize the appearance of the graph: see below or read
* the documentation on box plots with:
help graph box
** PLOTTING TWO QUANTITATIVE VARIABLES AGAINST EACH OTHER (SCATTER PLOT) --
* Scatter plots are useful for showing the distribution of two quantitative
* variables in respect to each other.
* Our data contains variables on total household income and expenditures. A
* scatter plot for these two variables is easily created with
graph twoway scatter totx toty
* This command can be abbreviated in many ever-shorter versions:
twoway scatter totx toty
scatter totx toty
tw sc totx toty
sc totx toty
* We like the last one best -- given how frequently we use scatter plots, it's
* fitting to have this command super short.
* By default, the variable listed first will be plotted along the y-axis, and
* variable listed second will be plotted along the x-axis.
* Especially when you have lots of observations, relatively thick dots for each
* observation may obscure the pattern. You can opt for smaller markers with
sc totx toty, msize(small)
* Not small enough? Go for tiny:
sc totx toty, msize(tiny)
* Stata knows 12 different sizes. You can find a list here:
help markersizestyle
* You can get extensive documentation on scatter plots with:
help scatter
* Scatter plots will come in handy during Econometrics -- we'll be sure to
* revisit this subject...
** CUSTOMIZING AND LABELING YOUR CHARTS -- Stata graphs have sensible default
* options. Usually, the basic command produces a properly labeled, legible
* graph. There are situations where you want or need to fine-tune the settings:
* you may want to adjust the labels to better communicate what the chart shows,
* or you may want to correct default settings in Stata when they don't work well
* for the graph you are producing.
* Often, the most important changes apply to the labeling. We'll start with a
* simple scatter plot:
sc totx toty, msize(tiny)
* and add a title:
sc totx toty, msize(tiny) title("Household Income v Expenditure")
* We can also add subtitles and captions:
sc totx toty, msize(tiny) ///
title("Household Income v Expenditure") ///
subtitle("Kyrgyz Integrated Household Survey") ///
caption("Observations: 4984 households in 2009")
* Because graph commands tend to get longer and longer, here's a trick that
* allows us to break the command over several lines to keep for legibility:
* end each line with \\\ until the command is finished. This only works in do-
* files, not in the command line!
* We can also re-label our axes in the graph in case the variable labels don't
* do a perfect job:
sc totx toty, msize(tiny) ///
title("Household Income v Expenditure") ///
subtitle("Kyrgyz Integrated Household Survey") ///
caption("Observations: 4984 households in 2009") ///
xtitle("Total Annual Income") ///
ytitle("Total Annual Expenditures")
* Finally, we may want to change the layout and design of the graph. While it is
* possible to change colors, fonts, and positions manually, Stata has a powerful
* option called "scheme" which allows you to apply a consistent design to the
* entire graph. Two popular options are
* ... graphs that match the design used in The Economist:
sc totx toty, msize(tiny) ///
title("Household Income v Expenditure") ///
subtitle("Kyrgyz Integrated Household Survey") ///
caption("Observations: 4984 households in 2009") ///
xtitle("Total Annual Income") ///
ytitle("Total Annual Expenditures") ///
scheme(economist)
* ... and graphs optimized for gray-scale (black and white), low-ink printing:
sc totx toty, msize(tiny) ///
title("Household Income v Expenditure") ///
subtitle("Kyrgyz Integrated Household Survey") ///
caption("Observations: 4984 households in 2009") ///
xtitle("Total Annual Income") ///
ytitle("Total Annual Expenditures") ///
scheme(s1mono)
* Stata has 11 default schemes and can install more. Find out more at
help schemes
* All of the graphing commands have many options, far more than we want to cover
* here. Be sure to read the extensive documentation that comes with Stata for
* all of the graph commands by calling the help function:
help graph
** SAVING YOUR GRAPHS -- While you are exploring your data set, just looking at
* different graphs is enough. But as you get closer to writing the report, you
* will want to save graphs. The easiest way is to tack a "saving" option onto
* the end of your graph command
sc totx toty, saving(scatter)
* Unfortunately, this produces a graph in Stata's own file format .gph
* If you want to save a graph for inclusion in your written documents, it's best
* to export the graph in a useful file format. This is done with "graph export":
graph export "scatter.pdf"
* This produces a high-quality PDF version of your graph, which is great for
* inclusion in Word, LaTeX, PowerPoint, ...
* "graph export" can produce other file types (such png, tiff, eps). You can
* find more here:
help graph export