* graphing.do

********************************************************************************
**                                  GRAPHING                                  **
********************************************************************************

* One of the best ways to understand a dataset is to graph its variables. Stata
* offers a number of graphing commands.


** GRAPHING CATEGORICAL DATA (PIE CHART) -- Pie charts are useful for showing
* how a whole (100%) is distributed among various categories. A drawback of
* pie charts is that similarly sized categories are hard to compare visually --
* but otherwise, they are easy to interpret.

* To divide information by categories, Stata uses the "over" option. In our
* dataset, the variable "priz" contains information about the terrain (flat vs.
* mountainous). Here is a pie chart based on the frequencies of each terrain
* category:
graph pie, over(priz)

* By default, the pie chart comes with a legend if the variable has value labels
* assigned. You can customize the appearance -- see below in this text, or in
* the documentation under
help graph pie

* The abbreviated version of "graph pie" is "gr pie":
gr pie, over(priz)

* You can order the slices from smallest to largest (starting at 12 o'clock and
* going clockwise) by adding the "sort" option:
graph pie, over(priz) sort

* If you prefer to go from largest to smallest, also add "descending":
graph pie, over(priz) sort descending
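
* To show each slice's share directly on the chart, one option is "plabel". A
* minimal sketch (the display format %4.1f is just an illustrative choice):
graph pie, over(priz) plabel(_all percent, format(%4.1f))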


** GRAPHING CATEGORICAL DATA (BAR CHART) -- Bar charts can fulfill the same task
* as pie charts for categorical data, with the added advantage that the sizes
* of similar categories are easier to compare.

* The command works the same way as "graph pie":
graph bar, over(priz)

* The abbreviated version of "graph bar" is "gr bar":
gr bar, over(priz)

* If the variable has value labels, they will be used to label the bars for each
* category.

* By default, bar charts have the categories along the x-axis, with bars growing
* upward. The y-axis is labeled in percent for the relative shares of each
* category. If you would like to change the orientation, with the categories
* along the y-axis and percentage along the x-axis, you can use "hbar" instead
* of "bar":
graph hbar, over(priz)
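
* If you would rather see the number of observations than the percentage, a
* sketch (assuming Stata 13 or later, where "graph bar" accepts (count) and
* (percent) without a variable) is:
graph bar (count), over(priz)

* Adding "blabel(bar)" prints each bar's height next to it:
graph bar (count), over(priz) blabel(bar)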

* You can customize the appearance of the bar chart further. See below in this
* text, or in the documentation under
help graph bar


** GRAPHING QUANTITATIVE DATA (HISTOGRAM) -- Histograms are very useful for
* quantitative information because they break the entire range of values into
* individual bins and show how frequently observations fall into each bin.

* Our dataset has a variable "totx" which represents the total expenditures of
* households. Creating a histogram for it is easy:
histogram totx

* The abbreviated version of "histogram" is "hist":
hist totx

* The histogram will show the entire range and the label of the variable along
* its x-axis. The y-axis is labeled with densities, which are not easy to
* interpret for "normal" earthlings. We recommend switching to percentages or
* frequencies:
histogram totx, percent
histogram totx, frequency
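
* Stata picks the number of bins automatically. If the default looks too coarse
* or too fine, the "bin" option lets you set the number of bins yourself. The
* value 20 below is an arbitrary choice for illustration, not a recommendation:
histogram totx, percent bin(20)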

* Later in our course, we'll find that it may be useful to see how "normal"
* a distribution is. Stata allows us to visually inspect this by adding a
* normal distribution to the histogram with the "normal" option:
histogram totx, percent normal

* You can get more documentation on histograms with:
help histogram
* ... and you can customize its appearance (see below!)


** COMPARING QUANTITATIVE DATA BY CATEGORIES (BOX PLOT) -- Box plots are
* summary graphs that compare the distribution of a variable across multiple
* categories, marking the center (median), the central 50% of observations
* (the box), and outliers.

* Our dataset identifies whether a household is located in a rural or urban
* area ("b002"). We can use a box plot to compare rural and urban household
* expenditures:
graph box totx, over(b002)

* The abbreviated version of "graph box" is "gr box":
gr box totx, over(b002)
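
* As with bar charts, there is a horizontal variant, "graph hbox". And if a few
* extreme households squeeze the boxes together, the "nooutsides" option hides
* the outside values. A sketch of both:
graph hbox totx, over(b002)
graph box totx, over(b002) nooutsides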

* By default, the graph is labeled for each category if the categorical variable
* has value labels, and the range and label of the quantitative variable are
* shown. You can customize the appearance of the graph: see below or read
* the documentation on box plots with:
help graph box


** PLOTTING TWO QUANTITATIVE VARIABLES AGAINST EACH OTHER (SCATTER PLOT) --
* Scatter plots are useful for showing how two quantitative variables relate
* to each other.

* Our data contains variables on total household income ("toty") and
* expenditures ("totx"). A scatter plot for these two variables is easily
* created with:
graph twoway scatter totx toty

* This command can be abbreviated in several ever-shorter ways:
twoway scatter totx toty
scatter totx toty
tw sc totx toty
sc totx toty
* We like the last one best -- given how frequently we use scatter plots, it's
* fitting to have this command super short.

* By default, the variable listed first will be plotted along the y-axis, and
* the variable listed second will be plotted along the x-axis.

* Especially when you have lots of observations, relatively thick dots for each
* observation may obscure the pattern. You can opt for smaller markers with
sc totx toty, msize(small)

* Not small enough? Go for tiny:
sc totx toty, msize(tiny)

* Stata knows 12 different sizes. You can find a list here:
help markersizestyle
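
* Another way to make dense clouds easier to read is hollow markers, set with
* the "msymbol" option ("Oh" is Stata's code for a hollow circle). A sketch:
sc totx toty, msize(small) msymbol(Oh)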

* You can get extensive documentation on scatter plots with:
help scatter

* Scatter plots will come in handy during Econometrics -- we'll be sure to
* revisit this subject...


** CUSTOMIZING AND LABELING YOUR CHARTS -- Stata graphs have sensible default
* options. Usually, the basic command produces a properly labeled, legible
* graph. There are situations where you want or need to fine-tune the settings:
* you may want to adjust the labels to better communicate what the chart shows,
* or you may want to correct default settings in Stata when they don't work well
* for the graph you are producing.

* Often, the most important changes apply to the labeling. We'll start with a
* simple scatter plot:
sc totx toty, msize(tiny)

* and add a title:
sc totx toty, msize(tiny) title("Household Income v Expenditure")

* We can also add subtitles and captions:
sc totx toty, msize(tiny) ///
   title("Household Income v Expenditure") ///
   subtitle("Kyrgyz Integrated Household Survey") ///
   caption("Observations: 4984 households in 2009")
* Because graph commands tend to get longer and longer, here's a trick that
* allows us to break a command over several lines to keep it legible:
* end each line with /// until the command is finished. This only works in
* do-files, not on the command line!
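
* An alternative some people prefer, sketched here, is to switch the command
* delimiter to a semicolon with "#delimit ;". Stata then reads everything up
* to the next semicolon as one command. Like the line-continuation trick above,
* this only works in do-files:
#delimit ;
sc totx toty, msize(tiny)
   title("Household Income v Expenditure")
   subtitle("Kyrgyz Integrated Household Survey")
   caption("Observations: 4984 households in 2009");
#delimit cr
* "#delimit cr" switches back to one command per line.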

* We can also re-label our axes in the graph in case the variable labels don't
* do a perfect job:
sc totx toty, msize(tiny) ///
   title("Household Income v Expenditure") ///
   subtitle("Kyrgyz Integrated Household Survey") ///
   caption("Observations: 4984 households in 2009") ///
   xtitle("Total Annual Income") ///
   ytitle("Total Annual Expenditures")
   
* Finally, we may want to change the layout and design of the graph. While it is
* possible to change colors, fonts, and positions manually, Stata has a powerful
* option called "scheme" which allows you to apply a consistent design to the
* entire graph. Two popular options are

* ... graphs that match the design used in The Economist:
sc totx toty, msize(tiny) ///
   title("Household Income v Expenditure") ///
   subtitle("Kyrgyz Integrated Household Survey") ///
   caption("Observations: 4984 households in 2009") ///
   xtitle("Total Annual Income") ///
   ytitle("Total Annual Expenditures") ///
   scheme(economist)

* ... and graphs optimized for gray-scale (black and white), low-ink printing:
sc totx toty, msize(tiny) ///
   title("Household Income v Expenditure") ///
   subtitle("Kyrgyz Integrated Household Survey") ///
   caption("Observations: 4984 households in 2009") ///
   xtitle("Total Annual Income") ///
   ytitle("Total Annual Expenditures") ///
   scheme(s1mono)
   
* Stata has 11 default schemes and can install more. Find out more at
help schemes
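
* To see which schemes are installed on your computer, and to apply one to all
* graphs instead of repeating the option each time, a sketch follows. Note that
* "set scheme" affects every graph drawn afterwards in the current session:
graph query, schemes
set scheme s1mono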

* All of the graphing commands have many options, far more than we want to cover
* here. Be sure to read the extensive documentation that comes with Stata for
* all of the graph commands by calling the help function:
help graph


** SAVING YOUR GRAPHS -- While you are exploring your dataset, just looking at
* different graphs is enough. But as you get closer to writing the report, you
* will want to save graphs. The easiest way is to tack a "saving" option onto
* the end of your graph command:
sc totx toty, saving(scatter)

* Unfortunately, this produces a graph in Stata's own file format (.gph), which
* other programs generally cannot open.
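
* Two practical notes, sketched under the assumption that the file "scatter.gph"
* is written to your current working directory. First, Stata refuses to
* overwrite an existing file, so rerunning the command above fails unless you
* add "replace". Second, a saved .gph file can be redisplayed in Stata later:
sc totx toty, saving(scatter, replace)
graph use scatter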

* If you want to save a graph for inclusion in your written documents, it's best
* to export the graph in a useful file format. This is done with "graph export",
* which writes the graph currently shown in the Graph window to a file:
graph export "scatter.pdf"

* This produces a high-quality PDF version of your graph, which is great for
* inclusion in Word, LaTeX, PowerPoint, ...

* "graph export" can produce other file types (such as png, tiff, eps). You can
* find more here:
help graph export
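
* A sketch of exporting the same graph as a PNG image. The file name and the
* width of 2000 pixels are illustrative choices; "replace" lets you rerun the
* command without an error when the file already exists:
graph export "scatter.png", width(2000) replace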