DAY 1 Abstracts
Ridits right, left, center, native and foreign
Roger Newson
Cancer Prevention Group, School of Cancer & Pharmaceutical Sciences, King's College London
Ridit functions are specified with respect to an identified probability distribution. They are like ranks, only expressed on a scale from 0 to 1 (for unfolded ridits) or -1 to 1 (for folded ridits). Ridit functions have generalised inverses called percentile functions. A native ridit is a ridit of a variable with respect to its own distribution. Native ridits can be computed using the ridit() function of Nick Cox's SSC package egenmore. Alternatively, weighted ridits can be computed using the SSC package wridit. This has a handedness() option, where handedness(right) specifies a right-continuous ridit (also known as a cumulative distribution function), handedness(left) specifies a left-continuous ridit, and handedness(center) (the default) specifies a ridit function discontinuous at its mass points. wridit now has a module, fridit, which computes foreign ridits of a variable with respect to a distribution other than its own, with the foreign distribution specified in another data frame. An application of ridits is ridit splines, which are splines in a ridit function, typically computed using the SSC package polyspline. As an example, we may fit a ridit spline to a training set and use it for prediction in a test set, using foreign ridits of an X-variable in the test set with respect to the distribution of the X-variable in the training set. The model parameters are typically values of an outcome variable corresponding to percentiles of the X-variable in the training set. This practice stabilises (or Winsorises) outcome values corresponding to X-values in the test set outside the range of X-values in the training set.
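A minimal sketch of computing native and weighted ridits (assuming the SSC packages are installed; the generate() option name is taken from the wridit help file as I recall it, so treat it as an assumption):

    . ssc install egenmore
    . ssc install wridit
    . sysuse auto, clear
    . egen rmpg1 = ridit(mpg)                         // native ridits via egenmore
    . wridit mpg, generate(rmpg2) handedness(right)   // right-continuous (CDF) ridits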
The production process of the global MPI
Nicolai Suppa
Centre d'Estudis Demografics, Autonomous University of Barcelona
The Global Multidimensional Poverty Index (MPI) is a cross-country poverty measure published by the Oxford Poverty and Human Development Initiative since 2010. The estimation requires household survey data because multidimensional poverty measures seek to exploit the joint distribution of deprivations in the identification step of poverty measurement. Analyses of multidimensional poverty draw on several aggregate measures (e.g., the headcount ratio), dimensional quantities (e.g., indicator contributions), and auxiliary statistics (e.g., non-response rates). Robustness analyses of key parameters (e.g., poverty cutoffs) and several levels of analysis (e.g., subnational regions) further increase the number of estimates.
In 2018, the underlying workflow was revised and has since been under continuous development; for the first time, figures could be calculated for 105 countries in a single round. In 2021, this workflow was substantially expanded to include the estimation of changes over time. The 2021 regular global MPI release includes 109 countries (with 1,291 subnational regions), while changes over time are provided for 84 countries (793 subnational regions) over periods of up to three years. In total, this release builds on 220 micro datasets.
For a large-scale project like this, a clear and efficient workflow is essential. This presentation introduces key elements of the workflow and presents Stata solutions to particular problems, including the structure of a comprehensive results file (which facilitates both analysis and the production of deliverables), the usability of the estimation files, the collaborative nature of the project, the production of country briefings, and how some of the additional challenges introduced by the incorporation of changes over time have been addressed so far. The presentation seeks to share the experience gained and to subject both the principal workflow and selected solutions to public scrutiny.
A unified Stata package for calculating sample sizes for trials with binary outcomes (artbin)
Ella Marley-Zagar
Ian R. White
Mahesh K. B. Parmar
Patrick Royston
Abdel G. Babiker
MRC Clinical Trials Unit at University College London, UK
Sample size calculation is essential in the design of a randomised clinical trial in order to ensure that there is adequate power to evaluate treatment. It is also used in the design of randomised experiments in other fields such as education, international development, and social science. We describe the command artbin, which calculates sample size or power for a clinical trial or similar experiment with a binary outcome. A particular feature of artbin is that it can be used to design non-inferiority (NI) and substantial-superiority (SS) trials. NI trials are used in the development of new treatment regimes, to test whether the experimental treatment is no worse than an existing treatment by more than a pre-specified amount. They are appropriate when the intervention is not expected to be superior but has other benefits, such as offering a shorter, less complex regime that can reduce the risk of drug-resistant strains developing, a particular concern for countries without robust health care systems. We illustrate the command’s use in the STREAM trial, an NI design that demonstrated that a shorter, more intensive treatment for multi-drug-resistant tuberculosis was only 1% less effective than the lengthier treatment recommended by the World Health Organisation. artbin also differs from the official power command by allowing a wide range of statistical tests (score, Wald, conditional, trend across K groups) and by offering calculations under local or distant alternatives, with or without continuity correction. artbin has been available since 2004, but recent updates include clearer syntax, improved documentation, and some new features.
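As a hedged sketch of the kind of call involved in an NI design along these lines (15% anticipated event risk in both groups, 10-percentage-point margin; option names follow the artbin help file as best I recall and should be treated as assumptions):

    . ssc install artbin
    . artbin, pr(.15 .15) margin(.10) power(.9)   // NI design: sample size for 90% power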
Instrumental variable estimation of large-T panel data models with common factors
Sebastian Kripfganz
University of Exeter Business School
Vasilis Sarafidis
BI Norwegian Business School
We introduce the xtivdfreg command in Stata, which implements a general instrumental variables (IV) approach for estimating panel data models with a large number of time series observations, T, and unobserved common factors or interactive effects, as developed by Norkute, Sarafidis, Yamagata, and Cui (2021, Journal of Econometrics) and Cui, Norkute, Sarafidis, and Yamagata (2020, ISER Discussion Paper). The underlying idea of this approach is to project out the common factors from the exogenous covariates using principal components analysis and to run IV regressions in two stages, using the defactored covariates as instruments. The resulting two-stage IV (2SIV) estimator is valid for models with homogeneous or heterogeneous slope coefficients and has several advantages relative to existing popular approaches. In addition, the xtivdfreg command extends the 2SIV approach in two major ways. Firstly, the algorithm accommodates estimation of unbalanced panels. Secondly, the algorithm permits a flexible specification of instruments. It is shown that, when one imposes zero factors, the xtivdfreg command can replicate the results of the popular ivregress Stata command. Notably, unlike ivregress, xtivdfreg permits estimation of the two-way error components panel data model with heterogeneous slope coefficients.
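A hedged sketch of a call (variable names are hypothetical; the iv() option and its factmax() suboption are reproduced from memory of the package documentation and should be treated as assumptions):

    . ssc install xtivdfreg
    . xtset id t
    . xtivdfreg y x1 x2, iv(x1 x2, factmax(2)) factmax(2) mg   // mg: mean-group (heterogeneous slopes)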
Estimating causal effects in the presence of competing events using regression standardisation with the Stata command standsurv
Elisavet Syriopoulou
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Sarwar I Mozumder
Mark J Rutherford
Biostatistics Research Group, Department of Health Sciences, University of Leicester, Leicester, UK
Paul C Lambert
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden and Biostatistics Research Group, Department of Health Sciences, University of Leicester, Leicester, UK
When interested in a time-to-event outcome, competing events that prevent the occurrence of the event of interest may be present. In the presence of competing events, various statistical estimands have been suggested for defining the causal effect of treatment on the event of interest. Depending on the estimand, the competing events are either accommodated (total effects) or eliminated (direct effects), resulting in causal effects with different interpretations. Separable effects can also be defined for settings where the treatment effect can be partitioned into its effect on the event of interest and its effect on the competing event through different causal pathways. We outline various causal effects of interest in the presence of competing events, including total, direct, and separable effects, and describe how to obtain estimates using regression standardisation with the Stata command standsurv. Regression standardisation is applied by obtaining the average of individual estimates across all individuals in a study population after fitting a survival model. standsurv supports several models, including flexible parametric models. With standsurv, several contrasts can be calculated: differences, ratios, and other user-defined functions. Confidence intervals are obtained using the delta method. Throughout, we use an example analysing a publicly available dataset on prostate cancer.
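As a hedged sketch of the workflow for a total effect on the cause-specific cumulative incidence (variable and model names are hypothetical; option names follow the standsurv help file as I recall them and should be treated as assumptions):

    . ssc install stpm2
    . ssc install standsurv
    . stset time, failure(status == 1)        // event of interest
    . stpm2 trt age, scale(hazard) df(4)
    . estimates store cancer
    . stset time, failure(status == 2)        // competing event
    . stpm2 trt age, scale(hazard) df(4)
    . estimates store other
    . range tt 0 10 101                       // time points for prediction
    . standsurv, crmodels(cancer other) cif timevar(tt) ci ///
          at1(trt 0) at2(trt 1) contrast(difference)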
Covariate adjustment in a randomised trial with time-to-event outcomes
Ian R White
Tim P Morris
Deborah Ford
MRC Clinical Trials Unit at UCL, London, UK
Covariate adjustment in a randomised trial aims to provide more powerful comparisons of randomised groups. We describe the challenges of planning how to do this in the ODYSSEY trial, which compares two HIV treatment regimes in children. ODYSSEY presents three challenges: (1) the outcome is time-to-event (time to virological or clinical failure); (2) interest is in the risk at a landmark time (96 weeks after randomisation); and (3) the aim is to demonstrate non-inferiority (defined as the risk difference at 96 weeks being less than 10 percentage points). The statistical analysis plan is based on the Cox model with predefined adjustment for three covariates. We describe how to use the margins command in Stata to estimate the marginal risks and the risk difference. This analysis does not allow for uncertainty in the baseline survivor function. We compare confidence intervals produced by normal theory and by bootstrapping, and (for the risks) using the log-log transform. We compare these methods with Paul Lambert's standsurv, which is based on a parametric survival model. We also discuss an inverse probability of treatment weighting approach, where the weights are derived by regressing randomised treatment on the covariates.
Gravitational Effects of Culture on Internal Migration in Brazil
Daisy Assmann Lima
Philipp Ehrl
Universidade Católica de Brasília
This paper conducts empirical research into the role of culture in internal migration in Brazil. To do so, we deploy data from the Latin American Public Opinion Project (LAPOP) and the 2010 Brazilian Census. Against the background of the gravity model, we adopt Poisson pseudo-maximum likelihood with fixed effects (PPMLFE) to address the econometric issues that arise in gravity estimation. The results provide new evidence on the influence of migrants’ perceptions of the push-pull factors of Brazilian municipalities. Traditionally, gravity models use features such as Gross Domestic Product per capita, the unemployment rate, and population density to measure the attractiveness of cities. All in all, these insights into migrants’ traits and perceptions of culture pave the way for designing appropriate migration policies at the municipal level, since migration supports, among other things, the renewal of the socioeconomic fabric.
Two-stage sampling in the estimation of growth parameters and percentile norms: sample weights versus auxiliary variable estimation
George Vamvakas
Department of Biostatistics & Health Informatics, Institute of Psychiatry, King's College London
Background: The use of auxiliary variables with maximum likelihood parameter estimation for surveys that miss data by design is not widespread. Although efficiency gains from the incorporation of Normal auxiliary variables in a model have been recorded in the literature, little is known about the effects of non-Normal auxiliary variables on parameter estimation.
Methods: We simulate growth data to mimic SCALES, a two-stage longitudinal survey of language development. We allow a fully observed Poisson stratification criterion to be correlated with the partially observed model responses and develop five models that host the auxiliary information from this criterion. We compare these models with each other and with a weighted model in terms of bias, efficiency, and coverage. We apply our best performing model to SCALES data and show how to obtain growth parameters and population norms.
Results: Parameter estimation from a model that incorporates a non-Normal auxiliary variable is unbiased and more efficient than its weighted counterpart. The auxiliary variable method can produce efficient population percentile norms and velocities.
Conclusions: When there exists a fully observed variable that dominates the selection of the sample and is strongly correlated with the incomplete variable of interest, its utilisation appears beneficial.
Integrating R Machine Learning Algorithms in Stata using rcall: A Tutorial
Ebad F. Haghish
Department of Psychology, University of Oslo, Norway
rcall is a Stata package that integrates R and R packages in Stata and supports seamless two-way data communication between R and Stata. The package offers two modes of data communication: (1) interactive and (2) non-interactive. In the first part of the presentation, I will introduce the latest updates to the package (version 3.0) and show how to use it in practice for data analysis (interactive mode). The second part of the presentation concerns developing Stata packages with rcall (non-interactive mode) and how to defensively embed R and R packages within Stata programs. All examples in the presentation, whether for data analysis or package development, will be based on embedding R machine learning algorithms in Stata and using them in practice.
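A minimal sketch of the flavour of such embedding (assuming rcall is installed and the randomForest R package is available; st.data() and st.load() are rcall's data-exchange functions, and the model is purely illustrative):

    . sysuse auto, clear
    . rcall vanilla: library(randomForest);                            ///
          df <- st.data();                                             ///
          fit <- randomForest(factor(foreign) ~ mpg + weight, data = df); ///
          st.load(data.frame(pred = predict(fit)))

The final st.load() call sends the R predictions back, replacing the dataset in memory.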
A bird's-eye view of Bayesian software in 2021: opportunities for Stata?
Robert Grant
BayesCamp Ltd
In this talk, I will review the range of current software that can be used for Bayesian analysis. By considering the features, interfaces, and algorithms, the users and their backgrounds, and the popular models and use cases, I will identify areas where Stata has a strategic or technical advantage and where useful advances can be built into future versions or community-contributed commands without excessive effort. Stata has developed Bayesian modelling within the framework of its own ado syntax, which has some strengths (for example, the bayes: prefix on familiar and tested commands) and some weaknesses (for example, the limitations on specifying a complex bespoke likelihood or prior). On the other hand, there are Stata components such as the SEM Builder GUI that would potentially be very popular with beginners in Bayes if they were adapted. I will also examine the concept of a probabilistic programming language to specify a model in linked conditional formulas and probability distributions, and how it could work with Stata.
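For instance, the bayes: prefix turns a familiar estimation command into a Bayesian one, and an individual prior can be overridden without leaving the familiar syntax (a minimal illustration on shipped data):

    . sysuse auto, clear
    . bayes: regress mpg weight
    . bayes, prior({mpg:weight}, normal(0, 10)): regress mpg weight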
Using xtbreak to study the impacts of European Central Bank announcements on sovereign borrowing
Natalia Poiatti
Instituto de Relações Internacionais - USP
This paper investigates how announcements by the European Central Bank (ECB) have affected the cost of sovereign borrowing in central and peripheral European countries. Using the xtbreak command (Ditzen, Karavias and Westerlund, 2021) in Stata, we test whether variation in European sovereign spreads can be explained by economic fundamentals in a model that allows for two structural breaks: the first when investors realized that the fiscal sustainability of the EMU should be understood in a decentralized fashion, as the ECB announced it would not bail out Greece; the second when the ECB realized that the existence of the euro was at stake and announced it would be able to provide financial assistance to countries in financial trouble. We show that a model that allows for structural breaks after the ECB announcements can explain most of the variation in European sovereign spreads.
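A hedged sketch of the kind of calls involved (variable names are hypothetical; subcommand and option names follow Ditzen, Karavias and Westerlund, 2021, as best I recall and should be treated as assumptions):

    . ssc install xtbreak
    . xtbreak test spread fundamentals, breaks(2)       // test for structural breaks
    . xtbreak estimate spread fundamentals, breaks(2)   // estimate the two break dates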
PyStata - Python and Stata integration
Zhao Xu
Principal Software Engineer, StataCorp
Stata 16 introduced tight integration with Python, allowing users to embed and execute Python code from all of Stata's programming environments, such as the Command window, do-files, and ado-files. Stata 17 introduced the pystata Python package. With this package, users can call Stata from various Python environments, including Jupyter Notebook, JupyterLab, Spyder IDE, PyCharm IDE, and system command-line environments that can access Python (Windows Command Prompt, macOS terminal, Unix terminal). In this talk, I will introduce two ways to run Stata from Python: the IPython magic commands and a suite of API functions. I will then demonstrate how to use them to seamlessly pass data and results between Stata and Python.
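To keep the illustration inside a Stata session, here is a minimal sketch of the embedded direction via the sfi module (the reverse, pystata direction follows the stata_setup/pystata pattern described in the talk):

    . sysuse auto, clear
    . python
    >>> from sfi import Data
    >>> mpg = Data.get("mpg")        # read a Stata variable into Python
    >>> print(sum(mpg)/len(mpg))     # mean mpg computed on the Python side
    >>> end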
Computing score functions numerically using Mata
Álvaro A. Gutiérrez-Vargas
Research Centre for Operations Research and Statistics, KU Leuven
Specific econometric models - such as the Cox regression, conditional logistic regression, and panel-data models - have likelihood functions that do not meet the so-called linear-form requirement. That means that the model's overall log-likelihood function does not correspond to the sum of each observation's log-likelihood contribution. Stata's ml command can fit such models using a particular group of evaluators: the d-family evaluators. Unfortunately, these have some limitations; one is that we cannot directly produce the score functions from the postestimation command predict. This missing feature triggers the need for tailored computational routines from developers who might need those functions to compute, for example, robust variance-covariance matrices. In this talk, I present a way to compute the score functions numerically using Mata's deriv() function with minimal extra programming beyond the log-likelihood function. The procedure is exemplified by replicating the robust variance-covariance matrix produced by the clogit command using simulated data. The results show negligible numerical differences (of order 1e-09) between the clogit robust variance-covariance matrix and the one numerically approximated using Mata's deriv() function.
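A do-file sketch of the idea, with a probit log-likelihood standing in for the conditional-logit contributions (evaluator conventions follow [M-5] deriv() as I recall them; variable names are hypothetical):

    sysuse auto, clear
    probit foreign weight length          // fit a model whose scores we recover

    mata:
    // v returns each observation's log-likelihood contribution at b;
    // deriv() then differentiates numerically, giving the N x k score matrix
    void lnl(real rowvector b, real matrix X, real colvector y,
             real colvector v)
    {
        real colvector xb
        xb = X*b'
        v  = y:*ln(normal(xb)) + (1:-y):*ln(normal(-xb))
    }
    y = st_data(., "foreign")
    X = st_data(., "weight length"), J(st_nobs(), 1, 1)  // constant last, as in e(b)
    D = deriv_init()
    deriv_init_evaluator(D, &lnl())
    deriv_init_evaluatortype(D, "v")     // vector-valued: deriv() returns the Jacobian
    deriv_init_params(D, st_matrix("e(b)"))
    deriv_init_argument(D, 1, X)
    deriv_init_argument(D, 2, y)
    S = deriv(D, 1)                      // N x k matrix of numerical scores
    end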
Analysing conjoint experiments in Stata: the conjoint command
Michael J. Frith
University College London
This talk presents conjoint, a new Stata command for analysing and visualising conjoint (factorial) experiments in Stata. Using examples of conjoint experiments from the growing literature - including two from political science involving choices between immigrants (Hainmueller et al., 2014) and between return locations for refugees (Ghosn et al., 2021) - I will briefly explain conjoint experiments and how they are used. Then, with reference to existing packages and commands in other software, I will explain how conjoint estimates and visualises the two common estimands: average marginal component effects (AMCE) and marginal means (MM). Limitations of conjoint and possible improvements to the command will also be discussed.
Introducing stipw: inverse probability weighted parametric survival models
Micki Hill
University of Leicester, Leicester, UK
Paul C Lambert
University of Leicester, Leicester, UK and Karolinska Institutet, Stockholm, Sweden
Michael J Crowther
Karolinska Institutet, Stockholm, Sweden
Inverse probability weighting (IPW) can be used to estimate marginal treatment effects from survival data. Currently, IPW analyses can be performed in a few steps in Stata (with robust or bootstrap standard errors) or by using stteffects ipw under some assumptions for a small number of marginal treatment effects. stipw has been developed to perform an IPW analysis on survival data and to provide a closed-form variance estimator of the model parameters using M-estimation. This method appropriately accounts for the estimation of the weights and provides a less computationally intensive alternative to bootstrapping. stipw implements the following steps: (1) A binary treatment/exposure variable is modelled against confounders using logistic regression. (2) Stabilised or unstabilised weights are estimated. (3) A weighted streg or stpm2 (Royston-Parmar) survival model is fitted with treatment/exposure as the only covariate. (4) Variance is estimated using M-estimation. As the stored variance matrix is updated, post-estimation can easily be performed with the appropriately estimated variance. Useful marginal measures, such as difference in marginal restricted survival time, can thus be calculated with uncertainties. stipw will be demonstrated on a commonly used dataset in primary biliary cirrhosis. Robust, bootstrap and M-estimation standard errors will be presented and compared.
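For context, the existing multi-step approach that stipw streamlines looks roughly like this (variable names are hypothetical; the robust standard errors here ignore the estimation of the weights, which is precisely what stipw's M-estimation corrects):

    . logit treat age sex                               // step 1: treatment model
    . predict double ps, pr
    . gen double ipw = cond(treat, 1/ps, 1/(1-ps))      // step 2: unstabilised weights
    . stset time [pweight = ipw], failure(died)
    . streg treat, distribution(weibull) vce(robust)    // step 3: weighted survival model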
The Stata module for CUB models for rating data analysis
G. Cerulli
IRCrES-CNR Rome, Italy
R. Simone
F. Di Iorio
D. Piccolo
Department of Political Sciences, University of Naples Federico II, Italy
C.F. Baum
Boston College, Chestnut Hill, MA, USA
Many survey questions are addressed as ordered rating variables to assess the extent to which a certain perception or opinion holds among respondents. These responses cannot be treated as objective measures, and a proper statistical framework for accounting for their fuzziness is the class of CUB models, an acronym for Combination of Uniform and shifted Binomial (Piccolo and Simone 2019), which establishes a distinct paradigm for modelling both individual perception (feeling) towards the items and uncertainty. Uncertainty can be considered as noise in the measurement of feeling, taking the form of heterogeneity in the distribution. CUB models are specified via a two-component discrete mixture that combines the modelling of feeling and uncertainty. In the baseline version, a shifted Binomial distribution accounts for the underlying feeling and a discrete Uniform accounts for heterogeneity, but different specifications are possible, for instance to encompass inflated frequencies. The featured parameters can be linked to subjects' characteristics to derive response profiles. Then, different items (possibly measured on scales of different lengths) and groups of respondents can be represented and compared through effective visualisation tools. Our contribution presents CUB modelling to the Stata community by discussing the CUB module with different case studies that illustrate its applicative scope.
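For reference, the baseline CUB distribution for a response R on an m-point scale (following the parameterisation in Piccolo and Simone 2019, up to the orientation chosen for the feeling parameter xi) is

    Pr(R = r) = pi * C(m-1, r-1) * (1 - xi)^(r-1) * xi^(m-r) + (1 - pi) * 1/m,   r = 1, ..., m,

where C(.,.) denotes the binomial coefficient, pi governs the weight of the shifted Binomial (feeling) component, and 1 - pi the weight of the Uniform (uncertainty) component.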
A robust regression estimator for pairwise-difference transformed data: xtrobreg
Vincenzo Verardi
Université Libre de Bruxelles
Ben Jann
University of Bern
Pairwise comparison-based estimators are commonly used in statistics. In the context of fixed-effects panel data estimation, Aquaro and Cizek (2013) have shown that an estimator based on pairwise differences is equivalent to the well-known within estimator. Relying on this result, they propose to "robustify" the fixed-effects estimator by applying a robust regression estimator to pairwise-difference transformed data. Together with Ben Jann, we have made available the xtrobreg command, which implements this estimator in Stata for both balanced and unbalanced panels. As will be shown in the presentation, the flexibility of the xtrobreg command allows it to be used well beyond the context of robust panel regressions.
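A hedged sketch of a call, assuming xtrobreg mirrors the subcommand syntax of Ben Jann's robreg (the mm subcommand and efficiency() option are taken from robreg's documentation and should be treated as assumptions here):

    . ssc install robreg
    . ssc install xtrobreg
    . webuse nlswork, clear
    . xtset idcode year
    . xtrobreg mm ln_wage tenure age, efficiency(85)   // MM estimator on pairwise differences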
Drivers of COVID-19 Outcomes: Evidence from a Heterogeneous SAR Panel Data Model
Christopher F Baum
Boston College, DIW Berlin and CESIS
Miguel Henry
Greylock McKinnon Associates
In an extension of the standard spatial autoregressive (SAR) model, Aquaro, Bailey and Pesaran (ABP, Journal of Applied Econometrics, 2021) introduced a SAR panel model that produces heterogeneous point estimates for each spatial unit. Their methodology has been implemented as the Stata routine hetsar (Belotti, 2021). As the COVID-19 pandemic has evolved in the U.S. since its first outbreak in February 2020, with subsequent resurgences in multiple widespread and severe waves, the level of interaction between geographic units (e.g., states and counties) has differed greatly over time in terms of the prevalence of the disease. Applying ABP’s HETSAR model to 2020 and 2021 COVID-19 outcomes (confirmed case and death rates) at the state level, we extend our previous spatial econometric analysis (Baum and Henry, 2021) of socioeconomic and demographic factors influencing the spatial spread of COVID-19 confirmed case and death rates in the U.S.A.
Panel Unit Root Tests with Structural Breaks
Pengyu Chen
Yiannis Karavias
Elias Tzavalis
University of Birmingham
This presentation introduces a new Stata command, xtbunitroot, which implements the panel data unit root tests developed by Karavias and Tzavalis (2014). These tests allow for one or two structural breaks in the deterministic components of the series and can be seen as panel data counterparts of the tests by Zivot and Andrews (1992) and Lumsdaine and Papell (1997). The dates of the breaks can be known or unknown. The tests allow for intercepts and linear trends, non-normal errors, and cross-section heteroskedasticity and dependence. They have power against homogeneous and heterogeneous alternatives and can be applied to panels with small or large time series dimensions. We will describe the econometric theory and illustrate the syntax and options of the command with some empirical examples.
References
Karavias, Y., and E. Tzavalis. 2014. Testing for unit roots in short panels allowing for a structural break. Computational Statistics & Data Analysis 76: 391–407.
Lumsdaine, R. L., and D. H. Papell. 1997. Multiple trend breaks and the unit-root hypothesis. Review of Economics and Statistics 79(2): 212–218.
Zivot, E., and D. W. K. Andrews. 1992. Further evidence on the great crash, the oil price shock, and the unit-root hypothesis. Journal of Business & Economic Statistics 10(3): 251–270.
rbprobit: Recursive bivariate probit estimation and decomposition of marginal effects
Mustafa Coban
Institute for Employment Research (IAB), Nürnberg (DE)
This presentation describes a new Stata command, rbprobit, for fitting recursive bivariate probit models, which differ from bivariate probit models in allowing the first dependent variable to appear on the right-hand side of the equation for the second dependent variable. Although the estimation of the model parameters does not differ from the bivariate case, the existing commands biprobit and cmp do not take the structural model’s recursive nature into account in their postestimation commands. rbprobit estimates the model parameters, computes treatment effects of the first dependent variable, and gives the marginal effects of the independent variables. In addition, marginal effects can be decomposed into direct and indirect effects if covariates appear in both equations. Moreover, the postestimation commands incorporate the two community-contributed goodness-of-fit tests scoregof and bphltest. Dependent variables of the recursive probit model may be binary, ordinal, or a mixture of both. I present and explain the rbprobit command and the available postestimation commands using data from the European Social Survey.
Differences-in-differences in Stata 17
Enrique Pinzon
Associate Director of Econometrics, StataCorp
Stata 17 introduced two commands to fit difference-in-differences (DID) and difference-in-difference-in-differences (DDD) models: didregress, for repeated cross-sectional data, and xtdidregress, for longitudinal or panel data. In this presentation, I will briefly discuss the theory of DID and DDD and then present a practical application showing how to fit the models using the new commands. I will also address aspects of the standard errors that are appropriate under different scenarios. Graphical diagnostics and tests relevant to the DID and DDD specifications, as well as new areas of development in the DID literature, will also be discussed.
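A minimal illustration with the hospital dataset used in the Stata 17 documentation (dataset and postestimation names as I recall them; treat them as assumptions):

    . webuse hospdd, clear
    . didregress (satis) (procedure), group(hospital) time(month)
    . estat trendplots     // graphical diagnostic for parallel trends
    . estat ptrends        // test of the parallel-trends assumption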
Graphics for ordinal outcomes or predictors
Nicholas J. Cox
University of Durham, UK
Ordered or ordinal variables, such as opinion grades from Strongly disagree to Strongly agree, are common in many fields and a leading data type in some. Alternatively, orderings may be sought in the data. In archaeology and various environmental sciences, there is a problem of seriation, at its simplest finding the best ordering of rows and columns given a data matrix. For example, the goal may be to place archaeological sites in approximate date order according to which artefacts have been found where. Graphics for such data may appear to range from obvious but limited (draw a bar chart if you must) to more powerful but obscure (enthusiasts for complicated mosaic plots or correspondence analyses need to convince the rest of us). Alternatively, graphics are avoided and the focus is only on tabular model output with estimates, standard errors, P-values and so forth. The need for descriptive or exploratory graphics remains. This presentation surveys various graphics commands by the author, made public through the Stata Journal or SSC, that should not seem too esoteric, principally friendlier and more flexible bar charts and dedicated distribution or quantile plots. Specific commands include tabplot, floatplot, qplot and distplot. Mapping grades to scores and considering frequencies, probabilities or cumulative probabilities on transformed scales are also discussed as simple strategies.
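As a small taste of the commands discussed (installable from SSC; the options shown follow the help files as I recall them):

    . ssc install tabplot
    . ssc install distplot
    . sysuse auto, clear
    . tabplot rep78 foreign, showval     // flexible two-way bar chart with values displayed
    . distplot mpg, by(foreign)          // cumulative distribution plot by group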
Advanced data visualizations with Stata
Asjad Naqvi
International Institute for Applied Systems Analysis (IIASA), Austria
The presentation will cover innovative uses of Stata to create data visualizations that can compete with those of industry-standard languages like R and Python. Several existing and new concepts, such as heat plots, stacked area graphs, fully customized maps, streamplots, joy plots, polar plots, and spider graphs, together with several new visualization templates currently under development, will be showcased. The presentation will also discuss the importance of customized color schemes for fine-tuning graphs. Proposals for improvements in Stata will be highlighted.
Open panel discussion with Stata Developers
StataCorp, College Station, TX, USA
StataCorp representatives will be given the floor, aiming to report on recent developments at StataCorp, and to discuss wishes, grumbles, and suggestions for further development with users.