Common Stata commands for data cleaning


Posted July 26, 2023

Common Stata commands (the explanation follows each command in parentheses)

set more off

set virtual on (turns on virtual memory)

di exp(3.567) (di = display)

Browse the data

tabmiss x1 x2 (findit tabmiss; shows the frequency and percentage of missing values)

browse var1 var2 (if ….) (looks like the editor window, but cannot edit)

listblck in 1/10, repeat(1) (findit listblck; like list, but with a more compact layout; repeat(1/n) => the first 1 (n) variable(s) are repeated after row 2)

univar chinese math science, boxplot (findit univar)

, by(gender) onehdr

univar math, by(gender) onehdr boxplot onescal
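tabmiss, listblck, and univar are user-written commands that have to be installed before use. As a rough sketch of the same first look at a dataset using only built-in commands, the lines below swap in misstable and list as stand-ins; the auto dataset that ships with Stata is an assumption made purely for illustration.

```stata
* Illustrative only: auto.dta ships with Stata and is not part of the original notes
sysuse auto, clear
set more off

* Built-in stand-in for tabmiss: counts of missing values per variable
misstable summarize

* Built-in stand-in for listblck: a plain listing of the first 10 rows
list make price mpg rep78 in 1/10

* di is short for display; it evaluates an expression and prints the result
display exp(3.567)
```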

Summary Statistics & Tables

sum (summarizes all variables: mean, SD, frequency; we can use if, e.g. if crime==1)

tab x1, sort miss (tab = tabulate; sort = order the rows by frequency; miss = list the distribution of missing values as well)

ta x1 x2, chi2 miss (chi2 = Pearson chi-square test of independence)

, nof column (no frequency / column percentage)

, row (row percentage)

, all (all available statistics)

, exact (Fisher's exact test)

ta maage_group, plot

tab1 x1 x2 x3 x4 (= tab x1 / tab x2 …)

tab2 x1 x2 x3 x4 (tabulates all possible two-way tables)

ta paedu, sum(crime) (by levels of paedu, summarize crime)

tabstat score, stats(mean sd n max min…) by(subject) (other statistics: median, p10, p25, iqr, q…)

iqr = interquartile range = p75-p25

q = quartiles, the same as specifying p25 p50 p75

table x1 x2, contents(mean y1 median y2) (also min, max, etc.)

univar (like sum, but also reports q25, median, and q75); onehdr gives a table with one header; the box plots need onescale to be comparable
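The summary commands above can be tried end to end on a dataset that ships with Stata. This is only a sketch: auto.dta and its variables (price, mpg, rep78, foreign) are assumptions standing in for x1, x2, score, and so on.

```stata
* Illustrative only: auto.dta ships with Stata; its variables stand in for x1, x2, score
sysuse auto, clear

* Summary statistics, optionally restricted with if
summarize price mpg
summarize price if foreign == 1

* One-way table sorted by frequency, with missing values shown as a category
tabulate rep78, sort missing

* Two-way table with the Pearson chi-square test; row percentages only
tabulate foreign rep78, chi2 row nofreq

* Mean and SD of price within each level of rep78
tabulate rep78, summarize(price)

* Chosen statistics by group; iqr is p75 - p25
tabstat price, statistics(mean sd n min max p25 p50 p75 iqr) by(foreign)
```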

Page 1 of 5, 2011-06-19

Data Management

gen id=_n (then do something else)

sort id (if we want to come back to the earlier order)

browse var1 var2 (if ….) (looks like the editor window, but cannot edit)

edit var1 var2 var3 (if…)

label variable bw "birth weight"

drop if id==id[_n-1] & birthday==birthday[_n-1] (or just replace delete=1, so nothing is actually deleted)

format id %9.0f (for when the value is too long to be displayed in full)

encode region, gen(region2) (generates a labeled numeric variable from a string variable)

tab region2 (looks the same but…)

tab region2, nolabel (now we see the numeric value)

mvdecode (numeric value => missing value)

mvencode (missing value => numeric value)

egen zscore=std(x) (standardized score: mean=0, variance=1)

egen avg=rmean(Chinese English math) (row mean, ignoring missing values; called rowmean() in current Stata)

egen sum=rsum(x y z) (row sum, missing values counted as 0; called rowtotal() in current Stata)

list population region, nolabel (displays the numeric values instead of the labels; only for labeled numeric variables, not string variables)

[Grouping]

sort var

gen varnew=group(5) (splits the data into five groups with the same number of cases)

egen iseicat=cut(isei), at(10, 40, 70, 90) (splits into three groups starting at 10, 40, and 70; the upper limit, e.g. 90, is not included; values that fall outside => missing)

table iseicat, contents(min isei max isei) (=> check the result)
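A minimal sketch of the de-duplication, encode, and cut steps above, run on a tiny made-up dataset. Every name and value here is hypothetical, and the icodes option is added so the groups are coded 0, 1, 2 rather than by their lower cut points.

```stata
* Illustrative only: a tiny made-up dataset; all names and values are hypothetical
clear
input id str5 region isei
1 "north" 15
1 "north" 15
2 "south" 45
3 "east"  75
4 "east"  95
end

* Drop rows that repeat the previous row's id and isei
sort id isei
drop if id == id[_n-1] & isei == isei[_n-1]

* String -> labeled numeric; the two tables differ only in what is displayed
encode region, generate(region2)
tabulate region2
tabulate region2, nolabel

* Standardized score (mean 0, SD 1)
egen zisei = std(isei)

* Cut into [10,40), [40,70), [70,90); 95 lies outside the range and becomes missing
egen iseicat = cut(isei), at(10, 40, 70, 90) icodes
tabstat isei, by(iseicat) statistics(min max n)
```

Checking the minimum and maximum within each group, as the last line does, is the same sanity check the notes suggest with table.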

egen iseicat=cut(isei), at(10, 40, 70, 90) icodes (=> becomes three groups coded 0, 1, 2)

egen iseicat=cut(isei), at(10, 40, 70, 90) label (=> the same as icodes, but adds labels: 10-, 40-, 70-)

local x "st2 st3 " (defines a long string; for later use, type `x')

Importing data from other programs

infile str30 place population sex score using (a string variable must be preceded by str#, where # is the number of characters)

Excel => Stata data (first clean the Excel data so that it follows the Stata data format)

(save the Excel file as a .csv file)

insheet using "c:/data/"

infix

reshape?

collapse?
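As a hedged sketch of the CSV route, the block below writes a small CSV to a temporary path and reads it back, so it runs without any external file. It uses import delimited, the current counterpart of insheet; auto.dta and the temporary file name are assumptions for illustration. The last lines show one common use of collapse.

```stata
* Illustrative only: writes a small CSV to a temporary path, then reads it back
sysuse auto, clear
tempfile csv
export delimited make price mpg using "`csv'.csv", replace

* Current command for reading delimited text (insheet is the older equivalent)
import delimited using "`csv'.csv", clear
describe

* collapse replaces the data in memory with group-level summaries
sysuse auto, clear
collapse (mean) price mpg, by(foreign)
list
```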

Compare groups

ttest college, by(male)

Regression

by region3, sort: reg score paedu

sort region3

by region3: reg score paedu

reg y x1 x2 x3, beta (standardized regression)

sw reg Y x1 x2 x3 x4 x5….., pr(.05) (stepwise regression: it removes the non-significant Xs by itself; pr = p to retain, backward elimination)

sw reg Y x1 x2 x3 x4 x5….., pe(.05) (pe = p to enter)

After regression…

predict yhat

predict e, resid (residual)

sort e

list v1 v2 v3… in 1/10 (or in -10/l) (l = last, not one; we can examine where the model fits poorly…)

lstat ? (correct classification rate)

listcoef, help (lists the standardized coefficients of X (& Y); search for and install Long's spostado)

After logistic regression

est store full

quietly logistic y x (nested model)

lrtest full (likelihood-ratio test)

logit y x

predict phat (=> phat = predicted p = exp(a+bx)/[1+exp(a+bx)])

graph twoway connected phat x, sort

predict q, xb (=> xb = log odds = ln(p/(1-p)))

predict phat

graph twoway mspline phat x2
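The regression and logistic post-estimation steps above can be sketched on auto.dta, which ships with Stata. The dataset and variable choices are assumptions for illustration. The block uses the current prefix form of stepwise and estat classification, the newer name for lstat.

```stata
* Illustrative only: auto.dta ships with Stata; its variables stand in for y, x1, x2
sysuse auto, clear

* OLS with standardized (beta) coefficients
regress price mpg weight, beta

* Fitted values and residuals; the extremes show where the fit is poorest
predict yhat
predict e, residuals
sort e
list make price yhat e in 1/5
list make price yhat e in -5/l

* Backward elimination: terms with p >= .05 are removed
stepwise, pr(.05): regress price mpg weight length

* Likelihood-ratio test of a nested logit model against the full one
logit foreign mpg weight
estimates store full
quietly logit foreign mpg
lrtest full .

* Predicted probability and linear predictor (log odds) from the last model
predict double phat
predict double q, xb
assert reldif(phat, exp(q) / (1 + exp(q))) < 1e-8

* Classification table (older name: lstat)
estat classification
```

The assert line checks the identity stated in the notes, phat = exp(xb)/[1+exp(xb)], on every observation.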

adjust, by(var1) exp (=> odds when var1=1,2,3..; each level = the previous one * exp(b))

adjust, by(var1) pr (=> p(y) when var1=1,2,3..; p/(1-p) = odds when var1=n)

Interpreting an interaction term: B1 (main effect) + B2 (main x dummy interaction)

For the group (dummy=1): the odds ratio of Main is exp(B1) * exp(B2)

logistic y var1 var2 inter

lincom var1+inter (gets the point estimate & CI of a combination of coefficients)

lincom [2]lbw+[2]inter10, or (for mlogit; [2] = model)

Convenient ways to get predicted probabilities

prchange (findit spost; changes in predicted probability)

prchange, fromto help (help: adds explanatory notes)

prtab (predicted probability in an n*n table)

prtab, x(paedu=1 maedu=1) rest(min)

prgen isei, f(30) t(60) gen(ff) x(male=0) (effect of a continuous variable on y=1; p is computed automatically at n [default=11] points within the range)

prgen isei, f(30) t(60) gen(mm) x(male=1)

twoway (connected ffp1 ffx) (connected mmp1 mmx)

xi3: logit y i.x1*male (when there is an interaction term……)

postgr3 male, by(x1) table (very useful for obtaining p; => the male effect differs across the categories of x1)

postgr3 isei, by(area) (continuous variables work too)

mlogit

mlogit y x1 x2, rrr nolog base(2) (ref group => y=2; rrr = relative risk ratio (=OR))

Output

outreg using , nolabel replace (findit outreg & install; then convert the text into a table; when saving, click No and save as a new file)

outreg using , nolabel append (append = model 2 is added on to model 1)

outreg var1 var2 using , replace 10pct coefastr se (the coefficients to list can be specified; 10pct adds + for p<.1; coefastr puts the * on the coefficient; se = standard errors instead of t statistics)

log using , replace (don't use t)

Finally: log2html , replace (findit log2html first; => the results can be saved as html)

Graph

graph dir (lists all the graph files)

graph use gender_gap

graph save filename (filename.gph is saved)

erase

Other

sgmediation var_y, mv(varx1) iv(varx2) (Sobel-Goodman tests, use findit first; tests whether a mediator carries the influence of an IV to a DV)

Saving time

program define shortcut

command 1 … command 2

end

shortcut (runs command 1, 2.. by itself; shortcut = the program name we set => shortcut itself becomes a command)

Used all the time

list, gen, recode, replace, rename, sort, drop, keep, order……

merge, append (_merge=1 from master data, 2=from using data…)
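A small sketch of the last two ideas, a user-defined program and the _merge codes, using made-up data. The program body, the dataset contents, and the tempfile name are all hypothetical, and the merge 1:1 syntax assumes Stata 11 or later.

```stata
* Illustrative only: all dataset and variable names here are hypothetical
capture program drop shortcut
program define shortcut
    * Any sequence of commands can go here
    summarize
    describe, short
end

* Using data: ids 1-3
clear
input id x
1 10
2 20
3 30
end
tempfile using_data
save `using_data'

* Master data: ids 2-4
clear
input id y
2 200
3 300
4 400
end

* _merge: 1 = master only, 2 = using only, 3 = matched in both
merge 1:1 id using `using_data'
tabulate _merge

* The program name now works like a command
shortcut
```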

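For the interaction-term note (lincom var1+inter), a sketch on simulated data may help. The variable names y, x, d, and inter, the seed, and the coefficient values are all assumptions. The final two lines use the built-in margins command as a stand-in for the SPost commands prtab and prgen, which are user-written.

```stata
* Illustrative only: simulated data; y, x, d, and inter are hypothetical names
clear
set seed 12345
set obs 500
generate x = rnormal()
generate d = runiform() < 0.5
generate inter = x * d
generate y = runiform() < invlogit(-0.5 + 0.8*x + 0.3*d + 0.4*inter)

logistic y x d inter

* Odds ratio of x in the d==1 group: exp(b_x + b_inter) = exp(b_x) * exp(b_inter)
lincom x + inter, or

* Built-in stand-in for prtab/prgen: predicted probabilities at chosen values
logit y c.x##i.d
margins d, at(x = (-1 0 1))
```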

Posted by admin. Please credit the source when reposting: http://www.yc00.com/xiaochengxu/1690363148a338117.html
