y*****y Posts: 98 | 1 If the outcome is a score bounded in [0,1], the simplest approach is to transform it to (-infty, infty) and then
fit a linear regression.
More complex but better approaches include ordinal regression (McCullagh, 1980); binomial-logit-normal,
coarsened data model (Lesaffre, 2007). |
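A minimal sketch of the "transform, then fit a linear regression" idea, in Python rather than SAS. The data, the clipping constant `eps`, and the helper names are all hypothetical; scores equal to exactly 0 or 1 have an infinite logit, so some adjustment like the clipping here has to be assumed.

```python
import math

def logit(p, eps=1e-3):
    """Map a score in [0, 1] to the real line; eps keeps 0 and 1 finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def fit_line(x, z):
    """Ordinary least squares for z = a + b * x with a single predictor."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    return mz - b * mx, b

def predict_score(a, b, x):
    """Back-transform the linear predictor to the [0, 1] scale."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

# Hypothetical data: x is a predictor, y a bounded outcome score.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.05, 0.20, 0.50, 0.80, 1.00]
a, b = fit_line(x, [logit(yi) for yi in y])
```

The fitted values always fall strictly inside (0, 1) after back-transformation, but the choice of `eps` can move the estimates noticeably when many scores sit on the boundary.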
|
m******t Posts: 44 | 2 I am fitting a logistic regression (with several explanatory variables, all continuous). The code is as follows:
Step 1:
proc genmod data=datcom descend ;
model bidd = pdhdS1 pdnhS1 E age educyears D / dist=bin link=logit CovB;
by _Imputation_;
ods output ParameterEstimates=paraest CovB=covmat;
run;
This produces two ODS tables. In principle, because this is multivariate inference, the second step,
PROC MIANALYZE, should use the following code:
proc mianalyze parms=Paraest covb=covmat;
modeleffects intercept pdhdS1 pdnhS1 E age educyears D;
ods output ParameterEstimates=parameterest VarianceInfo=vinfo;
run;
When supplying the input data to proc mianalyze, p... |
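For orientation, the pooling that a multiple-imputation combine step performs for a single coefficient can be sketched with Rubin's rules. This is a Python illustration, not the SAS procedure itself; the per-imputation estimates and variances below are made-up numbers, and the multivariate case would pool full covariance matrices instead of scalars.

```python
import math

def rubin_combine(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules."""
    m = len(estimates)
    qbar = sum(estimates) / m                                 # pooled estimate
    within = sum(variances) / m                               # within-imputation variance
    between = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between                # total variance
    return qbar, total, math.sqrt(total)

# Hypothetical estimates of one regression coefficient from m = 3 imputations.
estimates = [0.50, 0.60, 0.40]
variances = [0.04, 0.05, 0.03]
qbar, total_var, pooled_se = rubin_combine(estimates, variances)
```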
|
s**5 Posts: 68 | 3 A question: when logistic regression is used in a case-control study, what is the
probability (Pr) in logit(Pr)? My understanding is that it is Pr(D=1|X,Z=1), where Z=1 means
the subject is sampled. In a contingency table, Pr(D=1|X,Z=1) can then
be estimated as N_{D=1}/(N_{D=1}+N_{D=0}) in each row (each setting of X). Is my understanding
correct? Thanks! |
|
a********s Posts: 188 | 4 Just as a reference: I have not used genmod for multinomial models before, but I did
use PROC LOGISTIC with LINK=GLOGIT (generalized logit model) and specified
PREDPROBS=I in the OUTPUT statement to get the probability of each level. |
|
|
r****t Posts: 276 | 6 Hi all, I tried GENMOD with the REPEATED statement to estimate a rate. The model looks very
simple, for example:
proc genmod data = dsn;
class subjid y x;
model y = x/link = logit distribution = binomial;
repeated subject = subjid/type=exch;
estimate 'rate a' x 1 0;
run;
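But once I use the REPEATED statement, that is, a GEE model, the estimated rate is very strange. Running
proc freq; tables y*x directly gives a rate of about 70% for y=1 & x=1, yet the GEE model gives
about 30%? I noticed that with the exchangeable option the working correlation is > 0.95. Could that
be the reason for such an odd rate? If anyone has experience with this situation, I would appreciate your advice. Thanks in advance. |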
|
l***a Posts: 12410 | 7 For logistic regression, does that mean the linear relationship between the logit of the DV and the IVs is not strong enough? |
|
s*********e Posts: 1051 | 8 Thanks to everyone for helping me and sending me the paper.
The reason I wanted the paper is that it is the original paper
on fractional logit by QMLE, proposed by Papke and Wooldridge. From what I
can see, this approach might be the most elegant way to model loss
given default (LGD) in the credit risk context. (I just thought some of you
might be interested.)
Have a nice weekend. |
|
n*****1 Posts: 172 | 9 I would like to know too. Stata does not seem to have a ready-made command for two-way clustering for logit. |
|
|
n*****1 Posts: 172 | 11 Thanks! I will download it tomorrow and play with it. What I actually want, though, is two-way
clustering under multinomial logit... |
|
z*******n Posts: 15481 | 12 R has a function that computes the result directly:
glm(y~x1+x2+x3+...,family=binomial(link="logit"))
For more settings, use ?glm in R to see the details of the glm function. |
|
A*******s Posts: 3942 | 13 it is still bounded... say the min and max p is 0.01 and 0.99 respectively
in the data, after logit transformation it would be log(1/99) and log(99).
if the data are dense near the boundaries u will still see the angle like
distribution of residuals.
u can check out this example I worked on before:
http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime
the outcome is standardized and scaled between 0 and 1.
no matter what type of transformation i used, residual plots would present
unconsta... |
|
A*******s Posts: 3942 | 14 hey, look at my post:
if data are DENSE near boundaries...
which suggests some truncated/censored nature behind what u see. that is the
origin of latent variable models for bounded outcome.
if u don't buy it, try my example after u delete all the 0 and 1 Y, and
print out the residual plot.
or u can just simply google beta regression. Here is some explanation about
the need of beta regression:
How should one perform a regression analysis in which the dependent variable
(or response
variable), y, ... |
|
j*****e Posts: 182 | 15 Suppose Y~Binomial(n,p), logit(p)=b0+b1x1+b2x2.
The parameters b0, b1, b2 are estimated by maximizing the likelihood, which
essentially amounts to solving a weighted least squares equation. Here, the weight
depends on b0, b1, b2 and also on the binomial size n. Because the weight
depends on b0-b2, b0-b2 have to be solved iteratively (there is no closed-form
expression). This is known as the iteratively reweighted least squares method.
Without knowing n, the estimates of b0-b2 cannot be reproduced. Also, R-square
can no longer be... |
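The iteration described in this post can be sketched for the one-predictor case, logit(p) = b0 + b1*x, where the 2x2 weighted normal equations can be solved by hand. This is a Python illustration with hypothetical grouped data, not the output of any particular package; note how the binomial size `ni` enters the weight.

```python
import math

def irls_logistic(x, successes, trials, n_iter=25):
    """Fit logit(p) = b0 + b1 * x to grouped binomial data by IRLS."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi, ni in zip(x, successes, trials):
            eta = b0 + b1 * xi
            p = 1.0 / (1.0 + math.exp(-eta))
            w = ni * p * (1.0 - p)        # weight depends on b0, b1 and n
            z = eta + (yi - ni * p) / w   # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least squares normal equations.
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1

# Hypothetical grouped data: successes out of trials at each x.
x = [0.0, 1.0, 2.0, 3.0]
trials = [10, 10, 10, 10]
successes = [2, 4, 6, 8]
b0, b1 = irls_logistic(x, successes, trials)
```

At convergence the score equations are satisfied, so the observed and fitted success totals agree.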
|
p***r Posts: 920 | 16 You can't fit the model to get the result unless you know how many data points you
obtained at each logit ratio. |
|
t**c Posts: 539 | 17 Can the problem be viewed this way:
each candidate can receive 0-10 offers, so use multinomial logistic regression. But
because the data are ordinal, each predictor has only one coefficient rather than 10, and there will be 10
intercepts. |
|
A*******s Posts: 3942 | 18 your original question about "modeling which applicants schools will give offers to" is way too general,
which leads to thousands of possible candidate models. you must ask a
specific one. For example:
do u wanna predict the joint distribution of offers from various schools?
wanna test marginal association between offer and certain attributes?
wanna estimate school-specific or applicant-specific effects?
I kinda feel that you wanna estimate the marginal association between prob.
of offer and some applicants' attributes, in which cas... |
|
A*******s Posts: 3942 | 19 my 2 cents is GEE/mixed models fit exactly this scenario: "these 20 candidates also
change every year", in which case marginal effect is of interest while subject-specific
effect is nuisance. |
|
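t**c Posts: 539 | 20 A question:
In this case, which is the random factor and which is the fixed factor?
From the original poster's initial description, candidate is treated as fixed and school as random.
Now that these 20 candidates also change every year, does that mean candidate is random too?
And should a time variable be added as well?
In that case, is school still random? |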
|
A*******s Posts: 3942 | 21 No need to ask me for advice, i am also a beginner. :)
good catch. actually i have no experience modeling such nested
cluster correlations, within-school and within-candidate. Can we just use
two RANDOM statements with the _RESIDUAL_ option to model two R-side effects in
GLIMMIX? any thoughts? |
|
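t**c Posts: 539 | 22 I have read your job-hunting experience post and admire it very much. That is what I am striving for, :)
I have not tried defining two random factors in SAS, but it should be possible.
In the original poster's case, though, he does not seem to care how the coefficients vary across schools or applicants. I just googled it, and the PROC QLIM the original poster mentioned may be the more suitable choice. |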
|
|
n**m Posts: 156 | 24 Does anyone know how to get the likelihood ratio or AIC out of proc glimmix? Thanks |
|
n**m Posts: 156 | 25 Found it. May I ask what the "Res" in
-2 Res Log Pseudo-Likelihood
means? |
|
r*****y Posts: 199 | 26 Probably "restricted", like REML in many procedures~ |
|
j*****e Posts: 182 | 27 data one;
input choice gender $ count;
cards;
1 f 17
1 m 19
2 f 14
2 m 21
...;
run;
/* CMH test for a nominal x ordinal table.
Use the chi-square test for "Row Mean Scores Differ". This is equivalent to
ANOVA. */
proc freq data=one;
table gender*choice/cmh;
weight count;
run;
/* Cumulative logit model.
You have to understand the model before interpreting the SAS output.
*/
proc genmod data=one;
class gender/param=ref;
model choice=gender/dist=multi link=clogit;
weight count;
run; |
|
j******n Posts: 2206 | 28 The odds ratio for a $1000 increase in income is 1.074.
This means that for every $1000 increase in income
a. the probability of the event increases 7.4%.
b. the logit increases 7.4%
c. the odds of the event increase 7.4%
d. the log of the odds of the event increases 7.4% |
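A small numeric check of what an odds ratio of 1.074 does to the odds, the logit, and the probability may help here. This is a Python sketch; the baseline probability `p0` is an arbitrary assumption chosen only for illustration.

```python
import math

odds_ratio = 1.074  # from the question: per $1000 increase in income

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)

p0 = 0.20                       # hypothetical baseline probability
o0 = odds(p0)
o1 = o0 * odds_ratio            # odds after a $1000 increase
p1 = prob(o1)

odds_change = o1 / o0 - 1.0                  # relative change in the odds
logit_shift = math.log(o1) - math.log(o0)    # additive change in the logit
prob_change = p1 / p0 - 1.0                  # relative change in the probability
```

The odds are multiplied by 1.074 whatever the baseline, the logit shifts by the constant log(1.074), and the relative change in the probability depends on `p0`.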
|
s*r Posts: 2757 | 29 The Wilcoxon rank sum test shows that the distribution of the x variable differs
by a location shift between the y=1 group and the y=0 group.
Logistic regression shows there is no significant linear increase in
logit(pr(y)) per unit increase in x. |
|
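z******n Posts: 397 | 30 Interesting. Imagine an example like this:
Along the x axis, y takes the following 0-1 pattern:
x --0--1--0--1--0--1--0--1--0--1--
It seems logistic regression would not be significant while the Wilcoxon rank sum test would be.
I have not verified this. |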
|
p********a Posts: 5352 | 31 ☆─────────────────────────────────────☆
littlebirds (dreamer) wrote on (Fri Aug 5 08:14:32 2011, US Eastern):
it is just another job
☆─────────────────────────────────────☆
WEIMINGSPACE (想也白想) wrote on (Fri Aug 5 10:18:51 2011, US Eastern):
But your JOB happens to be closely tied to these morons who only know how to run a t-test between groups and RUN a REGRESSION and then think they are FISHER reincarnated;
just thinking about it is depressing
☆─────────────────────────────────────☆
littlebirds (dreamer) wrote on (Fri Aug 5 10:35:12 2011, US Eastern):
Business/industries are not much better either. People don't trust things
they do not understand. You ... |
|
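p********2 Posts: 9939 | 32 Isn't 2SLS just running a logit or the like first and then plugging the fitted values into the second regression, so
that they are no longer correlated with the error? Experts, please correct me.
Let me criticize myself. It seems 2SLS cannot be used, because in this case no variable needs an IV; rather, your sample may be biased. Why not use Heckman?
A side question: for most regressions, I find that OLS results are very close to those from many fancy treatments. Are most fancy econometric models just there to fend off objections that hardly matter? |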
|
t*****e Posts: 2228 | 33 Thanks, but it still does not work
>> M = dlmread('def.dat', '', 5, 2);
Error using dlmread (line 141)
Mismatch between file and format string.
Trouble reading number from file (row 1u, field 14u) ==> some columns in my data are strings
M = csvread('def.dat', 5, 2);
Error using dlmread (line 141)
Number of HeaderColumns is greater than number of columns in file.
Error in csvread (line 50)
m=dlmread(filename, ',', r, c);
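I don't know what went wrong.
There is also a more important question. My main task is to implement a SAS project in MATLAB; reading the SAS
data into MATLAB is only the first step. The SAS code uses proc genmod link = logit etc. Then my
matl... |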
|
w*******9 Posts: 1433 | 34 Since you said the sensitivity was important, it helps to select a different
threshold (the usual default is 0.5) to trade off between specificity and
sensitivity. |
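The trade-off can be seen by recomputing sensitivity and specificity at two cutoffs. The following Python sketch uses made-up labels and predicted probabilities; the function name `sens_spec` is only illustrative.

```python
def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when predicting 1 for scores >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical outcomes and fitted probabilities from some classifier.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.90, 0.60, 0.40, 0.55, 0.38, 0.10]

default_cut = sens_spec(y_true, scores, 0.50)   # usual default cutoff
lower_cut = sens_spec(y_true, scores, 0.35)     # favors sensitivity
```

Lowering the cutoff catches the positive case scored 0.40 but also misclassifies the negative case scored 0.38, so sensitivity rises while specificity falls.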
|
P****D Posts: 11146 | 35 Whatever fails to meet the application's requirements is bad; this holds in any setting and any discipline. |
|
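z**********i Posts: 12276 | 36 I once had a RATE/COUNT DATA set and used NLMIXED to compare BINOMIAL, POISSON, NEGATIVE
BINOMIAL, BETA BINOMIAL, and ZERO-INFLATION models.
In the end, I reported results from GEE.
I would be glad to learn about your TOBIT MODEL, BETA MODEL, SIMPLEX MODEL AND FRACTIONAL LOGIT
MODEL.
How do you compare the strengths and weaknesses of these methods, or decide which is the best MODEL?
Looking forward to it... |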
|
s*********e Posts: 1051 | 38 repeated measures for fractional logit |
|
h******n Posts: 190 | 39 Okay, I think the question is about how to treat a continuous variable in
your model.
First of all, you need a plot to see the distribution of this variable, and
another plot to see its relationship with the outcome. For the latter plot, you may
want to try using logit(Y) as the Y-axis in addition to a binary Y.
Then, based on your plots, you can decide how you want to use it. You
can use it as linear, and in that case it might be better for it to be
centered, or divided by a unit such as 10 or 100 or 100... |
|
g*******u Posts: 148 | 40 My bad. Now I get you. The quick answer is, yes, people in my field are serious about using Bayesian methods. You can check:
http://www.sawtoothsoftware.com/products/cbc/cbchb.shtml
As you can see, this is a module for the implementation of hierarchical
Bayes. The module is only 5 MB in size but asks for USD $2,000!
In the design phase we have several different methods to create surveys, while in the analytics phase the underlying method is all about random-effect logit model using hierarchical Ba... |
|
A*******s Posts: 3942 | 41 sorry i misunderstood ur post. for binary outcome, generally the choice of
link function depends on interpretation, e.g., logit for log-odds
interpretation, probit for normally distributed censored latent variables, complementary log-log if the
censored latent variable is survival time. |
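For reference, the three links mentioned map a linear predictor `eta` to a probability as follows. This is a plain Python sketch of the standard inverse-link formulas, with function names chosen only for illustration.

```python
import math

def inv_logit(eta):
    """Logistic link: eta is the log-odds."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Probit link: standard normal CDF evaluated at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Complementary log-log link: asymmetric around eta = 0."""
    return 1.0 - math.exp(-math.exp(eta))

# Compare the three links on a small grid of linear predictor values.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
table = [(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta)) for eta in grid]
```

Logit and probit are symmetric around 0.5 at `eta = 0`, while the complementary log-log link gives 1 - exp(-1) there, which is one way to see its asymmetry.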
|
n*********e 发帖数: 318 | 42 谢谢, try 了一遍, 结果一样。
> pred2<-predict(logit.1, validation[,-1], type='response')
> sum(pred-pred2)
[1] 0
----------------------------
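The reason I ask is that I will next do the same with SVM, neural net, tree, and random forest.
In examples I have seen online, some people delete the dependent variable from the validation set and then
run prediction; others run prediction directly on the full validation set.
Is there a Good R Coding Practice for this? |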
|
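k*****u Posts: 1688 | 43 In SAS, the multinomial logit model reportedly cannot handle this and keeps crashing.
Discriminant analysis does not perform well.
For models with this many categories, what approach is generally used? Does anyone have experience? |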
|
|
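a****g Posts: 8131 | 45 First, you can get the odds from the model. The change in odds is linear in x, but the corresponding prob(y)
should not be linear.
Sorry, the odds are not linear either; only the logit is. |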
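The correction in this post can be checked numerically: with a fitted model logit(p) = b0 + b1*x, equal steps in x give equal steps in the logit, a constant ratio in the odds, and unequal steps in the probability. In the Python sketch below the coefficients are hypothetical.

```python
import math

b0, b1 = -2.0, 0.8          # hypothetical logistic regression coefficients
xs = [0.0, 1.0, 2.0, 3.0]

logits = [b0 + b1 * x for x in xs]
odds = [math.exp(l) for l in logits]
probs = [o / (1.0 + o) for o in odds]

# Changes between consecutive x values.
logit_steps = [b - a for a, b in zip(logits, logits[1:])]   # constant: b1
odds_ratios = [b / a for a, b in zip(odds, odds[1:])]       # constant: exp(b1)
odds_steps = [b - a for a, b in zip(odds, odds[1:])]        # not constant
prob_steps = [b - a for a, b in zip(probs, probs[1:])]      # not constant
```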
|
g******d Posts: 231 | 46 When fitting a multinomial logit model, I need to restrict the observations to a subset.
I know that in SAS I can simply write:
if sex eq 1;
proc something; and that is enough.
How do I do this "if sex eq 1" in Stata?
Thanks! |
|
c****e Posts: 1842 | 47 I don't think so.
Why don't you try fixed effect logit?
xtlogit, fe |
|
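g******d Posts: 231 | 48 I found that the existing Stata commands all compute this with the other variables held at their means. What I
need is to compute it at specific values I choose, such as the mode or some other arbitrary number.
Is there a convenient command for this?
I currently prefer Stata, but I would also consider a SAS approach if there is one. |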
|
|
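T*********n Posts: 36 | 50 There are four kinds of experimental animals (3 mutants and 1 control) and four drugs, giving 16 drug-animal
combinations. For each combination, experiments are run at 8 drug concentrations, yielding one response-concentration curve (the response is always
between 0 and 1). The experiment is then repeated 9 more times for that drug-animal combination, giving 10 experimental curves in total.
Following the formula suggested by theory, a regression (logit analysis) is fitted to the data of each experimental curve. On the fitted
curve, the concentration corresponding to a response of 0.5 is the quantity of interest, denoted C. Each drug-animal combination
thus gives 10 values of C, denoted Ci, i=1, …, 10. We then define a quantity S, Si = [Ci / mean of C(control)] - 1,
where mean of C(control) is the average C of the control animals under the corresponding drug.
Si is thus the shift of Ci relative to the control animals.
Now we need to compare Si across drugs or across animals (similar to doing an ANOVA). For this we need the
variance or error of each Si (because each Si is computed from regression results). How can the variance/error of each Si
be computed?
Any help would be much appreciated, thank you! |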
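One possible approach, offered only as a sketch, is first-order error propagation (the delta method) for the ratio Si = Ci / m - 1, where m is the control mean. The Python code below assumes that each Ci comes with a standard error from its own curve fit, that Ci is independent of m (which does not hold for the control animals' own Ci), and that the variance of m can be estimated from the spread of the control C values. All numbers are made up.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_var(values):
    """Unbiased sample variance."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def shift_and_se(c_i, se_c_i, control_c):
    """S_i = C_i / mean(control) - 1 with a delta-method standard error.

    Assumes C_i is independent of the control mean.
    """
    m = mean(control_c)
    var_m = sample_var(control_c) / len(control_c)   # variance of the control mean
    s_i = c_i / m - 1.0
    # Var(C/m) ~= Var(C)/m^2 + C^2 * Var(m)/m^4 for independent C and m.
    var_s = se_c_i ** 2 / m ** 2 + c_i ** 2 * var_m / m ** 4
    return s_i, math.sqrt(var_s)

# Hypothetical numbers: four control C values, one mutant C with its fit SE.
control_c = [9.0, 10.0, 11.0, 10.0]
s_i, se_s_i = shift_and_se(12.0, 0.5, control_c)
```

With these inputs the control mean is 10, so S_i is 0.2, and the two variance terms (0.0025 from C_i and 0.0024 from the control mean) add up to a standard error of 0.07. A bootstrap over curves would be an alternative if the independence assumption is doubtful.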
|