Statistics board - Sample size for clustering analysis
h**t
Posts: 1678
1
What formula can I use to determine the right sample size for clustering
analysis with 100-300 variables?
What sampling methodology can be used for k-means or hierarchical clustering
on categorical fields so that all values of the categorical fields are
included in the sample?
Thanks a lot!
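
One way to make sure every categorical value shows up in the sample is to first pick one row per observed value and then top up with a simple random draw. The sketch below is only an illustration of that idea; `coverage_sample` and the `color`/`size` columns are hypothetical names, and the result is no longer a simple random sample because rare values are over-represented.

```python
import random

def coverage_sample(rows, cat_cols, n, seed=0):
    """Draw about n rows so that every observed value of each
    categorical column appears at least once (hypothetical helper)."""
    rng = random.Random(seed)
    chosen = set()
    # Step 1: for each (column, value) pair, keep one row index
    # unless a row already chosen covers that value.
    for col in cat_cols:
        by_value = {}
        for i, row in enumerate(rows):
            by_value.setdefault(row[col], []).append(i)
        for indices in by_value.values():
            if not chosen.intersection(indices):
                chosen.add(rng.choice(indices))
    # Step 2: top up with a simple random sample of the remaining rows.
    remaining = [i for i in range(len(rows)) if i not in chosen]
    extra = max(0, n - len(chosen))
    chosen.update(rng.sample(remaining, min(extra, len(remaining))))
    return [rows[i] for i in sorted(chosen)]

# Made-up data: one common, one uncommon and one rare combination.
rows = ([{"color": "red", "size": "S"}] * 50
        + [{"color": "blue", "size": "M"}] * 5
        + [{"color": "green", "size": "L"}])
sample = coverage_sample(rows, ["color", "size"], n=10)
print(len(sample), sorted({r["color"] for r in sample}))
```

If there are more distinct values than `n`, the sample comes out larger than `n`, since coverage takes priority here.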
c***z
Posts: 6348
2
A side question: how does k-means compute distances when some of the
variables are binary?
h**t
Posts: 1678
3
k-means is based on Euclidean distance, so whatever the data type is, it
still computes a distance.
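
To make that concrete for 0/1 variables: the squared Euclidean distance between two binary vectors is just the number of positions where they differ (the Hamming distance). A small check, with made-up vectors `a` and `b`:

```python
import math

def euclidean(a, b):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Number of positions where the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))

a = [1, 0, 1, 1, 0]
b = [0, 0, 1, 0, 1]

# For 0/1 vectors, each mismatch adds exactly 1 to the squared distance.
print(euclidean(a, b) ** 2, hamming(a, b))
```

So the distance is well defined, but note that a k-means centroid on binary columns is a vector of proportions rather than a valid 0/1 point.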
h***x
Posts: 586
4
Use VARCLUS (SAS) and PCA for variable reduction before running the
clustering. Once you are down to 10-20 variables, you won't agonize over
sampling strategies.
I do not like k-means. Every time I reset the seeds, or even reorder the
dataset, I get different results. The upside is that I can get the results I
want after enough tries... Not sure whether that counts as cheating...
Nonparametric clustering (MODECLUS) is a better choice most of the time. It
can handle situations that k-means handles poorly because of the structure
of the data.
Another good approach is to combine k-means with a hierarchical method for
two-stage clustering.
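
A rough sketch of what two-stage clustering can look like, assuming stage 1 is plain k-means into many small "micro" clusters and stage 2 is average-linkage merging of their centroids. All function names and the toy data are made up for illustration; this is not the SAS procedure, just the general idea in plain Python.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, seed=0, iters=50):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Recompute centroids; keep the old one if a cluster is empty.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return centroids, labels

def merge_centroids(centroids, k):
    """Average-linkage agglomeration of centroids down to k groups."""
    groups = [[i] for i in range(len(centroids))]
    while len(groups) > k:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                total = sum(dist(centroids[i], centroids[j])
                            for i in groups[a] for j in groups[b])
                d = total / (len(groups[a]) * len(groups[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] += groups.pop(b)  # b > a, so index a stays valid
    # Map each micro-cluster index to its final group id.
    return {i: g for g, members in enumerate(groups) for i in members}

def two_stage(points, n_micro, k, seed=0):
    centroids, micro = kmeans(points, n_micro, seed)
    group_of = merge_centroids(centroids, k)
    return [group_of[m] for m in micro]

# Toy data: two well-separated blobs of 30 points each.
rng = random.Random(1)
blob_a = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(30)]
blob_b = [[rng.gauss(10, 0.5), rng.gauss(10, 0.5)] for _ in range(30)]
labels = two_stage(blob_a + blob_b, n_micro=6, k=2)
print(labels)
```

The final grouping depends less on the k-means seed than a direct k-means with `k=2` would, because the hierarchical step works on the micro-cluster centroids rather than on one random start.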

[h**t wrote:]
: k-means is based on Euclidean distance, so whatever the data type is, it
: still computes a distance.

c***z
Posts: 6348
5
Also, don't forget to normalize the variables.
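
For anyone wondering why this matters: without rescaling, a variable measured in large units dominates the Euclidean distance. A minimal z-score sketch, where `standardize` and the `income`/`age` values are made up for illustration:

```python
import statistics

def standardize(columns):
    """Rescale each column (a list of numbers) to mean 0 and
    population standard deviation 1."""
    scaled = []
    for col in columns:
        mu = statistics.mean(col)
        sigma = statistics.pstdev(col)
        # A constant column adds nothing to distances; map it to zeros.
        scaled.append([(x - mu) / sigma if sigma > 0 else 0.0 for x in col])
    return scaled

income = [30000, 45000, 60000, 120000]  # hypothetical values, dollars
age = [25, 35, 45, 55]                  # hypothetical values, years
z_income, z_age = standardize([income, age])
print(z_income)
print(z_age)
```

After this step both columns contribute on the same scale, so the clustering is no longer driven by whichever variable has the largest raw numbers.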
s*********h
Posts: 6288
6
Is two-step clustering available in R?

[h***x wrote:]
: Use VARCLUS (SAS) and PCA for variable reduction before running the
: clustering. Once you are down to 10-20 variables, you won't agonize over
: sampling strategies.
: I do not like k-means. Every time I reset the seeds, or even reorder the
: dataset, I get different results. The upside is that I can get the results I
: want after enough tries... Not sure whether that counts as cheating...
: Nonparametric clustering (MODECLUS) is a better choice most of the time. It
: can handle situations that k-means handles poorly because of the structure
: of the data.

g******2
Posts: 234
7
You can use sparse k-means.
h**t
Posts: 1678
8
Thank you!
I know model-based clustering and two-step clustering are more appropriate
for my data. For some reason, I can only use k-means or hierarchical
clustering to finish some demos...
PCA or FA is not preferred; we actually want to keep this many variables...


[h***x wrote:]
: Use VARCLUS (SAS) and PCA for variable reduction before running the
: clustering. Once you are down to 10-20 variables, you won't agonize over
: sampling strategies.
: I do not like k-means. Every time I reset the seeds, or even reorder the
: dataset, I get different results. The upside is that I can get the results I
: want after enough tries... Not sure whether that counts as cheating...
: Nonparametric clustering (MODECLUS) is a better choice most of the time. It
: can handle situations that k-means handles poorly because of the structure
: of the data.

h**t
Posts: 1678
9
This is already done.

[c***z wrote:]
: Also, don't forget to normalize the variables.
c***z
Posts: 6348
10
Too many variables will bring on the curse of dimensionality...
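
A quick way to see the effect, assuming points drawn uniformly from the unit cube (an assumption for illustration only, real data will behave differently): as the number of variables grows, the nearest and farthest points end up at almost the same distance, so distance-based clusters become harder to tell apart. `relative_contrast` is a made-up helper name.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of the distances from one random query point
    to n_points random points in the unit cube of dimension dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(query, p))))
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as the dimension goes up.
for dim in (2, 10, 100, 300):
    print(dim, round(relative_contrast(dim), 2))
```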

[h**t wrote:]
: Thank you!
: I know model-based clustering and two-step clustering are more appropriate
: for my data. For some reason, I can only use k-means or hierarchical
: clustering to finish some demos...
: PCA or FA is not preferred; we actually want to keep this many variables...

b********r
Posts: 764
11
How does this algorithm work?

[g******2 wrote:]
: You can use sparse k-means.