Sample size for clustering analysis - Statistics board - Weiming Archive

h**t (posts: 1678)	1 What formula can I use to determine the right sample size for a clustering analysis with 100-300 variables? What sampling method can be used for k-means or hierarchical clustering on categorical fields so that every value of each categorical field is included in the sample? Thanks a lot!
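On the second question, one possible approach (a minimal sketch, not a sample-size formula and not something proposed in this thread) is to first greedily pick rows until every observed level of every categorical field is covered, then top up with random rows. The function name `sample_covering_levels` and the toy columns `color` and `size` are hypothetical. Note that the result over-represents rare levels compared with simple random sampling, and it can exceed `n` rows when the cover alone needs more than `n`.

```python
import random

def sample_covering_levels(rows, cat_cols, n, seed=0):
    """Draw about n rows such that every observed level of every
    column in cat_cols appears at least once (hypothetical helper)."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)

    chosen, chosen_set, covered = [], set(), set()
    # Pass 1: greedy cover. Keep a row only if it shows a
    # (column, level) pair that has not been seen yet.
    for i in order:
        pairs = {(c, rows[i][c]) for c in cat_cols}
        if not pairs <= covered:
            chosen.append(i)
            chosen_set.add(i)
            covered |= pairs
    # Pass 2: top up with other rows (already in random order) to reach n.
    for i in order:
        if len(chosen) >= n:
            break
        if i not in chosen_set:
            chosen.append(i)
            chosen_set.add(i)
    return [rows[i] for i in chosen]

# Toy data: "green" / "L" occur only once in 100 rows.
rows = ([{"color": "red", "size": "S"}] * 50
        + [{"color": "blue", "size": "M"}] * 49
        + [{"color": "green", "size": "L"}])
sample = sample_covering_levels(rows, ["color", "size"], n=10, seed=1)
```

A plain random sample of 10 rows would usually miss the single green/L row; the cover pass guarantees it is kept.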
c***z (posts: 6348)	2 A side question: how does k-means compute the distance when some of the variables are binary?
h**t (posts: 1678)	3 k-means is based on Euclidean distance calculations. So whatever the data are, it still calculates a distance; a binary variable is simply treated as the numbers 0 and 1.
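To make that concrete for 0/1 variables: the squared Euclidean distance between two binary vectors is just the number of positions where they differ (the Hamming distance), and a cluster centroid becomes a vector of proportions rather than a valid 0/1 pattern. A small sketch with made-up vectors `a` and `b`:

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Number of positions where the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))

# Two hypothetical observations with five binary variables each.
a = [1, 0, 1, 1, 0]
b = [0, 0, 1, 0, 1]

d = euclidean(a, b)
mismatches = hamming(a, b)

# The centroid of a and b: each coordinate is the share of ones.
centroid = [sum(col) / len(col) for col in zip(a, b)]
```

So the distance is well defined, but it only counts mismatches, which may or may not be a meaningful notion of similarity for a given set of binary fields.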
h***x (posts: 586)	4 Use VARCLUS (SAS) and PCA to do variable reduction first, before running the clustering. When you only have 10-20 variables, you won't need to agonize over the sampling strategy. I do not like k-means. Every time I reset the seeds, or even reorder the dataset, I get different results. The upside is that I can get the results I want after trying and trying... not sure if that is kind of cheating... Nonparametric clustering (MODECLUS) is a better choice most of the time. It can handle situations that k-means cannot handle well because of data structure problems. Another good way is to combine k-means with a hierarchical method to make a two-stage clustering. (in reply to h**t, post 3)
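The seed sensitivity described here is usually handled by running k-means from several random starts and keeping the run with the lowest within-cluster sum of squares, with the seed fixed so the choice is reproducible rather than hand-picked. Below is a minimal pure-Python sketch of that idea (the same idea as the `n_init` parameter of scikit-learn's `KMeans`); the function names and the toy data are assumptions for illustration.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, n_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen points."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    inertia = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return inertia, labels, centers

def kmeans_best_of(points, k, n_restarts=20, seed=0):
    """Run k-means n_restarts times and keep the lowest-inertia run."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda r: r[0])

# Toy data: three Gaussian blobs in two dimensions.
rng_data = random.Random(42)
points = [(cx + rng_data.gauss(0, 0.3), cy + rng_data.gauss(0, 0.3))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]

best_inertia, best_labels, best_centers = kmeans_best_of(points, k=3)
```

Selecting on an objective criterion (inertia) with a fixed seed is different from rerunning until the clusters look the way you hoped, which is the part that does feel like cheating.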
c***z (posts: 6348)	5 Also, don't forget to normalize the variables.
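A quick illustration of why normalization matters for a Euclidean method: without it, the variable with the largest numeric range decides almost the whole distance. The sketch below uses z-score standardization, which is only one of several possible normalizations; the helper `standardize` and the income/flag columns are hypothetical.

```python
import math
import statistics

def standardize(rows):
    """Z-score each column: subtract the mean, divide by the
    population standard deviation (constant columns become 0)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(x - m) / s if s > 0 else 0.0
             for x, m, s in zip(row, means, sds)]
            for row in rows]

# Hypothetical data: income in dollars and a 0/1 flag.
rows = [[30000, 0], [45000, 1], [60000, 0], [75000, 1]]
scaled = standardize(rows)

# Distance between the first two rows, before and after scaling.
raw_gap = math.dist(rows[0], rows[1])        # dominated by income
scaled_gap = math.dist(scaled[0], scaled[1])  # both variables contribute
```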
s*********h (posts: 6288)	6 Is two-step clustering available in R? (in reply to h***x, post 4)
g******2 (posts: 234)	7 You can use sparse k-means.
h**t (posts: 1678)	8 Thank you! I know that model-based clustering and two-step clustering are more appropriate for my data. For some reason, I can only use k-means or hierarchical clustering to finish some demos... PCA or FA is not preferred; we actually want to keep all of these variables... (in reply to h***x, post 4)
h**t (posts: 1678)	9 This is done already. (in reply to c***z, post 5)
c***z (posts: 6348)	10 Too many variables will bring on the curse of dimensionality. (in reply to h**t, post 8)
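One symptom of the curse of dimensionality that matters for distance-based clustering is distance concentration: as the number of variables grows, the nearest and farthest points from any given point end up at nearly the same distance. A rough simulation on uniform random data (an assumption; real data with correlated variables can behave better) shows the effect. The function name `relative_contrast` is made up for this sketch.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query
    point to n_points uniform random points in the unit cube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, p))
    return (max(dists) - min(dists)) / min(dists)

# Compare a low-dimensional case with the 100-300 variable range
# mentioned in the first post.
contrast_low = relative_contrast(2)
contrast_high = relative_contrast(300)
```

With 300 variables the contrast is a small fraction of what it is with 2, so "closest centroid" carries much less information unless some variables are dropped or down-weighted.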
b********r (posts: 764)	11 What does the algorithm for this look like? (in reply to g******2, post 7)
