Comments for Jörn's Blog

Comment on SciPy Hierarchical Clustering and Dendrogram Tutorial by Hemendra

Hemendra — Fri, 18 Jun 2021 16:07:21 +0000

Thanks a lot for this tutorial. It really cleared up this topic for me even though I did not have much prior knowledge.

Thanks a lot.

Comment on SciPy Hierarchical Clustering and Dendrogram Tutorial by Simon Watson

Simon Watson — Fri, 11 Jun 2021 09:30:06 +0000

Thanks so much for this Jorn!

So few people have actually taken the time to go through why the table of outputs looks the way it does, in addition to showing how the chart works.

I was really struggling but a few simple steps cleared the confusion away.

Really well done – thanks so much from a fellow traveler.

Simon

Comment on SciPy Hierarchical Clustering and Dendrogram Tutorial by valentino

valentino — Wed, 16 Sep 2020 13:14:01 +0000

In reply to Elaine. It is the same, just that the numbers are in a different format!

Comment on SciPy Hierarchical Clustering and Dendrogram Tutorial by Catarina Nogueira

Catarina Nogueira — Mon, 25 May 2020 07:35:04 +0000

Hi! Great tutorial, thanks!

I have a question though: is it not possible to retrieve the data from the cut made on the dendrogram?
For example, in your third dendrogram image there is a leaf labeled (23). Can I find out which labels were grouped into this (23), so that I don’t have to apply fcluster to cut?

Thanks!!

Comment on SciPy Hierarchical Clustering and Dendrogram Tutorial by hans

hans — Tue, 28 Apr 2020 20:52:07 +0000

Still helping after all these years, thanks.
Hans

Comment on SciPy Hierarchical Clustering and Dendrogram Tutorial by Alejandro

Alejandro — Tue, 31 Mar 2020 18:52:37 +0000

Great explanation!! Thank you.

If I have more data than my computer can manage, is it possible to split the data into batches? How could “linkage” be used with data batches? Thank you!

Comment on SciPy Hierarchical Clustering and Dendrogram Tutorial by Asma

Asma — Mon, 24 Feb 2020 05:31:44 +0000

This was so helpful! Thanks so much.

Comment on SciPy Hierarchical Clustering and Dendrogram Tutorial by Eugene

Eugene — Wed, 31 Jul 2019 13:43:34 +0000

Sorry if I wasn’t clear. In my dendrogram output, I have 1200 data labels across the bottom, one for each stock ticker. Depending on where I put the max_d, it will create clusters in different colors. I want to export the tickers themselves into Excel, grouped by the cluster they fell into.

Comment on SciPy Hierarchical Clustering and Dendrogram Tutorial by joern

joern — Wed, 31 Jul 2019 07:57:29 +0000

In reply to Eugene. I don't really understand the problem, as clusters just assigns a cluster id to each index of your X. If you want all indices per cluster, all you have to do is invert that mapping, e.g., like this:

from itertools import groupby ; {cid:[cp[1] for cp in cl] for cid, cl in groupby(sorted((cid, idx) for idx, cid in enumerate(clusters)), lambda x: x[0])}
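The same inversion can be applied to the labels instead of the indices. The following is only a minimal sketch under stated assumptions: the toy data X, the ticker names in labels, and the threshold max_d = 1.0 are all made up for illustration, and the labels are assumed to be in the same order as the rows of X.

```python
from collections import defaultdict

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data: six points forming two well-separated groups.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
# Made-up ticker names, assumed to be in the same order as the rows of X.
labels = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]

Z = linkage(X, "ward")
max_d = 1.0  # illustrative distance threshold, not a recommendation
clusters = fcluster(Z, max_d, criterion="distance")

# Invert the "index -> cluster id" array into "cluster id -> list of labels".
members = defaultdict(list)
for label, cid in zip(labels, clusters):
    members[int(cid)].append(label)

for cid in sorted(members):
    print(cid, members[cid])
```

From there, each cluster's list could be written out row by row, for example with the standard csv module, to a file that Excel can open.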

Comment on SciPy Hierarchical Clustering and Dendrogram Tutorial by Eugene

Eugene — Tue, 30 Jul 2019 20:38:29 +0000

This is super helpful in my analysis. Question for you: say I have a dendrogram consisting of 1200 instruments. Whether it auto-clusters or I choose a distance threshold at which to cluster, I get X number of clusters. Is there a way to pull out the labels of the instruments that fall within each of the X clusters, by cluster? Let’s say I’m doing this with stocks, and I put in 1000 stocks and it clusters everything into 5 clusters. How can I pull out the list of tickers that are in cluster 1, cluster 2, etc.?