
Text Segmentation: Supervised approach

This is probably my last article on text segmentation. Previously, I explored three approaches: GraphSeg, TextTiling, and an improved TextTiling with an embedding-based similarity metric.

See also: Measuring Text Segmentation Success: A Deep Dive into Key Metrics

Task

I’m currently working on a new pet project centered on podcasts. Its initial objective is to identify the timestamps at which podcast hosts move on to a new topic. This task is commonly referred to as Text Segmentation:

Text segmentation involves breaking down written text into meaningful units, such as words, sentences, or topics. [Wikipedia]
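To make the task more concrete, here is a minimal sketch of how it could be framed for a podcast transcript. It assumes the transcript is already split into sentences with start times and that some model has produced one boolean label per sentence saying whether it opens a new topic. The `Sentence` class, the `topic_start_times` function, and the sample data are all hypothetical and only serve as illustration; they are not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    start: float  # start time in seconds (hypothetical field)
    text: str


def topic_start_times(sentences, is_boundary):
    """Return the start time of every sentence that opens a new topic.

    is_boundary[i] is True when sentence i begins a new segment.
    The first sentence is always treated as the start of a segment.
    """
    if len(sentences) != len(is_boundary):
        raise ValueError("need exactly one label per sentence")
    return [
        sentence.start
        for i, (sentence, flag) in enumerate(zip(sentences, is_boundary))
        if i == 0 or flag
    ]


# Made-up sample data, not taken from a real episode.
transcript = [
    Sentence(0.0, "Welcome to the show."),
    Sentence(4.2, "Today we start with databases."),
    Sentence(95.0, "Moving on to compilers."),
    Sentence(101.5, "The new release is out."),
]
labels = [True, False, True, False]

print(topic_start_times(transcript, labels))
```

Framed this way, segmentation becomes a per-sentence binary labeling problem, and the timestamps fall out of the labels directly.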

To test this concept, I’ll use the Russian podcast DevZen as an example. Over the past 9 years, the hosts have been discussing various subjects related to technology, software engineering, and databases on a weekly basis.