i can do some code.
Text Segmentation: Supervised approach
Here is probably the last article about text segmentation. Previously I’ve explored three approaches: GraphSeg, TextTiling, and an improved TextTiling with an embeddings-based similarity metric.
Text Segmentation: A Closer Look at Clustering and Beyond
This is the third installment in a series about text segmentation.
Exploring Text Segmentation Algorithms and Their Performance
In recent years, the internet has seen the emergence of numerous new podcasts.
Measuring Text Segmentation Success: A Deep Dive into Key Metrics
Task
I’m currently working on a new pet project centered around podcasts. The initial objective of this project is to identify the timestamps at which podcast hosts transition to a new topic. This task is commonly referred to as Text Segmentation:
Text segmentation involves breaking down written text into meaningful units, such as words, sentences, or topics. [Wikipedia]
To test this concept, I’ll use the Russian podcast DevZen as an example. Over the past 9 years, the hosts have been discussing various subjects related to technology, software engineering, and databases on a weekly basis.
OK, I have an sklearn classifier and now I want to extend it. Is it possible?
In our product there is a scikit-learn classifier (sklearn.linear_model.SGDClassifier, to be more precise).
Log rotation with Python
Some time ago I was given the task of implementing log rotation in a Tornado web service.