Commit f080ca4

release: v1.0.0 — word cloud, revisions, statistics, table of contents
v1.0.0 Highlights:
- Word Cloud & Frequency Analysis: TF-IDF, stop words, HTML/CSV/markdown output
- Document Revision Tracking: LCS diff, rollback, changelog, merge histories
- Comprehensive Statistics: 25+ metrics, Flesch-Kincaid/ARI reading level
- Table of Contents: auto-generate, inject, numbered, HTML output
- 136 public API exports (+27 new)
- 1126 tests passing
- ~19,000 lines of code
1 parent a25c298 commit f080ca4

4 files changed

Lines changed: 52 additions & 3 deletions

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -7,6 +7,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.0.0] - 2025-02-25
+
+### Added
+
+- **Word Cloud & Frequency Analysis** (`wordcloud.py`): Generate word frequency data and cloud visualizations. `WordFrequency` dataclass with count, frequency, rank, TF-IDF, weight. `WordCloudData` with multiple output formats (markdown table, inline HTML cloud, CSV, size map). `generate_word_cloud()` with 130+ built-in stop words, configurable max_words, min_length, min_count, custom stop words. `compare_word_clouds()` for frequency distribution comparison. `tfidf_cloud()` for multi-document TF-IDF analysis. Markdown stripping and code block/URL removal in tokenizer.
+- **Document Revision Tracking** (`revisions.py`): Track changes between document versions with full history management. `Revision` with SHA-256 content hashing, word/line counts. `RevisionDiff` with LCS-based diff algorithm, unified diff output, markdown format. `RevisionHistory` with add/get/rollback/changelog/statistics. `compute_diff()` with modification detection (adjacent delete+add merging). `track_changes()` for quick two-version comparison. `merge_revisions()` with chronological ordering and deduplication.
+- **Comprehensive Statistics** (`statistics.py`): 25+ document metrics with markdown awareness. `TextStatistics` covering characters, words, sentences, paragraphs, vocabulary richness, hapax legomena, reading/speaking time (238/150 WPM). `compare_statistics()` for side-by-side document comparison with diff. `vocabulary_analysis()` with frequency distribution, rare words, type-token ratio. `section_statistics()` for per-heading breakdown. `reading_level()` with Flesch-Kincaid Grade Level and Automated Readability Index.
+- **Table of Contents** (`toc.py`): Generate, customize, and inject table of contents from markdown headings. `TocEntry` with auto-anchor slugification, depth tracking. `TableOfContents` with flat view, level filtering, max_depth. Multiple output formats: markdown, numbered markdown (hierarchical 1, 1.1, 1.2), HTML. `extract_toc()` with duplicate anchor handling. `inject_toc()` with marker-based or auto-placement insertion. `merge_tocs()` for combining multiple ToCs.
+- **New Exports**: 27 new public API exports. Total public API: 136 exports.
+
 ## [0.9.0] - 2025-02-25
 
 ### Added

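Review note: the changelog entry for `revisions.py` mentions an LCS-based diff with modification detection (adjacent delete+add merging). The sketch below illustrates that general technique on lists of lines. It is not the code in `revisions.py`; the names `lcs_diff` and `merge_modifications` and the tuple layout are hypothetical, chosen only for this example.

```python
from typing import List, Tuple

# (kind, old_line, new_line); kind is "equal", "delete", "add" or "modify".
# This layout is an assumption for illustration, not deepworm's data model.
Op = Tuple[str, str, str]


def lcs_diff(old: List[str], new: List[str]) -> List[Op]:
    """Line diff derived from a longest-common-subsequence table."""
    n, m = len(old), len(new)
    # table[i][j] holds the LCS length of old[i:] and new[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner, emitting one op per step.
    ops: List[Op] = []
    i = j = 0
    while i < n and j < m:
        if old[i] == new[j]:
            ops.append(("equal", old[i], new[j]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("delete", old[i], ""))
            i += 1
        else:
            ops.append(("add", "", new[j]))
            j += 1
    # Whatever remains on either side is a pure delete or a pure add.
    ops.extend(("delete", line, "") for line in old[i:])
    ops.extend(("add", "", line) for line in new[j:])
    return ops


def merge_modifications(ops: List[Op]) -> List[Op]:
    """Collapse a delete immediately followed by an add into one 'modify'."""
    merged: List[Op] = []
    k = 0
    while k < len(ops):
        if ops[k][0] == "delete" and k + 1 < len(ops) and ops[k + 1][0] == "add":
            merged.append(("modify", ops[k][1], ops[k + 1][2]))
            k += 2
        else:
            merged.append(ops[k])
            k += 1
    return merged
```

For example, `merge_modifications(lcs_diff(["a", "b", "c"], ["a", "x", "c"]))` reports the middle line as a single `modify` from `"b"` to `"x"` rather than a separate delete and add. The table costs O(n·m) time and memory, which is usually acceptable for documents of a few thousand lines.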
README.md

Lines changed: 9 additions & 1 deletion
@@ -515,7 +515,11 @@ tracker.complete()
 | Bibliography mgmt | APA/MLA/BibTeX + auto-extraction | No |
 | Sentiment analysis | Lexicon-based + bias detection | No |
 | Cross-referencing | Sections/figures/tables + validation | No |
-| Lines of code | ~16,000 | ~10,000+ |
+| Word cloud | TF-IDF, frequency analysis, HTML output | No |
+| Revision tracking | LCS diff, rollback, changelog | No |
+| Document statistics | 25+ metrics, reading level, vocabulary | No |
+| Table of contents | Auto-generate, inject, numbered, HTML | No |
+| Lines of code | ~19,000 | ~10,000+ |
 
 deepworm is intentionally simple. If you need a web UI, multi-agent orchestration, or enterprise features, use gpt-researcher. If you want a research tool that just works, use deepworm.
 
@@ -586,6 +590,10 @@ deepworm is intentionally simple. If you need a web UI, multi-agent orchestratio
 - **Bibliography management** — APA, MLA, BibTeX formatting with auto-extraction
 - **Sentiment analysis** — lexicon-based sentiment, tone, and bias detection
 - **Cross-referencing** — internal section/figure/table references with validation
+- **Word cloud** — word frequency analysis, TF-IDF, HTML cloud, CSV export
+- **Revision tracking** — LCS-based diff, rollback, changelog, merge histories
+- **Document statistics** — 25+ metrics, Flesch-Kincaid/ARI reading level, vocabulary analysis
+- **Table of contents** — auto-generate from headings, inject with markers, numbered, HTML
 
 ## License
 

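Review note: the README now advertises Flesch-Kincaid/ARI reading level, and the changelog gives 238/150 WPM for reading and speaking time. The sketch below applies the standard published formulas for both indices to plain text. It is a minimal illustration, not deepworm's `reading_level()`: the function names, the returned keys, the tokenizing regexes and the syllable heuristic are all assumptions, and a real implementation would also need the markdown awareness the changelog describes.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def reading_metrics(text: str) -> dict:
    """Flesch-Kincaid grade, ARI, and time estimates for plain text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        return {"flesch_kincaid_grade": 0.0, "ari": 0.0,
                "reading_minutes": 0.0, "speaking_minutes": 0.0}

    # ARI counts letters and digits only, so apostrophes are excluded here.
    chars = sum(ch.isalnum() for w in words for ch in w)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)

    return {
        # Flesch-Kincaid Grade Level
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * (syllables / len(words)) - 15.59,
        # Automated Readability Index
        "ari": 4.71 * (chars / len(words)) + 0.5 * words_per_sentence - 21.43,
        # WPM constants taken from the changelog entry (238 reading, 150 speaking)
        "reading_minutes": len(words) / 238,
        "speaking_minutes": len(words) / 150,
    }
```

Both indices can go negative on very short, simple text such as `"The cat sat. The dog ran."`, so callers may want to clamp the result at zero before displaying it as a grade.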
deepworm/__init__.py

Lines changed: 32 additions & 1 deletion
@@ -2,7 +2,7 @@
 
 import logging
 
-__version__ = "0.9.0"
+__version__ = "1.0.0"
 
 from .annotations import AnnotationSet, AnnotationType, annotate_report, auto_annotate, extract_annotations
 from .async_api import AsyncResearcher, async_research
@@ -36,12 +36,16 @@
 from .readability import ReadabilityResult, analyze_readability
 from .references import Reference, Bibliography, extract_references, create_reference, inject_bibliography, merge_bibliographies
 from .researcher import DeepResearcher
+from .revisions import Revision, Change, RevisionDiff, RevisionHistory, compute_diff, create_revision, create_history, track_changes, merge_revisions
 from .scoring import QualityScore, score_report
 from .sentiment import SentimentScore, SentimentReport, ToneAnalysis, analyze_sentiment, analyze_tone, analyze_report_sentiment, sentiment_diff
 from .similarity import SimilarityResult, compare_texts, cosine_similarity, detect_plagiarism, find_similar
+from .statistics import TextStatistics, ComparisonResult, compute_statistics, compare_statistics, vocabulary_analysis, section_statistics, reading_level
 from .summary import Summary, extract_key_findings, extract_topics, summarize
 from .timeline import Timeline, TimelineEvent, extract_timeline, create_timeline, compare_timelines
+from .toc import TocEntry, TableOfContents, extract_toc, generate_toc, inject_toc, merge_tocs
 from .validator import ValidationResult, validate_topic
+from .wordcloud import WordFrequency, WordCloudData, generate_word_cloud, compare_word_clouds, tfidf_cloud
 
 __all__ = [
     "APIKeyError",
@@ -53,6 +57,8 @@
     "BatchStatus",
     "BatchTask",
     "Bibliography",
+    "Change",
+    "ComparisonResult",
     "ConfigError",
     "ContentExtractionError",
     "CredibilityReport",
@@ -89,16 +95,24 @@
     "ReportOutline",
     "ResearchPlan",
     "ResearchStage",
+    "Revision",
+    "RevisionDiff",
+    "RevisionHistory",
     "SearchError",
     "SentimentReport",
     "SentimentScore",
     "SessionError",
     "SimilarityResult",
     "Summary",
+    "TableOfContents",
+    "TextStatistics",
     "Timeline",
     "TimelineEvent",
+    "TocEntry",
     "ToneAnalysis",
     "ValidationResult",
+    "WordCloudData",
+    "WordFrequency",
     "__version__",
     "add_footnotes",
     "analyze_readability",
@@ -110,11 +124,17 @@
     "auto_annotate",
     "batch_export",
     "build_crossref_index",
+    "compare_statistics",
     "compare_texts",
     "compare_timelines",
+    "compare_word_clouds",
+    "compute_diff",
+    "compute_statistics",
     "cosine_similarity",
     "create_batch",
+    "create_history",
     "create_reference",
+    "create_revision",
     "create_timeline",
     "detect_plagiarism",
     "estimate_complexity",
@@ -127,32 +147,43 @@
     "extract_references",
     "extract_tags",
     "extract_timeline",
+    "extract_toc",
     "extract_topics",
     "find_similar",
     "generate_list_of_figures",
     "generate_list_of_tables",
     "generate_outline",
     "generate_plan",
+    "generate_toc",
+    "generate_word_cloud",
     "get_language",
     "inject_bibliography",
     "inject_crossrefs",
     "inject_glossary",
+    "inject_toc",
     "list_languages",
     "markdown_to_notion",
     "merge_bibliographies",
     "merge_footnotes",
+    "merge_revisions",
+    "merge_tocs",
     "outline_from_report",
+    "reading_level",
     "renumber_footnotes",
     "research",
     "research_chain",
     "run_batch",
     "score_report",
     "score_source",
     "score_sources",
+    "section_statistics",
     "sentiment_diff",
     "strip_footnotes",
     "summarize",
+    "tfidf_cloud",
+    "track_changes",
     "validate_topic",
+    "vocabulary_analysis",
 ]
 
 # Set up default logging (NullHandler to avoid "No handlers" warnings)

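Review note: this file adds 27 names to `__all__` by hand, and the entries visible in the hunks appear to be kept in ASCII order (capitalized names, then `__version__`, then lowercase names). A small consistency check can catch a name that is listed but never imported, listed twice, or inserted out of order. The helper below is a sketch under those assumptions; `check_exports` is a hypothetical name and is not part of deepworm, and the `demo` module is a stand-in so the example runs without the package installed.

```python
import types


def check_exports(module) -> list:
    """Return a list of problems found in module.__all__ (empty if none)."""
    names = list(getattr(module, "__all__", []))
    problems = []
    # Plain sorted() uses code-point order, so "Z..." < "__version__" < "a...".
    if names != sorted(names):
        problems.append("__all__ is not sorted")
    seen = set()
    for name in names:
        if name in seen:
            problems.append(f"duplicate export: {name}")
        seen.add(name)
        if not hasattr(module, name):
            problems.append(f"missing attribute: {name}")
    return problems


# Stand-in module for illustration only; these attributes are placeholders.
demo = types.ModuleType("demo")
demo.Revision = object
demo.compute_diff = lambda old, new: []
demo.__version__ = "1.0.0"
demo.__all__ = ["Revision", "__version__", "compute_diff"]

print(check_exports(demo))  # prints []
```

With the package installed, the same function could be pointed at the real module, and `len(deepworm.__all__)` compared against the 136 exports stated in the changelog, assuming that figure is counted from `__all__`.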
pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "deepworm"
-version = "0.9.0"
+version = "1.0.0"
 description = "AI-powered deep research agent. Open-source alternative to OpenAI Deep Research."
 readme = "README.md"
 license = {text = "MIT"}
