feat: add honeycomb-news-sentiment skill #2

mebishnusahu0595 wants to merge 3 commits into aden-hive:main
Conversation
No actionable comments were generated in the recent review. 🎉
✅ Files skipped from review due to trivial changes (1)
📝 Walkthrough

Adds a new skill, honeycomb-news-sentiment, which fetches Google News RSS for a job ticker and computes an AI/automation sentiment score.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User/CLI
    participant Script as sentiment.py
    participant News as Google News RSS
    participant Parser as XML Parser
    participant Analyzer as Sentiment Analyzer
    participant Output as JSON Output
    User->>Script: provide ticker
    Script->>Script: map ticker → job_title
    Script->>News: GET "{job_title} AI automation" (HTTP)
    News-->>Script: RSS XML response
    Script->>Parser: parse RSS items (title, description)
    Parser-->>Script: article list
    loop per article (up to 10)
        Script->>Analyzer: analyze_article(title, description, job_title)
        Analyzer->>Analyzer: compute VADER score + rule-based matches
        Analyzer-->>Script: article sentiment
    end
    Script->>Script: aggregate & clamp overall sentiment [-1,1]
    Script->>Output: emit JSON result
    Output-->>User: printed JSON
```
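The final step in the diagram clamps the aggregate score into the documented [-1, 1] range. A minimal sketch of that clamping (the helper name is hypothetical, not taken from the script):

```python
def clamp(score, lo=-1.0, hi=1.0):
    # Keep the aggregate sentiment inside the documented [-1, 1] range.
    return max(lo, min(hi, score))

print(clamp(1.7))   # capped at 1.0
print(clamp(-2.3))  # capped at -1.0
print(clamp(0.25))  # in-range values pass through unchanged
```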
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Pull request overview
Adds a new honeycomb-news-sentiment skill to the HoneyComb Exchange agent-skill set, intended to fetch Google News RSS results for a job ticker and compute an AI/automation sentiment score.
Changes:
- Introduces honeycomb-news-sentiment skill documentation (SKILL.md) with usage, output format, and ticker mapping.
- Adds a Python script that queries Google News RSS and scores articles using VADER + custom keyword/context rules.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| honeycomb-news-sentiment/SKILL.md | Adds skill definition + usage/docs for news-based AI automation sentiment scoring |
| honeycomb-news-sentiment/scripts/sentiment.py | Implements RSS fetching and per-article + aggregate sentiment scoring |
```markdown
- `CPA` -> Accountant
- `HR` -> Human Resources
- `NURSE` -> Nurse
- `TEACHER` -> Teacher
```
The documented ticker mapping list doesn’t match the mapping implemented in get_job_title() (the script also supports TRADER and LAWYER). Please keep the SKILL.md mapping section in sync with the script to avoid users thinking some tickers are unsupported.
Suggested change:

```diff
 - `TEACHER` -> Teacher
+- `TRADER` -> Trader
+- `LAWYER` -> Lawyer
```
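For illustration, a `get_job_title()` consistent with the full documented mapping might look like the sketch below. The dictionary contents are assumed from the PR discussion (`SWE` -> Software Engineer is inferred from the usage examples), and the normalization follows the review comments:

```python
# Hypothetical mapping table; the real one lives in scripts/sentiment.py.
JOB_TITLES = {
    "SWE": "Software Engineer",
    "CPA": "Accountant",
    "HR": "Human Resources",
    "NURSE": "Nurse",
    "TEACHER": "Teacher",
    "TRADER": "Trader",
    "LAWYER": "Lawyer",
}

def get_job_title(ticker):
    # Normalize: drop a leading "$" and uppercase, as described in the review.
    return JOB_TITLES.get(ticker.lstrip("$").upper())
```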
```python
def analyze_article(title, description, job_title):
    analyzer = SentimentIntensityAnalyzer()

    # 1. Provide VADER sentiment base (General NLP scoring)
    title_vader = analyzer.polarity_scores(title)['compound']
    desc_vader = analyzer.polarity_scores(description)['compound']
```
analyze_article() constructs a new SentimentIntensityAnalyzer() for every article. This is relatively expensive and can dominate runtime when analyzing multiple items. Instantiate the analyzer once (e.g., in main() or as a module-level singleton) and pass/reuse it for each article.
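One way to apply this suggestion, sketched with a stub standing in for `SentimentIntensityAnalyzer` so it runs without vaderSentiment installed (the real class would be cached the same way):

```python
from functools import lru_cache

class StubAnalyzer:
    """Stand-in for SentimentIntensityAnalyzer, to keep the sketch dependency-free."""
    constructed = 0

    def __init__(self):
        StubAnalyzer.constructed += 1

    def polarity_scores(self, text):
        return {"compound": 0.0}

@lru_cache(maxsize=1)
def get_analyzer():
    # Built on first use, then reused for every article.
    return StubAnalyzer()

def analyze_article(title, description):
    analyzer = get_analyzer()
    return analyzer.polarity_scores(f"{title} {description}")["compound"]

for _ in range(10):
    analyze_article("title", "description")
print(StubAnalyzer.constructed)  # 1 — only one analyzer is ever built
```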
```python
words = set(re.findall(r'\b\w+\b', text))

# We use exact phrase matching for list items that have multiple words, and word matching for single words.
for phrase in automation_negative + context['negative']:
    if phrase in text:
        # title count twice, description once (since title is duplicated in text string)
        occurrences = text.count(phrase)
        custom_score -= (0.15 * occurrences)

for phrase in automation_positive + context['positive']:
    if phrase in text:
        occurrences = text.count(phrase)
        custom_score += (0.15 * occurrences)
```
The comment says you do “word matching for single words”, but the code uses `if phrase in text` / `text.count(phrase)` for all entries. This causes false positives for single-word keywords (substring matches, e.g. matching `risk` inside `brisk`) and also leaves `words = ...` unused. Either implement token-based checks for single-word entries using `words` (with phrase matching for multi-word entries), or remove the unused tokenization and correct the comment to reflect substring matching.
Suggested change:

```diff
-words = set(re.findall(r'\b\w+\b', text))
-# We use exact phrase matching for list items that have multiple words, and word matching for single words.
-for phrase in automation_negative + context['negative']:
-    if phrase in text:
-        # title count twice, description once (since title is duplicated in text string)
-        occurrences = text.count(phrase)
-        custom_score -= (0.15 * occurrences)
-for phrase in automation_positive + context['positive']:
-    if phrase in text:
-        occurrences = text.count(phrase)
-        custom_score += (0.15 * occurrences)
+tokens = re.findall(r'\b\w+\b', text)
+words = set(tokens)
+# We use exact phrase matching for list items that have multiple words, and token-based word matching for single words.
+for phrase in automation_negative + context['negative']:
+    phrase_l = phrase.lower()
+    if ' ' in phrase_l:
+        # Multi-word: exact phrase matching in the full text
+        if phrase_l in text:
+            # title counted twice, description once (since title is duplicated in text string)
+            occurrences = text.count(phrase_l)
+            custom_score -= (0.15 * occurrences)
+    else:
+        # Single word: token-based matching to avoid substring matches (e.g., "risk" in "brisk")
+        if phrase_l in words:
+            occurrences = sum(1 for w in tokens if w == phrase_l)
+            custom_score -= (0.15 * occurrences)
+for phrase in automation_positive + context['positive']:
+    phrase_l = phrase.lower()
+    if ' ' in phrase_l:
+        if phrase_l in text:
+            occurrences = text.count(phrase_l)
+            custom_score += (0.15 * occurrences)
+    else:
+        if phrase_l in words:
+            occurrences = sum(1 for w in tokens if w == phrase_l)
+            custom_score += (0.15 * occurrences)
```
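The substring false positive is easy to demonstrate in isolation (hypothetical input text):

```python
import re

text = "markets saw a brisk recovery"

# Substring matching flags "risk" even though the word never appears.
print("risk" in text)  # True — false positive via "brisk"

# Token-based matching avoids it.
tokens = set(re.findall(r'\b\w+\b', text))
print("risk" in tokens)  # False
```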
```python
try:
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(req) as response:
        xml_data = response.read()
```
urllib.request.urlopen(req) is called without a timeout, so this script can hang indefinitely if Google News is slow or the network stalls. Pass an explicit timeout (and consider surfacing a clearer error message for timeouts) to make behavior more reliable in automation contexts.
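A sketch of the suggested fix. The helper name and the 10-second default are assumptions, not from the PR:

```python
import socket
import urllib.error
import urllib.request

def fetch_rss(url, timeout=10):
    """Fetch a feed, failing fast instead of hanging on a stalled connection."""
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.read()
    except socket.timeout:
        raise RuntimeError(f"timed out after {timeout}s fetching {url}")
    except urllib.error.URLError as e:
        raise RuntimeError(f"failed to fetch {url}: {e.reason}")
```

Callers can then catch `RuntimeError` and emit the same JSON error object the script already prints.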
```python
ticker = sys.argv[1]
job_title = get_job_title(ticker)
```
The script normalizes the ticker inside get_job_title() (removes $ and uppercases), but the JSON output returns the raw CLI input (ticker = sys.argv[1]). If a user passes $SWE, the output will contain $SWE which doesn’t match the documented examples and makes downstream consumption inconsistent. Consider normalizing the ticker once up-front and using the normalized value for both lookup and output.
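A sketch of normalizing once up-front (the helper name is hypothetical):

```python
def normalize_ticker(raw):
    # "$swe", " SWE ", and "SWE" all normalize to "SWE",
    # so lookup and JSON output agree.
    return raw.strip().lstrip("$").upper()

print(normalize_ticker("$swe"))  # SWE
```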
Actionable comments posted: 4
🧹 Nitpick comments (2)
honeycomb-news-sentiment/scripts/sentiment.py (2)
31-32: Reuse a single VADER analyzer instance

`SentimentIntensityAnalyzer` is created for each article. Reusing one instance per run is cheaper and keeps behavior identical.

♻️ Proposed refactor

```diff
-def analyze_article(title, description, job_title):
-    analyzer = SentimentIntensityAnalyzer()
+def analyze_article(title, description, job_title, analyzer):
@@
-    articles = []
+    analyzer = SentimentIntensityAnalyzer()
+    articles = []
@@
-            score = analyze_article(title, desc, job_title)
+            score = analyze_article(title, desc, job_title, analyzer)
```

Also applies to: 131-131
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@honeycomb-news-sentiment/scripts/sentiment.py` around lines 31 - 32, The code creates a new SentimentIntensityAnalyzer for every call to analyze_article (and at another call site around line 131); instead, initialize a single SentimentIntensityAnalyzer once and reuse it: add a module-level analyzer instance (or accept an analyzer parameter) and update analyze_article (and the other invocation) to use that shared SentimentIntensityAnalyzer instead of constructing a new one each call (refer to SentimentIntensityAnalyzer and analyze_article to locate edits).
150-152: Narrow exception handling for clearer failure modes

Line 150 catches everything as `Exception`, which obscures parse/network-specific failures and weakens diagnostics.

♻️ Proposed refactor

```diff
-    except Exception as e:
+    except (urllib.error.URLError, ET.ParseError, ValueError) as e:
         print(json.dumps({"error": str(e)}))
         sys.exit(1)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@honeycomb-news-sentiment/scripts/sentiment.py` around lines 150 - 152, The catch-all except Exception in sentiment.py should be narrowed to handle specific failure modes: replace the generic except in the try/except around JSON parsing/network calls with targeted handlers such as json.JSONDecodeError for invalid JSON, requests.exceptions.RequestException (or the HTTP client used) for network errors, and ValueError/TypeError where applicable, logging each error via print(json.dumps({...})) and exiting with sys.exit(1); for anything unexpected re-raise the exception so it surfaces instead of being swallowed by a generic handler.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@honeycomb-news-sentiment/scripts/sentiment.py`:
- Around line 128-129: The code accesses .text on elements that may exist but
have None text, which can break sentiment scoring; update the title and desc
extraction to safely default to empty strings by using a short-circuit on the
.text value (e.g., replace the current expressions with something like: let elem
= item.find('title'); title = (elem is not None and (elem.text or "")) or "" and
similarly for description) so both the missing element and None .text cases
return "". Apply this change where title and desc are set in sentiment.py so
downstream sentiment functions receive strings.
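The `None`-text failure mode and its fix can be shown with a minimal hypothetical RSS item:

```python
import xml.etree.ElementTree as ET

item = ET.fromstring("<item><title/></item>")

elem = item.find('title')
print(elem is not None)  # True — the element exists
print(elem.text)         # None — but it carries no text

# Defaulting via `or ""` keeps downstream scoring code working on strings.
title = (elem.text or "") if elem is not None else ""
print(repr(title))       # ''
```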
- Around line 115-117: The outbound RSS fetch currently calls
urllib.request.urlopen(req) without a timeout which can hang; update the call to
include a sensible timeout (e.g., urlopen(req, timeout=...), not a blocking
default), wrap the urlopen/read block (the req, response, xml_data logic) in a
try/except that catches urllib.error.URLError and socket.timeout (or generic
Exception) to log the failure and fail fast or skip the feed, and ensure any
resources are closed/handled the same way.
- Line 7: Replace the unsafe xml.etree.ElementTree usage with defusedxml to
prevent XXE and XML bomb attacks: change the import from "from xml.etree import
ElementTree as ET" to use "defusedxml.ElementTree" and update any calls that
parse remote XML (e.g., ET.parse, ET.fromstring, ET.fromstringlist or any use of
ET) in this module (sentiment.py) to use the defusedxml.ElementTree API so
remote feed parsing is hardened; ensure all places that reference ET (including
the function that handles Google News feed parsing) import and call the defused
implementation instead of the stdlib ElementTree.
In `@honeycomb-news-sentiment/SKILL.md`:
- Around line 35-42: Update the SKILL.md JSON example and mapping section to
match scripts/sentiment.py: add the "article_scores" field to the example output
object (so the snippet includes "article_scores" alongside ticker, job_title,
query, articles_analyzed, sentiment_score) and add the missing role entries
"TRADER" and "LAWYER" in the mappings section to reflect the same roles used by
scripts/sentiment.py; ensure key names exactly match the script (article_scores,
sentiment_score, TRADER, LAWYER) and adjust surrounding prose/example
descriptions accordingly.
ℹ️ Review info

⚙️ Run configuration: defaults · Review profile: CHILL · Plan: Pro
Run ID: ec8b01af-1dcc-44e1-963b-c88b2ee88376

📒 Files selected for processing (2)
- honeycomb-news-sentiment/SKILL.md
- honeycomb-news-sentiment/scripts/sentiment.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Closes #1
What this adds

New skill: honeycomb-news-sentiment

Fetches real-time Google News RSS for any HoneyComb job ticker and returns an AI automation sentiment score.

Usage

```shell
./scripts/sentiment.py SWE
```

Output

```json
{
  "ticker": "SWE",
  "sentiment_score": -0.002,
  "articles_analyzed": 10,
  "article_scores": [...]
}
```
Tested on