Merged
EventProcessor Load balancing
yijxie committed Aug 29, 2019
commit b4b77f9d6bb19fb2f4f7e6fa957c25fe572a4fec
Original file line number Diff line number Diff line change
@@ -3,7 +3,6 @@
# Licensed under the MIT License. See License.txt in the project root for license information.
# -----------------------------------------------------------------------------------

from typing import List, Iterable, Any, Dict
import time
import random
import math
@@ -34,8 +33,8 @@ async def claim_ownership(self):
"""Claims ownership for this EventProcessor
1. Retrieves all partition ids of an event hub from azure event hub service
2. Retrieves current ownership list via this EventProcessor's PartitionManager.
3. Searches claimable partitions for this EventProcessor. Refer to claim_ownership() for details.
4. Claims the ownership for the claimable partitions
3. Balances number of ownership. Refer to _balance_ownership() for details.
4. Claims the ownership for the balanced number of partitions.

:return: List[Dict[Any]]
"""
@@ -53,19 +52,35 @@ async def _retrieve_partition_ids(self):
self.all_parition_ids = await self.eventhub_client.get_partition_ids()

async def _balance_ownership(self):
"""Balances and claims ownership of partitions for this EventProcessor.
The balancing algorithm is:
1. Find partitions with inactive ownership and partitions that have never been claimed before
2. Find the number of active owners, including this EventProcessor, for all partitions.
3. Calculate the average count of partitions that an owner should own.
(number of partitions // number of active owners)
4. Calculate the largest allowed count of partitions that an owner can own.
math.ceil(number of partitions / number of active owners). This should be equal to or 1 greater than the average count
5. Adjust the number of partitions owned by this EventProcessor (owner)
a. if this EventProcessor owns more than largest allowed count, abandon one partition
b. if this EventProcessor owns less than average count, add one from the inactive or unclaimed partitions,
or steal one from another owner that has the largest number of ownership among all owners (EventProcessors)
c. Otherwise, no change to the ownership

The balancing algorithm adjusts one partition at a time to gradually build the balanced ownership.
Ownership must be renewed to keep it active. So the returned result includes both existing ownership and
the newly adjusted ownership.
This method balances but doesn't claim ownership. The caller of this method tries to claim the resulting ownership
list, but it may not successfully claim all of them because of concurrency: other EventProcessors may happen to
claim a partition at that time. Since balancing and claiming run repeatedly in an infinite loop,
ownership becomes balanced among all EventProcessors after some time of running.

:return: List[Dict[str, Any]], A list of ownership.
"""
ownership_list = await self.partition_manager.list_ownership(self.eventhub_client.eh_name, self.consumer_group_name)
ownership_dict = dict((x["partition_id"], x) for x in ownership_list) # put the list to dict for fast lookup
'''
now = time.time()
partition_ids_no_ownership = list(filter(lambda x: x not in ownership_dict, self.all_parition_ids))
inactive_ownership = filter(lambda x: x["last_modified_time"] + self.ownership_timeout < now, ownership_list)
claimable_partition_ids = partition_ids_no_ownership + [x["partition_id"] for x in inactive_ownership]
active_ownership = list(filter(lambda x: x["last_modified_time"] + self.ownership_timeout >= now, ownership_list))
active_ownership_count_group_by_owner = Counter([x["owner_id"] for x in active_ownership])
active_ownership_self = list(filter(lambda x: x["owner_id"] == self.owner_id, active_ownership))
'''
claimable_partition_ids = []
active_ownership_self = []

claimable_partition_ids = [] # partitions with inactive ownership and partitions that have never been claimed yet
active_ownership_self = [] # active ownership of this EventProcessor
active_ownership_count_group_by_owner = Counter()
for partition_id in self.all_parition_ids:
ownership = ownership_dict.get(partition_id)
@@ -100,7 +115,7 @@ async def _balance_ownership(self):
{"partition_id": random_partition_id,
"eventhub_name": self.eventhub_client.eh_name,
"consumer_group_name": self.consumer_group_name,
"owner_level": 0})
"owner_level": 0}) # TODO: consider removing owner_level
random_chosen_to_claim["owner_id"] = self.owner_id
to_claim.append(random_chosen_to_claim)
else: # steal from another owner that has the most count
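
The balancing steps described in the `_balance_ownership` docstring can be sketched as a pure function. This is a minimal illustration, not the code in this commit: `balance_once` and its parameters are hypothetical names, and ownership records are assumed to be dicts carrying the `partition_id`, `owner_id` and `last_modified_time` keys seen in the diff.

```python
import math
import random
from collections import Counter


def balance_once(all_partition_ids, ownership_list, owner_id, ownership_timeout, now):
    """Return the partition ids this owner should try to claim in one round.

    Hypothetical helper: it adjusts at most one partition per call, mirroring
    the "one partition at a time" rule in the docstring.
    """
    ownership = {o["partition_id"]: o for o in ownership_list}
    claimable = []               # never claimed, or ownership has expired
    mine = []                    # active partitions already owned by owner_id
    active_by_owner = Counter()  # active partition count per owner
    for pid in all_partition_ids:
        record = ownership.get(pid)
        if record is None or record["last_modified_time"] + ownership_timeout < now:
            claimable.append(pid)
        else:
            active_by_owner[record["owner_id"]] += 1
            if record["owner_id"] == owner_id:
                mine.append(pid)

    # This owner counts as active even if it owns nothing yet.
    owners = len(active_by_owner) + (0 if owner_id in active_by_owner else 1)
    average = len(all_partition_ids) // owners
    largest_allowed = math.ceil(len(all_partition_ids) / owners)

    if len(mine) > largest_allowed:
        return mine[:-1]                              # abandon one partition
    if len(mine) < average:
        if claimable:
            return mine + [random.choice(claimable)]  # take a free partition
        # Nothing is free, so every partition has an active owner:
        # steal one from the owner holding the most partitions.
        busiest, _ = active_by_owner.most_common(1)[0]
        candidates = [pid for pid in all_partition_ids
                      if ownership[pid]["owner_id"] == busiest]
        return mine + [random.choice(candidates)]
    return mine                                       # already balanced
```

Calling it repeatedly from several owners, each followed by an attempt to claim the returned list, is what gradually evens out the distribution; a single call never moves more than one partition.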
Original file line number Diff line number Diff line change
@@ -7,15 +7,13 @@
import uuid
import asyncio
import logging
from enum import Enum

from azure.eventhub import EventPosition, EventHubError
from azure.eventhub.aio import EventHubClient
from .checkpoint_manager import CheckpointManager
from .partition_manager import PartitionManager
from ._ownership_manager import OwnershipManager
from .partition_processor import PartitionProcessor, CloseReason
from .utils import get_running_loop
from .partition_processor import CloseReason, PartitionProcessor

log = logging.getLogger(__name__)

@@ -29,20 +27,21 @@ class EventProcessor(object):

It provides the user a convenient way to receive events from multiple partitions and save checkpoints.
If multiple EventProcessors are running for an event hub, they will automatically balance load.
This load balancing won't be available until preview 3.

Example:
.. code-block:: python

class MyPartitionProcessor(PartitionProcessor):
async def process_events(self, events):
if events:
# do something sync or async to process the events
await self._checkpoint_manager.update_checkpoint(events[-1].offset, events[-1].sequence_number)

import asyncio
from azure.eventhub.aio import EventHubClient
from azure.eventhub.eventprocessor import EventProcessor, PartitionProcessor, Sqlite3PartitionManager

class MyPartitionProcessor(object):
async def process_events(self, events, checkpoint_manager):
if events:
# do something sync or async to process the events
await checkpoint_manager.update_checkpoint(events[-1].offset, events[-1].sequence_number)


client = EventHubClient.from_connection_string("<your connection string>", receive_timeout=5, retry_total=3)
partition_manager = Sqlite3PartitionManager()
try:
@@ -55,7 +54,7 @@ async def process_events(self, events):

"""
def __init__(self, eventhub_client: EventHubClient, consumer_group_name: str,
partition_processor_factory,
partition_processor_factory: Callable[..., PartitionProcessor],
partition_manager: PartitionManager, **kwargs):
"""
Instantiate an EventProcessor.
@@ -73,6 +72,8 @@ def __init__(self, eventhub_client: EventHubClient, consumer_group_name: str,
:type partition_manager: Class implementing the ~azure.eventhub.eventprocessor.PartitionManager.
:param initial_event_position: The offset to start a partition consumer if the partition has no checkpoint yet.
:type initial_event_position: int or str
:param polling_interval: The interval, in seconds, between two consecutive rounds of balancing and claiming ownership
:type polling_interval: float

"""

@@ -98,9 +99,12 @@ def __repr__(self):
async def start(self):
"""Start the EventProcessor.

1. retrieve the partition ids from eventhubs.
2. claim partition ownership of these partitions.
3. repeatedly call EventHubConsumer.receive() to retrieve events and call user defined PartitionProcessor.process_events().
1. Calls the OwnershipManager to keep claiming and balancing ownership of partitions in an
infinite loop until self.stop() is called.
2. Cancels tasks for partitions that are no longer owned by this EventProcessor
3. Creates tasks for partitions that are newly claimed by this EventProcessor
4. Keeps tasks running for partitions that haven't changed ownership
5. Each task repeatedly calls EventHubConsumer.receive() to retrieve events and calls the user-defined partition processor

:return: None

@@ -111,22 +115,23 @@ async def start(self):
self._running = True
while self._running:
claimed_ownership_list = await ownership_manager.claim_ownership()
claimed_partition_ids = [x["partition_id"] for x in claimed_ownership_list]
to_cancel_list = self._tasks.keys() - claimed_partition_ids
if to_cancel_list:
self._cancel_tasks_for_partitions(to_cancel_list)
log.info("EventProcessor %r has cancelled partitions %r", self._id, to_cancel_list)

if claimed_partition_ids:
if claimed_ownership_list:
claimed_partition_ids = [x["partition_id"] for x in claimed_ownership_list]
to_cancel_list = self._tasks.keys() - claimed_partition_ids
self._create_tasks_for_claimed_ownership(claimed_ownership_list)
else:
log.warning("EventProcessor %r hasn't claimed an ownership. It keeps claiming.", self._id)
to_cancel_list = self._tasks.keys()
if to_cancel_list:
self._cancel_tasks_for_partitions(to_cancel_list)
log.info("EventProcessor %r has cancelled partitions %r", self._id, to_cancel_list)
await asyncio.sleep(self._polling_interval)

async def stop(self):
"""Stop all the partition consumer
"""Stop claiming ownership and all the partition consumers owned by this EventProcessor

This method cancels tasks that are running EventHubConsumer.receive() for the partitions owned by this EventProcessor.
This method stops claiming ownership of owned partitions and cancels tasks that are running
EventHubConsumer.receive() for the partitions owned by this EventProcessor.

:return: None

@@ -152,6 +157,13 @@ def _create_tasks_for_claimed_ownership(self, to_claim_ownership_list):

async def _receive(self, ownership):
log.info("start ownership, %r", ownership)
partition_processor = self._partition_processor_factory()
if not hasattr(partition_processor, "process_events"):
log.error(
"Fatal error: a partition processor should at least have method process_events(events, checkpoint_manager). EventProcessor will stop.")
await self.stop()
raise TypeError("Partition processor must have method process_events(events, checkpoint_manager)")

partition_consumer = self._eventhub_client.create_consumer(ownership["consumer_group_name"],
ownership["partition_id"],
EventPosition(ownership.get("offset", self._initial_event_position))
@@ -161,8 +173,6 @@ async def _receive(self, ownership):
ownership["consumer_group_name"],
ownership["owner_id"],
self._partition_manager)
partition_processor = self._partition_processor_factory()

async def initialize():
if hasattr(partition_processor, "initialize"):
await partition_processor.initialize(checkpoint_manager)
@@ -192,6 +202,7 @@ async def close(close_reason):
)
await process_error(cancelled_error)
await close(CloseReason.SHUTDOWN)
# TODO: release the ownership immediately via partition manager
break
except EventHubError as eh_err:
reason = CloseReason.LEASE_LOST if eh_err.error == "link:stolen" else CloseReason.EVENTHUB_EXCEPTION
@@ -205,11 +216,10 @@ async def close(close_reason):
eh_err
)
await process_error(eh_err)
await close(reason)
await close(reason) # An EventProcessor will pick up this partition again after the ownership is released
# TODO: release the ownership immediately via partition manager
break
except Exception as exp:
log.warning(exp)
# TODO: will review whether to break and close partition processor after user's code has an exception
# TODO: try to inform other EventProcessors to take the partition when this partition is closed in preview 3?
finally:
await partition_consumer.close()
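
The task bookkeeping that `start()` performs on every polling round (cancel tasks for partitions that were lost, create tasks for newly claimed ones, leave the rest running) can be illustrated in isolation. The sketch below is an assumption-laden stand-in, not the EventProcessor code: `reconcile_tasks` and `_receive_forever` are hypothetical names, and the receive loop is replaced by a sleep.

```python
import asyncio


async def _receive_forever(partition_id):
    # Hypothetical stand-in for the per-partition receive loop.
    while True:
        await asyncio.sleep(0.01)


def reconcile_tasks(tasks, claimed_partition_ids):
    """Align running tasks with the partitions claimed in this round.

    `tasks` maps partition id -> asyncio.Task and is updated in place.
    Must be called from inside a running event loop.
    Returns (cancelled_ids, created_ids) as sorted lists.
    """
    claimed = set(claimed_partition_ids)
    running = set(tasks)
    to_cancel = running - claimed   # ownership lost or abandoned
    to_create = claimed - running   # newly claimed partitions
    for pid in to_cancel:
        tasks.pop(pid).cancel()
    loop = asyncio.get_running_loop()
    for pid in to_create:
        tasks[pid] = loop.create_task(_receive_forever(pid))
    return sorted(to_cancel), sorted(to_create)


async def demo():
    tasks = {}
    first = reconcile_tasks(tasks, ["0", "1"])   # claim two partitions
    second = reconcile_tasks(tasks, ["1", "2"])  # lose "0", gain "2"
    third = reconcile_tasks(tasks, [])           # nothing claimed: cancel all
    await asyncio.sleep(0)                       # let cancellations run
    return first, second, third, tasks
```

Running `asyncio.run(demo())` walks through the three cases the loop has to handle. Passing an empty claim list cancels everything, which corresponds to the `else` branch in the diff where `to_cancel_list` is set to all current task keys.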
Original file line number Diff line number Diff line change
@@ -23,7 +23,7 @@ class PartitionProcessor(Protocol):
implementing this abstract class will be created for every partition the associated ~azure.eventhub.eventprocessor.EventProcessor owns.

"""
async def initialize(self, checkpoint_manager: CheckpointManager):
async def initialize(self):
pass

async def close(self, reason, checkpoint_manager: CheckpointManager):
Expand All @@ -45,7 +45,7 @@ async def process_events(self, events: List[EventData], checkpoint_manager: Chec
:type events: list[~azure.eventhub.common.EventData]

"""
pass
raise NotImplementedError

async def process_error(self, error, checkpoint_manager: CheckpointManager):
"""Called when an error happens
Original file line number Diff line number Diff line change
@@ -121,9 +121,16 @@ async def update_checkpoint(self, eventhub_name, consumer_group_name, partition_
offset, sequence_number):
cursor = self.conn.cursor()
try:
cursor.execute("update " + _check_table_name(self.ownership_table) + " set offset=?, sequence_number=? where eventhub_name=? and consumer_group_name=? and partition_id=?",
(offset, sequence_number, eventhub_name, consumer_group_name, partition_id))
self.conn.commit()
cursor.execute("select owner_id from " + _check_table_name(self.ownership_table) + " where eventhub_name=? and consumer_group_name=? and partition_id=?",
(eventhub_name, consumer_group_name, partition_id))
cursor_fetch = cursor.fetchall()
if cursor_fetch and owner_id == cursor_fetch[0][0]:
cursor.execute("update " + _check_table_name(self.ownership_table) + " set offset=?, sequence_number=? where eventhub_name=? and consumer_group_name=? and partition_id=?",
(offset, sequence_number, eventhub_name, consumer_group_name, partition_id))
self.conn.commit()
else:
logger.info("EventProcessor couldn't checkpoint to partition %r because it no longer has the ownership", partition_id)

finally:
cursor.close()
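
The ownership check added to `update_checkpoint` runs a `select` followed by an `update`. One possible variation, shown here only as a sketch and not as what this commit does, folds the owner check into the `where` clause so that the check and the write happen in a single statement. The table name `ownership`, its schema and the helper `make_demo_db` are assumptions made for illustration.

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory table loosely modelled on the columns used in the diff.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table ownership ("
        "eventhub_name text, consumer_group_name text, partition_id text, "
        "owner_id text, offset text, sequence_number integer, "
        "primary key (eventhub_name, consumer_group_name, partition_id))")
    conn.execute("insert into ownership values ('hub', '$default', '0', 'owner-a', null, null)")
    conn.commit()
    return conn


def update_checkpoint(conn, eventhub_name, consumer_group_name, partition_id,
                      owner_id, offset, sequence_number):
    """Write a checkpoint only if owner_id still owns the partition.

    Returns True when a row was updated, False when ownership was lost.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(
            "update ownership set offset=?, sequence_number=? "
            "where eventhub_name=? and consumer_group_name=? "
            "and partition_id=? and owner_id=?",
            (offset, sequence_number, eventhub_name, consumer_group_name,
             partition_id, owner_id))
        conn.commit()
        return cursor.rowcount == 1  # 0 rows means another owner holds the partition
    finally:
        cursor.close()
```

With the demo table, `update_checkpoint(conn, "hub", "$default", "0", "owner-a", "100", 7)` succeeds, while the same call with `"owner-b"` leaves the row untouched and returns `False`, which is the point at which the code in the diff logs that the EventProcessor no longer has the ownership.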

Original file line number Diff line number Diff line change
@@ -3,7 +3,6 @@
import os
from azure.eventhub.aio import EventHubClient
from azure.eventhub.eventprocessor import EventProcessor
from azure.eventhub.eventprocessor import PartitionProcessor
from azure.eventhub.eventprocessor import Sqlite3PartitionManager

RECEIVE_TIMEOUT = 5 # timeout in seconds for a receiving operation. 0 or None means no timeout
@@ -18,33 +17,23 @@ async def do_operation(event):
print(event)


class MyPartitionProcessor(PartitionProcessor):
def __init__(self, checkpoint_manager):
super(MyPartitionProcessor, self).__init__(checkpoint_manager)

async def process_events(self, events):
class MyPartitionProcessor(object):
async def process_events(self, events, checkpoint_manager):
if events:
await asyncio.gather(*[do_operation(event) for event in events])
await self._checkpoint_manager.update_checkpoint(events[-1].offset, events[-1].sequence_number)


def partition_processor_factory(checkpoint_manager):
return MyPartitionProcessor(checkpoint_manager)


async def run_awhile(duration):
client = EventHubClient.from_connection_string(CONNECTION_STR, receive_timeout=RECEIVE_TIMEOUT,
retry_total=RETRY_TOTAL)
partition_manager = Sqlite3PartitionManager()
event_processor = EventProcessor(client, "$default", MyPartitionProcessor, partition_manager)
try:
asyncio.ensure_future(event_processor.start())
await asyncio.sleep(duration)
await event_processor.stop()
finally:
await partition_manager.close()
await checkpoint_manager.update_checkpoint(events[-1].offset, events[-1].sequence_number)
else:
print("empty events received", "partition:", checkpoint_manager.partition_id)


if __name__ == '__main__':
loop = asyncio.get_event_loop()
loop.run_until_complete(run_awhile(60))
client = EventHubClient.from_connection_string(CONNECTION_STR, receive_timeout=RECEIVE_TIMEOUT, retry_total=RETRY_TOTAL)
partition_manager = Sqlite3PartitionManager(db_filename="eventprocessor_test_db")
event_processor = EventProcessor(client, "$default", MyPartitionProcessor, partition_manager, polling_interval=1)
try:
loop.run_until_complete(event_processor.start())
except KeyboardInterrupt:
loop.run_until_complete(event_processor.stop())
finally:
loop.stop()