feat(ingest): SageMaker feature store ingestion #2758
Merged
shirshanka merged 28 commits into datahub-project:master from kevinhu:sagemaker-features on Jun 30, 2021
Commits
ec8a385 Create common AWS config (kevinhu)
d3bf612 Init sagemaker (kevinhu)
3db0a58 Common AWS dependencies (kevinhu)
d282559 Get features in feature group (kevinhu)
8f455c9 Ingest feature groups (kevinhu)
2bfb882 Add example ingestion config (kevinhu)
5bff9f4 Fix feature ingestion (kevinhu)
44ecb58 Append Glue data catalog source (kevinhu)
d660a9b Handle primary key ingestion (kevinhu)
0259845 Init tests and stubs (kevinhu)
cd4d233 Add sagemaker golden (kevinhu)
4ff8434 Clean up golden (kevinhu)
8971109 Add descriptions and filter primary keys (kevinhu)
9133c85 Include custom fields in feature tables (kevinhu)
777f7df Add sagemaker custom properties (kevinhu)
3722726 Merge (kevinhu)
149584a Cleanup (kevinhu)
fb70c0b Fix old references (kevinhu)
1c248c3 Add test stub with offline store (kevinhu)
3a4012e Update custom properties (kevinhu)
3b575b1 Merge (kevinhu)
ffcd8cc Merge branch 'master' of github.com:kevinhu/datahub into sagemaker-fe… (kevinhu)
768393e Refactor (kevinhu)
4bc4601 Merge branch 'master' of github.com:kevinhu/datahub into sagemaker-fe… (kevinhu)
63841e4 Update comments (kevinhu)
30564cc Merge branch 'master' of github.com:kevinhu/datahub into sagemaker-fe… (kevinhu)
0bbe932 Merge (kevinhu)
8f96239 Fix imports order (kevinhu)
10 changes: 10 additions & 0 deletions
metadata-ingestion/examples/recipes/sagemaker_to_datahub.yml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,10 @@` |
| | | `# in this example, AWS creds are detected automatically – see the README for more details` |
| | | `source:` |
| | | `  type: sagemaker` |
| | | `  config:` |
| | | `    aws_region: "us-west-2"` |
| | | |
| | | `sink:` |
| | | `  type: "datahub-rest"` |
| | | `  config:` |
| | | `    server: "http://localhost:8080"` |
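
For readers who drive ingestion from Python rather than from a YAML file, the recipe above can be sketched as a plain dictionary. This is a minimal sketch, not part of the PR: `build_recipe` and `run_recipe` are hypothetical helper names, and `run_recipe` assumes that DataHub's `Pipeline.create`, `run`, and `raise_from_status` entry points are available in the installed version.

```python
from typing import Any, Dict


def build_recipe(
    aws_region: str = "us-west-2",
    server: str = "http://localhost:8080",
) -> Dict[str, Any]:
    """Return the example recipe as a dict (hypothetical helper, not in the PR)."""
    return {
        # No credentials are listed, so AWS credentials are detected automatically.
        "source": {"type": "sagemaker", "config": {"aws_region": aws_region}},
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run a recipe dict; assumes the DataHub ingestion package is installed."""
    # Imported lazily so the rest of this sketch works without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

Calling `run_recipe(build_recipe())` would be roughly equivalent to running the YAML recipe from the command line, under the assumptions stated above.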
87 changes: 87 additions & 0 deletions
metadata-ingestion/src/datahub/ingestion/source/aws_common.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,87 @@` |
| | | `from functools import reduce` |
| | | `from typing import List, Optional, Union` |
| | | |
| | | `import boto3` |
| | | |
| | | `from datahub.configuration import ConfigModel` |
| | | `from datahub.configuration.common import AllowDenyPattern` |
| | | |
| | | |
| | | `def assume_role(` |
| | | `    role_arn: str, aws_region: str, credentials: Optional[dict] = None` |
| | | `) -> dict:` |
| | | `    credentials = credentials or {}` |
| | | `    sts_client = boto3.client(` |
| | | `        "sts",` |
| | | `        region_name=aws_region,` |
| | | `        aws_access_key_id=credentials.get("AccessKeyId"),` |
| | | `        aws_secret_access_key=credentials.get("SecretAccessKey"),` |
| | | `        aws_session_token=credentials.get("SessionToken"),` |
| | | `    )` |
| | | |
| | | `    assumed_role_object = sts_client.assume_role(` |
| | | `        RoleArn=role_arn, RoleSessionName="DatahubIngestionSource"` |
| | | `    )` |
| | | `    return assumed_role_object["Credentials"]` |
| | | |
| | | |
| | | `class AwsSourceConfig(ConfigModel):` |
| | | `    """` |
| | | `    Common AWS credentials config.` |
| | | |
| | | `    Currently used by:` |
| | | `        - Glue source` |
| | | `        - SageMaker source` |
| | | `    """` |
| | | |
| | | `    env: str = "PROD"` |
| | | |
| | | `    database_pattern: AllowDenyPattern = AllowDenyPattern.allow_all()` |
| | | `    table_pattern: AllowDenyPattern = AllowDenyPattern.allow_all()` |
| | | |
| | | `    aws_access_key_id: Optional[str] = None` |
| | | `    aws_secret_access_key: Optional[str] = None` |
| | | `    aws_session_token: Optional[str] = None` |
| | | `    aws_role: Optional[Union[str, List[str]]] = None` |
| | | `    aws_region: str` |
| | | |
| | | `    def get_client(self, service: str) -> boto3.client:` |
| | | `        if (` |
| | | `            self.aws_access_key_id` |
| | | `            and self.aws_secret_access_key` |
| | | `            and self.aws_session_token` |
| | | `        ):` |
| | | `            return boto3.client(` |
| | | `                service,` |
| | | `                aws_access_key_id=self.aws_access_key_id,` |
| | | `                aws_secret_access_key=self.aws_secret_access_key,` |
| | | `                aws_session_token=self.aws_session_token,` |
| | | `                region_name=self.aws_region,` |
| | | `            )` |
| | | `        elif self.aws_access_key_id and self.aws_secret_access_key:` |
| | | `            return boto3.client(` |
| | | `                service,` |
| | | `                aws_access_key_id=self.aws_access_key_id,` |
| | | `                aws_secret_access_key=self.aws_secret_access_key,` |
| | | `                region_name=self.aws_region,` |
| | | `            )` |
| | | `        elif self.aws_role:` |
| | | `            if isinstance(self.aws_role, str):` |
| | | `                credentials = assume_role(self.aws_role, self.aws_region)` |
| | | `            else:` |
| | | `                credentials = reduce(` |
| | | `                    lambda new_credentials, role_arn: assume_role(` |
| | | `                        role_arn, self.aws_region, new_credentials` |
| | | `                    ),` |
| | | `                    self.aws_role,` |
| | | `                    {},` |
| | | `                )` |
| | | `            return boto3.client(` |
| | | `                service,` |
| | | `                aws_access_key_id=credentials["AccessKeyId"],` |
| | | `                aws_secret_access_key=credentials["SecretAccessKey"],` |
| | | `                aws_session_token=credentials["SessionToken"],` |
| | | `                region_name=self.aws_region,` |
| | | `            )` |
| | | `        else:` |
| | | `            return boto3.client(service, region_name=self.aws_region)` |
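
The order in which `get_client` picks credentials, and the way a list of roles is folded with `reduce`, may be easier to follow without any AWS calls. The following is a minimal sketch under stated assumptions: `pick_auth_mode`, `chain_roles`, and `fake_assume` are hypothetical names introduced only for illustration, and `fake_assume` stands in for the real STS call made by `assume_role`.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional, Union

Credentials = Dict[str, str]


def pick_auth_mode(
    access_key_id: Optional[str] = None,
    secret_access_key: Optional[str] = None,
    session_token: Optional[str] = None,
    role: Optional[Union[str, List[str]]] = None,
) -> str:
    """Mirror the branch order of get_client (hypothetical helper)."""
    if access_key_id and secret_access_key and session_token:
        return "session-credentials"
    if access_key_id and secret_access_key:
        return "static-credentials"
    if role:
        return "assume-role"
    # Nothing usable was configured: fall back to boto3's default lookup.
    return "default-chain"


def chain_roles(
    roles: Union[str, List[str]],
    assume: Callable[[str, Credentials], Credentials],
) -> Credentials:
    """Assume each role in turn, using the credentials from the previous hop."""
    role_list = [roles] if isinstance(roles, str) else list(roles)
    return reduce(lambda creds, arn: assume(arn, creds), role_list, {})


def fake_assume(role_arn: str, credentials: Credentials) -> Credentials:
    """Stand-in for STS: records which credentials were used for this hop."""
    used = credentials.get("AccessKeyId", "ambient")
    return {
        "AccessKeyId": f"key-for-{role_arn}",
        "SecretAccessKey": "secret",
        "SessionToken": f"{used}->{role_arn}",
    }
```

Two consequences of this ordering are worth noting. Explicit keys take precedence over `aws_role`, so a configured role is ignored when both keys are set. A lone `aws_access_key_id` without a secret matches none of the first three branches and falls through to the default lookup.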