Commit dad727c

MetaHIN: Add README
1 parent ba7fedb commit dad727c

5 files changed, 415 additions and 8 deletions

Meta-Learning/MetaHIN/README.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# MetaHIN: Meta-Learning on Heterogeneous Information Networks for Cold-start Recommendation
+
+This is a PyTorch implementation of the paper "[Meta-learning on Heterogeneous Information Networks for Cold-start Recommendation](https://yuanfulu.github.io/publication/KDD-MetaHIN.pdf)", adapted from the [original codebase](https://github.com/rootlu/MetaHIN).
+MetaHIN exploits meta-learning on Heterogeneous Information Networks (HINs) for cold-start recommendation, alleviating the cold-start problem at both the data and model levels.
+It leverages multi-faceted semantic contexts and a co-adaptation meta-learner to learn finer-grained semantic priors for new tasks in both semantic-wise and task-wise manners.
+
+## Scripts
+* [data_helper.py](https://github.com/khanhnamle1994/MetaRec/blob/master/Meta-Learning/MetaHIN/data_helper.py): This is the data loader script.
+* [data_processor.py](https://github.com/khanhnamle1994/MetaRec/blob/master/Meta-Learning/MetaHIN/data_processor.py): This is the data processor script.
+* [config.py](https://github.com/khanhnamle1994/MetaRec/blob/master/Meta-Learning/MetaHIN/config.py): This is the configuration script that includes hyper-parameters used to train MetaHIN.
+* [embedding_init.py](https://github.com/khanhnamle1994/MetaRec/blob/master/Meta-Learning/MetaHIN/embedding_init.py): This is the embedding script that converts user and item input features into user and item embeddings.
+* [metaHIN.py](https://github.com/khanhnamle1994/MetaRec/blob/master/Meta-Learning/MetaHIN/metaHIN.py): This is the model script that defines MetaHIN.
+* [meta_learner.py](https://github.com/khanhnamle1994/MetaRec/blob/master/Meta-Learning/MetaHIN/meta_learner.py): This is the training script that trains MetaHIN by updating the parameters in a meta-learning paradigm.
+* [evaluation.py](https://github.com/khanhnamle1994/MetaRec/blob/master/Meta-Learning/MetaHIN/evaluation.py): This is the evaluation script that evaluates the performance of learned embeddings w.r.t clustering and classification.
+* [main.py](https://github.com/khanhnamle1994/MetaRec/blob/master/Meta-Learning/MetaHIN/main.py): This is the main script that executes the whole code.
+
+## Requirements
+
+```
+- Python 3.6.9
+- PyTorch 1.4.0
+```
+See the detailed [requirements](https://github.com/rootlu/MetaHIN/blob/master/requirements.txt).
+
+## Citation
+
+```
+@inproceedings{lu2020meta,
+  title={Meta-learning on Heterogeneous Information Networks for Cold-start Recommendation},
+  author={Lu, Yuanfu and Fang, Yuan and Shi, Chuan},
+  booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
+  pages={1563--1573},
+  year={2020}
+}
+```

Meta-Learning/MetaHIN/config.py

Lines changed: 6 additions & 6 deletions
@@ -1,7 +1,7 @@
 # MetaHIN configuration
 config = {
     'dataset': 'movielens',  # specify MovieLens1M dataset
-    'mp': ['um', 'umum', 'umam', 'umdm'],  #
+    'mp': ['um', 'umum', 'umam', 'umdm'],  # a set of meta-paths
     'file_num': 12,  # each task contains 12 files for movielens
 
     # item parameters
@@ -24,15 +24,15 @@
 
     'first_fc_hidden_dim': 64,  # number of dimensions in the first fully-connected hidden layer
     'second_fc_hidden_dim': 64,  # number of dimensions in the second fully-connected hidden layer
-    'mp_update': 1,
-    'local_update': 1,
+    'mp_update': 1,  # number of meta-path update steps
+    'local_update': 1,  # number of local update steps
     'lr': 5e-4,  # step size Beta (global learning rate)
-    'mp_lr': 5e-3,
+    'mp_lr': 5e-3,  # meta-path learning rate
     'local_lr': 5e-3,  # step size Alpha (local learning rate)
     'batch_size': 32,  # number of tasks for each batch
     'num_epoch': 100,  # number of epochs
-    'neigh_agg': 'mean',
-    'mp_agg': 'mean'
+    'neigh_agg': 'mean',  # neighborhood aggregation function
+    'mp_agg': 'mean'  # meta-path aggregation function
 }
 
 '''
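
The `mp` entry above lists meta-paths as compact strings. The following is a minimal sketch of how such strings could be expanded into node-type sequences, assuming each letter abbreviates a node type (u = user, m = movie, a = actor, d = director). Both the mapping and the helper name `expand_meta_path` are illustrative assumptions, not code from this commit.

```python
# Assumed letter-to-node-type mapping (not defined anywhere in this commit).
NODE_TYPES = {"u": "user", "m": "movie", "a": "actor", "d": "director"}


def expand_meta_path(mp):
    """Hypothetical helper: return the node-type sequence for a string such as 'umam'."""
    return [NODE_TYPES[ch] for ch in mp]


# Same meta-path list as in the config dictionary.
config = {"mp": ["um", "umum", "umam", "umdm"]}

for mp in config["mp"]:
    print(mp, "->", " - ".join(expand_meta_path(mp)))
```

Read this way, `umam` would connect a user to movies that share an actor with a movie the user has rated.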
Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
+# Import packages
+import os
+import json
+import pandas as pd
+import numpy as np
+import torch
+import re
+import random
+import pickle
+from tqdm import tqdm
+import collections
+
+random.seed(13)
+
+# Data directories
+input_dir = '../../ml-1m/original/'
+output_dir = 'processed-data'
+
+# List of possible states
+states = ["warm_up", "user_cold_testing", "item_cold_testing", "user_and_item_cold_testing", "meta_training"]
+
+# On the first run, create the output and log directories for every state
+if not os.path.exists("{}/meta_training/".format(output_dir)):
+    os.mkdir("{}/log/".format(output_dir))
+    for state in states:
+        os.mkdir("{}/{}/".format(output_dir, state))
+        if not os.path.exists("{}/{}/{}".format(output_dir, "log", state)):
+            os.mkdir("{}/{}/{}".format(output_dir, "log", state))
+
+# Load ratings data
+ui_data = pd.read_csv(input_dir + 'ratings.dat', names=['user', 'item', 'rating', 'timestamp'],
+                      sep="::", engine='python')
+print("Number of ratings:", len(ui_data))
+
+# Load user data
+user_data = pd.read_csv(input_dir + 'users.dat', names=['user', 'gender', 'age', 'occupation_code', 'zip'],
+                        sep="::", engine='python')
+
+# Load item data
+item_data = pd.read_csv(input_dir + 'movies_extrainfos.dat',
+                        names=['item', 'title', 'year', 'rate', 'released', 'genre',
+                               'director', 'writer', 'actors', 'plot', 'poster'],
+                        sep="::", engine='python', encoding="utf-8")
+
+user_list = list(set(ui_data.user.tolist()) | set(user_data.user))
+item_list = list(set(ui_data.item.tolist()) | set(item_data.item))
+
+user_num = len(user_list)
+item_num = len(item_list)
+print("Number of users:", user_num, "and Number of items:", item_num)
+
+"""
+1 - Code to process user and item features
+"""
+
+
+def load_list(fname):
+    """
+    Function to load a file into a Python list
+    :param fname: file name
+    :return: Python list
+    """
+    list_ = []
+    with open(fname, encoding="utf-8") as f:
+        for line in f.readlines():
+            list_.append(line.strip())
+    return list_
+
+
+rate_list = load_list("{}/m_rate.txt".format(input_dir))  # list of rate levels
+genre_list = load_list("{}/m_genre.txt".format(input_dir))  # list of genres
+actor_list = load_list("{}/m_actor.txt".format(input_dir))  # list of actors
+director_list = load_list("{}/m_director.txt".format(input_dir))  # list of directors
+gender_list = load_list("{}/m_gender.txt".format(input_dir))  # list of genders
+age_list = load_list("{}/m_age.txt".format(input_dir))  # list of ages
+occupation_list = load_list("{}/m_occupation.txt".format(input_dir))  # list of occupations
+zipcode_list = load_list("{}/m_zipcode.txt".format(input_dir))  # list of zipcodes
+
+# Verify the lists
+print("Number of rate levels:", len(rate_list), "\n",
+      "Number of genres:", len(genre_list), "\n",
+      "Number of actors:", len(actor_list), "\n",
+      "Number of directors:", len(director_list), "\n",
+      "Number of genders:", len(gender_list), "\n",
+      "Number of ages:", len(age_list), "\n",
+      "Number of occupations:", len(occupation_list), "\n",
+      "Number of zipcodes:", len(zipcode_list))
+
+
+def item_converting(row, rate_list, genre_list, director_list, actor_list):
+    """
+    Convert item data into PyTorch tensors
+    :param row: current row
+    :param rate_list: list of rate levels
+    :param genre_list: list of movie genres
+    :param director_list: list of directors
+    :param actor_list: list of actors
+    :return: (rate + genre tensor, rate + genre + director + actor tensor, director ids, actor ids)
+    """
+    # Encode the rate level as an index tensor of shape (1, 1)
+    rate_idx = torch.tensor([[rate_list.index(str(row['rate']))]]).long()
+
+    # Encode the genres as a multi-hot tensor
+    genre_idx = torch.zeros(1, 25).long()
+    for genre in str(row['genre']).split(", "):
+        idx = genre_list.index(genre)
+        genre_idx[0, idx] = 1  # set the bit for each genre of the movie
+
+    # Encode the directors as a multi-hot tensor and collect their ids
+    director_idx = torch.zeros(1, 2186).long()
+    director_id = []
+    for director in str(row['director']).split(", "):
+        idx = director_list.index(re.sub(r'\([^()]*\)', '', director))
+        director_idx[0, idx] = 1
+        director_id.append(idx + 1)  # id starts from 1, not index
+
+    # Encode the actors as a multi-hot tensor and collect their ids
+    actor_idx = torch.zeros(1, 8030).long()
+    actor_id = []
+    for actor in str(row['actors']).split(", "):
+        idx = actor_list.index(actor)
+        actor_idx[0, idx] = 1
+        actor_id.append(idx + 1)
+
+    # Concatenate the tensors along dimension 1 (each result has a single row)
+    return torch.cat((rate_idx, genre_idx), 1), \
+        torch.cat((rate_idx, genre_idx, director_idx, actor_idx), 1), \
+        director_id, actor_id
+
+
+def user_converting(row, gender_list, age_list, occupation_list, zipcode_list):
+    """
+    Convert user data into PyTorch tensor
+    :param row: current row
+    :param gender_list: list of genders
+    :param age_list: list of ages
+    :param occupation_list: list of occupations
+    :param zipcode_list: list of zipcodes
+    """
+    # Encode the gender as an index tensor
+    gender_idx = torch.tensor([[gender_list.index(str(row['gender']))]]).long()
+
+    # Encode the age as an index tensor
+    age_idx = torch.tensor([[age_list.index(str(row['age']))]]).long()
+
+    # Encode the occupation as an index tensor
+    occupation_idx = torch.tensor([[occupation_list.index(str(row['occupation_code']))]]).long()
+
+    # Encode the zipcode (first five characters) as an index tensor
+    zip_idx = torch.tensor([[zipcode_list.index(str(row['zip'])[:5])]]).long()
+
+    # Concatenate the index tensors along dimension 1
+    return torch.cat((gender_idx, age_idx, occupation_idx, zip_idx), 1)  # (1, 4)
+
+
+# Create hash maps for item features
+movie_fea_hete = {}
+movie_fea_homo = {}
+m_directors = {}
+m_actors = {}
+for idx, row in item_data.iterrows():
+    m_info = item_converting(row, rate_list, genre_list, director_list, actor_list)
+    movie_fea_hete[row['item']] = m_info[0]
+    movie_fea_homo[row['item']] = m_info[1]
+    m_directors[row['item']] = m_info[2]
+    m_actors[row['item']] = m_info[3]
+
+# Create a hash map for user features
+user_fea = {}
+for idx, row in user_data.iterrows():
+    u_info = user_converting(row, gender_list, age_list, occupation_list, zipcode_list)
+    user_fea[row['user']] = u_info
+
+"""
+2 - Code to process meta-path features
+"""
+
+
+def reverse_dict(d):
+    # Invert a key -> list mapping: {1: [a, b, c], 2: [a, f, g], ...} becomes {a: [1, 2], b: [1], ...}
+    re_d = collections.defaultdict(list)
+    for k, v_list in d.items():
+        for v in v_list:
+            re_d[v].append(k)
+    return dict(re_d)
+
+
+a_movies = reverse_dict(m_actors)
+d_movies = reverse_dict(m_directors)
+print("Actor dictionary:", len(a_movies), " and Director dictionary:", len(d_movies))
+
+
+def jsonKeys2int(x):
+    """
+    Turn JSON keys into integers
+    """
+    if isinstance(x, dict):
+        return {int(k): v for k, v in x.items()}
+    return x
+
+
+state = 'meta_training'
+
+# Load user features support set
+support_u_movies = json.load(open(os.path.join(output_dir, state, 'support_u_movies.json'), 'r'), object_hook=jsonKeys2int)
+# Load user features query set
+query_u_movies = json.load(open(os.path.join(output_dir, state, 'query_u_movies.json'), 'r'), object_hook=jsonKeys2int)
+# Load user target support set
+support_u_movies_y = json.load(open(os.path.join(output_dir, state, 'support_u_movies_y.json'), 'r'), object_hook=jsonKeys2int)
+# Load user target query set
+query_u_movies_y = json.load(open(os.path.join(output_dir, state, 'query_u_movies_y.json'), 'r'), object_hook=jsonKeys2int)
+
+# The support and query sets are expected to cover the same users
+if support_u_movies.keys() == query_u_movies.keys():
+    u_id_list = support_u_movies.keys()
+    print(len(u_id_list))
+
+# Merge each user's support and query movies into one training list
+train_u_movies = {}
+for idx, u_id in tqdm(enumerate(u_id_list)):
+    train_u_movies[int(u_id)] = support_u_movies[u_id] + query_u_movies[u_id]
+print(len(train_u_movies))
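
The reversed dictionaries built in this script (`a_movies`, `d_movies`) make meta-path neighbours cheap to look up. The following is a minimal sketch, on made-up toy data and with a hypothetical helper `umam_neighbors`, of how the movies reachable from a user along user → movie → actor → movie could be collected. It illustrates the idea only and is not code from this commit.

```python
import collections


def reverse_dict(d):
    """Invert a key -> list-of-values mapping into value -> list-of-keys."""
    re_d = collections.defaultdict(list)
    for k, v_list in d.items():
        for v in v_list:
            re_d[v].append(k)
    return dict(re_d)


def umam_neighbors(u_id, u_movies, m_actors, a_movies):
    """Hypothetical helper: movies reachable via user -> movie -> actor -> movie."""
    reached = set()
    for m in u_movies.get(u_id, []):
        for a in m_actors.get(m, []):
            reached.update(a_movies.get(a, []))
    return sorted(reached)


# Toy data, invented for illustration only
u_movies = {1: [10, 11]}                                       # user -> rated movies
m_actors = {10: [100, 101], 11: [101], 12: [100], 13: [102]}   # movie -> actor ids
a_movies = reverse_dict(m_actors)                              # actor -> movies

print(umam_neighbors(1, u_movies, m_actors, a_movies))
```

In this toy graph movie 12 is reached through actor 100, while movie 13 is not, because none of the user's movies share an actor with it.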

Meta-Learning/MetaHIN/embedding_init.py

Lines changed: 2 additions & 2 deletions
@@ -66,7 +66,7 @@ def forward(self, user_fea):
         return torch.cat((gender_emb, age_emb, occupation_emb, area_emb), 1)  # (1, 4*32)
 
 
-class ItemEmbeddingML(torch.nn.Module):
+class ItemEmbedding(torch.nn.Module):
     """
     Initialize item embedding class
     """
@@ -75,7 +75,7 @@ def __init__(self, config):
         Initialize the item class
         :param config: experiment configuration
         """
-        super(ItemEmbeddingML, self).__init__()
+        super(ItemEmbedding, self).__init__()
         self.num_rate = config['num_rate']  # Number of rate levels
         self.num_genre = config['num_genre']  # Number of genres
         self.embedding_dim = config['embedding_dim']  # Number of embedding dimensions
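
Both changed lines in this file follow from the class rename, because the two-argument `super(ItemEmbedding, self)` call repeats the class name. The following is a small sketch of the zero-argument `super()` form available in Python 3, which would not need touching on a rename. It uses plain Python classes rather than `torch.nn.Module`, and the class names and config values are hypothetical.

```python
class Base:
    """Plain stand-in for a framework base class (assumption: not torch.nn.Module)."""

    def __init__(self):
        self.initialized = True


class ItemEmbeddingSketch(Base):
    """Hypothetical class showing zero-argument super()."""

    def __init__(self, config):
        # No class name is repeated here, so renaming the class touches one line only.
        super().__init__()
        self.num_rate = config['num_rate']
        self.num_genre = config['num_genre']
        self.embedding_dim = config['embedding_dim']


# Illustrative values only
emb = ItemEmbeddingSketch({'num_rate': 6, 'num_genre': 25, 'embedding_dim': 32})
print(emb.initialized, emb.num_rate, emb.num_genre, emb.embedding_dim)
```

The explicit two-argument form used in the commit behaves the same way; the zero-argument form is only a convenience.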
