Commit b891eb5

JongpilLee authored and committed
initial commit
1 parent a98cda2 commit b891eb5

10 files changed: +543 -0 lines changed

.DS_Store

6 KB
Binary file not shown.

50tagList.txt

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
rock
pop
alternative
indie
electronic
female vocalists
dance
00s
alternative rock
jazz
beautiful
metal
chillout
male vocalists
classic rock
soul
indie rock
Mellow
electronica
80s
folk
90s
chill
instrumental
punk
oldies
blues
hard rock
ambient
acoustic
experimental
female vocalist
guitar
Hip-Hop
70s
party
country
easy listening
sexy
catchy
funk
electro
heavy metal
Progressive rock
60s
rnb
indie pop
sad
House
happy

ForwardProp.sh

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
#!/bin/sh

# Usage: ./ForwardProp.sh -m=MODE SAVE_FOLDER FILE_LIST
# $1 mode flag: -m=encoding or -m=prediction (--mode=... also works)
# $2 save (scratch) folder
# $3 list of audio files to process

# pick the mode value out of the arguments
for i in "$@"
do
case $i in
    -m=*|--mode=*)
    mode="${i#*=}"
    ;;
esac
done

echo "mode = ${mode}"

# run the script that matches the selected mode
if [ "$mode" = "encoding" ]; then
    python encoding_cnn.py "$2" "$3"
elif [ "$mode" = "prediction" ]; then
    python prediction_cnn.py "$2" "$3"
else
    echo "unknown mode: ${mode}" >&2
    exit 1
fi

README.txt

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# README
----------------------------------------------------------------------------
* Contact Info *

<Jongpil Lee>
Korea Advanced Institute of Science and Technology (KAIST)
Graduate School of Culture Technology (GSCT)
richter@kaist.ac.kr

----------------------------------------------------------------------------
* Description *

This is a slightly modified version of our submission to the 2017 MIREX audio classification (train/test) tasks.
The model is based on our previously published paper [https://arxiv.org/abs/1706.06810].

This repo provides two functions.

1. Predicting 50 tags using a CNN trained on the MSD tagging dataset.

2. Transferring the last hidden layer of the CNN to your new task.
This function consists of two stages: feature extraction and train/classification.

----------------------------------------------------------------------------
* Platform and Requirements *

<Dependencies>
keras 1.1.0
theano 0.8.2
python 2.7.6

<Python Libraries>
librosa
numpy
sklearn

----------------------------------------------------------------------------
* Use *

1. 50 tag prediction

./ForwardProp.sh -m=prediction /path/to/save/folder /path/to/fileList.txt

Example output:
{"file_name": "./path/to/save/folder/file_name.json", "prediction_msd": {"beautiful": "0.0206099", "punk": "0.00465381", "indie": "0.0876653", "male vocalists": "0.0211934", "female vocalist": "0.00529418", "heavy metal": "0.00191998", "pop": "0.063148", "sad": "0.015539", "00s": "0.0115924", "ambient": "0.0148107", "alternative": "0.0425866", "hard rock": "0.00436063", "electronic": "0.016531", "blues": "0.143018", "folk": "0.315052", "classic rock": "0.0361686", "alternative rock": "0.00850769", "90s": "0.00585691", "60s": "0.0267258", "indie rock": "0.0129534", "electronica": "0.00600895", "female vocalists": "0.0476008", "easy listening": "0.0104203", "dance": "0.00346507", "funk": "0.00661781", "House": "0.00164513", "80s": "0.00953005", "party": "0.00136872", "Mellow": "0.0486049", "electro": "0.00234408", "chillout": "0.017821", "happy": "0.00424408", "oldies": "0.0182328", "rnb": "0.00878901", "jazz": "0.123137", "70s": "0.0187786", "instrumental": "0.0407893", "indie pop": "0.0125248", "sexy": "0.00269948", "Hip-Hop": "0.00374524", "chill": "0.0139084", "guitar": "0.0837907", "country": "0.0271717", "metal": "0.00198551", "soul": "0.0420783", "catchy": "0.00135911", "rock": "0.118368", "acoustic": "0.203366", "Progressive rock": "0.0103604", "experimental": "0.024019"}}

One such JSON file per entry in the file list is saved in the save folder.
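
The prediction files can be read back with the standard library. The sketch below assumes the layout of the example output above (a "prediction_msd" object whose scores are stored as strings); the helper name `top_tags` and the abbreviated record are illustrative and not part of this repo.

```python
import json


def top_tags(prediction_json_text, k=5):
    """Return the k highest-scoring (tag, score) pairs from one prediction file."""
    record = json.loads(prediction_json_text)
    # Scores are stored as strings in the example output, so convert to float.
    scores = {tag: float(value) for tag, value in record["prediction_msd"].items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Abbreviated record in the same layout as the example output (hypothetical file name).
example = (
    '{"file_name": "x.json", "prediction_msd": {"folk": "0.315052", '
    '"acoustic": "0.203366", "blues": "0.143018", "jazz": "0.123137", '
    '"rock": "0.118368", "pop": "0.063148"}}'
)

for tag, score in top_tags(example, k=3):
    print("%-10s %.3f" % (tag, score))
```

For a file on disk, pass the text returned by `open(path).read()` to `top_tags`.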

2. Get the last hidden layer and train an SVM on a new labeled dataset

# get last hidden layer
./ForwardProp.sh -m=encoding /path/to/save/folder /path/to/fileList.txt

# train and classify
./TrainAndClassify.sh /path/to/save/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/output

example trainListFile.txt (path and label separated by a tab)

/media/bach1/dataset/gtzan/blues/blues.00029.wav blues
/media/bach1/dataset/gtzan/blues/blues.00030.wav blues
/media/bach1/dataset/gtzan/blues/blues.00031.wav blues
/media/bach1/dataset/gtzan/blues/blues.00032.wav blues
...

example testListFile.txt
/media/bach1/dataset/gtzan/blues/blues.00035.wav
/media/bach1/dataset/gtzan/blues/blues.00036.wav
/media/bach1/dataset/gtzan/blues/blues.00037.wav

expected output file
/media/bach1/dataset/gtzan/blues/blues.00035.wav blues
/media/bach1/dataset/gtzan/blues/blues.00036.wav blues
/media/bach1/dataset/gtzan/blues/blues.00037.wav blues

---------------------------------------------------------------------------
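
The two-stage flow described in the README can be sketched as follows. run_svm.py is not part of the files shown here, so `train_and_classify` is only a hypothetical stand-in that fits a linear SVM with scikit-learn; the real script may use different settings. `feature_path` mirrors how encoding_cnn.py builds its save name, assuming no extra prefix is inserted before the audio path. All function names and the example paths are illustrative.

```python
def feature_path(save_folder, audio_path):
    """Where encoding_cnn.py is expected to store the feature vector of one file."""
    # Mirrors save_name in encoding_cnn.py: the audio path is appended to the
    # save folder and '.wav' becomes '.npy' (assumes an empty prefix).
    return save_folder + '/' + audio_path.replace('.wav', '.npy')


def read_train_list(text):
    """Parse 'path<whitespace>label' lines into (path, label) pairs."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            # Split on the last run of whitespace so paths may contain spaces.
            path, label = line.rsplit(None, 1)
            pairs.append((path, label))
    return pairs


def train_and_classify(save_folder, train_pairs, test_paths):
    """Hypothetical stand-in for run_svm.py: fit a linear SVM, label the test files."""
    import numpy as np
    from sklearn.svm import SVC

    x_train = np.stack([np.load(feature_path(save_folder, p)) for p, _ in train_pairs])
    y_train = [label for _, label in train_pairs]
    x_test = np.stack([np.load(feature_path(save_folder, p)) for p in test_paths])
    classifier = SVC(kernel='linear').fit(x_train, y_train)
    return list(zip(test_paths, classifier.predict(x_test)))


# Hypothetical two-line train list, tab-separated.
pairs = read_train_list(
    "/data/gtzan/blues/blues.00029.wav\tblues\n"
    "/data/gtzan/rock/rock.00001.wav\trock\n"
)
print(pairs)
print(feature_path("/tmp/save", pairs[0][0]))
```

`train_and_classify` is not called above because it needs feature files that only exist after the encoding step has run.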

TrainAndClassify.sh

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
#!/bin/sh

# $1 scratch folder
# $2 train list
# $3 test list
# $4 output list

# train the SVM and classify the test list
python run_svm.py "$1" "$2" "$3" "$4"

encoding_cnn.py

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
from __future__ import print_function

import os
import sys

import numpy as np
import librosa

from keras.optimizers import SGD
from keras.models import model_from_json
from keras import backend as K


# load model
model_path = './models/'

architecture_name = model_path + 'architecture_msdTag.json'
weight_name = model_path + 'weight_msdTag.hdf5'

# process slice `nst` out of `partition` equal slices of the file list
nst = 0
partition = 1

save_path = sys.argv[1]
train_arg = sys.argv[2]

# prefix placed before the audio path when saving features; this name is
# used below but was never defined, so an empty prefix is assumed here
model_select = ''

fs = 22050

def load_melspec(file_name_from,num_segment,sample_length):
    # despite the name, this returns raw waveform segments of shape
    # (num_segment, sample_length, 1), spaced evenly over the clip
    file_name = file_name_from

    tmp,sr = librosa.load(file_name,sr=fs,mono=True)
    tmp = tmp.astype(np.float32)

    hopping = (len(tmp)-sample_length)//(num_segment-1)
    if hopping < 0:
        # clip shorter than one segment: np.repeat repeats every sample
        # 10 times, which stretches the clip rather than looping it
        tmp = np.repeat(tmp,10)
        hopping = (len(tmp)-sample_length)//(num_segment-1)

    tmp_segmentized = np.zeros((num_segment,sample_length,1))
    for iter2 in range(0,num_segment):
        tmp_segmentized[iter2,:,0] = tmp[iter2*hopping:iter2*hopping+sample_length]

    return tmp_segmentized


# load data: one path per line; anything after a tab (e.g. a label) is ignored
with open(train_arg) as f:
    train_list = [x.split('\t')[0] for x in f.read().splitlines()]

all_list = train_list
print(len(all_list))

model = model_from_json(open(architecture_name).read())
model.load_weights(weight_name)
print('model loaded!!!')


# compile & optimizer
sgd = SGD(lr=0.001,decay=1e-6,momentum=0.9,nesterov=True)
model.compile(loss='categorical_crossentropy',optimizer=sgd,metrics=['accuracy'])

# print model summary
model.summary()

sample_length = model.input_shape[1]
print(sample_length)

# number of segments for a 30-second clip at 22050 Hz
num_segment = fs*30//sample_length+1
print('Number of segments per song: ' + str(num_segment))


# define activation layer
layer_dict = dict([(layer.name,layer) for layer in model.layers[1:]])
layer_num = (len(layer_dict)-1)//4

# msd doesn't have dropout so +1 for capturing last hidden layer
activation_layer = 'activation_%d' % (layer_num+1)
print(activation_layer)

layer_output = layer_dict[activation_layer].output
get_last_hidden_output = K.function([model.layers[0].input, K.learning_phase()], [layer_output])

# encoding
all_size = len(all_list)
for iter2 in range(int(nst*all_size/partition),int((nst+1)*all_size/partition)):
    # skip files whose features were already saved
    save_name = save_path + '/' + model_select + all_list[iter2].replace('.wav','.npy')

    if not os.path.exists(os.path.dirname(save_name)):
        os.makedirs(os.path.dirname(save_name))

    if os.path.isfile(save_name):
        print(iter2, save_name + '_file_exist!!!!!!!')
        continue

    # load waveform segments
    x_sample_tmp = load_melspec(all_list[iter2],num_segment,sample_length)
    print(x_sample_tmp.shape)

    # forward pass in test mode (learning phase 0)
    weight = get_last_hidden_output([x_sample_tmp,0])[0]

    # max over axis 1 of each segment's activations, then average over segments
    maxpooled = np.amax(weight,axis=1)
    averagepooled = np.average(maxpooled,axis=0)
    print(averagepooled.shape,iter2)

    np.save(save_name,averagepooled)
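
The segmentation and pooling steps in encoding_cnn.py can be checked on toy numbers without loading the model. This is a minimal sketch: `segment_starts` and `pool_song` are illustrative helper names that do not exist in the repo, and the toy array assumes the activations have shape (segments, time steps, channels).

```python
import numpy as np


def segment_starts(num_samples, sample_length, num_segment):
    """Start index of each of num_segment evenly spaced windows."""
    hop = (num_samples - sample_length) // (num_segment - 1)
    return [i * hop for i in range(num_segment)]


def pool_song(activations):
    """Max over axis 1 of each segment, then mean over segments."""
    return np.average(np.amax(activations, axis=1), axis=0)


# Toy numbers: 100 samples, windows of 40 samples, 4 segments.
print(segment_starts(100, 40, 4))  # [0, 20, 40, 60]

# Toy activations: 2 segments x 3 time steps x 2 channels.
acts = np.array([[[1., 0.], [3., 2.], [2., 5.]],
                 [[0., 1.], [1., 1.], [4., 0.]]])
print(pool_song(acts).tolist())  # [3.5, 3.0]
```

The last window starts at sample 60 and ends at sample 100, so the windows span the whole clip, and each song is reduced to a single vector with one value per channel.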
