"C:\\Users\\Agusti Frananda\\Anaconda3\\lib\\site-packages\\sklearn\\feature_extraction\\text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working\n",
" from collections import Mapping, defaultdict\n"
]
}
],
"source": [
"source": [
"import re\n",
"import re\n",
"from nltk.corpus import stopwords\n",
"from nltk.corpus import stopwords\n",
...
@@ -667,7 +243,7 @@
...
@@ -667,7 +243,7 @@
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 13,
"execution_count": 9,
"metadata": {},
"metadata": {},
"outputs": [],
"outputs": [],
"source": [
"source": [
...
@@ -676,7 +252,7 @@
...
@@ -676,7 +252,7 @@
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 14,
"execution_count": 10,
"metadata": {},
"metadata": {},
"outputs": [
"outputs": [
{
{
...
@@ -703,38 +279,80 @@
...
@@ -703,38 +279,80 @@
" <th>NAME</th>\n",
" <th>NAME</th>\n",
" <th>CATEGORY</th>\n",
" <th>CATEGORY</th>\n",
" <th>DESCRIPTION</th>\n",
" <th>DESCRIPTION</th>\n",
" <th>FABRIC</th>\n",
" <th>IMAGE</th>\n",
" <th>SIZE</th>\n",
" <th>PRICE</th>\n",
" <th>PRODUCT ID</th>\n",
" <th>WEBSITE</th>\n",
" <th>PRODUCT URL</th>\n",
" </tr>\n",
" </tr>\n",
" </thead>\n",
" </thead>\n",
" <tbody>\n",
" <tbody>\n",
" <tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>fort collin men red solid pad jacket</td>\n",
" <td>fort collin men red solid pad jacket</td>\n",
" <td>Men Jackets Coats</td>\n",
" <td>Men Jackets Coats</td>\n",
" <td>Fort Collins Men Red Solid Padded Jacket, For...</td>\n",
" <td>Fort Collins Men Red Solid Padded Jacket, For...</td>\n",
"###### Kami mendefinisikan fungsi yang akan mengambil node dan panjang path yang dilalui sebagai input. Fungsi akan berjalan melalui node yang terhubung dari input node yang ditentukan random walk. Lalu fungsi akan mengembalikan urutan node yang dilalui."
"# Node Embedding"
]
]
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 18,
"execution_count": 17,
"metadata": {},
"metadata": {},
"outputs": [],
"outputs": [],
"source": [
"source": [
"def get_randomwalk(node, path_length):\n",
"from node2vec import Node2Vec"
" \n",
" random_walk = [node]\n",
" \n",
" for i in range(path_length-1):\n",
" temp = list(G.neighbors(node))\n",
" temp = list(set(temp) - set(random_walk)) \n",
" if len(temp) == 0:\n",
" break\n",
"\n",
" random_node = random.choice(temp)\n",
" random_walk.append(random_node)\n",
" node = random_node\n",
" \n",
" return random_walk"
]
]
},
},
{
{
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {},
"metadata": {},
"source": [
"source": [
"###### Contoh fungsi untuk: Men Formal Trousers"
"### Kita menghitung probabilitas yang ada dan melakukan generate walks."
]
]
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 19,
"execution_count": 18,
"metadata": {},
"metadata": {},
"outputs": [
"outputs": [
{
{
"data": {
"name": "stderr",
"text/plain": [
"output_type": "stream",
"['Men Formal Trousers', 'invictu men black slim fit solid formal trouser']"
"model.wv.get_vector('fort collin men red solid pad jacket')"
"all_nodes = list(G.nodes())\n",
"\n",
"random_walks = []\n",
"for n in tqdm(all_nodes):\n",
" for i in range(5):\n",
" random_walks.append(get_randomwalk(n,10))\n",
" \n",
"# count of sequences\n",
"len(random_walks)"
]
]
},
},
{
{
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {},
"metadata": {},
"source": [
"source": [
"###### Dengan panjang path yang kami atur dengan nilai 10, maka didapatkan 40.625 urutan random walk. Urutan ini dapat digunakan sebagai input ke model skip-gram dan mengekstraksi bobot yang dipelajari oleh model (node embedding)."
"### Array yang ada memiliki panjang 64 karena kami mendefinisikan dimensi dengan 64. "