lhernandezearlyalert
diff --git a/extra_topics/network_analysis.ipynb b/extra_topics/network_analysis.ipynb
Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Brief introduction to NetworkX\n",
+    "\n",
+    "Network analysis is a broad, growing field that spans many disciplines. It focuses on the relationships (\"edges\") between entities (\"nodes\"). Nodes could be people, neurons, power plants, or genes. Edges might be friendships, physical connections, or co-presence at events.\n",
+    "\n",
+    "Here, I give a very brief introduction to using the [networkx](https://networkx.github.io/) package for basic network analysis."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import networkx as nx\n",
+    "import matplotlib.pyplot as plt\n",
+    "import pandas as pd\n",
+    "import numpy as np"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Creating a graph object\n",
+    "\n",
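+    "`networkx` has a number of built-in functions that let you create random networks. Here I create a Barabasi-Albert graph with 100 nodes, in which each new node attaches to 2 existing nodes. Its degree distribution is highly skewed: a few hub nodes have many edges, while most nodes have only a few."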
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "G = nx.barabasi_albert_graph(n=100, m=2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "It's often useful to visualize these networks. `networkx` includes a number of visualization options. Here's a simple one."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "nx.draw_spring(G);"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We also often want to get information about the network. For example, here is the degree distribution (a histogram of how many edges each node has)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.hist([v for k,v in nx.degree(G)]);"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "And here is the distribution of closeness centrality. Other centrality measures, such as `nx.betweenness_centrality`, have a similar syntax."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.hist(list(nx.closeness_centrality(G).values()));"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
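+    "There are also network-level measures, such as the diameter (the longest shortest path between any two nodes) and the average clustering coefficient."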
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "nx.diameter(G)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "nx.average_clustering(G)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Importing from Pandas\n",
+    "\n",
+    "More often, you will want to create a network from your own data. To create a graph object, first convert the data into an \"edgelist\": a table with one row per edge. The easiest way to do this is with a data frame that has at least two columns.\n",
+    "\n",
+    "The first column is the source node, the second is the target node, and any further columns hold edge attributes (e.g., weight or valence). Here's a simple example with 100 random edges among 100 nodes:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "nodes = list(range(100))\n",
+    "# Draw 100 random (source, target) pairs; duplicate edges and self-loops are possible\n",
+    "df = pd.DataFrame({'from': np.random.choice(nodes, 100),\n",
+    "                   'to': np.random.choice(nodes, 100)\n",
+    "                  })"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "G = nx.from_pandas_edgelist(df, source='from', target='to')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "nx.draw(G);"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.hist([v for k,v in nx.degree(G)]);"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
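
The edgelist section of the notebook mentions edge attributes such as weight, but its example data frame has only source and target columns. Here is a minimal sketch of how an attribute column might be carried onto the graph with the `edge_attr` argument of `nx.from_pandas_edgelist`. The `weight` column, its random values, and the seed are assumptions made for illustration and are not part of the notebook. The last lines also show one possible way to apply a path-based measure such as `nx.diameter` to a random edgelist graph, which is usually disconnected, by restricting it to the largest connected component.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)
nodes = list(range(100))

# 'weight' is a hypothetical edge attribute added for illustration only.
df = pd.DataFrame({
    'from': rng.choice(nodes, 100),
    'to': rng.choice(nodes, 100),
    'weight': rng.random(100),
})

# edge_attr names the column(s) whose values are copied onto each edge.
G = nx.from_pandas_edgelist(df, source='from', target='to', edge_attr='weight')

# Each edge now carries a 'weight' entry in its data dictionary.
edge_weights = [w for _, _, w in G.edges(data='weight')]

# nx.diameter raises an error on a disconnected graph, so this sketch
# computes it on the largest connected component instead.
largest = G.subgraph(max(nx.connected_components(G), key=len))
print(G.number_of_nodes(), G.number_of_edges(), nx.diameter(largest))
```

Because `G` is an undirected simple graph, repeated rows for the same pair of nodes collapse into one edge, so the edge count can be lower than the number of rows in the data frame.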