Commit 4f23524

added mallet
1 parent ace5c4e commit 4f23524

1,833 files changed, +142670 -0 lines changed

dump_people_with_just_id.py

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
```python
import argparse

import pandas as pd


def main():
    parser = argparse.ArgumentParser(
        description="Dump the id of every sender or recipient in an interaction pickle.")
    parser.add_argument('-i', '--interaction_path', required=True,
                        help="pickled DataFrame with 'sender_id' and 'recipient_ids' columns")
    parser.add_argument('-o', '--output_path', required=True,
                        help="destination for the pickled one-column 'id' DataFrame")
    args = parser.parse_args()

    df = pd.read_pickle(args.interaction_path)

    # Collect the distinct ids of everyone who appears as a sender or recipient.
    people = set()
    for _, r in df.iterrows():
        people.add(r['sender_id'])
        for id_ in r['recipient_ids']:
            people.add(id_)

    # Set iteration order is arbitrary, so the output row order is not guaranteed.
    new_df = pd.DataFrame({'id': list(people)})
    new_df.to_pickle(args.output_path)


if __name__ == "__main__":
    main()
```
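
The loop in `main()` visits every row with `iterrows`, which tends to be slow on large frames. Below is a minimal vectorized sketch of the same collection step. It assumes each `recipient_ids` cell holds a list-like of ids, as the loop implies, and it needs pandas 0.25 or newer for `Series.explode`. The helper name `collect_people_ids` is hypothetical and does not appear in the commit. Unlike the set-based original, it keeps ids in order of first appearance.

```python
import pandas as pd


def collect_people_ids(df: pd.DataFrame) -> pd.DataFrame:
    """Return a one-column 'id' frame of every distinct sender or recipient.

    Hypothetical helper: assumes a scalar 'sender_id' column and a list-like
    'recipient_ids' column, mirroring the loop in dump_people_with_just_id.py.
    """
    senders = df['sender_id']
    # explode() gives each list element its own row; empty lists become NaN.
    recipients = df['recipient_ids'].explode().dropna()
    # Stack both id sources and keep the first occurrence of each id.
    ids = pd.concat([senders, recipients], ignore_index=True).drop_duplicates()
    return pd.DataFrame({'id': ids.tolist()})


if __name__ == "__main__":
    sample = pd.DataFrame({
        'sender_id': [1, 2, 1],
        'recipient_ids': [[2, 3], [], [4]],
    })
    print(sorted(collect_people_ids(sample)['id']))
```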

external/mallet-2.0.8RC3/LICENSE

Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,242 @@
This software is Copyright (C) 2002, 2003 University of Massachusetts
Amherst, Department of Computer Science, and is licensed under the
terms of the Common Public License, Version 1.0 or (at your option)
any subsequent version.

The license is approved by the Open Source Initiative, and is available
from their website at http://www.opensource.org.

=====================

Common Public License Version 1.0

THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS COMMON
PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR DISTRIBUTION OF
THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT.

1. DEFINITIONS

"Contribution" means:

a) in the case of the initial Contributor, the initial code and
documentation distributed under this Agreement, and

b) in the case of each subsequent Contributor:

i) changes to the Program, and

ii) additions to the Program;

where such changes and/or additions to the Program originate from and
are distributed by that particular Contributor. A Contribution
'originates' from a Contributor if it was added to the Program by such
Contributor itself or anyone acting on such Contributor's
behalf. Contributions do not include additions to the Program which:
(i) are separate modules of software distributed in conjunction with
the Program under their own license agreement, and (ii) are not
derivative works of the Program.

"Contributor" means any person or entity that distributes the Program.

"Licensed Patents " mean patent claims licensable by a Contributor
which are necessarily infringed by the use or sale of its Contribution
alone or when combined with the Program.

"Program" means the Contributions distributed in accordance with this
Agreement.

"Recipient" means anyone who receives the Program under this
Agreement, including all Contributors.

2. GRANT OF RIGHTS

a) Subject to the terms of this Agreement, each Contributor hereby
grants Recipient a non-exclusive, worldwide, royalty-free copyright
license to reproduce, prepare derivative works of, publicly display,
publicly perform, distribute and sublicense the Contribution of such
Contributor, if any, and such derivative works, in source code and
object code form.

b) Subject to the terms of this Agreement, each Contributor hereby
grants Recipient a non-exclusive, worldwide, royalty-free patent
license under Licensed Patents to make, use, sell, offer to sell,
import and otherwise transfer the Contribution of such Contributor, if
any, in source code and object code form. This patent license shall
apply to the combination of the Contribution and the Program if, at
the time the Contribution is added by the Contributor, such addition
of the Contribution causes such combination to be covered by the
Licensed Patents. The patent license shall not apply to any other
combinations which include the Contribution. No hardware per se is
licensed hereunder.

c) Recipient understands that although each Contributor grants the
licenses to its Contributions set forth herein, no assurances are
provided by any Contributor that the Program does not infringe the
patent or other intellectual property rights of any other entity. Each
Contributor disclaims any liability to Recipient for claims brought by
any other entity based on infringement of intellectual property rights
or otherwise. As a condition to exercising the rights and licenses
granted hereunder, each Recipient hereby assumes sole responsibility
to secure any other intellectual property rights needed, if any. For
example, if a third party patent license is required to allow
Recipient to distribute the Program, it is Recipient's responsibility
to acquire that license before distributing the Program.

d) Each Contributor represents that to its knowledge it has sufficient
copyright rights in its Contribution, if any, to grant the copyright
license set forth in this Agreement.

3. REQUIREMENTS

A Contributor may choose to distribute the Program in object code form
under its own license agreement, provided that:

a) it complies with the terms and conditions of this Agreement; and

b) its license agreement:

i) effectively disclaims on behalf of all Contributors all warranties
and conditions, express and implied, including warranties or
conditions of title and non-infringement, and implied warranties or
conditions of merchantability and fitness for a particular purpose;

ii) effectively excludes on behalf of all Contributors all liability
for damages, including direct, indirect, special, incidental and
consequential damages, such as lost profits;

iii) states that any provisions which differ from this Agreement are
offered by that Contributor alone and not by any other party; and

iv) states that source code for the Program is available from such
Contributor, and informs licensees how to obtain it in a reasonable
manner on or through a medium customarily used for software exchange.

When the Program is made available in source code form:

a) it must be made available under this Agreement; and

b) a copy of this Agreement must be included with each copy of the
Program.

Contributors may not remove or alter any copyright notices contained
within the Program.

Each Contributor must identify itself as the originator of its
Contribution, if any, in a manner that reasonably allows subsequent
Recipients to identify the originator of the Contribution.

4. COMMERCIAL DISTRIBUTION

Commercial distributors of software may accept certain
responsibilities with respect to end users, business partners and the
like. While this license is intended to facilitate the commercial use
of the Program, the Contributor who includes the Program in a
commercial product offering should do so in a manner which does not
create potential liability for other Contributors. Therefore, if a
Contributor includes the Program in a commercial product offering,
such Contributor ("Commercial Contributor") hereby agrees to defend
and indemnify every other Contributor ("Indemnified Contributor")
against any losses, damages and costs (collectively "Losses") arising
from claims, lawsuits and other legal actions brought by a third party
against the Indemnified Contributor to the extent caused by the acts
or omissions of such Commercial Contributor in connection with its
distribution of the Program in a commercial product offering. The
obligations in this section do not apply to any claims or Losses
relating to any actual or alleged intellectual property
infringement. In order to qualify, an Indemnified Contributor must: a)
promptly notify the Commercial Contributor in writing of such claim,
and b) allow the Commercial Contributor to control, and cooperate with
the Commercial Contributor in, the defense and any related settlement
negotiations. The Indemnified Contributor may participate in any such
claim at its own expense.

For example, a Contributor might include the Program in a commercial
product offering, Product X. That Contributor is then a Commercial
Contributor. If that Commercial Contributor then makes performance
claims, or offers warranties related to Product X, those performance
claims and warranties are such Commercial Contributor's responsibility
alone. Under this section, the Commercial Contributor would have to
defend claims against the other Contributors related to those
performance claims and warranties, and if a court requires any other
Contributor to pay any damages as a result, the Commercial Contributor
must pay those damages.

5. NO WARRANTY

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS
PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, ANY
WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY
OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely
responsible for determining the appropriateness of using and
distributing the Program and assumes all risks associated with its
exercise of rights under this Agreement, including but not limited to
the risks and costs of program errors, compliance with applicable
laws, damage to or loss of data, programs or equipment, and
unavailability or interruption of operations.

6. DISCLAIMER OF LIABILITY

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT NOR
ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING
WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR
DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED
HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

7. GENERAL

If any provision of this Agreement is invalid or unenforceable under
applicable law, it shall not affect the validity or enforceability of
the remainder of the terms of this Agreement, and without further
action by the parties hereto, such provision shall be reformed to the
minimum extent necessary to make such provision valid and enforceable.

If Recipient institutes patent litigation against a Contributor with
respect to a patent applicable to software (including a cross-claim or
counterclaim in a lawsuit), then any patent licenses granted by that
Contributor to such Recipient under this Agreement shall terminate as
of the date such litigation is filed. In addition, if Recipient
institutes patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Program
itself (excluding combinations of the Program with other software or
hardware) infringes such Recipient's patent(s), then such Recipient's
rights granted under Section 2(b) shall terminate as of the date such
litigation is filed.

All Recipient's rights under this Agreement shall terminate if it
fails to comply with any of the material terms or conditions of this
Agreement and does not cure such failure in a reasonable period of
time after becoming aware of such noncompliance. If all Recipient's
rights under this Agreement terminate, Recipient agrees to cease use
and distribution of the Program as soon as reasonably
practicable. However, Recipient's obligations under this Agreement and
any licenses granted by Recipient relating to the Program shall
continue and survive.

Everyone is permitted to copy and distribute copies of this Agreement,
but in order to avoid inconsistency the Agreement is copyrighted and
may only be modified in the following manner. The Agreement Steward
reserves the right to publish new versions (including revisions) of
this Agreement from time to time. No one other than the Agreement
Steward has the right to modify this Agreement. IBM is the initial
Agreement Steward. IBM may assign the responsibility to serve as the
Agreement Steward to a suitable separate entity. Each new version of
the Agreement will be given a distinguishing version number. The
Program (including Contributions) may always be distributed subject to
the version of the Agreement under which it was received. In addition,
after a new version of the Agreement is published, Contributor may
elect to distribute the Program (including its Contributions) under
the new version. Except as expressly stated in Sections 2(a) and 2(b)
above, Recipient receives no rights or licenses to the intellectual
property of any Contributor under this Agreement, whether expressly,
by implication, estoppel or otherwise. All rights in the Program not
expressly granted under this Agreement are reserved.

This Agreement is governed by the laws of the State of New York and
the intellectual property laws of the United States of America. No
party to this Agreement will bring a legal action under this Agreement
more than one year after the cause of action arose. Each party waives
its rights to a jury trial in any resulting litigation.

external/mallet-2.0.8RC3/Makefile

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
```makefile
MALLET_DIR = $(shell pwd)

JAVAC = javac
JAVA_FLAGS = \
	-classpath "$(MALLET_DIR)/class:$(MALLET_DIR)/lib/mallet-deps.jar:$(MALLET_DIR)/lib/jdom-1.0.jar:$(MALLET_DIR)/lib/grmm-deps.jar:$(MALLET_DIR)/lib/weka.jar" \
	-sourcepath "$(MALLET_DIR)/src" \
	-g:lines,vars,source \
	-d $(MALLET_DIR)/class \
	-J-Xmx200m -source 1.5

JAVADOC = javadoc
JAVADOC_FLAGS = -J-Xmx300m
JAVADOCS=html


MALLET_VERSION=20080618
ifeq ($(BUILDING_GRMM),yes)
DISTNAME=grmm-$(VERSION)
else
VERSION=$(MALLET_VERSION)
DISTNAME=mallet-$(VERSION)
endif


all: class link-resources
	$(JAVAC) $(JAVA_FLAGS) `find src -name '*.java'`

javadoc: html class
	$(JAVADOC) $(JAVADOC_FLAGS) -classpath "$(MALLET_DIR)/class:$(MALLET_DIR)/lib/mallet-deps.jar:$(MALLET_DIR)/lib/grmm-deps.jar" -d $(MALLET_DIR)/html -sourcepath $(MALLET_DIR)/src -source 1.4 -subpackages edu

grmmdoc: html class
	$(JAVADOC) $(JAVADOC_FLAGS) -classpath "$(MALLET_DIR)/class:$(MALLET_DIR)/lib/mallet-deps.jar" -d $(MALLET_DIR)/html -sourcepath $(MALLET_DIR)/src -source 1.4 -subpackages edu.umass.cs.mallet.users.casutton.graphical

copy-resources: class
	cd src ; gtar --exclude CVS -cf - `find . -type d -name resources` | (cd ../class ; gtar -xf -)

# Soft link the resources directories in mallet/src into mallet/class
link-resources: class
	cd src ; for d in `find . -type d -name resources` ; do \
		echo $$d ; \
		mkdir -p `dirname ../class/$$d` ; \
		rm -f ../class/$$d ; \
		(cd ../class ; ln -s `echo $$d | sed 's,/[^/]*,/\.\.,g'`/src/$$d $$d ) ; \
	done

jar: class
	jar -cvf lib/mallet.jar -C class cc/

srcjar: class
	jar -cvf lib/mallet.jar src Makefile -C class cc/

class:
	mkdir -p class

html:
	mkdir -p html

clean:
	rm -rf class/* lib/unpack

echo-classpath:
	export CLASSPATH=$(MALLET_DIR)/class


# removed javadoc
.distfiles: FORCE jar
	rm -f $@
	echo .emacs.mallet >> $@
	echo HACKING >> $@
	echo LICENSE >> $@
	echo Makefile >> $@
	echo OTHER-SIMILAR-SOFTWARE.html >> $@
	echo README.html >> $@
	echo TODO >> $@
	echo README.ant >> $@
	echo build.xml >> $@
	find src -name '*.java' -not -path 'src/com/*' >> $@
	find src -path '*/resources/*' -type f -not -path '*/CVS/*' >> $@ # include resource dirs -cas
	echo lib/*.jar lib/Makefile >> $@
	#find lib/jython -type f -not -path '*/CVS/*' >> $@
	#find scripts -type f -not -path '*/CVS/*' >> $@
	#echo doc/*.html >> $@
	# Include built jars. Wildcards cannot be used below, for these files don't exist yet. -cas
	echo dist/mallet.jar dist/mallet-deps.jar >> $@
	if [ ! -z "$$BUILDING_GRMM" ]; then echo dist/grmm-deps.jar >> $@; fi
	# include the javadocs
	#find $(JAVADOCS) -type f >> $@
	# find the executables in bin/ directory to be included
	find bin -maxdepth 1 -type f -perm -a+x -not \( -path '*/CVS/*' -or -name 'prepend-license.sh' \) >> $@
	if [ -z "$$BUILDING_GRMM" ]; then \
		grep -v mallet/grmm $@ > /tmp/$@ ; rm $@ ; mv /tmp/$@ $@ ; \
	fi

dist/$(DISTNAME).tar.gz: .distfiles
	-mkdir dist
	# remove extant build directory
	rm -rf $(DISTNAME)
	# create temp build directory
	mkdir $(DISTNAME)
	# add other important files to dist dir for convenience
	cp lib/mallet-deps.jar lib/mallet.jar dist
	# copying files to build directory
	#cat .distfiles | xargs -n256 cp --preserve --link --parents --target-directory $(DISTNAME)
	tar --files-from .distfiles -cf - | (cd $(DISTNAME) ; tar -xpvf -)
	# tar build directory
	tar -chvf dist/$(DISTNAME).tar $(DISTNAME)
	# remove extant *.tar.gz file
	rm -f dist/$(DISTNAME).tar.gz
	# gzip tar file
	gzip -9 dist/$(DISTNAME).tar
	# remove temp build directory
	rm -rf $(DISTNAME)

FORCE:
```
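
The `link-resources` rule builds each symlink target with the `sed` expression `s,/[^/]*,/\.\.,g`. That expression rewrites every `/component` of the path found under `src` into `/..`, so the link first climbs from its own directory under `class/` back to the MALLET root and then descends into `src/`. The Python sketch below reproduces that path arithmetic so it can be checked by hand. The helper names `resource_link_target` and `resolve_link` are hypothetical. The sample path `./cc/mallet/util/resources` is an assumed `find . -type d -name resources` result, not one taken from the commit.

```python
import posixpath
import re


def resource_link_target(rel_dir: str) -> str:
    # Same rewrite as the Makefile's sed step: each "/component" becomes "/..",
    # then the original path is appended under src/.
    climb = re.sub(r'/[^/]*', '/..', rel_dir)
    return f"{climb}/src/{rel_dir}"


def resolve_link(rel_dir: str) -> str:
    # A relative symlink target is resolved against the directory that holds
    # the link, here class/<parent of rel_dir>.
    link_path = posixpath.normpath(posixpath.join("class", rel_dir))
    link_parent = posixpath.dirname(link_path)
    target = resource_link_target(rel_dir)
    return posixpath.normpath(posixpath.join(link_parent, target))


if __name__ == "__main__":
    d = "./cc/mallet/util/resources"  # assumed find result (hypothetical)
    print(resource_link_target(d))  # ./../../../../src/./cc/mallet/util/resources
    print(resolve_link(d))          # src/cc/mallet/util/resources
```

The number of `..` segments always equals the number of path components in `$$d`. That count is exactly how deep the link's parent directory sits below the MALLET root, which is why the link resolves back into the matching `src/` directory at any depth.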
