24 changes: 15 additions & 9 deletions README.md
@@ -11,46 +11,51 @@

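## [Competitions](https://www.kaggle.com/competitions)

`Now we are ready to try Kaggle competitions, which fall into the following categories.`
```
Machine learning competitions with large prizes and scores that the industry recognizes.
Now we are ready to try Kaggle competitions, which fall into the following categories.
```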

> [Part 1: InClass](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=inClass)
### [Part 1: InClass](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=inClass)

InClass: host free competitions for your students that apply machine learning to real problems.


> [Part 2: Playground](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=playground)
### [Part 2: Playground](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=playground)



> [Part 3: Getting Started](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=gettingStarted)
### [Part 3: Getting Started](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=gettingStarted)

Getting Started: these competitions are structured like Featured competitions but carry no prize money. They offer simpler datasets, plenty of tutorials, and a rolling submission window, so you can enter at any time.

Getting Started competitions suit beginners well: they provide a low-stakes learning environment along with many community-written tutorials: https://www.kaggle.com/c/titanic#tutorials

> [**Digit Recognizer**](competitions/GettingStarted/DigitRecognizer.md): learn computer vision fundamentals with the famous MNIST data

> [Part 4: Recruitment](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=gettingStarted)

### [Part 4: Recruitment](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=gettingStarted)

Recruitment: these are sponsored by companies looking to hire data scientists. They are still relatively rare.


> [Part 5: Research (little prize money)](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=gettingStarted)
### [Part 5: Research (little prize money)](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=gettingStarted)

Research: these are research-oriented competitions with little or no prize money. They also have non-traditional submission processes.


> [Part 6: Featured (large organizations, large prizes)](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=gettingStarted)
### [Part 6: Featured (large organizations, large prizes)](https://www.kaggle.com/competitions?sortBy=deadline&group=all&page=1&pageSize=20&segment=gettingStarted)

Featured: these are usually sponsored by companies, organizations, or even governments, and have the largest prize pools.


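## [Datasets](https://www.kaggle.com/datasets)

Where datasets are stored
Datasets that can be used directly for machine learning.

## [Kernels](https://www.kaggle.com/kernels)

Where authors record their core ideas
Online coding. (A guess: built on Jupyter.)

## [Forum](https://www.kaggle.com/discussion)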

@@ -87,6 +92,7 @@ Featured: these are usually sponsored by companies, organizations, or even governments, and have the largest

## Other Chinese documentation

* [Beam Chinese documentation](http://beam.apachecn.org/)
* [Sklearn 0.19 Chinese documentation](http://sklearn.apachecn.org/)
* [Spark 2.2.0 and 2.0.2 Chinese documentation](http://spark.apachecn.org)
* [Storm 1.1.0 and 1.0.1 Chinese documentation](http://storm.apachecn.org/)
11 changes: 11 additions & 0 deletions competitions/GettingStarted/DigitRecognizer.md
@@ -0,0 +1,11 @@
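# **Digit Recognizer**

[**Digit Recognizer**](competitions/GettingStarted/DigitRecognizer.md): learn computer vision fundamentals with the famous MNIST data

## Description:

* MNIST ("Modified National Institute of Standards and Technology") is the de facto "hello world" dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.
* In this competition, your goal is to correctly identify the digits in tens of thousands of handwritten images. We have curated a set of tutorial-style kernels that cover everything from regression to neural networks. We encourage you to experiment with different algorithms to learn first-hand what works well and how the techniques compare.
* [Docs](competitions/GettingStarted/DigitRecognizer.md) [Code](src/python/GettingStarted/DigitRecognizer/dr_knn_pandas.py) **Video: TBD**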
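
To make the data layout concrete, here is a minimal sketch of how one row of the competition's `train.csv` maps to a label and a 28×28 image. It assumes the usual Digit Recognizer layout (a `label` column followed by 784 pixel columns holding values 0-255). The in-memory one-row CSV and the helper name `row_to_image` are hypothetical and not part of this repository.

```python
import csv
import io

SIDE = 28  # assumed image side length: 28 * 28 = 784 pixel columns


def row_to_image(row):
    """Split one CSV row into (label, image); image is a SIDE x SIDE list of ints."""
    label = int(row[0])
    pixels = [int(v) for v in row[1:]]
    if len(pixels) != SIDE * SIDE:
        raise ValueError("expected %d pixels, got %d" % (SIDE * SIDE, len(pixels)))
    # Cut the flat pixel list into SIDE rows of SIDE values each.
    image = [pixels[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]
    return label, image


# Hypothetical one-row file standing in for train.csv: label 7, one bright pixel.
header = ["label"] + ["pixel%d" % i for i in range(SIDE * SIDE)]
values = [7] + [0] * (SIDE * SIDE)
values[1 + 30] = 255  # pixel30 lands on row 1, column 2 of the image

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(values)

reader = csv.reader(io.StringIO(buffer.getvalue()))
next(reader)  # skip the header row
label, image = row_to_image(next(reader))
print(label, len(image), len(image[0]), image[1][2])
```

The flat index `i` of a pixel corresponds to row `i // 28` and column `i % 28`, which is why `pixel30` appears at `image[1][2]`.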

##
4 changes: 2 additions & 2 deletions docs/Github-QuickStart.md
@@ -1,7 +1,7 @@
# GitHub Getting-Started Guide

<a href=""> This image links to the bilibili video: (the text guide is below the video image)
<img src="../static/images/doc/ApacheCN-GitHub入门操作-Fork到PullRequests.png">
<a href="https://www.bilibili.com/video/av15705305/"> This image links to the bilibili video: (the text guide is below the video image)
<img src="../static/images/doc/ApacheCN-GitHub入门操作-Fork到PullRequests.png" >
</a>

> 1) Fork the apachecn/kaggle project
68 changes: 68 additions & 0 deletions src/python/GettingStarted/DigitRecognizer/dr_knn_pandas.py
@@ -0,0 +1,68 @@
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Created on 2017-08-03 14:04:32
@author: ApacheCN_xy
Description: DigitRecognizer for sklearn_knn
Tips: This KNN takes too long to fit the model, and the Kaggle score is 0.96800, below what I expected.
github: https://github.com/chenyyx/Kaggle
"""

import csv
import time

import pandas as pd
from numpy import ravel
from sklearn.neighbors import KNeighborsClassifier


# Load the data
def opencsv():  # read the CSV files with pandas
    data = pd.read_csv('input/DigitRecognition/train.csv')
    data1 = pd.read_csv('input/DigitRecognition/test.csv')
    train_data = data.values[0:, 1:]  # all training samples (every column after the label)
    train_label = data.values[0:, 0]  # the first column holds the labels
    test_data = data1.values[0:, 0:]  # all test samples
    return train_data, train_label, test_data


def saveResult(result, csvName):
    # Text mode with newline='' is what the csv module expects on Python 3
    with open(csvName, 'w', newline='') as myFile:
        myWriter = csv.writer(myFile)
        myWriter.writerow(["ImageId", "Label"])
        index = 0
        for i in result:
            tmp = []
            index = index + 1  # ImageId is 1-based
            tmp.append(index)
            # tmp.append(i)
            tmp.append(int(i))
            myWriter.writerow(tmp)


def knnClassify(trainData, trainLabel, testData):
    # default: k = 5; to set it yourself: KNeighborsClassifier(n_neighbors=10)
    knnClf = KNeighborsClassifier()
    knnClf.fit(trainData, ravel(trainLabel))
    testLabel = knnClf.predict(testData)
    saveResult(testLabel, 'output/DigitRecognizer/Result_sklearn_knn.csv')
    return testLabel


def dRecognition_knn():
    loadStartTime = time.time()
    trainData, trainLabel, testData = opencsv()
    # print("trainData==>", type(trainData), trainData.shape)
    # print("trainLabel==>", type(trainLabel), trainLabel.shape)
    # print("testData==>", type(testData), testData.shape)
    loadEndTime = time.time()
    print("load data finish")
    print('load data time used:%f' % (loadEndTime - loadStartTime))
    t = time.time()
    result = knnClassify(trainData, trainLabel, testData)
    print("finish!")
    k = time.time()
    print('classify time used:%f' % (k - t))


if __name__ == '__main__':
    dRecognition_knn()
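
The script above delegates classification to scikit-learn's `KNeighborsClassifier`. As a rough illustration of what a k-nearest-neighbors classifier does, here is a pure-Python sketch on toy 2-D points. The function `knn_predict`, the toy data, and `k=3` are hypothetical choices made for this example (the script itself uses the library default of 5 neighbors), and the brute-force search shown here is not a claim about how scikit-learn works internally. The sketch ends by building rows in the same `ImageId`/`Label` shape that `saveResult` writes.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Squared Euclidean distance is enough for ranking neighbors.
    distances = [
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (an assumption for illustration): two well-separated clusters.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_labels = [0, 0, 0, 1, 1, 1]
queries = [(0.05, 0.1), (5.0, 5.1)]

predictions = [knn_predict(train_points, train_labels, q) for q in queries]

# Same row shape as the submission file: 1-based ImageId, then the predicted label.
submission_rows = [["ImageId", "Label"]] + [
    [index, int(label)] for index, label in enumerate(predictions, start=1)
]
print(submission_rows)
```

Because every prediction compares the query against all training points, the cost grows with the size of the training set, which is consistent with the long running time noted in the script's docstring.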