
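Python Machine Learning and Deep Learning: A Summary

1. Python Environment Setup (Windows)

Development tool: PyCharm Community Edition (free)

Python environment: WinPython 3.5.2.3Qt5
This distribution bundles the main packages used for machine learning and deep learning:
numpy, scipy, matplotlib, pandas, scikit-learn, theano, keras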
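
To confirm that an installed distribution really provides the packages listed above, a quick check like the following may help. It is only a minimal sketch: the function name missing_packages is hypothetical, and the import names are assumptions (in particular, scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Import names for the packages listed above (assumed; note that
# scikit-learn is imported under the name "sklearn").
IMPORT_NAMES = ["numpy", "scipy", "matplotlib", "pandas", "sklearn", "theano", "keras"]


def missing_packages(names):
    """Return the names that cannot be located by the current interpreter.

    find_spec() only looks a top-level module up; it does not import it,
    so the check stays fast even for heavy libraries.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(IMPORT_NAMES)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All listed packages are available.")
```

Running the script with the interpreter that PyCharm is configured to use shows whether the IDE and the distribution point at the same environment.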

IPython notebook:

2. Sample code:

scikit-learn sample

keras sample

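3. Datasets

GeoHey public data

4. The Kaggle platform

Kaggle is a platform for data modeling and data analysis competitions. Companies and researchers can post data on it, and statisticians and data mining experts compete to produce the best models. This crowdsourcing approach relies on the fact that countless strategies can be applied to almost any predictive modeling problem, and that researchers cannot know at the outset which method will be most effective for a given problem. Kaggle aims to address this difficulty through crowdsourcing and, in doing so, to turn data science into a sport. (Wikipedia)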

5. Troubleshooting common problems

Approaching (Almost) Any Machine Learning Problem

 

Python: No module named xxx

OSX/Linux

Use $ sudo pip install requests if you have pip installed

On OSX you can also use sudo easy_install -U requests if you have easy_install installed.

Windows

Use > Path\easy_install.exe requests if you have a Windows machine, where easy_install can be found in your Python*\Scripts folder, if it was installed. (Note: Path\easy_install.exe is an example; mine is C:\Python32\Scripts\easy_install.exe.)

If you don’t have easy_install and are running on a Windows machine, you can get it here: http://www.lfd.uci.edu/~gohlke/pythonlibs/#distribute

If you want to add a library to a Windows machine manually, you can download the compressed library, uncompress it, and then place it into the Lib folder of your Python path.

From Source (Universal)

For any missing library, the source is usually available at https://pypi.python.org/pypi/. Then:

On Mac OS X and Windows, after downloading the source zip, uncompress it and, from the terminal/cmd, run python setup.py install from the uncompressed directory.

Source: http://stackoverflow.com/questions/17309288/importerror-no-module-named-requests
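
A frequent cause of this error is that pip installed the package for a different interpreter than the one running the script. The sketch below is one possible way to guard against that, not part of the quoted answer: it tries the import and, if it fails, builds a pip command tied to the current interpreter via sys.executable. The function name pip_install_command and its parameters are hypothetical.

```python
import importlib
import sys


def pip_install_command(module_name, package_name=None):
    """Return a pip command for a missing module, or None if it imports fine.

    package_name covers the case where the name on PyPI differs from the
    import name; it defaults to module_name.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        # "python -m pip" installs into the interpreter that runs this
        # script, avoiding a mismatch between pip and python on PATH.
        return [sys.executable, "-m", "pip", "install", package_name or module_name]
    return None


if __name__ == "__main__":
    command = pip_install_command("requests")
    if command is None:
        print("requests is already importable")
    else:
        print("Try running:", " ".join(command))
```

The command is returned as a list so it can be passed directly to subprocess.check_call if automatic installation is wanted.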

Python Resource

The Python language:

1. Python Language Essentials, from "Python for Data Analysis"

2. Liao Xuefeng's Python tutorial

Functional Programming in Python: From Getting Started to Going Off the Deep End

Python tools and environments:

IPython Notebook: A New Era of Interactive Computing

Anaconda Scientific Python Distribution

WinPython

Libraries and frameworks:

Beautiful Soup 4.2.0 documentation

10 Minutes to pandas

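Data analysis:

Learning Python Data Analysis from Scratch

Books:

Beginning Python (Python基础教程)

Python for Data Analysis (利用Python进行数据分析)

Programming Collective Intelligence (集体智慧编程) (Douban)

This is an excellent introductory book. All of its example source code is written in Python, and it helps you quickly become familiar with the various Python computing libraries.

Statistical Learning Methods (统计学习方法) (Douban)

This book explains, in an accessible way, all the mathematical foundations relevant to machine learning. It is solid content from cover to cover, with no filler, and is well worth reading.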

Some Online Resources:

http://docs.python.org/tut/tut.html – Beginners

http://diveintopython3.ep.io/ – Intermediate

http://www.pythonchallenge.com/ – Expert Skills

http://docs.python.org/ – collection of all knowledge

Some more:

A Byte of Python.

Python 2.5 Quick Reference

Python Side bar

A Nice blog for beginners

Think Python: An Introduction to Software Design

Python Resource

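A Curated Collection of Excellent Python Learning Resources (highly recommended)

11 Resources for Learning Python Programming

Hidden features of Python

How to Learn Python Efficiently and Thoroughly in the Shortest Time

Reference: http://stackoverflow.com/questions/70577/best-online-resource-to-learn-python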

Open Sourcing a Python Project the Right Way

Most Python developers have written at least one tool, script, library or framework that others would find useful. My goal in this article is to make the process of open-sourcing existing Python code as clear and painless as possible. And I don’t simply mean, “create GitHub repo, git push, post on Reddit, and call it a day.” By the end of this article, you’ll be able to take an existing code base and transform it into an open source project that encourages both use and contribution.

While every project is different, there are some parts of the process of open-sourcing existing code that are common to all Python projects. In the vein of another popular series I’ve written, “Starting a Django Project The Right Way,” I’ll outline the steps I’ve found to be necessary when open-sourcing a Python project.

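Data Scraping and Data Analysis Case Studies

Data scraping:

How to get started with Python web crawlers?

Column: Introductory Python Crawler Tutorial

Python Crawler Learning Tutorial Series

Simulated logins for some well-known websites, to make it easier to crawl sites that require login

Python crawler: simulating a Zhihu login and scraping Lagou job listings

A Lianjia crawler written in Python: code + data

Data scraping tools and frameworks: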

scrapy

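Hawk: [Major open-source release] Hawk, a data scraping tool: a concise tutorial

pyspider

Downloading an entire website with Wget
you-get (the various release builds are listed under Releases · soimort/you-get · GitHub).

When I first started writing crawlers I used urllib2; then I discovered requests and was blown away.
At first I parsed pages with re; then I discovered BeautifulSoup, and parsing pages could not have been easier.
Later, reading other people's crawlers, I discovered scrapy and was amazed by the framework.
After that I ran into sites with CAPTCHAs, which is how I learned about PIL. Later still I learned about opencv and pybrain. When I got a crawler to recognize CAPTCHAs with an artificial neural network, I was so excited that I sat beside the crawler and watched it finish crawling the whole site.
Then I learned about threading, and then celery. (Zhihu)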
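
The parsing step in that progression can be illustrated with a small link extractor. This is only a sketch under stated assumptions: it uses the standard library's html.parser in place of BeautifulSoup so that it runs without third-party packages, and it parses an inline sample string rather than a fetched page. The names LinkExtractor, extract_links, and SAMPLE_HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in html, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Stand-in for a downloaded page (made-up content).
SAMPLE_HTML = """
<html><body>
  <a href="/jobs/1">Job 1</a>
  <a href="https://example.com/jobs/2">Job 2</a>
  <a name="anchor-without-href">skip</a>
</body></html>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML, "https://example.com/"):
        print(link)
```

In a real crawler the HTML would come from a fetch, for example requests.get(url).text, and the extracted links would feed the queue of pages to visit next.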

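CAPTCHA Recognition with Python

Data analysis case studies:

Which websites yield valuable data when scraped with a crawler?

2016 Douban Movies Visualization Analysis Report

Analyzing the distribution of bra cup sizes in China from a million JD.com records | 1.5 million records (password: guvy)

Detecting Bitcoin transactions with Python: a network visualization analysis

How can the rent-to-price ratio be used to judge a property's value or a bubble?
What interesting data mining/analysis projects have you done with Python?

Zhihu question crawler

Zhihu data API (node.js)

Lagou job listing scraper

Ganji rental listings

Lianjia crawler (data: Lianjia data)

CAPTCHA Recognition with Python

Personal blog:

沙漠之鹰 (Desert Eagle)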