Do you know how data scientists use Python and, if it requires data, where I can get sample data to practice with? I'm currently learning the language and plan on applying for data scientist jobs, so it would be helpful to know whether there are specific types of tasks where I'd use Python. If it's relevant, I use C# regularly.
migrated from stackoverflow.com Feb 11 '12 at 14:53
Any type of data parsing is going to be one of two types: ASCII or binary. I would practice and learn how to 'get' data from multiple sources. This could be from files, from network connections, from the stdout of subprocesses, a serial port, etc. For ASCII data, you would most probably be using string methods for splitting and storing the data. Binary data is where you'd really want to practice. You have to worry about the endianness of the data, the specific bit lengths of different datatypes on different platforms, following the specification for the format of the data, and actually parsing the raw data (get familiar with 'struct' and packing and unpacking data). As for examples, search for tutorials on each piece I've mentioned. For most items you can write a temporary data generator to act as the other side of a process (such as a TCP server, a subprocess writing to stdout, or something that generates files). Good luck.
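To make the 'struct' advice concrete, here is a minimal sketch of packing and unpacking fixed-size binary records with an explicit byte order. The record layout (a 16-bit id, a 32-bit timestamp, a 32-bit float) and the helper names are assumptions made up for illustration, not a real file format.

```python
import struct

# Hypothetical record layout (an assumption for illustration):
#   uint16 sensor id, uint32 timestamp, float32 reading
# "<" means little-endian and ">" means big-endian. Both prefixes also
# select standard sizes with no padding, so the layout is the same on
# every platform.
LITTLE = struct.Struct("<HIf")
BIG = struct.Struct(">HIf")


def make_records(fmt, rows):
    """Pack rows into one bytes blob, standing in for a file or socket."""
    return b"".join(fmt.pack(*row) for row in rows)


def parse_records(fmt, blob):
    """Unpack every fixed-size record found in blob."""
    return list(fmt.iter_unpack(blob))


rows = [(1, 1000, 0.5), (2, 1001, 1.25)]
blob = make_records(LITTLE, rows)   # the "generator" side
parsed = parse_records(LITTLE, blob)  # the "parser" side
print(LITTLE.size, parsed)

# Reading the same bytes with the wrong byte order silently gives
# different numbers, which is why endianness has to be known up front.
wrong = parse_records(BIG, blob)
print(wrong[0][0])
```

Writing the generator and the parser side by side like this gives you practice data without needing a real device or server; the same blob could just as well be written to a file or sent over a socket.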
I'm not the most experienced in that field, but I use Python for data mining and for analyzing and modifying experimental data. Why Python for that?
I wrote an article in French on "Why Python?" if you can read it. If you want some data, you can find some on sites like the UCI Machine Learning Repository.
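As a rough idea of what working with such a data set looks like, here is a minimal sketch that reads comma-separated rows with numeric feature columns and a class label in the last column. That layout is an assumption, and the rows below are invented for illustration; they are not taken from any real data set.

```python
import csv
import io

# Invented rows for illustration only (not real data).
# Assumed layout: numeric features first, class label last.
SAMPLE = """5.1,3.5,1.4,0.2,class-a
4.9,3.0,1.4,0.2,class-a
6.3,3.3,6.0,2.5,class-b
"""


def load_rows(text_file):
    """Split each CSV row into a list of floats and a label string."""
    features, labels = [], []
    for row in csv.reader(text_file):
        if not row:  # skip blank lines
            continue
        features.append([float(value) for value in row[:-1]])
        labels.append(row[-1])
    return features, labels


# io.StringIO stands in for open("some_file.csv", newline="")
features, labels = load_rows(io.StringIO(SAMPLE))

# Mean of each feature column, computed with plain Python
column_means = [sum(col) / len(col) for col in zip(*features)]
print(labels)
print(column_means)
```

With a downloaded file you would pass an open file object instead of the `io.StringIO` wrapper; the parsing function stays the same.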
Why don't you try some scientific computation libraries like these:
These are tools for scientific computation in Python.
This video on YouTube explains that in detail: Python for Data Science
I'd recommend this book: Machine Learning: An Algorithmic Perspective. It uses data sets from the UCI Machine Learning Repository. Even if you don't precisely want to do machine learning but are more interested in statistics, there are a bunch of data sets there that might interest you. The main library for doing science with Python is NumPy. Enthought has a great Python distribution which includes NumPy, matplotlib, and many other libraries to get you started. Also check out NumPy 1.5 Beginner's Guide. It's a great introduction and will get you up and running doing simple things quickly. The author's blog also has some good introductory material. Hope that helps. An aside (noting you're a C# programmer as well): I still haven't been able to get NumPy to work successfully in IronPython, so I have to use normal CPython. If you happen to get that figured out, please let me know! Having NumPy within .NET would be really nice.
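To give a feel for the kind of simple things NumPy makes quick, here is a minimal sketch, assuming NumPy is installed under CPython. The numbers are made up for illustration.

```python
import numpy as np

# Made-up measurements: 3 samples (rows) by 2 features (columns)
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 9.0]])

# Per-column statistics, no explicit loops needed
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)

# Broadcasting subtracts the column means from every row at once
centered = data - col_mean

print(data.shape)
print(col_mean)
print(col_std)
print(centered)
```

The `axis=0` argument means "collapse the rows", so each result has one entry per column.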