Member


Total likes received: 98 | Pubby Cash: 932


My Articles: 44

Shape of a Column in Pandas Data Frame

Categories: Tech | Pubby Cash Received:  10

Pandas is a must-have Python module for data manipulation and analysis. It can read a CSV file into Python as a pandas DataFrame. Let's talk about the shape of a DataFrame column, which confuses a lot of beginners. Say we have a CSV file named "data.csv" like this:

Year Counts
2008 100
2009 200
2010 300
2011 400
2012 500
After reading the csv file with pandas by executing the following code:
import pandas as pd
df = pd.read_csv('data.csv')
df.set_index('Year', inplace=True)
y = df['Counts']
print(y, type(y), y.shape)


You will get a variable y that prints as two columns: Year and Counts. It is still a single column, though. Year is the index (the row labels), not a second column of data. The type of y is pandas.core.series.Series and its shape is (5,). Note that this is one dimension, not two: a shape tuple has one entry per dimension, and the index is not counted. This is what confuses people most. How can an object that prints as two columns have a one-dimensional shape? And what should we do if we want only the 'Counts' values?
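A minimal sketch to check the shape, type, and index of the column. It reads an inline copy of the table instead of data.csv so that it runs on its own; the comma separator is an assumption, since the table above is shown without delimiters.

```python
import io
import pandas as pd

# Inline copy of the article's data.csv (comma-separated by assumption).
csv_text = """Year,Counts
2008,100
2009,200
2010,300
2011,400
2012,500
"""

df = pd.read_csv(io.StringIO(csv_text))
df.set_index('Year', inplace=True)
y = df['Counts']

print(type(y))           # the pandas Series class
print(y.shape)           # a one-entry tuple: one dimension
print(y.ndim)            # number of dimensions
print(y.index.name)      # the index keeps the name 'Year'
print(y.index.tolist())  # the row labels, stored separately from the values
```

The index travels with the Series as row labels, which is why it shows up in the printout without adding a dimension.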

Let's execute the following code to see the results:
y = df['Counts'].tolist()
print(y, type(y))


Now we have a list containing only the 'Counts' values. Its type is list. Because a plain Python list has no shape attribute, y.shape would raise an AttributeError; use len(y) instead.

Then, let's execute the following code to see the results:
y = df['Counts'].to_numpy()
print(y, type(y),y.shape)


The output looks like a list of only the 'Counts' values, but it is not a true list, because it has a shape attribute. It is a NumPy array with a shape of (5,), meaning one dimension. Note that although we did not import numpy here, pandas is built on NumPy, so to_numpy() returns a NumPy array.
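To compare the two conversions side by side, here is a small sketch. It builds the same Series directly rather than reading data.csv, and the variable names (y_series, y_list, y_array) are only for illustration.

```python
import numpy as np
import pandas as pd

# Same Year/Counts data as above, built in memory.
y_series = pd.Series(
    [100, 200, 300, 400, 500],
    index=pd.Index([2008, 2009, 2010, 2011, 2012], name='Year'),
    name='Counts',
)

y_list = y_series.tolist()     # plain Python list
y_array = y_series.to_numpy()  # NumPy array

print(type(y_list), len(y_list))        # a list has a length...
print(hasattr(y_list, 'shape'))         # ...but no shape attribute
print(type(y_array), y_array.shape, y_array.ndim)
print(isinstance(y_array, np.ndarray))  # to_numpy() returns a NumPy array
```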

This one-dimensional shape can cause problems if you want to apply machine learning algorithms to the data with the scikit-learn module, because its estimators expect the input features as a two-dimensional array (rows are samples, columns are features). The question becomes: how do we change a one-dimensional array into a two-dimensional one? The reshape(-1,1) method is neat here. Execute the following code and see the results:
y = df['Counts'].to_numpy().reshape(-1,1)
print(y,type(y),y.shape)


This time, y has a shape of (5,1). The printed values are the same as before, only now each one sits in its own row. It is now a two-dimensional NumPy array with five rows and one column, which machine learning algorithms can accept!
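The sketch below shows the reshape step next to one alternative, selecting the column with double brackets, which is a different route than the one used above but gives the same (5, 1) array. The names flat, column, and also_column are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Year': [2008, 2009, 2010, 2011, 2012],
     'Counts': [100, 200, 300, 400, 500]}
).set_index('Year')

flat = df['Counts'].to_numpy()           # 1-D array, shape (5,)
column = flat.reshape(-1, 1)             # 2-D array; -1 lets NumPy infer the row count
also_column = df[['Counts']].to_numpy()  # double brackets select a DataFrame, already 2-D
back_to_flat = column.ravel()            # flatten back to 1-D when needed

print(flat.shape, flat.ndim)
print(column.shape, column.ndim)
print(column)  # each value is printed in its own row
```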

Although you don't have to go through steps like these to plot the results with matplotlib, doing so gives you more flexibility and more control over your x and y axes, which is a good thing. Here is an example I did with my data:

...  Read more

Matplotlib - A Fantastic Plotting Module in Python

Categories: Tech | Pubby Cash Received:  20

I have been using a variety of plotting software for research and engineering drawing purposes, including Excel, SigmaPlot, Origin, Minitab, MATLAB, and AutoCAD. While studying matplotlib recently, I found it is time to switch to this module with Python. It is so powerful that all plot details can be customized. For my applications, I need a plotting tool with the following features:
(1) Linear regression. Matplotlib itself only draws the plot; the fit comes from the separate scikit-learn module, which is more than enough for the job with its machine learning algorithms. For linear regression, use from sklearn.linear_model import LinearRegression
(2) Dual axis
(3) Multiple plots in one figure; the canvas size of the figure can be defined, and so can the size of each plot in it.
(4) Axis break
(5) Customized error bars
(6) Easy formatting of fonts, legend, axis scales, ticks and labels. With coding, no hand-formatting is needed like I did in Excel before.
(7) Easy labeling with text, annotating with arrows, and drawing additional horizontal and vertical lines
(8) Inclusion of superscripts, subscripts, and special characters such as the Greek letter mu in axis labels
(9) Easy export as a high-DPI figure while the file size stays small. This is perfect for journal submissions.
The following plot is a sample I just created to showcase how well it satisfies the needs listed above. ...  Read more
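A few of the listed features (linear fit, dual axis, error bars, annotation, a mu and a superscript in a label, high-DPI export) can be sketched in one short script. This is not the plot from the article. The Year/Counts values are borrowed from the data.csv table earlier on this page, the error bars and the second series are made-up illustration values, and np.polyfit stands in for scikit-learn's LinearRegression to keep the dependencies small.

```python
import io
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt

years = np.array([2008, 2009, 2010, 2011, 2012])
counts = np.array([100, 200, 300, 400, 500])
errors = np.array([10, 15, 12, 20, 18])     # hypothetical error bars
conc = np.array([1.2, 1.9, 3.1, 3.8, 5.2])  # hypothetical second series

# Straight-line fit (stand-in for sklearn's LinearRegression).
slope, intercept = np.polyfit(years, counts, 1)

fig, ax_left = plt.subplots(figsize=(6, 4))  # canvas size in inches
ax_left.errorbar(years, counts, yerr=errors, fmt='o', capsize=3, label='Counts')
ax_left.plot(years, slope * years + intercept, '-', label='Linear fit')
ax_left.axhline(300, linestyle='--', linewidth=0.8)  # extra horizontal line
ax_left.annotate('midpoint', xy=(2010, 300), xytext=(2008.2, 420),
                 arrowprops=dict(arrowstyle='->'))
ax_left.set_xlabel('Year')
ax_left.set_ylabel('Counts')
ax_left.legend(loc='upper left')

# Second y-axis sharing the same x-axis.
ax_right = ax_left.twinx()
ax_right.plot(years, conc, 's--', color='tab:red')
ax_right.set_ylabel(r'Concentration ($\mu$g L$^{-1}$)')

# Export at high DPI; an in-memory buffer is used here instead of a file.
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=300, bbox_inches='tight')
png_bytes = buffer.getvalue()
plt.close(fig)
```

To write a file instead, pass a filename such as 'figure.png' to savefig in place of the buffer.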

A Kivy Installation Problem on Python 3.8

Categories: Tech | Pubby Cash Received:  20

Kivy is a Python module for developing apps on multiple platforms. To install it, go to kivy.org and find the installation instructions for your Windows or Mac system. However, if you are using Python 3.8, Kivy might not install successfully, because it does not support this version right now. You will see a full screen of red warning text with a final error message like "ERROR: Command errored out with exit status 1". If this happens, don't worry. Copy and paste the following Kivy installation command into the command line and install it again.
pip install kivy[base] kivy_examples --pre --extra-index-url https://kivy.org/downloads/simple/
...  Read more

Data Science - A Promising Discipline in the 21st Century

Categories: Tech | Pubby Cash Received:  10

Data science is not only a well-paid career nowadays; it is also becoming part of the architecture of many other disciplines. As an engineer in the traditional civil and environmental engineering disciplines, I recently started to learn data science because my job needs it as well. In many engineering applications, decisions need to be made not by humans any more, but by machines. Think about a simple application: should a disinfectant dosing pump be turned on or off for water disinfection at a water treatment plant? During this COVID-19 crisis, the management at the plant might say, let's keep the pump running so that there will be enough disinfectant to kill the virus. However, too much disinfectant can also harm human health, because it increases the amount of carcinogenic disinfection byproducts. This is the kind of decision that grows complicated and needs to be backed by data analysis. Machine learning is a branch of data science that uses techniques such as regression, classification, and clustering to predict the future, or the consequences: in this case, what will happen if the pump is turned on and what will happen if it is turned off. A wise decision can therefore be made, one that protects public health and even saves people's lives. I could give countless other examples that depend on correct decisions made by machines from the processing of tons of environmental data. The Tesla Autopilot and the recent success of the SpaceX rocket launch are among them. Python, a coding tool, is indispensable for learning and applying data science, and actually falls within the discipline of data science itself. The following image shows the relationship of data science with other closely related disciplines. You will know where to direct your hard work if you want a well-paid job and a good career!
...  Read more

Jupyter Notebook - A Must-Have Editor for Those Who are Both Programmers and Educators

Categories: Tech | Pubby Cash Received:  10

In the article titled "Sublime Text vs. Pycharm, Which One is Better?", I mentioned Jupyter Notebook as one of the well-known Python IDEs. I do see a number of YouTubers use it for demonstration and education purposes, and I think it is cool and practical to use for learning and teaching. However, I don't think it can replace Sublime Text for developing a set of complicated, interconnected code files. The reason Jupyter Notebook is superior for education is that it offers multiple interactive code blocks that can be easily added or deleted. You can run a single code block or several code blocks and see the results immediately, without executing a whole .py file. The shortcuts are just like those of other editors: ctrl z for undo, ctrl / for comment, and the like, and you can edit them if you want.
Of course, Jupyter Notebook also has downsides. I had counted two, and now only have one left. The first was the default white background, as I prefer a dark theme. It is no longer a downside, because I've figured out how to change to a dark theme manually. The other downside is that it runs as a local server started from a terminal window, much like a web application built on Flask, so it occupies two things at once: a browser window and a terminal window. On some old computers, it will not run as fast as Sublime Text, because Sublime Text is an editor that stands on its own. It does not need the command line unless you want to test the code you've written. However, as an educator and researcher, I would use Jupyter Notebook as well to produce reproducible code snippets and save them as .ipynb files for future reference. With that said, here are some instructions for installing Jupyter Notebook and applying a dark theme as I did.
Open a terminal window and enter:
pip install notebook
pip install jupyterthemes
jupyter notebook
In a Jupyter Notebook code block, enter:
!jt -t monokai
Note: "monokai" is the name of the dark theme I'm using. Restart Jupyter Notebook from the command line, and the dark theme is ready to go. ...  Read more
