When doing data projects, `Pandas` is a tool that I will always reach for first. Recently, I was working with Capital Bikeshare's public API, and along with my usual method of cleaning up data, I made a discovery that makes certain jobs easier.

First, my usual way of gathering data from a JSON feed.

```
import pandas as pd

# Seed each Series from the first record, then fill in the rest.
stationId = pd.Series(data[0]['station_id'])
bikesAv = pd.Series(data[0]['num_bikes_available'])
bikesDis = pd.Series(data[0]['num_bikes_disabled'])
docksOpen = pd.Series(data[0]['num_docks_available'])
docksDis = pd.Series(data[0]['num_docks_disabled'])
for i in range(1, len(data)):
    stationId[i] = data[i]['station_id']
    bikesAv[i] = data[i]['num_bikes_available']
    bikesDis[i] = data[i]['num_bikes_disabled']
    docksOpen[i] = data[i]['num_docks_available']
    docksDis[i] = data[i]['num_docks_disabled']
```

I like to create Pandas Series first instead of a dataframe from the start because I believe it makes the data collection more modular, making it easier later down the line to make more dataframes with different conditions attached to them. As seen above, I create the series from the first record, and then I iterate over the rest of the JSON feed to grab the remaining data.
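As a self-contained sketch of this pattern, here is the same loop run against a hypothetical two-station sample (the `data` list below is made up, standing in for the real station status feed):

```python
import pandas as pd

# Hypothetical sample records, standing in for the live JSON feed.
data = [
    {'station_id': '1', 'num_bikes_available': 7, 'num_bikes_disabled': 1,
     'num_docks_available': 10, 'num_docks_disabled': 0},
    {'station_id': '2', 'num_bikes_available': 3, 'num_bikes_disabled': 0,
     'num_docks_available': 12, 'num_docks_disabled': 2},
]

# Seed each Series from the first record, then extend it record by record.
stationId = pd.Series(data[0]['station_id'])
bikesAv = pd.Series(data[0]['num_bikes_available'])
for i in range(1, len(data)):
    stationId[i] = data[i]['station_id']
    bikesAv[i] = data[i]['num_bikes_available']

print(list(bikesAv))  # [7, 3]
```

Assigning to a new index label on a Series (here `stationId[i] = ...`) enlarges it, which is what lets the loop grow each series one record at a time.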

Next, I combine the series into a dataframe. This step is simple, but I will show it anyway.

```
docksDf = pd.DataFrame({
    "Station ID": stationId.astype('int32'),
    "Bikes Available": bikesAv.astype('int32'),
    "Bikes Disabled": bikesDis.astype('int32'),
    "Docks Available": docksOpen.astype('int32'),
    "Docks Disabled": docksDis.astype('int32')
})
```

The reason I have `astype('int32')` there is to standardize the data, making sure every column in the Pandas dataframe carries a consistent integer dtype rather than whatever mixed dtype the JSON feed produced. An inconsistent dtype can sometimes be an issue, so `astype('int32')` solves it nicely.
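As a quick illustration with made-up values (not from the feed), numbers parsed out of JSON often arrive as strings, and `astype` pins them down to a single numeric dtype:

```python
import pandas as pd

# Values pulled from JSON often arrive as strings (object dtype).
raw = pd.Series(['5', '12', '0'])
print(raw.dtype)  # object

# astype standardizes the column to one numeric dtype.
clean = raw.astype('int32')
print(clean.dtype)  # int32
print(clean.sum())  # 17
```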

Belatedly, I discovered a nice trick for using NumPy on dataframes. I tried to do the usual `a < b < c` style comparison on a pandas dataframe with `where()`, but I got an error. After some googling, I discovered that `np.logical_and` lets one make these kinds of logic statements when managing dataframes. As seen below, I was able to do the comparison I wanted with `np.logical_and`.

```
import numpy as np

AVDSDf = pd.DataFrame({
    "Below 5 Available": docksDf["Bikes Available"].where(docksDf["Bikes Available"] < 5),
    "Between 5 and 10 Available": docksDf["Bikes Available"].where(
        np.logical_and(docksDf["Bikes Available"] >= 5, docksDf["Bikes Available"] < 10)),
    "Between 10 and 20 Available": docksDf["Bikes Available"].where(
        np.logical_and(docksDf["Bikes Available"] >= 10, docksDf["Bikes Available"] < 20)),
    "Above 20 Available": docksDf["Bikes Available"].where(docksDf["Bikes Available"] >= 20),
    "Below 5 Disabled": docksDf["Bikes Disabled"].where(docksDf["Bikes Disabled"] < 5),
    "Between 5 and 10 Disabled": docksDf["Bikes Disabled"].where(
        np.logical_and(docksDf["Bikes Disabled"] >= 5, docksDf["Bikes Disabled"] < 10)),
    "Above 10 Disabled": docksDf["Bikes Disabled"].where(docksDf["Bikes Disabled"] >= 10)
})
```
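On a toy Series (values made up for illustration), the failure and the fix look like this. The chained comparison raises because Python's `and` tries to reduce a whole boolean Series to a single `True`/`False`, which pandas refuses to do; `np.logical_and` combines the two boolean Series element-wise instead:

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 7, 12, 25])

# The chained form raises: pandas cannot collapse a boolean Series to one bool.
try:
    mask = 5 <= s < 10
except ValueError as e:
    print("chained comparison failed:", e)

# np.logical_and evaluates both comparisons element-wise.
mask = np.logical_and(s >= 5, s < 10)
print(s.where(mask).dropna().tolist())  # [7.0]
```

`s & mask`-style combining with the `&` operator (with parentheses around each comparison) behaves the same way; `np.logical_and` just makes the intent explicit.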

The virtue of this trick is that, while simple, it opens up more ways to look at the data. For example, just by adding `"Disabled to Available Ratio": (docksDf["Bikes Disabled"] / docksDf["Bikes Available"]).astype('float')`, I am able to find the ratio of disabled to available bikes. This is convenient because this data point is not given directly, but with some simple work in Pandas it can be revealed. Doing this, I found that at some point on Sunday, around `10%` of all Capital Bikeshare bikes in D.C. were disabled.
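A minimal sketch of the derived ratio column, using toy numbers rather than real feed data:

```python
import pandas as pd

# Toy station counts (not real feed data).
docksDf = pd.DataFrame({
    "Bikes Available": [10, 20, 5],
    "Bikes Disabled": [1, 4, 0],
})

# Per-station ratio of disabled to available bikes.
ratio = (docksDf["Bikes Disabled"] / docksDf["Bikes Available"]).astype('float')
print(ratio.tolist())  # [0.1, 0.2, 0.0]

# A fleet-wide disabled share would divide the column totals instead.
share = docksDf["Bikes Disabled"].sum() / docksDf["Bikes Available"].sum()
print(round(share, 3))
```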

While more complicated munging is possible with lambdas and other tools in Python, the discovery of `np.logical_and` makes for a very convenient tool when doing some data exploration.