This is all taken from sentdex's excellent TensorFlow tutorial series. Check it all out here:
https://www.youtube.com/playlist?list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN
In this post I'll briefly note how to import your own data into a Jupyter notebook. Venv setup is the same as in the previous post.
Then pip install numpy and pip install opencv-python (to use cv2) if you're going to be working with images. The snippets here also use matplotlib (plt) to display images, so install that too if it's missing.
Set up a data directory containing one folder per category of training data. Here we have two categories, Dog and Cat images, and we're just checking that the paths are connected correctly by displaying a single image:
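import os
import cv2
import matplotlib.pyplot as plt
import numpy as np  # used further down when building x

DATADIR = "C:/Users/Greg Sukochev/Desktop/PetImages"
CATEGORIES = ["Dog", "Cat"]
for category in CATEGORIES:
    path = os.path.join(DATADIR, category)  # path to cats or dogs dir
    for img in os.listdir(path):
        img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
        plt.imshow(img_array, cmap="gray")
        plt.show()
        break  # one image is enough to check the path
    break  # and one category is enough too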
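Before loading any images it can help to confirm that each category folder exists and isn't empty. Here's a minimal sketch of that check, assuming the same layout as above (one sub-folder per category). count_images is a hypothetical helper of mine, not something from the tutorial, and the demo runs on a throwaway temp directory rather than the real PetImages folder:

```python
import os
import tempfile

def count_images(datadir, categories):
    """Return {category: number of entries in its folder} (hypothetical helper)."""
    counts = {}
    for category in categories:
        path = os.path.join(datadir, category)
        # A missing folder is reported as 0 rather than raising
        counts[category] = len(os.listdir(path)) if os.path.isdir(path) else 0
    return counts

# Demo on a temporary directory standing in for DATADIR
with tempfile.TemporaryDirectory() as demo_dir:
    for category, n_files in (("Dog", 3), ("Cat", 2)):
        os.makedirs(os.path.join(demo_dir, category))
        for i in range(n_files):
            # Empty placeholder files; only the count matters here
            open(os.path.join(demo_dir, category, f"{i}.jpg"), "wb").close()
    demo_counts = count_images(demo_dir, ["Dog", "Cat", "Bird"])

print(demo_counts)
```

A category that comes back as 0 (like the made-up "Bird" here) usually means a typo in the path or the category name.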
Since the images are all different sizes we need to standardize them and have a look at the result:
IMG_SIZE = 50
new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
plt.imshow(new_array, cmap = "gray")
plt.show()
Then we can create our training data. Any image that fails to load or resize is simply skipped here (in practice you should probably delete the broken files):
training_data = []
def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)  # path to cats or dogs dir
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                training_data.append([new_array, class_num])
            except Exception:
                pass
create_training_data()
class_num assigns a numeric label to each image: the index of its category in CATEGORIES, so 0 for Dog and 1 for Cat in this case.
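To make the labelling explicit, here's a small sketch of the same mapping built as a dictionary, along with its reverse for turning a number back into a name later. The names label_of and name_of are my own, purely for illustration:

```python
CATEGORIES = ["Dog", "Cat"]

# Same mapping that CATEGORIES.index(category) produces, built once up front
label_of = {name: idx for idx, name in enumerate(CATEGORIES)}
# The reverse lookup, handy for reading a predicted label as a category name
name_of = {idx: name for name, idx in label_of.items()}

print(label_of)
print(name_of[1])
```

Note the labels follow the order of the list, so swapping the entries in CATEGORIES would swap the labels too.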
We then shuffle our data, since at this point the list is all dogs followed by all cats (the data should already be balanced in terms of numbers, 50:50 dogs:cats):
import random
random.shuffle(training_data)
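A quick way to check both claims (the data is balanced, and the shuffle mixed the labels) is to count labels after shuffling. This is only a sketch on synthetic stand-in data; demo_data and label_counts are names I've made up, and the seed is fixed just so the demo is repeatable:

```python
import random
from collections import Counter

# Synthetic stand-in for training_data: [features, label] pairs
demo_data = [[None, 0] for _ in range(5)] + [[None, 1] for _ in range(5)]

random.seed(0)  # fixed seed only so the demo is repeatable
random.shuffle(demo_data)

# Shuffling changes the order, never the number of samples per label
label_counts = Counter(label for _, label in demo_data)
print(label_counts)
print([label for _, label in demo_data])
```

On the real data you would run the same Counter line over training_data.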
Now we pack our data into the variables we're going to use in our network. x is our image (just an array of numbers) and y is the label for that image (in this case a 0 or 1). x has to be a numpy array in order to work with keras. We also reshape it: -1 tells numpy to work out that dimension itself (here, the number of images), IMG_SIZE by IMG_SIZE is the shape of each image, and the final 1 is the channel count because it's grayscale (it would be 3 for colour).
x = []
y = []
for features, label in training_data:
    x.append(features)
    y.append(label)
x = np.array(x).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
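To see what the reshape actually does, here's a minimal sketch using three blank fake images instead of real pet photos. The names demo_images, demo_labels, x_demo and y_demo are assumptions for the demo only:

```python
import numpy as np

IMG_SIZE = 50
# Three fake grayscale "images" standing in for the real features
demo_images = [np.zeros((IMG_SIZE, IMG_SIZE), dtype=np.uint8) for _ in range(3)]
demo_labels = [0, 1, 0]

# -1 lets numpy infer the first dimension (3 images here)
x_demo = np.array(demo_images).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
y_demo = np.array(demo_labels)

print(x_demo.shape)  # (3, 50, 50, 1)
print(y_demo.shape)  # (3,)
```

The first dimension of x and the length of y should always match, one label per image.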
Finally, we use pickle to save this data, because we don't want to regenerate it every time, particularly once we start tweaking the model:
import pickle
with open("x.pickle", "wb") as pickle_out:
    pickle.dump(x, pickle_out)
with open("y.pickle", "wb") as pickle_out:
    pickle.dump(y, pickle_out)
And load it for use:
with open("x.pickle", "rb") as pickle_in:
    x = pickle.load(pickle_in)
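If you end up saving and loading a lot, the two steps can be wrapped in small helpers. This is just a sketch: save_pickle and load_pickle are hypothetical names of mine, and the demo writes to a temp directory so it doesn't touch the real x.pickle and y.pickle files:

```python
import os
import pickle
import tempfile

def save_pickle(obj, filename):
    """Write obj to filename; the with-block closes the file for us."""
    with open(filename, "wb") as f:
        pickle.dump(obj, f)

def load_pickle(filename):
    """Read back whatever object was pickled into filename."""
    with open(filename, "rb") as f:
        return pickle.load(f)

# Round trip on a throwaway file to confirm nothing is lost
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "y.pickle")
    labels = [0, 1, 1, 0]
    save_pickle(labels, target)
    restored = load_pickle(target)

print(restored == labels)
```

One thing worth remembering: only unpickle files you created yourself, since loading a pickle can run arbitrary code.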
In the next post I'll use this data in the actual neural network via:
x = np.asarray(pickle.load(open("x.pickle", "rb")))
y = np.asarray(pickle.load(open("y.pickle", "rb")))