FastAI Lesson 2
Recap
This lesson was all about getting a model deployed to production. It was an occasion for me to make sure I had Jupyter
working locally and to understand how that flow would work. I used Jeremy's technique of building an app in a notebook and exporting it with nbdev
directives, which is super neat! I also set myself up on Paperspace, which I'd highly endorse. Eight dollars well spent.
This lesson's clean notebook was basically a recap of lesson 1, with bears instead of cats and dogs or birds.
I deployed my model to a Hugging Face Space using Gradio. This is based on a model I trained last week for recognizing bean lesions. So if you happen to be a farmer wanting to evaluate the health of your beans… you know who to call. I skipped the step of using the API for this app. Gradio now has an API client that has to be installed with npm. Which… I'm not trying to deploy anything that needs a build step. I wish they hadn't done that.
I hit a few hiccups which were easily navigated with some googling. Mainly these were API changes in various libraries. Inevitable as things move along but bumps nonetheless.
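A minimal sketch of what the Gradio side of a Space like this can look like, not necessarily the exact app described above. It assumes `fastai` and `gradio` are installed and that the learner was exported to a file called `model.pkl` (a hypothetical name). `probs_to_labels` and `build_demo` are helper names made up for illustration.

```python
def probs_to_labels(vocab, probs):
    """Pair each class name with its probability, the dict shape gr.Label accepts."""
    return {str(label): float(p) for label, p in zip(vocab, probs)}


def build_demo(model_path="model.pkl"):  # hypothetical file name
    # Imported here so the helper above can be used without these packages.
    import gradio as gr
    from fastai.vision.all import load_learner

    learn = load_learner(model_path)  # loads the pickle written by learn.export()

    def classify(img):
        # predict returns (decoded label, label index, per-class probabilities)
        _pred, _idx, probs = learn.predict(img)
        return probs_to_labels(learn.dls.vocab, probs)

    # gr.Image() hands the function a numpy array by default.
    return gr.Interface(fn=classify, inputs=gr.Image(), outputs=gr.Label())
```

In a Space, `app.py` would end with `build_demo().launch()`.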
Quiz
- Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data. Not great at images that differ stylistically from the training data. The camera to recognize the bears might be mounted upside down.
- Where do text models currently have a major deficiency? They're plausible, but are they correct?!
- What are possible negative societal implications of text generation models? Spreading false information that's very convincing
- In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process? Human in the loop.
- What kind of tabular data is deep learning particularly good at? where the column data might be very diverse, e.g. high-cardinality categorical columns or columns containing natural language text
- What's a key downside of directly using a deep learning model for recommendation systems? The recommendations might not actually be helpful, e.g. they might recommend books you already have
- What are the steps of the Drivetrain Approach? objectives, levers, data, model (how the levers influence the objectives)
- How do the steps of the Drivetrain Approach map to a recommendation system? objective: drive more sales; levers: ranking of the recommendations; data: past choices of the user; model: two models that are contingent on seeing or not seeing the recommendation
- Create an image recognition model using data you curate, and deploy it on the web.
- What is DataLoaders? Generic class for getting data into a learner
- What four things do we need to tell fastai to create DataLoaders?
- input and output types
- how to get the items
- how to label the items
- how to create a validation set
- What does the splitter parameter to DataBlock do? telling fastai how to split off a validation set
- How do we ensure a random split always gives the same validation set? setting a seed fixes the randomness
- What letters are often used to signify the independent and dependent variables? x for independent, y for dependent
- What's the difference between the crop, pad, and squish resize approaches? When might you choose one over the others? crop = cut; pad = fill in with black; squish = distort to fit. The choice might be dictated by the particulars of the data
- What is data augmentation? Why is it needed? creating random variations that seem different (to a model) but don't actually change in meaning
- What is the difference between item_tfms and batch_tfms? item is single, batch is a group
- What is a confusion matrix? an n×n matrix that plots what the model predicted vs the correct labels. the center diagonal is the correct guesses. the others are off somehow. lets us see issues in data or model. uses the validation set.
- What does export save? a pkl file of the trained model
- What is it called when we use a model for getting predictions, instead of training? inference
- What are IPython widgets?
- When might you want to use CPU for deployment? When might GPU be better? cpu is more cost effective, easier to manage, more available, and perfectly suitable for running inference.
- What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC? network issues, latency, sensitive data, complexity of running infra
- What are three examples of problems that could occur when rolling out a bear warning system in practice? data is video vs pictures, bears are in novel positions or lighting, night pictures, speed of results
- What is "out-of-domain data"? data that differs a lot from what was seen in training
- What is "domain shift"? type of data changes over time so the initial model doesn't apply so much. the use of the model actually changes things, so the model has to be adjusted.
- What are the three steps in the deployment process?
- manual steps, human checks it all
- limited scope. time or geography limited. careful supervision
- gradual expansion. need reporting. need to consider what can go wrong.
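The recommendation-system answer in the quiz (two models, one for "saw the recommendation" and one for "did not see it") comes down to ranking by the difference between the two predictions. A minimal sketch in plain Python; the probabilities are made up and `rank_by_uplift` is a hypothetical helper.

```python
def rank_by_uplift(p_buy_if_shown, p_buy_if_not_shown):
    """Rank items by how much showing the recommendation changes the purchase probability."""
    uplift = {
        item: p_buy_if_shown[item] - p_buy_if_not_shown[item]
        for item in p_buy_if_shown
    }
    # Largest uplift first: these recommendations actually move the objective.
    return sorted(uplift, key=uplift.get, reverse=True)


# Illustrative numbers only. "sequel" would probably be bought anyway,
# so recommending it adds little even though its raw probability is highest.
shown = {"sequel": 0.90, "new_author": 0.40, "cookbook": 0.10}
not_shown = {"sequel": 0.85, "new_author": 0.10, "cookbook": 0.08}

ranking = rank_by_uplift(shown, not_shown)
print(ranking)
```

Ranking by the raw "shown" probability would put the sequel first, which is the "recommends books you already have" problem in miniature.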
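The four things fastai needs to create `DataLoaders`, the seeded `splitter`, and the `item_tfms` / `batch_tfms` distinction from the quiz all show up in a single `DataBlock`. A sketch along the lines of the lesson's bear example, assuming `fastai` is installed and that the images sit in one folder per label. `build_bear_dataloaders` and its `path` argument are hypothetical names.

```python
def build_bear_dataloaders(path):
    # Imported inside the function so this file loads even without fastai installed.
    from fastai.vision.all import (
        DataBlock, ImageBlock, CategoryBlock, RandomSplitter, Resize,
        aug_transforms, get_image_files, parent_label,
    )

    bears = DataBlock(
        blocks=(ImageBlock, CategoryBlock),  # 1. input and output types
        get_items=get_image_files,           # 2. how to get the items
        get_y=parent_label,                  # 3. how to label them (parent folder name)
        splitter=RandomSplitter(valid_pct=0.2, seed=42),  # 4. validation set, same every run
        item_tfms=Resize(128),               # applied to one image at a time
        batch_tfms=aug_transforms(),         # applied to a whole batch at once
    )
    return bears.dataloaders(path)
```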
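Why a seed gives the same validation set every time can be shown without fastai at all. This is a plain-Python sketch of the idea, not fastai's `RandomSplitter` implementation; `random_split` is a hypothetical helper.

```python
import random


def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle indices with a seeded generator and take the first part as validation."""
    rng = random.Random(seed)  # same seed -> same shuffle
    idxs = list(range(len(items)))
    rng.shuffle(idxs)
    n_valid = int(len(items) * valid_pct)
    valid = [items[i] for i in idxs[:n_valid]]
    train = [items[i] for i in idxs[n_valid:]]
    return train, valid


files = [f"img_{i}.jpg" for i in range(10)]
first = random_split(files, seed=42)
second = random_split(files, seed=42)
print(first == second)  # True: the split is reproducible
```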
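The arithmetic behind crop, pad, and squish is easier to see with numbers. A plain-Python sketch: the `resize_plan` helper is hypothetical, only computes sizes, and does not reproduce fastai's `Resize` exactly.

```python
def resize_plan(width, height, size, method):
    """Describe how a width x height image ends up as size x size."""
    if method == "squish":
        # Each axis is scaled independently, so the aspect ratio changes.
        return {"scale_x": size / width, "scale_y": size / height}
    if method == "crop":
        # Scale so the shorter side fits, then cut the overflow off the longer side.
        scale = size / min(width, height)
        return {"scale": scale, "cut_pixels": round(max(width, height) * scale) - size}
    if method == "pad":
        # Scale so the longer side fits, then fill the rest of the shorter side with black.
        scale = size / max(width, height)
        return {"scale": scale, "pad_pixels": size - round(min(width, height) * scale)}
    raise ValueError(f"unknown method: {method}")


for method in ("squish", "crop", "pad"):
    print(method, resize_plan(400, 200, 100, method))
```

For a wide 400x200 image, crop throws away half of the width, pad leaves half of the output black, and squish keeps everything but distorts it, which is why the right choice depends on the data.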
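The confusion matrix from the quiz is just a table of counts. fastai builds it from a trained learner with `ClassificationInterpretation.from_learner(learn)`, but the idea fits in a few lines of plain Python. The labels and predictions below are made up for illustration, and `confusion_matrix` here is a hypothetical helper, not a library function.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


labels = ["black", "grizzly", "teddy"]
# Made-up validation-set labels and predictions.
actual = ["black", "black", "grizzly", "grizzly", "teddy", "teddy"]
predicted = ["black", "grizzly", "grizzly", "grizzly", "teddy", "teddy"]

matrix = confusion_matrix(actual, predicted, labels)
correct = sum(matrix[i][i] for i in range(len(labels)))  # the diagonal
for label, row in zip(labels, matrix):
    print(label, row)
```

Anything off the diagonal is a mistake; here the single off-diagonal count is a black bear predicted as a grizzly.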