Tutorial | How to Swap Faces in Video with DeepFake

mediatv 2020-02-14


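Published by Synced (机器之心)

Author: 馮沁原 (Feng Qinyuan)

Not long ago, DeepFake went viral for swapping celebrity faces into adult videos. This article shows, step by step, how the face swap is implemented.

If this is the first time you have heard of DeepFake, be sure to watch the video above and see for yourself how Nicolas's face takes over every film in the world.

Hands-on Project

How do we swap a face in a video?

A video is a sequence of images, so if we replace the face in every image, we get a new video with the face swapped. How do we replace the face in a single frame? We first have to find the face in the frame and then swap it. The face-swapping problem therefore breaks down into the following pipeline.

We will walk through these five steps in order below.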

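Video to Images

FFmpeg

FFmpeg provides libraries and tools for handling audio, video, subtitles and related metadata. The core libraries are:

libavcodec provides encoding and decoding
libavformat implements streaming protocols, container formats and basic I/O access
libavutil contains assorted utilities such as hashing and decompression
libavfilter provides chained filters for modifying audio and video
libavdevice provides an abstraction for accessing devices
libswresample implements audio resampling and mixing
libswscale implements color conversion and scaling

It exposes three main command-line tools:

ffmpeg processes multimedia content
ffplay is a minimal player
ffprobe is an analysis tool for multimedia content

Our video-to-image step can therefore be done with the following command:

ffmpeg -i clipname -vf fps=framerate -qscale:v 2 "imagename%04d.jpg"

Specifically, this command extracts images from a video at a fixed frame rate.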
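If you want to drive this step from Python, one possible approach is to build the same argument list and hand it to subprocess. This is a minimal sketch: the function names, the default output pattern and the example file name are assumptions for illustration, and it expects an ffmpeg binary on the PATH.

```python
import subprocess


def build_extract_command(clip, fps, pattern="imagename%04d.jpg", quality=2):
    """Build the ffmpeg argument list for dumping frames at a fixed rate."""
    return [
        "ffmpeg",
        "-i", clip,                  # input video
        "-vf", f"fps={fps}",         # sample this many frames per second
        "-qscale:v", str(quality),   # JPEG quality scale (lower means better quality)
        pattern,                     # numbered output files
    ]


def extract_frames(clip, fps, pattern="imagename%04d.jpg"):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_extract_command(clip, fps, pattern), check=True)
```

Calling extract_frames("clip.mp4", 25) would then write imagename0001.jpg, imagename0002.jpg and so on into the current directory.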

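Face Localization

Basic Algorithms

Face localization is a relatively mature field, and here it mainly relies on the dlib library. We could build a custom face recognition algorithm, but we can also use an existing general-purpose face recognition library.

There are two kinds of algorithms. The first is the HOG-based facial landmark algorithm.

(Source: Facial landmarks with dlib, OpenCV, and Python)

The figure above shows the result of this algorithm. It divides the face into the following regions:

Eyes (left/right)
Eyebrows (left/right)
Nose
Mouth
Jaw

With these landmarks we can not only perform the face swap later, but also detect the shape of the face, whether the eyes are blinking, and so on. For example, we can connect the points to derive more features.

(Source: Real-Time Face Pose Estimation)

Finding facial landmarks is a prediction problem: the input is an image and a region of interest, and the output is the set of key points within that region.

How does HOG find the face? It is a general-purpose detection algorithm:

Sample positive examples from the dataset and compute their HOG descriptors
Sample negative examples from the dataset and compute their HOG descriptors
Train a classifier on the HOG descriptors
Run the classifier over the negative examples at different positions and scales, and collect the HOG descriptors of the false positives
Retrain the model with the false positives from the previous step

That raises a question: how is the HOG descriptor computed? We compute the brightness of each pixel and then represent each pixel as a vector pointing in the direction in which the image gets darker, as shown below:

(Source: Machine Learning is Fun! Part 4: Modern Face Recognition with Deep Learning)

Why do this? The absolute value of each pixel depends on the lighting conditions, while relative values are much more stable. Representing the image by its gradients therefore gives us higher-quality data. We can also go a step further and aggregate neighboring pixels to produce more representative data.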
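The gradient idea can be sketched in a few lines of plain Python. This is an illustrative simplification, not dlib's implementation: it uses central differences, unsigned orientations in [0, 180), and a single cell with 9 bins, all of which are assumptions chosen for brevity. With unsigned orientations it does not matter whether the vector points toward the darker or the brighter side.

```python
import math


def gradients(img):
    """Return (magnitude, angle_deg) grids for a 2-D list of brightness values.

    Central differences are used in the interior; border pixels stay at 0.
    Angles are unsigned, in [0, 180).
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang


def cell_histogram(mag, ang, bins=9):
    """Accumulate gradient magnitudes into orientation bins for one cell."""
    hist = [0.0] * bins
    width = 180.0 / bins
    for row_m, row_a in zip(mag, ang):
        for m, a in zip(row_m, row_a):
            hist[int(a // width) % bins] += m
    return hist
```

Adding a constant to every pixel (a uniformly brighter scene) leaves both grids unchanged, which is the stability argument made above.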

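Now we can run detection:

First, search the new image for candidate regions at different positions and scales;
Then apply non-maximum suppression to remove redundant and duplicate detections. The figure below shows detections before and after the redundancy is removed. Put simply, the method takes the box with the highest probability, discards the boxes that overlap it too much, and repeats the process.

(Source: Histogram of Oriented Gradients and Object Detection)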
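Non-maximum suppression as described above can be sketched as follows. The box format (x1, y1, x2, y2), the use of intersection over union as the overlap measure, and the 0.5 threshold are assumptions for illustration; real detectors differ in these details.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, overlap_thresh=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # most confident remaining box
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= overlap_thresh]
    return keep
```

For two nearly identical detections of one face plus a separate detection elsewhere, only the stronger of the two overlapping boxes and the separate box survive.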

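Once we have the outline of the face, we can find the facial landmarks. The landmark algorithm is based on the paper "One Millisecond Face Alignment with an Ensemble of Regression Trees". In short, it uses a labeled training set to train an ensemble of regression trees, which is then used for prediction.

(Source: One Millisecond Face Alignment with an Ensemble of Regression Trees)

On this basis, the 68 landmark points can be marked.

(Source: Facial landmarks with dlib, OpenCV, and Python)

From the coordinates of the 68 landmarks we can compute the angle of the face and then crop out the face after straightening it. However, dlib requires the whole face to be visible, which shrinks our sample set and rules out some scenarios. In addition, because the face is handled at a size of 64*64 pixels, sharpness also has to be dealt with.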
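One simple way to estimate the in-plane angle of a face from its landmarks is to look at the line joining the two eye centers. The sketch below assumes the common 68-point ordering in which indices 36 to 41 and 42 to 47 (0-based) are the two eyes; that indexing and the function names are assumptions, not something the project code above confirms.

```python
import math


def eye_center(landmarks, start, stop):
    """Average the (x, y) points landmarks[start:stop]."""
    pts = landmarks[start:stop]
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))


def roll_angle(landmarks):
    """In-plane rotation in degrees, from the line joining the eye centers.

    Image coordinates have y pointing down, so a positive angle means the
    eye on the right of the image sits lower than the eye on the left.
    """
    lx, ly = eye_center(landmarks, 36, 42)   # eye on the left of the image
    rx, ry = eye_center(landmarks, 42, 48)   # eye on the right of the image
    return math.degrees(math.atan2(ry - ly, rx - lx))
```

Rotating the image by the negative of this angle around the midpoint of the eyes would bring the eyes level before cropping.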

The other approach is to train a CNN to detect faces. A CNN can detect faces at more angles, but it needs more resources and may fail on large files.

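Data Preparation

Our goal is to convert the source face into the target face, so we need to collect images of both. If you have chosen a celebrity, you can get the images you want directly from Google Images. Frames from the video can also be used, but it is worth collecting some more varied data as well. In my case I used pictures of my wife and me, so I simply exported them from our Photos library. Once the face data has been generated, check it carefully so that unwanted faces or other objects do not end up in your training set.

extract.py

Deepfake locates faces with the following code: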

import cv2  # open-source computer vision library
from pathlib import Path  # object-oriented filesystem paths
from tqdm import tqdm  # progress bars
import os  # operating-system access
import numpy as np  # scientific computing

from lib.cli import DirectoryProcessor, rotate_image  # process the files in a directory and save them to a new one; rotate an image (actually defined in utils)
from lib.utils import get_folder  # return a folder, creating it if it does not exist
from lib.multithreading import pool_process  # multi-process parallel computation
from lib.detect_blur import is_blurry  # check whether an image is blurry
from plugins.PluginLoader import PluginLoader  # load the selected algorithm


class ExtractTrainingData(DirectoryProcessor):  # extract faces from the training set
    def create_parser(self, subparser, command, description):
        self.optional_arguments = self.get_optional_arguments()
        self.parser = subparser.add_parser(
            command,
            help="Extract the faces from a pictures.",
            description=description,
            epilog="Questions and feedback: https://github.com/deepfakes/faceswap-playground"
        )

    # argument configuration omitted

    def process(self):
        extractor_name = "Align"  # corresponds to Extract_Align.py
        self.extractor = PluginLoader.get_extractor(extractor_name)()
        processes = self.arguments.processes
        try:
            if processes != 1:  # process images in multiple processes
                files = list(self.read_directory())
                for filename, faces in tqdm(pool_process(self.processFiles, files, processes=processes), total=len(files)):
                    self.num_faces_detected += 1
                    self.faces_detected[os.path.basename(filename)] = faces
            else:  # process images in a single process
                for filename in tqdm(self.read_directory()):
                    try:
                        image = cv2.imread(filename)
                        self.faces_detected[os.path.basename(filename)] = self.handleImage(image, filename)
                    except Exception as e:
                        if self.arguments.verbose:
                            print('Failed to extract from image: {}. Reason: {}'.format(filename, e))
                        pass
        finally:
            self.write_alignments()

    def processFiles(self, filename):  # handle one individual image
        try:
            image = cv2.imread(filename)
            return filename, self.handleImage(image, filename)
        except Exception as e:
            if self.arguments.verbose:
                print('Failed to extract from image: {}. Reason: {}'.format(filename, e))
            pass
        return filename, []

    def getRotatedImageFaces(self, image, angle):  # faces found after rotating by a fixed angle
        rotated_image = rotate_image(image, angle)
        faces = self.get_faces(rotated_image, rotation=angle)
        rotated_faces = [(idx, face) for idx, face in faces]
        return rotated_faces, rotated_image

    def imageRotator(self, image):  # try a series of rotations until a face is found
        ''' rotates the image through rotation_angles to try to find a face '''
        for angle in self.rotation_angles:
            rotated_faces, rotated_image = self.getRotatedImageFaces(image, angle)
            if len(rotated_faces) > 0:
                if self.arguments.verbose:
                    print('found face(s) by rotating image {} degrees'.format(angle))
                break
        return rotated_faces, rotated_image

    def handleImage(self, image, filename):
        faces = self.get_faces(image)
        process_faces = [(idx, face) for idx, face in faces]

        # no face found: try rotating the image
        if self.rotation_angles is not None and len(process_faces) == 0:
            process_faces, image = self.imageRotator(image)

        rvals = []
        for idx, face in process_faces:
            # draw the facial landmarks
            if self.arguments.debug_landmarks:
                for (x, y) in face.landmarksAsXY():
                    cv2.circle(image, (x, y), 2, (0, 0, 255), -1)

            resized_image, t_mat = self.extractor.extract(image, face, 256, self.arguments.align_eyes)
            output_file = get_folder(self.output_dir) / Path(filename).stem

            # check whether the image is blurry
            if self.arguments.blur_thresh is not None:
                aligned_landmarks = self.extractor.transform_points(face.landmarksAsXY(), t_mat, 256, 48)
                feature_mask = self.extractor.get_feature_mask(aligned_landmarks / 256, 256, 48)
                feature_mask = cv2.blur(feature_mask, (10, 10))
                isolated_face = cv2.multiply(feature_mask, resized_image.astype(float)).astype(np.uint8)
                blurry, focus_measure = is_blurry(isolated_face, self.arguments.blur_thresh)
                # print("{} focus measure: {}".format(Path(filename).stem, focus_measure))
                # cv2.imshow("Isolated Face", isolated_face)
                # cv2.waitKey(0)
                # cv2.destroyAllWindows()
                if blurry:
                    print("{}'s focus measure of {} was below the blur threshold, moving to \"blurry\"".format(Path(filename).stem, focus_measure))
                    output_file = get_folder(Path(self.output_dir) / Path("blurry")) / Path(filename).stem

            cv2.imwrite('{}_{}{}'.format(str(output_file), str(idx), Path(filename).suffix), resized_image)  # write the new image
            f = {
                "r": face.r,
                "x": face.x,
                "w": face.w,
                "y": face.y,
                "h": face.h,
                "landmarksXY": face.landmarksAsXY()
            }
            rvals.append(f)
        return rvals

Note that landmark-based algorithms perform poorly on tilted faces; a CNN can be used instead.

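Face Conversion

What is the basic principle of face conversion? Suppose you stare at a video of one person for 100 hours straight, are then shown a photo of a different person for a moment, and are asked to draw that photo from memory. Your drawing will certainly look a lot like the first person.

The model we use is an autoencoder. Interestingly, what this model does is regenerate the original image from the original image. The encoder compresses the image and the decoder restores it; an example is shown below:

(Source: Building Autoencoders in Keras)

On this basis, even if we feed in a different face, the autoencoder will encode it into something resembling the original face.

To improve the final result, we also need to learn the attributes common to all faces separately from the attributes specific to each face. So we use a single shared encoder for all faces, whose job is to learn what faces have in common, and a separate decoder for each face, whose job is to learn what is particular to that face. When you pass B's face through the encoder and then through A's decoder, you get A's face with B's expression.

Expressed as formulas: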

X' = Decoder(Encoder(Shuffle(X)))

Loss = L1Loss(X'-X)

A' = Decoder_A(Encoder(Shuffle(A)))

Loss_A = L1Loss(A'-A)

B' = Decoder_B(Encoder(Shuffle(B)))

Loss_B = L1Loss(B'-B)

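Concretely, during training we feed in images of A and reconstruct A's face through the encoder and A's decoder; then we feed in images of B and reconstruct B's face through the same encoder but a different decoder. We iterate until the loss falls below a threshold. I suggest training until the loss drops to about 0.02, which gives reasonably good results.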
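To make the loss and the stopping rule concrete, here is a toy sketch in plain Python. It treats an image as a flat list of pixel values in [0, 1] and computes the L1 loss as a mean absolute error; the function names are hypothetical, and the 0.02 default simply mirrors the suggestion above.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equally sized flat lists of pixel values."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def should_stop(loss_a, loss_b, threshold=0.02):
    """Stop once both reconstruction losses are at or below the threshold."""
    return loss_a <= threshold and loss_b <= threshold
```

In a real training loop, loss_a and loss_b would be the values reported for the A and B autoencoders after each step.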

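This is a fairly standard modeling approach. What is worth noting is that the input image is distorted before it is fed to the network (the Shuffle step in the formulas above; in the training code below, the data generator yields a warped input and an undistorted target). Note that PixelShuffler() in the model code is an upscaling layer rather than the source of this distortion. The distortion makes learning harder, and that is exactly what lets the model achieve the final result. Think about why: if you only ever practice easy problems, you will never learn to solve hard ones. Varying the problems a little is enough to make you improve.

Because training reconstructs A from a distorted version of A, while in use we reconstruct A from B, the choice of distortion strongly affects the final result. How to choose a better distortion is therefore an important question.

When blending the images there is another difficulty: how to get a good result while preventing the image from jittering. We need additional algorithms to handle these cases. A seemingly straightforward face-conversion algorithm thus has to account for all kinds of special cases in practice, and that is what makes it truly down to earth.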

train.py

The training logic is as follows:

import cv2  # open-source computer vision library
import numpy  # scientific computing
import time  # time utilities
import threading  # multithreading

from lib.utils import get_image_paths, get_folder  # list the images in a directory; return a folder, creating it if it does not exist
from lib.cli import FullPaths, argparse, os, sys
from plugins.PluginLoader import PluginLoader  # load the selected algorithm

tf = None
set_session = None


def import_tensorflow_keras():  # load TensorFlow and keras only when needed
    ''' Import the TensorFlow and keras set_session modules only when they are required '''
    global tf
    global set_session
    if tf is None or set_session is None:
        import tensorflow
        import keras.backend.tensorflow_backend  # keras relies on TensorFlow for the actual computation
        tf = tensorflow
        set_session = keras.backend.tensorflow_backend.set_session


class TrainingProcessor(object):  # trainer
    arguments = None

    def __init__(self, subparser, command, description='default'):  # initialize the trainer
        self.argument_list = self.get_argument_list()
        self.optional_arguments = self.get_optional_arguments()
        self.parse_arguments(description, subparser, command)
        self.lock = threading.Lock()

    def process_arguments(self, arguments):
        self.arguments = arguments
        print("Model A Directory: {}".format(self.arguments.input_A))
        print("Model B Directory: {}".format(self.arguments.input_B))
        print("Training data directory: {}".format(self.arguments.model_dir))
        self.process()

    # argument configuration omitted

    @staticmethod
    def get_optional_arguments():  # create a list that holds the arguments
        ''' Put the arguments in a list so that they are accessible from both argparse and gui '''
        # Override this for custom arguments
        argument_list = []
        return argument_list

    def parse_arguments(self, description, subparser, command):
        parser = subparser.add_parser(
            command,
            help="This command trains the model for the two faces A and B.",
            description=description,
            epilog="Questions and feedback: https://github.com/deepfakes/faceswap-playground")

        for option in self.argument_list:
            args = option['opts']
            kwargs = {key: option[key] for key in option.keys() if key != 'opts'}
            parser.add_argument(*args, **kwargs)

        parser = self.add_optional_arguments(parser)
        parser.set_defaults(func=self.process_arguments)

    def add_optional_arguments(self, parser):
        for option in self.optional_arguments:
            args = option['opts']
            kwargs = {key: option[key] for key in option.keys() if key != 'opts'}
            parser.add_argument(*args, **kwargs)
        return parser

    def process(self):  # the actual execution
        self.stop = False
        self.save_now = False

        thr = threading.Thread(target=self.processThread, args=(), kwargs={})  # run in a thread
        thr.start()

        if self.arguments.preview:
            print('Using live preview')
            while True:
                try:
                    with self.lock:
                        for name, image in self.preview_buffer.items():
                            cv2.imshow(name, image)

                    key = cv2.waitKey(1000)
                    if key == ord('\n') or key == ord('\r'):
                        break
                    if key == ord('s'):
                        self.save_now = True
                except KeyboardInterrupt:
                    break
        else:
            try:
                input()  # TODO how to catch a specific key instead of Enter?
                # there isnt a good multiplatform solution: https:///questions/3523174/raw-input-in-python-without-pressing-enter
            except KeyboardInterrupt:
                pass

        print("Exit requested! The trainer will complete its current cycle, save the models and quit (it can take up a couple of seconds depending on your training speed). If you want to kill it now, press Ctrl + c")
        self.stop = True
        thr.join()  # waits until thread finishes

    def processThread(self):
        try:
            if self.arguments.allow_growth:
                self.set_tf_allow_growth()

            print('Loading data, this may take a while...')  # load the data
            # this is so that you can enter case insensitive values for trainer
            trainer = self.arguments.trainer
            trainer = "LowMem" if trainer.lower() == "lowmem" else trainer
            model = PluginLoader.get_model(trainer)(get_folder(self.arguments.model_dir), self.arguments.gpus)  # load the model
            model.load(swapped=False)

            images_A = get_image_paths(self.arguments.input_A)  # images of A
            images_B = get_image_paths(self.arguments.input_B)  # images of B
            trainer = PluginLoader.get_trainer(trainer)  # create the trainer
            trainer = trainer(model, images_A, images_B, self.arguments.batch_size, self.arguments.perceptual_loss)  # configure the trainer

            print('Starting. Press "Enter" to stop training and save model')

            for epoch in range(0, self.arguments.epochs):
                save_iteration = epoch % self.arguments.save_interval == 0
                trainer.train_one_step(epoch, self.show if (save_iteration or self.save_now) else None)  # run one training step

                if save_iteration:
                    model.save_weights()

                if self.stop:
                    break

                if self.save_now:
                    model.save_weights()
                    self.save_now = False

            model.save_weights()
            exit(0)
        except KeyboardInterrupt:
            try:
                model.save_weights()
            except KeyboardInterrupt:
                print('Saving model weights has been cancelled!')
            exit(0)
        except Exception as e:
            raise e
            exit(1)

    def set_tf_allow_growth(self):
        import_tensorflow_keras()
        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True
        config.gpu_options.visible_device_list = "0"
        set_session(tf.Session(config=config))

    preview_buffer = {}

    def show(self, image, name=''):  # provide a preview
        try:
            if self.arguments.redirect_gui:
                path = os.path.realpath(os.path.dirname(sys.argv[0]))
                img = '.gui_preview.png'
                imgfile = os.path.join(path, img)
                cv2.imwrite(imgfile, image)
            elif self.arguments.preview:
                with self.lock:
                    self.preview_buffer[name] = image
            elif self.arguments.write_image:
                cv2.imwrite('_sample_{}.jpg'.format(name), image)
        except Exception as e:
            print("could not preview sample")
            raise e

Trainer.py

The following implements a single training step:

import time
import numpy

from lib.training_data import TrainingDataGenerator, stack_images


class Trainer():
    random_transform_args = {  # augmentation parameters
        'rotation_range': 10,
        'zoom_range': 0.05,
        'shift_range': 0.05,
        'random_flip': 0.4,
    }

    def __init__(self, model, fn_A, fn_B, batch_size, *args):
        self.batch_size = batch_size
        self.model = model

        generator = TrainingDataGenerator(self.random_transform_args, 160)  # read the required data
        self.images_A = generator.minibatchAB(fn_A, self.batch_size)
        self.images_B = generator.minibatchAB(fn_B, self.batch_size)

    def train_one_step(self, iter, viewer):  # run one training step
        epoch, warped_A, target_A = next(self.images_A)
        epoch, warped_B, target_B = next(self.images_B)

        loss_A = self.model.autoencoder_A.train_on_batch(warped_A, target_A)  # train on the batch and get the loss
        loss_B = self.model.autoencoder_B.train_on_batch(warped_B, target_B)
        print("[{0}] [#{1:05d}] loss_A: {2:.5f}, loss_B: {3:.5f}".format(time.strftime("%H:%M:%S"), iter, loss_A, loss_B),
              end='\r')

        if viewer is not None:
            viewer(self.show_sample(target_A[0:14], target_B[0:14]), "training")

    def show_sample(self, test_A, test_B):
        figure_A = numpy.stack([
            test_A,
            self.model.autoencoder_A.predict(test_A),
            self.model.autoencoder_B.predict(test_A),
        ], axis=1)
        figure_B = numpy.stack([
            test_B,
            self.model.autoencoder_B.predict(test_B),
            self.model.autoencoder_A.predict(test_B),
        ], axis=1)

        if test_A.shape[0] % 2 == 1:
            figure_A = numpy.concatenate([figure_A, numpy.expand_dims(figure_A[0], 0)])
            figure_B = numpy.concatenate([figure_B, numpy.expand_dims(figure_B[0], 0)])

        figure = numpy.concatenate([figure_A, figure_B], axis=0)
        w = 4
        h = int(figure.shape[0] / w)
        figure = figure.reshape((w, h) + figure.shape[1:])
        figure = stack_images(figure)

        return numpy.clip(figure * 255, 0, 255).astype('uint8')

AutoEncoder.py

The logic of the AutoEncoder we use is as follows:

# base class for the AutoEncoder
import os, shutil

encoderH5 = 'encoder.h5'
decoder_AH5 = 'decoder_A.h5'
decoder_BH5 = 'decoder_B.h5'


class AutoEncoder:
    def __init__(self, model_dir, gpus):
        self.model_dir = model_dir
        self.gpus = gpus

        self.encoder = self.Encoder()
        self.decoder_A = self.Decoder()
        self.decoder_B = self.Decoder()

        self.initModel()

    def load(self, swapped):
        (face_A, face_B) = (decoder_AH5, decoder_BH5) if not swapped else (decoder_BH5, decoder_AH5)

        try:  # load the weights
            self.encoder.load_weights(str(self.model_dir / encoderH5))
            self.decoder_A.load_weights(str(self.model_dir / face_A))
            self.decoder_B.load_weights(str(self.model_dir / face_B))
            print('loaded model weights')
            return True
        except Exception as e:
            print('Failed loading existing training data.')
            print(e)
            return False

    def save_weights(self):  # store the weights
        model_dir = str(self.model_dir)
        if os.path.isdir(model_dir + "_bk"):
            shutil.rmtree(model_dir + "_bk")
        shutil.move(model_dir, model_dir + "_bk")
        os.mkdir(model_dir)
        self.encoder.save_weights(str(self.model_dir / encoderH5))
        self.decoder_A.save_weights(str(self.model_dir / decoder_AH5))
        self.decoder_B.save_weights(str(self.model_dir / decoder_BH5))
        print('saved model weights')

Model.py

Our concrete model is as follows:

# Based on the original https://www./r/deepfakes/ code sample + contribs

from keras.models import Model as KerasModel
from keras.layers import Input, Dense, Flatten, Reshape
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import Conv2D
from keras.optimizers import Adam

from .AutoEncoder import AutoEncoder
from lib.PixelShuffler import PixelShuffler

from keras.utils import multi_gpu_model

IMAGE_SHAPE = (64, 64, 3)
ENCODER_DIM = 1024


class Model(AutoEncoder):
    def initModel(self):
        optimizer = Adam(lr=5e-5, beta_1=0.5, beta_2=0.999)  # Adam optimizer (worth understanding in depth)
        x = Input(shape=IMAGE_SHAPE)

        self.autoencoder_A = KerasModel(x, self.decoder_A(self.encoder(x)))
        self.autoencoder_B = KerasModel(x, self.decoder_B(self.encoder(x)))

        if self.gpus > 1:
            self.autoencoder_A = multi_gpu_model(self.autoencoder_A, self.gpus)
            self.autoencoder_B = multi_gpu_model(self.autoencoder_B, self.gpus)

        self.autoencoder_A.compile(optimizer=optimizer, loss='mean_absolute_error')
        self.autoencoder_B.compile(optimizer=optimizer, loss='mean_absolute_error')

    def converter(self, swap):
        autoencoder = self.autoencoder_B if not swap else self.autoencoder_A
        return lambda img: autoencoder.predict(img)

    def conv(self, filters):
        def block(x):
            x = Conv2D(filters, kernel_size=5, strides=2, padding='same')(x)
            x = LeakyReLU(0.1)(x)
            return x
        return block

    def upscale(self, filters):
        def block(x):
            x = Conv2D(filters * 4, kernel_size=3, padding='same')(x)
            x = LeakyReLU(0.1)(x)  # LeakyReLU activation
            x = PixelShuffler()(x)  # reduce the number of filters to 1/4 and double the height and width
            return x
        return block

    def Encoder(self):
        input_ = Input(shape=IMAGE_SHAPE)
        x = input_
        x = self.conv(128)(x)
        x = self.conv(256)(x)
        x = self.conv(512)(x)
        x = self.conv(1024)(x)
        x = Dense(ENCODER_DIM)(Flatten()(x))
        x = Dense(4 * 4 * 1024)(x)
        x = Reshape((4, 4, 1024))(x)
        x = self.upscale(512)(x)
        return KerasModel(input_, x)

    def Decoder(self):
        input_ = Input(shape=(8, 8, 512))
        x = input_
        x = self.upscale(256)(x)
        x = self.upscale(128)(x)
        x = self.upscale(64)(x)
        x = Conv2D(3, kernel_size=5, padding='same', activation='sigmoid')(x)
        return KerasModel(input_, x)
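The upscale block relies on PixelShuffler to trade channels for spatial resolution. The sketch below shows that rearrangement on nested Python lists for a single image. The exact channel ordering used by lib.PixelShuffler is not shown in this article, so the ordering here (channels grouped by row offset, then column offset) is an assumption for illustration.

```python
def pixel_shuffle(x, r=2):
    """Rearrange an H x W x (C*r*r) nested list into (H*r) x (W*r) x C."""
    h, w, c_in = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * c_out for _ in range(w * r)] for _ in range(h * r)]
    for y in range(h):
        for col in range(w):
            for i in range(r):            # row offset inside the r x r block
                for j in range(r):        # column offset inside the r x r block
                    base = (i * r + j) * c_out
                    for c in range(c_out):
                        out[y * r + i][col * r + j][c] = x[y][col][base + c]
    return out
```

With r=2, a 4 x 4 x 2048 tensor becomes 8 x 8 x 512: a quarter of the channels and twice the height and width, matching the comment in upscale().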

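The overall network structure is as follows:

Source: 刷爆朋友圈的視頻人物換臉是怎樣煉成的?

As we can see, after four convolutional layers, a flatten layer and fully connected layers, the model starts to upscale. Partway through the upscaling, the network is split into encoder and decoder, which keeps what is common to all faces separate from what is specific to each face.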
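To check where the split falls, the tensor shapes can be traced by hand. The sketch below does that arithmetic in plain Python without Keras; the helper names are hypothetical, and the only rules it assumes are that a stride-2 'same' convolution halves height and width and that an upscale block doubles them.

```python
def conv_block(shape, filters):
    """Stride-2 'same' convolution: halves height and width (rounding up)."""
    h, w, _ = shape
    return ((h + 1) // 2, (w + 1) // 2, filters)


def upscale_block(shape, filters):
    """Conv to filters*4 channels followed by PixelShuffler: doubles H and W."""
    h, w, _ = shape
    return (h * 2, w * 2, filters)


def encoder_shapes(shape=(64, 64, 3), encoder_dim=1024):
    trace = [shape]
    for f in (128, 256, 512, 1024):
        trace.append(conv_block(trace[-1], f))
    h, w, c = trace[-1]
    trace.append((h * w * c,))          # Flatten
    trace.append((encoder_dim,))        # Dense(ENCODER_DIM)
    trace.append((4 * 4 * 1024,))       # Dense(4 * 4 * 1024)
    trace.append((4, 4, 1024))          # Reshape
    trace.append(upscale_block(trace[-1], 512))
    return trace


def decoder_shapes(shape=(8, 8, 512)):
    trace = [shape]
    for f in (256, 128, 64):
        trace.append(upscale_block(trace[-1], f))
    h, w, _ = trace[-1]
    trace.append((h, w, 3))             # final Conv2D with 3 output channels
    return trace
```

The encoder ends at (8, 8, 512), which is exactly the input shape each decoder declares, and the decoder ends at (64, 64, 3), the size of the input image.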

convert.py

With the trained model, we can now convert the images.

import cv2
import re
import os

from pathlib import Path
from tqdm import tqdm

from lib.cli import DirectoryProcessor, FullPaths
from lib.utils import BackgroundGenerator, get_folder, get_image_paths, rotate_image

from plugins.PluginLoader import PluginLoader


class ConvertImage(DirectoryProcessor):
    filename = ''

    def create_parser(self, subparser, command, description):
        self.optional_arguments = self.get_optional_arguments()
        self.parser = subparser.add_parser(
            command,
            help="Convert a source image to a new one with the face swapped.",
            description=description,
            epilog="Questions and feedback: https://github.com/deepfakes/faceswap-playground"
        )

    # argument configuration omitted

    def process(self):  # convert with the model and blend the result back
        # Original & LowMem models go with Adjust or Masked converter
        # Note: GAN prediction outputs a mask + an image, while other predicts only an image
        model_name = self.arguments.trainer
        conv_name = self.arguments.converter
        self.input_aligned_dir = None

        model = PluginLoader.get_model(model_name)(get_folder(self.arguments.model_dir), self.arguments.gpus)
        if not model.load(self.arguments.swap_model):
            print('Model Not Found! A valid model must be provided to continue!')
            exit(1)

        input_aligned_dir = Path(self.arguments.input_dir) / Path('aligned')
        if self.arguments.input_aligned_dir is not None:
            input_aligned_dir = self.arguments.input_aligned_dir
        try:
            self.input_aligned_dir = [Path(path) for path in get_image_paths(input_aligned_dir)]
            if len(self.input_aligned_dir) == 0:
                print('Aligned directory is empty, no faces will be converted!')
            elif len(self.input_aligned_dir) <= len(self.input_dir) / 3:
                print('Aligned directory contains an amount of images much less than the input, are you sure this is the right directory?')
        except:
            print('Aligned directory not found. All faces listed in the alignments file will be converted.')

        converter = PluginLoader.get_converter(conv_name)(
            model.converter(False),
            trainer=self.arguments.trainer,
            blur_size=self.arguments.blur_size,
            seamless_clone=self.arguments.seamless_clone,
            sharpen_image=self.arguments.sharpen_image,
            mask_type=self.arguments.mask_type,
            erosion_kernel_size=self.arguments.erosion_kernel_size,
            match_histogram=self.arguments.match_histogram,
            smooth_mask=self.arguments.smooth_mask,
            avg_color_adjust=self.arguments.avg_color_adjust
        )

        batch = BackgroundGenerator(self.prepare_images(), 1)

        # frame ranges stuff...
        self.frame_ranges = None

        # split out the frame ranges and parse out "min" and "max" values
        minmax = {
            "min": 0,  # never any frames less than 0
            "max": float("inf")
        }

        if self.arguments.frame_ranges:
            self.frame_ranges = [tuple(map(lambda q: minmax[q] if q in minmax.keys() else int(q), v.split("-"))) for v in self.arguments.frame_ranges]

        # last number regex. I know regex is hacky, but its reliablyhacky(tm).
        self.imageidxre = re.compile(r'(\d+)(?!.*\d)')

        for item in batch.iterator():
            self.convert(converter, item)

    def check_skipframe(self, filename):
        try:
            idx = int(self.imageidxre.findall(filename)[0])
            return not any(map(lambda b: b[0] <= idx <= b[1], self.frame_ranges))
        except:
            return False

    def check_skipface(self, filename, face_idx):
        aligned_face_name = '{}_{}{}'.format(Path(filename).stem, face_idx, Path(filename).suffix)
        aligned_face_file = Path(self.arguments.input_aligned_dir) / Path(aligned_face_name)
        # TODO: Remove this temporary fix for backwards compatibility of filenames
        bk_compat_aligned_face_name = '{}{}{}'.format(Path(filename).stem, face_idx, Path(filename).suffix)
        bk_compat_aligned_face_file = Path(self.arguments.input_aligned_dir) / Path(bk_compat_aligned_face_name)
        return aligned_face_file not in self.input_aligned_dir and bk_compat_aligned_face_file not in self.input_aligned_dir

    def convert(self, converter, item):
        try:
            (filename, image, faces) = item

            skip = self.check_skipframe(filename)
            if self.arguments.discard_frames and skip:
                return

            if not skip:  # process frame as normal
                for idx, face in faces:
                    if self.input_aligned_dir is not None and self.check_skipface(filename, idx):
                        print('face {} for frame {} was deleted, skipping'.format(idx, os.path.basename(filename)))
                        continue
                    # Check for image rotations and rotate before mapping face
                    if face.r != 0:
                        height, width = image.shape[:2]
                        image = rotate_image(image, face.r)
                        image = converter.patch_image(image, face, 64 if "128" not in self.arguments.trainer else 128)
                        # TODO: This switch between 64 and 128 is a hack for now. We should have a separate cli option for size
                        image = rotate_image(image, face.r * -1, rotated_width=width, rotated_height=height)
                    else:
                        image = converter.patch_image(image, face, 64 if "128" not in self.arguments.trainer else 128)
                        # TODO: This switch between 64 and 128 is a hack for now. We should have a separate cli option for size

            output_file = get_folder(self.output_dir) / Path(filename).name
            cv2.imwrite(str(output_file), image)
        except Exception as e:
            print('Failed to convert image: {}. Reason: {}'.format(filename, e))

    def prepare_images(self):
        self.read_alignments()
        is_have_alignments = self.have_alignments()
        for filename in tqdm(self.read_directory()):
            image = cv2.imread(filename)

            if is_have_alignments:
                if self.have_face(filename):
                    faces = self.get_faces_alignments(filename, image)
                else:
                    tqdm.write('no alignment found for {}, skipping'.format(os.path.basename(filename)))
                    continue
            else:
                faces = self.get_faces(image)
            yield filename, image, faces
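The frame-range handling in convert.py is compact, so here is a standalone sketch of the same idea: parse specs such as "10-50" or "200-max", pull the last number out of a file name, and decide whether to skip the frame. The function names are hypothetical, and treating an empty range list as "skip nothing" is a simplification chosen here.

```python
import re

LAST_NUMBER = re.compile(r"(\d+)(?!.*\d)")  # last run of digits in the name


def parse_frame_ranges(specs):
    """Turn strings such as "10-50" or "min-100" into (low, high) tuples."""
    minmax = {"min": 0, "max": float("inf")}
    ranges = []
    for spec in specs:
        low, high = (minmax[p] if p in minmax else int(p) for p in spec.split("-"))
        ranges.append((low, high))
    return ranges


def should_skip(filename, ranges):
    """True when the frame index in filename falls outside every range."""
    if not ranges:
        return False                 # no ranges given: convert everything
    match = LAST_NUMBER.search(filename)
    if match is None:
        return False                 # no index in the name: do not skip
    idx = int(match.group(1))
    return not any(low <= idx <= high for low, high in ranges)
```

Because only the last number in the name is used, a file called clip2_0250.png is treated as frame 250 rather than frame 2.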

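We can of course also improve on this with a GAN, so let us look at a GAN-based model.

(Source: shaoanlu/faceswap-GAN)

As the figure above shows, we first crop out A's face and warp it; encoding and decoding then produce the reconstructed face and a mask. Our learning objectives are shown below.

(Source: shaoanlu/faceswap-GAN)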

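From Images to Video

Building on the FFmpeg overview above, the following command merges a batch of images into a video:

ffmpeg -f image2 -i imagename%04d.jpg -vcodec libx264 -crf 15 -pix_fmt yuv420p output_filename.mp4

If you want the new video to have sound, take the audio track from the original video and attach it to the final output video as a last step.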
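Both steps can be scripted in the same way as the frame extraction. The first function below reproduces the command above; the second is one possible way to attach the audio, copying the video stream from the new file and the first audio stream from the original. The function names and file names are assumptions, and stream copying only works when the original audio codec is supported by the output container.

```python
def build_encode_command(pattern, output, crf=15):
    """ffmpeg arguments that encode numbered images into an H.264 video."""
    return ["ffmpeg", "-f", "image2", "-i", pattern,
            "-vcodec", "libx264", "-crf", str(crf),
            "-pix_fmt", "yuv420p", output]


def build_audio_mux_command(silent_video, audio_source, output):
    """ffmpeg arguments that combine video from one file with audio from another."""
    return ["ffmpeg",
            "-i", silent_video,      # input 0: the face-swapped video
            "-i", audio_source,      # input 1: the original video with sound
            "-map", "0:v:0",         # take the video stream from input 0
            "-map", "1:a:0",         # take the first audio stream from input 1
            "-c:v", "copy",          # do not re-encode the video
            "-c:a", "copy",          # do not re-encode the audio
            "-shortest",             # stop at the end of the shorter stream
            output]
```

Each list can be passed to subprocess.run(..., check=True). Note that the image2 input defaults to 25 frames per second, so if the frames were extracted at a different rate, a -framerate option placed before -i is needed to keep the audio in sync.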

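Cloud Deployment

We can deploy on Google Cloud. See the video for details; here I show a few key steps:

(Source: How to Create DeepFakes with Google Cloud GPU Services)

Finally, here is a screenshot of my training run on Google Cloud.

Project Architecture

To finish, let us look at the architecture of the whole DeepFake project from a high level.

Social Impact

We have covered how Deepfake works, so what real value does it offer society? We can shoot a film with anyone and then turn them into anyone we want. We can create more realistic virtual characters. Trying on clothes when shopping can be simulated more realistically on a real person.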

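Summary

We used the following technology stack, frameworks and platforms:

Dlib: a C++-based machine learning library
OpenCV: a computer vision library
Keras: a high-level API on top of lower-level machine learning frameworks
TensorFlow: Google's open-source machine learning framework
CUDA: Nvidia's development environment for GPU acceleration
Google Cloud Platform: Google's cloud computing platform
Virtualenv: creates isolated Python environments
FFmpeg: an open-source library for processing multimedia audio and video

Now get hands-on and put your beloved other half's face into a Hollywood movie.

This article is published by Synced (机器之心). To reprint it, please contact this official account for permission.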