Jason Pan

Python Interview Prep 01

Huang Jie (黄杰) / 2020-07-22


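Exiting a loop gracefully on SIGINT (Ctrl+C)

import signal
import threading
import time

run_flag = threading.Event()
run_flag.set()


def signal_handler(signum, frame):
    # Runs in the main thread when SIGINT arrives; clearing the flag ends the loop below
    print("got signal!!")
    run_flag.clear()


if __name__ == '__main__':
    signal.signal(signal.SIGINT, signal_handler)
    while run_flag.is_set():
        time.sleep(1)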
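
Rotating log files by time

import logging
from logging.handlers import TimedRotatingFileHandler


def create_logger(logname, area, index):
    # One named logger per (area, index); the file rotates every 3 hours and 24 backups are kept
    logger = logging.getLogger("Rotating Log_" + area + '_' + str(index))
    logger.setLevel(logging.DEBUG)
    if logger.handlers:  # getLogger returns the same object for the same name: avoid duplicate handlers
        return logger
    handler = TimedRotatingFileHandler(logname,
                                       when="H",
                                       interval=3,
                                       backupCount=24)
    formatter = logging.Formatter(
        '[%(asctime)s][%(process)d][%(levelname)s] %(message)s')
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger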

Installing pip for different Python versions

It boils down to fetching the get-pip.py that matches your Python version:

wget https://bootstrap.pypa.io/{version}/get-pip.py

For example, to install pip for Python 2.6:

curl --silent --show-error --retry 5 https://bootstrap.pypa.io/2.6/get-pip.py | sudo python

Installing modules for a specific Python version

python2 -m pip install redis

Global Interpreter Lock (GIL)

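The GIL was introduced because CPython's memory management (notably reference counting) is not thread-safe. With only one thread executing Python bytecode at a time, CPython can rest assured there will never be race conditions in its own internal data structures. Python-level code can still race, because a thread switch may happen between the bytecodes of a single statement such as counter += 1.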

The GIL can cause I/O-bound threads to be scheduled ahead of CPU-bound threads, and it prevents signals from being delivered.
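The GIL protects the interpreter, not your own read-modify-write sequences. A minimal sketch (worker and counter are illustrative names) of guarding a shared counter with threading.Lock so that the result is deterministic:

```python
import threading

counter = 0
lock = threading.Lock()


def worker(n):
    global counter
    for _ in range(n):
        # Without the lock, the load/add/store steps of `counter += 1` could interleave across threads
        with lock:
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every worker before reading the result

print(counter)  # 40000
```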

References

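Dynamically adding a method to an existing class

Background: with Protobuf, adding a message object directly to a set() raises an "unhashable" error, so the class needs a __hash__() method: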

def task_result_hash(self):
    return hash(self.task_id + "#" + str(self.exec_time))

task_pb2.TaskResult.__hash__ = task_result_hash
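The same trick can be reproduced without protobuf. This sketch uses a hypothetical TaskResult stand-in: it defines __eq__ but no __hash__, so Python sets __hash__ to None and the instances are unhashable, which is the situation the protobuf classes are in. Assigning a function to the class attribute at runtime makes them hashable.

```python
class TaskResult:
    """Hypothetical stand-in for task_pb2.TaskResult (equality defined, no __hash__)."""

    def __init__(self, task_id, exec_time):
        self.task_id = task_id
        self.exec_time = exec_time

    def __eq__(self, other):
        return (isinstance(other, TaskResult)
                and (self.task_id, self.exec_time) == (other.task_id, other.exec_time))


def task_result_hash(self):
    # Hash on the same fields that __eq__ compares, so equal objects hash equally
    return hash(self.task_id + "#" + str(self.exec_time))


try:
    {TaskResult("a", 1)}
except TypeError as exc:
    print("before patching:", exc)

TaskResult.__hash__ = task_result_hash  # attach the method to the existing class

results = {TaskResult("a", 1), TaskResult("a", 1), TaskResult("b", 2)}
print(len(results))  # 2: the two equal objects collapse into one set entry
```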

Converting a number to a binary string

With format():

"{0:b}".format(37)

Or with the built-in conversion functions (note the 0b / 0x prefixes):

bin(10)
hex(10)
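A few more variants of the same conversions, including zero padding and parsing the string back to an int:

```python
n = 37
print("{0:b}".format(n))   # 100101
print(format(n, "08b"))    # 00100101 (zero-padded to 8 digits)
print(f"{n:#b}")           # 0b100101 (f-string, Python 3.6+, with prefix)
print(bin(10), hex(10))    # 0b1010 0xa
print(int("100101", 2))    # 37 (parse back from base 2)
```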

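Iterating over the elements of a Queue()

Consuming iteration: it removes every item and blocks when the queue is empty. It is effectively a while loop that calls queue.get() until the sentinel value None comes out:

for job in iter(queue.get, None):
  print(job)

Non-consuming iteration (reads the underlying deque directly; not thread-safe if other threads modify the queue at the same time):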

for job in queue.queue:
  print(job)
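Both styles side by side in a single-threaded sketch. The producer has to put the sentinel None itself, otherwise iter(q.get, None) would block forever once the queue is empty.

```python
import queue

q = queue.Queue()
for job in ("a", "b", "c"):
    q.put(job)

# Non-consuming: read the underlying deque; the items stay in the queue
peeked = list(q.queue)
print(peeked, q.qsize())    # ['a', 'b', 'c'] 3

# Consuming: put the sentinel so that iteration stops instead of blocking
q.put(None)
consumed = [job for job in iter(q.get, None)]
print(consumed, q.qsize())  # ['a', 'b', 'c'] 0
```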

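Protobuf: CopyFrom fails between objects of the "same" type

TypeError: Parameter to MergeFrom() must be instance of same class: expected Hello_msg got Hello_msg.

A likely cause: the same *_pb2.py file was imported via two different module paths, so Python sees two distinct classes that happen to share a name.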

A neat replacement for switch

def f(x):
    return {
        'a': 1,
        'b': 2
    }.get(x, 9)    # 9 is default if x not found

Many of the examples at the linked page have problems: some are not thread-safe, some are inefficient, and some raise exceptions.
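The dict.get() idea extends to behavior: store functions instead of values, look one up, then call it, so only the selected branch runs. A minimal sketch; handle_add, handle_sub, handle_unknown and calc are hypothetical names.

```python
def handle_add(a, b):
    return a + b


def handle_sub(a, b):
    return a - b


def handle_unknown(a, b):
    raise ValueError("unknown op")


# Dispatch table: keys are op names, values are functions (not called yet)
_DISPATCH = {
    'add': handle_add,
    'sub': handle_sub,
}


def calc(op, a, b):
    # Look up the function first (with a default), then call only that one
    return _DISPATCH.get(op, handle_unknown)(a, b)


print(calc('add', 3, 4))  # 7
print(calc('sub', 3, 4))  # -1
```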

Python 3: converting between bytes and str

In Python 3, b'abc' is a bytes object and a Unicode string is a str. They behave like this:

>>> type ('abc')
<class 'str'>
>>> type (b'abc')
<class 'bytes'>
>>> type (b'abc'.decode())
<class 'str'>
>>> type(b'1'[0])
<class 'int'>

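What does reading a file return? In text mode ("r"), read() returns str; in binary mode ("rb"), it returns bytes.

Other things to note:

Converting between bytes and str:

b"abcde".decode()  # Python 3 decodes as UTF-8 by default
u"abcde".encode()

Converting a raw "byte array" (not a bytes object but a list of ints) to a string: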

bytes_data = [112, 52, 52]
"".join(map(chr, bytes_data))
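A round trip that covers the points above, including the file-mode question. The temporary file and the sample string are illustrative only.

```python
import os
import tempfile

s = "héllo"
b = s.encode()                # str -> bytes (UTF-8 by default)
print(b)                      # b'h\xc3\xa9llo'
print(b.decode() == s)        # True
print(b[0])                   # 104: indexing bytes gives an int

# A plain list of ints: bytes() turns it into a real bytes object first
bytes_data = [112, 52, 52]
print(bytes(bytes_data).decode())      # p44
print("".join(map(chr, bytes_data)))   # p44 (chr maps code points, so this only matches a decode for ASCII data)

# Reading a file: text mode gives str, binary mode gives bytes
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write(s)
with open(path, "r", encoding="utf-8") as f:
    text = f.read()
with open(path, "rb") as f:
    raw = f.read()
os.remove(path)
print(type(text), type(raw))  # <class 'str'> <class 'bytes'>
```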

Enum types

import enum

@enum.unique
class ChannelConnectivity(enum.Enum):
    IDLE = (_cygrpc.ConnectivityState.idle, 'idle')
    CONNECTING = (_cygrpc.ConnectivityState.connecting, 'connecting')
    READY = (_cygrpc.ConnectivityState.ready, 'ready')
    TRANSIENT_FAILURE = (_cygrpc.ConnectivityState.transient_failure,
                         'transient failure')
    SHUTDOWN = (_cygrpc.ConnectivityState.shutdown, 'shutdown')

Source: the gRPC source code
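The gRPC snippet depends on the internal _cygrpc module. Here is a self-contained sketch of the same pattern (tuple values plus @enum.unique), using a hypothetical Connectivity enum:

```python
import enum


@enum.unique  # raises ValueError at class creation if two members share a value
class Connectivity(enum.Enum):
    # Hypothetical stand-in: each value is (code, description)
    IDLE = (0, 'idle')
    CONNECTING = (1, 'connecting')
    READY = (2, 'ready')


print(Connectivity.READY.value)         # (2, 'ready')
print(Connectivity((1, 'connecting')))  # Connectivity.CONNECTING (lookup by value)
print([m.name for m in Connectivity])   # ['IDLE', 'CONNECTING', 'READY']

try:
    @enum.unique
    class Dup(enum.Enum):
        A = 1
        B = 1   # would silently become an alias of A without @unique
except ValueError as exc:
    print("rejected:", exc)
```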

Python 3 removed dict.iteritems()

 File "/data/common_detector/agent/main.py", line 25, in main
  for k, v in proto_supported.PROTO_DETECTOR_MAP.iteritems():
AttributeError: 'dict' object has no attribute 'iteritems'

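In Python 3.x, both iteritems() and viewitems() have been removed.

In Python 3.x, use items() instead of iteritems(). It returns a view object that a for loop can iterate over without building a list first.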
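A short sketch of items() in Python 3. The view is live: it reflects changes made to the dict after it was created.

```python
d = {"a": 1, "b": 2}

for k, v in d.items():      # Python 3 replacement for d.iteritems()
    print(k, v)

items = d.items()           # a dynamic view, not a list copy
d["c"] = 3
print(len(items))           # 3: the view sees the new key
print(list(items))          # [('a', 1), ('b', 2), ('c', 3)]
```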

Building a list with a two-level for loop

[(d, p) for d in dests for p in ports]
[(x, y) for x in [1, 2] for y in [3, 4]]
print("List comprehension:")
for x, y in [(x, y) for x in a for y in b]:
    print(x, y)

https://www.oreilly.com/library/view/python-cookbook/0596001673/ch01s15.html
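The later for clause is the inner loop, so the comprehension produces the same pairs in the same order as itertools.product:

```python
import itertools

xs, ys = [1, 2], [3, 4]

nested = [(x, y) for x in xs for y in ys]   # outer loop over xs, inner loop over ys
product = list(itertools.product(xs, ys))   # same Cartesian product

print(nested)             # [(1, 3), (1, 4), (2, 3), (2, 4)]
print(nested == product)  # True
```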

Class variables and class methods

class MyClass:
    i = 3

https://stackoverflow.com/questions/68645/are-static-class-variables-possible-in-python

https://zhuanlan.zhihu.com/p/28010894
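A sketch that extends MyClass (the extra members are illustrative) to show class vs. instance variables, and classmethod vs. staticmethod. It also shows the common pitfall: assigning through an instance creates a shadowing instance attribute instead of changing the class variable.

```python
class MyClass:
    i = 3                      # class variable: shared by all instances

    def __init__(self):
        self.j = 0             # instance variable: one per instance

    @classmethod
    def set_i(cls, value):     # class method: receives the class, not an instance
        cls.i = value

    @staticmethod
    def describe():            # static method: no implicit first argument
        return "MyClass helper"


a, b = MyClass(), MyClass()
MyClass.set_i(5)
print(a.i, b.i)             # 5 5 (both read the class attribute)
a.i = 10                    # creates an instance attribute that shadows MyClass.i
print(a.i, b.i, MyClass.i)  # 10 5 5
```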

Checking port connectivity asynchronously

https://gunhanoral.com/python/async-port-check/

Python 3 socket documentation

https://docs.python.org/3/library/socket.html#socket.socket.connect
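A minimal asyncio sketch of the idea (not the linked article's code). asyncio.open_connection, asyncio.wait_for and asyncio.gather are standard library; check_port, check_many, _unused_port and demo are hypothetical names. The demo starts a throwaway local server so it does not depend on the network.

```python
import asyncio
import socket


async def check_port(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        _, writer = await asyncio.wait_for(asyncio.open_connection(host, port), timeout)
    except (OSError, asyncio.TimeoutError):   # refused, unreachable, or timed out
        return False
    writer.close()
    await writer.wait_closed()
    return True


async def check_many(targets, timeout=1.0):
    # All checks run concurrently, so the total time is about one timeout, not one per target
    results = await asyncio.gather(*(check_port(h, p, timeout) for h, p in targets))
    return dict(zip(targets, results))


def _unused_port():
    # Let the OS pick a free port, then release it; nothing should be listening there afterwards
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


async def demo():
    server = await asyncio.start_server(lambda r, w: w.close(), "127.0.0.1", 0)
    open_port = server.sockets[0].getsockname()[1]
    try:
        return await check_many([("127.0.0.1", open_port), ("127.0.0.1", _unused_port())])
    finally:
        server.close()
        await server.wait_closed()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```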

Type checking

isinstance(q, Queue)

Using type() instead of isinstance() for a typecheck.

type(q) is Queue

The difference between isinstance and type:

For example, suppose NewQueue inherits from queue.Queue:

q = NewQueue()
print(type(q) is Queue)  # False
print(isinstance(q, Queue))  # True

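This makes sense: type() returns exactly one value, the object's concrete class, whereas isinstance() takes a candidate class (or a tuple of classes) and also considers the object's base classes.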
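A runnable version of the comparison above, with a hypothetical empty NewQueue subclass. It also shows the tuple form of isinstance():

```python
import queue


class NewQueue(queue.Queue):
    """Hypothetical subclass used only for this comparison."""


q = NewQueue()
print(type(q) is queue.Queue)              # False: exact class only
print(isinstance(q, queue.Queue))          # True: base classes count
print(isinstance(q, (list, queue.Queue)))  # True: a tuple means "any of these"
print(type(q) is NewQueue)                 # True
```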

Python coroutines vs. Go goroutines

https://segmentfault.com/a/1190000021250088

Comparing async event libraries

A quick comparison of Libevent, libev, and libuv

uvloop vs. asyncio

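Some problems with Python 3 dependencies

ModuleNotFoundError: No module named '_ctypes'

Install the libffi development package:

yum install libffi-devel -y

Then rebuild and reinstall Python:

make && make install

Problems caused by not calling join() on threads

RuntimeError: can't register atexit after shutdown

Cause: the main thread did not call join() to wait for its worker threads, so the interpreter began shutting down while they were still running. A worker that then tries to register an atexit callback gets this error; in Python 3.9+ this typically happens when a worker creates a concurrent.futures executor.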
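The fix is to keep the thread objects and join() each one before the main thread moves on or exits. A minimal sketch; task and results are illustrative names:

```python
import threading

results = []
results_lock = threading.Lock()


def task(i):
    with results_lock:
        results.append(i * i)


threads = [threading.Thread(target=task, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # block the main thread until every worker has finished

print(sorted(results))  # [0, 1, 4, 9, 16]
```

Using concurrent.futures.ThreadPoolExecutor as a context manager achieves the same thing, because leaving the with block waits for all submitted work.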

Installing and uninstalling with pip

pip uninstall sacc_tool_pkg

Building your own package

https://packaging.python.org/tutorials/packaging-projects/

python3 setup.py sdist bdist_wheel

Official documentation (Python tutorial)

https://docs.python.org/zh-cn/3/tutorial/index.html

Implementing a singleton

https://www.cnblogs.com/huchong/p/8244279.html
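One common approach, sketched here (not necessarily the variant the linked article recommends), is to override __new__ with double-checked locking so that concurrent first calls still create only one instance. Two caveats: __init__ still runs on every call, and this simple version is not subclass-safe. The Singleton class name is illustrative.

```python
import threading


class Singleton:
    """Hypothetical example class: every instantiation returns the same object."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:              # fast path once the instance exists
            with cls._lock:                    # only one thread may create it
                if cls._instance is None:      # re-check after acquiring the lock
                    cls._instance = super().__new__(cls)
        return cls._instance


a = Singleton()
b = Singleton()
print(a is b)  # True
```

Often the simplest singleton in Python is just a module-level instance, because a module is imported only once per process.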

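Coding conventions

python3 -m pylint

The Google Python Style Guide says to avoid global variables.

Module-level constants are permitted and encouraged. Name them in UPPER_CASE with underscores, and add a leading underscore (e.g. _MAX_RETRIES) for constants meant for internal use only.

Source: Google Python Style Guide

Wrapping long strings: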

pic_url = ("http://graph.facebook.com/{0}/"
           "picture?width=100&height=100".format(me['id']))

The parentheses are required; without them, the two lines would be parsed as two separate statements.
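A runnable version with a hypothetical me dict. Adjacent string literals are joined at compile time, so .format() applies to the whole concatenated string and not just to the second piece:

```python
me = {'id': '42'}  # hypothetical data

pic_url = ("http://graph.facebook.com/{0}/"
           "picture?width=100&height=100".format(me['id']))
print(pic_url)  # http://graph.facebook.com/42/picture?width=100&height=100
```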

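Pylint's "Too few public methods" message means that a class should not merely be a data structure for storing variables; it is expected to provide some public methods. For plain data, consider a namedtuple or a dataclass instead.

Pass the project's root folder to Pylint to check the whole project.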

Python bindings: calling C/C++ code

https://realpython.com/python-bindings-overview/
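The simplest binding option, ctypes (standard library), sketched under the assumption of a POSIX system (Linux/macOS): it calls the C library's strlen. If find_library("c") returns None, CDLL(None) falls back to the symbols already loaded into the process, which include libc on these platforms.

```python
import ctypes
import ctypes.util

# Assumption: POSIX system; locate the C standard library
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly:
# size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # 5
```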

A library for parsing pcap files

import dpkt

https://github.com/kbandla/dpkt

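Running a task periodically

Use the sched module to schedule timed tasks:

Besides the delay and the priority, enter() also accepts the arguments to pass to the function.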

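import sched
import time


def _period_func_wrapper(func, scheduler, delay, priority, *args):
    # Schedule the next run before calling func, so runs start `delay` seconds apart (if func is shorter than delay)
    scheduler.enter(delay, priority, _period_func_wrapper, (func, scheduler, delay, priority, *args))
    func(*args)


def run_process_per_minute(func, *args):
    s = sched.scheduler(time.time, time.sleep)
    _period_func_wrapper(func, s, 60, 1, *args)  # the first call runs immediately
    s.run()  # blocks forever: every run schedules the next one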
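A bounded variant that can be tested, sketched as an alternative rather than a replacement. It schedules a fixed number of runs up front with enterabs() at absolute deadlines (start + k * interval), so a slow call does not push back the later ones. run_periodically is a hypothetical name.

```python
import sched
import time


def run_periodically(func, interval, times, *args):
    """Call func(*args) `times` times, every `interval` seconds, at fixed absolute deadlines."""
    s = sched.scheduler(time.monotonic, time.sleep)
    start = time.monotonic()
    for k in range(times):
        s.enterabs(start + k * interval, 1, func, argument=args)
    s.run()  # returns once every scheduled event has run


calls = []
run_periodically(calls.append, 0.01, 3, "tick")
print(calls)  # ['tick', 'tick', 'tick']
```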

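Pylint message-type letters: C = convention, R = refactor, W = warning, E = error, F = fatal (Pylint could not continue processing), I = informational.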

[x for x in L if x is not None]

https://stackoverflow.com/questions/54867/what-is-the-difference-between-old-style-and-new-style-classes-in-python

https://stackoverflow.com/questions/34619790/pylint-message-logging-format-interpolation

W: 50,15: Catching too general exception Exception (broad-except)
W: 56,12: return statement in finally block may swallow exception (lost-exception)
W:227,15: Catching too general exception Exception (broad-except)
W: 10, 0: Unused import ConfigParser (unused-import)

Creating a temporary file:

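import os
import tempfile

fd, temp_file_path = tempfile.mkstemp()
os.close(fd)  # mkstemp also returns an open OS-level file descriptor; close it to avoid a leak
print("File path: " + temp_file_path)
os.remove(temp_file_path)  # delete a file; os.rmdir(path) removes an empty directory (e.g. from tempfile.mkdtemp())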
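When the temporary file or directory only needs to live for a block of code, the context-manager helpers in tempfile handle cleanup automatically:

```python
import os
import tempfile

# NamedTemporaryFile: the file is deleted when the with block ends
with tempfile.NamedTemporaryFile("w+", suffix=".txt") as f:
    f.write("hello")
    f.flush()
    f.seek(0)
    print(f.read())               # hello
    tmp_path = f.name
print(os.path.exists(tmp_path))   # False

# TemporaryDirectory: the directory and everything in it is removed on exit
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "a.txt"), "w").close()
    print(os.listdir(d))          # ['a.txt']
print(os.path.exists(d))          # False
```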

CodeCC fix log

Pylint

Final newline missing

Used when the last line in a file is missing a newline. (tencent standards/python 1.6.3)

No exception type(s) specified

Used when an except clause doesn't specify exceptions type to catch. (tencent standards/python 2.4.3) Fix: name the exception explicitly:
try:
    return read_config()['global']['appid']
except KeyError:
    return ""

Module ‘sys’ has no ‘_MEIPASS’ member

Used when a variable is accessed for an unexistent member.
Original code:
if getattr(sys, 'frozen', False):
    base_dir = sys._MEIPASS
Fixed version:
if getattr(sys, 'frozen', False) and hasattr(sys, '_MEIPASS'):
    base_dir = sys._MEIPASS

Wildcard import sacc.core

Used when from module import * is detected.

When an import statement in the pattern of from MODULE import * is used it may become difficult for a Python validator to detect undefined names in the program that imported the module. Furthermore, as a general best practice, import statements should be as specific as possible and should only import what they need.

Constant name “base_dir” doesn’t conform to UPPER_CASE naming style

Used when the name doesn't match the regular expression associated to its type (constant, variable, class...). tencent standards/python 1.18.1

Running system commands

import os
stream = os.popen('free -b -t -w')
output = stream.read()
#print(output)
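os.popen works, but subprocess.run is the documented replacement and reports the exit status. A sketch that runs the current Python interpreter so it stays portable. For the original command on Linux, the list would be ["free", "-b", "-t", "-w"].

```python
import subprocess
import sys

# Argument list instead of a shell string: no quoting issues, no shell injection
result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,   # Python 3.7+
    text=True,             # decode stdout/stderr to str
    check=True,            # raise CalledProcessError on a non-zero exit code
)
print(result.returncode)       # 0
print(result.stdout.strip())   # hello from child
```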


```bash
python2 -m pylint --msg-template='{msg_id}:{line:3d},{column}: {obj}: {msg}' express_check.py > pylint.out
```

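Building dicts from a CSV, using the first row as keys

Use next() to take the first line as the keys, then keep iterating over the remaining lines to build the dicts.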

with open("titanic.txt") as f:  # avoid the name csv, which shadows the csv module
    keys = next(f).strip().split(",")  # the first line holds the column names
    print([{k: v for k, v in zip(keys, row.strip().split(","))} for row in f])
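The standard csv.DictReader does the same thing and also copes with quoted fields that contain commas, which a plain split(",") breaks on. An io.StringIO with made-up rows stands in for the file here:

```python
import csv
import io

# Stand-in for open("titanic.txt", newline=""); the first data row has a quoted comma
data = io.StringIO('name,age\n"Braund, Mr. Owen",22\nHeikkinen,26\n')

rows = list(csv.DictReader(data))  # the first row becomes the keys
print(rows[0]["name"])  # Braund, Mr. Owen
print(rows[1]["age"])   # 26
```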

https://stackoverflow.com/questions/14503973/python-global-keyword-vs-pylint-w0603

REPL

https://zh.wikipedia.org/wiki/%E8%AF%BB%E5%8F%96%EF%B9%A3%E6%B1%82%E5%80%BC%EF%B9%A3%E8%BE%93%E5%87%BA%E5%BE%AA%E7%8E%AF

https://stackoverflow.com/a/5599313/3801587

When Python detects the “exec statement”, it will force Python to switch local storage from array to dictionary. However since “exec” is a function in Python 3.x, the compiler cannot make this distinction since the user could have done something like “exec = 123”.

Consequence: in Python 3, you cannot dynamically create local variables inside a function (for example with exec()).
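A small demonstration, with illustrative function names. Because x is never assigned in the function body, the compiler treats it as a global name, so the value exec() stores is not visible. Passing exec() an explicit dictionary is the usual workaround.

```python
def assign_with_exec():
    exec("x = 42")           # writes into a locals() dictionary, not a real local variable
    try:
        return x             # compiled as a global lookup, since x is never assigned here
    except NameError:
        return "NameError"


def assign_with_namespace():
    ns = {}
    exec("x = 42", {}, ns)   # give exec an explicit dict to write into
    return ns["x"]


print(assign_with_exec())       # NameError
print(assign_with_namespace())  # 42
```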

# Install pip (Python 2.7)
wget https://bootstrap.pypa.io/pip/2.7/get-pip.py
python get-pip.py --user

# Upgrade setuptools to fix the issue described in the link below
# > 'install_requires' must be a string or list ...
# https://github.com/sdispater/pendulum/issues/187#issuecomment-375820769
python -m pip install setuptools -U --user

# Install requests from source (GitHub no longer serves the git:// protocol)
git clone https://github.com/psf/requests.git
cd requests/
python setup.py install --user
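Compare with zip(), which pairs elements position by position instead of producing every combination:
print("Zip:")
for x, y in zip(a, b):
    print(x, y)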

python2 -m pip install --upgrade pip

pylint --disable=R,C x.py

[sum(x) for x in zip(*blocked_lines)]
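Here zip(*rows) transposes a list of rows into columns, so the expression above computes column sums. The blocked_lines data below is made up:

```python
blocked_lines = [
    [1, 2, 3],
    [4, 5, 6],
]
col_sums = [sum(x) for x in zip(*blocked_lines)]  # zip(*rows) yields (1, 4), (2, 5), (3, 6)
print(col_sums)  # [5, 7, 9]
```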

https://stackoverflow.com/questions/3279560/reverse-colormap-in-matplotlib

A hard-to-spot bug

queue_name += '_QUEUE_NORMAL',
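The trailing comma turns the right-hand side into the one-element tuple ('_QUEUE_NORMAL',). What happens next depends on the type of the left-hand side, and two of the three cases fail silently:

```python
queue_name = "ORDER"
try:
    queue_name += '_QUEUE_NORMAL',   # str += tuple -> TypeError
except TypeError as exc:
    print("TypeError:", exc)

names = ["ORDER"]
names += '_QUEUE_NORMAL',            # list += tuple silently extends the list
print(names)                         # ['ORDER', '_QUEUE_NORMAL']

value = "ORDER" + '_QUEUE_NORMAL',   # a plain assignment silently builds a tuple
print(value)                         # ('ORDER_QUEUE_NORMAL',)
```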


Common directory functions

os.listdir("/usr")
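A few more directory helpers next to os.listdir, run inside a throwaway temporary directory so the output does not depend on the machine:

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "deep"))          # create nested directories
    open(os.path.join(root, "sub", "a.txt"), "w").close()

    top = sorted(os.listdir(root))                        # names only, not full paths
    for dirpath, dirnames, filenames in os.walk(root):    # recursive, top-down walk
        print(os.path.relpath(dirpath, root), sorted(filenames))
    found = [p.name for p in pathlib.Path(root).rglob("*.txt")]  # recursive glob

print(top, found)  # ['sub'] ['a.txt']
```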

pyenv

https://github.com/pyenv/pyenv

https://stackoverflow.com/questions/5067604/determine-function-name-from-within-that-function-without-using-traceback

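Get the absolute path of the directory containing the current file

Get the caller's file name and line number

Get the line number of the current statement

Get the name of the current function

Get the caller's function name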
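A sketch covering the five items above with the standard inspect module. It assumes CPython: inspect.currentframe() may return None on interpreters without stack-frame support. The helper names are hypothetical, and sys._getframe(1) is a faster but private alternative.

```python
import inspect
import os


def script_dir():
    """Absolute path of the directory containing this file (only valid when run from a file)."""
    return os.path.dirname(os.path.abspath(__file__))


def current_line():
    """Line number of the statement that called current_line()."""
    return inspect.currentframe().f_back.f_lineno


def current_function_name():
    """Name of the function that called current_function_name()."""
    return inspect.currentframe().f_back.f_code.co_name


def caller_info():
    """(file name, line number, function name) of the caller of the function that calls this."""
    frame = inspect.currentframe().f_back.f_back   # skip caller_info's frame, then its caller's
    return frame.f_code.co_filename, frame.f_lineno, frame.f_code.co_name


def worker():
    return current_function_name(), caller_info()[2]


def boss():
    return worker()


print(boss())  # ('worker', 'boss')
```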

Notes from code reviews

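  1. Unnecessary class wrapper

From the context, this class has just one function and one hard-coded URL; there is no reason for it to exist.

  2. URL/password hard-coded in the source file

  3. try block too large: the exception can only come from the first line of the try, so the logic after it should be moved out

  4. Meaningless raise

If a finally clause is present, it specifies a "cleanup" handler. The try clause is executed, including any except and else clauses. If an unhandled exception occurs in any of these clauses, it is saved temporarily. If the finally clause executes a return, break or continue statement, the saved exception is discarded. That is why the raise in the except block below never reaches the caller: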

try:
    ...
except requests.exceptions.RequestException as e:
    logger.error(traceback.format_exc())
    raise RequestDataError('API Request failed {}'.format(str(e)))
finally:
    return user_label  # this return discards the RequestDataError raised above
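A self-contained reproduction of item 4. KeyError and RuntimeError stand in for the request and RequestDataError exceptions, and the function names are illustrative. Moving the return out of finally lets the exception propagate. (Python 3.14+ also emits a SyntaxWarning for return inside finally.)

```python
def fetch():
    user_label = "user_label"
    try:
        raise KeyError("boom")
    except KeyError as e:
        raise RuntimeError("API request failed: {}".format(e))
    finally:
        return user_label   # swallows the RuntimeError: the caller never sees it


def fetch_fixed():
    user_label = "user_label"
    try:
        raise KeyError("boom")
    except KeyError as e:
        raise RuntimeError("API request failed: {}".format(e)) from e
    return user_label       # only reached when nothing was raised


print(fetch())  # user_label
try:
    fetch_fixed()
except RuntimeError as exc:
    print("propagated:", exc)
```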
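  5. Unused member variables

  6. Function and variable names do not express what they do

    def check_redis(self, redis_key, input_text):
        """Check whether the template is in the cache"""
        ...

    def check_once(self, username, input_text):
        """Check whether this is an unsubscribe request and whether it is a help request"""
        ...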
  7. Coding style
    if tmp["by_relation"] in [u'业务属于', u'维护人']:
  8. Some statements are too verbose
if xx is not None:
  pass
answer = answer + data["opinion_type"]
if nick["Workspace"].get("status") is not None and nick["Workspace"]["status"] == "normal":
for i in range(len(t)):
    t[i].start()
record_num, record_data, user_list = t.record_false_job_result(user_id, biz_id)
return record_num, record_data, user_list
if 'start_timestamp' not in req.keys() or 'end_timestamp' not in req.keys():
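Possible concise rewrites for item 8, sketched with hypothetical data standing in for the reviewed code's variables:

```python
import threading

# Hypothetical data standing in for the reviewed code's variables
data = {"opinion_type": "B"}
nick = {"Workspace": {"status": "normal"}}
req = {"start_timestamp": 1}

answer = "A"
answer += data["opinion_type"]   # instead of: answer = answer + data["opinion_type"]

# .get() returns None for a missing key, and None == "normal" is False,
# so the separate "is not None" test adds nothing
is_normal = nick["Workspace"].get("status") == "normal"

# Iterate over the objects, not over their indices
threads = [threading.Thread(target=lambda: None) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# `in` works on the dict itself; .keys() is unnecessary
missing = any(k not in req for k in ("start_timestamp", "end_timestamp"))

# Likewise, unpacking a result only to return it again is just:
#     return t.record_false_job_result(user_id, biz_id)
# and an `if ...: pass` block can simply be deleted.

print(answer, is_normal, missing)  # AB True True
```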
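  9. Insensitive to backend latency

Here every loop iteration fetches a Redis connection and checks whether a key exists, one separate Redis request per ccid, which adds considerable response latency.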

for ccid in ccid_list:
    is_exist, cash_dict = GetRedis().get_reids_key_exist(redis_key, warn_text
  10. A branch that is always True (answer is a str, so answer != [] is always True; ans_list was probably intended)
answer = self.no_match.get_text_answer(input_text)
ans_list = answer.split(u".")
if answer != []:
  ...
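  11. Poor class design

The base class Record abstracts only the update time, while its derived classes share a lot of common information. For example, four statements such as self.game_id = user_account.game_id if user_account else None appear in each of the six derived classes that follow, and every one of their __init__() methods takes a user_account parameter. This duplication can be removed by inheriting from an intermediate class.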

class RecordItem(object):
    def __init__(self):
        self.update_time = 0
 
 
class JudgeUserCoreRecord(RecordItem):
    def __init__(self, user_account=None):
        super(JudgeUserCoreRecord, self).__init__()
        # Core user info
        self.game_id = user_account.game_id if user_account else None
        self.user_id = user_account.user_id if user_account else None
        self.account_type = user_account.account_type if user_account else None
        self.account_info = user_account.account_info if user_account else None
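A sketch of the intermediate-class refactoring suggested in item 11. UserAccountRecord is a hypothetical name, and a SimpleNamespace plays the role of user_account. The shared fields live in one place, so each derived record keeps only what is specific to it.

```python
from types import SimpleNamespace


class RecordItem(object):
    def __init__(self):
        self.update_time = 0


class UserAccountRecord(RecordItem):
    """Hypothetical intermediate class holding the fields shared by account-based records."""
    _ACCOUNT_FIELDS = ("game_id", "user_id", "account_type", "account_info")

    def __init__(self, user_account=None):
        super(UserAccountRecord, self).__init__()
        for field in self._ACCOUNT_FIELDS:
            # Copy each field from user_account, or use None when no account is given
            setattr(self, field, getattr(user_account, field) if user_account else None)


class JudgeUserCoreRecord(UserAccountRecord):
    pass  # only record-specific fields and methods would remain here


acc = SimpleNamespace(game_id=1, user_id=2, account_type="qq", account_info="x")  # fake account
r = JudgeUserCoreRecord(acc)
print(r.game_id, r.user_id, r.update_time)  # 1 2 0
print(JudgeUserCoreRecord().game_id)        # None
```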
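  12. A more elegant way to write: if xx is None: xx = "xxx" (caveat: or also replaces other falsy values such as '', 0 or [], not only None)
xx = xx or "xxx"
evidence_type = evidence_type or ''  # handle None values
  13. Class encapsulation lacks structure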
class CaseCoreRecord(CaseRecordItem):
    def __init__(self):
        super(CaseCoreRecord, self).__init__()
        self.demandant_id = None
        self.defendant_id = None
        self.demandant_role = None
        self.defendant_role = None
        self.demandant_uid = None
        self.defendant_uid = None
        self.demandant_account_type = None
        self.defendant_account_type = None
        self.demandant_account_info = None
        self.defendant_account_info = None
  14. Conditional expressions
return False if resp == True else True
records = filter(lambda x: not xxx(x), records)

can be simplified to

return not resp  # equivalent when resp is a bool
records = [x for x in records if not xxx(x)]
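  15. Repeated expensive expressions

The if condition evaluates a relatively expensive expression whose result is needed afterwards, so once the condition holds, the same expression is computed a second time. This hurts both performance and readability.

if old_type_patt.match(line):
    old_type = int(old_type_patt.match(line).group(1).strip())
    if is_free_punish_patt.match(line):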
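One way to evaluate the match only once, sketched with a hypothetical pattern and input line. The assignment expression needs Python 3.8+; the older form assigns first and then tests.

```python
import re

# Hypothetical pattern and input line, for illustration only
old_type_patt = re.compile(r"old_type=(\s*\d+)")
line = "old_type= 7;free_punish=1"

# Python 3.8+: bind the match object once and reuse it
if (m := old_type_patt.match(line)):
    old_type = int(m.group(1).strip())
    print(old_type)  # 7

# Before Python 3.8: assign first, then test
m = old_type_patt.match(line)
if m:
    old_type = int(m.group(1).strip())
```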
if not limit.get('user_msg') or limit['user_msg'] == 'null':

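can be simplified to

if limit.get('user_msg', 'null') == 'null':

(only equivalent if user_msg, when present, is never an empty string or None)

md5 = evidence[evidence.find('md5'):].strip().split('&')[0].split('=')[1]

This code is really trying to match the pattern r".\bmd5=(xxx)[&$].". It is hard to read, and it can actually pick up the wrong field.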

Other code in this function has the same problem, and try is overused as well.
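A sketch of the md5 problem with a made-up evidence string. find('md5') stops at the first "md5" substring, even when it sits inside another key. Anchoring the key with a regex, or splitting a real query string with urllib.parse.parse_qs, picks the right field.

```python
import re
from urllib.parse import parse_qs

evidence = "file=xmd5=bad&md5=abc123&size=10"  # hypothetical input

# The original approach: find('md5') hits the "md5" inside "xmd5" first
naive = evidence[evidence.find('md5'):].strip().split('&')[0].split('=')[1]
print(naive)  # bad

# Anchor the key: "md5=" must be at the start or right after "?" or "&"
m = re.search(r"(?:^|[?&])md5=([^&]*)", evidence)
md5 = m.group(1) if m else None
print(md5)  # abc123

# For a plain query string, the standard library can do the splitting
print(parse_qs(evidence).get("md5"))  # ['abc123']
```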

Lesson learned: list.count(x) returns the number of times x occurs in the list.

@property

Turns a method of a class into a read-only attribute (unless a setter is also defined, as below).

from werkzeug.security import generate_password_hash, check_password_hash


class User:
    @property
    def password(self):
        raise AttributeError('password is not readable attribute')

    @password.setter
    def password(self, password):
        self.password_hash = generate_password_hash(password)

    def verify_password(self, password):
        return check_password_hash(self.password_hash, password)

Use case: an attribute that can be set but not read (here the password; only its hash is stored).
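The same write-only property pattern, sketched with only the standard library (PBKDF2 from hashlib) instead of werkzeug so that it runs anywhere. The iteration count and salt size are illustrative choices, not recommendations taken from the original.

```python
import hashlib
import hmac
import os


class User:
    @property
    def password(self):
        raise AttributeError('password is not a readable attribute')

    @password.setter
    def password(self, password):
        self._salt = os.urandom(16)
        self.password_hash = hashlib.pbkdf2_hmac(
            "sha256", password.encode(), self._salt, 100_000)

    def verify_password(self, password):
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), self._salt, 100_000)
        return hmac.compare_digest(candidate, self.password_hash)  # constant-time comparison


u = User()
u.password = "s3cret"                # goes through the setter
print(u.verify_password("s3cret"))   # True
print(u.verify_password("wrong"))    # False
try:
    _ = u.password                   # reading goes through the getter and raises
except AttributeError as exc:
    print(exc)                       # password is not a readable attribute
```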

super(JudgeUserCoreInfo, self)

Designing classes tied to the data:

class JudgeUserSeasonRecord(RecordItem):
 
    def __init__(self):
        super(JudgeUserSeasonRecord, self).__init__()
        self.season_right_punish_case_num = 0
        self.season_total_case_num = 0
        self.season_right_case_num = 0
        self.season_wrong_case_num = 0
        self.season_accuracy = 0
 
    @staticmethod
    def init_value(record):
        record.season_accuracy = -1
        return record

record = get_cache_record(cache_record_cls=JudgeUserSeasonRecord)
if record is None:
    record = JudgeUserSeasonRecord.init_value(JudgeUserSeasonRecord())

class Cache(kv_client.KVJSONObject):
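    """KV-storage-backed cache
 
    Hides the JSON format of values in the KV store; accepts a class instance as input and stores that instance's attribute fields in the cache.
    On read, the cached contents are stored back as attributes of a class instance.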
 
    """
    _KV_PREFIX = "gs:1029:court:cache:"
    _CACHE_FIELD = 'cache_info'
 
    def __init__(self, game_id, user_id, extra_prefix=None):
        """Initialize the cache
        :param game_id: game id
        :param user_id: user id
        :param extra_prefix: key prefix specified by the business layer
        """
        self.user_id = user_id
        self.game_id = game_id
        key_prefix = self._KV_PREFIX + get_cache_env_prefix()
        if extra_prefix is not None:
            key_prefix += extra_prefix
        super(Cache, self).__init__("court", str(game_id) + '_' + str(user_id), key_prefix)
 
    @property
    def cache_info(self):
        """Get the cache info"""
        if not self.value or not self.value.get(self._CACHE_FIELD):
            return None
        return self.value[self._CACHE_FIELD]
 
    def update_cache_info(self, cache_record, expire_sec):
        """Update the cache contents
        :param cache_record: record to write into the cache; it must have a __dict__ attribute. Fields whose value is None are not updated
        :param expire_sec: expiration time, in seconds
        :return: None
        """
        if not self.value:
            self.value = {}
        if self._CACHE_FIELD not in self.value:
            self.value[self._CACHE_FIELD] = {}
        cache_info_dict = self.value[self._CACHE_FIELD]
        for key, val in cache_record.__dict__.items():
            if val is None:
                continue
            cache_info_dict[key] = val
        self.save_value(expire=expire_sec)
 
    def get_cache_record(self, cache_record_cls):
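        """Get the record stored in the cache
 
        Reads the cache contents and stores them in an instance of cache_record_cls.
        The fields read are limited to the attributes that cache_record_cls defines; if
        a field required by cache_record_cls is not in the cache, it is filled with None.
 
        :param cache_record_cls: record class whose instance receives the cache contents
        :return: an instance of cache_record_cls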
        """
        cache_info_dict = self.cache_info
        if cache_info_dict is None:
            return None
        cache_record = cache_record_cls()
        for key in cache_record.__dict__:
            setattr(cache_record, key, cache_info_dict.get(key, None))
        return cache_record

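Benefits of using super()

  1. With single inheritance, super() makes the code easier to maintain: if the base class is renamed, only the base class in the class statement needs to change, while every call already written as super() stays untouched.
  2. With multiple base classes or multi-level inheritance, a chain of super() calls invokes the method of every class in the MRO in one go.
  3. With diamond inheritance, it guarantees that each class's method is called exactly once.
  4. Inside class C, super() is equivalent to super(C, self). Passing another type, as in super(ParentType, self), starts the lookup after ParentType in the instance's MRO, skipping every class up to and including ParentType.
  5. What happens if the methods in the different base classes take different parameters?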
class A(object):
    def __init__(self):
        super().__init__()
        print('A')


class B(A):
    def __init__(self):
        super().__init__()
        print('B')


class C(A):
    def __init__(self):
        super().__init__()
        print('C')


class D(C, B):
    def __init__(self):
        super().__init__()
        print('D')


class E(D):
    def __init__(self):
        super(C, self).__init__()  # lookup starts after C in E's MRO (E, D, C, B, A), so D and C are skipped: prints A, B, E
        print('E')


d = D()
print('###')
e = E()
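For question 5, the usual answer is cooperative multiple inheritance. Each __init__ takes the arguments it needs by keyword and forwards the rest with **kwargs, so every class in the MRO gets its own arguments. A sketch with illustrative class names:

```python
class Base(object):
    def __init__(self, **kwargs):
        # Last stop before object: object.__init__() accepts no extra arguments
        super().__init__()


class Named(Base):
    def __init__(self, name=None, **kwargs):
        self.name = name
        super().__init__(**kwargs)   # pass the remaining arguments along the MRO


class Aged(Base):
    def __init__(self, age=None, **kwargs):
        self.age = age
        super().__init__(**kwargs)


class Person(Named, Aged):
    pass


p = Person(name="Ann", age=30)
print(p.name, p.age)                         # Ann 30
print([c.__name__ for c in Person.__mro__])  # ['Person', 'Named', 'Aged', 'Base', 'object']
```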

https://stackoverflow.com/questions/21639788/difference-between-super-and-calling-superclass-directly

@PiyushKansal Inside a class C, super() does the same thing as super(C, self). Python 3 introduced this shorcut that you don’t need to specify the parameters for the default cases. In most cases (around 99%) you want to use just super(). By passing a different type to it, e.g. super(ParentType, self), you would be skipping types in the MRO, so that’s probably not what you want to do.

In redis-py, a pipeline submits multiple commands in one batch, and you can choose whether to run them as a transaction:

kv_ins = kv_client.RedisClient.get_instance(name)
# Batch without a transaction (the Interactive Entertainment distributed Redis does not support transactions yet)
pipe = kv_ins.client.pipeline(False)
for incr_item in incr_list:
    inc_key = incr_item['inc_key']
    delta_num = incr_item['delta_num']
    expire = incr_item['expire']
    pipe_key = inc_key
    app_log.info('incr key:%s|%s|%s', pipe_key, delta_num, expire)
    pipe.incrby(name=pipe_key, amount=delta_num)
    expire = None if int(expire) < 0 else expire
    if expire is not None:
        pipe.expire(name=pipe_key, time=expire)
pipe.execute()

Reference: https://github.com/redis/redis-py#pipelines

Generating unique IDs with the uuid module

https://docs.python.org/3/library/uuid.html

uuid.uuid4() Generate a random UUID.
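A quick look at what uuid4() returns and the two usual string forms:

```python
import uuid

u = uuid.uuid4()     # random (version 4) UUID
print(u.version)     # 4
print(len(str(u)))   # 36: 32 hex digits plus 4 hyphens
print(len(u.hex))    # 32: the same value without hyphens
```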

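Re-raising the caught exception in an exception handler

try:
    do_something()      # hypothetical call that may fail
except Exception:
    ...                 # processing, e.g. logging or cleanup
    raise               # a bare raise re-raises the caught exception with its original traceback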
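A runnable sketch comparing a bare raise (same exception, original traceback) with raise ... from (a new exception that keeps the original as __cause__). parse_port and load_port are hypothetical names.

```python
import logging


def parse_port(text):
    try:
        return int(text)
    except ValueError:
        logging.getLogger(__name__).warning("bad port: %r", text)  # some processing...
        raise   # ...then re-raise the very same ValueError


def load_port(text):
    try:
        return parse_port(text)
    except ValueError as exc:
        # Wrap it in a higher-level error while keeping the original as __cause__
        raise RuntimeError("invalid config") from exc


try:
    load_port("abc")
except RuntimeError as err:
    cause = err.__cause__   # keep a reference: `err` is unbound after the except block
print(type(cause).__name__)  # ValueError
```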

Coroutines

https://www.aeracode.org/2018/02/19/python-async-simplified/

https://docs.python.org/zh-cn/3/library/asyncio-task.html

https://docs.python.org/zh-cn/3/library/asyncio-eventloop.html#creating-futures-and-tasks

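Simply calling a coroutine does not schedule it to run. For example:

async def main(): ...
main()  # does not run: it only creates a coroutine object (Python later warns that it was never awaited)

It has to be run with asyncio.run(main()), awaited from another coroutine, or scheduled as a task with asyncio.create_task().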
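The three ways of running a coroutine, side by side in one small sketch (work and main are illustrative names):

```python
import asyncio


async def work(name, delay):
    await asyncio.sleep(delay)
    return name


async def main():
    task = asyncio.create_task(work("task", 0.01))                  # scheduled right away
    direct = await work("awaited", 0.01)                            # runs when awaited
    both = await asyncio.gather(work("a", 0.01), work("b", 0.01))   # run concurrently
    return [await task, direct, *both]


print(asyncio.run(main()))  # ['task', 'awaited', 'a', 'b']
```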

        for i in res_list:
            rule = i['name']
            for res in i["result"]:
                metric = res["metric"]
                for value in res["values"]:
                    timestamp = value[0]
                    val = value[1]
                    utc_date = self.pktime_to_utc(timestamp)
                    point_list.append(Point(activity_name) \
                        .tag("metric", metric) \
                        .field(rule, val) \
                        .time(utc_date, WritePrecision.S))

How could the code above be written more elegantly?
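One possible answer, as a sketch: move the nested traversal into a generator that yields flat tuples, so building the points becomes a single comprehension. The response shape is inferred from the loop above. The Point construction (influxdb_client) is left as a comment so the example stays self-contained.

```python
def iter_samples(res_list):
    """Yield (rule, metric, timestamp, value) for every sample in the nested response."""
    for item in res_list:
        rule = item["name"]
        for res in item["result"]:
            for timestamp, val in res["values"]:   # unpack each [ts, val] pair directly
                yield rule, res["metric"], timestamp, val


# With influxdb_client, the original loop would then reduce to something like:
# point_list = [Point(activity_name).tag("metric", m).field(r, v)
#                   .time(self.pktime_to_utc(ts), WritePrecision.S)
#               for r, m, ts, v in iter_samples(res_list)]

res_list = [{"name": "cpu", "result": [{"metric": "host1", "values": [[1, "0.5"], [2, "0.7"]]}]}]
print(list(iter_samples(res_list)))  # [('cpu', 'host1', 1, '0.5'), ('cpu', 'host1', 2, '0.7')]
```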

    def __init__(self, config_file, filter_chain=[]):
        ...

Why does it need to be changed to the version below?

    def __init__(self, config_file, filter_chain=None):
        # TODO validate config file format
        if filter_chain is None:
            filter_chain = []
        ...
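The reason: a default value is evaluated once, when the def statement runs, so a mutable default such as [] is shared by every call that does not pass the argument. A minimal demonstration with illustrative function names:

```python
def append_bad(item, bucket=[]):      # the default list is created once, at def time
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    if bucket is None:                 # a fresh list on every call
        bucket = []
    bucket.append(item)
    return bucket


print(append_bad(1), append_bad(2))    # [1, 2] [1, 2] (both calls return the same shared list)
print(append_good(1), append_good(2))  # [1] [2]
```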

Modules and packages

Pass by value vs. pass by reference

As we know, in Python, “Object references are passed by value”.

listA = [0]
listB = listA
listB.append(1)
print(listA)

https://robertheaton.com/2014/02/09/pythons-pass-by-object-reference-as-explained-by-philip-k-dick/
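The same idea with function calls: a function receives a reference to the caller's object. Mutating that object is visible to the caller, but rebinding the parameter name only changes the local variable. mutate and rebind are illustrative names.

```python
def mutate(lst):
    lst.append(1)      # changes the object the caller also references


def rebind(lst):
    lst = [99]         # only rebinds the local name; the caller is unaffected


a = [0]
mutate(a)
rebind(a)
print(a)  # [0, 1]
```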

Publishing your own package

Python Package Index (PyPI).

https://packaging.python.org/en/latest/tutorials/packaging-projects/

https://github.com/panzhongxian/markdown_image_replacer

https://python-packaging-tutorial.readthedocs.io/en/latest/setup_py.html

https://www.freecodecamp.org/news/how-to-create-and-upload-your-first-python-package-to-pypi/

https://towardsdatascience.com/how-to-upload-your-python-package-to-pypi-de1b363a1b3

python3 -m twine upload --repository testpypi dist/*