2016微软苏州SDE实习生面试

今天下午参加了微软的远程面试,在这里记录一下当作是面经吧。

在线笔试

4月6号晚上参加了在线笔试,总人数将近6000人,跟往年一样还是4道题。第一题毫无意外是水题,题目描述如下:

Steven loves reading book on his phone. The book he reads now consists of N paragraphs and the i-th paragraph contains ai characters.

Steven wants to make the characters easier to read, so he decides to increase the font size of characters. But the size of Steven’s phone screen is limited. Its width is W and height is H. As a result, if the font size of characters is S then it can only show ⌊W / S⌋ characters in a line and ⌊H / S⌋ lines in a page. (⌊x⌋ is the largest integer no more than x)

So here’s the question, if Steven wants to control the number of pages no more than P, what’s the maximum font size he can set? Note that paragraphs must start in a new line and there is no empty line between paragraphs.

这道题基本上属于阅读理解题,先假设段落之间不换行的情况下算出字号的上界,然后再不断减小字号,判断是否符合题目要求。碰到第一个符合要求的就是最大的字号了。

第一题最后有2000多人AC,基本上属于送分题。然后接下来的3题就开始拉差距了。

第二题类似于实现iptables的检测功能,题目描述如下:

Little Hi runs a web server. Sometimes he has to deny access from a certain set of malicious IP addresses while his friends are still allow to access his server. To do this he writes N rules in the configuration file which look like:

allow 1.2.3.4/30
deny 1.1.1.1
allow 127.0.0.1
allow 123.234.12.23/3
deny 0.0.0.0/0
Each rule is in the form: allow | deny address or allow | deny address/mask.

When there comes a request, the rules are checked in sequence until the first match is found. If no rule is matched the request will be allowed. Rule and request are matched if the request address is the same as the rule address or they share the same first mask digits when both written as 32bit binary number.

For example IP “1.2.3.4” matches rule “allow 1.2.3.4” because the addresses are the same. And IP “128.127.8.125” matches rule “deny 128.127.4.100/20” because 10000000011111110000010001100100 (128.127.4.100 as binary number) shares the first 20 (mask) digits with 10000000011111110000100001111101 (128.127.8.125 as binary number).

Now comes M access requests. Given their IP addresses, your task is to find out which ones are allowed and which ones are denied.

输入和数据范围是:

Line 1: two integers N and M.
Line 2-N+1: one rule on each line.
Line N+2-N+M+1: one IP address on each line.
All addresses are IPv4 addresses(0.0.0.0 - 255.255.255.255). 0 <= mask <= 32.
For 40% of the data: 1 <= N, M <= 1000.
For 100% of the data: 1 <= N, M <= 100000.

看到60%的数据是比较大规模的时候我就知道坑来了,估计$O(n^2)$的算法是不能AC的,应该要做出$O(nlogn)$的才行。但是这也没办法啊,刷题刷得太少,一下子没想到最优的算法,只好先硬着头皮写$O(n^2)$的解法。最naive的思路就是每读入一个IP,就从规则列表中从前往后匹配,如果匹配到了,就根据规则返回结果。如果全部都没匹配到,就返回deny。提交代码之后,果然TLE了,不过幸好hiloCoder上还会按照通过的样例数量给分,前面40%的小数据用$O(n^2)$的算法没有问题,拿到了40分。笔试结束之后,跟同学讨论就发现这其实就是Trie树的经典题目。看来还是要努力提高姿势水平啊!

第三题是机器人走迷宫的题目,题目描述如下:

You work as an intern at a robotics startup. Today is your company’s demo day. During the demo your company’s robot will be put in a maze and without any information about the maze, it should be able to find a way out.

The maze consists of N * M grids. Each grid is either empty(represented by ‘.’) or blocked by an obstacle(represented by ‘b’). The robot will be release at the top left corner and the exit is at the bottom right corner.

Unfortunately some sensors on the robot go crazy just before the demo starts. As a result, the robot can only repeats two operations alternatively: keep moving to the right until it can’t and keep moving to the bottom until it can’t. At the beginning, the robot keeps moving to the right.

rrrrbb..
…r…. ====> The robot route with broken sensors is marked by ‘r’.
…rrb..
…bb…

While the FTEs(full-time employees) are busy working on the sensors, you try to save the demo day by rearranging the maze in such a way that even with the broken sensors the robot can reach the exit successfully. You can change a grid from empty to blocked and vice versa. So as not to arouse suspision, you want to change as few grids as possible. What is the mininum number?

大致题意就是机器人突然坏了,只能向右走一直到走不动为止,或者向下走一直到走不动为止。在迷宫原有的墙壁设置当中,你需要最少改变多少块墙壁的设置,才能让机器人从左上角走到右下角。

看到这个描述,脑子里觉得这应该是DP的题目,然而还是因为姿势水平不够高,到最后都没想出来DP的递推式……

最后一题的题意是在MineCraft这种游戏中搭建积木,根据输入的积木坐标判断这种搭建顺序是否合法。主要的限制条件就是:

  1. 新的积木要么放在平地上,要么放在已有的积木上,不能出现悬空的情况。
  2. 新的积木放置的位置必须要从无穷远处可以直接看到,不能出现新的积木塞进一个空的立方体内这种情况。

原文描述我也贴一下:

Little Hi is playing a sandbox voxel game. In the game the whole world is constructed by massive 1x1x1 cubes. The edges of cubes are parallel to the coordinate axes and the coordinates (x, y, z) of the center of each cube are integers.

At the beginning there is nothing but plane ground in the world. The ground consists of all the cubes of z=0. Little Hi needs to build everything by placing cubes one by one following the rules:

  1. The newly placed cube must be adjacent to the ground or a previously placed cube. Two cubes are adjacent if and only if they share a same face.
  1. The newly placed cube must be accessible from outside which means by moving in 6 directions(up, down, left, right, forward, backward) there is a path from a very far place - say (1000, 1000, 1000) in this problem - to this cube without passing through ground or other cubes.

Given a sequence of cubes Little Hi wants to know if he can build the world by placing the cubes in such order.

这道题是彻底没有头绪,因为以前做过的几何题太少了,完全不知道几何题常用的那些算法。

笔试结束之后,第一题AC了2000多人,第二题AC了200多人,第三题AC的比第二题还多,大约是300多人,最后一题只AC了几十人。算了一下第二题和第三题AC的人加起来都600多人了,我估计自己拿140分应该是没戏了,就当是来学习学习吧。

面试

就在完全没抱有能够面试的想法的前提下,4月17号晚上竟然收到面试通知邮件了,而且是在QQ邮箱的垃圾邮件里面找到的。幸好我还有检查垃圾邮件的习惯,不然真的坑死爹了。

面试初定的时间是4月26号早上9点,但是平常这个时间都还没回到实验室,所以就回复了邮件希望改一下面试时间。HR第二天就回复了我说面试时间调整到了4月26号下午2点。

微软的面试在自己家的Skype for Business平台上面进行,所以可以很方便地一边问答,一边写代码。

今天下午2点,大约等了10分钟,一面的面试官加入到在线会议的房间开始面试。基本自我介绍之后,面试官说他其实也是中大毕业的,能碰到师兄实在是令人放松了一些。他问了一下我是哪个老师的学生之后,竟然还知道吴迪老师的学生是在513实验室,看来导师的知名度很高呢!聊完前面的基本情况,按照套路就该写代码了。一面就只写了一道题——“判断字符串是否为回文串”。字符串由ASCII码表内的字符构成,判断的时候仅考虑a-z、A-Z和0-9的字符,其他字符一律跳过,而且判断的时候不区分大小写。这是标准的LeetCode原题,而且也不难,要是做不出来就真的没前途了。不过判断输入是否为空和字符串是否为空串这些细节还是要注意的。写完之后,再问了一下面试官的部门,知道了他是Office 365小组的。他还大致介绍了一下微软苏州的部门,主要就是Bing、Office 365和SharePoint。

一面结束之后等了大概半个小时,二面的面试官就进来面试了。开始的时候还是自我介绍,顺便介绍了一下联通的交往圈划分项目和分布式课程的Spark大作业。关于联通的项目,面试官主要问了我们是如何解决虚拟世界中的联系和实际生活中的联系的对应关系,因为现在很多人都用微信等社交工具联系,通话记录并不等同实际生活中的联系;还问了一下我们怎么保证画出来的圈子不受某些突发事件的影响。这两个问题其实在项目前期已经跟联通讨论过好几次,我回答说,因为只有通话记录,所以我们的想法是尽可能准确地根据这部分数据划分出圈子,然后联通方面会根据URL等项目来对我们的圈子进行优化。而避免突发事件影响圈子的稳定性则是通过每一个月计算前三个月的数据的这种滑动窗口的方法来保证。聊完这些,就又该开始codingle。第一题又是LeetCode上的原题,输入一个数字,返回在Excel对应列显示的字符串。例如输入1,那么Excel中第一列显示为A列;输入26,显示为Z列;输入27,就显示为AA列;如此类推。做起来也是很容易,不断取余生成低位的字符,跟原有的低位字符拼接,然后x /= 26直到x为0就退出循环。

第二题其实也是比较常见的,输入一棵二叉搜索树,转换为排序的双向链表。之前在《剑指Offer》上面看过这道题,不过记得不是很清楚了,还是得现场再思考一次。主要思路就是利用中序遍历的方法,当我们遍历到根节点的时候,它的左子树已经转换成排好序的链表,并且最后一个节点是当前最大的节点。那么把链表最后一个节点和根节点连接起来,然后再去遍历右子树,再把根节点与右子树中最小的节点连接起来。最简便的方式自然是用递归来实现。不过我写的版本好像有点绕,面试官看了之后说感觉好像有问题,但是又说不出来是哪里有问题,我自己解释了一下把自己也绕进去了。以后还是得多注意一下。

第三题是一道数学题,假设你有三个骰子,你从第一个骰子开始掷,掷到一个数字之后,你可以选择这个数字作为最后结果,也可以放弃这个数字,继续掷第二个骰子。同理继续下去。问:你能获得的点数的数学期望是多少(好像是问这个,记得不太清楚了……)

想了一下之后没想到怎么算,面试官看时间也不多了,就开始简化问题然后引导我来思考和回答。当然在掷到1的时候,你肯定会继续掷。掷到6的时候,你肯定就停止了。问题主要是在掷到2~5的时候要怎么选择。分析了一下之后,发现影响接下来的决策的因素主要有两个:

  1. 当前掷到的点数
  2. 剩下可以掷的骰子数量

面试官说,那其实你就是要实现一个程序,输入当前的这些状态,然后输出是否应该继续掷骰子。面试官问到这里的时候就说差不多了,希望我自己之后也能够继续思考一下这个问题,并且用程序实现。他还提醒说计算概率的时候也有比较多坑,希望我能注意一下。

最后,就是让我提几个问题。我问了一下面试官的部门,原来他还是Office 365部门的,不过做的主要是Foundation Team的后台部分。然后又问了一下在MS实习是不是主要都用的是微软的那一套东西来做开发,面试官说其实不少部分也是要用Linux/Unix,或者做APP开发的要做iOS和Android的,并不是全都是C#、VS这一套,甚至有些时候需要自己实现一门语言。

面试到这里基本上告一段落,接下来就看看能不能继续往前进一步咯~